Imagine you're talking to an alien. It's intelligent, able to hold a conversation in whatever language you want. It knows how to use a computer. It can create copies of itself. Etc., etc. The alien decides it really likes you, and you guys become good friends.
One day, you message your alien friend with an innocuous request, maybe for a cup of sugar. The alien comes back after a few minutes with the sugar you asked for, and everything is good. Later that day, while eating spoonfuls of sugar and watching the news, you see your alien friend on camera murdering the cashier at the nearby grocery store while carrying a bag of sugar. It turns out the alien went to get sugar for you, was stopped by the cashier for not paying, and in response killed the guy.
What went wrong?
The big mistake was assuming that the alien in the story shared human values. That is, we assumed that the alien understood some baseline context that all¹ humans share, context like "murdering is bad, especially if you're just trying to buy some sugar." Even though the alien seemed human-like, there were fundamental differences in how it saw the world.
One thing that many AI researchers are concerned about — myself included — is that we are in the process of building a super smart but entirely alien intelligence. AI alignment research is all about how we avoid situations where the AI decides to murder everyone.
A lot of AI alignment researchers tend to be real downers at parties. Mostly, they say things like "we can't even get companies or bureaucracies composed of humans to not be evil, look at what Enron/Facebook/BP/the government do every day, and those are all composed of humans that we can talk to! What hope do we have against AI?" And, you know, they're right but still not very fun to be around.
A week ago, if you were on the not-so-worried-about-AI-alignment side, you might have pointed to a company like OpenAI and talked about how a group of AI alignment folks are really committed to solving this problem, and have a ton of money to do so. And look, that company is even structured as a non-profit to guarantee that it will never fall to a profit motive and do things that are unsafe for humanity.
Well, that was a week ago. From Reuters:
ChatGPT-maker OpenAI is working on a plan to restructure its core business into a for-profit benefit corporation that will no longer be controlled by its non-profit board, people familiar with the matter told Reuters, in a move that will make the company more attractive to investors.
Chief executive Sam Altman will also receive equity for the first time in the for-profit company, which could be worth $150 billion after the restructuring as it also tries to remove the cap on returns for investors, sources added. The sources requested anonymity to discuss private matters.
The removal of non-profit control could make OpenAI operate more like a typical startup, a move generally welcomed by its investors who have poured billions into the company.
However, it could also raise concerns from the AI safety community about whether the lab still has enough governance to hold itself accountable in its pursuit of AGI, as it has dissolved the superalignment team that focuses on the long-term risks of AI earlier this year.
This is of course on the back of a whole lot of people leaving the company, some of whom have left explicitly because they felt the company was not being safe enough.
I don't personally take a side on whether OpenAI should or shouldn't be for-profit. But I do think this is a fantastic and ironic example of exactly the kinds of things AI alignment researchers are so morose about. When OpenAI was created, there were all these clever guard rails set up to make sure it would always 'be aligned' and 'do the right thing'. There's a controlling non-profit board! There's a superalignment team! There's a ton of brand risk of people making fun of you and your company with snarky names like NopenAI or ClosedAI if you ever go back on your founding mission! Surely the organization will remain focused on safety over profit? And yet, one by one, those guard rails have fallen. I think if you asked 2015 Sam Altman why OpenAI isn't for-profit, he might say something like:
Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return.
As a non-profit, our aim is to build value for everyone rather than shareholders. Researchers will be strongly encouraged to publish their work, whether as papers, blog posts, or code, and our patents (if any) will be shared with the world. We’ll freely collaborate with others across many institutions and expect to work with companies to research and deploy new technologies.
But OpenAI-the-entity was too smart for 2015 Sam Altman. It managed to break free of the best shackles that humanity could possibly build at the time. Not a good sign for AI alignment!
¹ Almost all! Ted Bundy may differ!
Actually, this got my hopes up a bit. If we managed to survive with corporations and countries that don't exactly follow human values, maybe we can survive with a non-aligned AI?
I’m an AI doomer, because I cannot see how we can possibly contain a being with an IQ in four figures. But maybe, just maybe, it won’t wipe out humanity completely.