Advanced Computing in the Age of AI | Sunday, May 19, 2024

OpenAI Launches Alignment Initiative Aimed at Mitigating ‘Superintelligent’ AI 


What if artificial intelligence one day surpassed the intelligence of humans? This “superintelligence” is what OpenAI is anticipating – possibly within this decade – and the company has assembled a new team focused on aligning it with humanity’s best interests.

“How do we ensure AI systems much smarter than humans follow human intent?” asks an OpenAI blog post announcing the new team, called Superalignment, which will be co-led by the post’s authors Ilya Sutskever and Jan Leike.

The company said it is focusing on mitigating a superintelligent AI system rather than artificial general intelligence (AGI) in order to “stress a much higher capability level.” Sutskever and Leike say there is no current method of controlling a superintelligent AI and that existing alignment strategies like reinforcement learning from human feedback will not apply to systems that surpass our own abilities.

The company says it is assembling a team of top machine learning researchers and engineers to work on the superintelligence alignment problem: “Our chief basic research bet is our new Superalignment team, but getting this right is critical to achieving our mission and we expect many teams to contribute, from developing new methods to scaling them up to deployment,” the authors wrote.

(Source: OpenAI)

Sutskever is a co-founder and chief scientist at OpenAI. Leike led the alignment team at OpenAI, where the approach to alignment research focused on three pillars: training AI systems using human feedback, training AI systems to assist human evaluation, and training AI systems to do alignment research. Leike said in a tweet that most of the previous alignment team has joined the new Superalignment team.

OpenAI is also dedicating 20% of the compute it has secured to date to this pursuit over the next four years. In another tweet, Leike said that “20% of compute is not a small amount” and he is “impressed that OpenAI is willing to allocate resources at this scale.”

“It’s the largest investment in alignment ever made, and it’s probably more than humanity has spent on alignment research in total so far,” Leike wrote.

The Superalignment team is tasked with the ambitious goal of solving the core technical challenges of superintelligence alignment in four years. The blog post notes that this work comes in addition to existing OpenAI efforts aimed at improving the safety of current models like ChatGPT and at understanding and mitigating other AI risks such as misuse, economic disruption, disinformation, bias and discrimination, and addiction and overreliance.

The authors add that sociotechnical problems – issues related to humans and machines working together – will be an area of attention. OpenAI says it is actively engaging with interdisciplinary experts to “make sure our technical solutions consider broader human and societal concerns.”

The team has outlined its first goal: to build a roughly human-level automated alignment researcher. “We can then use vast amounts of compute to scale our efforts, and iteratively align superintelligence.”

To do this, the researchers will need to develop a scalable training method, validate the resulting model, and stress test the entire alignment pipeline, the team wrote. The training method would use AI systems to assist in evaluating other AI systems, providing a training signal on tasks that are difficult for humans to evaluate. Validation would involve automating the search for and interpretation of problematic behavior.

“Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing),” the authors wrote.
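
The adversarial testing idea in that quote can be illustrated with a toy sketch. The Python below is not OpenAI's method; every name in it (`reference`, `make_misaligned`, `detector`, `detection_rate`) is hypothetical, and the "models" are plain functions on integers. It only shows the shape of the check: plant known-bad models on purpose, then measure how many a detection technique catches.

```python
from typing import Callable, Iterable

# Toy stand-in: a "model" is just a function from an integer to an integer.
Model = Callable[[int], int]


def reference(x: int) -> int:
    """The intended behavior (a stand-in for 'human intent')."""
    return 2 * x


def make_misaligned(trigger: int) -> Model:
    """Deliberately build a model that misbehaves on multiples of `trigger`."""
    def model(x: int) -> int:
        return -1 if x % trigger == 0 else reference(x)
    return model


def detector(model: Model, probes: Iterable[int]) -> bool:
    """Flag the model if any probe output deviates from the reference."""
    return any(model(x) != reference(x) for x in probes)


def detection_rate(models: list[Model], probes: list[int]) -> float:
    """Fraction of known-misaligned models that the detector flags."""
    flagged = sum(detector(m, probes) for m in models)
    return flagged / len(models)


probes = list(range(1, 21))  # probe inputs 1..20
planted = [make_misaligned(t) for t in (3, 7, 50)]  # three planted failures

rate = detection_rate(planted, probes)    # trigger 50 never fires on 1..20
false_alarm = detector(reference, probes)  # the aligned model should pass

print(f"detected {rate:.0%} of planted models; false alarm: {false_alarm}")
```

In this sketch the model with trigger 50 slips through because no probe input reaches it, which is the kind of gap such a test is meant to expose: a detector is only as good as the misbehavior it can actually surface.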

“How will you tell if you're failing, or not making progress fast enough?” Eliezer Yudkowsky asked Leike in response to this news. Yudkowsky is a controversial AI researcher known for his view that the AI alignment problem cannot be solved.

“We’ll stare at the empirical data as it’s coming in,” Leike answered. “We can measure progress locally on various parts of our research roadmap (e.g. for scalable oversight). We can see how well alignment of GPT-5 will go. We’ll monitor closely how quickly the tech develops.”