The AI Daily Brief: Artificial Intelligence News and Analysis - The Latest Developments in Superalignment

Episode Date: December 15, 2023

OpenAI's Superalignment team, launched this summer, has just published their first paper about weak-to-strong generalization, and how they can analogize using weaker models to train more advanced models to simulate humans trying to control superhuman AI. Before that on the Brief, Intel's latest in the AI chip race. Today's Sponsors: Listen to the chart-topping podcast 'web3 with a16z crypto' wherever you get your podcasts or here: https://link.chtbl.com/xz5kFVEK?sid=AIBreakdown ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/

Transcript
Starting point is 00:00:00 Today on the AI Breakdown, we're looking at the first research paper from OpenAI's Superalignment team. Before that on the Brief, another entrant in the AI chip wars. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to breakdown.network for more information about our YouTube, our newsletter, and our Discord. Welcome back to the AI Breakdown Brief, all the AI headline news you need in around five minutes. One of the dominant themes of 2023 has been the AI chip wars. Really, this hasn't been much of a war. Nvidia has just been the big winner in the entire space.
Starting point is 00:00:41 In fact, I have seen some people argue that when the history books are written, they will see Nvidia as this year's big AI winner over OpenAI. Now, I think that that assumes that OpenAI ultimately gets beat by some of the bigger labs or gets absorbed into one of them, as I don't think you can really make an argument that this rush was started by anything other than OpenAI's ChatGPT, but it stands to make the point about just what a monster year Nvidia has had. Now, of course, the pressures of capitalism to increase GPUs beyond just Nvidia
Starting point is 00:01:09 have been working overtime this year, as big progress has been halted because people don't have access to compute. I mean, hell, OpenAI, arguably the leading lab in this space, had to turn off its ChatGPT Plus subscriptions for some meaningful amount of time, only just turning them back on this week because they didn't have enough access to GPUs. Well, every company and their mother is trying to catch up and address this in some different way. Microsoft announced a chip effort.
Starting point is 00:01:33 Amazon is increasing its chip efforts. Google is increasing its chip efforts. And of course, the existing chip makers are ramping up as well. AMD has launched a new chip that's meant to compete with Nvidia's H100 and their forthcoming H200. And now, as of yesterday, Intel has announced the Gaudi 3. The Gaudi 3 is specifically designed and focused on artificial intelligence. CNBC writes, While the company was light on details,
Starting point is 00:01:56 Gaudi 3 will compete with Nvidia's H100, the main choice among companies that build huge farms of the chips to power AI applications, and AMD's forthcoming MI300X when it starts shipping to customers in 2024. Intel has been building Gaudi chips since 2019 when it bought a chip developer called Habana Labs. Now, interestingly, in addition to the Gaudi 3 going after that data center market, Intel is also clearly thinking about AI on personal devices. In fact, their CEO, Pat Gelsinger, said, we think the AI PC will be the star of the show for the upcoming year. Along those lines, Intel also announced their Core Ultra chips, which are designed for Windows laptops and PCs,
Starting point is 00:02:30 as well as a fifth-generation Xeon server chip, both of which include a specialized AI component called an NPU that is designed to run AI programs faster. Now, this is not even meant to be powerful enough to do something like run a ChatGPT locally without an internet connection. But for the array of smaller AI tasks that are going to increasingly come to applications, this should involve some increased performance. Now, whether it can keep up with Apple's M3 Max, which is also explicitly designed to speak to this AI PC market, seems pretty unlikely, but it's interesting just how many different companies are thinking in the same way. 2024, in other words, is the year that AI is coming to your phones and laptops.
Starting point is 00:03:07 Next up, more officials in D.C. are picking up the call, first echoed by people like Gary Gensler from the SEC, that, as CNN Business puts it, AI is a danger to the financial system. Basically, we just had a meeting of the Financial Stability Oversight Council, or FSOC, which is a group of regulators across departments in the U.S., created after the 2008 financial crisis, that's designed to understand, identify, and ward against future upcoming threats to financial stability. FSOC's annual report came out this week, and AI was prominently featured. Said the report, AI has the potential to spur innovation and drive efficiency, but its use in financial services requires thoughtful implementation and supervision to manage potential risks.
Starting point is 00:03:46 Some of those risks that they warned about include cybersecurity concerns, compliance risks, and privacy issues. They also flagged hallucinations, and something that SEC Chair Gary Gensler has talked about before is a concern that if all trading starts to be done on the back of super intelligent bots, that it could mean that everyone trades in the exact same direction at once because they're all just piling into the same trades based on the same models. Now, I think it's highly unrealistic to assume, even in a world of super intelligent trading bots, that people would all be using the same bots that had the same models that had the same conclusions. But that is something that is now actively being talked about in Washington, D.C. Speaking of Washington, D.C., one refugee from that place, former Speaker of the House, Kevin McCarthy,
Starting point is 00:04:26 is, according to Axios, looking to get into the AI field. Basically, McCarthy did a long exit interview with Axios and mentioned that AI is an interest area for him. He said that he'd been friends with Elon Musk for a decade and he could imagine working with him. In the interview, McCarthy said, I view AI as a positive. AI is where California is going to come back. The knowledge and capability of AI is streaming from California. Now, given how deep and extensive the regulatory efforts around AI are going to be in the coming year, I can imagine this being a pretty lucrative place for McCarthy.
Starting point is 00:04:55 One more vaguely related to Elon. Apparently Grimes, who is of course Elon's former partner and the mother of three of his children, has built a new line of AI plush toys and named one Grok. So apparently, Grimes worked with a toy company called Curio, who themselves are working with OpenAI models, and the idea is that these plush toys converse with the kids that own them and learn their personalities over time. Grimes provides the voice for all three of the initial plushies,
Starting point is 00:05:21 which are named Gabbo, Grem, and Grok. They claim that Grok is a shortening of Grocket, but obviously people are making the connection to Elon Musk's new AI chatbot, which is of course called Grok. Lending some credence to their case that it's not related, the U.S. Patent and Trademark Office said that Curio filed an application to trademark the name Grok on September 12th, and xAI didn't apply for a trademark for Grok until October 27th. Anyway, this type of thing is going to be an absolute Rorschach test for how people feel about AI. Some people will think it's amazing, that these toys are so much more sophisticated, engaging, and interactive. Others will view it
Starting point is 00:05:54 as literally an R. L. Stine, Night of the Living Dummy-style horror. Over in the world of new AI models, earlier this year, Google announced its AI music generator MusicFX, but they've just updated it and made it more widely available. This is an area that I'm super excited about just on a completely personal level because I'm fascinated with music creation, and so at some point I'll do a demo video testing out all of these new music generation models that have come out recently.
Starting point is 00:06:18 Another new model that's getting attention is Alibaba's just-released I2VGen-XL. This is a new image-to-video model, which is obviously part of a much broader trend towards video creation that is really capturing people's attention right now. You've got Pika and Runway as startups that are pushing in this area. Stable Diffusion just released their first image-to-video models. And overall, it feels like we're setting up for a 2024 that has a heck of a lot more people creating short films and animations than have ever done so before. Now, lastly, something that if you've spent any time on X or Twitter in the last 24 hours or so,
Starting point is 00:06:50 you've probably seen, there is a new tool called Kampi, camera AI by FAL, which is basically a real-time deepfake. One demo video has someone transforming themselves into Elon Musk in real time, with the fake Musk's face imitating their head movement and facial expressions and copying them as they put on sunglasses. If you're watching this video right now, you might be seeing me talk in George Clooney's face, which is pretty freaky. Honestly, the resemblance is pretty uncanny. And although this might be an impressive model or demonstration generally, I think it's mitigated by the fact that I already look so much like Clooney,
Starting point is 00:07:24 but still, it's impressive technology and something I'm going to be excited to see how people play with. Anyways, that is going to do it for today's AI Breakdown Brief. Next up, the main AI Breakdown. Quickly, a brief word from today's sponsor. As a listener of this show, I suspect you like to stay up to date on all things AI and tech, which is why you have to check out the chart-topping podcast Web3 with a16z Crypto. Produced by venture firm Andreessen Horowitz, Web3 with a16z is the perfect companion podcast to the AI Breakdown. Web3 with a16z Crypto is your definitive resource for the future of the internet,
Starting point is 00:07:59 whether you're interested in the convergence of AI and crypto or simply curious about what's next. If you need a place to start, they recently released an excellent episode with Stanford cryptography professor Dan Boneh and former Google X engineer Ali Yahya in conversation with host Sonal Chokshi about the intersection of AI and crypto. From fighting deepfakes and proving humanity to large language models like ChatGPT, they cover it all. I highly recommend checking it out, especially if you'd like to learn more about how AI and crypto will impact our everyday lives. Beyond crypto and AI, this show is for creators seeking more ways to truly own their work,
Starting point is 00:08:33 for business leaders trying to prepare for the future today, and for innovators exploring trending tech topics. Don't miss out. Follow Web3 with a16z Crypto on Apple Podcasts, Spotify, or your favorite listening app. Welcome back to the AI Breakdown. Yesterday, Greg Brockman tweeted, New direction for AI alignment, weak-to-strong generalization,
Starting point is 00:08:54 promising initial results. We used outputs from a weak model, a fine-tuned GPT-2, to communicate a task to a stronger model, GPT-4, resulting in intermediate GPT-3-level performance. So what is Greg talking about? Why is this important? What does it have to do with the robots not killing us?
Starting point is 00:09:10 Well, to understand what's going on, let's go back to the summer when OpenAI introduced their Superalignment team. Their announcement blog post reads, We need scientific and technical breakthroughs to steer and control AI systems much smarter than us. To solve this problem within four years, we're starting a new team and dedicating 20% of the compute we've secured to date to this effort. So, basically, this is not a team that was focused on how to align the current crop of models. This is a team that's focused on superintelligence.
Starting point is 00:09:38 Superintelligence, OpenAI argues, will be the most impactful technology humanity has ever invented and could help us solve many of the world's most important problems. But the vast power of superintelligence could also be very dangerous and could lead to the disempowerment of humanity or even human extinction. While superintelligence seems far off now, we believe it could arrive this decade. Managing these risks will require, among other things, new institutions for governance and solving the problem of superintelligence alignment. How do we ensure AI systems much smarter than humans follow human intent?
Starting point is 00:10:08 Currently, we don't have a solution for steering or controlling a potentially superintelligent AI and preventing it from going rogue. Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans' ability to supervise AI. But humans won't be able to reliably supervise AI systems much smarter than us, and so our current alignment techniques will not scale to superintelligence. We need new scientific and technical breakthroughs. Now, even back then, they said that their goal was to build a roughly human-level automated alignment researcher. They said we can then use vast amounts of compute to scale our efforts
Starting point is 00:10:38 and iteratively align superintelligence. To align the first automated alignment researcher, we will need to, one, develop a scalable training method, two, validate the resulting model, and three, stress test our entire pipeline. Now, there are a couple things that were notable about this announcement at the time. One was the scale of its ambition. They had set a four-year timeline to solve the core technical challenges of superintelligence alignment, which made observant spectators kind of wonder how fast they think superintelligence is actually going to arrive. But on top of that, they also showed up with resources to do so. That 20% of the compute that they had secured to date commitment was certainly not nothing. Finally, they were bringing
Starting point is 00:11:13 in some big hitters. Notably, Ilya, the co-founder and chief scientist of OpenAI, was going to be leading this team, along with Jan Leike, who's the head of alignment. Now, Ilya leading the superintelligence alignment team was part of why people thought that perhaps the whole OpenAI board Sam Altman fight had to do with some technical breakthrough, given that Ilya had initially not supported Sam. Now, at this stage, we don't currently know what Ilya's long-term status with OpenAI is going to be. By all accounts, it is still up in the air. However, in spite of that, this new announcement is the first time that we've seen what they've been working on. So, the co-lead of the Superalignment team, Jan Leike, writes, super excited about our new research direction
Starting point is 00:11:53 for aligning smarter-than-human AI. We fine-tune large models to generalize from weak supervision, using small models instead of humans as weak supervisors. Jan continues, for lots of important tasks, we don't have ground truth supervision. Is this statement true? Is this code buggy? We want to elicit the strong model's capabilities on these tasks without access to ground truth. This is pretty central to aligning superhuman models. We find that large models generally do better than their weak supervisor, a smaller model, but not by much. This suggests reward models won't be much better than their human supervisors. In other words, RLHF won't scale. But even our simple technique can significantly improve weak-to-strong generalization. This is great news. We can make measurable progress
Starting point is 00:12:32 on this problem today. I believe more progress in this direction will help us align superhuman models. Now, Colin Burns takes this explanation a little bit further. And apparently, according to a shout out from Sam Altman, it was actually Colin who inspired this line of research and convinced the whole team to go down this path. Colin writes, humans won't be able to supervise models smarter than us. For example, if a superhuman model generates a million lines of extremely complicated code, we won't be able to tell if it's safe to run or not, if it follows our instructions or not, and so on. This is a key difficulty of aligning superhuman models. Unlike in most of machine learning, we will need to supervise models smarter than us.
Starting point is 00:13:07 Despite its importance, it's not obvious how to even begin to empirically study this issue. We propose a simple analogy to study this problem today. Can we use weak models to supervise strong models? If we can learn superhuman reward models or safety classifiers from weak supervision, that would be a huge advancement for superalignment. So, editor's note here: basically, the technique that they're taking is that humans, in this analysis, are playing the role of the weak model, the GPT-2, whereas superintelligence is taking the role
Starting point is 00:13:34 of the strong model, or GPT-4. Now back to Colin. He writes, intuitively this may be feasible because the strong model should already be very capable at the key alignment-relevant tasks we care about. All the weak supervisor needs to do is elicit key capabilities that already exist within the strong model. We empirically test this setup and find that if we fine-tune a strong pre-trained model using weak model supervision, it consistently outperforms the weak model, usually by a large
Starting point is 00:13:59 margin. Generalization appears to be a promising approach to alignment. But directly fine-tuning a big model to imitate a small model is suboptimal. Intuitively, we want to nudge the generalization towards outputting what it internally knows. We test a simple method for doing this that makes the strong model more confident in its predictions. Across a large number of data sets, this simple method drastically improves weak-to-strong generalization performance. On our NLP tasks, we can fine-tune GPT-4 using a GPT-2-level supervisor and attain performance close to GPT-3.5. There's still a lot of work to be done in this setting. Our methods still don't always work well; for example, performance isn't as good on our ChatGPT preference dataset, and our setup still has
Starting point is 00:14:36 disanalogies with the future alignment problems we care about. But we can make rapid iterative empirical progress on this problem today. Our setup is simple, general, and easy to try out. And there is still a huge amount of low-hanging fruit. Alignment feels more solvable than ever before. Now, another member of the team, Leopold Aschenbrenner, tries to explain this in more lay terms that I think are useful as well. Leopold writes, intuitively, superhuman AI systems should know if they're acting safely, but can we summon such concepts from strong models with only weak supervision? Incredibly excited to finally share what we've been working on: weak-to-strong generalization.
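Editor's sketch: the idea quoted above, nudging the strong model toward "what it internally knows" by making it more confident in its own predictions, can be illustrated with a toy loss for a single binary example. This is a minimal sketch under stated assumptions, not the team's implementation: the function names, the mixing weight `alpha`, and the fixed `threshold` are all hypothetical choices made for illustration. The loss blends imitating the weak supervisor's label with agreeing with the strong model's own hardened prediction.

```python
import math


def cross_entropy(p_target: float, p_pred: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy between a target probability and a prediction."""
    p_pred = min(max(p_pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p_target * math.log(p_pred) + (1.0 - p_target) * math.log(1.0 - p_pred))


def confidence_weighted_loss(
    strong_prob: float,      # strong model's predicted probability of class 1
    weak_label: float,       # weak supervisor's label (0.0 or 1.0, or a soft label)
    alpha: float = 0.5,      # hypothetical weight on the self-confidence term
    threshold: float = 0.5,  # hypothetical cutoff used to harden the prediction
) -> float:
    """Blend imitation of the weak label with confidence in the model's own guess."""
    # Harden the strong model's own prediction into a 0/1 pseudo-label.
    hardened = 1.0 if strong_prob > threshold else 0.0
    imitate_weak = cross_entropy(weak_label, strong_prob)
    trust_self = cross_entropy(hardened, strong_prob)
    return (1.0 - alpha) * imitate_weak + alpha * trust_self


if __name__ == "__main__":
    # The strong model is fairly sure the answer is 1; the weak label says 0.
    print(confidence_weighted_loss(0.9, 0.0, alpha=0.0))  # pure imitation
    print(confidence_weighted_loss(0.9, 0.0, alpha=0.5))  # blended loss
```

With `alpha=0.0` this reduces to plain imitation of the weak label. Raising `alpha` lowers the penalty the strong model pays for confidently disagreeing with a weak label, which is the intuition behind not simply copying the supervisor's mistakes.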
Starting point is 00:15:10 Future AI systems will be able to do crazy complicated things, e.g. generate 1 million lines of code. We'll want to add side constraints to their behavior, like don't lie, or don't escape your server. But how do we do that if we can't even fully understand what they're doing? Normally, we train AI systems with human labels, but relative to AI systems much smarter than us, humans will be weak supervisors. Maybe we can supervise them on easy problems, but on hard problems we'll only be able to provide incomplete or flawed labels.
Starting point is 00:15:37 But there's reason to be optimistic. Concepts like, is this action dangerous, should already be saliently internally represented by strong models. Can we just get them to tell us what they know, even about the cases too difficult for us to supervise directly? We have a really neat setup to study this. What happens when we use a small model to supervise a large model? Will the strong model just imitate the weak supervisor, including its errors,
Starting point is 00:15:58 or will the strong model generalize beyond the underlying task or concept? It turns out, we can often nudge deep learning's remarkable generalization properties to work in our favor. Even when we use GPT-2, which can barely count to 10, to supervise GPT-4, which aces high school tests, we can often recover 80% of the performance we would get with perfect labels. Now, this only works in some settings so far, and naive weak supervision without methods is far from recovering full performance. In that sense, we provide evidence that naively applying current alignment techniques like RLHF
Starting point is 00:16:26 will scale poorly to superhuman models. But this feels like a super tractable problem. Generalizing beyond weak supervisors is a widespread phenomenon, and we can drastically improve generalization with simple methods. There's tons of low-hanging fruit here. There are exciting directions for future work. Better methods. Scientific understanding.
Starting point is 00:16:43 When and why do we see good generalization? Analogous setups. There are still important disanalogies between our setup and the future superalignment problem. Can we fix it? Perhaps what I find most exciting, we can make iterative empirical progress today on a core challenge of aligning future superhuman models. Lots of prior alignment work has been stuck in theory or been empirical, but failed to confront core challenges head on. But the question, of course, becomes how to get people to work on those problems. Well, in addition to continuing their work with the superalignment team, OpenAI has also announced alongside Eric Schmidt,
Starting point is 00:17:12 a $10 million program they're calling Superalignment Fast Grants. This is a $10 million pot, they say, for, quote, technical research on aligning superhuman AI systems, including weak-to-strong generalization, interpretability, scalable oversight, and more. Basically, this $10 million is divided into $100K to $2 million grants for academic labs, nonprofits, and individual researchers. For graduate students, they're sponsoring a one-year $150K OpenAI Superalignment Fellowship, which includes $75K in stipend and $75K in compute and research funding. And importantly, they say that no prior experience working on alignment is required.
Starting point is 00:17:42 They want this to be a pipeline in for researchers to start working on alignment. Now, what are they looking to fund? They say they're particularly interested in the following research directions. One, weak-to-strong generalization, everything we've just been talking about. Two, interpretability. How can we understand model internals? And can we use this to, e.g., build an AI lie detector? Scalable oversight.
Starting point is 00:18:02 How can we use AI systems to assist humans in evaluating the outputs of other AI systems on complex tasks? And then finally, many other research directions, including but not limited to, honesty, chain-of-thought faithfulness, adversarial robustness, evals and testbeds, and more. You can apply to get these grants until February 18th. So how are people receiving this? The MIT Technology Review says, unlike many of the company's announcements, this heralds no big breakthrough. In a low-key research paper, the team describes a technique that lets a less powerful LLM supervise a more powerful one
Starting point is 00:18:31 and suggests that this might be a small step towards figuring out how humans might supervise superhuman machines. Now, they also got commentary from a number of different AI researchers, including Thilo Hagendorff, who wrote, it is an interesting idea, but he told MIT, he thinks that GPT-2 might be too dumb to be a good teacher. Quote, GPT-2 tends to give nonsensical responses to any task that is slightly complex or requires reasoning. Continuing, the MIT Technology Review writes, he also notes that this approach does not address Ilya Sutskever's hypothetical scenario in which a superintelligence hides its true behavior and pretends to be aligned when it isn't. Says Hagendorff, future superhuman models
Starting point is 00:19:06 will likely possess emergent abilities which are unknown to researchers. How can alignment work in these cases? Still, even for the skeptics, the broad sense is that this is valuable progress being made. Now, OpenAI also on the same day dropped another research paper that they summed up as identifying seven practices for keeping increasingly agentic AI systems safe and accountable as they become more common and more capable. Alongside this, they also announced research grants for a range of open questions in this area. Now, I'm not going to get fully into this paper as well, but what strikes me as notable is that
Starting point is 00:19:36 this is two fairly significant research papers around questions of AI safety and AI alignment, right at a time when people are speculating that GPT-4.5 is around the corner. I would say only that it doesn't not make sense to me, that in advance of releasing the most advanced model ever, you would drop all of the progress that you've made on safety and alignment issues to try to make it feel as though those things were progressing at the same speed the overall capabilities were. I think this is a little less tinfoil hat and a little more just my PR and marketing brain observing something that I could see doing. I will note that after Sam Altman called out Colin Burns for his work on this issue, when someone asked him
Starting point is 00:20:11 GPT-4.5 leak legit or no, he said nah, to which everyone scrambled to figure out if he was denying the pricing screenshot or that GPT-4.5 is coming this month. At the time of recording, still no answer on that front, but the speculation and rumors live on. Overall, my favorite comment on this research comes from Kirtik, who wrote, in my opinion, a better model to study this would be to have government run by children age 10 to 15, with all the employees still being adults. I think there are a fair few people out there who believe that in that circumstance, the results might be better than what we have. But anyways, guys, that is the new superalignment research. I'm glad to see things coming out of this section of OpenAI, and I'm looking forward to more experiments and
Starting point is 00:20:50 research to come. For now, that's going to do it for today's AI Breakdown. Until next time, peace.
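
Editor's sketch: figures like "recover 80% of the performance we would get with perfect labels" are easiest to interpret as a fraction of a gap. A minimal way to compute such a fraction is shown below, assuming three accuracies are available: the weak supervisor's, the strong model's after training on weak labels, and the strong model's after training on ground-truth labels. The function name and the accuracy values are hypothetical and chosen only to show the arithmetic; they are not results from the paper.

```python
def performance_gap_recovered(
    weak_acc: float,            # accuracy of the weak supervisor
    weak_to_strong_acc: float,  # strong model trained on the weak model's labels
    strong_ceiling_acc: float,  # strong model trained on ground-truth labels
) -> float:
    """Fraction of the weak-to-ceiling gap closed by weak-to-strong training."""
    gap = strong_ceiling_acc - weak_acc
    if gap <= 0:
        raise ValueError("The strong ceiling must exceed the weak supervisor's accuracy.")
    return (weak_to_strong_acc - weak_acc) / gap


if __name__ == "__main__":
    # Hypothetical accuracies, for illustration only.
    fraction = performance_gap_recovered(
        weak_acc=0.60, weak_to_strong_acc=0.84, strong_ceiling_acc=0.90
    )
    print(f"Gap recovered: {fraction:.0%}")
```

A value of 0 would mean the strong model did no better than its weak supervisor, and a value of 1 would mean weak supervision did as well as training on perfect labels.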
