No Priors: Artificial Intelligence | Technology | Startups - How can we make sure that everyone has access to AI? Can small models outperform large models? With Stability AI’s Emad Mostaque

Episode Date: February 16, 2023

AI-generated images have been everywhere over the past year, but one company has fueled an explosive developer ecosystem around large image models: Stability AI. Stability builds open AI tools with a mission to improve humanity. Stability AI is best known for Stable Diffusion, the AI model where a user puts in a natural language prompt and the AI generates images. But they're also engaged in progressing models in natural language, voice, video, and biology. This week on the podcast, Emad Mostaque joins Sarah Guo and Elad Gil to talk about how this barely one-year-old, London-based company has changed the AI landscape, scaling laws, progress in different modalities, frameworks for AI safety, and why the future of AI is open.

Show Links: Stability.AI | Stable Diffusion V2 on Hugging Face

Sign up for new podcasts every week. Email feedback to show@no-priors.com
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @EMostaque

Show Notes:
[2:00] - Emad's background as one of the largest investors in video games and artificial intelligence
[7:24] - Open-source efforts in AI
[13:09] - Stability.AI as the only independent multimodal AI company in the world
[15:28] - Computational biology, medical information and medical models
[23:29] - Pace of adoption
[26:31] - AGI versus intelligence augmentation
[31:38] - Stability.AI's business model
[37:44] - AI safety

Transcript
[00:00:00] What if the route to AGI is not one big model to rule them all, trained on the whole internet and then narrowed down to human preferences, but instead millions of models that reflect the diversity of humanity that are then brought together? I think that is an interesting way to kind of look at it, because that will also be more likely to be a human-aligned AGI, rather than trying to make this massive elder god of weirdness bow to your will, which is what it feels like at the moment. This is the No Priors podcast. I'm Sarah Guo. I'm Elad Gil. We invest in, advise, and help start technology companies. In this podcast, we're talking with the leading founders and researchers
[00:00:51] in AI about the biggest questions. AI-generated images have been everywhere over the past year, but one company has fueled an explosive developer ecosystem around large image models, and that's Stability AI. This week on the podcast, we'll talk to Emad Mostaque, the founder and CEO. Stability builds open AI tools. They're most known for Stable Diffusion, the unreasonably effective AI model where a user puts in a natural language prompt,
[00:01:23] and the AI generates images. But they're also engaged in progressing models in natural language, voice, video, and biology. We'll talk about how this barely one-year-old, London-based company has changed the AI landscape, scaling laws, progress in different modalities, safety, and why he thinks the future of AI is open. Emad, welcome to No Priors.
Starting point is 00:01:42 Thank you for having me on, Sarah, your lad. Let's start with personal story. You have a background in computer science and you were working in the hedge fund world. That's a hard left turn, or it looks like it, from that world to being a driving force in the A.A. How did you end up working in this field? Yeah, I've always been interested in AI and technology.
Starting point is 00:02:00 So on the hedge fund, I was one of the largest investors in video games and artificial intelligence. But then my real interest came when my son was diagnosed with autism. And I was told there was no cure or treatment. And I was like, well, let's try and see what we can do. So I built up a team and did AI-based literature review. This was about 12 years ago of the existing treatments and papers to try and figure out commonalities. and then did some biomolecular pathway analysis of neurotransmitters for drug-be-purposing and came down to a few different things that could be causing it, you know, worked with doctors
Starting point is 00:02:34 to treat him, and he went to mainstream school, and that was fantastic. Went back to running a hedge fund, won some awards, and then I was like, let's try and make the world better. And so the first one was non-AI-enhanced education tablets for refugees and others, and that's Imagine Worldwide, my co-founder's charity. And then in 2020, COVID came, and I saw something like autism, a multi-systemic condition that existing mechanisms that extrapolated the future from the past wouldn't be able to keep up with and thought, could we use AI to make this understandable? And so I set up an AI initiative with the World Bank, UNESCO and others to try and understand what caused COVID and try and make that available to everyone. then I hit the institutional wall in a variety of places and realized that the models and technologies
Starting point is 00:03:23 that had evolved were far beyond anything that happened before. And there were some interesting arbitrage opportunities from a business perspective, and more on that, a bit of a moral imperative to make this technology available to everyone, because we're now going to very narrow superhuman performance, and everyone should have access to that. It's an amazing journey, and congratulations on all the impact you've already had. So as you say, or as you imply, the AI field in recent years has been increasingly driven by labs and private companies. And one of the most obvious paths to performance progress is to just make models bigger, right? Scaling data, parameters, GPUs, which is very expensive.
Starting point is 00:04:04 And then in reaction, just to set the stage a little bit, there's been some efforts over the previous years to be more community driven and open and build alternatives like Luther. How did you start engaging in that? And how did stability change the game here? Yeah. So when I was doing the COVID work, we tried to get access to various models. In some cases, the companies blew up. Other cases, we weren't given access, despite it being a high profile project. And so I started supporting Aluthor AI as part of the second wave. So, you know, Stella and Connor and others kind of led it on the language model side. But really, one of my main interests was the image model side. I have Afantasia, so I can't visualize anything in my brain, which is more common than people
Starting point is 00:04:47 would think. In fact, a lot of the developers in this space have that. Like, we've got nothing in our brain. You just see words? What's in there? Just feelings. So, like, again, I thought it was a metaphor. Imagine yourself on a beach. I was like, okay, I feel a beach. No, apparently, you guys have pictures in your heads. It must be like just disconcerting. But then with the arrival of clip released by Open AI a couple of years ago, you could suddenly, take generative models and guide them to text prompts. So it was VQGAN, which is kind of the slightly mushy, more abstract version first. But I build a model for my daughter while I was recovering, ironically, from COVID. And then she took the output and sold it as an NFT for three
Starting point is 00:05:27 and a half thousand dollars and donated to India Covalief. And I was like, wow, that's crazy. So I started supporting the whole space at Luther and beyond, giving jobs to the developers, compute for the model creators, funding the various notebooks from disco diffusion to these other things, you know, giving grants to people like mid-Jurney that were kind of kicking this off. Just personally. Just personally. They were doing all the hard work. And I was like, can I catalyse this? Because this is good for society. Then about 15 months ago, I was like, well, these communities are growing. It'd be great if we could create this as a common good. And originally I thought, you've got communities, you've got to make them kind of coordinated.
Starting point is 00:06:04 Could a Tao work or a Tao of Dao's? And that's how stability started. After about a week, I realized that was not going to work, and it was incredibly difficult. So then I figured out commercial open source software could be the way to create aligned technology, not just an image, but beyond, that would potentially change the game by making this stuff accessible. Because as you said, one of the key things, this is in the state of the AI report, this is in AI index as well, is that most research has been subject to scaling laws and other things.
Starting point is 00:06:35 Transformers seem to work for everything. and so it was moving more and more towards private companies, but the power of this technology is double-edged. One is that there are fears about what could go wrong, so it's not released. And the other one is, why not keep it for excess returns, right? So you've had this massive brain drain occurring, and no real option. You work in an academic lab, you have a couple of GPUs,
Starting point is 00:06:57 or you go and work at big tech or slash open AI, or you set up your own startup, which is very, very difficult, as you guys know. So I wanted to create another option, and that's what we did with Eleuther and Stability and the other communities that we have grown and incubated. Could you talk more broadly about why you think it's important for there to be open source efforts in AI and what your view of the world is? Because I think stability has really helped create this alternative to a lot of the closed ecosystems, particularly around image gin, protein folding, a variety of different areas. And those are incredibly important efforts. So I just love to hear more about your thoughts on, you know, why is this important? how you all view the participation of the industry over time,
Starting point is 00:07:36 and also what you think the world looks like and five years, 10 years, et cetera, in terms of closed versus open systems? So I think there's a fundamental misunderstanding about this technology, because it's a very new thing, right? Fascal open source is lots of people working together with a bit of direction. It's a bit chaotic, but then you've seen Red Hat
Starting point is 00:07:54 and other things emerge from this. There aren't many people that train these models, right? We don't invite the whole community and you have 100 people training a model. It's usually 5 to 10, plus a super-com. computer and a data team and things like that. And the models when they come out are a new type of programming primitive infrastructure because you can have a stable diffusion that's two gigabytes that deterministically converts
Starting point is 00:08:16 a string into an image. That's a bit insane and that's what's led to the adoption here. You know, on GitHub styles, we've overtaken Ethereum and Bitcoin cumulatively. It took them 10 years. We got there in like three, four months. If you look at the whole ecosystem, it's the most popular open source software ever, not just AI. Why? Because again, it is this new translation file. And you do the pre-compute, as it were, on these big supercomputers,
Starting point is 00:08:40 which means the inference required to create an image is very low. And that's not what people would have expected five years ago, or to create a chat GPT output. So as infrastructure, I think that's how it should be viewed. And so my take was that what would happen is everyone would be closed because you need a talent, data, and supercomput. And those would be lacking, as it were. So it'd be. the big companies only. They would go four or five years, and then someone would defect and go open source, and it would collapse the market as they would commoditize everyone else's complement. So similar to Google offering free Gmail and all sorts of stuff around their core business. But more than that, I realized that governments and others would need this infrastructure,
Starting point is 00:09:22 because if a company has it privately, they will sell to business to business, so maybe a bit of B2C. But we've seen the Cambrian explosion of people building around this technology. But who's building the Japan model or the India model or others. Well, we are. And then that means that you can tap into infrastructure spending, which is very important because it needs billions. But the reality is that's actually a small drop in the ocean. Self-driving cars got 100 billion of investment. We have three hundreds of billions. 5G trillions. And for me, this is 5G level. So from an ethical, moral perspective, I was like, we've got to make this as equitably available as possible. So this model perspective, I thought it was a good idea as well. But I thought we were held here inevitably.
Starting point is 00:10:04 So I decided to create stability to help coordinate and drive this forward in what's hopefully a moral and reasonable way. Like, you know, the decisions that we make have a lot of import and they're not easy, but we are trying to be kind of Switzerland in the middle of all of this and provide infrastructure that will uplift everyone here. What do you think this world looks like in five years or ten years? Do you think that there's a mix of closed and open source? Do you think the most cutting-edge models, the giant, the giant language models are going to be both? Or do you think, like, Capital will eventually become such a large obstacle that it'll make
Starting point is 00:10:36 the private world more likely to drive progress forward? And I know you have plans in terms of how to offset that, but I just love to hear about those. The reality is we have more compute available to us than Microsoft or Google. So I have access to national supercomputers, and I'm helping multiple nations build X-Scale computers. So to give you an example, we just got a 7 million-hour grant on Summit, one of the fastest supercomputers in the U.S. And like I said, we're building exoscale computers that are literally the fastest in the world.
Starting point is 00:11:04 Private companies don't have access to that infrastructure, because governments, thanks to us, are realizing that this is infrastructure of the future. So we have more compute access. We have more cooperation from the whole of academia than all of them do because their agreements tend to be commercial. There's no way that private enterprise can keep up with us. And our costs are zero as well when you actually consider that, whereas they have to ramp up tens of billions of dollars of compute. So my take is that foundation models will all be open source for the deep learning phase. Because we're actually got multiple phases now.
Starting point is 00:11:35 The first stage is deep learning. That's creating of these large models. And we'll be the coordinator of the open source. The next stage is the reinforcement learning, the instruct models, Flan Palm or Instruct GPT or others. That requires very specified annotation, and that's something that private companies can excel in. The next stage beyond that is fine tuning. So actually, let's give a practical example.
Starting point is 00:11:56 palm is a 540 billion parameter model it achieves about 50% on medical answers right flan palm is the instructed version of that and that achieves 70% med palm they took medical information they fed it in this is a recent paper from a few weeks ago achieved 92% which is human level on the answers and the final stage for that is you take this med palm and you put it into clinical practice with human in the loop for me the private sector will be focused on the instruct to human in the loop area, and the base models will be infrastructure available to everyone on an international generalized and national basis, particularly because when you combine models together, I think that's superior to creating multilingual models. So that's quite a bit there, and I'm sure you want to unpack that. Yeah, that's very exciting, yeah.
Starting point is 00:12:47 Could you actually talk about the range of things or efforts that are going on at stability right now? I know that you've done everything from these foundation models on the language side. protein folding, image, and et cetera, if you could just kind of explain what is a spectrum of stuff that stability does and supports and works with, and then what are the areas that you're putting the most emphasis behind going forward?
Starting point is 00:13:08 Yes, I think we are the only independent multimodal AI company in the world. So you have amazing research labs like Fair, Meta, and others, and Deep Mind, doing everything from protein folding to language to image. And there are cross-learnings from all of these. Basically, we do
Starting point is 00:13:23 everything from audio to, language, coding, models, any kind of almost private model, we are looking at what the open equivalent looks like, and that's not always a replication, right? So with stable diffusion, for example, we optimized it for a 24-gibight V-Ram GPU. Now, as of the release of distilled stable diffusion, it will run in a couple of seconds on an iPhone, and we have neural engine access, because our view of the future is creating models that aren't necessarily bigger, but that are customizable and editable. So this is a bit of a different emphasis.
Starting point is 00:13:58 And we think that's a superior thing than scaling. I think things like the chinchilla paper, that's the 67 billion parameter model, that's as performance as GPT3, at 175 billion, are important in that because it said that training more is important. And actually when you dig into it, it actually said data quality is important.
Starting point is 00:14:14 Because now we're seeing that the first stage, the DL stage, is it where the deep learning stage is, let's use all the tokens on the internet, you know? But maybe we can use better tokens. And that's what we see when we instruct and use reinforcement learning with human feedback. And we've also been releasing technology around that. So our Carpa Lab, representative learning, we released our instruct framework that allows you to instruct these big models to be more human. The way I kind of put it is that our focus is thinking, what are the foundation models that will advance humanity, be it commercial or not, what needs to be there and what's very susceptible to this transformer-based architecture that takes about 80% of all research in the space?
Starting point is 00:14:54 making that compute and knowledge and understanding of how to build these models available to academia, independent research, and our own researchers, and then from a business perspective, really focusing on where our edges and our edges in two areas. One is media, and so this is why image models, video models, and audio models have been a focus, 3D soon as well. And the other area is private and regulated data, because what's the probability that a GPT3 model weight or a palm model weight will be put on-prem? It's very low. Versus an open model, it's very high. And there's a lot more valuable private data than there is public data. So it is a bit of everything, but like I said, there are certain focuses on the business side on media. And then I think on a breakthrough side, computational biology will be the biggest.
Starting point is 00:15:43 one. That's really cool. And on the computational biology side, I guess there's a few different areas. There's things like protein folding, and then to your point, there's things like Med Palm. And so, are you thinking of playing a role in both of those types of models in terms of both the medical information? Yes. We will release an open MedPom model, well, med-stable GPT. And then protein folding, we are one of the key drivers of OpenFold right now. So we just released a paper on that, much faster ablations than AlphaFold. We're doing as well DNA diffusion for predicting the outcome of DNA of sequences. We have biolm around taking language models for chemical reactions,
Starting point is 00:16:19 and that's an area that we will aggressively build because there's a lot of demand from the computational biology side for some level of standardization there. There have been initiatives like Melody and others looking at federated learning, but there is a misalignment of incentives in that space that I think we could come in and fix. And I think that's where we really view ourselves.
Starting point is 00:16:37 How can you really align incentive structures and create a foundational element that brings people together? and I think that's where we are most valuable because private sector can't do it that well, public sector can't do it that well, a mission-oriented private company that has this broad base and all these areas could potentially.
Starting point is 00:16:54 Yeah, I think also the global nature of your focus is really exciting because when I look at things like medical information or medical models, ultimately the big vision there, which a number of people have talked about for decades at this point is that you'd have a machine that would allow you to have very high access to care and medical information
Starting point is 00:17:12 to matter where you're in the world, and especially since you can take images with your phone and then interpret them with different types of models and then have like an output, you know, you should, if you have a cardiac issue, you should have care equivalent to the world's best cardiologist from Stanford or, you know, you name the Center of Excellence available
Starting point is 00:17:29 to anybody in the world, whether they're rich, poor, developing country, not, etc. And so, you know, it's very compelling to see this big wave of technology and sort of the things that may be able to enable, including some of the things that you mentioned around AI and medicine. So it's very exciting stuff.
Starting point is 00:17:43 I think it's very interesting as well because this technology is being adopted so fast. I mean, let's face it, Microsoft and Google, $2 trillion companies have made it core of their strategy, which is crazy insane for technology that's basically five years old, let's say two years old, really breaking through. Because it can adapt to existing infrastructure, you know, like it sits there and it absorbs knowledge when you fine tune it. But then my thing is, I look to the future and I'm like, like that best doctor, which bits of that should be infrastructure for everyone and which bits
Starting point is 00:18:17 of that should be private. And so that's how I kind of oriented my business. I look to the future, I come back and I think, what should be public infrastructure and how can I help build that and coordinate that? And that's valuable. And then everything else, other people can build around. How do you think about the traditional pushback that's existed in the medical world around some of these technologies? So, for example, you know, the first time an expert system or a computer could actually outperform Stanford University physicians at predicting
Starting point is 00:18:44 infectious disease was in the 1970s with this Micing project where they literally trained an expert system or designed an expert system to be able to predict infectious disease
Starting point is 00:18:53 but here we are almost 50 years later with none of that technology adopted and so do you think it's just we have to do a lot of human-the-loop things and it's a doctor's assistant and that'll be good enough
Starting point is 00:19:03 do you think it's just a sea change there aren't enough physicians like what do you think is the driver for the technological adoption and something so important today So I think the infrastructural barriers are huge for adoption of technologies, particularly in private sector. I think there is a new space of open source technology adoption that could be very interesting
Starting point is 00:19:20 and a willingness now that people kind of understand this, which wasn't there even 10 years ago, you know, the nature of open source. Now it runs the world's servers and databases. I think there's another level of open source, which is open source complex systems, as it were. Previously in other discussions, I've talked about our education work. So right now, we're deploying four million tablets to every child in Malawi. By next year, we'll have hundreds of millions of kids, hopefully, that we deploy to.
Starting point is 00:19:45 It's not just education. It's healthcare. And it's working with the government. It's working with multilateral to say, can we build a healthcare system from the bottom up that can do all of these things without an existing infrastructure? Because they don't have an existing infrastructure. It's one doctor per thousand kids, 10,000 kids. One teacher per 400 kids. I am certain that system will outperform anything in the West within five years, which is crazy to say.
Starting point is 00:20:11 But then our Western systems can then take bits of that and adapt to it, because I think this competitive pressure is required, because Western systems are very hard to change. And in the UK, we've done that with HDR, UK, the genomic banks and others. And that was a massive uphill battle, as you know, to get these technologies adopted, because I mean, there should be barriers to adoption of this technology when it comes to things as important as healthcare. But at the same time, I think now is the time to open it up. Yeah, I think there is an interesting loose analogy to different pace of adoption of different technologies and different geos in the past, right? So one that comes to mind is today, I think it's very commonplace amongst
Starting point is 00:20:49 consumer internet investors to look at what's happened with mobile in East Asia as a precursor to interactions that might happen here. And, you know, mobile technology advanced much more rapidly in China, Korea, many other places, one, because of private public partnership, and two, because there was more, I guess, green field in terms of access to information and different infrastructure that supported mobile is the primary communication medium. And I could certainly see that happening with some AI native products. I think that's an excellent point. I agree 100%. I think just as they leapfrog to mobile, a lot of the emerging markets, Asia in particular, will leapfrog to generative AI or personalized AI.
Starting point is 00:21:34 And I can see this because I'm having discussions with the governments right now. What is the reaction? Over the Christmas holiday, I was getting a few hours of sleep finally. I got like six calls from headmasters of UK schools saying, Emad, what is our generative
Starting point is 00:21:47 AI strategy? I was like, you're what? And they were like, all our kids are using chat GPT to do their homework. And so it's kind of one of the first global moments, an amazing interface that Open AI built. It's going mainstream. And I was like, well, get good. you know, stop assigning essays.
Starting point is 00:22:03 So now in some of the top private schools in the UK, they actually have to write the essays during the lessons without computers, which I think is wrong. Because my discussions in an Asian context, for example, with certain leading governments that are about to put tens of billions into this space, they're embracing the technology, and they're like, how can we have our own versions of this? And how can we implement this to help our students get even better, right?
Starting point is 00:22:24 Because also it's very, even though there might be bureaucracy in some of these nations, if they want to get something done, they get it done. And this technology is very different in that the costs are not continuous, like a 5G network. The KAPX profile and other things are very different. Like, you know, you can say it costs $10 million to train a GPT. It doesn't cost that much anymore. That's really valuable if you can have a chat GPT for everyone. Like the ROI is a huge.
Starting point is 00:22:51 So, yeah, I do think that a lot of these nations, like the African context is one that we're driving forward with education as a core piece. And right now we're teaching kids with the most basic AI in the world. world, literacy and numeracy in 13 months on one hour a day in refugee camps. That's insane. That's already better. It's going to get even better. But I think Asia in particular, they're going to go directly to this technology and embrace it fully. And then we have to have a question. If you're not embracing this in the West, in America and the UK, you're going to fall behind because ultimately this can translate between structure and unstructured data quicker than anything. I'd like to see what pace of adoption we can have in the United States,
Starting point is 00:23:30 this technology as well, but I can see the prediction coming true. If we just go back to the core, I guess not the core necessarily, but the most advanced, like mature use case with instability and, as you said, media as an advantage, what does the future of media look like? And actually, even if we go back before that, you know, you're involved in sort of early ecosystem efforts with Luther and such. How did you even identify that this was an area of interest for you versus everything else going on across modalities. So, you know, I've always been interested in meaning, like semantic is even part of my email address,
Starting point is 00:24:06 and that's my religious studies as well around epistemology and ethics, ironically. The way that I viewed it is that the easiest way for us to communicate is what we're doing right now via words, right? And that's held constant, but now we can communicate via phones and podcasts or whatever, and it's nice. Writing was more difficult, and the Gutenberg Press made it easier, but visual communication is incredibly difficult, be it a PowerPoint, which is visual communication,
Starting point is 00:24:28 or art, which is visual communication, and then you have video and things like that, which is just impossible. Now you have TikToks and others making it easier. I saw this technology, and I was like, if the pace of acceleration continues, visual communication becomes easy for everyone. Like my mom sending me memes every day,
Starting point is 00:24:45 telling me to call more or kind of whatever. And I'm like, that's amazing, because that creation will make humanity happier. Like you see art therapy, that's visual communication, and it's the most effective form of therapy. What if you could give that to everyone? So there was that aspect to it. But then I saw movie creation and things like that.
Starting point is 00:25:04 So my first job was actually organizing the British Independent Film Awards and being a review for the Rain Dance Film Festival. So, you know, every year I put a movie on for my birthday and we give the proceeds to charity. I get to see my favorite movie with my friends. It's pretty cool. And then I was the biggest video game investor in the world at one point. So these types of communication and interaction are really interesting.
Starting point is 00:25:24 And I thought that people really misunderstood the metaverse UGC and the nature of what could happen if anyone could create anything instantly. It's not going to be a world for everyone or a world that everyone visits. It's going to be everyone sharing their own worlds and seeing the richness of humanity. And again, I thought that was an amazing
Starting point is 00:25:41 ethical slash moral imperative for making humanity better, but also an amazing business opportunity because the nature and way that we create media will transform as a result of this technology and we're seeing it right now. We have amazing apps like Descript, right? where you could take this podcast and you can edit it with your words live, you know.
Starting point is 00:26:00 You have amazing kind of gaming things come out where you create assets and instances, or, you know, some of this new 3D NeRF technology where you can reshoot stuff. We are working with multiple movie studios at the moment who are saving millions of dollars just implementing Stable Diffusion by itself, let alone these other technologies. And that was, for me, tremendously exciting: to allow anyone not to be creative, because people are creative, but to access creativity. And then allow the creatives to be even more creative and tell even better stories. I believe Sam from OpenAI said they don't think image generation is kind of like core on the path to AGI. It's obviously really important to you personally and to Stability.
Starting point is 00:26:42 Tell us about your stance on AGI and if that's part of the Stability mission. Yeah, I don't care about AGI except for it not killing us. They can care about it. What I care about is intelligence augmentation. You know, this is the classic kind of Memex type of thing. How can we make humans better? Our mission is to build the foundation to activate humanity's potential. So, look, AGI is fine. Again, we have to have some things around that. I do believe that they are incorrect around multimodality, or images, being a core
Starting point is 00:27:09 component of that. But I think there are two paradigms here. One is stack more layers, and I'm sure GPT-4 and the next PaLM and all these things will be amazing, stacking more layers and having better data as well. But one of the things we saw, for example, with Stable Diffusion: we put it out together and then people trained hundreds of different models. When you combine those models, it learns all sorts of features, like perfect faces and perfect fingers and other things. And this kind of is related to the work that DeepMind did with Gato and others that shows the auto-regression of these models
Starting point is 00:27:44 and the latent spaces become really, really interesting. So what if the route to AGI is not one big model to rule them all, trained on the whole internet and then narrowed down to human preferences, but instead millions of models that reflect the diversity of humanity that are then brought together? I think that is an interesting way to look at it, because that will also be more likely to be a human-aligned AGI, rather than trying to make this massive elder god of weirdness bow to your will, you know, which is what it feels like at the moment. So we're going to have a hive elder god instead. You've mentioned that Stability is still working on language. The application of diffusion models to images is a really unique breakthrough, and it's not as computationally intensive as the known approaches to language so far. I think you've said that the core training run for the original Stable Diffusion was 150,000 A100-hours, which is not that huge in the grander scheme of things. What can you tell us about your approach to language? So yeah, via the EleutherAI side of things and our team there, you know, we released GPT-Neo, GPT-J, and GPT-NeoX, which have been downloaded 20 million times. They're the most popular language models in the world. You basically either use GPT-3 or use those. They go up to about 20 billion parameters. And like I said, we've released trlX from the CarperAI lab, which is the instruct framework. They're training, you know, multiple models up to 100 billion parameters now. I don't think you need more than that, Chinchilla-optimal, to enable an open ChatGPT equivalent.
Starting point is 00:29:13 You know, an open Claude equivalent, too. I think that will be an amazing foundation from which to train sector-specific and other models that then again can be auto-regressed, and there will be very interesting things around that. Language requires more, not necessarily because of the approach and diffusion breakthroughs. Like recently Google had a new paper where they showed a transformer actually can replace the VAE, so you don't necessarily need diffusion for great images. It's more because language is semantically dense, I think, versus images. And there's a lot more accuracy that's required for these things.
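When Emad says the community's hundreds of Stable Diffusion fine-tunes can be combined so the result "learns all sorts of features," the mechanics in community practice are often as simple as a weighted average of the checkpoints' parameters. A toy sketch of that idea; the layer names, weights, and checkpoints here are invented for illustration, not real models:

```python
import numpy as np

def merge_checkpoints(state_a, state_b, alpha=0.5):
    """Linearly interpolate two model state dicts: alpha * A + (1 - alpha) * B."""
    assert state_a.keys() == state_b.keys(), "models must share an architecture"
    return {name: alpha * state_a[name] + (1 - alpha) * state_b[name]
            for name in state_a}

# Toy "checkpoints": same architecture, different fine-tuned weights.
faces_model = {"unet.w": np.array([1.0, 2.0]), "unet.b": np.array([0.0])}
hands_model = {"unet.w": np.array([3.0, 4.0]), "unet.b": np.array([2.0])}

# A 50/50 merge, the simplest case of a community checkpoint "merge".
merged = merge_checkpoints(faces_model, hands_model, alpha=0.5)
print(merged["unet.w"])  # [2. 3.]
```

Real merges do the same thing across millions of parameters, often with per-layer alphas; it only tends to work because the fine-tunes share one architecture and a common ancestor checkpoint.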
Starting point is 00:29:46 There, I think, are various breakthroughs that can occur. Like we have an attention-free transformer basis in RWKV that we've been funding. We've got a 14-billion-parameter version of that coming out that has been showing amazing progress. But I think the way to look at this is we haven't gone through the optimization cycle of language yet. So OpenAI, again, amazing work they do. They showed with InstructGPT that their 1.3-billion-parameter version outperformed the 175-billion-parameter GPT-3. You look at Flan-T5,
Starting point is 00:30:20 the instruct version of the T5-XXL model from Google. The 3-billion-parameter version outperforms GPT-3 at 175 billion parameters in certain cases, you know? These are very interesting results, and it's one of those things that as these things get released, they get optimized. So like with Stable Diffusion, leave aside the architecture: day one, 5.6 seconds for an image generation using an A100.
Starting point is 00:30:45 Now 0.9 seconds. With the additional breakthroughs that are coming through, it'll be 25 images a second. That's over a 100-times speed-up, just from getting it out there and people interested in doing that. I think language models will be similar. And I don't think that you need to have ridiculous scale when you can understand how humans interact with the models and when you can learn from the collective of humanity. So, like I said, a very different approach of small language models
Starting point is 00:31:15 or medium ones, versus let's train a trillion parameters or whatever. And I think there will be room for both. I think it will be: use these amazingly packaged services from Microsoft and Google if you just want something out of the box; or, if you need something trained on your own data with privacy and things like that, which may not be as good but maybe better for you, use an open-source base and work with our partners at SageMaker or whoever else, you know. Can you talk more about that in the context of your business model and your approach?
Starting point is 00:31:39 You mentioned that you think some of the areas that Stability will be focused on are media and then proprietary and regulated data sets. If there are things you can share right now in that area (if not, no worries), it would be interesting to learn more about, you know, how you view the business evolving. Sure. So, like, now we're training on hundreds and soon thousands of Bollywood movies to create Bollywood video models through our partnership with Eros, and that is exclusively licensed. We'll have audio models coming as well, say an A.R. Rahman model or whatever.
Starting point is 00:32:07 You know, we're talking to various other entities as well, and this is why we have the partnership with Amazon and SageMaker. So there'll be additional services that can train models on behalf of most people. Our focus is on the big models for nations, the big models for the largest corporates, who will need to train their own models one day. And that's really difficult. There are only like 100 people who can train models in the world. It's not really a science; it's more an art. Losses explode all over the place when you try to do something.
Starting point is 00:32:34 And so we're going to make it easy for them, and we're going to be inside the data centers, training their own models that they control, and our open-source models then become the benchmark models for everyone. Again, we have access to the neural engine, dedicated teams at Intel and others working on optimizing these. That is the model: the framework and the open model are optimized,
Starting point is 00:32:53 and then we take that and create private models. And again, I think that's complementary to the APIs and other things you will see from Microsoft, Google, etc. Because, yeah, you would want both. Yeah. Some of the other areas that you've focused on, or at least talked about, I think, in interesting ways, are how AI can be used to make our democracy more direct and digital, and, a little bit more broadly, you know, global impact.
Starting point is 00:33:15 Could you extrapolate a bit more there? Yeah, so I think, you know, we have to look at intelligence augmentation, right? Like, in information theory, in the classical Shannon way, information is valuable inasmuch as it changes the state. And we've obviously seen political information become more and more influenced by manipulation of stories and things like that. So the divide has grown. What if we could create an AI that could translate between various things, make things easier to
Starting point is 00:33:38 understand, and make people more informed? I think that would be ideal, with some of these national and public models and interfaces being provided to people. And then that can be very positive for democracy, allowing people to really understand the issues. Like, you can already, with ChatGPT, when you train it on the nature of yourself, have it summarize from your perspective. You know, that's an amazing thing, right? You can get it to talk like a five-year-old or a six-year-old or an eight-year-old or a ten-year-old. Once it starts understanding Sarah and Elad, that'll be even better. Again, you don't need to train open-source models to do that. The OpenAI embeddings API is fantastic. But I think there'll be more and more of these services that allow there to be that
Starting point is 00:34:16 filter layer between us and this mass of information on the internet, and that will be amazing. I think if we build the education systems and other things correctly as well, this Young Lady's Illustrated Primer that we're going to give to all the kids in Africa and beyond, like, again, let's really blue-sky think: how can we get people engaged with their communities and societies? Because it will be a full open-source stack, not only education but healthcare and beyond. That's super exciting. I think, again, that's the future of how we come together, because you want to come together to form a human colossus, like in the Wait But Why style, where you get shit done, pardon my
Starting point is 00:34:49 language. And I think this is one of the best ways for us to do that, leveraging these technologies. Okay, we don't have commercial sponsors. There's actually a book called Lady of Mazes that's an AGI-centric book from like 10 years ago, and basically the idea is sort of what you mentioned, where as different AGIs gain models of how a subset of the population thinks about certain issues, it substantiates into a virtual person who's basically representing them, a representative's equivalent, so you don't actually have to vote. The AGI just kind of synthesizes group opinions and then turns it
Starting point is 00:35:21 into representatives. Yeah, and you have to think about, you know, the advances like Meta's amazing work on Cicero, for example, beating humans at Diplomacy. They used eight different language models combined. Again, I think this is the future, not just zero-shot. Multiple models interacting with each other is the way, full stop. The issue, from a mechanism-design perspective of the game theory of our current economy, is that there is no central organizing factor that we trust. Like, what is the trust in Congress? Like, I think they trust Congress less than cockroaches. No offense to Congress, please don't bring me up. Like, it's just a poll, right? People will err towards trusting machines, as it were, and machines are capable of load balancing. Now they're capable of
Starting point is 00:36:00 load balancing facts and things. And so we have to be super careful as we integrate these things, what that looks like, because they will make more and more decisions for us. That could be for our benefit, having an AI that speaks on our behalf and amalgamates. But then we need to make sure that these aren't too brittle and fragile as we cede more and more of our own personal authority to them, because they are optimizing. This is also one of the dangers on the alignment side. As we introduce RLHF into some of these large models, there are very weird instances of
Starting point is 00:36:27 mode collapse and out-of-sample things. I do say these large models should be viewed as fictional, creative models, not fact models, because otherwise we've created the most efficient compression in the world. Does it make sense that you can take terabytes of data and compress it down to a few gigabytes with no loss? No, of course you lose something. You lose the factualness of them, but you keep the principle-based analysis of them. So we have to be very clear about what these models can and can't do, because I think we will cede more and more of our authority, individually and as a society, to the coordinators
Starting point is 00:36:59 of these models. Could you talk more about that in the context of safety? Because ultimately one of the concerns that's sort of increased in the AI community is AI safety, and there's sort of three or four components of that. There's alignment
Starting point is 00:37:15 you know, will bots kill humans, or whatever form you want to put it in. The fear is not that they'll kill us; they'll farm us. Yes, good point. They'll just build a giant RLHF farm on top of us or something. There's the concern around certain types of content, pedophilia, etc., that people don't want to have exist in society for all sorts of positive reasons. There's politics. You know, there's concerns, for example, that AI may become the next big battleground after social media in terms of political viewpoints being represented in these models with
Starting point is 00:37:39 the claims that they're not political viewpoints. And so I'm sort of curious how you think about AI safety more broadly, particularly when you talk about trust of models. To your point, part of it is fact versus fiction, but part of it may also be, well, it looks like it's political, and so therefore maybe I can't trust it at all. Yeah, I don't think technology is neutral. So I'm not one of the people that adheres to that, especially with the way we build it. It does reflect the biases and other things that we have in there.
Starting point is 00:38:04 That's kind of why I follow the open-source thing, because I think we can adjust that. On the alignment side, you know, it was interesting, EleutherAI basically split into two. Part of it is Stability and the people who work here on capabilities. The other part is Conjecture, which does specific work on alignment, and they're also based here in London. And I think it's not easy, right? I think that everyone is ramping up at the same time, and we don't really understand how this technology works, but we're doing our best, you know?
Starting point is 00:38:30 You have people like Riley Goodside and other prompt whisperers who are like, wait, what on earth? It can do all these kinds of things. I think that there needs to be more formalized work, and I actually think there needs to be regulation around this, because we are dealing with an unknown unknown. And I don't think we're doing a good enough job of tying things together, particularly as we stack more layers.
Starting point is 00:38:52 And we get bigger and bigger and bigger. I think small models are less dangerous, but then the combination of them may not be safe. But again, we don't know this yet. You've mentioned before your support for the idea of regulation of large models. What would be a productive outcome of that regulation that you can imagine? I think a productive outcome of that regulation is that anything above a certain level of FLOPs needs to be registered, similar to, well, bioweapons and things that have the potential for dual use.
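A FLOP threshold is attractive as a registration trigger because training compute is easy to estimate: a standard rule of thumb is roughly 6 FLOPs per parameter per training token. A sketch of what such a check could look like; the threshold value here is invented for illustration, not any real rule:

```python
def training_flops(n_params, n_tokens):
    """Standard approximation: ~6 FLOPs per parameter per training token."""
    return 6 * n_params * n_tokens

# Hypothetical registration threshold, purely for illustration.
THRESHOLD = 1e23

# A GPT-3-scale run: 175B parameters, ~300B training tokens.
flops = training_flops(175e9, 300e9)  # ~3.15e23 FLOPs
print(f"{flops:.2e} FLOPs, registration required: {flops > THRESHOLD}")
```

The appeal of the metric is that it depends only on disclosed model size and data set size, not on inspecting weights, which is what makes it administrable in the way dual-use export thresholds are.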
Starting point is 00:39:18 I think there needs to be a dedicated international team of people who can actually understand and put in place some regulations on how we test these models for things. Like, you know, the amazing work Anthropic recently did with constitutional models and other things like that. We need to start pooling this knowledge
Starting point is 00:39:33 as opposed to keeping it secret, but there is this game-theoretic thing where one of the optimal ways to stop AGI happening is to build your own AGI first. And so I'm not sure if that will ever happen, but we're in a bit of a bind right now, which means that everyone's having their own arms race. When governments decide,
Starting point is 00:39:49 and I don't believe they've decided yet, that having an AGI is the number one thing, tens of billions, hundreds of billions will go into building bigger models. And again, this is very dangerous, I think, from a variety of perspectives. So I prefer multilateral action right now, as opposed to in the future. So I put that out there, but I can't really drive it. I'm getting flak from everywhere because of it. But I do believe that should be the case. I think, going on to the next one, as I just said, the political biases and things like that, we can use these as filters in various ways. And I think one of the interesting things, and the other thing I've called for regulation
Starting point is 00:40:20 of, maybe I should do it a bit more loudly, is you have a lot of companies that have ads as one of their key elements, right? And ads are basically manipulation. And these models are really convincing. They can write really great prose. My sister-in-law created a company, Sonantic, that can do human-realistic, emotional voices. She did Val Kilmer's voice for his documentary and stuff like that before selling to Spotify. It's going to be crazy, the types of ads that you see, and we need to have regulation about those soon, because you're going to see Meta and Google and others trying to optimize for engagement and manipulation, fundamentally. And I think that those can then be co-opted by various other parties as well on the political spectrum.
Starting point is 00:40:59 So we need to start building some sort of protections around that. What was the final one? Sorry, Elad. I was just asking, and I think Sarah asked the question, around, you know, where do you think regulations should be applied, and what would be positive outcomes of that versus negative outcomes? Yes, I think there should be these elements around identification of AI, especially in advertising. I think that there should be regulation on very large models in particular. The European Union introducing a CE-mark-slash-generative-AI restriction where the creators are responsible for the outputs, I think, is the wrong way. But there are other ones as well.
Starting point is 00:41:34 Like, I would call for opt-out mechanisms, and I think we're the only ones building those for data sets. So we're also building some of these data sets and trying to figure out attribution mechanisms for opt-in as well, on the other side. Like right now, the only thing that is really checked is robots.txt, which is a thing about scraping. But I think, again, it's evolving so fast that people might be okay with scraping, but they may not be okay with this. Legally, it's fine.
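For context, robots.txt is the scraping opt-out being referred to: a well-behaved crawler checks a site's policy before fetching anything. A minimal sketch using Python's standard-library urllib.robotparser; the policy lines and crawler name here are invented for illustration:

```python
from urllib.robotparser import RobotFileParser

# An example robots.txt policy (invented for illustration):
policy = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(policy)

# A well-behaved scraper checks each URL against the policy before fetching.
print(rp.can_fetch("MyCrawler", "https://example.com/gallery/cat.png"))  # True
print(rp.can_fetch("MyCrawler", "https://example.com/private/cat.png"))  # False
```

The limitation Emad points to is visible here: robots.txt can only say "don't fetch this path"; it has no vocabulary for "fetching is fine, but training a model on it is not," which is why finer-grained opt-out mechanisms are being proposed.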
Starting point is 00:42:00 But then I think we should make this more and more inclusive as things go forward. So that's, for example, if an artist doesn't want their work represented in a corpus that a machine is trained on. Yes, and it's difficult. It isn't just a case of, you know, don't look at DeviantArt or my website. Like, what if your picture is on the BBC or CNN with a label? It will pick that up, you know? So it's a lot harder. This is why, like, we trained our own OpenCLIP model.
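One way to read the data-set work being described: before training, the corpus gets filtered against an opt-out registry. A deliberately simplified sketch; the registry format and sample records are invented for illustration:

```python
# Toy opt-out filter: drop any training sample whose source URL appears in an
# opt-out registry before the data set is used for training.
opt_out_registry = {  # invented entries, stand-ins for an artist opt-out list
    "https://example.com/artist/workA.png",
    "https://example.com/artist/workB.png",
}

training_samples = [
    {"url": "https://example.com/artist/workA.png", "caption": "a painting"},
    {"url": "https://example.com/photos/cat.png", "caption": "a cat"},
]

filtered = [s for s in training_samples if s["url"] not in opt_out_registry]
print(len(filtered))  # 1
```

This also makes the difficulty concrete: exact-URL matching misses the BBC/CNN case Emad mentions, where the same image reappears elsewhere under a different URL, so a real system would need something like perceptual hashing or embedding lookup on top.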
Starting point is 00:42:24 We have the new CLIP-G landing this week. That's even better on zero-shot, I think 80%. Because we need to know what data was on the generative and the guidance side, so that we can start offering opt-out and opt-in and these other things. So it's not as easy as some people think. Yeah. And then I guess one other area that people often talk about with safety is more around defense applications and the ethics of using some of these models in the context of defense or offense from a national
Starting point is 00:42:49 perspective. What's your view on that? I think the bad guys, and I'm going to put that in quotes, have access to these models already, and to thousands and thousands of A100s. I think you have to start building defenses, but it's a very difficult one. Like, we were going to do a $200,000 deepfake detector prize, but then it was pointed out quite reasonably that if you create a prize for a detector, it then creates, well, a bouncing effect, where you have a generator and a detector, and they bounce off each other, and you just get better and better and better. So now we're trying to rethink that.
Starting point is 00:43:17 Maybe we'll offer a prize for the best suggestion of how to do this. Similarly, ChatGPT is detectable, but not really. So I think the defense implications of this are largely around misinformation and disinformation. This is an area that I have advised multiple governments on, with my work on counter-extremism and others. It's a very difficult one to unpick, but I think one of the key things here is having attribution-based mechanisms and other things for curation,
Starting point is 00:43:40 because our networks are curated. And so this is where we've teamed up with Adobe on the Content Authenticity Initiative (contentauthenticity.org) and others. I think that metadata element is probably the winning one here, but we have to standardize as quickly as possible around trusted sources. I think people already don't believe what they see, which is a good thing and a bad thing. We want to have those trusted coordinators around this.
Starting point is 00:44:00 Beyond that, in some of the more severe kind of things around drones and slaughterbots and things like that, I don't know how to stop that, unfortunately. And I think that's a very complicated
Starting point is 00:44:28 thing, but we need an international compact on that because, again, this technology is incredibly dangerous when used in those areas. And I don't think there's been enough discussion at the highest levels on this, given the pace of adoption right now. I think that's all we have time for today. So one last important question for you. You seem like an optimist, but what controversial prediction, good or bad, about AI do you have over the next five years? I think that small models will outperform large models massively, like I said, the hive model aspect. And you will see ChatGPT-level models running on the edge, on smartphones, in five years, which will be crazy. Great. Thanks so much for joining us. Amazing conversation, as usual. It's my pleasure. Thank you for listening to this week's episode of No Priors. Follow No Priors for a new guest each week, and let us know online what you think and who
Starting point is 00:44:57 in AI you want to hear from. You can keep in touch with me and Conviction by following @Saranormous. You can follow me on Twitter at @EladGil. Thanks for listening. No Priors is produced in partnership with Pod People. Special thanks to our team, Cynthia Geldea and Pranav Reddy, and the production team at Pod People: Alex Vigmanis, Matt Saab, Amy Machado, Ashton Carter, Danielle Roth, Carter Wogan, and Billy Libby. Also, our parents, our children, the Academy, ChatGPT, and our future AGI overlords.
