No Priors: Artificial Intelligence | Technology | Startups - How can we make sure that everyone has access to AI? Can small models outperform large models? With Stability AI’s Emad Mostaque
Episode Date: February 16, 2023. AI-generated images have been everywhere over the past year, but one company has fueled an explosive developer ecosystem around large image models: Stability AI. Stability builds open AI tools with a mission to improve humanity. Stability AI is most known for Stable Diffusion, the AI model where a user puts in a natural language prompt and the AI generates images. But they're also engaged in progressing models in natural language, voice, video, and biology. This week on the podcast, Emad Mostaque joins Sarah Guo and Elad Gil to talk about how this barely one-year-old, London-based company has changed the AI landscape, scaling laws, progress in different modalities, frameworks for AI safety and why the future of AI is open. Show Links: Stability.AI Stable Diffusion V2 on Hugging Face Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @EMostaque Show Notes: [2:00] - Emad's background as one of the largest investors in video games and artificial intelligence [7:24] - Open-source efforts in AI [13:09] - Stability.AI as the only independent multimodal AI company in the world [15:28] - Computational biology, medical information and medical models [23:29] - Pace of Adoption [26:31] - AGI versus intelligence augmentation [31:38] - Stability.AI's business model [37:44] - AI Safety
Transcript
What if the route to AGI is not one big model to rule them all trained on the whole
internet and then narrowed down to human preferences, but instead millions of models that reflect
the diversity of humanity that are then brought together?
I think that is an interesting way to kind of look at it, because that will also be more
likely to be a human-aligned AGI, rather than trying to make this massive elder god of
weirdness bow to your will, which is what it feels like at the moment.
This is the No Priors podcast. I'm Sarah Guo. I'm Elad Gil. We invest in, advise, and help start
technology companies. In this podcast, we're talking with the leading founders and researchers
in AI about the biggest questions.
AI generated images have been everywhere over the past year,
but one company has fueled an explosive developer ecosystem around large image models,
and that's Stability AI.
This week on the podcast, we'll talk to Emad Mostaque, the founder and CEO.
Stability builds open AI tools.
They're most known for stable diffusion,
the unreasonably effective AI model where a user puts in a natural language prompt,
and the AI generates images.
But they're also engaged in progressing models,
in natural language, voice, video, and biology.
We'll talk about how this barely one-year-old London-based company
has changed the AI landscape, scaling laws,
progress in different modalities, safety,
and why he thinks the future of AI is open.
Emad, welcome to No Priors.
Thank you for having me on, Sarah, Elad.
Let's start with personal story.
You have a background in computer science
and you were working in the hedge fund world.
That's a hard left turn, or it looks like it,
from that world to being a driving force in AI.
How did you end up working in this field?
Yeah, I've always been interested in AI and technology.
So on the hedge fund, I was one of the largest investors in video games and artificial intelligence.
But then my real interest came when my son was diagnosed with autism.
And I was told there was no cure or treatment.
And I was like, well, let's try and see what we can do.
So I built up a team and did AI-based literature review.
This was about 12 years ago, of the existing treatments and papers, to try and figure out commonalities,
and then did some biomolecular pathway analysis of neurotransmitters for drug repurposing
and came down to a few different things that could be causing it, you know, worked with doctors
to treat him, and he went to mainstream school, and that was fantastic.
Went back to running a hedge fund, won some awards, and then I was like, let's try and make
the world better.
And so the first one was non-AI-enhanced education tablets for refugees and others, and that's
Imagine Worldwide, my co-founder's charity.
And then in 2020, COVID came, and I saw something like autism, a multi-systemic condition that existing mechanisms that extrapolated the future from the past wouldn't be able to keep up with and thought, could we use AI to make this understandable?
And so I set up an AI initiative with the World Bank, UNESCO and others to try and understand what caused COVID and try and make that available to everyone.
then I hit the institutional wall in a variety of places and realized that the models and technologies
that had evolved were far beyond anything that happened before.
And there were some interesting arbitrage opportunities from a business perspective,
and more than that, a bit of a moral imperative to make this technology available to everyone,
because we're now getting to very narrow superhuman performance, and everyone should have access to that.
It's an amazing journey, and congratulations on all the impact you've already had.
So as you say, or as you imply, the AI field in recent years has been increasingly driven by
labs and private companies. And one of the most obvious paths to performance progress is to
just make models bigger, right? Scaling data, parameters, GPUs, which is very expensive.
And then in reaction, just to set the stage a little bit, there's been some efforts over the previous
years to be more community-driven and open and build alternatives like Eleuther. How did
you start engaging in that? And how did stability change the game here? Yeah. So when I was doing
the COVID work, we tried to get access to various models. In some cases, the companies blew up.
Other cases, we weren't given access, despite it being a high profile project. And so I started
supporting EleutherAI as part of the second wave. So, you know, Stella and Connor and others kind
of led it on the language model side. But really, one of my main interests was the image model
side. I have aphantasia, so I can't visualize anything in my brain, which is more common than people
would think. In fact, a lot of the developers in this space have that. Like, we've got nothing
in our brain. You just see words? What's in there? Just feelings. So, like, again, I thought it was
a metaphor. Imagine yourself on a beach. I was like, okay, I feel a beach. No, apparently,
you guys have pictures in your heads. It must be like just disconcerting. But then with the
arrival of CLIP, released by OpenAI a couple of years ago, you could suddenly
take generative models and guide them to text prompts. So it was VQGAN, which is kind of the
slightly mushy, more abstract version first. But I built a model for my daughter while I was
recovering, ironically, from COVID. And then she took the output and sold it as an NFT for three
and a half thousand dollars and donated it to India COVID relief. And I was like, wow, that's crazy.
So I started supporting the whole space at Luther and beyond, giving jobs to the developers,
compute for the model creators, funding the various notebooks from
Disco Diffusion to these other things, you know, giving grants to people like Midjourney that were
kind of kicking this off. Just personally. Just personally. They were doing all the hard work.
And I was like, can I catalyse this? Because this is good for society. Then about 15 months ago,
I was like, well, these communities are growing. It'd be great if we could create this as a common good.
And originally I thought, you've got communities, you've got to make them kind of coordinated.
Could a DAO work, or a DAO of DAOs? And that's how Stability started. After about a week,
I realized that was not going to work, and it was incredibly difficult.
So then I figured out commercial open source software could be the way to create
aligned technology, not just an image, but beyond, that would potentially change the game
by making this stuff accessible.
Because as you said, one of the key things, this is in the State of AI Report, this
is in the AI Index as well, is that most research has been subject to scaling laws and other
things.
Transformers seem to work for everything.
and so it was moving more and more towards private companies,
but the power of this technology is double-edged.
One is that there are fears about what could go wrong,
so it's not released.
And the other one is, why not keep it for excess returns, right?
So you've had this massive brain drain occurring, and no real option.
You work in an academic lab, you have a couple of GPUs,
or you go and work at Big Tech slash OpenAI,
or you set up your own startup, which is very, very difficult, as you guys know.
So I wanted to create another option, and that's what we did with Eleuther and Stability and the other communities that we have grown and incubated.
Could you talk more broadly about why you think it's important for there to be open source efforts in AI and what your view of the world is?
Because I think stability has really helped create this alternative to a lot of the closed ecosystems, particularly around image gin, protein folding, a variety of different areas.
And those are incredibly important efforts.
So I just love to hear more about your thoughts on, you know, why is this important?
how you all view the participation of the industry over time,
and also what you think the world looks like
in five years, 10 years, et cetera,
in terms of closed versus open systems?
So I think there's a fundamental misunderstanding
about this technology, because it's a very new thing, right?
Classical open source is lots of people working together
with a bit of direction.
It's a bit chaotic, but then you've seen Red Hat
and other things emerge from this.
There aren't many people that train these models, right?
We don't invite the whole community
and you have 100 people training a model.
It's usually 5 to 10, plus a supercomputer and a data team and things like that.
And the models when they come out are a new type of programming primitive infrastructure
because you can have a stable diffusion that's two gigabytes that deterministically converts
a string into an image.
That's a bit insane and that's what's led to the adoption here.
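That string-to-image primitive is easy to see in code. A minimal sketch, assuming the open Stable Diffusion weights on the Hugging Face Hub and the diffusers library; the model ID, prompt, and seed here are illustrative choices, not anything specified in the episode:

```python
# Minimal sketch: Stable Diffusion as a "string -> image" primitive.
# Assumes the diffusers library and the public SD 2.1 checkpoint; the prompt
# and seed are arbitrary. Fixing the seed makes the mapping deterministic:
# same string in, same image out.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

generator = torch.Generator("cuda").manual_seed(42)
image = pipe("an astronaut riding a horse", generator=generator).images[0]
image.save("astronaut.png")
```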
You know, on GitHub stars, we've overtaken Ethereum and Bitcoin cumulatively.
It took them 10 years.
We got there in like three, four months.
If you look at the whole ecosystem, it's the most popular open source software ever,
not just AI.
Why? Because again, it is this new translation file. And you do the pre-compute, as it were, on these big supercomputers,
which means the inference required to create an image is very low. And that's not what people would have
expected five years ago, or to create a chat GPT output. So as infrastructure, I think that's how it should
be viewed. And so my take was that what would happen is everyone would be closed because you need
talent, data, and supercompute. And those would be lacking, as it were. So it'd be
the big companies only. They would go four or five years, and then someone would defect and go
open source, and it would collapse the market as they would commoditize everyone else's
complement. So similar to Google offering free Gmail and all sorts of stuff around their core
business. But more than that, I realized that governments and others would need this infrastructure,
because if a company has it privately, they will sell to business to business, so maybe a bit of B2C.
But we've seen the Cambrian explosion of people building around this technology. But who's building the
Japan model or the India model or others. Well, we are. And then that means that you can tap into
infrastructure spending, which is very important because it needs billions. But the reality is that's
actually a small drop in the ocean. Self-driving cars got $100 billion of investment. Web3,
hundreds of billions. 5G, trillions. And for me, this is 5G level. So from an ethical, moral
perspective, I was like, we've got to make this as equitably available as possible. So this model
perspective, I thought it was a good idea as well. But I thought we were headed here inevitably.
So I decided to create stability to help coordinate and drive this forward in what's hopefully
a moral and reasonable way. Like, you know, the decisions that we make have a lot of import and
they're not easy, but we are trying to be kind of Switzerland in the middle of all of this
and provide infrastructure that will uplift everyone here.
What do you think this world looks like in five years or ten years? Do you think that there's
a mix of closed and open source? Do you think the most cutting-edge models, the giant, the giant
language models are going to be both?
Or do you think, like, Capital will eventually become such a large obstacle that it'll make
the private world more likely to drive progress forward?
And I know you have plans in terms of how to offset that, but I just love to hear about those.
The reality is we have more compute available to us than Microsoft or Google.
So I have access to national supercomputers, and I'm helping multiple nations build
exascale computers.
So to give you an example, we just got a 7 million-hour grant on Summit, one of the fastest
supercomputers in the U.S.
And like I said, we're building exascale computers that are literally the fastest in the world.
Private companies don't have access to that infrastructure, because governments, thanks to us,
are realizing that this is infrastructure of the future.
So we have more compute access.
We have more cooperation from the whole of academia than all of them do because their agreements tend to be commercial.
There's no way that private enterprise can keep up with us.
And our costs are zero as well when you actually consider that, whereas they have to ramp up tens of billions of dollars of compute.
So my take is that foundation models will all be open source for the deep learning phase.
Because we've actually got multiple phases now.
The first stage is deep learning.
That's the creation of these large models.
And we'll be the coordinator of the open source.
The next stage is the reinforcement learning, the instruct models,
Flan-PaLM or InstructGPT or others.
That requires very specified annotation, and that's something that private companies can excel in.
The next stage beyond that is fine tuning.
So actually, let's give a practical example.
PaLM is a 540 billion parameter model; it achieves about 50% on medical answers, right? Flan-PaLM is the instructed version of that, and that achieves 70%. Med-PaLM: they took medical information and they fed it in (this is a recent paper from a few weeks ago) and achieved 92%, which is human level on the answers. And the final stage for that is you take this Med-PaLM and you put it into clinical practice with human in the loop. For me, the private sector
will be focused on the instruct to human in the loop area,
and the base models will be infrastructure available to everyone
on an international generalized and national basis,
particularly because when you combine models together,
I think that's superior to creating multilingual models.
So that's quite a bit there, and I'm sure you want to unpack that.
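To make the staging concrete before unpacking it: a toy-scale sketch of the middle, supervised instruction-tuning step, using Hugging Face transformers with gpt2 and a two-example dataset as stand-ins. This shows the shape of the "feed (instruction, answer) pairs in" stage, not the actual PaLM or Med-PaLM pipeline:

```python
# Toy sketch of supervised instruction tuning: fine-tune a pretrained causal
# LM on (instruction, answer) text pairs. gpt2 and the two examples below are
# illustrative stand-ins, not a real medical dataset.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import Dataset

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

pairs = [
    {"text": "Instruction: define hypertension.\nAnswer: persistently high blood pressure."},
    {"text": "Instruction: define tachycardia.\nAnswer: an abnormally fast heart rate."},
]

def tokenize(example):
    out = tok(example["text"], truncation=True, padding="max_length", max_length=64)
    out["labels"] = out["input_ids"].copy()  # causal LM objective: predict the next token
    return out

ds = Dataset.from_list(pairs).map(tokenize, remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=ds,
).train()
```

The domain stage he describes (Med-PaLM) is the same recipe run again with specialist data, followed by human-in-the-loop deployment rather than more training.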
Yeah, that's very exciting, yeah.
Could you actually talk about the range of things or efforts that are going on at stability right now?
I know that you've done everything from these foundation models on the language side,
protein folding, image,
and et cetera, if you could just kind of explain
what is a spectrum of stuff
that stability does and supports and works
with, and then what are the areas
that you're putting the most emphasis behind going forward?
Yes, I think we are the only
independent multimodal AI company in the world.
So you have amazing research labs
like FAIR at Meta,
and others, and DeepMind, doing everything
from protein folding to language to image.
And there are cross-learnings from all of these.
Basically, we do
everything from audio to
language to coding models; any kind of almost private model, we are looking at what the
open equivalent looks like, and that's not always a replication, right?
So with stable diffusion, for example, we optimized it for a 24-gigabyte VRAM GPU.
Now, as of the release of distilled stable diffusion, it will run in a couple of seconds
on an iPhone, and we have neural engine access, because our view of the future is creating
models that aren't necessarily bigger, but that are customizable and editable.
So this is a bit of a different emphasis.
And we think that's a superior thing than scaling.
I think things like the Chinchilla paper,
that's the 70 billion parameter model
that's as performant as GPT-3
at 175 billion,
are important in that, because it said that training on more data is important.
And actually when you dig into it,
it actually said data quality is important.
Because now we're seeing that the first stage,
the DL stage, the deep learning stage, is:
let's use all the tokens on the internet, you know?
But maybe we can use better tokens.
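For reference, the arithmetic behind the Chinchilla point: training compute is commonly approximated as 6 x parameters x tokens, and the paper's compute-optimal recipe works out to roughly 20 tokens per parameter. A back-of-envelope check with published figures:

```python
# Back-of-envelope Chinchilla arithmetic: training FLOPs ~= 6 * N * D for
# N parameters and D training tokens; compute-optimal D is roughly 20 * N.
def train_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

print(f"GPT-3  (175B, 300B tokens): {train_flops(175e9, 300e9):.2e}")  # ~3.2e23
print(f"Gopher (280B, 300B tokens): {train_flops(280e9, 300e9):.2e}")  # ~5.0e23
# Chinchilla spends a similar budget to Gopher on a smaller model trained on
# ~20 tokens per parameter, and outperforms it:
print(f"Chinchilla (70B, 1.4T):     {train_flops(70e9, 1.4e12):.2e}")  # ~5.9e23
```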
And that's what we see when we instruct and use reinforcement learning with human feedback.
And we've also been releasing technology around that.
So our Carper lab, reinforcement learning: we released our instruct framework that allows you to instruct these big models to be more human.
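A minimal sketch of what such an instruct framework looks like from the outside, assuming the trlX interface CarperAI published in early 2023 (the API may have changed since); the reward function here is a toy stand-in for a learned human-preference model:

```python
# Sketch of RLHF-style training with CarperAI's trlX, per its early-2023
# README-style interface. In real RLHF the reward comes from a preference
# model trained on human comparisons; here it is a toy heuristic.
import trlx

prompts = ["Explain photosynthesis simply:", "Explain gravity simply:"]

def reward_fn(samples, **kwargs):
    # Toy reward: prefer shorter completions.
    return [max(0.0, 1.0 - len(s) / 500) for s in samples]

trainer = trlx.train("gpt2", reward_fn=reward_fn, prompts=prompts)
```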
The way I kind of put it is that our focus is thinking: what are the foundation models that will advance humanity, be it commercial or not? What needs to be there, and what's very susceptible to this transformer-based architecture that takes about 80% of all research in the space? It's making that compute and knowledge and understanding of how to build these models available to academia, independent research, and our own researchers, and then, from a business perspective, really focusing on where our edge is. And our edge is in two areas.
One is media, and so this is why image models, video models, and audio models have been a focus, 3D soon as well.
And the other area is private and regulated data, because what's the probability that a GPT-3 model weight or a PaLM model weight will be put on-prem?
It's very low.
Versus an open model, it's very high.
And there's a lot more valuable private data than there is public data.
So it is a bit of everything, but like I said, there are certain focuses on the business side on media.
And then I think on a breakthrough side, computational biology will be the biggest.
one. That's really cool. And on the computational biology side, I guess there's a few different
areas. There's things like protein folding, and then to your point, there's things like Med-PaLM.
And so, are you thinking of playing a role in both of those types of models in terms of both
the medical information? Yes. We will release an open Med-PaLM model, well, a med Stable GPT.
And then protein folding, we are one of the key drivers of OpenFold right now. So we just
released a paper on that, with much faster ablations than AlphaFold. We're doing DNA-Diffusion as well,
for predicting the outcomes of DNA sequences.
We have BioLM, around taking language models for chemical reactions,
and that's an area that we will aggressively build
because there's a lot of demand from the computational biology side
for some level of standardization there.
There have been initiatives like MELLODDY and others
looking at federated learning,
but there is a misalignment of incentives in that space
that I think we could come in and fix.
And I think that's where we really view ourselves.
How can you really align incentive structures
and create a foundational element that brings people together?
And I think that's where we are most valuable,
because private sector can't do it that well,
public sector can't do it that well,
a mission-oriented private company
that has this broad base
and all these areas could potentially.
Yeah, I think also the global nature of your focus
is really exciting because when I look at things
like medical information or medical models,
ultimately the big vision there,
which a number of people have talked about for decades
at this point is that you'd have a machine
that would allow you to have very high access
to care and medical information
no matter where you are in the world,
and especially since you can take images with your phone
and then interpret them with different types of models
and then have like an output,
you know, you should, if you have a cardiac issue,
you should have care equivalent
to the world's best cardiologist from Stanford
or, you know, you name the Center of Excellence available
to anybody in the world,
whether they're rich, poor, developing country, not, etc.
And so, you know, it's very compelling to see
this big wave of technology
and sort of the things that may be able to enable,
including some of the things that you mentioned
around AI and medicine.
So it's very exciting stuff.
I think it's very interesting as well because this technology is being adopted so fast.
I mean, let's face it, Microsoft and Google, $2 trillion companies have made it core of
their strategy, which is crazy insane for technology that's basically five years old,
let's say two years old, really breaking through.
Because it can adapt to existing infrastructure, you know, like it sits there and it absorbs
knowledge when you fine tune it.
But then my thing is, I look to the future and I'm like,
that best doctor, which bits of that should be infrastructure for everyone and which bits
of that should be private. And so that's how I kind of oriented my business. I look to the future,
I come back and I think, what should be public infrastructure and how can I help build that and
coordinate that? And that's valuable. And then everything else, other people can build around.
How do you think about the traditional pushback that's existed in the medical world around some of these
technologies? So, for example, you know, the first time an expert system or a computer could actually
outperform Stanford University physicians at predicting infectious disease was in the 1970s,
with this MYCIN project, where they literally trained an expert system, or designed an expert system,
to be able to predict infectious disease.
But here we are almost 50 years later with none of that technology adopted.
And so do you think it's just we have to do a lot of human-in-the-loop things, and it's a doctor's
assistant, and that'll be good enough? Do you think it's just a sea change? There aren't enough
physicians? Like, what do you think is the driver for the technological adoption of something
so important today?
So I think the infrastructural barriers are huge for adoption of technologies, particularly
in private sector.
I think there is a new space of open source technology adoption that could be very interesting
and a willingness now that people kind of understand this, which wasn't there even 10 years
ago, you know, the nature of open source.
Now it runs the world's servers and databases.
I think there's another level of open source, which is open source complex systems,
as it were.
Previously in other discussions, I've talked about our education work.
So right now, we're deploying four million tablets to every child in Malawi.
By next year, we'll have hundreds of millions of kids, hopefully, that we deploy to.
It's not just education.
It's healthcare.
And it's working with the government.
It's working with multilaterals to say, can we build a healthcare system from the bottom up that can do all of these things without an existing infrastructure?
Because they don't have an existing infrastructure.
It's one doctor per thousand kids, 10,000 kids.
One teacher per 400 kids.
I am certain that system will outperform anything in the West within five years, which is crazy to say.
But then our Western systems can then take bits of that and adapt to it, because I think
this competitive pressure is required, because Western systems are very hard to change.
And in the UK, we've done that with HDR UK, the genomic banks and others.
And that was a massive uphill battle, as you know, to get these technologies adopted, because
I mean, there should be barriers to adoption of this technology when it comes to things as important
as healthcare. But at the same time, I think now is the time to open it up. Yeah, I think there is an
interesting loose analogy to different pace of adoption of different technologies and different geos
in the past, right? So one that comes to mind is today, I think it's very commonplace amongst
consumer internet investors to look at what's happened with mobile in East Asia as a precursor
to interactions that might happen here. And, you know, mobile technology advanced much more rapidly
in China, Korea, many other places, one, because of private public partnership, and two, because
there was more, I guess, green field in terms of access to information and different infrastructure
that supported mobile as the primary communication medium. And I could certainly see that happening
with some AI native products. I think that's an excellent point. I agree 100%. I think just as they
leapfrog to mobile, a lot of the emerging markets, Asia in particular, will leapfrog to
generative AI or personalized AI.
And I can see this because I'm having
discussions with the governments right now.
What is the reaction?
Over the Christmas holiday, I was getting a few
hours of sleep finally.
I got like six calls from
headmasters of UK schools
saying, Emad, what is our generative
AI strategy? I was like, your what?
And they were like, all our kids are using chat GPT
to do their homework.
And so it's kind of one of the first
global moments, an amazing interface that
Open AI built. It's going
mainstream. And I was like, well, get good,
you know, stop assigning essays.
So now in some of the top private schools in the UK,
they actually have to write the essays during the lessons without computers,
which I think is wrong.
Because my discussions in an Asian context, for example,
with certain leading governments that are about to put tens of billions into this space,
they're embracing the technology,
and they're like, how can we have our own versions of this?
And how can we implement this to help our students get even better, right?
Because also it's very, even though there might be bureaucracy in some of these nations,
if they want to get something done, they get it done.
And this technology is very different in that the costs are not continuous, like a 5G network.
The capex profile and other things are very different.
Like, you know, you can say it costs $10 million to train a GPT.
It doesn't cost that much anymore.
That's really valuable if you can have a chat GPT for everyone.
Like the ROI is a huge.
So, yeah, I do think that a lot of these nations, like the African context is one that we're driving forward with education as a core piece.
And right now we're teaching kids with the most basic AI in the
world, literacy and numeracy in 13 months on one hour a day in refugee camps. That's insane.
That's already better. It's going to get even better. But I think Asia in particular,
they're going to go directly to this technology and embrace it fully. And then we have to have a
question. If you're not embracing this in the West, in America and the UK, you're going to
fall behind because ultimately this can translate between structure and unstructured data quicker
than anything. I'd like to see what pace of adoption we can have in the United States,
this technology as well, but I can see the prediction coming true. If we just go back to the core,
I guess not the core necessarily, but the most advanced, like, mature use case with Stability
and, as you said, media as an advantage, what does the future of media look like? And actually,
even if we go back before that, you know, you're involved in sort of early ecosystem efforts
with Luther and such. How did you even identify that this was an area of interest for you versus
everything else going on across modalities.
So, you know, I've always been interested in meaning,
like semantic is even part of my email address,
and that's my religious studies as well around epistemology and ethics, ironically.
The way that I viewed it is that the easiest way for us to communicate
is what we're doing right now via words, right?
And that's held constant, but now we can communicate via phones and podcasts or whatever,
and it's nice.
Writing was more difficult, and the Gutenberg Press made it easier,
but visual communication is incredibly difficult,
be it a PowerPoint, which is visual communication,
or art, which is visual communication,
and then you have video and things like that,
which is just impossible.
Now you have TikToks and others making it easier.
I saw this technology, and I was like,
if the pace of acceleration continues,
visual communication becomes easy for everyone.
Like my mom sending me memes every day,
telling me to call more or kind of whatever.
And I'm like, that's amazing,
because that creation will make humanity happier.
Like you see art therapy, that's visual communication,
and it's the most effective form of therapy.
What if you could give that to everyone?
So there was that aspect to it.
But then I saw movie creation and things like that.
So my first job was actually organizing the British Independent Film Awards
and being a reviewer for the Raindance Film Festival.
So, you know, every year I put a movie on for my birthday
and we give the proceeds to charity.
I get to see my favorite movie with my friends.
It's pretty cool.
And then I was the biggest video game investor in the world at one point.
So these types of communication and interaction are really interesting.
And I thought that people really misunderstood the metaverse UGC
and the nature of what could happen
if anyone could create anything instantly.
It's not going to be a world for everyone
or a world that everyone visits.
It's going to be everyone sharing their own worlds
and seeing the richness of humanity.
And again, I thought that was an amazing
ethical slash moral imperative
for making humanity better,
but also an amazing business opportunity
because the nature and way that we create media
will transform as a result of this technology
and we're seeing it right now.
We have amazing apps like Descript, right?
where you could take this podcast and you can edit it with your words live, you know.
You have amazing kind of gaming things come out where you create assets and instances
or, you know, some of this new 3D NeRF technology where you can reshoot stuff.
We are working with multiple movie studios at the moment who are saving millions of dollars
just implementing stable diffusion by itself, let alone these other technologies.
And that was, for me, tremendously exciting to allow anyone not to be creative because people are
creative, but to access creativity. And then allow the creatives to be even more creative and tell
even better stories. I believe Sam from OpenAI said they don't think image generation is kind of
like core on the path to AGI. It's obviously really important to you personally and to stability.
Tell us about your stance on AGI and if that's part of the stability mission. Yeah, I don't care about
AGI except for it not killing us. They can care about it. What I care about is intelligence augmentation.
You know, this is the classic kind of memex type of thing.
How can we make humans better?
Our mission is to build the foundation to activate humanity's potential.
So, look, AGI is fine.
Again, we have to have some things around that.
I do believe that they are incorrect around multimodality being or images being a core
component of that.
But I think there are two paradigms here.
One is stack more layers, and I'm sure GPT-4 and the next PaLM and all these things will be
amazing, stacking more layers and having better data as well.
but one of the things we saw, for example, stable diffusion, we put it out together and then
people trained hundreds of different models. When you combine those models, it learns all sorts
of features like perfect faces and perfect fingers and other things. And this kind of is related
to the work that DeepMind did with Gato and others that shows the auto-regression of these models
and the latent spaces becomes really, really interesting. So what if the route to AGI is not one big
model to rule them all trained on the whole internet and then narrowed down to human preferences,
but instead millions of models that reflect the diversity of humanity that are then brought
together. I think that is an interesting way to kind of look at it, because that will also be
more likely to be a human-aligned AGI, rather than trying to make this massive elder god of
weirdness bow to your will, you know, which is what it feels like at the moment.
So we're going to have a hive elder god instead. You've mentioned that stability is still working on language. The application of diffusion models to image is a really unique breakthrough and it's not as computationally intensive as like the known approaches to language so far. I think you've said that the core training run for the original stable diffusion was 150,000 A100 hours, which is like not that huge in the grander scheme of things. What can you tell us about your approach to language?
So yeah, so via the kind of EleutherAI side of things and our team there, you know, we released GPT-Neo, GPT-J, and GPT-NeoX, which have been downloaded 20 million times. They're the most popular language models in the world. You kind of basically either use GPT-3 or use those. They go up to about 20 billion parameters. And like I said, we've released our trlX from the Carper lab, which is the instruct framework. They're training, you know, multiple models up to 100 billion parameters now. I don't think you need more, Chinchilla-optimal, to enable an open ChatGPT equivalent, you know, enable an open Claude equivalent.
I think that will be an amazing foundation from which to train sector-specific and other models
that then again can be auto-regressed, and there will be very interesting things around that.
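Those EleutherAI checkpoints are ordinary Hugging Face models, which is part of why the download numbers are what they are. A minimal sketch of loading one, shown here with GPT-J-6B; the prompt and sampling settings are illustrative:

```python
# Minimal sketch: pulling an open EleutherAI checkpoint from the Hugging Face
# Hub. GPT-J-6B shown; EleutherAI/gpt-neox-20b works the same way but needs
# far more memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

inputs = tok("Open source language models are", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
```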
Language requires more, not necessarily because of the approach and diffusion breakthroughs.
Like recently Google had their new paper where they showed a transformer actually can replace
the VAE, so you don't necessarily need diffusion for great images.
It's more because language is semantically dense, I think, versus images.
And there's a lot more accuracy that's required for these things.
That, I think there are various breakthroughs that can occur.
Like we have an attention-free transformer model basis in our RWKV that we've been funding.
We've got a 14 billion parameter version of that coming out that is showing amazing kind of progress.
But I think that the way to kind of look at this is we haven't gone through the optimization cycle of language yet.
So Open AI, again, amazing work they do.
They announced InstructGPT; their 1.3 billion parameter
version outperformed the 175 billion parameter GPT-3.
You look at kind of Flan-T5,
the instructed version of the T5-XXL model from Google.
The 3 billion parameter version outperforms GPT-3 at 175 billion parameters
in certain cases, you know?
These are very interesting results,
and it's one of those things that as these things get released,
it gets optimized.
So like with stable diffusion, leave aside the architecture:
day one, 5.6 seconds for an image generation using an A100.
Now 0.9 seconds.
With the additional breakthroughs that are coming through, it'll be 25 images a second.
That's a speedup of well over 100 times, just from getting it out there and people interested in doing that.
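Checking that arithmetic: 5.6 seconds down to 0.9 is about a 6x gain so far, and 25 images a second works out to roughly 140x over day one, i.e. well over 100x:

```python
# Worked check of the Stable Diffusion speedup figures quoted above.
day_one = 5.6        # seconds per image on an A100 at launch
today = 0.9          # seconds per image after optimization
projected = 1 / 25   # 25 images/second -> 0.04 s per image

print(f"so far:    {day_one / today:.1f}x")      # ~6.2x
print(f"projected: {day_one / projected:.0f}x")  # ~140x
```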
I think language models will be similar.
And I don't think that you need to have ridiculous scale when you can understand how humans interact with the models
and when you can learn from the collective of humanity.
So, like I said, a very different approach of small language models
or medium ones versus let's train a trillion parameters or whatever.
And I think there will be room for both.
I think it will be use these amazingly packaged services from Microsoft and Google
if you just want something out the box
or if you need something trained on your own data with privacy and things like that.
That may not be as good, but maybe better for you.
Use an open source space and work with our partners at SageMaker or whoever else, you know.
Can you talk more about that in the context of your business model
and your approach?
You mentioned that you think that some of the areas that Stability will be focused on are media
and then proprietary and regulated data sets.
And if there's things you can share right now in that area, if not, no worries.
But if you can, it would be interesting to learn more about, you know, how you view the business evolving.
Sure.
So, like, now we're training on hundreds and soon thousands of Bollywood movies to create Bollywood
video models with our partnership with Eros, and that is exclusively licensed.
We'll have audio models coming as well, your A.R. Rahman model or whatever.
You know, we're talking to various other entities as well, and this is why we have the partnership with Amazon and SageMaker.
So there'll be additional services that can train models on your behalf, for most people.
Our focus is on the big models for kind of nations, the big models for the largest corporates, who will need to train their own models one day.
And that's really difficult.
There's only like 100 people who can train models in the world.
Like it's not really a science.
It's more an art.
Like losses explode all over the place when you try to do something.
And so we're going to make it easy for them.
and we're going to be inside the data centers,
training their own models that they control,
and our open source models then become the benchmark models for everyone.
Again, we have access to the neural engine,
dedicated teams at Intel and others kind of working on optimizing these.
That is the model.
The framework and the open model is optimized,
and then we take and create private models.
And again, I think that's complementary to the APIs
and other things you will see from Microsoft, Google, etc.
Because, yeah, you would want both.
Yeah.
Some of the other areas that you've focused on,
or at least talked about, I think, in interesting ways, is about how AI can be used to make
our democracy more direct and digital, a little bit more about, you know, broader global impact.
Could you extrapolate a bit more there?
Yeah, so I think, you know, we have to look at intelligence augmentation, right?
Like, information theory in classical Shannon ways, information is valuable in as much as it
changes the state.
And we've obviously seen political information become more and more influenced by manipulation
of stories and things like that.
So the divide has grown.
What if we could create an AI that could translate between various things, make things easier to
understand, and make people more informed? I think that would be ideal with some of these national
and public models and interfaces being provided to people. And then that can be very positive
for democracy and allowing people to really understand the needs. Like, you can already, with ChatGPT,
when you train it on the nature of yourself, it can summarize from your perspective. You know,
that's an amazing thing, right? You can tell it to talk like a five-year-old or a six-year-old or an
eight-year-old or a 10-year-old. Once it starts understanding Sarah and Elad, that'll be even
better. Again, you don't need open source to do that. The OpenAI embeddings API is
fantastic. But I think there'll be more and more of these services that allow there to be that
filter layer between us and this mass of information on the internet, that will be amazing.
I think if we build the education systems and other things correctly as well, this Young
Lady's Illustrated Primer that we're going to give to all the kids in Africa and beyond, like,
again, let's really blue sky think, how can we get people engaged with their communities
in societies because it will be a full open source stack, not only education and healthcare
and beyond, that's super exciting.
I think, again, that's the future of how we come together, because you want to come together
to form a human colossus, like in the Wait But Why style, where you get shit done, pardon my
language.
And I think this is one of the best ways for us to do that leveraging these technologies.
Okay, we don't have commercial sponsors.
There's actually a book called Lady of Mazes that's an AGI-centric book from like 10 years
ago, and basically the idea is sort of what you mentioned, where as different AGIs gain models
of how a subset of the population thinks about certain issues, it substantiates into a virtual
person who's basically representing them and someone has a representative's equivalent, so you
don't actually have to vote. The AGI just kind of synthesizes group opinions and then turns it
into representatives. Yeah, and you have to think about, you know, with the advances like Meta's
amazing work on Cicero, for example, beating humans at Diplomacy. They used eight different language
models combined. Again, I think this is the future, not just zero shot. Multiple models interacting
with each other is the way, full stop. The issue and mechanism design perspective of kind of the
game theory of our current economy is that there is no central organizing factor that we trust.
Like, what is the trust in Congress? Like, I think they trust Congress less than cockroaches.
No offense to Congress, please don't bring me up. Like, it's just a poll, right? People will err towards
trusting machines, as it were, and machines are capable of load balancing. Now they're capable of
load balancing facts and things.
And so we have to be super careful as we integrate these things, what that looks like,
because they will make more and more decisions for us.
That could be for our benefit, having an AI that speaks on our behalf and amalgamates.
But then we need to make sure that these aren't too brittle and fragile as we cede more
and more of our own personal authority to them because they are optimizing.
This is also one of the dangers on the alignment side.
As we introduce RLHF into some of these large models, there are very weird instances of
mode collapse and out of sample things. I do say these large models as well should be viewed as
fiction creative models, not fact models, because otherwise we've created the most efficient
compression in the world. Does it make sense you can take terabytes of data and compress it down
to a few gigabytes with no loss? No, of course you lose something. You lose the factualness of them,
but you keep the principle-based analysis of them. So we have to be very clear about what these
models can and can't do, because I think we will cede more and more of our authority,
individually as a society
to the coordinators
of these models. Could you talk more about that
in the context of safety because ultimately
one of the concerns that's
sort of increased in the AI community
is AI safety and there's sort of
three or four components of that. There's alignment
you know, will bots kill humans or whatever
form you want to put it in. The form is not
kill us. Farm us, yes. Good point. They'll just
build a giant RLHF farm on top
of us or something. There's the concern
around certain types of content, pedophilia, etc., that
people don't want to have exist in society, for all sorts of positive reasons.
There's politics.
You know, there's concerns, for example, that AI may become the next big battleground after
social media in terms of political viewpoints being represented in these models with
the claims that they're not political viewpoints.
And so I'm sort of curious how you think about AI safety more broadly, particularly when
you talk about trust of models, to your point, part of it is fact versus fiction, but part
of it may also be, well, it looks like it's political.
And so therefore, maybe I can't trust it at all.
Yeah, I don't think technology is neutral.
So I'm not one of the people that adheres to that, especially with the way we build it.
It does reflect the biases and other things that we have in there.
I did kind of follow the open source thing, because I think we can adjust that.
On the alignment side, you know, it was interesting, Eleuther basically split into two.
Part of it is stability and the people who work here on capabilities.
The other part is Conjecture, which does specific work on alignment, and they're also based here in London.
And I think it's not easy, right?
I think that everyone is ramping up at the same time,
and we don't really understand how this technology works,
but we're doing our best, you know?
You have people like kind of Riley Goodside and other prompt whisperers
who are like, wait, what on earth?
It can do all these kinds of things.
I think that there needs to be more formalized work,
and I actually think there needs to be regulation around this,
because we are dealing with an unknown, unknown.
And I don't think we're doing good enough kind of tying things together,
particularly as we stack more layers
and we get bigger and bigger and bigger.
I think small models are less dangerous,
but then the combination of them may not be safe.
But again, we don't know this yet.
You've mentioned before this support for the idea of regulation of large models.
What would be a productive outcome of that regulation that you can imagine?
I think that a productive outcome of that regulation is anything above a certain level of FLOPs
needs to be registered, similar to, well, bioweapons and things that have the potential for dual use.
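Mechanically, a FLOPs-threshold rule is simple, since training compute is commonly estimated as 6 x parameters x tokens. A sketch; the cutoff below is purely illustrative, not a number proposed in the episode:

```python
# Illustrative compute-threshold registration check. Training FLOPs are
# approximated as 6 * parameters * tokens; the threshold is a made-up
# illustrative value, not one proposed in the episode.
REGISTRATION_THRESHOLD_FLOPS = 1e23  # hypothetical cutoff

def must_register(n_params: float, n_tokens: float) -> bool:
    return 6 * n_params * n_tokens >= REGISTRATION_THRESHOLD_FLOPS

print(must_register(175e9, 300e9))  # GPT-3-scale run: ~3.2e23 FLOPs -> True
print(must_register(1e9, 20e9))     # small 1B model: ~1.2e20 FLOPs -> False
```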
I think there needs to be a dedicated international team of people
who can actually understand
and put in place some regulations
on how we test these models for things.
Like, you know, the amazing work
Anthropic recently did with constitutional models
and other things like that.
We need to start pooling this knowledge
as opposed to keeping it secret,
but there is this game theoretic thing
of one of the optimal ways to stop AGI happening
is to build your own AGI first.
And so I'm not sure if that will ever happen,
but we're in a bit of a bind right now,
which means that everyone's having their own arms race.
When governments decide,
and I don't believe they decided yet,
that having an AGI is the number one thing, tens of billions, hundreds of billions will go into
building bigger models. And again, this is very dangerous, I think, from a variety of business
perspectives. So I prefer multilateral action right now, as opposed to in the future. So I put
that out there, but I can't really drive it. I'm really flying all over the place as
it is. But I do believe that should be the case. I think going on to kind of the next one,
as you just said, the political biases and things like that, we can use this as filters in various
ways. And I think one of the interesting things and the other thing I've called for regulation
of, maybe I should do it a bit more loudly, is you have a lot of companies that have ads as one
of their key elements, right? And ads are basically manipulation. And these models are really
convincing. They can write really great prose. My sister-in-law created a company, Sonantic, that
can do human, realistic, emotional voices. She did like Val Kilmer's voice for his documentary and
stuff like that before selling to Spotify. It's going to be crazy, the types of ads that you see.
and we need to have regulation about those soon because you're going to see Meta and Google and others
trying to optimize for engagement and manipulation fundamentally.
And I think that those can then be co-opted by various other parties as well on the political spectrum.
So we need to start building some sort of protections around that.
What was the final one? Sorry, Elad.
I was just asking about, and I think Sarah asked the question around, you know, where do you think
regulations should be applied and what would be positive outcomes of that versus negative outcomes?
Yes, I think there should be these elements around identification of AI, especially on advertising.
I think that there should be regulation on very large models in particular.
The European Union introducing a CE mark slash generative AI restrictions where the creators are responsible for the outputs, I think, is the wrong way.
But there are other ones as well.
Like I would call for opt-out mechanisms, and I think we're the only ones building those for data sets.
So we're also building some of these data sets
and trying to figure out attribution mechanisms for opt-in as well, on the other side.
Like right now, the only thing that is really kind of checked is robots.txt,
which is the kind of thing on scraping.
But I think, again, it's evolving so fast that people might be okay with scraping,
but they may not be okay with this.
Legally, it's fine.
But then I think we should make this more and more inclusive as things go forward.
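For reference, the robots.txt check he mentions is machine-readable with the Python standard library alone; a minimal sketch, with the crawler name and site as illustrative placeholders:

```python
# Minimal sketch: checking a site's robots.txt before scraping, using only
# the standard library. The crawler name and URL are placeholders.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse

# False means the site has opted this crawler out of this path.
print(rp.can_fetch("ExampleImageCrawler", "https://example.com/gallery/"))
```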
So that's, for example, if an artist doesn't want their work represented in a corpus that a machine is trained on, for example.
Yes, and it's difficult.
It isn't just a case of, you know, don't look at deviant art or my website.
Like, what if your picture is on the BBC or CNN with a label?
It will pick that up, you know?
So it's a lot harder.
This is why, like, we trained our own OpenCLIP model.
We have the new CLIP-G landing this week.
That's even better on zero-shot, I think 80%.
Because we need to know what data was on the generative and the guidance side, so that we
could start offering opt-out and opt-in and these other things.
So it's not as easy as some people kind of think.
Yeah.
And then I guess one other area that people often talk about safety is more around defense applications
and the ethics of using some of these models in the context of defense or offense from a national
perspective. What's your view on that?
I think the bad guys, I'm going to put that in quotes, have access to these models already
in thousands and thousands of A100s. I think you have to start building defense, but it's a very
difficult one. Like, we were going to do a $200,000 deepfake detector prize, but then it was
pointed out quite reasonably that if you create a prize for a detector, that then creates, well,
a bouncing effect, where you have a generator and a detector, and they bounce off each other,
and you just get better and better and better.
So now we're trying to rethink that.
Maybe we'll offer a prize for the best suggestion of how to kind of do this.
Similarly, ChatGPT is detectable, but not really.
So I think the defence implications of this, it's largely around kind of misinformation and
disinformation.
This is an area that I have advised multiple governments on with my work on counter extremism
and others.
It's a very difficult one to unpick, but I think one of the key things here is having
attribution-based mechanisms and other things for curation.
because our networks are curated.
And so this is where we've teamed up with like Adobe
on ContentAuthenticity.org and others.
I think that metadata element is probably the winning one here,
but we have to standardize as quickly as possible around trusted sources.
I think people already don't believe what they see there,
which is a good thing and a bad thing.
We want to have those trusted coordinators around this.
Beyond that, in some of the more severe kind of things around drones
and slaughter bots and things like that,
I don't know how to stop that, unfortunately.
And I think that's a very complicated
thing, but we need an international compact on that because, again, this technology is incredibly
dangerous when used in those areas. And I don't think there's been enough discussion at the highest
levels on this, given the pace of adoption right now. I think that's all we have time for today.
So one last important question for you. What controversial prediction, good or bad, about AI
do you have over the next five years? You seem like an optimist. I think that small models will
outperform large models massively, like I said, the hive model aspect. And you will see chat GPT-level models
running on the edge on smartphones in five years, which will be crazy.
Great. Thanks so much for joining us.
Amazing conversation, as usual.
It's my pleasure.
Thank you for listening to this week's episode of No Priors.
Follow No Priors for a new guest each week and let us know online what you think and who
in AI you want to hear from.
You can keep in touch with me and Conviction by following @Saranormous.
You can follow me on Twitter at @EladGil.
Thanks for listening.
No Priors is produced in partnership with Pod People.
Special thanks to our team, Cynthia Geldea and Pranav Reddy, and the production team at Pod People:
Alex Vigmanis, Matt Saab, Amy Machado, Ashton Carter, Danielle Roth, Carter Wogan, and Billy Libby.
Also, our parents, our children, the Academy, ChatGPT, and our future AGI overlords.