No Priors: Artificial Intelligence | Technology | Startups - Context windows, compute constraints, and energy consumption with Sarah and Elad
Episode Date: May 9, 2024. This week on No Priors, hosts Sarah and Elad are catching up on the latest AI news. They discuss recent developments in AI like Meta's new AI assistant and the latest in music generation, and ... if you're interested in generative AI music, stay tuned for next week's interview! Sarah and Elad also get into device-resident models, AI hardware, and ask just how smart smaller models can really get. They compare these hardware constraints to the hurdles AI platforms continue to face, including compute constraints, energy consumption, context windows, and how best to integrate these products into apps that users are familiar with. Have a question for our next host-only episode or feedback for our team? Reach out to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil Show Notes: (0:00) Intro (1:25) Music AI generation (4:02) Apple’s LLM (11:39) The role of AI-specific hardware (15:25) AI platform updates (18:01) Forward thinking in investing in AI (20:33) Unlimited context (23:03) Energy constraints
Transcript
Discussion (0)
Hey listeners, you are here for another episode of No Priors, just with me and Elad.
And there's been a lot going on in earnings and in the technical world.
So I think we will start with maybe just one fun thing that seems to have taken flight in terms of music generation.
Elad, what do you make of the popularity that Suno and Udio have found?
When you said fun things, I thought you were going to talk about my hats.
I have two hats.
We can talk about your hats.
I have my Bitcoin halving hat that I got from Coinbase, as you see,
the Bitcoin halving.
And then Zaid has these Make AI Great Again hats.
He's actually selling.
I think it's going to fund his data labeling habit.
That's a lot of hats.
I know, yeah.
That's all I got.
It's all I got left, sir.
If anybody has read Jared Kushner's book,
there's a great bit about how much money they're making from all the MAGA swag.
And so listeners, Elad and I are making hats and tequila for the guests.
Yeah.
But for the low, low price of one H100 GPU, we will give each of those to you too.
Or a Bitcoin. Either way.
Okay, I'll take the Bitcoin instead. Yeah.
Me too. I know. Yeah.
I think we'd all wait at this point.
We can check in again in a couple months when the B100s come out or whatever time period.
So, yeah, so, you know, as you know, there's been some really interesting things happening on the music generation side.
And so there's both Suno and Udio, and both seem to be kind of taking the world by storm in terms of really interesting music-based models.
And it feels like one of those things where it's early, but it's really giving a glimpse of what's coming in terms of the ability to create other types of content.
Obviously, the very first content wave was, in some sense, simple text-based things on GPT-3, like Jasper.
And then we hit an image gen wave.
and that was Midjourney and Stable Diffusion and things like that.
And then we had obviously chat come out as sort of a new type of format
and interaction modality.
And then we had video with things like Pika.
And so it just feels like sequentially we're hitting these different formats.
And then obviously Sora from OpenAI.
And now we have these really interesting music models
where you can specify the type of music that you want.
You can write the lyrics.
It'll add vocals.
And so, you know, these really seem to be the two
models initially, at least, that people are really adopting.
And so it just seems like an interesting moment in time from the perspective of
look at all these different creative things that people are now empowered to do
and look at the different ways to engage.
And of course, you could imagine going forward in time and saying, okay, at some point
there'll be voice cloning, and I think it was Drake who put out a song, right,
where he had two or three other rappers that he just voice-cloned in.
And you can imagine a world where you could use anybody's voice, assuming there's
permissions and everything else, to generate your own songs and content.
things. So it just seems like a very exciting future world between Udio and Suno, you know, and some of these
companies. Yeah, I think one of the things that's not obvious here is in like media platforms in
general, they're like the ratio varies, right? But there are a lot more readers on X or consumers,
like people who scroll a feed on TikTok, or not TikTok anymore, I suppose, but whatever it is,
than creators.
And so I think, like, one thing that is just, like, unknown is how many people actually
want to create music if you make it a lot easier to, you know, create something that's
any good, right?
And if the music we get changes.
And so I was talking to one of these founders, and he was like, everybody should have a
personalized soundtrack for their life, but in the voice of Taylor Swift, in the style of
Taylor Swift.
So I already have one of those, but yeah.
So what is yours?
I can't really share it publicly, but we can talk about it later.
Okay, okay, yeah.
I think it's going to be a little bit of a personal thing.
Yeah, it's great.
What else is going on?
I don't know.
I mean, what about local LMs in the Apple release?
You want to talk about that?
Yeah, so it's interesting.
Apple has entered the chat with a release of, you know, relatively small models that are now on Hugging Face and such.
And I think it's just worth talking about the fact that
what some of the initial open source releases, Mixtral, Llama, did was create, you know, pretty impressive reasoning capabilities that were open, that developers could use.
But there's been huge demand for models that actually have, you know, that level of capability or at least useful capability in a one and three billion parameter size that'll fit on edge devices.
And of course, like, if you enable that, you have a very different latency paradigm in terms of what experiences for a consumer are possible.
And then not paying for the compute of inference all the time means you can do things much more easily that are ongoing and passive and proactive.
And so I think there's a lot of developer demand.
And I think it, like, foreshadows that we should see Apple creating interfaces for running models locally as part of their ecosystem.
That's my general prediction.
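To make the device-resident idea concrete: here is a minimal sketch of local inference with a small quantized model, using the open-source llama-cpp-python bindings. The model file path is hypothetical; any 1-3B parameter GGUF checkpoint would behave similarly.

```python
# Minimal on-device inference sketch (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="models/small-3b.Q4_K_M.gguf",  # hypothetical local quantized model file
    n_ctx=2048,    # modest context keeps memory in laptop/phone-class range
    n_threads=4,   # CPU-only: no network round trip, so latency is bounded by local compute
)

out = llm(
    "Summarize my unread notifications in one sentence:",
    max_tokens=64,
    temperature=0.2,
)
print(out["choices"][0]["text"])
```

Because nothing leaves the device and there is no per-call cloud inference bill, a loop like this can run continually in the background, which is the passive, proactive use case described above.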
Yeah, I think it's kind of interesting because I've seen a number of people building apps recently
that have been these Mac- or iOS-resident sorts of exposures to LLMs.
And in some cases, it's let me index everything on your computer and let you interact with it
or search over it or build embeddings on top of it.
Or let me just integrate chat GPT or other things into your desktop.
And there used to be like search bar apps and things like that and sort of prior generations.
And so I think a lot of that is going to be really interesting from user experience perspective.
And one can imagine a couple years from now, that's just going to be something that comes standard on your device, either direct from Apple or through some sort of partnership or something else.
Do you believe any of those things can persist independently, right?
Because I'm reminded of like the launcher platforms.
Like, you know, if you think about Android as a more permissive platform, people trying to change sort of core user experiences like that at what felt like an operating system level was kind of tough.
I think it depends on whether or not you truly take advantage of the browser and you're also accessing browser-related third-party applications that you're just getting through the web and indexing them or not.
And so really it comes down to like what's the footprint of content that you're searching over in some sense.
And that to me is the biggest pivot point in terms of whether you'll end up with something that's going to be specific to the OS company or broader.
I think in general, platforms tend to integrate the most valuable things into themselves,
and so it's always risky.
They do that as a strategy, and every once in a while a company has breakout success,
even though it's built on top of a platform.
So I think the oddest example of that is Veeva.
Do you know Veeva?
They're like a $40 billion vertical SaaS company focused on pharma.
Oh, yes, yeah.
Yeah, was built on top of Salesforce.
It was just a Salesforce app.
And they were just selling this into pharma, and then it started working,
and they eventually swapped out Salesforce in the back end.
But they were literally just some like thin layer on top of Salesforce for years.
And so that's the only example I can really think of
as something that's getting truly massive
on top of a platform that wasn't then subsumed by that platform.
But I'm sure there's other examples too.
Yeah, I think they're like all kind of apples and oranges.
Yeah, Veeva, sorry, is so out of context.
I'm like, what does Veeva have to do with AI, man?
And like operating systems.
Yeah, I think choosing the vertical thing,
makes sense to me there, right? You've got compliance around workflows for selling to doctors
for life sciences that a Salesforce, like, didn't really have the expertise or maybe they just didn't
see how big it was to attack. Well, they verticalized. I mean, Salesforce always had these
verticals, right? Let's see. What's Salesforce market cap? It'd be interesting to see, like,
relative to Veeva, how much value. Salesforce is 266 billion. So yeah, Veeva is like 20% of Salesforce
or something like that.
Yeah, that was worth investing in.
Yeah, I think people also forget that all of Microsoft Office at some point were third-party
applications.
So in the 80s, there were separate companies like Lotus and others that were providing
what turned into Excel.
There was a PowerPoint company that was very popular.
Before Word, there were document apps that people would buy as separate applications, and then Microsoft
just ended up subsuming them into its own distribution platform.
And so, again, it's a very standard kind of traditional thing to have happened.
The default case would be that those things
eventually become part of the operating system
or the operating system company,
but again, you never know.
So I always think it's interesting to see what people do
and can they make it cross-platform?
Does that matter?
The one other thing that I think is interesting
about small models
is the question of how smart can small models get
and how much can you actually pack into them?
And to some extent, if you look at an LLM,
there's like three or four pieces of capability
that people care about.
There's sort of the reasoning part of it.
There's a set of capabilities in terms of what it can do from a synthesis or other
perspective. There's multimodality. And there's the knowledge base or knowledge set that's actually
resident in the model. And you can only stuff so much into a 3B model or a 7B model or whatever
size you eventually end up running on device. And so the other
question is, what are the capabilities that you can actually have that are device resident versus
which of them have to go out to the cloud for? And obviously, devices always expand their
capabilities over time because of the microprocessors they have and everything else.
But fundamentally, there are going to be limitations.
And so the question is, where is that line?
And you could probably define that analytically and just say, okay, today, this is all the
stuff that's going to get cut off if you just do it on device and here's all the things that
you can do.
And as we move up in terms of device capabilities and their ability to provide access to
larger and larger LMs that are resident, what are the capabilities that come with that?
And so I haven't really seen that analysis, but that seems like something that should be
straightforward to do in terms of just trying to
understand what you can really do on device versus not. And therefore, what apps or products can
you build that are going to be device resident? Yeah, well, I think there's like an obvious
analogy to draw to just, like, the debate over how much computing is done like on client or
in cloud, client-server if you're really old. Right. And I don't think it's like obvious that
there is any sort of principled answer here, except it's going to vary by application and how
quickly, as you said, like capabilities actually fit into whatever hardware people already have
because now there are companies doing AI specific hardware already, right? And, you know,
if you're trying to make something that feels very AI native work, there is a question of like,
okay, well, you know, do we try to make the model work? What hardware do we, what compute do we want
in the device itself? How much pre-processing do we do to, you know,
send less data over the network to a large model in the cloud?
And so I definitely think there's going to be like a distribution of that compute,
just like the religious wars about, like, you know, what gets done, you know,
on client and server and in your CDN, whatever.
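One hedged sketch of that client/cloud split, in the spirit of the pre-processing idea above: shrink the payload on the device, then send only the compressed view to the large hosted model. Both helper functions here are hypothetical stand-ins, not a real API.

```python
# Hypothetical client/cloud split: pre-process locally, escalate a small payload.

def local_summarize(document: str, max_chars: int = 500) -> str:
    """Stand-in for an on-device step that shrinks data before it leaves the device."""
    # A real version might run a small local model for extractive summarization;
    # plain truncation keeps this sketch self-contained and runnable.
    return document[:max_chars]

def cloud_complete(prompt: str) -> str:
    """Stand-in for a call to a large model in the cloud."""
    return f"[cloud model response to {len(prompt)} chars of prompt]"

def answer(query: str, document: str) -> str:
    # Send a compressed view of the context over the network, not the raw data.
    context = local_summarize(document)
    return cloud_complete(f"Context: {context}\n\nQuestion: {query}")

print(answer("What changed this week?", "raw device data " * 10_000))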
But one litmus test is, do you think that's not just your phone and your watch
or something?
Like where do you think the new AI hardware will matter, at least from a consumer perspective?
One of the most interesting theories, and we can actually talk about, like, the Meta AI launch as well,
is whether or not you want some sort of passive device
that is like continually collecting data about the world
from a vision environment perspective,
which is why you get like Ray-Ban Meta glasses
instead of just like a phone or a watch
that's sitting in your pocket, right?
Sure. I guess you could also just fix small cameras
to other devices as well.
So it just comes down to do you need a new form factor or not,
although I think it's an interesting area,
an interesting direction.
And I guess relatedly, like, when do you need that
extra data or that information, and under what circumstances will you use it? I mean, it's super
interesting, right? I think in general, if I look at the early mobile device products that
emerged from it, a lot of them just took advantage of the accelerometer if you were doing fitness
or GPS if you were doing everything from Uber to a variety of other applications. So I do think
those sorts of new capabilities from a primitive's perspective are super interesting. So again,
it just comes down to, okay, what are you going to use it
for now and when. You know, there's also really interesting unexpected applications of vision
to other areas right now. So I don't really know. But, you know, the Meta thing I think is super
interesting. Have you tried the product? Yeah. Yeah, I was very impressed. Yeah. They entered the race
with like multiple different products, like different modalities up front. It's interesting that
they haven't actually, they haven't pushed it that aggressively into their existing surfaces
yet because they have like unlimited distribution, right?
And it's just meta.com, like, independent product.
But I imagine they're going to phase into it.
Yeah, it seems like you would potentially, and who knows what their plans are, have like a,
you know, a channel or bot on WhatsApp or on, you know, Messenger or on multiple different
services that you can just start interacting with to do things for you.
And that could effectively be a way to encapsulate Meta
AI as, like, just another line item or another account that you're interacting with in chat
on other sorts of properties they own.
But I thought it was really well done.
I've been making different images with my kids on the different services, you know,
Playground and OpenAI and Meta and everything else.
And they really enjoyed the one-click animation.
It's like a feature.
Yeah.
I think it's an impressive product launch.
And then on the open source side,
they've also made a splash, where I think one of the things that surprised people is,
there's an argument generally in research about like how much do you want to try to change
like the efficient frontier of model training or like do you just, you know, ride the curve
and scale up.
And if you are Meta and you have somewhere between, you know, 22,000-GPU clusters and
350,000 GPUs available, then continuing to train past supposedly optimal points,
like does improve performance, apparently, and doesn't just fully asymptote as soon as maybe
a lot of the research community predicted. And so I think it just is actually a point in favor
of the very large firms who are really willing to invest against this. And, like,
it begs the question of how important efficiency is.
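The "doesn't fully asymptote" point can be eyeballed with the parametric loss fit from Hoffmann et al. (2022), the "Chinchilla" paper: L(N, D) = E + A/N^alpha + B/D^beta. This is a rough published approximation, not Meta's actual training curve, and the token counts below are illustrative.

```python
# Chinchilla parametric loss fit; constants as published in Hoffmann et al. (2022).
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**alpha + B / n_tokens**beta

# An 8B-parameter model trained well past its "compute-optimal" budget
# (~160B tokens under the ~20-tokens-per-parameter rule of thumb):
for tokens in (0.2e12, 1e12, 15e12):  # Llama 3 reportedly trained on ~15T tokens
    print(f"{tokens / 1e12:>5.1f}T tokens -> predicted loss {loss(8e9, tokens):.3f}")
```

On this fit, the predicted loss keeps edging down as tokens scale, consistent with over-training continuing to pay off, if slowly.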
I think efficiency and creativity and architectural approach is going to end up being really
important for lots of different use cases.
Like over time, applications are going to want efficient inference.
And it's like really large models are impossible today to serve for the vast majority of
use cases from a cost and speed perspective.
But if all you're trying to do is a technical demonstration, like this is very impressive.
Yeah, they did really nice work.
I guess the other thing on the sort of data compute platform side while we're talking about
all these different platforms is Snowflake and Databricks and sort of that layer of companies.
Do you think you need to own the model as a data or compute platform? Or how do you think about
these various open-source models that these folks are starting to launch now?
One thing that's tricky as an investor or as an operator in the space is like it begins to look
like all one blended landscape of competition. So the Snowflake platform hosts a bunch of different
models from other players, including, for example, Mistral,
and they're also training their own, and their models are available on, for example,
like, Microsoft Azure, right? And so you've got a huge number of players all in
competition somewhere between models and inference. And it's like, I can't tell you how I think
it is going to work. I do think that in the immediate term, like it is less obvious to me that
they necessarily, like these compute platforms necessarily need to own the models, especially
if there's a big landscape of open source out there, versus needing to demonstrate to customers
that they actually have the expertise of, let's say, like, training models and fine-tuning them
and deploying them and customizing them to different use cases. So I think there's a piece of it
that is actually learning something that they can deliver to their customers and marketing too.
Yeah, it definitely feels like in the long run there's a bit
of a capital scale question, which is if you're focused on the frontier models at least,
you need more and more scale in terms of compute or other things, which means you need more
and more money to invest behind it. And that's why a lot of the models seem to have sponsors
in the hyperscalers or other sort of large corporations. And so the question is, how long do some
of these other players want to keep up in terms of trying to build things that are bigger and
bigger? Do they just focus on a specific subclass of models that are more kind of medium to small
that have more specialized use cases? And so one could
argue that in the long run, that's where those types of models should go, is the inference
platforms probably provide things that are more in that range, and then the hyperscalers and their
partners, you know, a handful of them will be at the frontier because those are the ones
that have business models that can actually afford the subsidization of these massive frontier
models with the idea that the ROI pays for itself dramatically later as you scale into more
and more intelligence and more applications and things like that. So it's kind of investing far ahead.
And the question is how far ahead
can companies invest.
I'm sure you've been getting this question,
but I get like a stressed out question
from investors of essentially like
how much money should the world spend on this
and like is it rational?
And I think this is an impossible question
because should is like this is a question
of people's business decisions,
like a very small number of individuals actually, right?
And something that feels unique in AI to me
is you have these folks
who, like, really believe in technology and, you know, have strong intelligence, they can reason
about exponential growth. That's something like Silicon Valley people are very proud of, but also
all of these people are very capable of. And they have, like, very high risk appetite. And then,
you know, they can control and marshal a huge amount of resources, $10 billion plus spend,
$20 billion plus spend if you're Meta, every year on GPUs; 30, I think 30 to 35 this year,
they may have changed the estimate there, against these bets.
And I actually think it's interesting to put in context.
Like, this isn't completely unheard of in terms of investment toward some future goal.
Like, you know, you and I have talked about chip fabs before and how much they cost.
But if you think in aggregate, a handful of players in terms of the hyperscalers are spending almost $200 billion this year on compute for AI.
And then you try to think about the other types of investment in a new ecosystem.
Like, oil majors spend $80 billion a year. Broadband providers, you know, the last few years, it's
about $100 billion a year of spend in infrastructure, right? Like CapEx for high-speed internet.
This is like going back a decade or two. But if you look at railroad freight, like there are years
where you're spending a huge number on railroad CapEx. And so I think,
like, the different dimensions you can think about spend are, like, well, what is it in
overall context? And then, you know, what does that get you over time? I think both these
questions are pretty hard to answer, but it's not unheard of in terms of scale.
Yeah, 100%.
And I, again, it depends on your belief in terms of where all this goes, but I think it's notable that the
hyperscalers are doing it and then the one other potential source of immense scale in the long run
may be sovereigns, as people want to customize models that are specific to their region or customs
or language or culture or whatever, maybe. So there are a few areas of research that have been
like debated hotly recently. And one is this idea increasingly of like unlimited context,
which I think is like mostly a term for like actively managing context. But you're an investor
in Magic. Magic is one of the key players here. Can you talk about this? Oh, sure. I mean,
Magic, I think, was one of the first companies to launch a really long context window.
I think they launched a 5 million token context window, oh, you know, six or 12 months ago now,
so it's been a while.
And then when Google came out with Gemini 1.5, I think it was a million token context window.
And then it seems likely that a lot of people end up in the 10-million-plus range in the next, say, year or two.
Or, you know, some reasonable time frame ahead.
And what's happening is with longer and longer context windows,
the way that you think about what you put into a prompt changes pretty dramatically
because you can start dropping everything from entire code repos to all sorts of documents
if you're dealing with legal on through to, you know, you kind of name it.
You can also drop in all the context of a giant customer support queue, you know,
because, you know, some of these things eventually get big enough that you can do really
significant things with them.
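A rough sketch of what that prompt-construction change looks like in practice: count the tokens in a code repo and check whether the whole thing fits in a given window. This uses the tiktoken tokenizer as a stand-in (token counts vary by model), and the repo path is hypothetical.

```python
# Estimate whether an entire repo fits in one context window (pip install tiktoken).
from pathlib import Path
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-family encoding as a proxy

def repo_tokens(root: str, exts=(".py", ".md", ".txt")) -> int:
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in exts:
            total += len(enc.encode(path.read_text(errors="ignore")))
    return total

n = repo_tokens("./my-repo")  # hypothetical repo path
for window in (128_000, 1_000_000, 10_000_000):
    fits = "fits" if n <= window else "does not fit"
    print(f"{window:>10,}-token window: repo {fits} ({n:,} tokens)")
```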
I think one of the most striking examples of long context window being important is actually
some biology models that have come out recently
where just increasing the context window
for things like protein folding
seems to really make a big difference
in terms of your end result
in terms of the fidelity of that folding.
So I think this is going to end up being
one of those really significant things.
And it reminds me a little bit of microprocessors
or bandwidth in the 90s
where each generation of
microprocessors that came out
had a big step up in things that you could do with it
or bandwidth, you know, instead of a dial-up modem
some of you had a fatter connection
and then eventually had broadband, and now people have fiber in some cases.
And so it feels like a similar thing where there will be a really long period,
and a long period in this, in AI, means like two weeks or whatever,
but I mean, it's going to be much longer than that.
There's a really long period where bigger and bigger context windows will matter,
and then there will be some shift where, okay, now we've kind of maxed out
what we're really going to be able to do with it.
But it seems like a significant shift in thinking around how important this is
going to be. And again, in areas that I didn't expect necessarily like biology. I guess the other
thing that people have been looking at, you know, we talked a little bit about context window.
We talked a little about compute platforms. We talked a little about the capabilities with small
models and where they constrained. The other potential future constraint is energy. So I'm just
curious how you think about, you know, if you have 500-megawatt or gigawatt data centers, like,
does the way we think about energy shift? Like there have been things like job ads posted on the
Microsoft website for like nuclear engineering and things like that. And so what do you think
happens from an energy constraint perspective? Because you kind of look at it. The first constraint
was like chips and then it was packaging for the chips and then there's probably going to be some
data center constraint where everybody miscalculates how many data centers we actually need and
then energy is sort of related to that. So I'm just curious how you think about future energetic
needs. I think like one supposed coming limit to scale is going to be energy and that nobody's
built like a 500-megawatt or gigawatt data center yet. And if you think of it as the equivalent
of like a nuclear power plant's worth of energy going toward a single data center, it is
quite large. And I mean, the basic understanding here is, today, to train these large models,
you need all of the GPUs co-located because there is so much data transfer between different
chips, right, between your nodes. And there's a physical constraint on that in that you need to get
that much power to a data center.
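A back-of-envelope on that power claim, with assumed numbers that are not from the episode: roughly 700 W per H100 at load, a ~1.3 power usage effectiveness (PUE) multiplier for cooling and overhead, and ~1,000 MW for a large nuclear reactor.

```python
# Rough cluster power arithmetic; all constants are approximations.
GPU_WATTS = 700      # approximate H100 board power at load
PUE = 1.3            # datacenter overhead multiplier (cooling, networking, etc.)
REACTOR_MW = 1_000   # typical large nuclear reactor output, in megawatts

def cluster_megawatts(n_gpus: int) -> float:
    return n_gpus * GPU_WATTS * PUE / 1e6

for n in (100_000, 350_000, 1_000_000):
    mw = cluster_megawatts(n)
    print(f"{n:>9,} GPUs ~ {mw:,.0f} MW ~ {mw / REACTOR_MW:.2f} reactors")
```

On these assumptions, a 350,000-GPU fleet lands around 300 MW and a million GPUs near a gigawatt, which is the scale being discussed.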
And I think it's kind of, it's kind of interesting because Sam made the point recently
that when you have constraints that require permitting and physical world changes,
like you'll see a slow down in terms of how quickly you can go deploy this, right?
It's no longer a software engineering problem only.
And I think the fear, I think we're likely to work through a lot of these things.
But the fear is that some of the potential limits to progress are going to be, like, we can't just throw more compute at the problem because it's like physically hard to throw more compute at the problem, energy data centers, right?
And then also, you know, the concept of the data wall, right, like the number of cheap available tokens on the internet we have used.
And now we have to go figure out how to go get more, collect more
in the world, or more likely, you know, in combination with generating synthetic data.
That still feels like a bits-not-atoms problem that you can solve pretty fast. But you can even see it
in the designees for the new DHS AI safety board that just got announced,
right? Some of the players are very obvious, so the, you know, Sams and Satyas of the world, but it also
includes people who work on energy, to this point, and infrastructure security.
Yeah, it makes a lot of sense. I mean, I actually found that the news around Microsoft
investing $1.5 billion in Abu Dhabi's G42 was super interesting from the perspective of
you're effectively setting up a big AI data center. A, you're starting to sort of democratize
or broaden global access to it. But B, you're doing it in an energy center. And so, you know,
I think that's really interesting.
I think, you know, in general, if you view the last 50 or 70 or 100 years
as two sort of competing philosophies around progress and anti-progress or abundance and scarcity,
you know, one of the biggest wins on the scarcity side was really shutting down nuclear power
in the 70s, at least in the U.S.,
or the ability to add new nuclear power since. You know, 17 or 18 percent of U.S. power generation
today is still nuclear.
It's like 70% in France, it's 30-ish percent in Japan.
So it's still quite significant in some places,
but it could have been 70% in the U.S. and a bunch of other countries.
And if you would have thought of that large, abundant cheap resource
and now how that ties into things like compute and other areas,
it's a real game changer, right, in terms of capabilities.
And so there are solutions.
The question is, are we going to adopt those solutions at any point?
But these are actually very solvable problems if we choose to solve them.
Which is why I thought that job posting on the Microsoft website was so interesting.
Yeah, well, and I'd love to see that reversed, even beyond just like general energy needs.
If you do think of AI as a strategic issue and a national security issue, not using every energy resource we have and every energy technology we have, like you and I have talked about, you know, generations of nuclear technology that have existed, not deployed
in the United States for really, like, policy reasons.
It is yet another dependency that we're creating for ourselves on other countries without
needing to, right?
It makes the sort of energy dependency even worse to have AI rely on it.
Yeah, all of geopolitics would be dramatically different if we just had a lot of widespread
nuclear power.
And it's interesting to think of that.
And you think of some of the biggest oil producers being Russia and Venezuela,
obviously the Gulf, but, you know, Iran, et cetera.
And you think of geopolitical policy,
it's kind of interesting to think about how the world would be different
if we weren't dependent or as dependent on some of these forces.
So anything else we should talk about?
I think we're good.
Do you want to see my hats again?
Yeah, let's see the hats.
That's the other piece of infrastructure
where people spent a lot of money in the past
on just sort of hat-making, you know?
Yeah.
Millions of dollars a year on that infrastructure.
The trade is good for anybody listening.
We will offer you one
branded bottle of tequila, or mockila or whatever you'd call it, and a, uh, a cool No Priors hat
for H100s. Let us know. Okay, that's all we got. Find us on Twitter at @NoPriorsPod. Subscribe to
our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or
wherever you listen. That way you get a new episode every week. And sign up for emails or find
transcripts for every episode at no-priors.com.