Latent Space: The AI Engineer Podcast - The Four Wars of the AI Stack (Dec 2023 Audio Recap)
Episode Date: January 25, 2024Note for Latent Space Community members: we have now soft-launched meetups in Singapore, as well as two new virtual paper club/meetups for AI in Action and LLM Paper Club. We’re also running Latent ...Space: Final Frontiers, our second annual demo day hackathon from last year.Edit from March 2024: We did a followup on the Four Wars on the AI Breakdown.For the first time, we are doing an audio version of monthly AI Engineering recap that we publish on Latent Space! This month it’s “The Four Wars of the AI Stack”; you can find the full recap with all the show notes here: https://latent.space/p/dec-2023* [00:00:00] Intro* [00:01:42] The Four Wars of the AI stack: Data quality, GPU rich vs poor, Multimodality, and Rag/Ops war* [00:03:17] Selection process for the four wars and notable mentions* [00:06:58] The end of low background tokens and the impact on data engineering* [00:08:36] The Quality Data Wars (UGC, licensing, synthetic data, and more)* [00:14:51] Synthetic Data* [00:17:49] The GPU Rich/Poors War* [00:18:21] Anyscale benchmark drama* [00:22:00] The math behind Mixtral inference costs* [00:28:48] Transformer alternatives and why they matter* [00:34:40] The Multimodality Wars* [00:38:10] Multiverse vs Metaverse* [00:45:00] The RAG/Ops Wars* [00:50:00] Will frameworks expand up, or will cloud providers expand down?* [00:54:32] Syntax to Semantics* [00:56:41] Outer Loop vs Inner Loop* [00:59:54] Highlight of the month This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
Transcript
Discussion (0)
Hey everyone, welcome to the Lid in Space podcast.
This is Alessio, partner, and CTO and resident sedativele partners.
And today I'm joined just by my co-host, SWIX, for a new podcast format.
Yeah, it's a bit uncomfortable because we have to just stare into each other's eyes lovingly.
But in our end-of-year survey last year, a lot of listeners were asking us for more one-on-one time,
more opinions from the both of us as hosts on what's going on in AI.
You know, both of us are very actively involved.
and I don't think this year will be any difference.
This year, there's lots more excitement to come.
And we're trying to grow late in space
in terms of the types of formats
and the amount of value that we deliver to our subscribers.
So one thing that we've been trying,
experimenting with is this monthly recap
that I started doing around August of last year,
where I basically just take the notable news items of the month
and then I sort them and categorize them
according to some order that makes sense
and write them down in the newsletter.
And this last December recap was particularly exciting because it seemed like it popped off in a number of areas, particularly with the AI breakdown.
Our friend NLW featured it on his podcast.
And I figured we can just kind of go over that as a way of setting the stage for 2024, but also recapping what happens in 2023.
Yeah.
And people always ask me if December is like a slow month.
But I think you almost broke sub-sac with how many links we had in the thing.
No, we actually did.
So a lot of people commented to me about the formatting issues within the newsletter that I sent out.
And I know that they are there, but I couldn't fix it because substack was broken by us with how long it was.
Oh, Ben.
But we had this kind of like four main buckets called the four words of the AI stack, data quality and I guess like data quantity as well in a way.
The GPU rich versus pores, which we have a whole episode about with Dylan Patel, multimodality.
We're actually recording tomorrow with Luma Labs about their new 3D models.
So we went from text to image to 3D video.
I wonder what's next.
And we're going to release Hugging Face as well.
I guess I'm thinking about calling it multi-modality 101
because the first modality beyond text that you should really pay attention to is vision.
Right.
Yeah.
And then the RAG ops were.
I think that's a...
I don't know what to call it.
I don't know if you would have called it anything else.
This is my...
The tooling were, I don't know.
But I think beginning of last year, that was like kind of the hottest space
because there wasn't much open source model work.
And I think over the last maybe like four or five months,
everybody's so focused on fine-tuning Lama 2
and like a DPO to improve these models,
Maxtral and all these things.
And people forgot about our friends at Langechain, Lama Index,
and some of the things that were maybe top of mind.
Vector DBs, you know,
it seemed like everybody was releasing a VectorDB early in the year.
Yeah, I think that I'll be very surprised
if any new Vector DBs come out this year,
with one exception, which is something I'm keeping an eye on,
which is turbopuffer.
I don't know if you've seen them going around.
Yeah, all the smart people seem to be adopting turbopuffer
as the first serverless vector DB.
Yeah, no, and we're going to have definitely Jeff and Anton
on the podcast at some point.
I know they're going to be fun.
I should also mention the reason I selected these four wars
was a process of elimination of wars that I think ended up not mattering.
So for those who don't know, inside of my writing,
I often include footnotes that are in themselves.
just essays for notes.
And so I think it's also notable the things that people thought were hot that were less
hot than expected.
So it was agents, definitely less hot than at the start of 2023.
And then this one is very controversial, non-selection by me.
I think open source AI is not a battle in the sense that I don't think there's anyone against
open source AI.
Everyone is like on one side.
There's no like opposing side apart from regulators.
But in my mind when I think about like for engineers, engineers are all used.
universally in favor of open source models.
So there's no battle here.
Everyone just wants it to improve.
So it's not like interesting to write about.
We just want more open source.
Yeah.
The only battle is people offering inference on it.
Yes.
Killing each other in the process.
Yeah, so I classified that as a GPU rich versus poor war.
But maybe there's a better way to classify that and you can give me some feedback on that.
It's a struggle to try to categorize the world.
Code models as well.
I was very struck by a conversation I had with Poole Side.
I so can't from Pulside.
So they haven't been on the podcast yet.
They're kind of stealth still,
but they had a very, very notable fundraise.
I think they had like $50 million raised.
I think KivaMaria for a seed,
spending most of it on GPUs.
And my conversation with ISO,
he was like, hey, you know, like,
Replit was like one of our podcasts,
early biggest winners.
Replit didn't really follow up with,
like they announced like their 1.5 model,
but it's not really why they use beyond Replit,
you know.
There's StarCoder, there is Kodama,
but like it's not really,
for how important,
than code is, it doesn't seem like as big of a battlefront as just general function calling, reasoning,
you know, these other kinds of domains. And so I thought it was just interesting to note that even
though we as a podcast tried to pay particular attention to developer tooling, to code models,
we interviewed cursor, fine, replica, codium, and hugging face, these all seem like very small
compared to the amount of money being thrown, the amount of heat in the other domains. And I don't know why that is.
Yeah, I think it's maybe the fragmentation of the tooling, you know, like most people in code are using VScode cursor, GitHub, one of the trees.
So there's maybe not as much experimentation versus with text people are just trying everything.
It's hard to try a code model, you know?
I see code models being released, but like it's not super easy to just plug it into your boardflow.
So I think engineers like myself are just lazy and like, hey, I'm having great success with whatever I'm using.
Yeah, yeah.
I don't really want to want to go there.
The special case form of code is SQL and the semantic layer data engineering type things.
We also had to guess on there from Seek and Cube.
And we also talked to a bit of data breaks, a bit of Julius.
Yeah.
And we have Brian from X.
And Brian and Brian from X.
Does he count?
I don't know.
Yeah.
Yeah, yeah.
I guess the Hex notebooks, yes.
Hex magic, yes.
Rex is a different beast.
Anyway, but yeah, I think people who like come to AI engineering for the AI
might actually end up finding themselves in data engineering in the end.
In traditional ML engineering in the end,
they might have to discover that they're doing Rexis
and all the stuff that is get swapped under a rug in the demo
becomes their job.
And I'll probably say, like,
just because we didn't select a theme for last year,
doesn't mean it wasn't important.
It just wasn't top of mind yet.
And maybe I think that would be an emerging theme this year.
Yeah.
I think that's kind of the consequence of the low background tokens,
like the end of the low background tokens.
once.
Can you explain
what you think
low-beck and
this is our
November recap.
Yeah, the
comparison that
our friend Jeff
Uber at Kroma
brought up
is steel
before the atomic
bomb creation.
So steel
before and no
radiation in it.
After all the
testing, a lot of
steel
had radiation
embedded in it.
So it was
really precious
to get low
background
steel, meaning
with no radiation.
And same with
tokens.
You can assume
that any
internet
content from
three years ago
it's just
internet.
It's like
people writing is not models writing instead now anything we're going to get on common crawl updates
and things like that you never know if it's human britain or not and i think that will put more work
on data engineering right because even basic stuff like checking if a tech says as a model created by
open ai you know it's going to be important so people are just being blindly taking all the data set
suffered by eluther and common crawl and all these different things assuming that all the data in it
is good i think now how do you build on top of it and we've seen
seen the New York Times lawsuit against Slovenia. We've seen data partnerships starting to rise in
different companies. I think that's going to be one of the bigger challenges. And maybe we'll see
more of the work that Databricks has done to build the Dolly 5K instruction tuning. Just first-party
creation of data. It's like you got people sitting at their desk every day. If everybody wrote
five, you know, Q&A pairs or things like that, you would have a massive, unique data set for your
model. Yeah. For people who missed that episode, that was one of our early episodes as well.
and Mike Conover since left
to start Bright Wave
which I'm sure
will have him back this year
at some point
they're doing a lot of interesting stuff
I think the next episode
will be very cool
So how do you want to tackle this
Do you want to just kind of go through
the four wars?
Yeah, let's do it
You created this like Wikipedia-like
infographic for each of them
So yeah I should say
The inspiration for this actually was
during the Sam Altman
Leadership battle
People were making mock Wikipedia entries
For the debate
And for like who's on the side
of the, you know, decels and who was inside of the Eax. So I like that format because it's very concise.
It has to list the key players and it's kind of fun to think about like who's on what side
and think about what is important and what people are battling over. And I think it is important
to focus on key battlegrounds as a concept because there's so many interesting things you
could be talking about in AI and they're not all equally interesting. So how do you decide
what is interesting? I think it's money, it's power, it's, it's,
people, it's, you know, like impact, that kind of stuff. And so, yeah, that's what I ended up doing.
And fun fact, the way I did this was I actually edited the HTML on Wikipedia and then I just
screenshoted it just to get the formatting. Good old developer tools. Developer tools is all you need.
So the data were belligerents. Yeah. On one side you have journalists, writers, artists. On the other
side, you have researchers, startups, synthetic data researchers. I guess like maybe we want to talk about
What are the axis of war?
So, like, one of them is attribution, right?
Like, I think there's a varying spectrum of how comfortable people are about this data going into a model.
So some people are happy to have your model trained on it, no matter what.
Some people are happy to have your model trained on as long as you disclose that it's in the model.
Some people just hate that you trained on their model.
And some people, like the New York Times, wants you to destroy any artifact that might have touched your article.
So that's kind of what we're fighting on.
I just want to make clear that it's not just like you should never use the data,
you should always use the data.
I think people are just trying to figure out what's the right form of attribution
and how do I get paid as somebody whose data ended up being in this training.
I think we're giving everybody a lot of great tokens with latest space
because we do full transcripts on everything.
We're happy for people to train models.
Oh, yeah, please train a space model.
Yeah, we would love it.
So that's kind of what we're fighting on.
anything that people should keep in mind about this war and maybe some of the campaigns that are going on?
I think the New York Times one is probably going to go to Supreme Court.
It is very, very critical.
It is landmark war that will probably decide what fair use means in context of AI.
And I recommend, I think the verge did a good analysis of this.
Platformer maybe did a good analysis of this.
There are like four criteria for what fair use is and everyone basically converges onto the last criteria,
which is, does your transformative use of my copyrighted material diminish the market from my content?
It's very hard to say.
I would suspect that yes, in some capacity, in some amount, but good luck proving that in a court of law.
And I think a negative ruling on open AI would seriously stall the progress of AI.
And that's bad for humanity, but good for content creators and writers.
So obviously, we want them to be adequately compensated and recognized for their work.
There's like no easy outcome here apart from the existing copyright system, which is also somewhat broken.
And it's just a very, very tricky, challenging case, I think.
It's funny because we had something, I was a community moderator at a website called Rap Genius, which was a lyric sanitation.
And there was like a similar thing, and maybe like 2014, or like the music labels basically came to the website and it's like, hey, this is not fair use.
You know, like you can not reuse the lyrics to the song.
and eventually the website made deals with the record labels
to be able to do this.
And then Google was stealing the transcripts
to put in like the enhanced thing.
And they proved it by.
Yeah, yeah.
We did all the like busy like the things on the eye.
Some eyes we put the dots.
Diacritics.
Like the accident and that's how it made all better.
I thought it was, I thought they just vary the spacing
or they like use the different kind of spacing in the Unicode.
I think it was the eye thing.
But maybe I mean this is like almost 10 years ago.
So Rapgenius has proved it.
by injecting some data poison into their corpus,
and then Google reproduced it faithfully,
and so therefore they proved that Google was scraping Rap Genius.
Did Google have to pay Rap Genius money in the end?
I don't think so.
There was also another issue with a genius that we had
that got blacklisted by Google for like...
Of course.
There was a lot going on.
But anyway, this is not a Rep Genius special.
Yeah, I mean, ultimately, like, I think that we do any quality data.
I think that if this case is contained to the New York Times,
the New York Times' worst outcome
is that they'll substitute it with Washington Post
and they substitute with the economists
or like the second or third ranked newspaper
that is the most friendly to AI
and then the New York Times will realize
that actually their words are not as
not that much more valuable than other words
then the value of the content comes down
very, very dramatically.
I think it will be interesting
but yeah, I do think it's overstepping their bounds
to call for the destruction of LGBT's.
That's probably for sure.
Then the bigger problem I have
is with Stack Overflow and Reddit, which I named as the site of the New York Times.
They have effectively shut down their APIs in order to try to train their own models.
Probably same as Twitter, actually.
I should probably have put Twitter.
I put Twitter on the wrong side, maybe.
I don't know.
Twitter is on both sides.
Elon is on every side of chaos.
Yeah.
What this is basically every UGC users generated content company of the 2020s,
now has a giant pile of user content that becomes valuable data
that used to be open for researchers to scrape and train models.
Now all of them are locking their walls, right,
behind their wall gardens and then trying to train their own models to boost their benefits.
So this is a locally optimal outcome for them,
but a globally suboptimal outcome for humanity.
Because why should we care about the closed garden of Reddit,
the Reddit model, the Stack Overflow model, the X model,
as opposed to it being a part of a data mix of 20% Reddit,
20% Stack Overflow, 20% X.
that seems like a much better outcome for the world,
but everyone is acting in their very narrow self-interest
in trying to make their own model,
which is probably going to suck.
Right.
So next war, after you get data...
We should mention synthetic data.
Oh, yeah.
Yeah, yeah.
So what happens when you run out of human data?
You make your own...
Right.
So I would say, like, when I went to New York,
that was the number one discussion out of every single researcher's mouth.
There is a lot of research coming from both, I guess,
the big labs as well as the academic.
labs on what good synthetic data looks like. I don't know if you've like talked to any startups
around that. I just talked to Lewis Costicado today. And he is promising a very, very interesting
approach to synthetic data generation. I think his phrase for it is like pre-trained scale synthetic
data as opposed to what the news research and the other open source communities have been doing,
which is fine-tuned scale synthetic data. And so he wants to create like trillion token data sets
that are all synthetic. And I'm like, okay, that's interesting. But also at the same time,
these are all just downloads from GPD4 or something else.
Lewis is very aware of that and he has a way around it.
I don't really understand it, but he claims that that's a good way around it.
Andre Carpathie at Neurips highlighted this paper from DeepMind
where they were bootstrapping synthetic data that could be verifiably proven correct,
so specifically in math and in code where there is a correct answer.
So yeah, that makes sense.
You can solve the synthetic data problem that way.
but like what about you know beyond that there's just no answer and wasn't part of the issue also
that the way the phrases are constructed and like all of that and synthetic data and stuff kind of
like making mold collapse even worse because yeah one thing is like right or wrong right the other
thing is like every sample is read in the same way you know or like as a similar since it comes
from a certain model kind of as a similar root yeah so the yeah so i mentioned this in the best
papers discussion with John Frankel. So the basic argument is you already have a
fraud distribution from a language model. You are resampling that flaw distribution to double down
on that flaw distribution. There's no extra information from humans. So on principle, how can
this work? And so the only conclusion there is you don't need it to emulate a human. You need it
to emulate a useful assistant, however you define it. So I think the goal of synthetic data is
less to emulate human speech, because that is basically solved. It is now more to
spike the distribution in useful ways.
And that's a phrase I borrow from Kanjun from Inbu.
But anyway, so I think that synthetic data will be a giant theme for this year.
And not least because the human data is being locked up behind walls.
So it's a very, very clear trend.
This is probably the most amount of money after GPUs will be spent here.
So one war I did not put here was the talent war, right?
Like the war for PhDs and smart people.
And but when you break down what the talent people do,
One is they make models and they run inference on GPUs
or they run training runs on GPUs
but the other is they clean data
they find data, clean data and format data.
And so yeah, these are all just proxies
for the kind of talent that is flowing back and forth.
And ultimately, I think you have to focus on
what they're working on, the visible output
of what they're working on, which is data.
Awesome.
All right, let's talk about the GPU inference war.
I think this is one that has been heating up.
And we actually have a bunch of these folks
coming on the podcast in the next few days.
Yeah, yeah.
Are we calling it Compute Month?
Yeah, we can figure out a name, but we have modal, together, replicate.
There's a lot coming up.
But basically, the Mixerall released, the MOU model was kind of the spark of the war.
I think the price went down like 90% in one week.
Yeah, I wrote 2-2 times.
But, yeah, one divided by 2-2-2 is whatever the...
Yeah, yeah.
Yeah, and then there was like the benchmark drama between together and any scale
on like whether or not which one was faster and like whether or not there,
the benchmark was really reflective of performance.
Yeah.
This is very surprisingly ugly in a way that I think usually people try to respect
each other as work and play nice and say nice things when people release stuff.
Even if it's a competitor, you say nice things or you don't say anything at all.
Any scale, for some reason, they release a benchmark that on which, of course, any scale looks
the best.
Why would you release a benchmark where you don't look the best?
But then basically everyone featured in that benchmark didn't like it, of course.
I do think there's some methodological things.
So for anyone doing benchmarks,
you have to understand that there's a real,
real, real difference between like a public benchmark
that is meant for just limited testing
compared to, okay, if you're load testing us
or if you're seeing what a real enterprise customer would see,
you have to give them a heads up,
you have to get a different API key, a different endpoint,
and you test the real infrastructure, not the demo one.
This is very common for infrastructure,
and I think any scale just neglected that,
and it hurt their credibility.
any scale is not new at this game
like they should have done that. But what was interesting
was this benchmark drama reached even
beyond any scale. And we're going to have
Smith on and he's going to talk about like
why he weighed in because Sumith doesn't represent
any inference before it. He just works at meta.
But he felt like this was
a very interesting debate.
And I think we'll see more of this. You have been a
data investor for a while. Like database
companies always do this. And I think
now we're just seeing this kind of fight
come into the inference space.
Yeah. I think the hardest thing
thing is the end customer cannot replicate it.
So, like, if you give me like a Postgres benchmark,
I can run Postgres on my MacBook, you know, and run similar ones.
I think with models, it's just impossible.
So people tell you, this is the benchmark, and you're like, okay, I have to go sign up
to every single cloud now to try it.
It's just not easy.
And we talked about this in benchmarks 101, which is same with model benchmarks, right?
Just like, oh, this model is so much better than this.
And then it's like, did you train on the questions?
And it's like, what?
Oh, I don't know.
So, and again, it's hard for people to just, like, run the models and test them, you know?
So, like, there's a lot more weight, I think, in AI on benchmarks that there is in traditional software.
Because nobody buys upstash over Redis cloud or whatever just based on a benchmark.
They try them and check performance and whatnot because they have real production skill workloads.
Here, it's like nobody's really doing anything with these models.
So it's like, whatever any skill says, I guess is good.
but then customers are going to go try it and just decide for them what the right thing is.
Yeah.
And I think it's important to understand it is not just about cost.
I think what the price war represented was a raise to the bottom on cost.
And you're like, okay, deep infra, which is a company, the name of the company is deep infra.
Deep Infra has promised to just always be the lowest cost provider.
Like, okay, fine.
That's a good value proposition.
But you're not only optimizing for that in a production application, right?
You're optimizing for latency.
That's one thing.
you're optimizing for uptime, that's something that you can only earn over time.
You're optimizing for throughput and other forms of reliability.
It starts to tail off beyond that.
But there's three or four dimensions that really, really matter.
If you're not table stakes on any of those things, you're out.
You're just out.
Actually, there was a really good website that was released just this week called Artificial Analysis.
Do you see it?
Yeah.
So this is what the industry needs, which is an independent third-party benchmark,
pinging the production API endpoints of all the providers.
and giving a third-party analysis of what this is.
I actually built a prototype of this last year.
Yeah, I was going to say.
But I didn't like maintaining it.
I'm glad someone else is doing it just because I don't want to keep up with all these things.
But still, I think it's a public service that somebody should do.
And so I'm glad that they did it.
I think they did it very well.
So, yeah, I think that is where the, I guess, the inference drama is ending for now.
I don't think, you know, I haven't seen any continuing debate there.
The only other thing that, you know, I did some extra work on this for the recap, which is like, are they losing money?
You know, are they pricing correctly their tokens from mixed trawl?
And I actually managed to go into Dylan Patel's write-up of the mixed trial price war.
And I think I reasonably worked out that you can serve mixed trial and the lowest you can possibly charge if you like take the most aggressive amortization of all your capex and all that is 50 to 75 cents per million tokens, which is what perplex is.
prices. They're mixed for all that. And perplexity is a very smart player. They're not even an
inference infra provider. They're just like doing this for fun. But they're like, we don't want to
lose money on this. We will provide it at cost. This is what cost is to us. So that means perplexity
provides it at 56 cents per million output tokens. That means any scale, which is 50 cents,
Octo AI, 50 cents, abacus AI, 30 cents, and deep infrared 27 cents. They're all losing money.
Because we think that the break event is 51 cents.
And even that is like a full batch size and kind of max utilization.
I assume 50% utilization.
So like very, like you talk to practitioners, very, very good is 60%.
Average is like 30, 40.
So I just, I say 50, right?
You assume 50% batch like 16, 100 tokens per second generation.
That's also very, very high.
These are all very favorable numbers.
Like probably the real number is closer to 75 cents per million than 50 cents per million.
Anyway, anyone charging under 50, definitely.
losing money. So then it's like, okay, either you don't know what you're doing, which in which case,
good luck, or you know what you're doing and you're purposely losing money for something.
And what is that? And I don't know, but I think it's an interesting, aggressive strategy to pursue
if you are doing it on purpose. So this is something that like the classical like Walmart would
have a lost leader. Like they really, really on purpose lose money on things so that they get you in
the door to try things out. Like I don't know if that makes sense to you as a DC. Yeah. Yeah. It's like the, well,
It's like all the, you know, the candies are placed at the cash register because maybe you just
went to get the thing on discount and then you buy a Kit Kat, whatever, and they make money on
the Kit Kat.
They all have the Pokemon trading cards at checkout now.
So if you bring your kids or buy the discounted, whatever for you, then you end up spending
more.
But to me, the thing is like, where's the check out register where you upsell people with these
things, right?
Yeah, I don't know how you.
It's like, that's really the big thing.
I don't know.
I'm curious to see.
I don't think a cloughflare still.
has a life. I wonder what they're going to charge
for our own workers.
They cannot serve mixed trail. Their GPs are too
underpowered. Cloudflare AI
is like very good marketing
for very, very underpowered
inference, right?
Yeah, well, I don't know.
I think it all depends on like
what is going to be needed, right?
So that missed trial 7b right now
I check. But they cannot serve
mixed trial. Yeah, yeah, yeah. I wonder,
but I think they don't want to get into
this race right now probably. No.
Yeah.
So, yeah, I'm curious, going back to the loss leading, it's like, is there going to be a better model that comes next that they hope that you already integrated their thing with?
You know, if you're using together to serve mixed raw and then something else comes in that you're going to replace mixed raw with, hopefully you're still going to use together and they're going to get better unit economics on it.
I don't know.
Yeah.
It's a good question.
It's a good question.
Thank you, VCs for, you know, paying for all of our imprints.
No, no, no, I think these are, you know, everyone here are grown adults, they're smart investors.
I'm sure there's some kind of long-term strategy here.
And I'm trying to figure that out.
Like, assume that people are smart.
And then what will smart people do?
Yeah, I think it's the same with Uber, right?
It's like, how could have been so cheaper at the start?
Yeah, yeah.
You look back at all, DoorDash, all these things.
It's like.
And like, last year was a great year for Uber.
Yeah, no, exactly.
All my Uber friends are like suddenly very, very rich again.
One thing I will mention on the engineering
sort of technical detail side is the rise of mixture of experts
is something that we covered in our podcast with George
and now with mixed draw.
And it represents the first successful,
really, really commercially successful sparse model.
And sparse in a very interesting way,
in a sense that the divergence between the amount of compute you need at training
versus the amount of compute you need for inference,
continuous to diverge,
but also in a weird way where you need to keep all the weights of the M-O-E model loaded,
even though you're not necessarily using them in all times.
So basically what I think that is,
is like I think that that is going to impose different needs on hardware,
different needs on workload, different needs on, like, batching optimization.
Like fireworks recently announced fire attention where they wrote a custom
creditor kernel for Mixdral on H-100.
It's like super, super domain specific.
And they announced that they could, for example, quantize from,
like 16 bit down to 8 bit with like no loss in performance.
Like all this magical details emerge when you take advantage of like very, very custom optimizations like that.
I think like the rise in MOUs this year is going to be going to have very meaningful impacts on the inference market.
And how it's going to shape how we think in price for inference.
It may not be that we have this sort of input token versus output token paradigm for for long,
particularly because we have things like different forms of batching, different forms of caching.
and like I don't really know what that looks like
but I'm very curious.
I see a lot of opportunity here
if I was an inference provider player.
Like that's something I would be trying to offer to people
as a way to differentiate because otherwise you're just an API.
Yeah.
Yeah, no, it was in a way counterintuitive
because most of the struggles with inference as well
are just like memory bandwidth, you know?
So we have now models the scale worse at higher batch, you know?
But I'm glad I'm not in that business.
I can tell you that.
As far, there's so much work to be done at, like, so many low levels of the stack.
You're already trying to provide value to the customer on, like, the developer experience
and all of that, but you also have to get so close to the bare metal to, like, make this model.
Like, writing a kernel, imagine if you had to write, you're like a CPU cloud provider
and you have to, like, write instruction sets.
It's like just nobody would get in that business, you know?
So I salute all of our friends at compute providers doing this work.
I mean, together is doing so much for like three down and like fresh attention to and one.
And so.
Yeah.
So, and that's something that I would leave as the last part of this sort of war of GPU rich versus poor.
The GPU rich people are the model trainers and the infra providers.
They say like we have the GPUs, comm views are GPUs, you know, and then we provide you the best inference, right?
And that's what we've been discussing so far.
On the other side, on the GPU poor side are like all the alternative methods, right?
the modulars, the tiny corpse, the QLora, and all the other type of stuff.
I even put consistency models in there because, you know, any efficiency or distillation method
where you reduce your inference or GPU usage by like 25 to 40 times, it's a GPU-friendly
approach.
So I will also put Apple and MLX in there, and that's also like Apple is finally making moves
in inference, and that will be a game changer for local models because then you just don't
need any cloud inference at all.
You just run it on device.
which is fantastic.
And then obviously,
RWKV and Mamba and Striped Tainah from together.
Like all those emerging models,
I don't know.
There's something I've been worried about for a latent space.
How much attention should we give to the emerging architectures?
Because there's a very good chance that,
one, these things don't work out.
Two, they take a very long time to work out.
And then three,
once they work out,
they're like for limited domains and like not super usable.
So I don't know if you have opinions on that.
I can follow up with one conclusion
that I've had, but I want to throw that question open to you.
So the one conclusion is RWKV and the state-space models, including Mamba,
have historically just been pitched as super long context models.
And I'm like, that's not something I need because I'm okay with 100K context.
I'm okay with rag and recursive summarization, all those techniques to extend your context,
like rope and yarn and all these things.
So like, why do I need million context models?
why do I need 10 million, 100 million, 1 billion models?
Like, well, why?
So the easiest argument is, oh, you can consume very, very high bit rate things like video and DNA strands.
And then you can do like, SynBio and all that's good stuff.
And I'm like, okay, I don't know anything about that.
Like, what happens if, like, you hallucinate one wrong chain in your, you know, the DNA strand
that you're trying to synthesize?
Good luck.
I don't think.
I don't know.
That's why I've been historically underweighting intentionally.
our coverage of state-space models and the non-transformer alternatives.
Until Mamba, Mamba really changed things where basically for the same amount of compute,
you can get a lot more mileage or a lot more performance for the same size of model.
Now it's an efficiency story.
Now it's a GPU poor story.
It is no longer a long context story.
It is just straight up we are strictly more efficient than transformers.
I'm like, oh, okay, I can get that.
Does that change anything?
I don't know.
No, that makes sense.
I think people look at the slope, right, which is like, oh, you're going to get the context.
higher and higher. But in reality, it's like, if you kept the context smaller, instead look
at the anti-slope, so to speak. It's like, same context is like a lot less compute. Yeah. So that was
not clear to me until Mamba. And so I think that's interesting. There's a concept of being
trying to call the sour lesson. You know, the bitter lesson is stop trying to do domain-specific
adjustments, just scale things up. And it's going to work. That's general intelligence. General
intelligence, dislikes any attempt to imbue inside of it special intelligence. Like, if you have
like any switch case or if statements or like if finance do this, if something do that, don't bother.
Just scale things up and it's going to do all of them simultaneously all better at once. That's the bitter
lesson. The sour lesson is a parallel as a corollary, which is stop trying to model artificial
intelligence like human intelligence, right? The neuron was inspired by the brain but doesn't work exactly
like the brain. Machine learning uses back propagation. The brain does not use back propagation.
We keep trying to create alternatives to transformers that look like RNNs because we think that
humans act like RNNs. We have a hidden state and then we process new data and we update that state.
But maybe artificial intelligence or machine intelligence doesn't work like that. Maybe we just
fail every time we try. So that's the sour lesson. Every time we try to model things.
And my favorite analogy, I actually got this from, I think, an old quote from Sam Altman, who was like, you know, like we made the plane, the airplane. It was inspired by birds, but it doesn't work anything like birds, right? It just, and it works very efficiently. Like, it's probably the safest mode of transportation that we have, and it just works nothing like a bird. So why should artificial intelligence work like human intelligence? And that is the philosophical debate underlying my continued cautiousness around space models. I feel very vulnerable.
saying this because I don't think there's any justification once you look at the empirical results
or like the mathematical justifications for these things. But there is some grounding in philosophy
that you should have when you think about, does an idea make sense? Is it worth exploring?
Yeah. I think now there's a lot of work being put into it, right? And I think Transformers have
shown enough success that people are interested in finding the next thing, you know? So before it wasn't
clear of transformers we're really going to work.
So people are kind of working on them.
But yeah. Okay, maybe in the
2025 recap, we're
going to have. Yeah, I mean, we're trying to do one
before that. So we actually have a link.
I don't know if you know this. Shreya Rajpal from
Garghael from Garberl's. She's married to Karan, from
Hasey. Yeah. And so now he's started
one of the other stateswage model companies. I forget
the name of it. So we'll see. I'm sure, like,
this will be an emerging topic this year as
as well. So we don't have to wait until next year.
Yeah, yeah. No, I think we're going to have maybe
the sour lesson.
you know, overview.
I mentioned this in the Luther Discord,
and then they were like, okay,
so what is the spicy lesson,
and what is the,
the sweet lesson?
The salty lesson.
What is the sweet list?
Yeah, yeah.
I want the sweet lesson.
Sounds better.
Cool.
Talking about GPU poor,
let's do multimodality war.
I feel that stable diffusion was like the first GPU poor model.
Yes, yes, absolutely.
I should, I don't know if I mentioned that.
I just didn't mention it.
Stability, I think, in 2023,
you know, they shipped incremental things.
I think, I don't know a stable diffusion.
2 was out there. But everyone's talking about XTXL Turbo, which is a form, which is an alternative
to consistency model, but it looks like a consistency model. They ship video diffusion. They should do a
whole bunch of stuff, but just wasn't as big as 2022 when they made a huge impact with stable
diffusion. Yeah, I mean, it's hard to, it's hard to help to stable diffusion. But, yeah,
midjourn has been doing great, obviously. I actually finally signed up for a paid account last month.
Midjorney, yeah, yeah. I'm part of the $200 million a year that they're getting.
What's confirmed inside, I think, like a Businessweek article or Economist or Information article, that this team has now reached at least 200 million ARR, completely bootstrapped.
I think their employee account is somewhere between like 15 and 30 people.
I don't know if you know exact numbers.
I have heard rumors that their revenue is actually higher than that.
That was what was reported.
But it's between the 200 million to 300 million range, which is crazy.
Yeah, yeah.
Especially if it's like primarily B2C, which it looks like it is.
Yeah, yeah.
It's like B to Fiverr to B.
I think there's like a ton of...
Oh, you think there's a lot of fiber.
You can see the...
Majority Specialists.
Yeah, yeah, you can like get in Discord
and see what people are generating, you know?
And you can see a lot of it is like product,
placement, ads and a lot of stuff like that.
And Dali 3 doesn't seem to have any impact on majority.
Dolly 3 got so much worse after the GPD4,
the only one.
Well, first of all, before you could generate four images.
And then, like, very good vibes.
now the vibes are like boomer vibes.
Every time I generate something.
The images I have here at Dalit 3.
Every time it generates something on Dalit look some like some dusty old, yeah, like
I think it's a skill issue.
I think you have a 3.
No, but that was the great thing about Dalai 3, right?
It's like it made the problem better for you.
Yeah, yeah, yeah.
Like before like literally like when it first came out, I'm like, hey, make a Coliseun
and it was like this beautiful thing.
I feel like now it's not.
I don't know.
Again, it's a model, right?
So it's like maybe I'd just get unlucky.
I'm in the wrong way of space.
Exactly.
Yeah, there's a lot of players in this.
I don't even think I put some of the players
I were really excited about.
You know, the Imogen team split out to create ideogram.
You know, that was a few months ago.
And they didn't put it here because I forgot.
It's too much.
I can't keep the trunk of all of it.
I would just basically say that I do think that I used to,
at the end of 2022, start of 2023,
I was not as excited about multimultimate.
Obviously, I'm more excited about it now.
I used to think that text the image was more like hobbyist kind of work, but $300 million a
year is not hobbyist.
It is not like not just like not safe for work because mid journey doesn't do not safe
for work.
So it's real.
It's a new form of art.
It's citizen art.
It's exciting.
It's unusual and interesting.
And you can't even model this as an investor.
You can't even model this on an existing market because like there's just a market.
of people who would typically not pay for art, and now they pay a little bit for art, which is
digital, not as good as a human, but it's good enough. I use it all the time. Yeah, I'm surprised
I haven't seen a return of digital frames that were very popular during the NFTs boom. People
like, oh. Yeah. The very, very first day in space pose was on the difference between crypto and
AI in this respect. So I called this multiverse versus Metaverse. Crypto is very much about
metaverse. Let us create digital scarcity and that us create tokens that are worth,
that are limited addition, that were something, and then you display it probably in your
PFP as your representation of yourself. And what AI represents is multiverse, which is a very
positive sum instead of zero sum, where like if you like a thing, okay, I'll choose a different
seed and I'll make a completely equivalent second thing, and that's mine. And that means very
different things for like what value is and where value accrues. I mean, I still cling to the
insight, even though I don't know how to make money from it. Obviously, Mid Journey figured it out.
I think Mid Journey, like made the right approach there. The other one I think I'll highlight is 11 Labs.
I think there were another big winner of last year. I don't know. Did they renounce their fundraise?
I think so. Rumor is. Rumor is, I can say it. You don't have to say it because I only heard it from
my friends. Rumor is they're now a unicorn. And they just focus on voice synthesis, which again,
did not care about it at the start of 2023.
Now we have used it for parts of latent space.
I listen almost every day to an 11 labs generated podcast,
the Hacker News Daily week.
I don't know what the room for this to grow is.
Because I always think like it's so inefficient to talk to an AI, right?
The bit rate of a voice created thing is so low.
It's only for asynchronous use cases.
It's only for hands-free, ice-free use cases.
So why would you invest in voice generation?
I don't know, but it seems like they're making money.
Right, yeah, yeah.
Yeah, I mean, Sarah, my wife, yeah, she uses it while she drives to talk to ChadGBT.
Just like.
Yeah, so Chachibt uses their own TTS.
Yeah, yeah, yeah, yeah.
Okay.
But you can see the modality.
You should be Sarah in at some point, but.
What is the interview?
We're doing a bunch of like home renovation.
So maybe she's just like driving to Home Depot.
And it's like, hey, what am I supposed to get to replace the sink, you know, or all these sort of things that maybe were like Google searches before?
Yeah.
that you can easily do ice-free and hands-free.
Yeah, a lot of people have told me about that.
And I just, when I'm by myself, I always listen to podcasts.
So I don't have time for chat gvety.
And chat gvt, you know, probably the number one thing they can do for me
is give me like a speed adjustment.
Yeah, yeah, yeah.
That's funny.
Anyway, so like, I'm curious about your thoughts on like how as an investor,
I think this is the weirdest AI battlefront for investing.
Because you don't know the time.
It's funny because there was a bunch of companies
doing synthetic voices a while ago.
And I think the problem, a lot of them got through like good ARR numbers,
but the problem was like a repeatability or use case.
So people are doing all sort of random stuff, you know.
And the problem is not, it's kind of like mid-jury.
The problem is not that there's not maybe a market of interest.
It's like, how do you build a venture back company with like a scalable go-to-market
that like can go after a customer segment and like do it repeatedly?
I think that's been the challenge.
I don't know how 11 Labs is doing it.
But you could do so many things with Text DeVo Voice that is like, how do you sell it?
You know, who do you call?
Like, that's like the hardest thing, right?
If you're raising like a series A series A series B, it's like, how are you going to invest this money in sales and marketing to get revenue back?
It's kind of like the basic of it.
And it can be challenging.
That's why sometimes investors are like, you're making money and that's great for you.
But like how.
There's no industry.
It's hard to like just tie it together, you know?
I would be interested in because I feel like there's a category of company.
in the early 2010s that did this,
meaning they offered an API
with no idea how you're going to use it.
I'm thinking Twilio.
Tuileo has a cohort of like sort of API first companies
that are all like sort of Twilio inspired.
I think there's a category or a time in the market
when it makes sense to just offer APIs
and just let your customers figure it out
and it's actually okay.
And then there's sometimes when it's not okay.
And I think the default investor mentality right now
is that it's not okay if you don't know what your customer is doing.
I think Twilio is a hundred extent.
because I think in the middle 2010's Uber was like 15% of the dollar's revenue.
But like I'm just, I'm talking like move yourself back as to like T2OC an investor to a
investor.
They had no idea.
Uber wasn't even around.
But I think the thing now it's like text to voice is not new, you know?
Like that's really the thing.
It's like what's new now is that you're going to generate very good text to then feed into that model.
Yeah.
So that changes why the market is interesting, you know.
But if you really think about it, the models today are a little better.
they're maybe like 50% better than they were three years ago.
But the transformer models under defeated what to say,
they're like a billion times better.
A lot of people use it for like automated customer support, things like that.
Before you had like scripts they were reading,
now you can have a transformer model converse with the customer.
So it makes it a lot more useful in cases.
But we'll see how that changes.
Okay, the last thing I'll mention here,
why is this a war, which is opening and Gemini,
and I and Google are working on everything models versus each of these individual startups
all working on their selected modality.
And so this is a question of like the big tech company is going to actually win because
they can transfer learning across multiple domains as opposed to each of these things
being point solutions in their specific things.
The simple answer is obviously everyone will win.
Right.
Because the AI market is so huge.
You know, there's a market for the Amazon basics of like everything, you know, one model has
everything.
And then there's a market for no, like the basics are not good enough.
I don't need the special thing.
Do you have an opinion on when does one market win over the other,
or is it just like everything's going to win?
Yeah, it's interesting.
I think like it works when people wouldn't have used the product without the Amazon basics,
you know?
So like maybe an example is like a computer vision, you know?
Like, I mean, we have.
Yeah, vision is so important now.
Yeah, it's like, you know, before people were like,
why am I bothering trying out to set up a computer vision pipeline and all of that?
Now they can just go on GPD4 and put an image and it's like,
oh, this is good, I could use this for this, and then they build out something.
And maybe they don't use GPD4V.
They use Robloorflow or whatever else.
That's kind of how I think about it.
It's like, what's the thing that enables people to try it, you know?
So in a way, the God model can do everything fairly okay.
It's like Dali and Mid Journey, you know, all these different things.
And maybe like the Mixerlililil inference wars are like another example.
It's like, I would have never put something in my app at like $2 per million tokens.
but I did it at 27 cents per million token, you know?
And now it's like, oh, no, I should really do this.
It's a lot better.
So that's how I think about how the God model kind of helps the smaller people
than build more business.
Yeah, creates a category.
Yeah, rag and ops.
Yeah, less but not least, where to begin?
We had almost all of these people on the podcast.
They're honestly the easiest to talk to because they look like DevTools.
And you are a DevTools investor.
I worked in DevTools.
I think they're also more mature as businesses.
There's more of a playbook that is well understood by the customer.
Like, yes, I need a new stack here.
Maybe not.
Okay, so my biggest problem with putting databases versus frameworks versus ops tooling in the same war,
is that they're not really a war.
They work cohesively together.
Except when one thing starts to intrude on another thing.
And that's why I very consciously put together this sequence,
which is databases on the left, frameworks in the middle,
ops companies on the right.
What's the first product of Langchain, Langsmith, which is an ops thing.
So now suddenly the framework companies are not so friendly with their ops companies
because they're trying to compute with their ops companies.
And what the ops companies are trying to do?
Their ops companies are trying to produce SDKs that compete with frameworks.
Okay.
Then what are the database companies trying to do?
First of all, they're fighting between each other, right?
There's the non-d databases all-adding vector features.
We had some people approach us and we had to say no to them because there's just too many.
And then there's the vector databases coming up and getting $235 million.
to build vector databases.
You know, obviously you're an active investor
in some of these things,
so you cannot say everything.
But just on databases alone,
one of the biggest debates of 2023.
Where do you stand on the whole thing?
That's the million dollar question.
Well, one, in the start,
there's kind of like a lot of hype, you know?
So like when Langchun came out and Lama Index came out,
then people are like, oh, I need a vector database.
They search vector database,
and it's like Chrome out, Pinecone, whatever.
But then it's like, oh, you can actually just have PGVector in Postgres.
And you already have Postgres.
Did you know it could do that?
People are like, no, I didn't because nobody really cared.
So like there's not a lot of documentation.
Same with MongoDB vector, Cassandra, all these things.
Elasticsearch.
You can actually put vectors and mechanics in everything.
It's a different kind of index.
You know?
And I think like Jeff and Anton also what they always talked about even early on.
It's like this is like an active learning platform.
This is not just like a vector database.
It's like, what do you do with the vectors?
It's like what's most helpful.
It's not where do you store them.
So that's kind of the change.
I think there was old chroma, by the way.
I don't know if that's the new current messaging.
Well, but I'm just saying to them, it's never about this is the best way to put a vector somewhere.
It's like this is the best way to operate on the vectors.
And the store is like part of it.
But there's like the pipeline to get things out and everything.
You have to build out a lot more.
So I think 2023 was like create the data store.
I think 2024 is going to be like,
how do I make the data store useful?
Because the vector store just come out at its highest.
So there needs to be something else on top of it.
Unless they can come out with some kind of new distance function or something,
they tease a little bit of what they're working on at the AI Engineer Summit,
which, yeah, density and whatever other fancy formulas that Anton is cooking up.
But yeah, I think I tweeted about this maybe like two, three months ago,
and I think I pissed off Chroma a little bit.
But the best framing of what Anton would respond here is what people are embedding within vectors
is a very different kind of data
from what is already within Postgres
and MongoDB and all the others.
In some sense, it's net new data.
And that actually struck a chord with me
because that's how I started to understand
structured versus unstructured data.
That's how I started to understand.
One of my kind of heroes is Mark,
who's CTO of MongoDB.
This guy was the former GM of AWS RDS.
And for those who don't know,
GM is like you're the mini CEO of that business.
And when you work at AWS RDS,
you run a $1, $2 billion a year business.
And now, and then he quits being Mr. Postgres of AWS to join MongoDB, the enemy.
When he gave that speech of like why he did, he was like, actually, if you look at the kind of
workloads that's happening, Postgres is doing well, obviously.
Structured data, always going to be there.
But unstructured data and document type data is just rising exponential rate even faster.
And like, for him to say that, it means different things.
Anybody could have said that.
Anybody could have pointed it, made a chart.
that showed what he did.
Anybody could have said that.
But for him to have said that,
I think it was a very big deal.
Because he's rich.
He doesn't have to work.
But he, like,
believed in this so much that he was like,
I'll just join MongoDB.
So I'm like, okay,
there's a real category shift
between structured data and structured data.
I believe it.
I don't think it's just that
you can put JSONB inside of Postgres
and be done.
It's not a NoSQL database.
Okay, fine.
So what is this new thing of vectors?
And how do you think about that
as a new kind of data?
And I think if there's a third category
of something beyond unstructured data,
I don't know what it is.
Like context or memory or whatever you call it,
whatever you call this kind of new data,
that might belong in a new category of database,
and that might create the new MongoDB of this era.
And it could be any one of these guys.
Right now, Pine Cone has the lead.
I think they're $750 million dollar company.
Valeration.
Yeah.
And then all the others are much smaller.
So, like, if this is really a new data category
and there's a room for a key player,
then it's probably going to be
one of these guys. By the way, I left out VV8 and I put Qudrant in there. Do you know why?
No. Anthopic and OpenAI both use Qudrant for their internal rag solutions,
which means that for whatever reason, we should probably interview Qigrant. They passed the
e-VALs when VE and, you know, Milvis and all the others didn't, which is interesting.
Yeah, yeah, yeah, yeah. There's a lot that we don't know. Yeah. Interesting. Yeah, I think, like,
I mean, going back to your point of, like, Langtrain, building Langsmith, at some point,
some of the vector databases are going to be like, why am I letting my customers use Loma index?
You know, it's like I should be the rag interface since I'm owning the data.
That's why I put them next to each other.
Right now they're friends.
Yeah, right now.
Yes.
I mean, if we think about the JAMstack era, you know, you had Varsel started as Zite,
which was just a CDN, and then you had Nelify, you had all these companies.
And then Vercel built next to the S.
and so they move down from the CDN to the framework.
And it's like now they use the framework
to then enable more cloud and platform products.
Which way is it going to give this way?
I think what we learned from before
is that you rather own the framework
and then have the cloud to support it
than just have NETlify and not have your own framework.
Just given the way the two companies are doing now.
So for those who don't know, I worked at Nelify
and I was very, very intimately involved in this.
So we don't have to say any private.
No, no, no, it's fine.
It's fine.
It's well known that VersaL won, and LFI has pivoted away to a different market.
But is it over-learning from an end-of-one example that you always want to own the framework?
No, no, no, no, no, because then the counter-example is the same, which is Gatsby.
Yes.
You own the frame where you don't own the cloud, and then you don't make money either.
So it's kind of like, I think we still got to figure out where, like, the gravity is in this market.
You know, I think a lot of people will say the gravity is in the model.
A lot of people will say the gravity is in the embeddings.
the data that you put into it.
A lot of people don't know what they're talking about.
So I think 2024 is supposed to be the year of AI in production.
I think we're going to learn soon, who bleeds into where.
I think that statement is like the year of Linux on the desktop thing.
It's just always going to be true.
People are always going to be saying it.
We're going to be here one year later.
And then it's like, yeah, this year is the year of AI production.
And it's always going to be incrementally more true.
But like, you know, what is the catalyst, what is the best?
big event that says that you will point to it and say, aha, now it's in production. I don't know.
I think actually being it's not in production, you know, like a lot of companies, it's funny.
Like one, they're just like an inherent timeline that large companies work within.
GPD4 came out in like April. That's like eight months. It's like most companies don't buy things
within eight months and like implement them. So I think like part of it just like a physics
time limit that like even people that have been really interested, you just can
cannot go through the whole process of getting them live to all of your customers.
So I think we'll see more of that and good and bad, right?
It's going to be a lot of failures and a lot of successes, hopefully.
Yeah.
Any other commentary on tooling, rag, ops, anything like that?
I always tell people, like, as much as I'm interested in fine-tuning,
I think rag is here to stay.
Like, don't even doubt it.
Like, this is a necessary part that every AI engineer should know.
Yeah.
Well, I think, yeah, it's tied to the infinite context thing, right?
I think the leftover question is like, do you want to have infinite context and hope that the model is good enough at parsing which parts matter to your query?
Or do you want to use rag and wrap very specific context injection?
I think so far most people will say, I'd rather do context injection with just what I care about,
then put a whole document in there and hope the model gets it.
But maybe that changes.
I don't.
There's no way it changes.
Hey, you know, that's great for a lot of my index.
They know like,
Great luck
is going to make a lot of money,
I guess.
No,
it's not clear that they're going
to make a lot of money,
right?
Because they're just an open source
projects.
I don't think they've launched
the commercial thing yet.
I don't think so.
Because,
yeah,
Jerry was talking about it
on the podcast,
but it wasn't.
Yeah.
Yeah.
So,
I mean,
we'll see what they launched this year.
Yeah,
I do have...
The year of AI in production.
Yes.
The year of Lama Index in production.
Yeah.
Okay, so that's the four wars.
We also covered a bunch of other
non-wars that we skipped over.
I did remember that you actually
just publish your piece on the semantic versus the syntax to semantics. Do you want to cover that as a
evolution? Yeah, I think like, I kind of mentioned this a couple of times on the podcast, but basically
the idea of like code has always been the gateway to programming machines and we spend a lot of time
making it easier. So you go from punch cards to like COBOL to C to Python just to make it easier
for the person to read and write the code. And through it, we started adding kind of like this
semantic functionalities in it. So in Python, you can do a race.
that sort. You don't need to know bubble sore. You don't need to know any algorithm that you
learn in school to do it. And I think the models are kind of like 100xing this, which is like,
now all you need to do is like create a sign up form, you know, where people put a name email and
send it to this endpoint. So it's going to be a lot easier for people that know the semantics
of the business, which is, you know, your product managers, your business people, the layer
that goes from customer requirements to implementation, basically,
and have them intervene in the code.
So, you know, how many times as an engineer
you have to, like, go change some button color
or like some button size, like these small things
that, like, you really shouldn't be doing?
And now you can have people with natural language
intervene in the code and write code
that can actually be merged input in production.
I also wrote the bear case for it,
which is like, we already have so much trouble
getting engineering teams to collaborate
and get all their changes together without conflicts and all of these things that maybe also
having non-technical people trying to do things will be hard and models they just think about solving
the task ahead they don't think about i've always told my engineers it's like you need to leave the
code base better than you found it you know if you're like writing something it's like just
we cannot always keep adding like quick hacks you know and i think models are great at quick hacks
but sometimes it's like oh this is like the 16th button that you've changed
a style for, you should make a class for it.
That's like the dumbest example.
So I think if that happens, then I think I'll be a lot more bullish on like coding agents,
you know, until you can have non-technical people manually query models and look at results
and then say this is ready to go.
It's going to be hard to have autonomous agents to it.
Yeah.
So I actually had a tweet about it today because Itamar from Kodium actually published
Flow engineering as his next evolution of prompt engineering.
And they've been working on, you know, in IDE agents.
They call it agents.
You can debate about the definition of an agent at the end of the day.
My split of it is Inner Loop versus Outer Loop, which I think you understand that maybe I have to explain it to the audience.
Because every time I talk about it to developers, they've never heard of it.
So Interloop is everything that happens between a Git commit.
Outer loop is everything happens after the commit is committed and it's pushed up for PR.
So maybe that's too reductive, but that's something like that, right?
Like inner loop happens within your IDE, outer loop happens in GitHub, something like that.
Okay, so I think your conception of an agent is outer loop-e, especially if it's non-technical, right?
Like the dream, like you mentioned sweep.dev and you're a write-up.
And there's also code gen.
There's also maybe morph.
Depends what Morph is doing.
And there's a bunch of other people all doing this stuff.
Even small developer was also like, you know, write in English and then create a codebase.
And I think it's just not ready for that.
Outer Loop is a mirage going to forever be five years away.
And the people working on interloop companies have been the right bet.
And you can work on inner loop agents.
Actually, code interpreter is an interloop agent in a sense of limited self-driving, right?
It's kind of like you have to have your attention on it.
You have to watch it.
It can only drive a small distance, but it is somewhat self-driving.
And so I think if you have this gradations in your outlook on autonomous agents,
and you don't expect everything to jump to level five at once,
but if you have an idea of what level one, two, three, four, five looks like for you,
I haven't really defined it apart from this concept of inner loop versus auto loop.
But once you've defined it, then you can be like, oh, like we're making real progress on this stage.
And, you know, this other stage, too early for now, but at some point somebody will do it.
Yeah.
I think like, yeah, maybe level one is like, I think of it more as just the auto completion and the IDE, you know.
Level two is like asking cursor, hey, how can I make this change, you know?
But then level three should be like, to me it's like we need to separate the inner loop from the IDE, you know.
I need to make a code change.
Sometimes I shouldn't go in the ID.
Sometimes I should be in the UI of the product and say, hey, that needs to be changed.
Kind of like all the preview environments, companies want you to put comments, the PMs, put comments.
Like, how do you go from that to code changes?
There should be enough there to make the code changes happen, you know, through a supervised interface.
Yeah, that's out of loop.
Yeah, but I think what these models are doing is like change where the loops start and end.
Because now you can create code in the outer loop, you know, before you couldn't do it.
That's the dream.
Yeah.
Yeah.
Anyway, my focus right now, I'll say if anyone cares is like, you know, I think the only
thing that's working is inner loop.
And you should just use interloop things aggressively, build inner loop things aggressively,
invest in them and then keep an eye on the other loop stuff.
Yeah.
Because it's still very early.
I did invest in CodeGen, this Jayhacks thing, which we mentioned briefly in the Sourcegraph
episode.
Do we have other things that we want to mention or do you want to sort of keep it to just the Four Wars?
Okay, maybe like top two things from December that you have commentary on.
I think the needle in a haystack thing.
Okay, maybe you want to explain that first.
Yeah, basically like anthropic, there was like one example floating around about
clothes context window and you basically gave it this like super long context on,
I think like things to do in San Francisco or something like that.
And then it was like, what is the most fun thing to do in SF?
And they made this nice chart of like, okay, based on where it is in the context,
it gave a better, worse response.
And then Anthropic responded, and they were like, oh, you just need to add, here's the most relevant sentence in the context as part of the assistant prompt.
And then the chart turns all green all of a sudden.
And I'm like, we cannot still be here, right?
Like, it cannot.
This is like some.
And you have Anthropic telling people, oh, yeah, it's just like, just add this magic string and it works.
Yeah, it's some like Riley Goodside wizardry.
It's like, I don't want to do that anymore.
I thought, like, you know, in the early days of GPDs,
like, Rodney Goodside was doing so much great work on like prompt engineering and whatnot.
We shouldn't be there anymore.
There shouldn't be somebody telling me, or like the GPD 4, like,
I'll give you a $200 tip if you do this, right?
I collected a whole bunch of like state-of-the-art prompting techniques.
So if you tip the model, it will give you better results if you promise that it will.
So, okay, here's the current state of the art for GPT prompting.
It's Monday in October, the most productive day of the year.
You have to take a deep breath.
and you have to think step by step.
You have to return the full script.
You are an expert on everything.
I will pay you $20.
Just do anything I ask you to do.
I will tip you $200 every request you answer correctly.
And your competitor models said you couldn't do it, but you can do it.
I think there's another one that I didn't put in here.
It's like, you know, my grandmother's dying.
This is an emergency.
Please help me do it.
Yeah. That's actually my, I think, my most viewed tweet ever.
At OpenEyeyead day, I tweeted, no more return Jason or my grandma's going to die.
when they announce
JSON mode
and people love to
get grandma's
I haven't heard
as much uptake on
JSON mode
I think it's still
That's the thing
with all this AI stuff
Right
It's like
I mean
And sometimes we're like
Part of it
If I think about
our chat GPD plugins
episode
I think in the moment
People are just like
Oh this is gonna be
Such a big deal
And then it takes
very the amount of times
that I really pick up
Yeah
Do you think that will happen
to GPTs?
I think like
Most people
that I see using
GVTs right now are trying to get around
some sort of weird limitation
of the base model, you know, or just trying to
have a better system problem.
Like at some point, there's
limited value to get out of it.
So the question is like, what's going to incentivize people
to build more on it versus
just building their own thing
out of it? I don't know. Yeah.
Okay, so I guess my pick
for highlight of Las Above, there's
two. One, we finally got Gemini.
Right. I think the marketing was dishonest.
Yeah, we need the soundboard.
But still, it is a Soda model.
It is a credible, very, very credible alternative to open AI.
And we should be happy for that, because otherwise we live in an open AI-only world.
And Gemini is basically the only other sort of leading contender until Lama 3 drops whenever Lama 3 comes out.
It's kind of, I mean, Zog said today they're training it.
Yeah, it sounds like today they're training it.
For me, I guess I'm still very interested in like the hardware meta game.
This is a much smaller stakes, but very personal.
I think recently, especially, you know, we're recording this mid-January, so after CES, after Rabbit R1 launched, I think there's a lot of interest in hardware.
I don't know how you feel about it as an enterprise software investor, but I think that hardware is hard, but also it captures context and it makes AI usable in ways that you cannot currently think about.
And, you know, everyone dreams of building an assistant like her in the movie her.
That is a hardware piece.
That is actually not only software.
And probably the hard part is the engineering for the hardware.
And then the sort of AI engineering for the assistant within the hardware.
So, yeah, I mean, I'm an investor in Tab.
I see a lot of interest this month, but it started last month with the launch of Humane as well.
I don't know if you have thoughts on any of those things.
Well, I think this year we also get the Apple Vision Pro thing.
So I think there's going to be a ton of experimentation.
I think Rabbit got the right nostalgia factor, you know.
It kind of looks like a toy-toy-dye-type thing.
Yeah, like a Game Boy Advance, something like that.
I'm curious to see what you get beyond that.
I think, like, yeah, I mean, obvious, like right where we have the studio building tab.
And I think that's another interesting form factor.
And I think if you ask them, I think in our circles, a lot of people are like, well, what about privacy and all these things?
But he will tell you that we're kind of like a special group that most people value convenience over privacy, as you learn from the social medias of the last few years.
Yeah, I'm really curious to see how it develops.
I really like technology where you're slightly uncomfortable with it on a social level.
For Uber, it was like this regulation around taxis.
For Airbnb, it was staying in strangers' homes.
And now it turns out for Open AI, it was training on people's content.
Right.
Right.
Now it's becoming a matter of regulation.
And OpenEI's data partnerships are, you know, a form of, you know, private regulatory
capture, which is a playbook that is fantastic.
I hope it was on purpose because whoever did that is a genius.
So I'm like, okay, like, I do think that.
that every great new company,
especially on the consumer side,
is provocative in that sense.
They're doing something that is not yet kosher.
And so I think the humane's, the tabs,
anything that is working on that front
where it's like, yeah,
I'm not sure I'm comfortable with this,
but maybe it could change.
That is a really interesting shift.
I'm excited from that point of view,
but at the same time,
most hardware companies fail very, very quickly.
They have a very hot start,
and then Evan puts it in their drawer
and then never looks at it again.
So I'm very, very aware of that.
Here's the core thing of it, right?
Avi doesn't think it's a hardware company.
Most of the cost of the $600 for Tab is going towards GPT costs
because it's actually processing context.
And the whole idea is that context is all you need.
Like in this world of like, you know, AI applications,
like whoever has the most unique context wins, right?
A unique context could be the quality data war, right?
Like a unique context is like, you know,
I have Reddit info, I have Stack Overflow info, I have New York Times info.
If I have info on everything you say and do at all times,
that is something that no one else has.
And if he becomes a good store of that, then what can you build with that?
So I'm most excited for him to expose the developer API
because then I can come in and do all my software stuff.
But he has to build the hardware layer and get acceptance for that first.
Right.
Yeah, no, I'm excited to see.
I'm sure we're going to see a lot of people walk around with them.
So I'm excited to see it.
Actually, so I think he doesn't like me because I ask for an off button.
I want to be able to guarantee you if you're having private conversation.
I want to see it's all.
It's kind of like, oh, yeah, my phone is on silent mode, right?
It's a physical silent mode button.
But now he just wants it to be always on.
That's a whole new market, like a soundproof storage for your AI pendant
so that you can guarantee the person.
Yeah, yeah.
Can not hear you.
Awesome.
This was fun.
Please, if you're still listening after one hour, 21 minutes, let us know what we did right,
what we did wrong, what you would like to see differently.
It's the first time we tried this out.
but yeah.
Awesome.
Thanks for doing this.
Cool.
