Latent Space: The AI Engineer Podcast - The Four Wars of the AI Stack (Dec 2023 Audio Recap)

Starting point is 00:00:06 Hey everyone, welcome to the Lid in Space podcast. This is Alessio, partner, and CTO and resident sedativele partners. And today I'm joined just by my co-host, SWIX, for a new podcast format. Yeah, it's a bit uncomfortable because we have to just stare into each other's eyes lovingly. But in our end-of-year survey last year, a lot of listeners were asking us for more one-on-one time, more opinions from the both of us as hosts on what's going on in AI. You know, both of us are very actively involved. and I don't think this year will be any difference.

Starting point is 00:00:37 This year, there's lots more excitement to come. And we're trying to grow late in space in terms of the types of formats and the amount of value that we deliver to our subscribers. So one thing that we've been trying, experimenting with is this monthly recap that I started doing around August of last year, where I basically just take the notable news items of the month

Starting point is 00:00:57 and then I sort them and categorize them according to some order that makes sense and write them down in the newsletter. And this last December recap was particularly exciting because it seemed like it popped off in a number of areas, particularly with the AI breakdown. Our friend NLW featured it on his podcast. And I figured we can just kind of go over that as a way of setting the stage for 2024, but also recapping what happens in 2023. Yeah. And people always ask me if December is like a slow month.

Starting point is 00:01:27 But I think you almost broke sub-sac with how many links we had in the thing. No, we actually did. So a lot of people commented to me about the formatting issues within the newsletter that I sent out. And I know that they are there, but I couldn't fix it because substack was broken by us with how long it was. Oh, Ben. But we had this kind of like four main buckets called the four words of the AI stack, data quality and I guess like data quantity as well in a way. The GPU rich versus pores, which we have a whole episode about with Dylan Patel, multimodality. We're actually recording tomorrow with Luma Labs about their new 3D models.

Starting point is 00:02:02 So we went from text to image to 3D video. I wonder what's next. And we're going to release Hugging Face as well. I guess I'm thinking about calling it multi-modality 101 because the first modality beyond text that you should really pay attention to is vision. Right. Yeah. And then the RAG ops were.

Starting point is 00:02:19 I think that's a... I don't know what to call it. I don't know if you would have called it anything else. This is my... The tooling were, I don't know. But I think beginning of last year, that was like kind of the hottest space because there wasn't much open source model work. And I think over the last maybe like four or five months,

Starting point is 00:02:34 everybody's so focused on fine-tuning Lama 2 and like a DPO to improve these models, Maxtral and all these things. And people forgot about our friends at Langechain, Lama Index, and some of the things that were maybe top of mind. Vector DBs, you know, it seemed like everybody was releasing a VectorDB early in the year. Yeah, I think that I'll be very surprised

Starting point is 00:02:54 if any new Vector DBs come out this year, with one exception, which is something I'm keeping an eye on, which is turbopuffer. I don't know if you've seen them going around. Yeah, all the smart people seem to be adopting turbopuffer as the first serverless vector DB. Yeah, no, and we're going to have definitely Jeff and Anton on the podcast at some point.

Starting point is 00:03:13 I know they're going to be fun. I should also mention the reason I selected these four wars was a process of elimination of wars that I think ended up not mattering. So for those who don't know, inside of my writing, I often include footnotes that are in themselves. just essays for notes. And so I think it's also notable the things that people thought were hot that were less hot than expected.

Starting point is 00:03:38 So it was agents, definitely less hot than at the start of 2023. And then this one is very controversial, non-selection by me. I think open source AI is not a battle in the sense that I don't think there's anyone against open source AI. Everyone is like on one side. There's no like opposing side apart from regulators. But in my mind when I think about like for engineers, engineers are all used. universally in favor of open source models.

Starting point is 00:04:02 So there's no battle here. Everyone just wants it to improve. So it's not like interesting to write about. We just want more open source. Yeah. The only battle is people offering inference on it. Yes. Killing each other in the process.

Starting point is 00:04:13 Yeah, so I classified that as a GPU rich versus poor war. But maybe there's a better way to classify that and you can give me some feedback on that. It's a struggle to try to categorize the world. Code models as well. I was very struck by a conversation I had with Poole Side. I so can't from Pulside. So they haven't been on the podcast yet. They're kind of stealth still,

Starting point is 00:04:32 but they had a very, very notable fundraise. I think they had like $50 million raised. I think KivaMaria for a seed, spending most of it on GPUs. And my conversation with ISO, he was like, hey, you know, like, Replit was like one of our podcasts, early biggest winners.

Starting point is 00:04:47 Replit didn't really follow up with, like they announced like their 1.5 model, but it's not really why they use beyond Replit, you know. There's StarCoder, there is Kodama, but like it's not really, for how important, than code is, it doesn't seem like as big of a battlefront as just general function calling, reasoning,

Starting point is 00:05:06 you know, these other kinds of domains. And so I thought it was just interesting to note that even though we as a podcast tried to pay particular attention to developer tooling, to code models, we interviewed cursor, fine, replica, codium, and hugging face, these all seem like very small compared to the amount of money being thrown, the amount of heat in the other domains. And I don't know why that is. Yeah, I think it's maybe the fragmentation of the tooling, you know, like most people in code are using VScode cursor, GitHub, one of the trees. So there's maybe not as much experimentation versus with text people are just trying everything. It's hard to try a code model, you know? I see code models being released, but like it's not super easy to just plug it into your boardflow.

Starting point is 00:05:47 So I think engineers like myself are just lazy and like, hey, I'm having great success with whatever I'm using. Yeah, yeah. I don't really want to want to go there. The special case form of code is SQL and the semantic layer data engineering type things. We also had to guess on there from Seek and Cube. And we also talked to a bit of data breaks, a bit of Julius. Yeah. And we have Brian from X.

Starting point is 00:06:07 And Brian and Brian from X. Does he count? I don't know. Yeah. Yeah, yeah. I guess the Hex notebooks, yes. Hex magic, yes. Rex is a different beast.

Starting point is 00:06:17 Anyway, but yeah, I think people who like come to AI engineering for the AI might actually end up finding themselves in data engineering in the end. In traditional ML engineering in the end, they might have to discover that they're doing Rexis and all the stuff that is get swapped under a rug in the demo becomes their job. And I'll probably say, like, just because we didn't select a theme for last year,

Starting point is 00:06:40 doesn't mean it wasn't important. It just wasn't top of mind yet. And maybe I think that would be an emerging theme this year. Yeah. I think that's kind of the consequence of the low background tokens, like the end of the low background tokens. once. Can you explain

Starting point is 00:06:53 what you think low-beck and this is our November recap. Yeah, the comparison that our friend Jeff Uber at Kroma

Starting point is 00:07:00 brought up is steel before the atomic bomb creation. So steel before and no radiation in it. After all the

Starting point is 00:07:06 testing, a lot of steel had radiation embedded in it. So it was really precious to get low background

Starting point is 00:07:12 steel, meaning with no radiation. And same with tokens. You can assume that any internet content from

Starting point is 00:07:19 three years ago it's just internet. It's like people writing is not models writing instead now anything we're going to get on common crawl updates and things like that you never know if it's human britain or not and i think that will put more work on data engineering right because even basic stuff like checking if a tech says as a model created by open ai you know it's going to be important so people are just being blindly taking all the data set

Starting point is 00:07:44 suffered by eluther and common crawl and all these different things assuming that all the data in it is good i think now how do you build on top of it and we've seen seen the New York Times lawsuit against Slovenia. We've seen data partnerships starting to rise in different companies. I think that's going to be one of the bigger challenges. And maybe we'll see more of the work that Databricks has done to build the Dolly 5K instruction tuning. Just first-party creation of data. It's like you got people sitting at their desk every day. If everybody wrote five, you know, Q&A pairs or things like that, you would have a massive, unique data set for your model. Yeah. For people who missed that episode, that was one of our early episodes as well.

Starting point is 00:08:22 and Mike Conover since left to start Bright Wave which I'm sure will have him back this year at some point they're doing a lot of interesting stuff I think the next episode will be very cool

Starting point is 00:08:31 So how do you want to tackle this Do you want to just kind of go through the four wars? Yeah, let's do it You created this like Wikipedia-like infographic for each of them So yeah I should say The inspiration for this actually was

Starting point is 00:08:42 during the Sam Altman Leadership battle People were making mock Wikipedia entries For the debate And for like who's on the side of the, you know, decels and who was inside of the Eax. So I like that format because it's very concise. It has to list the key players and it's kind of fun to think about like who's on what side and think about what is important and what people are battling over. And I think it is important

Starting point is 00:09:06 to focus on key battlegrounds as a concept because there's so many interesting things you could be talking about in AI and they're not all equally interesting. So how do you decide what is interesting? I think it's money, it's power, it's, it's, people, it's, you know, like impact, that kind of stuff. And so, yeah, that's what I ended up doing. And fun fact, the way I did this was I actually edited the HTML on Wikipedia and then I just screenshoted it just to get the formatting. Good old developer tools. Developer tools is all you need. So the data were belligerents. Yeah. On one side you have journalists, writers, artists. On the other side, you have researchers, startups, synthetic data researchers. I guess like maybe we want to talk about

Starting point is 00:09:51 What are the axis of war? So, like, one of them is attribution, right? Like, I think there's a varying spectrum of how comfortable people are about this data going into a model. So some people are happy to have your model trained on it, no matter what. Some people are happy to have your model trained on as long as you disclose that it's in the model. Some people just hate that you trained on their model. And some people, like the New York Times, wants you to destroy any artifact that might have touched your article. So that's kind of what we're fighting on.

Starting point is 00:10:23 I just want to make clear that it's not just like you should never use the data, you should always use the data. I think people are just trying to figure out what's the right form of attribution and how do I get paid as somebody whose data ended up being in this training. I think we're giving everybody a lot of great tokens with latest space because we do full transcripts on everything. We're happy for people to train models. Oh, yeah, please train a space model.

Starting point is 00:10:46 Yeah, we would love it. So that's kind of what we're fighting on. anything that people should keep in mind about this war and maybe some of the campaigns that are going on? I think the New York Times one is probably going to go to Supreme Court. It is very, very critical. It is landmark war that will probably decide what fair use means in context of AI. And I recommend, I think the verge did a good analysis of this. Platformer maybe did a good analysis of this.

Starting point is 00:11:13 There are like four criteria for what fair use is and everyone basically converges onto the last criteria, which is, does your transformative use of my copyrighted material diminish the market from my content? It's very hard to say. I would suspect that yes, in some capacity, in some amount, but good luck proving that in a court of law. And I think a negative ruling on open AI would seriously stall the progress of AI. And that's bad for humanity, but good for content creators and writers. So obviously, we want them to be adequately compensated and recognized for their work. There's like no easy outcome here apart from the existing copyright system, which is also somewhat broken.

Starting point is 00:11:52 And it's just a very, very tricky, challenging case, I think. It's funny because we had something, I was a community moderator at a website called Rap Genius, which was a lyric sanitation. And there was like a similar thing, and maybe like 2014, or like the music labels basically came to the website and it's like, hey, this is not fair use. You know, like you can not reuse the lyrics to the song. and eventually the website made deals with the record labels to be able to do this. And then Google was stealing the transcripts to put in like the enhanced thing.

Starting point is 00:12:23 And they proved it by. Yeah, yeah. We did all the like busy like the things on the eye. Some eyes we put the dots. Diacritics. Like the accident and that's how it made all better. I thought it was, I thought they just vary the spacing or they like use the different kind of spacing in the Unicode.

Starting point is 00:12:38 I think it was the eye thing. But maybe I mean this is like almost 10 years ago. So Rapgenius has proved it. by injecting some data poison into their corpus, and then Google reproduced it faithfully, and so therefore they proved that Google was scraping Rap Genius. Did Google have to pay Rap Genius money in the end? I don't think so.

Starting point is 00:12:55 There was also another issue with a genius that we had that got blacklisted by Google for like... Of course. There was a lot going on. But anyway, this is not a Rep Genius special. Yeah, I mean, ultimately, like, I think that we do any quality data. I think that if this case is contained to the New York Times, the New York Times' worst outcome

Starting point is 00:13:12 is that they'll substitute it with Washington Post and they substitute with the economists or like the second or third ranked newspaper that is the most friendly to AI and then the New York Times will realize that actually their words are not as not that much more valuable than other words then the value of the content comes down

Starting point is 00:13:30 very, very dramatically. I think it will be interesting but yeah, I do think it's overstepping their bounds to call for the destruction of LGBT's. That's probably for sure. Then the bigger problem I have is with Stack Overflow and Reddit, which I named as the site of the New York Times. They have effectively shut down their APIs in order to try to train their own models.

Starting point is 00:13:48 Probably same as Twitter, actually. I should probably have put Twitter. I put Twitter on the wrong side, maybe. I don't know. Twitter is on both sides. Elon is on every side of chaos. Yeah. What this is basically every UGC users generated content company of the 2020s,

Starting point is 00:14:04 now has a giant pile of user content that becomes valuable data that used to be open for researchers to scrape and train models. Now all of them are locking their walls, right, behind their wall gardens and then trying to train their own models to boost their benefits. So this is a locally optimal outcome for them, but a globally suboptimal outcome for humanity. Because why should we care about the closed garden of Reddit, the Reddit model, the Stack Overflow model, the X model,

Starting point is 00:14:31 as opposed to it being a part of a data mix of 20% Reddit, 20% Stack Overflow, 20% X. that seems like a much better outcome for the world, but everyone is acting in their very narrow self-interest in trying to make their own model, which is probably going to suck. Right. So next war, after you get data...

Starting point is 00:14:49 We should mention synthetic data. Oh, yeah. Yeah, yeah. So what happens when you run out of human data? You make your own... Right. So I would say, like, when I went to New York, that was the number one discussion out of every single researcher's mouth.

Starting point is 00:15:01 There is a lot of research coming from both, I guess, the big labs as well as the academic. labs on what good synthetic data looks like. I don't know if you've like talked to any startups around that. I just talked to Lewis Costicado today. And he is promising a very, very interesting approach to synthetic data generation. I think his phrase for it is like pre-trained scale synthetic data as opposed to what the news research and the other open source communities have been doing, which is fine-tuned scale synthetic data. And so he wants to create like trillion token data sets that are all synthetic. And I'm like, okay, that's interesting. But also at the same time,

Starting point is 00:15:35 these are all just downloads from GPD4 or something else. Lewis is very aware of that and he has a way around it. I don't really understand it, but he claims that that's a good way around it. Andre Carpathie at Neurips highlighted this paper from DeepMind where they were bootstrapping synthetic data that could be verifiably proven correct, so specifically in math and in code where there is a correct answer. So yeah, that makes sense. You can solve the synthetic data problem that way.

Starting point is 00:16:04 but like what about you know beyond that there's just no answer and wasn't part of the issue also that the way the phrases are constructed and like all of that and synthetic data and stuff kind of like making mold collapse even worse because yeah one thing is like right or wrong right the other thing is like every sample is read in the same way you know or like as a similar since it comes from a certain model kind of as a similar root yeah so the yeah so i mentioned this in the best papers discussion with John Frankel. So the basic argument is you already have a fraud distribution from a language model. You are resampling that flaw distribution to double down on that flaw distribution. There's no extra information from humans. So on principle, how can

Starting point is 00:16:44 this work? And so the only conclusion there is you don't need it to emulate a human. You need it to emulate a useful assistant, however you define it. So I think the goal of synthetic data is less to emulate human speech, because that is basically solved. It is now more to spike the distribution in useful ways. And that's a phrase I borrow from Kanjun from Inbu. But anyway, so I think that synthetic data will be a giant theme for this year. And not least because the human data is being locked up behind walls. So it's a very, very clear trend.

Starting point is 00:17:16 This is probably the most amount of money after GPUs will be spent here. So one war I did not put here was the talent war, right? Like the war for PhDs and smart people. And but when you break down what the talent people do, One is they make models and they run inference on GPUs or they run training runs on GPUs but the other is they clean data they find data, clean data and format data.

Starting point is 00:17:36 And so yeah, these are all just proxies for the kind of talent that is flowing back and forth. And ultimately, I think you have to focus on what they're working on, the visible output of what they're working on, which is data. Awesome. All right, let's talk about the GPU inference war. I think this is one that has been heating up.

Starting point is 00:17:52 And we actually have a bunch of these folks coming on the podcast in the next few days. Yeah, yeah. Are we calling it Compute Month? Yeah, we can figure out a name, but we have modal, together, replicate. There's a lot coming up. But basically, the Mixerall released, the MOU model was kind of the spark of the war. I think the price went down like 90% in one week.

Starting point is 00:18:14 Yeah, I wrote 2-2 times. But, yeah, one divided by 2-2-2 is whatever the... Yeah, yeah. Yeah, and then there was like the benchmark drama between together and any scale on like whether or not which one was faster and like whether or not there, the benchmark was really reflective of performance. Yeah. This is very surprisingly ugly in a way that I think usually people try to respect

Starting point is 00:18:36 each other as work and play nice and say nice things when people release stuff. Even if it's a competitor, you say nice things or you don't say anything at all. Any scale, for some reason, they release a benchmark that on which, of course, any scale looks the best. Why would you release a benchmark where you don't look the best? But then basically everyone featured in that benchmark didn't like it, of course. I do think there's some methodological things. So for anyone doing benchmarks,

Starting point is 00:19:00 you have to understand that there's a real, real, real difference between like a public benchmark that is meant for just limited testing compared to, okay, if you're load testing us or if you're seeing what a real enterprise customer would see, you have to give them a heads up, you have to get a different API key, a different endpoint, and you test the real infrastructure, not the demo one.

Starting point is 00:19:19 This is very common for infrastructure, and I think any scale just neglected that, and it hurt their credibility. any scale is not new at this game like they should have done that. But what was interesting was this benchmark drama reached even beyond any scale. And we're going to have Smith on and he's going to talk about like

Starting point is 00:19:34 why he weighed in because Sumith doesn't represent any inference before it. He just works at meta. But he felt like this was a very interesting debate. And I think we'll see more of this. You have been a data investor for a while. Like database companies always do this. And I think now we're just seeing this kind of fight

Starting point is 00:19:50 come into the inference space. Yeah. I think the hardest thing thing is the end customer cannot replicate it. So, like, if you give me like a Postgres benchmark, I can run Postgres on my MacBook, you know, and run similar ones. I think with models, it's just impossible. So people tell you, this is the benchmark, and you're like, okay, I have to go sign up to every single cloud now to try it.

Starting point is 00:20:13 It's just not easy. And we talked about this in benchmarks 101, which is same with model benchmarks, right? Just like, oh, this model is so much better than this. And then it's like, did you train on the questions? And it's like, what? Oh, I don't know. So, and again, it's hard for people to just, like, run the models and test them, you know? So, like, there's a lot more weight, I think, in AI on benchmarks that there is in traditional software.

Starting point is 00:20:36 Because nobody buys upstash over Redis cloud or whatever just based on a benchmark. They try them and check performance and whatnot because they have real production skill workloads. Here, it's like nobody's really doing anything with these models. So it's like, whatever any skill says, I guess is good. but then customers are going to go try it and just decide for them what the right thing is. Yeah. And I think it's important to understand it is not just about cost. I think what the price war represented was a raise to the bottom on cost.

Starting point is 00:21:05 And you're like, okay, deep infra, which is a company, the name of the company is deep infra. Deep Infra has promised to just always be the lowest cost provider. Like, okay, fine. That's a good value proposition. But you're not only optimizing for that in a production application, right? You're optimizing for latency. That's one thing. you're optimizing for uptime, that's something that you can only earn over time.

Starting point is 00:21:24 You're optimizing for throughput and other forms of reliability. It starts to tail off beyond that. But there's three or four dimensions that really, really matter. If you're not table stakes on any of those things, you're out. You're just out. Actually, there was a really good website that was released just this week called Artificial Analysis. Do you see it? Yeah.

Starting point is 00:21:42 So this is what the industry needs, which is an independent third-party benchmark, pinging the production API endpoints of all the providers. and giving a third-party analysis of what this is. I actually built a prototype of this last year. Yeah, I was going to say. But I didn't like maintaining it. I'm glad someone else is doing it just because I don't want to keep up with all these things. But still, I think it's a public service that somebody should do.

Starting point is 00:22:07 And so I'm glad that they did it. I think they did it very well. So, yeah, I think that is where the, I guess, the inference drama is ending for now. I don't think, you know, I haven't seen any continuing debate there. The only other thing that, you know, I did some extra work on this for the recap, which is like, are they losing money? You know, are they pricing correctly their tokens from mixed trawl? And I actually managed to go into Dylan Patel's write-up of the mixed trial price war. And I think I reasonably worked out that you can serve mixed trial and the lowest you can possibly charge if you like take the most aggressive amortization of all your capex and all that is 50 to 75 cents per million tokens, which is what perplex is.

Starting point is 00:22:49 prices. They're mixed for all that. And perplexity is a very smart player. They're not even an inference infra provider. They're just like doing this for fun. But they're like, we don't want to lose money on this. We will provide it at cost. This is what cost is to us. So that means perplexity provides it at 56 cents per million output tokens. That means any scale, which is 50 cents, Octo AI, 50 cents, abacus AI, 30 cents, and deep infrared 27 cents. They're all losing money. Because we think that the break event is 51 cents. And even that is like a full batch size and kind of max utilization. I assume 50% utilization.

Starting point is 00:23:27 So like very, like you talk to practitioners, very, very good is 60%. Average is like 30, 40. So I just, I say 50, right? You assume 50% batch like 16, 100 tokens per second generation. That's also very, very high. These are all very favorable numbers. Like probably the real number is closer to 75 cents per million than 50 cents per million. Anyway, anyone charging under 50, definitely.

Starting point is 00:23:49 losing money. So then it's like, okay, either you don't know what you're doing, which in which case, good luck, or you know what you're doing and you're purposely losing money for something. And what is that? And I don't know, but I think it's an interesting, aggressive strategy to pursue if you are doing it on purpose. So this is something that like the classical like Walmart would have a lost leader. Like they really, really on purpose lose money on things so that they get you in the door to try things out. Like I don't know if that makes sense to you as a DC. Yeah. Yeah. It's like the, well, It's like all the, you know, the candies are placed at the cash register because maybe you just went to get the thing on discount and then you buy a Kit Kat, whatever, and they make money on

Starting point is 00:24:29 the Kit Kat. They all have the Pokemon trading cards at checkout now. So if you bring your kids or buy the discounted, whatever for you, then you end up spending more. But to me, the thing is like, where's the check out register where you upsell people with these things, right? Yeah, I don't know how you. It's like, that's really the big thing.

Starting point is 00:24:46 I don't know. I'm curious to see. I don't think a cloughflare still. has a life. I wonder what they're going to charge for our own workers. They cannot serve mixed trail. Their GPs are too underpowered. Cloudflare AI is like very good marketing

Starting point is 00:24:59 for very, very underpowered inference, right? Yeah, well, I don't know. I think it all depends on like what is going to be needed, right? So that missed trial 7b right now I check. But they cannot serve mixed trial. Yeah, yeah, yeah. I wonder,

Starting point is 00:25:15 but I think they don't want to get into this race right now probably. No. Yeah. So, yeah, I'm curious, going back to the loss leading, it's like, is there going to be a better model that comes next that they hope that you already integrated their thing with? You know, if you're using together to serve mixed raw and then something else comes in that you're going to replace mixed raw with, hopefully you're still going to use together and they're going to get better unit economics on it. I don't know. Yeah. It's a good question.

Starting point is 00:25:43 It's a good question. Thank you, VCs for, you know, paying for all of our imprints. No, no, no, I think these are, you know, everyone here are grown adults, they're smart investors. I'm sure there's some kind of long-term strategy here. And I'm trying to figure that out. Like, assume that people are smart. And then what will smart people do? Yeah, I think it's the same with Uber, right?

Starting point is 00:26:02 It's like, how could have been so cheaper at the start? Yeah, yeah. You look back at all, DoorDash, all these things. It's like. And like, last year was a great year for Uber. Yeah, no, exactly. All my Uber friends are like suddenly very, very rich again. One thing I will mention on the engineering

Starting point is 00:26:19 sort of technical detail side is the rise of mixture of experts is something that we covered in our podcast with George and now with mixed draw. And it represents the first successful, really, really commercially successful sparse model. And sparse in a very interesting way, in a sense that the divergence between the amount of compute you need at training versus the amount of compute you need for inference,

Starting point is 00:26:45 continuous to diverge, but also in a weird way where you need to keep all the weights of the M-O-E model loaded, even though you're not necessarily using them in all times. So basically what I think that is, is like I think that that is going to impose different needs on hardware, different needs on workload, different needs on, like, batching optimization. Like fireworks recently announced fire attention where they wrote a custom creditor kernel for Mixdral on H-100.

Starting point is 00:27:11 It's like super, super domain specific. And they announced that they could, for example, quantize from, like 16 bit down to 8 bit with like no loss in performance. Like all this magical details emerge when you take advantage of like very, very custom optimizations like that. I think like the rise in MOUs this year is going to be going to have very meaningful impacts on the inference market. And how it's going to shape how we think in price for inference. It may not be that we have this sort of input token versus output token paradigm for for long, particularly because we have things like different forms of batching, different forms of caching.

Starting point is 00:27:44 and like I don't really know what that looks like but I'm very curious. I see a lot of opportunity here if I was an inference provider player. Like that's something I would be trying to offer to people as a way to differentiate because otherwise you're just an API. Yeah. Yeah, no, it was in a way counterintuitive

Starting point is 00:27:58 because most of the struggles with inference as well are just like memory bandwidth, you know? So we have now models the scale worse at higher batch, you know? But I'm glad I'm not in that business. I can tell you that. As far, there's so much work to be done at, like, so many low levels of the stack. You're already trying to provide value to the customer on, like, the developer experience and all of that, but you also have to get so close to the bare metal to, like, make this model.

Starting point is 00:28:27 Like, writing a kernel, imagine if you had to write, you're like a CPU cloud provider and you have to, like, write instruction sets. It's like just nobody would get in that business, you know? So I salute all of our friends at compute providers doing this work. I mean, together is doing so much for like three down and like fresh attention to and one. And so. Yeah. So, and that's something that I would leave as the last part of this sort of war of GPU rich versus poor.

Starting point is 00:28:51 The GPU rich people are the model trainers and the infra providers. They say like we have the GPUs, comm views are GPUs, you know, and then we provide you the best inference, right? And that's what we've been discussing so far. On the other side, on the GPU poor side are like all the alternative methods, right? the modulars, the tiny corpse, the QLora, and all the other type of stuff. I even put consistency models in there because, you know, any efficiency or distillation method where you reduce your inference or GPU usage by like 25 to 40 times, it's a GPU-friendly approach.

Starting point is 00:29:27 So I will also put Apple and MLX in there, and that's also like Apple is finally making moves in inference, and that will be a game changer for local models because then you just don't need any cloud inference at all. You just run it on device. which is fantastic. And then obviously, RWKV and Mamba and Striped Tainah from together. Like all those emerging models,

Starting point is 00:29:46 I don't know. There's something I've been worried about for a latent space. How much attention should we give to the emerging architectures? Because there's a very good chance that, one, these things don't work out. Two, they take a very long time to work out. And then three, once they work out,

Starting point is 00:30:02 they're like for limited domains and like not super usable. So I don't know if you have opinions on that. I can follow up with one conclusion that I've had, but I want to throw that question open to you. So the one conclusion is RWKV and the state-space models, including Mamba, have historically just been pitched as super long context models. And I'm like, that's not something I need because I'm okay with 100K context. I'm okay with rag and recursive summarization, all those techniques to extend your context,

Starting point is 00:30:31 like rope and yarn and all these things. So like, why do I need million context models? why do I need 10 million, 100 million, 1 billion models? Like, well, why? So the easiest argument is, oh, you can consume very, very high bit rate things like video and DNA strands. And then you can do like, SynBio and all that's good stuff. And I'm like, okay, I don't know anything about that. Like, what happens if, like, you hallucinate one wrong chain in your, you know, the DNA strand

Starting point is 00:31:00 that you're trying to synthesize? Good luck. I don't think. I don't know. That's why I've been historically underweighting intentionally. our coverage of state-space models and the non-transformer alternatives. Until Mamba, Mamba really changed things where basically for the same amount of compute, you can get a lot more mileage or a lot more performance for the same size of model.

Starting point is 00:31:21 Now it's an efficiency story. Now it's a GPU poor story. It is no longer a long context story. It is just straight up we are strictly more efficient than transformers. I'm like, oh, okay, I can get that. Does that change anything? I don't know. No, that makes sense.

Starting point is 00:31:34 I think people look at the slope, right, which is like, oh, you're going to get the context. higher and higher. But in reality, it's like, if you kept the context smaller, instead look at the anti-slope, so to speak. It's like, same context is like a lot less compute. Yeah. So that was not clear to me until Mamba. And so I think that's interesting. There's a concept of being trying to call the sour lesson. You know, the bitter lesson is stop trying to do domain-specific adjustments, just scale things up. And it's going to work. That's general intelligence. General intelligence, dislikes any attempt to imbue inside of it special intelligence. Like, if you have like any switch case or if statements or like if finance do this, if something do that, don't bother.

Starting point is 00:32:13 Just scale things up and it's going to do all of them simultaneously all better at once. That's the bitter lesson. The sour lesson is a parallel as a corollary, which is stop trying to model artificial intelligence like human intelligence, right? The neuron was inspired by the brain but doesn't work exactly like the brain. Machine learning uses back propagation. The brain does not use back propagation. We keep trying to create alternatives to transformers that look like RNNs because we think that humans act like RNNs. We have a hidden state and then we process new data and we update that state. But maybe artificial intelligence or machine intelligence doesn't work like that. Maybe we just fail every time we try. So that's the sour lesson. Every time we try to model things.

Starting point is 00:32:56 And my favorite analogy, I actually got this from, I think, an old quote from Sam Altman, who was like, you know, like we made the plane, the airplane. It was inspired by birds, but it doesn't work anything like birds, right? It just, and it works very efficiently. Like, it's probably the safest mode of transportation that we have, and it just works nothing like a bird. So why should artificial intelligence work like human intelligence? And that is the philosophical debate underlying my continued cautiousness around space models. I feel very vulnerable. saying this because I don't think there's any justification once you look at the empirical results or like the mathematical justifications for these things. But there is some grounding in philosophy that you should have when you think about, does an idea make sense? Is it worth exploring? Yeah. I think now there's a lot of work being put into it, right? And I think Transformers have shown enough success that people are interested in finding the next thing, you know? So before it wasn't clear of transformers we're really going to work. So people are kind of working on them.

Starting point is 00:34:00 But yeah. Okay, maybe in the 2025 recap, we're going to have. Yeah, I mean, we're trying to do one before that. So we actually have a link. I don't know if you know this. Shreya Rajpal from Garghael from Garberl's. She's married to Karan, from Hasey. Yeah. And so now he's started one of the other stateswage model companies. I forget

Starting point is 00:34:15 the name of it. So we'll see. I'm sure, like, this will be an emerging topic this year as as well. So we don't have to wait until next year. Yeah, yeah. No, I think we're going to have maybe the sour lesson. you know, overview. I mentioned this in the Luther Discord, and then they were like, okay,

Starting point is 00:34:29 so what is the spicy lesson, and what is the, the sweet lesson? The salty lesson. What is the sweet list? Yeah, yeah. I want the sweet lesson. Sounds better.

Starting point is 00:34:38 Cool. Talking about GPU poor, let's do multimodality war. I feel that stable diffusion was like the first GPU poor model. Yes, yes, absolutely. I should, I don't know if I mentioned that. I just didn't mention it. Stability, I think, in 2023,

Starting point is 00:34:52 you know, they shipped incremental things. I think, I don't know a stable diffusion. 2 was out there. But everyone's talking about XTXL Turbo, which is a form, which is an alternative to consistency model, but it looks like a consistency model. They ship video diffusion. They should do a whole bunch of stuff, but just wasn't as big as 2022 when they made a huge impact with stable diffusion. Yeah, I mean, it's hard to, it's hard to help to stable diffusion. But, yeah, midjourn has been doing great, obviously. I actually finally signed up for a paid account last month. Midjorney, yeah, yeah. I'm part of the $200 million a year that they're getting.

Starting point is 00:35:24 What's confirmed inside, I think, like a Businessweek article or Economist or Information article, that this team has now reached at least 200 million ARR, completely bootstrapped. I think their employee account is somewhere between like 15 and 30 people. I don't know if you know exact numbers. I have heard rumors that their revenue is actually higher than that. That was what was reported. But it's between the 200 million to 300 million range, which is crazy. Yeah, yeah. Especially if it's like primarily B2C, which it looks like it is.

Starting point is 00:35:53 Yeah, yeah. It's like B to Fiverr to B. I think there's like a ton of... Oh, you think there's a lot of fiber. You can see the... Majority Specialists. Yeah, yeah, you can like get in Discord and see what people are generating, you know?

Starting point is 00:36:06 And you can see a lot of it is like product, placement, ads and a lot of stuff like that. And Dali 3 doesn't seem to have any impact on majority. Dolly 3 got so much worse after the GPD4, the only one. Well, first of all, before you could generate four images. And then, like, very good vibes. now the vibes are like boomer vibes.

Starting point is 00:36:25 Every time I generate something. The images I have here at Dalit 3. Every time it generates something on Dalit look some like some dusty old, yeah, like I think it's a skill issue. I think you have a 3. No, but that was the great thing about Dalai 3, right? It's like it made the problem better for you. Yeah, yeah, yeah.

Starting point is 00:36:44 Like before like literally like when it first came out, I'm like, hey, make a Coliseun and it was like this beautiful thing. I feel like now it's not. I don't know. Again, it's a model, right? So it's like maybe I'd just get unlucky. I'm in the wrong way of space. Exactly.

Starting point is 00:36:58 Yeah, there's a lot of players in this. I don't even think I put some of the players I were really excited about. You know, the Imogen team split out to create ideogram. You know, that was a few months ago. And they didn't put it here because I forgot. It's too much. I can't keep the trunk of all of it.

Starting point is 00:37:14 I would just basically say that I do think that I used to, at the end of 2022, start of 2023, I was not as excited about multimultimate. Obviously, I'm more excited about it now. I used to think that text the image was more like hobbyist kind of work, but $300 million a year is not hobbyist. It is not like not just like not safe for work because mid journey doesn't do not safe for work.

Starting point is 00:37:38 So it's real. It's a new form of art. It's citizen art. It's exciting. It's unusual and interesting. And you can't even model this as an investor. You can't even model this on an existing market because like there's just a market. of people who would typically not pay for art, and now they pay a little bit for art, which is

Starting point is 00:37:56 digital, not as good as a human, but it's good enough. I use it all the time. Yeah, I'm surprised I haven't seen a return of digital frames that were very popular during the NFTs boom. People like, oh. Yeah. The very, very first day in space pose was on the difference between crypto and AI in this respect. So I called this multiverse versus Metaverse. Crypto is very much about metaverse. Let us create digital scarcity and that us create tokens that are worth, that are limited addition, that were something, and then you display it probably in your PFP as your representation of yourself. And what AI represents is multiverse, which is a very positive sum instead of zero sum, where like if you like a thing, okay, I'll choose a different

Starting point is 00:38:39 seed and I'll make a completely equivalent second thing, and that's mine. And that means very different things for like what value is and where value accrues. I mean, I still cling to the insight, even though I don't know how to make money from it. Obviously, Mid Journey figured it out. I think Mid Journey, like made the right approach there. The other one I think I'll highlight is 11 Labs. I think there were another big winner of last year. I don't know. Did they renounce their fundraise? I think so. Rumor is. Rumor is, I can say it. You don't have to say it because I only heard it from my friends. Rumor is they're now a unicorn. And they just focus on voice synthesis, which again, did not care about it at the start of 2023.

Starting point is 00:39:17 Now we have used it for parts of latent space. I listen almost every day to an 11 labs generated podcast, the Hacker News Daily week. I don't know what the room for this to grow is. Because I always think like it's so inefficient to talk to an AI, right? The bit rate of a voice created thing is so low. It's only for asynchronous use cases. It's only for hands-free, ice-free use cases.

Starting point is 00:39:38 So why would you invest in voice generation? I don't know, but it seems like they're making money. Right, yeah, yeah. Yeah, I mean, Sarah, my wife, yeah, she uses it while she drives to talk to ChadGBT. Just like. Yeah, so Chachibt uses their own TTS. Yeah, yeah, yeah, yeah. Okay.

Starting point is 00:39:55 But you can see the modality. You should be Sarah in at some point, but. What is the interview? We're doing a bunch of like home renovation. So maybe she's just like driving to Home Depot. And it's like, hey, what am I supposed to get to replace the sink, you know, or all these sort of things that maybe were like Google searches before? Yeah. that you can easily do ice-free and hands-free.

Starting point is 00:40:16 Yeah, a lot of people have told me about that. And I just, when I'm by myself, I always listen to podcasts. So I don't have time for chat gvety. And chat gvt, you know, probably the number one thing they can do for me is give me like a speed adjustment. Yeah, yeah, yeah. That's funny. Anyway, so like, I'm curious about your thoughts on like how as an investor,

Starting point is 00:40:35 I think this is the weirdest AI battlefront for investing. Because you don't know the time. It's funny because there was a bunch of companies doing synthetic voices a while ago. And I think the problem, a lot of them got through like good ARR numbers, but the problem was like a repeatability or use case. So people are doing all sort of random stuff, you know. And the problem is not, it's kind of like mid-jury.

Starting point is 00:40:58 The problem is not that there's not maybe a market of interest. It's like, how do you build a venture back company with like a scalable go-to-market that like can go after a customer segment and like do it repeatedly? I think that's been the challenge. I don't know how 11 Labs is doing it. But you could do so many things with Text DeVo Voice that is like, how do you sell it? You know, who do you call? Like, that's like the hardest thing, right?

Starting point is 00:41:21 If you're raising like a series A series A series B, it's like, how are you going to invest this money in sales and marketing to get revenue back? It's kind of like the basic of it. And it can be challenging. That's why sometimes investors are like, you're making money and that's great for you. But like how. There's no industry. It's hard to like just tie it together, you know? I would be interested in because I feel like there's a category of company.

Starting point is 00:41:43 in the early 2010s that did this, meaning they offered an API with no idea how you're going to use it. I'm thinking Twilio. Tuileo has a cohort of like sort of API first companies that are all like sort of Twilio inspired. I think there's a category or a time in the market when it makes sense to just offer APIs

Starting point is 00:42:01 and just let your customers figure it out and it's actually okay. And then there's sometimes when it's not okay. And I think the default investor mentality right now is that it's not okay if you don't know what your customer is doing. I think Twilio is a hundred extent. because I think in the middle 2010's Uber was like 15% of the dollar's revenue. But like I'm just, I'm talking like move yourself back as to like T2OC an investor to a

Starting point is 00:42:22 investor. They had no idea. Uber wasn't even around. But I think the thing now it's like text to voice is not new, you know? Like that's really the thing. It's like what's new now is that you're going to generate very good text to then feed into that model. Yeah. So that changes why the market is interesting, you know.

Starting point is 00:42:39 But if you really think about it, the models today are a little better. they're maybe like 50% better than they were three years ago. But the transformer models under defeated what to say, they're like a billion times better. A lot of people use it for like automated customer support, things like that. Before you had like scripts they were reading, now you can have a transformer model converse with the customer. So it makes it a lot more useful in cases.

Starting point is 00:43:04 But we'll see how that changes. Okay, the last thing I'll mention here, why is this a war, which is opening and Gemini, and I and Google are working on everything models versus each of these individual startups all working on their selected modality. And so this is a question of like the big tech company is going to actually win because they can transfer learning across multiple domains as opposed to each of these things being point solutions in their specific things.

Starting point is 00:43:30 The simple answer is obviously everyone will win. Right. Because the AI market is so huge. You know, there's a market for the Amazon basics of like everything, you know, one model has everything. And then there's a market for no, like the basics are not good enough. I don't need the special thing. Do you have an opinion on when does one market win over the other,

Starting point is 00:43:47 or is it just like everything's going to win? Yeah, it's interesting. I think like it works when people wouldn't have used the product without the Amazon basics, you know? So like maybe an example is like a computer vision, you know? Like, I mean, we have. Yeah, vision is so important now. Yeah, it's like, you know, before people were like,

Starting point is 00:44:04 why am I bothering trying out to set up a computer vision pipeline and all of that? Now they can just go on GPD4 and put an image and it's like, oh, this is good, I could use this for this, and then they build out something. And maybe they don't use GPD4V. They use Robloorflow or whatever else. That's kind of how I think about it. It's like, what's the thing that enables people to try it, you know? So in a way, the God model can do everything fairly okay.

Starting point is 00:44:29 It's like Dali and Mid Journey, you know, all these different things. And maybe like the Mixerlililil inference wars are like another example. It's like, I would have never put something in my app at like $2 per million tokens. but I did it at 27 cents per million token, you know? And now it's like, oh, no, I should really do this. It's a lot better. So that's how I think about how the God model kind of helps the smaller people than build more business.

Starting point is 00:44:54 Yeah, creates a category. Yeah, rag and ops. Yeah, less but not least, where to begin? We had almost all of these people on the podcast. They're honestly the easiest to talk to because they look like DevTools. And you are a DevTools investor. I worked in DevTools. I think they're also more mature as businesses.

Starting point is 00:45:12 There's more of a playbook that is well understood by the customer. Like, yes, I need a new stack here. Maybe not. Okay, so my biggest problem with putting databases versus frameworks versus ops tooling in the same war, is that they're not really a war. They work cohesively together. Except when one thing starts to intrude on another thing. And that's why I very consciously put together this sequence,

Starting point is 00:45:35 which is databases on the left, frameworks in the middle, ops companies on the right. What's the first product of Langchain, Langsmith, which is an ops thing. So now suddenly the framework companies are not so friendly with their ops companies because they're trying to compute with their ops companies. And what the ops companies are trying to do? Their ops companies are trying to produce SDKs that compete with frameworks. Okay.

Starting point is 00:45:53 Then what are the database companies trying to do? First of all, they're fighting between each other, right? There's the non-d databases all-adding vector features. We had some people approach us and we had to say no to them because there's just too many. And then there's the vector databases coming up and getting $235 million. to build vector databases. You know, obviously you're an active investor in some of these things,

Starting point is 00:46:12 so you cannot say everything. But just on databases alone, one of the biggest debates of 2023. Where do you stand on the whole thing? That's the million dollar question. Well, one, in the start, there's kind of like a lot of hype, you know? So like when Langchun came out and Lama Index came out,

Starting point is 00:46:27 then people are like, oh, I need a vector database. They search vector database, and it's like Chrome out, Pinecone, whatever. But then it's like, oh, you can actually just have PGVector in Postgres. And you already have Postgres. Did you know it could do that? People are like, no, I didn't because nobody really cared. So like there's not a lot of documentation.

Starting point is 00:46:45 Same with MongoDB vector, Cassandra, all these things. Elasticsearch. You can actually put vectors and mechanics in everything. It's a different kind of index. You know? And I think like Jeff and Anton also what they always talked about even early on. It's like this is like an active learning platform. This is not just like a vector database.

Starting point is 00:47:03 It's like, what do you do with the vectors? It's like what's most helpful. It's not where do you store them. So that's kind of the change. I think there was old chroma, by the way. I don't know if that's the new current messaging. Well, but I'm just saying to them, it's never about this is the best way to put a vector somewhere. It's like this is the best way to operate on the vectors.

Starting point is 00:47:25 And the store is like part of it. But there's like the pipeline to get things out and everything. You have to build out a lot more. So I think 2023 was like create the data store. I think 2024 is going to be like, how do I make the data store useful? Because the vector store just come out at its highest. So there needs to be something else on top of it.

Starting point is 00:47:44 Unless they can come out with some kind of new distance function or something, they tease a little bit of what they're working on at the AI Engineer Summit, which, yeah, density and whatever other fancy formulas that Anton is cooking up. But yeah, I think I tweeted about this maybe like two, three months ago, and I think I pissed off Chroma a little bit. But the best framing of what Anton would respond here is what people are embedding within vectors is a very different kind of data from what is already within Postgres

Starting point is 00:48:08 and MongoDB and all the others. In some sense, it's net new data. And that actually struck a chord with me because that's how I started to understand structured versus unstructured data. That's how I started to understand. One of my kind of heroes is Mark, who's CTO of MongoDB.

Starting point is 00:48:25 This guy was the former GM of AWS RDS. And for those who don't know, GM is like you're the mini CEO of that business. And when you work at AWS RDS, you run a $1, $2 billion a year business. And now, and then he quits being Mr. Postgres of AWS to join MongoDB, the enemy. When he gave that speech of like why he did, he was like, actually, if you look at the kind of workloads that's happening, Postgres is doing well, obviously.

Starting point is 00:48:51 Structured data, always going to be there. But unstructured data and document type data is just rising exponential rate even faster. And like, for him to say that, it means different things. Anybody could have said that. Anybody could have pointed it, made a chart. that showed what he did. Anybody could have said that. But for him to have said that,

Starting point is 00:49:07 I think it was a very big deal. Because he's rich. He doesn't have to work. But he, like, believed in this so much that he was like, I'll just join MongoDB. So I'm like, okay, there's a real category shift

Starting point is 00:49:18 between structured data and structured data. I believe it. I don't think it's just that you can put JSONB inside of Postgres and be done. It's not a NoSQL database. Okay, fine. So what is this new thing of vectors?

Starting point is 00:49:29 And how do you think about that as a new kind of data? And I think if there's a third category of something beyond unstructured data, I don't know what it is. Like context or memory or whatever you call it, whatever you call this kind of new data, that might belong in a new category of database,

Starting point is 00:49:44 and that might create the new MongoDB of this era. And it could be any one of these guys. Right now, Pine Cone has the lead. I think they're $750 million dollar company. Valeration. Yeah. And then all the others are much smaller. So, like, if this is really a new data category

Starting point is 00:50:00 and there's a room for a key player, then it's probably going to be one of these guys. By the way, I left out VV8 and I put Qudrant in there. Do you know why? No. Anthopic and OpenAI both use Qudrant for their internal rag solutions, which means that for whatever reason, we should probably interview Qigrant. They passed the e-VALs when VE and, you know, Milvis and all the others didn't, which is interesting. Yeah, yeah, yeah, yeah. There's a lot that we don't know. Yeah. Interesting. Yeah, I think, like, I mean, going back to your point of, like, Langtrain, building Langsmith, at some point,

Starting point is 00:50:31 some of the vector databases are going to be like, why am I letting my customers use Loma index? You know, it's like I should be the rag interface since I'm owning the data. That's why I put them next to each other. Right now they're friends. Yeah, right now. Yes. I mean, if we think about the JAMstack era, you know, you had Varsel started as Zite, which was just a CDN, and then you had Nelify, you had all these companies.

Starting point is 00:50:56 And then Vercel built next to the S. and so they move down from the CDN to the framework. And it's like now they use the framework to then enable more cloud and platform products. Which way is it going to give this way? I think what we learned from before is that you rather own the framework and then have the cloud to support it

Starting point is 00:51:16 than just have NETlify and not have your own framework. Just given the way the two companies are doing now. So for those who don't know, I worked at Nelify and I was very, very intimately involved in this. So we don't have to say any private. No, no, no, it's fine. It's fine. It's well known that VersaL won, and LFI has pivoted away to a different market.

Starting point is 00:51:35 But is it over-learning from an end-of-one example that you always want to own the framework? No, no, no, no, no, because then the counter-example is the same, which is Gatsby. Yes. You own the frame where you don't own the cloud, and then you don't make money either. So it's kind of like, I think we still got to figure out where, like, the gravity is in this market. You know, I think a lot of people will say the gravity is in the model. A lot of people will say the gravity is in the embeddings. the data that you put into it.

Starting point is 00:52:00 A lot of people don't know what they're talking about. So I think 2024 is supposed to be the year of AI in production. I think we're going to learn soon, who bleeds into where. I think that statement is like the year of Linux on the desktop thing. It's just always going to be true. People are always going to be saying it. We're going to be here one year later. And then it's like, yeah, this year is the year of AI production.

Starting point is 00:52:23 And it's always going to be incrementally more true. But like, you know, what is the catalyst, what is the best? big event that says that you will point to it and say, aha, now it's in production. I don't know. I think actually being it's not in production, you know, like a lot of companies, it's funny. Like one, they're just like an inherent timeline that large companies work within. GPD4 came out in like April. That's like eight months. It's like most companies don't buy things within eight months and like implement them. So I think like part of it just like a physics time limit that like even people that have been really interested, you just can

Starting point is 00:52:59 cannot go through the whole process of getting them live to all of your customers. So I think we'll see more of that and good and bad, right? It's going to be a lot of failures and a lot of successes, hopefully. Yeah. Any other commentary on tooling, rag, ops, anything like that? I always tell people, like, as much as I'm interested in fine-tuning, I think rag is here to stay. Like, don't even doubt it.

Starting point is 00:53:20 Like, this is a necessary part that every AI engineer should know. Yeah. Well, I think, yeah, it's tied to the infinite context thing, right? I think the leftover question is like, do you want to have infinite context and hope that the model is good enough at parsing which parts matter to your query? Or do you want to use rag and wrap very specific context injection? I think so far most people will say, I'd rather do context injection with just what I care about, then put a whole document in there and hope the model gets it. But maybe that changes.

Starting point is 00:53:50 I don't. There's no way it changes. Hey, you know, that's great for a lot of my index. They know like, Great luck is going to make a lot of money, I guess. No,

Starting point is 00:54:00 it's not clear that they're going to make a lot of money, right? Because they're just an open source projects. I don't think they've launched the commercial thing yet. I don't think so.

Starting point is 00:54:06 Because, yeah, Jerry was talking about it on the podcast, but it wasn't. Yeah. Yeah. So,

Starting point is 00:54:10 I mean, we'll see what they launched this year. Yeah, I do have... The year of AI in production. Yes. The year of Lama Index in production. Yeah.

Starting point is 00:54:17 Okay, so that's the four wars. We also covered a bunch of other non-wars that we skipped over. I did remember that you actually just publish your piece on the semantic versus the syntax to semantics. Do you want to cover that as a evolution? Yeah, I think like, I kind of mentioned this a couple of times on the podcast, but basically the idea of like code has always been the gateway to programming machines and we spend a lot of time making it easier. So you go from punch cards to like COBOL to C to Python just to make it easier

Starting point is 00:54:45 for the person to read and write the code. And through it, we started adding kind of like this semantic functionalities in it. So in Python, you can do a race. that sort. You don't need to know bubble sore. You don't need to know any algorithm that you learn in school to do it. And I think the models are kind of like 100xing this, which is like, now all you need to do is like create a sign up form, you know, where people put a name email and send it to this endpoint. So it's going to be a lot easier for people that know the semantics of the business, which is, you know, your product managers, your business people, the layer that goes from customer requirements to implementation, basically,

Starting point is 00:55:23 and have them intervene in the code. So, you know, how many times as an engineer you have to, like, go change some button color or like some button size, like these small things that, like, you really shouldn't be doing? And now you can have people with natural language intervene in the code and write code that can actually be merged input in production.

Starting point is 00:55:43 I also wrote the bear case for it, which is like, we already have so much trouble getting engineering teams to collaborate and get all their changes together without conflicts and all of these things that maybe also having non-technical people trying to do things will be hard and models they just think about solving the task ahead they don't think about i've always told my engineers it's like you need to leave the code base better than you found it you know if you're like writing something it's like just we cannot always keep adding like quick hacks you know and i think models are great at quick hacks

Starting point is 00:56:14 but sometimes it's like oh this is like the 16th button that you've changed a style for, you should make a class for it. That's like the dumbest example. So I think if that happens, then I think I'll be a lot more bullish on like coding agents, you know, until you can have non-technical people manually query models and look at results and then say this is ready to go. It's going to be hard to have autonomous agents to it. Yeah.

Starting point is 00:56:39 So I actually had a tweet about it today because Itamar from Kodium actually published Flow engineering as his next evolution of prompt engineering. And they've been working on, you know, in IDE agents. They call it agents. You can debate about the definition of an agent at the end of the day. My split of it is Inner Loop versus Outer Loop, which I think you understand that maybe I have to explain it to the audience. Because every time I talk about it to developers, they've never heard of it. So Interloop is everything that happens between a Git commit.

Starting point is 00:57:06 Outer loop is everything happens after the commit is committed and it's pushed up for PR. So maybe that's too reductive, but that's something like that, right? Like inner loop happens within your IDE, outer loop happens in GitHub, something like that. Okay, so I think your conception of an agent is outer loop-e, especially if it's non-technical, right? Like the dream, like you mentioned sweep.dev and you're a write-up. And there's also code gen. There's also maybe morph. Depends what Morph is doing.

Starting point is 00:57:34 And there's a bunch of other people all doing this stuff. Even small developer was also like, you know, write in English and then create a codebase. And I think it's just not ready for that. Outer Loop is a mirage going to forever be five years away. And the people working on interloop companies have been the right bet. And you can work on inner loop agents. Actually, code interpreter is an interloop agent in a sense of limited self-driving, right? It's kind of like you have to have your attention on it.

Starting point is 00:58:01 You have to watch it. It can only drive a small distance, but it is somewhat self-driving. And so I think if you have this gradations in your outlook on autonomous agents, and you don't expect everything to jump to level five at once, but if you have an idea of what level one, two, three, four, five looks like for you, I haven't really defined it apart from this concept of inner loop versus auto loop. But once you've defined it, then you can be like, oh, like we're making real progress on this stage. And, you know, this other stage, too early for now, but at some point somebody will do it.

Starting point is 00:58:28 Yeah. I think like, yeah, maybe level one is like, I think of it more as just the auto completion and the IDE, you know. Level two is like asking cursor, hey, how can I make this change, you know? But then level three should be like, to me it's like we need to separate the inner loop from the IDE, you know. I need to make a code change. Sometimes I shouldn't go in the ID. Sometimes I should be in the UI of the product and say, hey, that needs to be changed. Kind of like all the preview environments, companies want you to put comments, the PMs, put comments.

Starting point is 00:58:58 Like, how do you go from that to code changes? There should be enough there to make the code changes happen, you know, through a supervised interface. Yeah, that's out of loop. Yeah, but I think what these models are doing is like change where the loops start and end. Because now you can create code in the outer loop, you know, before you couldn't do it. That's the dream. Yeah. Yeah.

Starting point is 00:59:20 Anyway, my focus right now, I'll say if anyone cares is like, you know, I think the only thing that's working is inner loop. And you should just use interloop things aggressively, build inner loop things aggressively, invest in them and then keep an eye on the other loop stuff. Yeah. Because it's still very early. I did invest in CodeGen, this Jayhacks thing, which we mentioned briefly in the Sourcegraph episode.

Starting point is 00:59:40 Do we have other things that we want to mention or do you want to sort of keep it to just the Four Wars? Okay, maybe like top two things from December that you have commentary on. I think the needle in a haystack thing. Okay, maybe you want to explain that first. Yeah, basically like anthropic, there was like one example floating around about clothes context window and you basically gave it this like super long context on, I think like things to do in San Francisco or something like that. And then it was like, what is the most fun thing to do in SF?

Starting point is 01:00:08 And they made this nice chart of like, okay, based on where it is in the context, it gave a better, worse response. And then Anthropic responded, and they were like, oh, you just need to add, here's the most relevant sentence in the context as part of the assistant prompt. And then the chart turns all green all of a sudden. And I'm like, we cannot still be here, right? Like, it cannot. This is like some. And you have Anthropic telling people, oh, yeah, it's just like, just add this magic string and it works.

Starting point is 01:00:38 Yeah, it's some like Riley Goodside wizardry. It's like, I don't want to do that anymore. I thought, like, you know, in the early days of GPDs, like, Rodney Goodside was doing so much great work on like prompt engineering and whatnot. We shouldn't be there anymore. There shouldn't be somebody telling me, or like the GPD 4, like, I'll give you a $200 tip if you do this, right? I collected a whole bunch of like state-of-the-art prompting techniques.

Starting point is 01:01:03 So if you tip the model, it will give you better results if you promise that it will. So, okay, here's the current state of the art for GPT prompting. It's Monday in October, the most productive day of the year. You have to take a deep breath. and you have to think step by step. You have to return the full script. You are an expert on everything. I will pay you $20.

Starting point is 01:01:19 Just do anything I ask you to do. I will tip you $200 every request you answer correctly. And your competitor models said you couldn't do it, but you can do it. I think there's another one that I didn't put in here. It's like, you know, my grandmother's dying. This is an emergency. Please help me do it. Yeah. That's actually my, I think, my most viewed tweet ever.

Starting point is 01:01:37 At OpenEyeyead day, I tweeted, no more return Jason or my grandma's going to die. when they announce JSON mode and people love to get grandma's I haven't heard as much uptake on JSON mode

Starting point is 01:01:48 I think it's still That's the thing with all this AI stuff Right It's like I mean And sometimes we're like Part of it

Starting point is 01:01:54 If I think about our chat GPD plugins episode I think in the moment People are just like Oh this is gonna be Such a big deal And then it takes

Starting point is 01:02:02 very the amount of times that I really pick up Yeah Do you think that will happen to GPTs? I think like Most people that I see using

Starting point is 01:02:11 GVTs right now are trying to get around some sort of weird limitation of the base model, you know, or just trying to have a better system problem. Like at some point, there's limited value to get out of it. So the question is like, what's going to incentivize people to build more on it versus

Starting point is 01:02:27 just building their own thing out of it? I don't know. Yeah. Okay, so I guess my pick for highlight of Las Above, there's two. One, we finally got Gemini. Right. I think the marketing was dishonest. Yeah, we need the soundboard. But still, it is a Soda model.

Starting point is 01:02:45 It is a credible, very, very credible alternative to open AI. And we should be happy for that, because otherwise we live in an open AI-only world. And Gemini is basically the only other sort of leading contender until Lama 3 drops whenever Lama 3 comes out. It's kind of, I mean, Zog said today they're training it. Yeah, it sounds like today they're training it. For me, I guess I'm still very interested in like the hardware meta game. This is a much smaller stakes, but very personal. I think recently, especially, you know, we're recording this mid-January, so after CES, after Rabbit R1 launched, I think there's a lot of interest in hardware.

Starting point is 01:03:18 I don't know how you feel about it as an enterprise software investor, but I think that hardware is hard, but also it captures context and it makes AI usable in ways that you cannot currently think about. And, you know, everyone dreams of building an assistant like her in the movie her. That is a hardware piece. That is actually not only software. And probably the hard part is the engineering for the hardware. And then the sort of AI engineering for the assistant within the hardware. So, yeah, I mean, I'm an investor in Tab. I see a lot of interest this month, but it started last month with the launch of Humane as well.

Starting point is 01:03:52 I don't know if you have thoughts on any of those things. Well, I think this year we also get the Apple Vision Pro thing. So I think there's going to be a ton of experimentation. I think Rabbit got the right nostalgia factor, you know. It kind of looks like a toy-toy-dye-type thing. Yeah, like a Game Boy Advance, something like that. I'm curious to see what you get beyond that. I think, like, yeah, I mean, obvious, like right where we have the studio building tab.

Starting point is 01:04:17 And I think that's another interesting form factor. And I think if you ask them, I think in our circles, a lot of people are like, well, what about privacy and all these things? But he will tell you that we're kind of like a special group that most people value convenience over privacy, as you learn from the social medias of the last few years. Yeah, I'm really curious to see how it develops. I really like technology where you're slightly uncomfortable with it on a social level. For Uber, it was like this regulation around taxis. For Airbnb, it was staying in strangers' homes. And now it turns out for Open AI, it was training on people's content.

Starting point is 01:04:53 Right. Right. Now it's becoming a matter of regulation. And OpenEI's data partnerships are, you know, a form of, you know, private regulatory capture, which is a playbook that is fantastic. I hope it was on purpose because whoever did that is a genius. So I'm like, okay, like, I do think that. that every great new company,

Starting point is 01:05:09 especially on the consumer side, is provocative in that sense. They're doing something that is not yet kosher. And so I think the humane's, the tabs, anything that is working on that front where it's like, yeah, I'm not sure I'm comfortable with this, but maybe it could change.

Starting point is 01:05:24 That is a really interesting shift. I'm excited from that point of view, but at the same time, most hardware companies fail very, very quickly. They have a very hot start, and then Evan puts it in their drawer and then never looks at it again. So I'm very, very aware of that.

Starting point is 01:05:36 Here's the core thing of it, right? Avi doesn't think it's a hardware company. Most of the cost of the $600 for Tab is going towards GPT costs because it's actually processing context. And the whole idea is that context is all you need. Like in this world of like, you know, AI applications, like whoever has the most unique context wins, right? A unique context could be the quality data war, right?

Starting point is 01:05:57 Like a unique context is like, you know, I have Reddit info, I have Stack Overflow info, I have New York Times info. If I have info on everything you say and do at all times, that is something that no one else has. And if he becomes a good store of that, then what can you build with that? So I'm most excited for him to expose the developer API because then I can come in and do all my software stuff. But he has to build the hardware layer and get acceptance for that first.

Starting point is 01:06:21 Right. Yeah, no, I'm excited to see. I'm sure we're going to see a lot of people walk around with them. So I'm excited to see it. Actually, so I think he doesn't like me because I ask for an off button. I want to be able to guarantee you if you're having private conversation. I want to see it's all. It's kind of like, oh, yeah, my phone is on silent mode, right?

Starting point is 01:06:39 It's a physical silent mode button. But now he just wants it to be always on. That's a whole new market, like a soundproof storage for your AI pendant so that you can guarantee the person. Yeah, yeah. Can not hear you. Awesome. This was fun.

Starting point is 01:06:56 Please, if you're still listening after one hour, 21 minutes, let us know what we did right, what we did wrong, what you would like to see differently. It's the first time we tried this out. but yeah. Awesome. Thanks for doing this. Cool.

Latent Space: The AI Engineer Podcast - The Four Wars of the AI Stack (Dec 2023 Audio Recap)

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.