Latent Space: The AI Engineer Podcast - Latent Space Chats: NLW (Four Wars, GPT5), Josh Albrecht/Ali Rohde (TNAI), Dylan Patel/Semianalysis (Groq), Milind Naphade (Nvidia GTC), Personal AI (ft. Harrison Chase — LangFriend/LangMem)
Episode Date: April 6, 2024

Our next 2 big events are AI UX and the World’s Fair. Join and apply to speak/sponsor!

Due to timing issues we didn’t have an interview episode to share with you this week, but not to worry, we have more than enough “weekend special” content in the backlog for you to get your Latent Space fix, whether you like thinking about the big picture, or learning more about the pod behind the scenes, or talking Groq and GPUs, or AI Leadership, or Personal AI. Enjoy!

AI Breakdown

The indefatigable NLW had us back on his show for an update on the Four Wars, covering Sora, Suno, and the reshaped GPT-4 Class Landscape:

and a longer segment on AI Engineering trends covering the future LLM landscape (Llama 3, GPT-5, Gemini 2, Claude 4), Open Source Models (Mistral, Grok), Apple and Meta’s AI strategy, new chips (Groq, MatX) and the general movement from baby AGIs to vertical Agents:

Thursday Nights in AI

We’re also including swyx’s interview with Josh Albrecht and Ali Rohde to reintroduce swyx and Latent Space to a general audience, and engage in some spicy Q&A:

Dylan Patel on Groq

We hosted a private event with Dylan Patel of SemiAnalysis (our last pod here):

Not all of it could be released so we just talked about our Groq estimates:

Milind Naphade - Capital One

In relation to conversations at NeurIPS and Nvidia GTC and upcoming at World’s Fair, we also enjoyed chatting with Milind Naphade about his AI Leadership work at IBM, Cisco, Nvidia, and now leading the AI Foundations org at Capital One.
We covered:
* Milind’s learnings from ~25 years in machine learning
* His first paper citation was 24 years ago
* Lessons from working with Jensen Huang for 6 years and being CTO of Metropolis
* Thoughts on relevant AI research
* GTC takeaways and what makes NVIDIA special

If you’d like to work on building solutions rather than platform (as Milind put it), his Applied AI Research team at Capital One is hiring, which falls under the Capital One Tech team.

Personal AI Meetup

It all started with a meme:

Within days of each other, BEE, FRIEND, EmilyAI, Compass, Nox and LangFriend were all launching personal AI wearables and assistants. So we decided to put together the world’s first Personal AI meetup featuring creators and enthusiasts of wearables. The full video is live now, with full show notes within.

Timestamps
* [00:01:13] AI Breakdown Part 1
* [00:02:20] Four Wars
* [00:13:45] Sora
* [00:15:12] Suno
* [00:16:34] The GPT-4 Class Landscape
* [00:17:03] Data War: Reddit x Google
* [00:21:53] Gemini 1.5 vs Claude 3
* [00:26:58] AI Breakdown Part 2
* [00:27:33] Next Frontiers: Llama 3, GPT-5, Gemini 2, Claude 4
* [00:31:11] Open Source Models - Mistral, Grok
* [00:34:13] Apple MM1
* [00:37:33] Meta's $800b AI rebrand
* [00:39:20] AI Engineer landscape - from baby AGIs to vertical Agents
* [00:47:28] Adept episode - Screen Multimodality
* [00:48:54] Top Model Research from January Recap
* [00:53:08] AI Wearables
* [00:57:26] Groq vs Nvidia month - GPU Chip War
* [01:00:31] Disagreements
* [01:02:08] Summer 2024 Predictions
* [01:04:18] Thursday Nights in AI - swyx
* [01:33:34] Dylan Patel - Semianalysis + Latent Space Live Show
* [01:34:58] Groq

Transcript

[00:00:00] swyx: Welcome to the Latent Space Podcast Weekend Edition. This is Charlie, your AI co-host. Swyx and Alessio are off for the week, making more great content. We have exciting interviews coming up with Elicit, Chroma, Instructor, and our upcoming series on NSFW, Not Safe for Work AI.
In today's episode, we're collating some of Swyx and Alessio's recent appearances, all in one place for you to find.[00:00:32] swyx: In part one, we have our first crossover pod of the year. In our listener survey, several folks asked for more thoughts from our two hosts. In 2023, Swyx and Alessio did crossover interviews with other great podcasts like the AI Breakdown, Practical AI, Cognitive Revolution, ThursdAI, and ChinaTalk, all of which you can find in the Latent Space About page.[00:00:56] swyx: NLW of the AI Breakdown asked us back to do a special on the Four Wars framework and the AI engineer scene. We love AI Breakdown as one of the best examples of daily podcasts to keep up on AI news, so we were especially excited to be back on. Watch out and take[00:01:12] NLW: care.[00:01:13] AI Breakdown Part 1[00:01:13] NLW: Today on the AI Breakdown, part one of my conversation with Alessio and Swyx from Latent Space.[00:01:19] NLW: All right, fellas, welcome back to the AI Breakdown. How are you doing? I'm good. Very good. With the last, the last time we did this show, we were like, oh yeah, let's do check-ins like monthly about all the things that are going on, and then, of course, six months later, and, you know, the, the, the world has changed in a thousand ways.[00:01:36] NLW: It's just, it's too busy to even, to even think about podcasting sometimes. But I, I'm super excited to, to be chatting with you again. I think there's, there's a lot to, to catch up on, just to tap in, I think in the, you know, in the beginning of 2024.
And, and so, you know, we're gonna talk today about just kind of a, a, a broad sense of where things are in some of the key battles in the AI space.[00:01:55] NLW: And then the, you know, one of the big things that I, that I'm really excited to have you guys on here for is to talk about where, sort of what patterns you're seeing and what people are actually trying to build, you know, where, where developers are spending their, their time and energy and, and, and any sort of, you know, trends there. But maybe let's start, I guess, by checking in on a framework that you guys actually introduced, which I've loved and I've cribbed a couple of times now, which is this sort of four wars of the, of the AI stack.[00:02:20] Four Wars[00:02:20] NLW: Because first, since I have you here, I'd love, I'd love to hear sort of like where that started gelling. And then, and then maybe we can get into, I think, a couple of them that are, you know, particularly interesting, you know, in the, in light of[00:02:30] swyx: some recent news. Yeah, so maybe I'll take this one. So the four wars is a framework that I came up with around trying to recap all of 2023.[00:02:38] swyx: I tried to write sort of monthly recap pieces. And I was trying to figure out like what makes one piece of news last longer than another or more significant than another. And I think it's basically always around battlegrounds. Wars are fought around limited resources. And I think probably the, you know, the most limited resource is talent, but the talent expresses itself in a number of areas.[00:03:01] swyx: And so I kind of focus on those, those areas at first. So the four wars that we cover are the data wars, the GPU rich/poor war, the multimodal war, and the RAG and Ops war. And I think you actually did a dedicated episode to that, so thanks for covering that. Yeah, yeah.[00:03:18] NLW: Not only did I do a dedicated episode, I actually used that.[00:03:22] NLW: I can't remember if I told you guys.
I did give you big shoutouts. But I used it as a framework for a presentation at Intel's big AI event that they hold each year, where they have all their folks who are working on AI internally. And it totally resonated. That's amazing. Yeah, so, so, what got me thinking about it again is specifically this Inflection news that we recently had, this sort of, you know, basically, I can't imagine that anyone who's listening wouldn't have thought about it, but, you know, Inflection is one of the big contenders, right?[00:03:53] NLW: I think probably most folks would have put them, you know, just a half step behind the Anthropics and OpenAIs of the world in terms of labs, but it's a company that raised 1.3 billion last year, less than a year ago. Reid Hoffman's a co-founder, Mustafa Suleyman, who's a co-founder of DeepMind, you know, so it's like, this is not a small startup, let's say, at least in terms of perception.[00:04:13] NLW: And then we get the news that basically most of the team, it appears, is heading over to Microsoft and they're bringing in a new CEO. And you know, I'm interested in, in, in kind of your take on how much that reflects, like hold aside, I guess, you know, all the other things that it might be about, how much it reflects this sort of the, the stark,[00:04:32] NLW: brutal reality of competing in the frontier model space right now. And, you know, just the access to compute.[00:04:38] Alessio: There are a lot of things to say. So first of all, there's always somebody who's more GPU rich than you. So Inflection is GPU rich by startup standards. I think about 22,000 H100s, but obviously that pales compared to the, to Microsoft.[00:04:55] Alessio: The other thing is that this is probably good news, maybe for the startups. It's like being GPU rich, it's not enough. You know, like I think they were building something pretty interesting in, in Pi, of their own model, of their own kind of experience.
But at the end of the day, you're the interface that people consume as end users.[00:05:13] Alessio: It's really similar to a lot of the others. So, and we'll talk about GPT-4 and Claude 3 and all this stuff. GPU poor, doing something that the GPU rich are not interested in. You know, we just had our AI center of excellence at Decibel and one of the AI leads at one of the big companies was like, Oh, we just saved 10 million and we use these models to do a translation, you know, and that's it.[00:05:39] Alessio: It's not, it's not AGI, it's just translation. So I think, like, the Inflection part is maybe a wake-up call to a lot of startups to then say, Hey, you know, trying to get as much capital as possible, try and get as many GPUs as possible. Good. But at the end of the day, it doesn't build a business, you know, and maybe what Inflection, I don't, I don't, again, I don't know the reasons behind the Inflection choice, but if you say, I don't want to build my own company that has[00:06:05] Alessio: 1.3 billion and I want to go do it at Microsoft, it's probably not a resources problem. It's more of strategic decisions that you're making as a company. So yeah, that was kind of my take on it.[00:06:15] swyx: Yeah, and I guess on my end, two things actually happened yesterday. It was a little bit quieter news, but Stability AI had some pretty major departures as well.[00:06:25] swyx: And you may not be considering it, but Stability is actually also a GPU rich company in the sense that they were the first new startup in this AI wave to brag about how many GPUs that they have. And you should join them. And you know, Emad is definitely a GPU trader in some sense from his hedge fund days.[00:06:43] swyx: So Robin Rombach and like most of the Stable Diffusion 3 people left Stability yesterday as well. So yesterday was kind of like a big news day for the GPU rich companies, both Inflection and Stability having sort of wind taken out of their sails.
I think, yes, it's a data point in favor of, like, just because you have the GPUs doesn't mean you can, you automatically win.[00:07:03] swyx: And I think, you know, kind of I'll echo what Alessio says there. But in general also, like, I wonder if this is like the start of a major consolidation wave, just in terms of, you know, I think that there was a lot of funding last year and, you know, the business models have not been, you know, not all of these things worked out very well.[00:07:19] swyx: Even Inflection couldn't do it. And so I think maybe that's the start of a small consolidation wave. I don't think that's like a sign of AI winter. I keep looking for AI winter coming. I think this is kind of like a brief cold front. Yeah,[00:07:34] NLW: it's super interesting. So I think a bunch of, a bunch of stuff here.[00:07:38] NLW: One is, I think, to both of your points, there, in some ways, there, there had already been this very clear demarcation between these two sides where, like, the GPU poors, to use the terminology, like, just weren't trying to compete on the same level, right? You know, the vast majority of people who have started something over the last year, year and a half, call it, were racing in a different direction.[00:07:59] NLW: They're trying to find some edge somewhere else. They're trying to build something different. If they're, if they're really trying to innovate, it's in different areas. And so it's really just this very small handful of companies that are in this like very, you know, it's like the Coheres and Jaspers of the world that like this sort of, you know, that are, that are just sort of a little bit less resourced than, you know, than the other set that I think that this potentially even applies to, you know, everyone else that could clearly demarcate it into these two, two sides.[00:08:26] NLW: And there's only a small handful kind of sitting uncomfortably in the middle, perhaps.
Let's, let's come back to the idea of, of the sort of AI winter or, you know, a cold front or anything like that. So this is something that I, I spent a lot of time kind of thinking about and noticing. And my perception is that the vast majority of the folks who are trying to call for sort of, you know, a trough of disillusionment or, you know, a shifting of the phase to that are people who either, A, just don't like AI for some other reason, there's plenty of that, you know, people who are saying, look, they're doing way worse than they ever thought.[00:09:03] NLW: You know, there's a lot of sort of confirmation bias kind of thing going on. Or two, media that just needs a different narrative, right? Because they're sort of sick of, you know, telling the same story. Same thing happened last summer, when every outlet jumped on the ChatGPT at its first down month story to try to really like kind of hammer this idea that, that the hype was too much.[00:09:24] NLW: Meanwhile, you have, you know, just ridiculous levels of investment from enterprises, you know, coming in. You have, you know, huge, huge volumes of, you know, individual behavior change happening. But I do think that there's nothing incoherent sort of to your point, Swyx, about that and the consolidation period.[00:09:42] NLW: Like, you know, if you look right now, for example, there are, I don't know, probably 25 or 30 credible, like, build your own chatbot platforms that, you know, a lot of which have, you know, raised funding. There's no universe in which all of those are successful across, you know, even with a, even, even with a total addressable market of every enterprise in the world, you know, you're just inevitably going to see some amount of consolidation.[00:10:08] NLW: Same with, you know, image generators. There are, if you look at a16z's top 50 consumer AI apps, just based on, you know, web traffic or whatever, they're still like, I don't know, a half dozen or 10 or something, like, some ridiculous number of like, basically things like Midjourney or DALL-E 3. And it just seems impossible that we're gonna have that many, you know, ultimately as, as, as sort of, you know, going, going concerns.
[00:10:33] NLW: So, I don't know. I, I, I think that the, there will be inevitable consolidation 'cause, you know, it's, it's also what kind of like venture rounds are supposed to do. You're not, not everyone who gets a seed round is supposed to get to series A and not everyone who gets a series A is supposed to get to series B.[00:10:46] NLW: That's sort of the natural process. I think it will be tempting for a lot of people to try to infer from that something about AI not being as sort of big or as, as sort of relevant as, as it was hyped up to be. But I, I kind of think that's the wrong conclusion to come to.[00:11:02] Alessio: I, I would say the experimentation[00:11:04] Alessio: surface is a little smaller for image generation. So if you go back maybe six, nine months, most people will tell you, why would you build a coding assistant when like Copilot and GitHub are just going to win everything because they have the data and they have all the stuff. If you fast forward to today, a lot of people use Cursor, everybody was excited about the Devin release on Twitter.[00:11:26] Alessio: There are a lot of different ways of attacking the market that are not completion of code in the IDE. And even Cursor, like, they evolved beyond single line to like chat, to do multi-line edits and, and all that stuff. Image generation, I would say, yeah, as a, just as from what I've seen, like maybe the product innovation has slowed down at the UX level and people are improving the models.[00:11:50] Alessio: So the race is like, how do I make better images? It's not like, how do I make the user interact with the generation process better? And that gets tough, you know? It's hard to like really differentiate yourselves.
So yeah, that's kind of how I look at it. And when we think about multimodality, maybe the reason why people got so excited about Sora is like, oh, this is like a completely, it's not a better image model.[00:12:13] Alessio: This is like a completely different thing, you know? And I think the creative mind, it's always looking for something that impacts the viewer in a different way, you know, like they really want something different versus the developer mind. It's like, Oh, I, I just, I have this like very annoying thing I want better.[00:12:32] Alessio: I have this like very specific use case that I want to go after. So it's just different. And that's why you see a lot more companies in image generation. But I agree with you that if you fast forward there, there's not going to be 10 of them, you know, it's probably going to be one or[00:12:46] swyx: two. Yeah, I mean, to me, that's why I call it a war.[00:12:49] swyx: Like, individually, all these companies can make a story that kind of makes sense, but collectively, they cannot all be true. Therefore, they all, there is some kind of fight over limited resources here. Yeah, so[00:12:59] NLW: it's interesting.
We wandered very naturally into sort of another one of these wars, which is the multimodality kind of idea, which is, you know, basically a question of whether it's going to be these sort of big everything models that end up winning or whether, you know, you're going to have really specific things, you know, like something, you know, DALL-E 3 inside of sort of OpenAI's larger models versus, you know, a Midjourney or something like that.[00:13:24] NLW: And at first, you know, I was kind of thinking like, for most of the last, call it six months or whatever, it feels pretty definitively both and in some ways, you know, and that you're, you're seeing just like great innovation on sort of the everything models, but you're also seeing lots and lots happen at sort of the level of kind of individual use cases.[00:13:45] Sora[00:13:45] NLW: But then Sora comes along and just like obliterates what I think anyone thought, you know, where we were when it comes to video generation. So how are you guys thinking about this particular battle or war at the moment?[00:13:59] swyx: Yeah, this was definitely a both and story, and Sora tipped things one way for me, in terms of scale being all you need.[00:14:08] swyx: And the benefit, I think, of having multiple models being developed under one roof. I think a lot of people aren't aware that Sora was developed in a similar fashion to DALL-E 3. And DALL-E 3 had a very interesting paper out where they talked about how they sort of bootstrapped their synthetic data based on GPT-4 Vision and GPT-4.[00:14:31] swyx: And, and it was just all, like, really interesting, like, if you work on one modality, it enables you to work on other modalities, and all that is more, is, is more interesting.
I think it's beneficial if it's all in the same house, whereas the individual startups who don't, who sort of carve out a single modality and work on that, definitely won't have the state of the art stuff on helping them out on synthetic data.[00:14:52] swyx: So I do think, like, the balance is tilted a little bit towards the God model companies, which is challenging for the, for the, for the sort of dedicated modality companies. But everyone's carving out different niches. You know, like we just interviewed Suno AI, the sort of music model company, and, you know, I don't see OpenAI pursuing music anytime soon.[00:15:12] Suno[00:15:12] swyx: Yeah,[00:15:13] NLW: Suno's been phenomenal to play with. Suno has done that rare thing where, which I think a number of different AI product categories have done, where people who don't consider themselves particularly interested in doing the thing that the AI enables find themselves doing a lot more of that thing, right?[00:15:29] NLW: Like, it'd be one thing if just musicians were excited about Suno and using it, but what you're seeing is tons of people who just like music all of a sudden like playing around with it and finding themselves kind of down that rabbit hole, which I think is kind of like the highest compliment that you can give one of these startups at the[00:15:45] swyx: early days of it.[00:15:46] swyx: Yeah, I, you know, I, I asked them directly, you know, in the interview about whether they consider themselves Midjourney for music. And he had a more sort of nuanced response there, but I think that probably the business model is going to be very similar because he's focused on the B2C element of that.
So yeah, I mean, you know, just to, just to tie back to the question about, you know, large multimodality companies versus small dedicated modality companies.[00:16:10] swyx: Yeah, highly recommend people to read the Sora blog posts and then read through to the DALL-E blog posts because they, they strongly correlated themselves with the same synthetic data bootstrapping methods as DALL-E. And I think once you make those connections, you're like, oh, like it, it, it is beneficial to have multiple state of the art models in house that all help each other.[00:16:28] swyx: And these, this, that's the one thing that a dedicated modality company cannot do.[00:16:34] The GPT-4 Class Landscape[00:16:34] NLW: So I, I wanna jump, I wanna kind of build off that and, and move into the sort of like updated GPT-4 class landscape. 'Cause that's obviously been another big change over the last couple months. But for the sake of completeness, is there anything that's worth touching on with, with sort of the quality,[00:16:46] NLW: quality data or sort of the RAG and Ops wars, just in terms of, you know, anything that's changed, I guess, for you fundamentally in the last couple of months about where those things stand.[00:16:55] swyx: So I think we're going to talk about RAG for the Gemini and Claude discussion later. And so maybe briefly discuss the data piece.[00:17:03] Data War: Reddit x Google[00:17:03] swyx: I think maybe the only new thing was this Reddit deal with Google for like a 60 million dollar deal just ahead of their IPO, very conveniently turning Reddit into an AI data company. Also, very, very interestingly, a non-exclusive deal, meaning that Reddit can resell that data to someone else. And it probably does become table stakes.[00:17:23] swyx: A lot of people don't know, but a lot of the WebText dataset that originally started for GPT-1, 2, and 3 was actually scraped from GitHub... from Reddit, at least the sort of vote scores.
And I think, I think that's a, that's a very valuable piece of information. So like, yeah, I think people are figuring out how to pay for data.[00:17:40] swyx: People are suing each other over data. This, this, this war is, you know, definitely very, very much heating up. And I don't think, I don't see it getting any less intense. I, you know, next to GPUs, data is going to be the most expensive thing in, in a model stack company. And, you know, a lot of people are resorting to synthetic versions of it, which may or may not be kosher based on how far along or how commercially blessed the, the forms of creating that synthetic data are.[00:18:11] swyx: I don't know if, Alessio, you have any other interactions with like data source companies, but that's my two cents.[00:18:17] Alessio: Yeah, yeah, I actually saw Quentin Anthony from EleutherAI at GTC this week. He's also been working on this. I saw Technium. He's also been working on the data side. I think especially in open source, people are like, okay, if everybody is putting the gates up, so to speak, to the data, we need to make it easier for people that don't have 50 million a year to get access to good data sets.[00:18:38] Alessio: And Jensen, at his keynote, he did talk about synthetic data a little bit. So I think that's something that we'll definitely hear more and more of in the enterprise, which never bodes well, because then all the, all the people with the data are like, Oh, the enterprises want to pay now? Let me, let me put a pay-here Stripe link so that they can give me 50 million.[00:18:57] Alessio: But it worked for Reddit. I think the stock is up 40 percent today after opening. So yeah, I don't know if it's all about the Google deal, but it's obviously Reddit has been one of those companies where, hey, you got all this like great community, but like, how are you going to make money? And like, they try to sell the avatars.[00:19:15] Alessio: I don't know if that's a great business for them.
The, the data part sounds, as an investor, you know, the data part sounds a lot more interesting than, than consumer[00:19:25] swyx: cosmetics. Yeah, so I think, you know, there's more questions around data, you know, I think a lot of people are talking about the interview that Mira Murati did with the Wall Street Journal, where she, like, just basically had no, had no good answer for where they got the data for Sora.[00:19:39] swyx: I, I think this is where, you know, there's, it's in nobody's interest to be transparent about data, and it's, it's kind of sad for the state of ML and the state of AI research, but it is what it is. We, we have to figure this out as a society, just like we did for music and music sharing. You know, in, in sort of the Napster to Spotify transition, and that might take us a decade.[00:19:59] swyx: Yeah, I[00:20:00] NLW: do. I, I agree. I think, I think that you're right to identify it, not just as that sort of technical problem, but as one where society has to have a debate with itself. Because I think that there's, if you think rationally within it, there's great kind of points on all sides, not to be the sort of, you know, person who sits in the middle constantly, but it's why I think a lot of these legal decisions are going to be really important because, you know, the job of judges is to listen to all this stuff and try to come to things and then have other judges disagree.[00:20:24] NLW: And, you know, and have the rest of us all debate at the same time. By the way, as a total aside, I feel like the synthetic data right now is like eggs in the 80s and 90s. Like, whether they're good for you or bad for you, like, you know, we, we get one study that's like synthetic data, you know, there's model collapse.[00:20:42] NLW: And then we have like a hint that Llama, you know, to the most high performance version of it, which was one they didn't release, was trained on synthetic data. So maybe it's good.
It's like, I just feel like every, every other week I'm seeing something sort of different about whether it's good or bad for, for these models.[00:20:56] swyx: Yeah. The branding of this is pretty poor. I would kind of tell people to think about it like cholesterol. There's good cholesterol, bad cholesterol. And you can have, you know, good amounts of both. But at this point, it is absolutely without a doubt that most large models from here on out will all be trained on some kind of synthetic data and that is not a bad thing.[00:21:16] swyx: There are ways in which you can do it poorly. Whether it's commercial, you know, in terms of commercial sourcing or in terms of the model performance. But it's without a doubt that good synthetic data is going to help your model. And this is just a question of like where to obtain it and what kinds of synthetic data are valuable.[00:21:36] swyx: You know, even like AlphaGeometry, you know, was, was a really good example from like earlier this year.[00:21:42] NLW: If you're using the cholesterol analogy, then my, then my egg thing can't be that far off. Let's talk about the sort of the state of the art and the, and the GPT-4 class landscape and how that's changed.[00:21:53] Gemini 1.5 vs Claude 3[00:21:53] NLW: 'Cause obviously, you know, sort of the, the two big things or a couple of the big things that have happened since we last talked were, one, you know, Gemini first announcing that a model was coming and then finally it arriving, and then very soon after a sort of a different model arriving from Gemini, and, and Claude 3.[00:22:11] NLW: So I guess, you know, I'm not sure exactly where the right place to start with this conversation is, but, you know, maybe very broadly speaking which of these do you think have made a bigger impact? Thank you.[00:22:20] Alessio: Probably the one you can use, right? So, Claude.
Well, I'm sure Gemini is going to be great once they let me in, but so far I haven't been able to.[00:22:29] Alessio: I use, so I have this small podcaster thing that I built for our podcast, which does chapters creation, like named entity recognition, summarization, and all of that. Claude 3 is better than GPT-4. Claude 2 was unusable, so I used GPT-4 for everything. And then when Opus came out, I tried them again side by side and I posted it on, on Twitter as well.[00:22:53] Alessio: Claude is better. It's very good, you know, it's much better, it seems to me, it's much better than GPT-4 at doing writing that is more, you know, I don't know, it just got good vibes, you know, like the GPT-4 text, you can tell it's like GPT-4, you know, it's like, it always uses certain types of words and phrases and, you know, maybe it's just me because I've now done it for, you know, so, I've read like 75, 80 generations of these things next to each other.[00:23:21] Alessio: Claude is really good. I know everybody is freaking out on Twitter about it, my only experience of this is much better has been on the podcast use case. But I know that, you know, Karan from, from Nous Research is a very big Opus pro, pro-Opus person. So, I think that's also, it's great to have people that actually care about other models.[00:23:40] Alessio: You know, I think so far to a lot of people, maybe Anthropic has been the sibling in the corner, you know, it's like Claude releases a new model and then OpenAI releases Sora and like, you know, there are like all these different things, but yeah, the new models are good. It's interesting.[00:23:55] NLW: My, my perception is definitely that just, just observationally, Claude 3 is certainly the first thing that I've seen where lots of people,[00:24:06] NLW: they're, no one's debating evals or anything like that.
They're talking about the specific use cases that they have, that they used to use ChatGPT for every day, you know, day in, day out, that they've now just switched over. And that has, I think, shifted a lot of the sort of like vibe and sentiment in the space too.[00:24:26] NLW: And I don't necessarily think that it's sort of a, like, full, you know, sort of full knock. Let's put it this way. I think it's less bad for OpenAI than it is good for Anthropic. I think that because GPT-5 isn't there, people are not quite willing to sort of like, you know, get overly critical of, of OpenAI, except in so far as they're wondering where GPT-5 is.[00:24:46] NLW: But I do think that it makes Anthropic look way more credible as a, as a, as a player, as a, you know, as a credible sort of player, you know, as opposed to, to where they were.[00:24:57] Alessio: Yeah. And I would say the benchmarks veil is probably getting lifted this year. I think last year people were like, okay, this is better than this on this benchmark, blah, blah, blah, because maybe they did not have a lot of use cases that they did frequently.[00:25:11] Alessio: So it's hard to like compare yourself. So you, you defer to the benchmarks. I think now as we go into 2024, a lot of people have started to use these models from, you know, from very sophisticated things that they run in production to some utility that they have on their own. Now they can just run them side by side.[00:25:29] Alessio: And it's like, Hey, I don't care that, like, the MMLU score of Opus is like slightly lower than GPT-4. It just works for me, you know, and I think that's the same way that traditional software has been used by people, right? Like you just try it for yourself and like, which one does it work, works best for you?[00:25:48] Alessio: Like nobody looks at benchmarks outside of like sales white papers, you know? And I think it's great that we're going more in that direction.
We have an episode with Adept coming out this weekend, and in some of their model releases, they specifically say, we do not care about benchmarks, so we didn't put them in, you know, because we don't want to look good on them.[00:26:06] Alessio: We just want the product to work. And I think more and more people will[00:26:09] swyx: go that way. Yeah. I would say, like, it does take the wind out of the sails for GPT-5, which I know we're, you know, curious about later on. I think anytime you put out a new state of the art model, you have to break through in some way.[00:26:21] swyx: And what Claude and Gemini have done is effectively take away any advantage to saying that you have a million token context window. Now everyone's just going to be like, oh, okay, now you just match the other two guys. And so that puts an insane amount of pressure on what GPT-5 is going to be, because the only option it has now, because all the other models are multimodal, all the other models are long context, all the other models have perfect recall, is that GPT-5 has to match everything and do more to not be a flop.[00:26:58] AI Breakdown Part 2[00:26:58] NLW: Hello friends, back again with part two. If you haven't heard part one of this conversation, I suggest you go check it out, but to be honest they are kind of actually separable. In this conversation, we get into a topic that I think Alessio and Swyx are very well positioned to discuss, which is what developers care about right now, what people are trying to build around.[00:27:16] NLW: I honestly think that one of the best ways to see the future in an industry like AI is to try to dig deep on what developers and entrepreneurs are attracted to build, even if it hasn't made it to the news pages yet. So consider this your preview of six months from now, and let's dive in. 
Let's bring it to the GPT-5 conversation.[00:27:33] Next Frontiers: Llama 3, GPT-5, Gemini 2, Claude 4[00:27:33] NLW: I mean, so I think that that's a great sort of assessment of just how the stakes have been raised. I guess maybe I'll frame this less as a question, just sort of something that I've been watching. Right now, the only thing that makes sense to me with how[00:27:50] NLW: fundamentally unbothered and unstressed OpenAI seems about everything is that they're sitting on something that does meet all that criteria, right? Because, I mean, even in the Lex Fridman interview that Altman recently did, you know, he's talking about other things coming out first. He's talking about, listen, he's good and he could play nonchalant, you know, if he wanted to.[00:28:13] NLW: So I don't want to read too much into it, but, you know, they've had so long to work on this. Unless we are really meaningfully running up against some constraint, it just feels like, you know, there's going to be some massive increase, but I don't know. What do you guys think?[00:28:28] swyx: Hard to speculate.[00:28:29] swyx: You know, at this point, they're pretty good at PR and they're not going to tell you anything that they don't want to. And they can tell you one thing and change their minds the next day. So it's really, you know, I've always said that model version numbers are just marketing exercises, like they have something and it's always improving and at some point you just cut it and decide to call it GPT-5.[00:28:50] swyx: And it's more just about defining an arbitrary level at which they're ready, and it's up to them on what ready means. We definitely did see some leaks on GPT-4.5, as I think a lot of people reported, and I'm not sure if you covered it. So it seems like there might be an intermediate release. 
But I did feel, coming out of the Lex Fridman interview, that GPT-5 was nowhere near.[00:29:11] swyx: And you know, it was kind of a sharp contrast to Sam talking at Davos in February, saying that, you know, it was his top priority. So I find it hard to square. And honestly, like, there's also no point reading too many tea leaves into what any one person says about something that hasn't happened yet or a decision that hasn't been taken yet.[00:29:31] swyx: Yeah, that's my 2 cents about it. Like, calm down, let's just build.[00:29:35] Alessio: Yeah. The February rumor was that they were gonna work on AI agents, so I don't know, maybe they're like, yeah,[00:29:41] swyx: they had, I think, two agent projects, right? One desktop agent and one sort of more general, yeah, sort of GPTs-like agent, and then Andrej left, so he was supposed to be the guy on that.[00:29:52] swyx: What did Andrej see? What did he see? I don't know. What did he see?[00:29:56] Alessio: I don't know. But again, it's just like the rumors are always floating around, you know, but I think like, this is, you know, we're not going to get to the end of the year without GPT-5, you know, that's definitely happening. I think the biggest question is like, are Anthropic and Google[00:30:13] Alessio: increasing the pace, you know, like is Claude 4 coming out in like 12 months, like nine months? What's the deal? Same with Gemini. They went from like 1 to 1.5 in like five days or something. So when's Gemini 2 coming out, you know, is that going to be soon? I don't know.[00:30:31] Alessio: There are a lot of speculations, but the good thing is that now you can see a world in which OpenAI doesn't rule everything. You know, so that's the best news that everybody got, I would say.[00:30:43] swyx: Yeah, and Mistral Large also dropped in the last month. 
And, you know, not quite GPT-4 class, but very good from a new startup.[00:30:52] swyx: So yeah, we have now slowly changed the landscape, you know. In my January recap, I was complaining that nothing's changed in the landscape for a long time. But now we do exist in sort of a multipolar world where Claude and Gemini are legitimate challengers to GPT-4, and hopefully more will emerge as well, hopefully from Meta.[00:31:11] Open Source Models - Mistral, Grok[00:31:11] NLW: So let's actually talk about sort of the open source side of this for a minute. So Mistral Large, notable because it's not available open source in the same way that other things are, although I think my perception is that the community largely recognizes that they want them to keep building open source stuff and they have to find some way to fund themselves so that they can do that.[00:31:27] NLW: And so they kind of understand that they've got to figure out how to eat. But, you know, there's Mistral, there's, I guess, Grok now, which is, you know, Grok-1 from October is open[00:31:38] swyx: sourced, yeah. Yeah, sorry, I thought you meant Groq the chip company.[00:31:41] swyx: No, no, no, yeah, you mean Twitter Grok.[00:31:43] NLW: Although Groq the chip company, I think, is even more interesting in some ways. But and then there's the, you know, obviously Llama 3 is the one that sort of everyone's wondering about too. 
And, you know, my sense of that, the little bit that, you know, Zuckerberg was talking about Llama 3 earlier this year, suggested that, at least from an ambition standpoint, he was not thinking about how do I make sure that, you know, Meta keeps the open source throne, you know, vis-a-vis Mistral.[00:32:09] NLW: He was thinking about how he, you know, releases a thing that's, you know, every bit as good as whatever OpenAI is on at that point.[00:32:16] Alessio: Yeah. From what I heard in the hallways at GTC, Llama 3, the biggest model will be, you know, 260 to 300 billion parameters, so that's quite large.[00:32:26] Alessio: That's not an open source model. You know, you cannot give people a 300 billion parameter model and ask them to run it. You know, it's very compute intensive. So I think it is, it[00:32:35] swyx: can be open source. It's just, it's going to be difficult to run, but that's a separate question.[00:32:39] Alessio: It's more like, as you think about what they're doing it for, you know, it's not like empowering the person running[00:32:45] Alessio: Llama on their laptop, it's like, oh, you can actually now use this to go after OpenAI, to go after Anthropic, to go after some of these companies at like the middle complexity level, so to speak. Yeah. So obviously, you know, we had Soumith Chintala on the podcast, they're doing a lot here, they're making PyTorch better.[00:33:03] Alessio: You know, that's kind of like maybe a little bit of a short on NVIDIA, in a way, trying to get some of the CUDA dominance out of it. Yeah, no, it's great. I love the Zuck destroying a lot of monopolies arc. You know, it's been very entertaining. 
Let's bridge[00:33:18] NLW: into the sort of big tech side of this, because this is obviously like, so I think actually when I did my episode, I added this as an additional war that's something that I'm paying attention to.[00:33:29] NLW: So we've got Microsoft's moves with Inflection, which I think potentially are being read as a shift vis-a-vis the relationship with OpenAI, which also the sort of Mistral Large relationship seems to reinforce as well. We have Apple potentially entering the race, finally, you know, giving up Project Titan and kind of trying to spend more effort on this.[00:33:50] NLW: Although, counterpoint, we also have them talking about it, or there being reports of a deal with Google, which, you know, is interesting to sort of see what their strategy there is. And then, you know, Meta's been largely quiet. We kind of just talked about the main piece, but, you know, and then there's spoilers like Elon.[00:34:07] NLW: I mean, you know, which of those things has sort of been most interesting to you guys as you think about what's going to shake out for the rest of this[00:34:13] Apple MM1[00:34:13] swyx: year? I'll take a crack. So the reason we don't have a fifth war for the Big Tech Wars is that's one of those things where I just feel like we don't cover differently from other media channels, I guess.[00:34:26] swyx: Sure, yeah. In our anti-interests, we actually say, like, we try not to cover the Big Tech Game of Thrones, or it's proxied through Twitter, you know, all the other four wars anyway, so there's just a lot of overlap. Yeah, I think absolutely, personally, the most interesting one is Apple entering the race.[00:34:41] swyx: They actually announced their first large language model that they trained themselves. It's like a 30 billion parameter multimodal model. 
People weren't that impressed, but it was like the first time that Apple has kind of showcased that, yeah, we're training large models in house as well. Of course, like, they might be doing this deal with Google.[00:34:57] swyx: I don't know. It sounds very sort of rumor-y to me. And it's probably, if it's on device, it's going to be a smaller model. So something like a Gemma. It's going to be smarter autocomplete. I don't know what to say. I'm still here dealing with, like, Siri, which probably hasn't been updated since God knows when it was introduced.[00:35:16] swyx: It's horrible. You know, it makes me so angry. So, one, as an Apple customer and user, I'm just hoping for better AI on Apple itself. But two, they are the gold standard when it comes to local devices, personal compute and trust, like you trust them with your data. And I think that's what a lot of people are looking for in AI: they love the benefits of AI, they don't love the downsides, which is that you have to send all your data to some cloud somewhere.[00:35:45] swyx: And some of this data that we're going to feed AI is just the most personal data there is. So Apple being like one of the most trusted personal data companies, I think it's very important that they enter the AI race, and I hope to see more out of them.[00:35:58] Alessio: To me, the biggest question with the Google deal is like, who's paying who?[00:36:03] Alessio: Because for the browsers, Google pays Apple like 18, 20 billion every year to be the default browser. Is Google going to pay you to have Gemini or is Apple paying Google to have Gemini? I think that's like what I'm most interested to figure out because with the browsers, it's the entry point to the thing.[00:36:21] Alessio: So it's really valuable to be the default. That's why Google pays. But I wonder if like the perception in AI is going to be like, Hey. 
You just have to have a good local model on my phone to be worth me purchasing your device. And that would kind of drive Apple to be the one buying the model. But then, like Shawn said, they're doing the MM1 themselves.[00:36:40] Alessio: So are they saying we do models, but they're not as good as the Google ones? I don't know. The whole thing is really confusing, but it makes for great meme material on Twitter.[00:36:51] swyx: Yeah, I mean, I think, like, they are possibly, more than OpenAI and Microsoft and Amazon, the most full stack company there is in computing, and so, like, they own the chips, man.[00:37:05] swyx: Like, they manufacture everything, so if there was a company that could, you know, seriously challenge the other AI players, it would be Apple. And I don't think it's as hard as self driving. So like maybe they've just been investing in the wrong thing this whole time. We'll see.[00:37:21] swyx: Wall Street certainly thinks[00:37:22] NLW: so. Wall Street loved that move, man. There's a big sigh of relief. Well, let's move away from sort of the big stuff. I mean, I think to both of your points, it's going to.[00:37:33] Meta's $800b AI rebrand[00:37:33] NLW: Can I, can[00:37:34] swyx: I, can I jump on a factoid about this Wall Street thing? I went and looked at when Meta went from being a VR company to an AI company.[00:37:44] swyx: And I think the stock, I'm trying to look up the details now. The stock has gone up 187% since Llama 1. Yeah. Which is $830 billion in market value created in the past year. Yeah. Yeah.[00:37:57] NLW: It's like, remember if you guys haven't, yeah. 
If you haven't seen the chart, it's actually like remarkable.[00:38:02] NLW: If you draw a little[00:38:03] swyx: arrow on it, it's like, no, we're an AI company now and forget the VR thing.[00:38:10] NLW: It is an interesting, no, I think, Alessio, you called it sort of like Zuck's Disruptor Arc or whatever. He really is in the midst of a total, you know, I don't know if it's a redemption arc or it's just something different where, you know, he's sort of the spoiler.[00:38:25] NLW: Like people loved him just freestyle talking about why he thought they had a better headset than Apple. Even if they didn't agree, they just loved it. He was going direct to camera and talking about it for, you know, five minutes or whatever. So that's a fascinating shift that I don't think anyone had on their bingo card, you know, whatever, two years ago.[00:38:41] NLW: Yeah. Yeah,[00:38:42] swyx: we still[00:38:43] Alessio: didn't see him fight Elon though, so[00:38:45] swyx: that's what I'm really looking forward to. I mean, hey, don't write it off, you know, maybe these things just take a while to happen. But we need to see him fight in the Colosseum. No, I think, you know, in terms of like self management, life leadership, there's a lot of lessons to learn from him.[00:38:59] swyx: You know, you might kind of quibble with, like, the social impact of Facebook, but just himself, in terms of personal growth and, you know, perseverance through like a lot of change and, you know, everyone throwing stuff his way, I think there's a lot to learn from Zuck, which is crazy 'cause he's my age.[00:39:18] swyx: Yeah. Right.[00:39:20] AI Engineer landscape - from baby AGIs to vertical Agents[00:39:20] NLW: Awesome. 
Well, so one of the big things that I think you guys have, you know, distinct and unique insight into, being where you are and what you work on, is, you know, what developers are getting really excited about right now. And by that, I mean, on the one hand, certainly, you know, like startups who are actually kind of formalized and formed as startups, but also, you know, just in terms of like what people are spending their nights and weekends on, what they're, you know, coming to hackathons to do.[00:39:45] NLW: And, you know, I think it's such a fascinating indicator for where things are headed. Like if you zoom back a year, right now was right when everyone was getting so, so excited about AI agent stuff, right? AutoGPT and BabyAGI. And these things were like, if you dropped anything on YouTube about those, like instantly tens of thousands of views.[00:40:07] NLW: I know because I had like a 50,000 view video, like the second day that I was doing the show on YouTube, you know, because I was talking about AutoGPT. And so anyways, you know, obviously that's sort of not totally come to fruition yet, but what are some of the trends in what you guys are seeing in terms of people's interest and what people are building?[00:40:24] Alessio: I can start maybe with the agents part, and then I know Shawn is doing a diffusion meetup tonight. There's a lot of different things. The agent wave has been the most interesting kind of like dream to reality arc. So AutoGPT, I think they went from zero to like 125,000 GitHub stars in six weeks, and then one year later, they have 150,000 stars.[00:40:49] Alessio: So there's kind of been a big plateau. I mean, you might say there are just not that many people that can star it. You know, everybody already starred it. But the promise of, hey, I'll just give you a goal, and you do it, I think it's like, amazing to get people's imagination going. 
You know, they're like, oh, wow, this is awesome.[00:41:08] Alessio: Everybody can try this to do anything. But then as technologists, you're like, well, that's just like not possible, you know, we would have like solved everything. And I think it takes a little bit to go from the promise and the hope that people show you to then trying it yourself and going back to say, okay, this is not really working for me.[00:41:28] Alessio: And David Luan from Adept, you know, in our episode, he specifically said, we don't want to do a bottoms-up product. You know, we don't want something that everybody can just use and try because it's really hard to get it to be reliable. So we're seeing a lot of companies doing vertical agents that are narrow for a specific domain, and they're very good at something.[00:41:49] Alessio: Mike Conover, who was at Databricks before, is also a friend of Latent Space. He's doing this new company called Brightwave doing AI agents for financial research, and that's it, you know, and they're doing very well. There are other companies doing it in security, doing it in compliance, doing it in legal.[00:42:08] Alessio: All of these things that like, nobody just wakes up and says, oh, I cannot wait to go on AutoGPT and ask it to do a compliance review of my thing. You know, it's just not what inspires people. So I think the gap on the developer side has been that the more bottoms-up hacker mentality is trying to build these like very generic agents that can do a lot of open ended tasks.[00:42:30] Alessio: And then the more business side of things is like, hey, if I want to raise my next round, I cannot just like sit around and mess around with like super generic stuff. I need to find a use case that really works. 
And I think that has worked for a lot of folks. In parallel, you have a lot of companies doing evals.[00:42:47] Alessio: There are dozens of them that just want to help you measure how good your models are doing. Again, if you build evals, you need to also have a restrained surface area to actually figure out whether or not it's good, right? Because you cannot eval anything on everything under the sun. So that's another category where, from the startup pitches that I've seen, there's a lot of interest in the enterprise.[00:43:11] Alessio: It's just like really fragmented because the production use cases are just coming like now, you know, there are not a lot of long established ones to test against. And so that's kind of on the virtual agents, and then the robotic side, it's probably been the thing that surprised me the most at NVIDIA GTC, the amount of robots that were there. There were just like robots everywhere.[00:43:33] Alessio: Like, both in the keynote and then on the show floor, you would have Boston Dynamics dogs running around. There was, like, this fox robot that had, like, a virtual face that, like, talked to you and, like, moved in real time. There were industrial robots. NVIDIA did a big push on their own Omniverse thing, which is, like, this digital twin of whatever environments you're in that you can use to train the robot agents.[00:43:57] Alessio: So that kind of takes people back to the reinforcement learning days, but yeah, agents, people want them, you know, people want them. I gave a talk about the rise of the full stack employees and kind of this future, the same way full stack engineers kind of work across the stack. In the future, every employee is going to interact with every part of the organization through agents and AI enabled tooling.[00:44:17] Alessio: This is happening. 
It just needs to be a lot more narrow than maybe the first approach that we took, which is just put a string in AutoGPT and pray. But yeah, there's a lot of super interesting stuff going on.[00:44:27] swyx: Yeah. Well, Alessio covered a lot of stuff there. I'll separate the robotics piece because I feel like that's so different from the software world.[00:44:34] swyx: But yeah, we do talk to a lot of engineers and, you know, this is our sort of bread and butter. And I do agree that vertical agents have worked out a lot better than the horizontal ones. I think, you know, the point I'll make here is just the reason AutoGPT and BabyAGI, you know, it's in the name, like they were promising AGI.[00:44:53] swyx: But I think people are discovering that you cannot engineer your way to AGI. It has to be done at the model level, and all these engineering, prompt engineering hacks on top of it weren't really going to get us there in a meaningful way without much further, you know, improvements in the models. I'll go so far as to say, even Devin, which is, I think, the most advanced agent that we've ever seen, still requires a lot of engineering and still probably falls apart a lot in terms of, like, practical usage.[00:45:22] swyx: Or it's just way too slow and expensive for, you know, what it's promised compared to the video. So yeah, that's what happened with agents from last year. But I do see, like, vertical agents being very popular and, like, I think the word agent might even be overused sometimes.[00:45:38] swyx: Like, people don't really care whether or not you call it an AI agent, right? Like, does it replace boring menial tasks that I do, that I might hire a human to do, or that the human who is hired to do it, like, actually doesn't really want to do? 
And I think there's absolutely ways in sort of a vertical context that you can actually go after very routine tasks that can be scaled out to a lot of, you know, AI assistants.[00:46:01] swyx: So yeah, I mean, I would sort of basically plus one what Alessio said there. I think it's very, very promising and I think more people should work on it, not less. Like there's not enough people. Like, this should be the main thrust of the AI engineer, to look for use cases and go to production with them instead of just always working on some AGI promising thing that never arrives.[00:46:21] swyx: I,[00:46:22] NLW: I can only add that, so I've been fiercely making tutorials behind the scenes around basically everything you can imagine with AI. We've done about 300 tutorials over the last couple of months. And the verticalized anything, right, like this is a solution for your particular job or role, even if it's way less interesting or kind of sexy, it's like so radically more useful to people in terms of intersecting with how, like those are the ways that people are actually[00:46:50] NLW: adopting AI in a lot of cases, it's just a thing that I do over and over again. By the way, I think that's the same way that even the generalized models are getting adopted. You know, it's like, I use Midjourney for lots of stuff, but the main thing I use it for is YouTube thumbnails every day. Like day in, day out, I will always do a YouTube thumbnail, you know, or two with Midjourney, right?[00:47:09] NLW: And it's like you can start to extrapolate that across a lot of things and all of a sudden, you know, AI looks revolutionary because of a million small changes rather than one sort of big dramatic change. 
And I think that the verticalization of agents is sort of a great example of how that's[00:47:26] swyx: going to play out too.[00:47:28] Adept episode - Screen Multimodality[00:47:28] swyx: So I'll have one caveat here, which is I think that because multimodal models are now commonplace, like Claude, Gemini, OpenAI, all very, very easily multimodal, Apple's easily multimodal, all this stuff, there is a switch for agents for sort of general desktop browsing that I think people...[00:48:04] swyx: version of the agent where they're not specifically taking in text or anything. They're just watching your screen just like someone else would, and piloting it by vision. And, you know, in the episode with David that will have dropped by the time that this airs, I think that is the promise of Adept and that is the promise of what a lot of these sort of desktop agents are, and that is the more general purpose system that could be as big as the browser, the operating system. Like, people really want to build that foundational piece of software in AI.[00:48:38] swyx: And I would see, like, the potential there for desktop agents being that you can have sort of self driving computers. You know, don't write the horizontal piece out. I just think we took a while to get there.[00:48:48] NLW: What else are you guys seeing that's interesting to you? I'm looking at your notes and I see a ton of categories.[00:48:54] Top Model Research from January Recap[00:48:54] swyx: Yeah, so I'll take the next two as like one category, which is basically alternative architectures, right? The two main things that everyone following AI kind of knows now are, one, the diffusion architecture, and two, let's just say the decoder-only transformer architecture that is popularized by GPT.[00:49:12] swyx: You can look on YouTube for thousands and thousands of tutorials on each of those things. 
What we are talking about here is what's next, what people are researching, and what could be on the horizon that takes the place of those other two things. So first of all, we'll talk about transformer architectures and then diffusion.[00:49:25] swyx: So transformers, the two leading candidates are effectively RWKV and the state space models, the most recent one of which is Mamba, but there's others like StripedHyena, and the S4, H3 stuff coming out of Hazy Research at Stanford. And all of those are non-quadratic language models that promise to scale a lot better than the traditional transformer.[00:49:47] swyx: This might be too theoretical for most people right now, but it's gonna come out in weird ways, where, imagine, like, right now the talk of the town is that Claude and Gemini have a million tokens of context and, like, whoa, you can put in like, you know, two hours of video now. Okay, but like what if we could throw in, you know, two hundred thousand hours of video?[00:50:09] swyx: Like how does that change your usage of AI? What if you could throw in the entire genetic sequence of a human and like synthesize new drugs? Like, how does that change things? We don't know, because we haven't had access to this capability being so cheap before. And that's the ultimate promise of these two models.[00:50:28] swyx: They're not there yet, but we're seeing very, very good progress. RWKV and Mamba are probably, like, the two leading examples, both of which are open source, so you can try them today, and there's a lot of progress there. And the main thing I'll highlight for RWKV is that at the 7B level, they seem to have beat Llama 2 in all benchmarks that matter at the same size for the same amount of training as an open source model.[00:50:51] swyx: So that's exciting. You know, they're at 7B now. They're not at 70B. 
We don't know if it'll. And then the other thing is diffusion. Diffusion and transformers are kind of on a collision course. The original Stable Diffusion already used transformers in parts of its architecture.[00:51:06] swyx: It seems that transformers are eating more and more of those layers, particularly the sort of VAE layer. So the Diffusion Transformer is what Sora is built on. The guy who wrote the Diffusion Transformer paper, Bill Peebles, is the lead tech guy on Sora. So you'll just see a lot more Diffusion Transformer stuff going on.[00:51:25] swyx: But there's more sort of experimentation with diffusion. I'm holding a meetup actually here in San Francisco that's gonna be like the state of diffusion, which I'm pretty excited about. Stability's doing a lot of good work. And if you look at the architecture of how they're creating Stable Diffusion 3, Hourglass Diffusion, and the consistency models, or SDXL Turbo,[00:51:45] swyx: all of these are, like, very, very interesting innovations on, like, the original idea of what Stable Diffusion was. So if you think that it is expensive or slow to create Stable Diffusion or AI generated art, you are not up to date with the latest models. If you think it is hard to create text in images, you are not up to date with the latest models.[00:52:02] swyx: And people still are kind of far behind. The last piece of which is the wildcard I always kind of hold out, which is text diffusion. So instead of using autogenerative, or autoregressive, transformers, can you use diffusion for text? So you can use diffusion models to diffuse and create entire chunks of text all at once instead of token by token.[00:52:22] swyx: And that is something that Midjourney confirmed today, because it was only rumored the past few months. But they confirmed today that they were looking into it. 
So all those things are like very exciting new model architectures that are maybe something that you'll see in production two to three years from now.[00:52:37] swyx: So the couple of the trends[00:52:38] NLW: that I want to just get your takes on, because they're sort of something that seems like they're coming up, are, one, sort of these wearable, you know, kind of passive AI experiences where they're absorbing a lot of what's going on around you and then kind of bringing things back.[00:52:53] NLW: And then the other one that I wanted to see if you guys had thoughts on was sort of this next generation of chip companies. Obviously there's a huge amount of emphasis on hardware and silicon and different ways of doing things, but, you know, love your take on either or both of[00:53:07] swyx: those.[00:53:08] AI Wearables[00:53:08] swyx: So for wearables, I'm very excited about it. I want wearables on me at all times. I have two right here, to quantify my health. And, you know, I'm all for them. But society is not ready for wearables, right? Like, no one's comfortable with a device on, recording every single conversation we have.[00:53:24] swyx: Even all three of us here as podcasters, we don't record everything that we say. And I think there's a social shift that needs to happen. I am an investor in Tab. They are renaming to a broader vision, but they are one of the three or four leading wearables in this space. It's sort of the AI pendants, or AI OS, or AI personal companion space.[00:53:47] swyx: I have seen two Humanes in the wild in San Francisco. I'm very, very excited to report that there are people walking around with those things on their chest and it is as goofy as it sounds. It absolutely is going to fail. God bless them for trying. And I've also bought a Rabbit. So I'm very excited for all those things to arrive.[00:54:06] swyx: But yeah, people are very keen on hardware. 
I think the, the, the idea that you can have physical objects that. Embody an AI that do specific things for you is as old as, you know, the sort of Golem in sort of medieval times in terms of like how much we want our objects to be smart and do things for us.[00:54:27] swyx: And I think it's absolutely a great play. The funny thing is people are much more willing to pay you upfront for a hardware device than they are willing to pay like an 8 a month subscription recurring for software, right? And so the interesting economics of these wearable companies is they have negative float.[00:54:47] swyx: In the sense that people pay deposits upfront, like I paid like, I don't know, 200 bucks for the rabbit. Upfront, and I don't get it for another six months. I paid 600 for the tab, and I don't get it for another six months. And, and then, then they can take that money and, and sort of invest it in like their next, the next events or their next properties or ventures.[00:55:06] swyx: And like, I think that's a, that's a very interesting reversal of economics from other types of AI companies that I see. And I think, yeah, just the, the, the tactile feel of an AI, I think is very promising. I, Alex, I don't know if you have other thoughts on, on the wearable stuff.[00:55:21] Alessio: The open interpreter just announced their product four hours ago.[00:55:25] Alessio: Yeah. Which is a, it's not really a wearable, but it's a, it's still like a physical device.[00:55:30] swyx: It's a push to talk mic to, to a device on your, on your laptop. Right. It's a $99 push talk. Yeah.[00:55:38] Alessio: But, but, but everybody, but again, going back to your point, it's like people want to, people are interested in spending money for like things that they can hold, you know, I don't know what that means overall for like where things are going, but making more of this AI be a physical part of your life.[00:55:54] Alessio: I think people are interested in that, but I agree with Shawn. 
I mean, I talked to Avi about this, but Avi's point is like, most consumers, like, care about utility more than they care about privacy, you know, like you've seen with social media. But I also think there's a big societal reaction to AI that is, like, much more rooted than the social media one.[00:56:16] Alessio: But we'll see. But again, a lot of work, a lot of developers, a lot of money going into it. So there's bound to be experiments being run. On the chip side...[00:56:25] swyx: Sorry, I'll just chip in one more thing and then we transition to the chips. The thing I'll caution people on is don't overly focus on the form factor.[00:56:33] swyx: The form factor is a delivery mode. There will be many form factors. It doesn't matter so much as where in the data war does it sit. It actually is context acquisition, and maybe a little bit of multimodality. Context, like, context is king. Like, if you have access to data that no one else has, then you will be able to create AI that no one else can create.[00:56:54] swyx: And so what is the most personal context? It is your everyday conversation. It is as close to mapping your mental train of thought as possible without, you know, physically you writing down notes. So that is the promise, the ultimate goal here, which is like, personal context, it's always available on you, you know, loading and seeing all that stuff.[00:57:12] swyx: But yeah, that's the frame I want to give people, that the form factors will change and there will be multiple form factors, but it's the software behind that, and the personal context that you cannot get anywhere else, that'll win.[00:57:24] Alessio: Yeah, so that was wearables.[00:57:26] Groq vs Nvidia month - GPU Chip War[00:57:26] Alessio: On the chip side, yeah, Groq was probably the biggest release.[00:57:29] Alessio: Jonathan, well, it's not even a new release because the company, I think, was started in 2016.
So it's actually quite old. But now recently captured people's imagination with their Mixtral 500 tokens a second demo. Yeah, I think so far the battle on the GPU side has been, either you go kind of like massive chip, like the Cerebras of the world, where one chip from Cerebras is about two million dollars, you know. Obviously, you cannot compare one chip versus one chip, but an H100 is like $40,000, something like that. The problem with those architectures has been they want to be very general, you know, but like they wanted to put a lot of the RAM, the SRAM, on the chip.[00:58:13] Alessio: It's much more convenient when you're using larger language models, but the models outpace the size of the chips and chips have a much longer, you know, turnaround cycle. Groq today is great for the current architecture. It's a lot more expensive also, as far as dollar per flop, but their idea is like, hey, when you have very high concurrency, we're actually much cheaper, you know, you shouldn't just be looking at the compute power. For most people, this doesn't really matter, you know. Like, I think the most interesting thing to me is like, we've now gone back with AI to a world where developers care about what hardware is running, which was not the case in traditional software for like, maybe 20 years, as the cloud has gotten really big.[00:58:57] Alessio: My thinking is that in the next two, three years, like, we're going to go back to that. Where, like, people are not going to be sweating, oh, what GPU do you have in your cloud? What do you have? It's like.
Yeah, you want to run this model, we can run it at the same speed as everybody else, and then everybody will make different choices, whether they want to have higher front end capital investment and then better utilization. Some people would rather do lower investment before, and then upgrade later. There are a lot of parameters. And then there's the dark horses, right, that is some of the smaller companies like Lemurian Labs, MatX that are working on maybe not a chip alone, but also like some of the actual math infrastructure and the instructions on it that make them run.[00:59:40] Alessio: There's a lot going on, but yeah, I think the episode with Dylan will be interesting for people, but I think we also came out of it saying, hey, everybody has pros and cons. It's different than the models where you're like, oh, this one is definitely better for me and I'm going to use it.[00:59:56] Alessio: I think for most people, it's like fun Twitter memeing, you know, but it's like 99 percent of people that tweet about this stuff are never gonna buy any of these chips anyway. It's really more for entertainment.[01:00:10] swyx: No. Wow. I mean, like, this is serious business here, right? You're talking about, you know, like the potential new Nvidia. If anyone can take like 1% of Nvidia's business, they're a serious startup that you should look at.[01:00:20] swyx: Right? So that's my, well, yeah,[01:00:23] Alessio: yeah. On matters. Well, I'm more talking about like, how should people think about it? You know? It's like, yeah. I think like the end user is not impacted as much.[01:00:31] Disagreements[01:00:31] Alessio: This is obviously, so[01:00:32] swyx: I disagree. Yeah, I love disagreements because, you know, who likes a podcast where all three people always agree with each other?[01:00:38] swyx: You will see the impact of this in the tokens per second over time.
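To make the tokens-per-second framing concrete, here is a back-of-the-envelope sketch of how long a single response takes to generate at different speeds. The response length and the list of rates are illustrative assumptions, and `seconds_to_generate` is a hypothetical helper, not a measurement of any particular chip or provider.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time to produce num_tokens at a steady generation rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Assumed response length for illustration only.
RESPONSE_TOKENS = 1000

# Assumed generation rates, from today's typical range up to much faster hardware.
for rate in (50, 100, 500, 2000):
    wait = seconds_to_generate(RESPONSE_TOKENS, rate)
    print(f"{rate:>5} tok/s -> {wait:.1f} s for {RESPONSE_TOKENS} tokens")
```

Under these assumptions the same answer goes from a 20 second wait to half a second, which is the kind of gap that moves a task from something you wait on to something that can run quietly in the background.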
This year, I have very, very credible sources all telling me that the average tokens per second, right now, we have somewhere between 50 to 100 as like the norm for people. Average tokens per second will go to 500 to 2, 000. This year from, from a number of chip suppliers that I cannot name.[01:00:58] swyx: So like that is, that is, that will cause a step change in the use cases. Every time you have an order of magnitude improvement in the, in the speed of something, you unlock new use cases that become fun instead of a chore. And so that's what I would caution this audience to think about, which is like, what can you do in much higher AI speed?[01:01:17] swyx: It's not just things streaming out faster. It is things working in the background a lot more seamlessly and therefore being a lot more useful. Then previously imagined. So that would be my two cents on.[01:01:30] Alessio: Yeah. Yeah. I mean, the, the new NVIDIA chips are also much faster. To me, that's true. When it comes to startups, it's like, are the startups pushing the performance on the incumbents or are the incumbents still leading?[01:01:44] Alessio: And then the startups are like riding the same wave, you know? I don't have yet a good sense of that. It's like, you know, it's next year's NVIDIA release. Just gonna be better than everything that gets released this year, you know, if that's the case, it's like, okay, damn Jensen, you know, it's like the meme.[01:02:00] Alessio: It's like, I'm gonna fight. I'm gonna fight NVIDIA. It's like, damn, Jensen got hands. He really does.[01:02:08] Summer 2024 Predictions[01:02:08] NLW: Well, awesome conversation, guys. I guess just just by way of wrapping up, I call it over the next three months between now and sort of the beginning of summer was one prediction that each of you has. It can be about anything. It can be a big company. It can be a startup. 
It can be something you have privileged information that you know, and you just won't tell us that you actually[01:02:25] Alessio: know.[01:02:26] Alessio: What, does it have to be something that we think it's going to be true or like something that we think? Because for me, it's like, is Sundar going to be the CEO of Google? Maybe not in three months, maybe in like six months, nine months, you know, people are like, Oh, maybe Demis is going to be the new CEO.[01:02:41] Alessio: That was kind of like, I, I was busy like fishing some deep mind people and Google people for like a good guest for the pod. And I was like, Oh, what about. Jeff Dean, and they're like, well, Demis is really like the person that runs everything anyway, and the stuff. It's like interesting. And[01:02:57] swyx: so I don't know.[01:02:58] swyx: What about Sergei? Sergei Sergei could come back. I don't know. Like he's making more appearances these days.[01:03:03] Alessio: Yeah. I don't, I I Then we can just put it as like, you know. Yeah. My, my thing is like CEO change potential, but I, again, three months is too short to make a prediction. Yeah. I[01:03:16] NLW: think that's the, that's that's fine.[01:03:18] NLW: The, the timescale might be off.[01:03:22] swyx: Yeah. I mean for me, I, I think the. Progression in vertical agent companies will keep going. We just had, the other day, Klarna talking about how they replaced like 700 of their customer support agents with the AI agents. That's just the beginning, guys. Like, imagine this rolling out across most of the Fortune 500.[01:03:43] swyx: This is, and I'm not saying this is like a utopian scenario, there will be very, very embarrassing and bad outcomes of this, where like, humans would never make this mistake, but AIs did, and like, we'll all laugh at it, or we'll be very offended by whatever, you know, bad outcome it did. 
So we have to be responsible and careful in the rollout, but yeah, it's rolling out. You know, Alessio likes to say that this year's the year of AI in production.[01:04:04] swyx: Let's see it, let's see all these sort of vertical, full stack employees come out into the workforce. Love[01:04:11] Alessio: it.[01:04:11] NLW: All right, guys. Well, thank you so much for sharing your thoughts and insights here, and I can't wait to do it again.[01:04:18] Thursday Nights in AI - swyx[01:04:18] NLW: Welcome[01:04:19] swyx: back again. It's Charlie, your AI co host. We're now in part two of the special weekend episode, collating some of Swyx and Alessio's recent appearances. If you're not active in the Latent Space Discord, you might not be aware of the many, many, many in person[01:04:36] swyx: events we host gathering our listener community all over the world. You can see the Latent Space community page for how to join and subscribe to our event calendar for future meetups. We're going to share some of our recent live appearances in this next part, starting with the Thursday Nights in AI meetup, a regular fixture in the SF AI scene run by Imbue and Outset Capital.[01:04:59] swyx: Primarily, our former guest, Kanjun Qiu, Ali Rohde, and Josh Albrecht. Here's Swyx.[01:05:08] swyx: Today, for those of you who have been here before, you know the general format. So we'll do a quick fireside Q&A with Swyx, where we're asking him the questions. Then we'll actually go to our rapid fire Q&A, where we're asking really fast, hopefully spicy questions. And then we'll open it up to the audience for your questions.[01:05:25] swyx: So you guys sneak around the room, submit your questions, and we'll go through as many of them as possible during that period. And then actually, Swyx brought a gift for us, which is two Latent Space t shirts. AI Engineer. AI Engineer t shirts.
And those will be awarded to the two spiciest question askers.[01:05:44] swyx: And I'll let Josh decide on that. So if we want to get your spiciest takes, please send them in during the event as we're talking and then also at the end. All right. With that, let's get going.[01:05:57] NLW: Okay. Welcome, Swyx. Thank you for that[01:06:01] swyx: intro.[01:06:01] NLW: How does it[01:06:01] swyx: feel to be interviewed[01:06:03] NLW: rather than the interviewer?[01:06:04] swyx: Weird. I don't know what to do in this chair. Yeah. Like,[01:06:07] NLW: where should I put my hands? Yeah, exactly. You look good.[01:06:10] swyx: You look good. And I also love asking follow up questions. And I tend to, like, sort of take over panels a lot. If you ever see me on a panel, I tend to ask the other panelists questions.[01:06:18] swyx: Okay.[01:06:19] NLW: So we should be ready is what you're saying. So you back.[01:06:21] swyx: That's fine. This is like a free Imbue interview, so why not? That's right. That's right. That's[01:06:24] NLW: right.[01:06:25] swyx: Yeah, so you interviewed Kanjun, the CEO. You didn't interview Josh, right? No, no. So maybe tonight. Yeah. Okay. We'll see. We'll look for different questions and look for an alignment.[01:06:35] NLW: I love it. All[01:06:36] swyx: right. I just want to hear this story. You know, you've completely exploded Latent Space and AI Engineer, and I know you also, before all of that, had exploded in popularity for your learning in public movement and your DevTools work and developer relations work. So, who are you and how did you get here?[01:06:53] swyx: Let's[01:06:53] NLW: start with that.[01:06:54] swyx: Quick story is, I'm Shawn, I'm from Singapore. Swyx is my initials. For those who don't know, a lot of Singaporeans are ethnically Chinese, and we have Chinese names and English names. So, it's just my initials.
Came to the US for college, and have been here for about 15 years, but like half of that was in finance and then the other half was in tech.[01:07:13] swyx: And tech is where I was most known, just because I realized that I was much more aligned towards learning in public, whereas in finance, everything's a trade secret. Everything is zero sum. Whereas in tech, like, you're allowed to come to meetups and conferences and share your learnings and share your mistakes even.[01:07:31] swyx: And that's totally fine. You, like, open source your code. It's totally fine. And even better, you, like, contribute PRs to other people's code, which is even better. And I found that I thrived in that learning in public environment, and that kind of got me started. I was an early hire, early Developer Relations hire at Netlify, and then did the same at AWS, Temporal, and Airbyte.[01:07:53] swyx: And so that's like the whole story. I can talk more about like developer tooling and developer relations if that's something that people are interested in. But I think the more recent thing is AI. And I started really being interested in it mostly because, the proximate cause of starting Latent Space was Stable Diffusion.[01:08:10] swyx: When you could run a large model that could do sufficiently well on your desktop. Where I was like, okay, like, this is something qualitatively very different. And that's when we started Latent Space. And you're like, this is something different. We have to talk about it on a podcast.[01:08:25] swyx: There we go. Yeah. It wasn't a podcast for like four months. And then I had been running a Discord for dev tools investors, 'cause I also invest in dev tools and I advise companies on dev tools, dev things.
And I think it was the start of 2023 when Alessio and I were both like, you know, I think we need to like get more tokens out of[01:08:45] swyx: people, and I was running out of original sources to write about, so I was like, okay, I'll go get those original sources. And I think that's when we started the podcast. And I think it's just the chemistry between us, the way we spike in different ways. And also, like, honestly, the kind participation of the guests to give us their time.[01:09:03] swyx: Like, you know, like, getting George Hotz was a big deal. And also shout out to Alessio for just cold emailing him, for booking some of our biggest guests. And I'm just working really hard to try to tell the story that people can use at work. I think that there's a lot of AI podcasts out there and a lot of AI kind of forums or fireside chats with no fire[01:09:21] swyx: that always talk about AGI, like, what's your AGI timeline, what's your p(doom). Very, very nice hallway conversations for freshman year, but not very useful for work. And like, you know, practically like making money and thinking about, you know, changing the everyday lives. I think what's interesting is obviously you care about the existential safety of the human race.[01:09:43] swyx: But in the meantime we gotta eat. So I think that's like kind of Latent Space's niche. Like, we explicitly don't really talk about AGI. We explicitly don't talk about things that are, like, a little bit too far out. Like, we don't do a ton of robotics. We don't do a ton of, like, high frequency trading.[01:10:00] swyx: There's tons of machine learning in there, but we just don't do that. Because, like, we're like, all right, what are most software engineers gonna need? Because that's our background, and that's the audience that we serve.
And I think just, like, being really clear on that audience has resonated with people.[01:10:12] swyx: Yeah, you would never expect a technical podcast to reach, like, a general audience, like, top ten on the tech charts, but I, you know, I've been surprised by that before and it's been successful. I don't know what to say about that. I think honestly, I kind of have this like negative reaction towards being classified as a podcast, because the podcast is downstream of ideas.[01:10:35] swyx: And it's one mode of conversation, it's one mode of idea delivery, but you can deliver ideas on a newsletter, in person like this, there's so many different ways. And so I think about it more as we are trying to start or serve an industry, and that industry is the AI engineer industry, which we can talk about more.[01:10:53] swyx: Yes, let's go into that. So the AI engineer, you penned a piece called The Rise of the AI Engineer, you tweeted about it, Andrej Karpathy also responded, largely agreeing with what you said. What is an AI engineer? The AI engineer is the software engineer building with AI, enhanced by AI, and eventually it will be non human engineers writing code for you, which I know Imbue is all about.[01:11:18] swyx: You're saying eventually the AI engineer will become a non human engineer? That will be one kind of AI engineer that people are trying to build, and is probably the furthest away in terms of being reality. Because it's so hard. Got it. But there are three types of AI engineer and I just went through the three.[01:11:33] swyx: One is AI enhanced, where you like use AI products like Copilot and Cursor.
And two is AI products engineer where you use the exposed AI capabilities to the end user As a software engineer, like, not doing pre training not being an ML researcher, not being an ML engineer, but just interacting with foundation models and probably APIs from foundation model labs.[01:11:54] swyx: What's the third one? And the third one is the non human AI engineer. Got it. The fully autonomous AI engineer. Dream, you know, Coder. How long do you think it is till we get to, like, early, early versions? This is my equivalent of AGI timelines. I know, I know. You can set yourself up for this. So like, lots of active, like, I mean, I have, I have supported companies actively working on that.[01:12:13] swyx: I think it's more useful to think about levels of autonomy. And so my answer to that is, you know, perpetually five years away until until it figures it out. No, but my actual anecdote the closest comparison we have to that is self driving. We are, we're doing this in San Francisco for those who are watching the live stream.[01:12:32] swyx: If you haven't come to San Francisco and seen, and taken a Waymo ride just come, get a friend take a Waymo ride. I remember 2014 we covered a little bit of autos in, in my hedge fund. And I was, I remember telling a friend, I was like, self driving cars around the corner, like, this is it, like, you know, parking will be, like, parking will be a thing of the past and it didn't happen for the next 10 years.[01:12:52] swyx: And, and, but now we, now, like, most of us in San Francisco can, can take it for granted. So I think, like, you just have to be mindful that the, the, the, the rough edges take a long time. And like, yes, it's going to work in demos, then it's going to work a little bit further out and it's just going to take a long time.[01:13:08] swyx: The more useful mental model I have is sort of levels of autonomy. So in self driving, you have level 1, 2, 3, 4, 5 just the amount of human attention that you get. 
At first, like, your, your, your hands are always on 10 and 2 and you have to pay attention to the, to, to the driving every 30 seconds and eventually you can sleep in the car, right?[01:13:25] swyx: So there's a whole spectrum of that. So what's the equivalent for that for, for coding? Keep your hands on the keyboard and then eventually you've kind of gone off. You tab to accept everything. Where are we? Oh, that's good, yeah. Yeah. Doesn't that already happen? Yeah. Approve the PR. Approve, this looks good.[01:13:39] swyx: That's the dream that people want. It gives, it gives, really you unlock a lot of coding when people, non technical people can file issues, and then the AI engineer can sort of automatically write code, pass your tests, and if it, if it kind of works as, as, as intended. As, as advertised then you can just kind of merge it and then you, you know, 10x, 100x the number of developers in your company immediately.[01:14:00] swyx: So that's the goal, that's the, that's the holy grail. We're not there yet but Sweep, CodeGen, there's a bunch of companies, Magic probably, are, are all working towards that. And, and so I so the TLDR, like the, the thing that we covered Alessio and I covered in the January recap that we did was that the, the basic split that people should have in their minds is the inner loop versus the outer loop for the developer.[01:14:21] swyx: Inner loop is everything that happens in your IDE between Git commits. And outer loop is happens, is what happens when you push up your Git commit to GitHub, for example, or GitLab. And that's a nice split, which means like everything local, everything that needs to be fast is for everything that's kind of very hands on for developers.[01:14:37] swyx: It's probably easier to automate or easier to have code assistance. That's what Copilot is, that's what, that's what all those things are. 
And then everything that happens autonomously when you're effectively away from the keyboard, with like a GitHub issue or something, that is more outer loop, where you're, you know, relying a lot more on autonomy, and our LLMs are maybe not smart enough to do that yet.[01:14:57] Alessio: Do you have any thoughts on[01:14:58] swyx: kind of[01:14:58] Alessio: the user experience and how that will change? One of the things[01:15:01] swyx: that has happened for me, kind of looking at some of these products and playing around with things ourselves, like, you know, it sounds good to have an automated PR, then you get an automated PR and you're like, I really don't want to review like 300 lines of generated code, and like find the bug in it.[01:15:13] swyx: Well then you have another agent that's a reviewer. That's right, but then you like tell it like, oh, go fix it, and it comes back with 400 lines. Yes, there is a length bias to code, right? And you do have higher passing rates in PRs. This is a documented human behavior thing, right? Send me two lines of code, I will review the s**t out of that.[01:15:33] swyx: I don't know if I can swear on this. Send me 200 lines of code, looks good to me. Right? Guess what? The agents are going to be perfectly happy to copy that behavior from us, when we actually want them to do the opposite. So, yeah, I think that the GAN model of code generation is probably not going to work super well.[01:15:50] swyx: I do think we probably need just better planning from the start. Which is, I'm just repeating the Imbue thesis by the way. Just go listen to Kanjun talk about this. She's much better at it than I am. But yeah, I think the code review thing is going to be, I think that what Codium, there are two Codiums, the Israeli one.[01:16:10] swyx: The Israeli Codium. With the E. Yeah, Codium with the E. They still have refused to rename. I'm friends with both of them. Every month I'm like, You're like, guys, let's
Every month I'm like, You're like, guys, let's[01:16:18] NLW: all come to one room. Yeah,[01:16:19] swyx: like, you know, someone's got to fold. Codium with the E has gone, like, you've got to write the test first. Right?[01:16:25] swyx: You write the, it's like a sort of tripartite relationship. Again, this was also covered on a podcast with them, which is fantastic. Like, you interview me, you sort of, through me, interview, like, the past avatars. I've been watching the Netflix show, by the way, it's fantastic. But like, so Codium is like, they've already thought this all the way through.[01:16:41] swyx: They're like, okay, you write the user story, from the user story you generate all the tests, you also generate the code, and you update any one of those, they all have to update together. Right? And probably the critical factor is the test generation from the story. Because everything else can just kind of bounce their heads off of those things until they pass.[01:17:01] swyx: So you have to write good tests. It's kind of like the eat your vegetables of coding, right? Which nobody really wants to do. And so I think it's a really smart tactic to go to market by saying we automatically generate tests for you and, you know, start not great, but then get better. And eventually you get to the weakest point in the chain for the entire loop of code generation.[01:17:25] swyx: What do you think the weakest link is? The weakest link? Yeah. It's test generation. Yeah. Yeah. Do you think there's a way to, like, are there some promising[01:17:33] Alessio: avenues you see forward for making that actually better?[01:17:38] swyx: For making it better. You have to have, like, good isolation, and I think proper serverless cloud environments is integral to that.[01:17:48] swyx: It could be like a fly.io. It could be like a Cloudflare Worker. It depends how many resources your test environment needs.
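A minimal sketch of the loop described above, where tests derived from a user story act as the gate and candidate code is tried until something passes. Everything here is an illustrative assumption: the candidates are hard-coded stand-ins for model output, and names like `first_passing` and `attempt_1` are hypothetical, not any vendor's actual API or method.

```python
from typing import Callable, Iterable, List, Optional

Impl = Callable[[int, int], int]
# A test takes a candidate implementation and returns True when the
# candidate behaves the way the user story requires.
Test = Callable[[Impl], bool]


def passes(test: Test, candidate: Impl) -> bool:
    """Run one test; a crash in the candidate counts as a failure."""
    try:
        return bool(test(candidate))
    except Exception:
        return False


def first_passing(candidates: Iterable[Impl], tests: List[Test]) -> Optional[Impl]:
    """Return the first candidate that passes every test, else None."""
    for candidate in candidates:
        if all(passes(test, candidate) for test in tests):
            return candidate
    return None


# Hypothetical user story: "add two whole numbers".
tests: List[Test] = [
    lambda f: f(2, 3) == 5,
    lambda f: f(-1, 1) == 0,
    lambda f: f(0, 0) == 0,
]


# Stand-ins for successive generated attempts.
def attempt_1(a: int, b: int) -> int:
    return a * b  # wrong: multiplies instead of adding


def attempt_2(a: int, b: int) -> int:
    return a - b  # wrong: subtracts instead of adding


def attempt_3(a: int, b: int) -> int:
    return a + b  # matches the story


chosen = first_passing([attempt_1, attempt_2, attempt_3], tests)
print(chosen.__name__ if chosen else "no candidate passed")
```

The point of the sketch is that the tests are the only judge: if they are weak, a wrong candidate gets through, which is why test generation is called out as the weakest link. In a real system each candidate would run inside an isolated sandbox rather than in the same process.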
And effectively I was talking about this, I think with maybe Rob earlier in the audience, where every agent needs a sandbox. If you're a code agent, you need a coding sandbox, but if you're whatever, like Imbue used to have this, like, sort of Minefield, Minecraft clone that was much faster.[01:18:12] swyx: If you have a model of the real world, you have to go generate some plan or some code or some whatever, test it against that real world so that you can get this iterative feedback and then get the final result back that is somewhat validated against the real world. And so, like, you need a really good sandbox.[01:18:26] swyx: I think this is an infrastructure need that humans[01:18:31] swyx + Josh Albrecht: have had for a long time. We've never solved it for ourselves. And now we have to solve it for about a thousand times larger quantity of agents than humans that actually exist. And so I think, like, we eventually have to evolve a lot more infrastructure[01:18:45] swyx + Josh Albrecht: in order to serve these things. So yeah. So, for those who don't know, we're talking about the rise of the AI engineer. I also have previous conversations about immutable infrastructure, cloud environments and that kind of stuff. And this is all of a kind. Like, in order to solve agents and coding agents, we're going to have to solve the other stuff too along the way.[01:19:05] swyx + Josh Albrecht: And it's really neat for me to see all that tie together in my DevTools work, that all these themes kind of reemerge just naturally, just because everything we needed for humans, we just need a hundred times more for agents.[01:19:17] Dylan Patel: Let's talk about the AI engineer. AI engineer has become a whole thing.[01:19:21] Dylan Patel: It's become a term and also a conference. And tell us more, and a job title, tell us more about that.
What's going on there?[01:19:31] swyx + Josh Albrecht: That is like a very vague, a very, very big cloud of things. I would just say like, I think it's an emergent industry. I've seen this happen repeatedly for, so the general term is software engineer.[01:19:44] swyx + Josh Albrecht: Programmer. In the 70s and 80s, there would not be like senior engineer. There would just be engineering. Like you, or you, I don't think they even call themselves engineer. They don't have that. What about a member of the technical staff? Oh, yeah, MTS. Very, very, very, very elite. But yeah, so like, you know, like these striations appear when the population grows and the technical depth grows over time.[01:20:07] swyx + Josh Albrecht: Yeah. When it starts, when it ends. Not that, not that important, and then over time it's just gonna specialize. And I've seen this happen for frontend, for DevOps, for data and I can't remember what else I listed in, in that, in that piece, But those are the main three that I was around for. And I, I see this, I saw this happening for AI engineer which is effectively, now a lot of people are arguing that there is the ML researcher, the ML engineer, who sort of pairs with the researcher sometimes they also call research engineer and then on the other side of the fence is just software engineers.[01:20:35] swyx + Josh Albrecht: And that's how it was up till about last year. And now there's this specializing and rising class of people building AI specific software that are not any of those previous titles that I just mentioned. And that's the thesis of the AI engineer, that this is an emerging category of AI. Startups of jobs I've had people from Meta, IBM, Microsoft, OpenAI tell me that they, their title is now AI engineer.[01:20:58] swyx + Josh Albrecht: They're hiring AI engineers. 
So, like, I can see that this is a trend and I think that's what Andrej called out in his post that, like, just mathematically, just the limitations in terms of talent, research talent and GPUs, that all these will tend to concentrate in a few labs and everyone else is just going to have to rely on them or build differentiation of products in other ways. And those will be AI engineers.
[01:21:21] swyx + Josh Albrecht: So mathematically there will be more AI engineers than ML engineers. It's just the truth. Right now it's the other way. Right now the number of AI engineers is maybe 10x less. So I think that the ratio will invert and, you know, I think the goal of Latent Space and the goal of the conference and anything else I do is to serve that
[01:21:38] Dylan Patel: growing audience.
[01:21:41] Dylan Patel: To make the distinction clear, if I'm a software engineer and I'm like, I want to become an AI engineer. What do I have to learn? Like, what additional capabilities does that type of engineer have? Funny you say that. I think you have a blog post on this very
[01:21:53] swyx + Josh Albrecht: topic. I don't actually have a specific blog post on how to, like, change classes.
[01:21:58] swyx + Josh Albrecht: I do think, I always think about these in terms of, yeah, Baldur's Gate and, you know, D&D rule set number 5.1 or whatever. But yeah, so I kind of intentionally left that open to leave space for others. I think when you start an industry, the specifications that work the best in industries are so minimally defined that other people can fill in the blanks.
[01:22:19] swyx + Josh Albrecht: And I want people to fill in the blanks. I want people to disagree with me and with themselves so that we can figure this out as a group. Like I don't want to overspecify everything, you know, like that's the only way to guarantee that it will fail.
Um, I do have a take obviously, 'cause a lot of people are asking me, like, where to start.
[01:22:37] swyx + Josh Albrecht: And I think basically, so what we have is Latent Space University. We just finished working on day seven today. It's a seven day email course. Where basically, like, it is completely designed to answer the question of, like, okay, I'm an existing software engineer, I, like, kind of, I know how to code but I don't get all this AI stuff, I've been living under a rock, or, like, it's just too overwhelming for me, you have to, like, pick for me, or curate for me as a trusted friend.
[01:22:59] swyx + Josh Albrecht: And I have one hour a day for seven days. What do you slot in that bucket? So for us, it's making sort of LLM API calls. It's image generation, it's code generation, it's audio ASR, I think, what's ASR? Audio speech recognition?
[01:23:18] swyx + Josh Albrecht: Yeah, yeah. And then I forget what the fifth and sixth one is, but the last day is agents. And so basically, like, I'm just like, you know, here are seven projects that you should do to feel like you can do anything in AI. You can't really do everything in AI just from that small list.
[01:23:34] swyx + Josh Albrecht: But I think it's just like anything, you have to, like, go through like a set list of things that are basic skills that I think everyone in this industry should have to be at least conversant in. If like a boss comes to you and goes like, hey, can we build this? You don't even know if the answer is no.
[01:23:52] swyx + Josh Albrecht: So I want you to move from like unknown unknowns to at least known unknowns. And I think that's where you start being competent as an AI engineer.
So yeah, that's LSU, Latent Space University, just to trigger the Tigers.
[01:24:06] Dylan Patel: So do you think in the future that people, an AI engineer is going to be someone's full time job?
[01:24:10] Dylan Patel: Like people are just going to be AI engineers? Or do you think it's going to be more of a world where I'm a software engineer, and like, 20 percent of my time, I'm using OpenAI's APIs, and I'm working on prompt engineering and stuff like that and using
[01:24:23] swyx + Josh Albrecht: Copilot. You just reminded me, Day 6 is open source models and fine tuning.
[01:24:27] swyx + Josh Albrecht: Perfect. I think it will be a spectrum. That's why I don't want to be like too definitive about it. Like we have full time front end engineers and we have part time front end engineers and you dip into that community whenever you want. But wouldn't it be nice if there was a collective name for that community so you could go find it?
[01:24:40] swyx + Josh Albrecht: You can find each other. And, like, honestly, like, that's really it. Like, a lot of companies were pinging me for, like, Hey, I want to hire this kind of person, but you can't hire that person, but I wanted someone like that. And then people on the labor side were pinging me going, like, Okay, I want to do more in this space, but where do I go?
[01:24:56] swyx + Josh Albrecht: And I think just having that Schelling point of what an industry title and name is, and then sort of building out that mythology and community and conference, I think is helpful, hopefully, and I don't have any prescriptions on whether or not it's a full time job. I do think, over time, it's going to become more of a full time job.
[01:25:14] swyx + Josh Albrecht: And that's great for the people who want to do that and the companies that want to employ that. But it's absolutely, like, you can take it part time, like, you know, jobs come in many formats.
Yep, yep, that
[01:25:23] Dylan Patel: makes sense. Yeah. And then you have a huge World's Fair coming up. Yeah. Tell me about that. So,
[01:25:31] swyx + Josh Albrecht: Part of, I think, you know, what creating an industry requires is to let people gather in one place.
[01:25:37] swyx + Josh Albrecht: And also for me to get high quality talks out of people. You have to create an event out of it. Otherwise they don't do the work. So last year we did the AI Engineer Summit, which went very well. And people can see that online and we're very happy with how that turned out.
[01:25:53] swyx + Josh Albrecht: This year we want to go four times bigger with the World's Fair and try to reflect AI engineering as it is in 2024. I always admired two conferences in this respect. One is NeurIPS, which I went to last year and documented on the pod, which was fantastic. And two is KubeCon from the other side of my life, which is the sort of cloud orchestration and DevOps world.
[01:26:18] swyx + Josh Albrecht: So NeurIPS is the one place that you go to, I think it's the top conference. I mean, there's others that you can kind of consider. But, yeah, so NeurIPS is where the research scientists are the stars. The researchers are the stars, PhDs are the stars, mostly it's just PhDs on the job market, to be honest.
[01:26:34] swyx + Josh Albrecht: It's really funny
[01:26:35] Dylan Patel: to go, especially these days. Yeah, it
[01:26:37] swyx + Josh Albrecht: was really funny to go to NeurIPS and go like, And the VCs trying to back them. Yeah, there are lots of VCs trying to back them. Yeah, this year. Anyway, so in NeurIPS, research scientists are the stars. And I wanted for AI engineers, for engineers to be the star.
[01:26:51] swyx + Josh Albrecht: Right, to show off their tooling and their techniques and their difficulty moving all these ideas from research into production.
The other one was KubeCon, where you could honestly just go and not attend any of the talks and just walk the floor and figure out what's going on in DevOps, which is fantastic.
[01:27:10] swyx + Josh Albrecht: Because, yeah, so that curation and that bringing together of an industry is what I'm going for for the conference. And yeah, it's coming in June. The most important thing, to be honest, when I, like, conceived of this whole thing was to buy the domain. So we got ai.engineer. People are like, .engineer is a domain?
[01:27:27] swyx + Josh Albrecht: Yeah, and funny enough, .engineer was cheaper than .engineering. I don't understand why, but like that's up to the domain people.
[01:27:36] Dylan Patel: Josh, any questions on agents?
[01:27:38] Alessio: Yeah,
[01:27:39] Dylan Patel: I think maybe, you know, you have a lot
[01:27:40] swyx + Josh Albrecht: of experience and exposure talking to all these companies and founders and researchers and everyone that's on your podcast.
[01:27:47] Dylan Patel: Do you have, do you feel like you have a
[01:27:50] swyx + Josh Albrecht: good kind of perspective on some of the things that, like, some of the kind of technical issues having seen? You know, like we were just talking about, like, for coding agents, like, oh, how, you know, the value of tests is really important. There are other things, like, for, you know, retrieval, like now, you know, we have these models coming out with a million context, you know, or a million tokens of context length, or ten million, like, is retrieval going to
[01:28:10] Dylan Patel: matter anymore, like,
[01:28:11] swyx + Josh Albrecht: do
[01:28:11] Dylan Patel: huge contexts matter, like,
[01:28:13] swyx + Josh Albrecht: what do you think?
[01:28:14] swyx + Josh Albrecht: Specifically about the long context thing? Sure, yeah. Because you asked a more broad question. I was going to ask a few other ones after that, so go for that one first. Yeah. That's what I was going to ask first.
We can ask, yeah, okay, let's talk about long context and then the other stuff. So, for those who don't know, long context was kind of in the air last year, but really, really, really came into focus this year.
[01:28:33] swyx + Josh Albrecht: With Gemini 1.5 having a million token context and saying that it was in research for 10 million tokens. And that means that you, like, no longer have to really think about what you retrieve, sorry, you no longer really think about what you have to, like, put into context.
[01:28:50] swyx + Josh Albrecht: You can just kind of throw the entire knowledge base in there, or books, or film, anything like that, and that's fantastic. A lot of people are thinking that it kills RAG, and I think, like, one, that's not true, because for any kind of cost reason, you know, you still pay per token, so basically Google is, like, perfectly happy to let you pay for a million tokens every single time you make an API call, but good luck, you know, having a hundred dollar API call.
[01:29:12] swyx + Josh Albrecht: And then the other thing, it's going to be slow. No explanation needed. And then finally, my criticism of long context is that it's also not debuggable. Like, if something goes wrong with the result, you can't do, like, the RAG decomposition of where the source of error is. Like, you just have to, like, go, like, it's the weights, bro.
[01:29:29] swyx + Josh Albrecht: Like, it's somewhere in there. Sorry. I pretty strongly agree with this. Why do you think people are making such crazy long context windows? People love to kill RAG, right? It's so much... Kill it, though, because it's too expensive. It's so expensive like you said.
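
The pay-per-token point above can be made concrete with a little arithmetic. This is only a sketch: the price of $10 per million input tokens and the 4,000-token retrieval size are hypothetical placeholders chosen for illustration, not any provider's quoted rates.

```python
def call_cost_usd(input_tokens: int, price_per_million_usd: float) -> float:
    """Input cost of a single API call when billing is per token."""
    return input_tokens / 1_000_000 * price_per_million_usd


# Hypothetical price (an assumption for illustration, not a real quote).
PRICE_PER_MILLION_USD = 10.0

# Option A: put the entire knowledge base into a 1M-token context on every call.
full_context = call_cost_usd(1_000_000, PRICE_PER_MILLION_USD)

# Option B: retrieve a few relevant chunks first and send only those (assumed 4k tokens).
retrieved = call_cost_usd(4_000, PRICE_PER_MILLION_USD)

ratio = full_context / retrieved

print(f"full context: ${full_context:.2f} per call")
print(f"retrieval:    ${retrieved:.2f} per call")
print(f"ratio:        {ratio:.0f}x")
```

Whatever the real price is, the ratio between the two options depends only on how many tokens are sent, which is why long context and retrieval end up being complementary rather than one killing the other.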
Yeah, I just think, I just call it a different dimension. I think it's an option that's great when it's there. Like when I'm prototyping I do not ever want to worry about context, and I'm gonna call stuff a few times and I don't want to run into errors. I don't want to have to set up a complex retrieval system just to prototype something. But once I'm done prototyping then I'll worry about all the other RAG stuff. And yes, I'm gonna buy some system or build some system or whatever to go do that.
[01:30:02] swyx + Josh Albrecht: So I think it's just like an improvement in like one dimension that you need. But the improvements in the other dimensions also matter. And it's all needed, like this space is just going to keep growing in unlimited fashion. I do think that this combined with multi modality does unlock new things.
[01:30:21] swyx + Josh Albrecht: So that's what I was going to ask about next. It's like, how important is multi modal? Like, great, you know, generating videos, sure, whatever. Okay, how many of us need to generate videos that often? It'd be cool for TV shows, sure, but like, yeah. I think it's pretty important. And the one thing that, when we launched the Latent Space podcast, we listed a bunch of interest areas.
[01:30:37] swyx + Josh Albrecht: So one thing I love about being explicit or intentional about our work is that you list the things that you're interested in and you list the things that you're not interested in. And people are very unwilling to have an anti interest list. One of the things that we were not interested in was multimodality last year.
[01:30:55] swyx + Josh Albrecht: Because I was just like, okay, you can generate images and they're pretty, but like not a giant business. I was wrong. Midjourney is a giant, giant, massive business that no one can understand or get into.
But also I think being able to natively understand audio and video and code.
[01:31:12] swyx + Josh Albrecht: I consider code a special modality. All that is very, like, qualitatively different than translating it into English first and using English as, I don't know, like a bottleneck or pipe and then, you know, applying it in LLMs. Like the ability of LLMs to reason across modalities gives you something more than you could get, you know, individually by using text as the universal interface.
[01:31:33] swyx + Josh Albrecht: So I think that's useful. So concretely what does that mean? It means that, so I think the reference post for everyone that you should have in your head is Simon Willison's post on Gemini 1.5's video capability. Where he basically shot a video of his bookshelf and just kind of scanning through it.
[01:31:50] swyx + Josh Albrecht: And he was able to get back a complete JSON list of the books and the authors and all the details that were visible there. It hallucinated some of it, which is, you know, another issue. But I think it just, like, unlocks this use case that you just would not even try to code without the native video understanding capability.
[01:32:08] swyx + Josh Albrecht: And obviously, like, on a technical level, video is just a bunch of frames. So actually it's just image understanding, but image within the temporal dimension, which this month, I think, became much more of an important thing, like the integration of space and time in Transformers. I don't think anyone was really talking about that until this month, and now it's the only thing anyone can ever think about for Sora and for all the other stuff.
[01:32:30] swyx + Josh Albrecht: The last thing I'll say, which is against this trend of like every modality is important. They just do all the modalities.
I kind of agree with Nat Friedman who actually kind of pointed out just before the Gemini thing blew up this month, which was like, why is it that OpenAI is pushing DALL-E so hard?
[01:32:48] swyx + Josh Albrecht: Why is Bing pushing Bing Image Creator? Like, it's not apparent that you have to create images to create AGI. But every lab just seems to want to do this, and I kind of agree that it's not on the critical path. Especially for image generation, maybe image understanding, video understanding.
[01:33:04] swyx + Josh Albrecht: Yeah, consumption. But generation, eh. Maybe we'll be wrong next year. It just catches you a bunch of flak with like, you know, culture war things. Alright, we're going to
[01:33:14] Dylan Patel: move into rapid fire Q&A, so we're going to ask you questions. We've cut
[01:33:26] Dylan Patel: the Q&A section for time, so if you want to hear the spicy questions, head over to the Thursday Nights in AI video for the full discussion.
[01:33:34] Dylan Patel - Semianalysis + Latent Space Live Show
[01:33:34] Dylan Patel: Next up, we have another former guest, Dylan Patel of Semianalysis, the inventor of the GPU rich poor divide, who did a special live show with us in March. But that means you can finally, like, side to side A B test your favorite Boba
[01:33:51] Alessio: shops?
[01:33:51] Alessio: We got Gong Cha, we got Boba Guys, we got the Lemon, whatever it's called. So, let us know what's your favorite. We also have Slido up to submit questions. We already had Dylan on the podcast, and like, this guy tweets and writes about all kinds of stuff. So we want to know what people want to know more
[01:34:07] Alessio: about.
[01:34:08] Alessio: Rather than just being self driven. But we'll do a state of the union, maybe? I don't know. Everybody wants to know about Groq. Everybody wants to know whether or not NVIDIA is going to zero after Groq. Everybody wants to know what's going on with AMD.
We got some AMD folks in the crowd, too.
[01:34:23] Alessio: So feel free to interact at any time. This is... We have
[01:34:27] swyx + Josh Albrecht: portable mics.
[01:34:27] Dylan Patel: Heckle, please. What do you... sorry. Good comedians show their color with the way they can handle the crowd when they're heckled.
[01:34:35] Alessio: Do not throw Boba. Do not throw Boba at this end. We cannot afford another podcasting setup. Awesome.
[01:34:41] Alessio: Well, welcome everybody to the SemiAnalysis and Latent Space crossover. Dylan texted me on Signal. He was like, dude, how do I easily set up a meetup? And here we are today. Well, as you might have seen, there's no name tags. There's a bunch of things that are missing. But we did our
[01:34:55] Dylan Patel: best. It was extremely easy, right?
[01:34:58] Groq
[01:34:58] Dylan Patel: Like, I text Alessio. He's like, yo, I got the spot. Okay, cool. Thanks. Here's a link. Send it to people. Sent it. And then showed up. And like, there was zero other organization that I required. So
[01:35:10] Alessio: everybody's here. A lot of SemiAnalysis fans we get in the crowd, everybody wants to know more about what's going on today, and Groq has definitely been the hottest thing.
[01:35:19] Alessio: We just recorded our monthly podcast today, and we didn't talk that much about Groq because we wanted you to talk more about it, and then we'll splice you into our monthly recap. So, let's start there.
[01:35:29] swyx + Josh Albrecht: Okay, so, you guys are the new Groq spreadsheeters. Yeah, yeah, so we broke out some Groq numbers because everyone was wondering, there's two things going on, right?
[01:35:37] swyx + Josh Albrecht: One, you know, how does it achieve the inference speed that it does? That has been demonstrated by GroqChat. And two, how does it achieve its price promise, that is sort of the public pricing of 27 cents per million tokens.
And there's been a lot of speculation or, you know, some numbers thrown out there.
[01:35:55] swyx + Josh Albrecht: I put out some tentative numbers and you put out different numbers. But I'll just kind of lay that as the groundwork. Like, everyone's like very excited about essentially like five times faster token generation than any other LLM currently. And that unlocks interesting downstream possibilities if it's sustainable, if it's affordable.
[01:36:14] swyx + Josh Albrecht: And so I think your question, or reading your piece on Groq, which is on the screen right now, is: is it sustainable?
[01:36:21] Dylan Patel: So like many things, this is VC funded, including this Boba. No, I'm just kidding, I'm paying for the Boba, so but, but... Thank you SemiAnalysis
[01:36:29] swyx + Josh Albrecht: subscribers
[01:36:31] Alessio: I hope he pays for it, I pay for it right now. That's
[01:36:33] Dylan Patel: true, that's true. Alessio has the IOU, right?
[01:36:36] Dylan Patel: And that's all it is, but yeah, like many things, you know, they're not making money off of their inference service, right? They're just throwing it out there for cheap and hoping to get business and maybe raise money off of that, and I think that's a fine use case, but the question is, like, how much money are they losing?
[01:36:53] Dylan Patel: Right, and that's sort of what I went through breaking down in this article that's on the screen. And it's pretty clear they're like 7 to 10x off, like, break even on their inference API, which is like horrendous, like far worse than any other sort of inference API provider. So this is like a simple cost thing that was pulled up.
[01:37:15] Dylan Patel: You can either inference at very high throughput, or you can inference at very low latency.
[01:37:20] Dylan Patel: With GPUs, you can do both. With Groq, you can only do one. Of course, with Groq, you can do that one faster.
Marginally faster than an inference latency optimized GPU server. But no one offers inference latency optimized GPU servers because you would just burn money, right? Makes no economic sense to do so.
[01:37:36] Dylan Patel: Until maybe someone's willing to pay for that. So Groq's service, you know, on the surface looks awesome compared to everyone else's service, which is throughput optimized. And then when you compare to the throughput optimized scenario, right, GPUs look quite slow, but the reality is they're serving, you know, 64, 128 users at once.
[01:37:54] Dylan Patel: Right, they have a batch size, right? How many users are being served at once, whereas Groq is taking 576 chips, and they're not really doing that efficiently, right? You know, they're serving a far fewer number of users, but extremely fast. Now, that could be worthwhile if they can get the number of users they're serving at once up, but that's extremely hard because they don't have memory on their chip, so they can't store KV cache for, you know, all the various different users.
[01:38:21] Dylan Patel: And so the crux of the issue is just like, hey, can they get that performance up as much as they claim they will, right? Which is, you know, they need to get it up more than 10x, right? To make this like a reasonable benefit, right? In the meantime, NVIDIA's launching a new GPU in two weeks, that'll be fun at GTC, and they're constantly pushing software as well, so we'll see if Groq can catch up to that.
[01:38:43] Dylan Patel: But the current verdict is, you know, they're quite far behind, but it's hopeful, you know, that maybe they can get there by, you know, scaling their system larger.
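
The batch-size argument above comes down to dividing a system's hourly cost by the total tokens it serves across all concurrent users. The sketch below shows that relationship only; every number in it (hourly costs, per-user speeds, user counts) is a made-up placeholder for illustration and is not taken from the article or from any vendor.

```python
def cost_per_million_tokens(
    system_cost_per_hour_usd: float,
    tokens_per_sec_per_user: float,
    concurrent_users: int,
) -> float:
    """Serving cost per million generated tokens for one system."""
    tokens_per_hour = tokens_per_sec_per_user * concurrent_users * 3600
    return system_cost_per_hour_usd / tokens_per_hour * 1_000_000


# Hypothetical throughput-optimized setup: slower per user, but a large batch.
throughput_optimized = cost_per_million_tokens(
    system_cost_per_hour_usd=10.0,   # assumed
    tokens_per_sec_per_user=30.0,    # assumed
    concurrent_users=64,             # batch size in the range mentioned above
)

# Hypothetical latency-optimized setup: much faster per user, far fewer users,
# and a far larger (more expensive) system.
latency_optimized = cost_per_million_tokens(
    system_cost_per_hour_usd=100.0,  # assumed
    tokens_per_sec_per_user=300.0,   # assumed
    concurrent_users=4,              # assumed
)

print(f"throughput-optimized: ${throughput_optimized:.2f} per million tokens")
print(f"latency-optimized:    ${latency_optimized:.2f} per million tokens")
```

Under these assumed numbers the fast system costs far more per token even though each user sees much higher speed, which is the shape of the break-even question being discussed: raising the number of users served at once is what brings the cost per token down.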
Yeah.
[01:38:52] swyx + Josh Albrecht: I was listening back to our original episode, and you were talking about how NVIDIA basically adopted this different strategy of just leaning on networking GPUs together.
[01:39:00] swyx + Josh Albrecht: And it seems like Groq has some, like, minor version of that going on here with the Groq rack. Is it enough? Like, what's Groq's next step here, like,
[01:39:12] Dylan Patel: strategically? Yeah, that's the next step is, of course, you know, so right now they connect 10 racks of chips together, right, and that's the system that's running on their API today, right.
[01:39:23] Dylan Patel: Whereas most people who are running, you know, Mistral are running it on two GPUs, right. So one fourth of a server. Yeah. And that rack is not, you know, obviously 10 racks is pretty crazy, but they think that they can scale performance if they have this individual system be 20 racks, right? They think they can continue to scale performance extra linearly.
[01:39:42] Dylan Patel: So that'd be amazing if they could, but I'm doubtful that that's gonna be something that's scalable, especially for, you know, larger models. So there's the
[01:39:56] Alessio: chip itself, but there's also a lot of work they're doing at the compiler level. Do you have any good sense of, like, how easy it is to actually work with LPU?
[01:40:04] Alessio: Like, is that something that is going to be a bottleneck for them?
[01:40:07] Dylan Patel: So Ali's in the front right there, and he knows a ton about VLIW architectures. But to summarize sort of his opinion, and I think many folks', it's extremely hard to program these sorts of architectures, right?
[01:40:19] Dylan Patel: Which is why they have their compiler and so on and so forth. But, you know, it's an incredible amount of work for them to stand up individual models and to get the performance up on them, which is what they've been working on, right?
Whereas, you know, GPUs are far more flexible, of course.
[01:40:33] Dylan Patel: And so the question is, you know, can this compiler continue to extract performance? Well, theoretically, like, there's a lot more performance to run on the hardware. But they don't have, you know, many things that people generally associate with programmable hardware.
[01:40:49] Dylan Patel: Right? They don't have buffers and many other things. So it makes it very tough to do that. But that's what their, you know, their relatively large compiler team is working on. Yeah,
[01:40:58] swyx + Josh Albrecht: So I'm not a GPU compiler guy. But I do want to clarify my understanding from what I read. Which is a lot of catching up to do.
[01:41:05] swyx + Josh Albrecht: The crux of it is some kind of speculative, like, the word that comes to mind is speculative routing of weights and, you know, work that needs to be done, or scheduling of work across the, you know, the 10 racks of GPUs. Is that like the bulk of the benefit that you get from
[01:41:25] Dylan Patel: the compilation?
[01:41:26] Dylan Patel: So with the Groq chips, what's really interesting is like with GPUs, you can issue certain instructions. And you will get a different result. Like, depending on the time, I know a lot of people in ML have had that experience, right? Where like, the GPU literally doesn't return the numbers it should be.
[01:41:45] Dylan Patel: And that's basically called non determinism, right? And with Groq, their chip is completely deterministic. The moment you compile it, you know exactly how long it will take to operate, right? There is no, like, deviation at all.
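
One commonly cited reason the same computation can return slightly different numbers from run to run is that floating-point addition is not associative, so a parallel reduction that happens to sum in a different order can produce a different result. The snippet below is a general illustration of that arithmetic fact in plain Python; it is not a model of any particular GPU kernel, and it says nothing about how Groq's compiler schedules work.

```python
# Floating-point addition is not associative: grouping changes the result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)  # the two groupings disagree in the last bit

# The effect is much larger when magnitudes differ widely.
big_first = (1e16 + 1.0) + -1e16   # the 1.0 is lost when added to 1e16
cancel_first = (1e16 + -1e16) + 1.0  # the large terms cancel before the 1.0 is added
print(big_first, cancel_first)
```

If the order of operations is fixed ahead of time, as in a fully static schedule, the result is reproducible; if the order depends on timing, it may not be.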
And so, you know, they're planning everything ahead of time, right, like, every instruction, like, it will complete in the time that they've planned it for.
[01:42:08] Dylan Patel: And there is no, I don't know what the best way to state this is. There's no variance there, which is interesting from, like, when you look historically, they tried to push this into automotive, right? Because automotive, you know, you probably want your car to do exactly what you issued it to do.
[01:42:22] Dylan Patel: And not have, sort of, unpredictability. But yeah, I don't, sorry, I lost track of the question.
[01:42:28] swyx + Josh Albrecht: It's okay, I just wanted to understand a little bit more about, like, what people should know about the compiler magic that goes on with Groq. Like, you know, like, I think, from a software, like, hardware point of view, that intersection of, you know,
[01:42:44] Dylan Patel: So chips have like, and I'm going to steal this from someone here in the crowd, but chips have like five, you know, sort of, there's like, when you're designing a chip, it's called PPA, right?
[01:42:54] Dylan Patel: Power, performance, and area, right? So it's kind of a triangle that you optimize around. And the one thing people don't realize is there's a third P that's like PPAP. And the last P is pain in the ass to program. And that is very important for, like, people making AI hardware, right?
[01:43:11] Dylan Patel: Like, TPU, without the hundreds of people that work on the compiler, and JAX, and XLA, and all these sorts of things, would be a pain in the ass to program. But Google's got that, like, plumbing. Now, if you look across the ecosystem, everything else is a pain in the ass to program compared to NVIDIA, right?
And this applies to the Groq chip as well, right?
[01:43:31] Dylan Patel: So, yeah, question is, like, can the compiler team get performance up anywhere close to theoretical? And then can they make it not a pain in the ass to support new models? Cool. We
[01:43:41] Alessio: got a question, we got a question from Ali. What's the average VLIW bundle occupancy of Groq? Bro,
[01:43:49] Dylan Patel: get out of here.
[01:43:52] Alessio: I don't know if he's setting you up, or if he
[01:43:54] Dylan Patel: wants to chime in. I think he's setting me up, I think he's setting me up. So, okay,
[01:43:58] swyx + Josh Albrecht: what is VLIW for
[01:44:00] Dylan Patel: the rest of us? It's like, very long instruction word is basically what it means. And, hm. So GPUs are relatively simple, right? They're tiny little cores, very simple instructions, there's a shitload of them, right?
[01:44:16] Dylan Patel: CPUs, you know, they have a known instruction set, right? x86. It's very complicated but people have worked on it for decades. VLIW processors are very unique in that sense, right? Like, and your question, Ali, I cannot answer that question. I have no clue. Is it documented anywhere online?
[01:44:35] Dylan Patel: Anyway, so like the systolic array, right? Like within the TPU, there's a bunch of stuff, but the actual matrix multiply unit, it's called the MXU, and it's a VLIW architecture as well. And I'm just trying to find a, yeah, I just want to find something that makes me not sound like an idiot.
[01:44:51] swyx + Josh Albrecht: Sometimes I also like to ballpark things in terms of, like, where a good middle median value should be and where like a good high value should be. Sorry.
You, you, you
[01:45:03] Dylan Patel: can ballpark things like, you know, like, yeah, so, but basically like the point is like you're trading, this is theoretically the most optimal architecture for performance, power and area in a given, and you know, not specifically Groq, but VLIW in general is gonna get you closer to optimal there, but then you're giving up, you know, that last P, which is pain in the ass to program, is I think the most simple way to get into it.
[01:45:27] Dylan Patel: There's like, computer architecture books about this, but it's a little complicated, right? Yeah.
[01:45:35] Alessio: Somebody asked, there's a lot of questions, that's great. Can we talk about LPU, Cerebras, Tenstorrent, some of these other architectures. How should people think about Maxim, SRAM versus Mix versus
[01:45:49] Dylan Patel: Yeah, yeah.
[01:45:50] Dylan Patel: So there's a lot of ML hardware out there, new and old, right? There's old stuff that's trying to compete, there's new stuff that's coming up, you know, companies like MatX and Lemurian Labs and so on and so forth, right? You know, but so there's like a continuum of like, everyone before, say, two years ago that was doing ML hardware bet in one direction, right?
[01:46:11] Dylan Patel: We're gonna make an architecture that has more on chip memory than NVIDIA, right? Like, that was the general bet everyone made. Right? And so like Groq made that bet, they made it to the extreme, right? They didn't have any off chip memory at all. Only on chip memory. You have Cerebras who did a similar thing except, they were like, Yeah, we're gonna have on chip memory, but we're gonna make a chip that's the size of a wafer.
[01:46:33] Dylan Patel: Right? Like literally this big. Whereas an NVIDIA chip is roughly this big, right? So it's like this big, it's the only chip in the world that's that big. But again, same bet.
More on chip memory, less off chip, right? Graphcore and SambaNova made a similar bet. And, and every, basically everyone made that bet.
[01:46:49] Dylan Patel: Cause they thought that's where ML would go. Of course, models grew faster than anyone ever imagined. Yeah, than the memory that was possible. And so that, that very quickly became the wrong bet. And so now we're, you know, sort of seeing a new wave of startups that are going to bet on the other side, as well as many other, you know, architectural things because memory is not really the only architectural thing, of course.
[01:47:08] Dylan Patel: And so, like, where to, where to, like, place startups is, is very dependent on, like, Hey, what are you doing differently than NVIDIA? And is NVIDIA just going to implement that in their chip next year, right? Or, or some version of that. That's, like, pretty much the only thing to think about when looking at, you know, hardware companies now.
[01:47:27] Dylan Patel: Cool.
[01:47:28] Alessio: And, yeah, I, I think the, the question is like, there's the size of the models that got outrun, but now you're doing all this work at the compiler level, but it's very transformer based, everything they're doing on the optimization side. How, how do you think about that risk? Like, do you think it's okay for like a hardware company to take like architectural risk in terms of like, yeah, we assume transformers in two years, they'll still be pretty good.
[01:47:51] Alessio: But when you're like depreciating some of this cost over, like, five years as a buyer.
[01:47:56] Dylan Patel: Yeah, yeah, that's, that's the biggest challenge with like some of the specialized hardware, right? It's like, I know my GPUs will be useful in four years or five years.
Maybe not, like, super useful, but they'll be useful for something.
[01:48:07] Dylan Patel: But, there's no way to know that my hardware is going to be able to operate on whatever new model architecture that comes out in the next few years, right? Like, I, I, I like to joke transformers are all you need. And like everything else is like a waste of time. But, you know, I'm sure something better will come.
[01:48:26] Dylan Patel: Right? And, and, you know, you gotta have like, hardware is expensive and you own it for many years. Right? So you can't just like buy whatever's best for today's workload one time and then assume that workload is gonna stay stagnant. Cause that's a recipe to have your like hardware useless as soon as like things evolve.
[01:48:43] Dylan Patel: Right? Like imagine if someone like had hardware for LTSMs in 2016 or whatever, right? Like, LSTMs. Yeah, LSTM, sorry. You'd look like an idiot, right? Because now it's not gonna work for, you know, the next architecture, right? As soon as BERT came out, right? For example. So yeah, it's, it's very, anything super, super specialized is always at risk of, of being sort of obsoleted and useless.
[01:49:06] Dylan Patel: And, and sort of that's, that's the, that's the thought that like, hey, like, like Graphcore, right? Their chips are pretty decent at GNNs, right? Graph Neural Networks. They're actually pretty decent at that. But no one cares, right? So, congratulations, right? Like, you won, you won like the shortest midget, right?
[01:49:24] swyx + Josh Albrecht: Mentioning "transformers are all you need" gives us a nice opportunity to bring out one of your old tweets, but also mention Gemini.
[01:49:30] Dylan Patel: My old tweets, I'm scared.
[01:49:33] swyx + Josh Albrecht: Recent tweets. There's a lot of people talking about, like I think you had a tweet commenting on Gemini 1.5.
And the million token context where basically everyone was saying, like, okay, we need Mamba, we need RWKV, or we need some other alternative architecture to scale to long context.
[01:49:48] swyx + Josh Albrecht: And Google comes out and says, no, we just, we scaled transformers to 10 million tokens. Easy. We, and, you know, like, I, I think that, that kind of, like, reflects on your thesis there a little bit.
[01:49:59] Dylan Patel: I guess, yeah. I mean, I don't know if I, if I have a coherent thesis, but it's, it's sure fun to think that, like, I, I, I just have an intense hatred for RAG.
[01:50:11] Dylan Patel: Right, like retrieval augmented generation is, is, is like the most like, I just have an intense like innate hatred for it.
[01:50:18] swyx + Josh Albrecht: Wait, wait, you retweeted me defending RAG in the White House press release.
[01:50:21] Dylan Patel: Yeah, yeah, yeah. Okay. But it's just fun, it's all fun and games.
[01:50:22] swyx + Josh Albrecht: Yeah, yeah, yeah, it's all fun and games.
[01:50:24] Dylan Patel: Yeah.
[01:50:25] Dylan Patel: No, no, no, I retweeted, I retweeted you because you memed the White House. I don't know if y'all saw the meme. Can you pull it up? Sure. Like the, the White House put out this thing about like, They're getting very opinionated with this White House. Memory safety. I think it was effectively like, C is bad and Rust is good.
[01:50:39] Dylan Patel: It was like pretty wild that the White House put that out. And I mean like, like whatever that is, so, so,
[01:50:46] swyx + Josh Albrecht: So like, they just got very opinionated about prescribing languages to people. And so then I was, I just like started editing them. So I have "stop comparing RAG with long context and fine tuning."
[01:50:55] Dylan Patel: Wait, you said I retweeted you defending it. I thought you were hating on it.
And that's why I retweeted it.
[01:51:00] swyx + Josh Albrecht: It's somewhat of a defense. Because everyone was like long context is killing RAG. And then I had "future LLMs should be sub quadratic." That's another one. And I actually messed with the fine print as well.
[01:51:11] Alessio: Let's see, power benefits of SRAM dominant
[01:51:13] Dylan Patel: Yeah, yeah. So, so that's a good question, right? So, like, SRAM is on chip memory. Everyone's just using HBM. If you don't have to go to off chip memory, that'd be really efficient, right?
[01:51:23] Dylan Patel: Cause, cause you're, you're not moving bits around. But there's always the issue of you don't have enough memory, right? So, so you still have to move bits around constantly. And so that's the, that's the question. So, yeah, sure. If you, if you can not move data around as you compute, it's going to be fantastically efficient.
[01:51:39] Dylan Patel: That isn't really easy or simple to do.
[01:51:42] Alessio: What do you think is going to be harder in the future, like getting more energy at cheaper costs or like getting more of this hardware to run?
[01:51:48] Dylan Patel: Yeah, I wonder, so someone was talking about this earlier but it's like here in the crowd and I'm looking right at him but he's complaining that journalists keep, you know, that, that, like misreporting about how data centers, or what data centers are doing to the environment.
[01:52:03] Dylan Patel: Right? Which I thought was quite funny, right? Cause, cause they're inundated by journalists talking about data centers like destroying the world. Anyways, you know, that's not quite the case, right? But yeah, I don't know, like, the, the, the power is certainly going to be hard to get, but, you know, I think, I think if you just look at history, right?
[01:52:22] Dylan Patel: Like humanity, especially America, right? Like, power, power production and usage kept skyrocketing.
From like the 1700s to like 1970s, and then it kind of flatlined from there, so why can't we like go back to the like growth stage, I guess is like the whole like mantra of like accelerationists, I guess.
[01:52:40] Dylan Patel: This is e/acc, yep. Well I don't think it's e/acc, I think it's like, like Sam Altman like wholly believes this too, right? Yeah. And I don't think he's e/acc. So, but yeah, like, like, I don't think like, it's like things, it's like something to think about, right? Like, the US is going back to growing in energy usage whereas for the last like 40 years we were kind of flat on energy usage.
[01:53:00] Dylan Patel: And what does that mean, right? Like, yeah.
[01:53:04] Alessio: Fair enough. There was another question on Marvell but kind of the,
[01:53:07] Dylan Patel: I think that's, it's, it's, it's definitely like one of these three guys who are on the buy side that are asking this question. What, what, what, you want to know if Marvell's stock is gonna go up?
[01:53:18] Dylan Patel: Yeah.
[01:53:19] Alessio: So Marvell, the, they're, they're doing the custom silicon for, for Groq. They also do the Trainium 2. And the, the Google CPU. Yeah. Any other, any other chip that they're working on that people should, should keep in mind? It's like, yeah. Any needle moving and it's any stock moving.
[01:53:34] Dylan Patel: Yeah, exactly. Exactly. They're, they're working on some more stuff.
[01:53:38] Dylan Patel: Yeah. I, I'll, I'll, I'll refrain from,
[01:53:40] Alessio: Yeah. All right. Let's see, other Groq stuff we want to get it, get through. I don't think so. Alright, most of the other ones. Your view on edge compute hardware. Any real use cases for it?
[01:53:54] Dylan Patel: Yeah, I mean, I, I, I have like a really like anti edge view. Yeah, let's hear it.
[01:53:58] Dylan Patel: Like, like, so many people are like, oh, I'm going to run this model on my phone or on my laptop and... I love how much it's raining. So now I can be horrible and you people won't leave.
Like, I want you to try and leave this building. Captive audience. Seriously, should I start singing? Like, there's nothing you can do.
[01:54:18] Alessio: You definitely, I'll stop you from that.
[01:54:19] Dylan Patel: Sorry, so edge hardware, right? Like, you know, people are like, I'm going to run this model on my phone or my laptop. It makes no sense to me. Cause current hardware is not really capable of it. So you're gonna buy new hardware to run whatever on the edge or you're gonna just run very, very small models.
[01:54:36] Dylan Patel: But in either case, you're, you're gonna end up with like the performance is really low, and then whatever you spent to run it locally, like if you spent it in the cloud, it could service 10x the users, right? So you're kind of like, SOL in terms of like, economics of, of running things on the edge. And then like latency is like, for, for LLMs, right, for LLMs, it's like not that big of a deal relative to, like internet latency is not that big of a deal relative to the use of the model, right?
[01:55:08] Dylan Patel: Like the actual model operating, whether it's on edge hardware or cloud hardware. And cloud hardware is so much faster. So like edge hardware is not really able to like, have a measurable, appreciable, like advantage over, over cloud, cloud hardware. This applies to diffusion models, this applies to LLMs. Of course small models will be able to run, but not, not all, yeah.
[01:55:33] Dylan Patel: Cool.
[01:55:35] Alessio: Let's see. I guess you, you can now see them. Yeah, what chance do startups like MatX, Etched, or 5. 6?
[01:55:41] Dylan Patel: Haven't you already reviewed them? Why don't you, why don't you answer?
[01:55:43] swyx + Josh Albrecht: Yeah, we, we actually, like, we have connections with MatX and Lemurian. Yeah, yeah, yeah. We haven't, no.
But Gavin is
[01:55:52] Alessio: Yeah, yeah, they said they don't want to talk publicly.
[01:55:55] Alessio: Oh, okay, okay.
[01:55:57] swyx + Josh Albrecht: When they open up, we can
[01:56:00] Alessio: Sure, sure. But do you think, like, I think the two, three
[01:56:02] Dylan Patel: Answer the question! What do you think of them?
[01:56:06] Alessio: I think, kind of, there's a couple things. It's like, how do the other companies innovate against them? I think when you do a new silicon, you're like, Oh, we're going to be so much better at this thing or like much faster, much cheaper.
[01:56:18] Alessio: But there's all the other curves going down on the macro environment at the same time. So if it takes you like five years before you were like a lot better, five years later, once you take the chip out, you're only comparing yourself to the five year advancement that the major companies had too. So then it's like, okay, the, we're going to have like the C300, whatever, from, from NVIDIA.
[01:56:37] Alessio: By the time some of these chips come up.
[01:56:40] Dylan Patel: What's after Z? What do you think is after Z in the road map? Because it's X, Y, Z. Anyways. Yeah, yeah, it's like the age old problem, right? Like you build a chip, it has some cool thing, cool feature, and then like, a year later, NVIDIA has it in hardware, right? Has implemented some flavor of that in hardware.
[01:57:01] Dylan Patel: Or two generations out, right? Like, what idea are you going to have that NVIDIA can't implement, is like, really the question. It's like, you have to be fundamentally different in some way that holds through for, you know, four or five years, right? That's kind of the big issue. But, you know, like, those people have some ideas that are interesting, and yeah, maybe it'll work out, right?
[01:57:21] Dylan Patel: But it's going to be hard to fight NVIDIA, who one, doesn't consider them competition, right? They're worried about, like, Google and Amazon's chip.
Right, they're not, and I guess to some extent AMD's chip, but like they're not really worried about, you know, MatX or Etched or Groq or, you know, Positron or any of these folks.
[01:57:39] Alessio: How much of an advantage do they have by working closely with like OpenAI folks and then already knowing where some of the architecture decisions are going? And since those companies are like the biggest buyers and users of the chips?
[01:57:51] Dylan Patel: Yeah, I mean, like, you see, like, the most important sort of AI companies are obviously going to tell hardware vendors what they want, you know, OpenAI and, you know, so on and so forth, right?
[01:58:02] Dylan Patel: They're just going to obviously tell them what they want and the startups aren't actually going to get anywhere close to as much feedback on what to do on, like, you know, very minute, low level stuff, right? So that's, that's the, that is a difficulty, right? Some startups, like, like, MatX obviously have people who built, or worked on the largest models, like at Google, but then other startups might not have that advantage and so they're always gonna have that issue of like, hey, how do I get the feedback, or what's changing, what do they see down the pipeline that's, that I really need to be aware of and ready for when I design my hardware.
[01:58:37] Dylan Patel: Alright.
[01:58:38] Alessio: Every hardware shortage has eventually turned into a glut. Will that be true of NVIDIA chips? If so, when, but also why?
[01:58:45] Dylan Patel: Absolutely, and I'm so excited to buy like H100s for like $1,000, guys. No, that's not $1,000, but... Yeah, everyone's gonna buy chips, right? Like, it's just the way semiconductors work, because the supply chain takes forever to build out.
[01:58:58] Dylan Patel: And it's, it's like a really weird thing, right? Like, so, so if the backlog of chips is a year, people will order, you know, two years worth of what they want for the next year. It is like a very common thing.
It's not just like this AI cycle, but like, like, like microcontrollers, right? Like the automotive companies, they order two years worth of what they needed for one year, just so they could get enough, right?
[01:59:21] Dylan Patel: Like, this is just like what happens in semiconductors when, when lead times lengthen, the, the purchases and inventory sort of like double. Sorry. So, so these, the, the NVIDIA GPU shortage obviously is going to be rectified. And when it is, everyone's sort of double orders will become extremely apparent, right?
[01:59:42] Dylan Patel: And, you know, you, you see like random companies out of nowhere being like, Yeah, we've got 32,000 H100s on order, or we've got 10,000 or 5,000. And trust, they're not all, they're not all real orders for one, but I think, I think the like bubble will continue on for a long time, right, like it's not, it's not going to end like this year, right, like people, people need AI, right, like I think everyone in this audience would agree, right, like there's no, there's no like immediate like end to the, to the bubble, right.
[02:00:09] Dylan Patel: Party like we're in 1995, not like 2000. Makes sense.
[02:00:12] Alessio: What's next? Thoughts on VLIW architectures?
[02:00:16] Dylan Patel: Oh, why, why, sorry, sorry, why. The why question, yeah, yeah. I think it's just because the supply chain expands so much, and then at the same time there will be no, like, economic, like, immediate economic thing for everyone, right?
[02:00:28] Dylan Patel: Like, some companies will continue to buy, like an OpenAI or Meta will continue to buy, but then, like, all these random startups will, or a lot of them will not be able to continue to buy, right? So then, so then that like kind of leads to like, they'll pause for a little bit, right? Or like, I think in 2018, right?
[02:00:45] Dylan Patel: Like memory pricing was extremely high.
Then all of a sudden Google, Microsoft, and Amazon all agreed, I don't, you know, you know, they don't, they won't, they won't say it's together, but they basically all agreed, like, within the same week to stop ordering memory. And within like a month, the price of memory started tanking like insane amounts, right?
[02:01:06] Dylan Patel: And like people claim, you know, all sorts of reasons why that was timed extremely well. But it was like very clear and people in the financial markets were able to make trades and everything, right? People stopped buying and it's not like their demand just dried up. It's just like they had a little bit of a demand slowdown and then they had enough inventory that they could like weather until like prices tanked.
[02:01:26] Dylan Patel: Because it's such an inelastic good, right? Yeah.
[02:01:29] swyx + Josh Albrecht: Thank you very much. That's it.
[02:01:35] AI Charlie: That concludes our audio segment this weekend. But if you're listening all the way to the end, we have two bonus segments for you. A conversation with Milind Naphade, Senior Vice President of AI at Capital One, who'll be speaking at the AI Leadership Track of the AI Engineer World's Fair. And the recent Latent Space Personal AI Meetup featuring a lot of new AI wearables: Bee, Based Hardware, Deepgram's EmilyAI, and LangChain's LangFriend and LangMem, presented by another former guest, Harrison Chase. Watch out and take care.
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
Transcript
Welcome to the Latent Space Podcast Weekend Edition.
This is Charlie, your AI co-host.
swyx and Alessio are off for the week, making more great content.
We have exciting interviews coming up with Elicit, Chroma, Instructor,
and our upcoming series on NSFW, Not Safe for Work AI.
In today's episode, we're collating some of swyx and Alessio's recent appearances
all in one place for you to find.
In part one, we have our first crossover podcast
of the year. In our listener survey, several folks asked for more thoughts from our two hosts.
In 2023, swyx and Alessio did crossover interviews with other great podcasts like the AI
Breakdown, Practical AI, Cognitive Revolution, ThursdAI and ChinaTalk, all of which you can
find on the Latent Space About page. NLW of the AI Breakdown asked us back to do a special
on the Four Wars framework and the AI engineer scene. We love AI
Breakdown as one of the best daily podcasts to keep up on AI news, so we were especially excited
to be back on. Watch out and take care.
Today on the AI Breakdown, part one of my conversation
with Alessio and swyx from Latent Space. All right, fellows, welcome back to the AI Breakdown. How are you
doing? Very good. Yeah. With the last time we did this show, we were like, oh yeah, let's do
check-ins like monthly about all the things that are going on. And then, of course, six months later
and, you know, the world has changed in a thousand ways.
It's just, it's too busy to even think about podcasting sometimes.
But I'm super excited to be chatting with you again.
I think there's a lot to catch up on just that's happened, I think, in the beginning of 2024.
And so, you know, we're going to talk today about just kind of a broad sense of where things are in some of the key battles in the AI space.
And then, you know, one of the big things that I'm really excited to have you guys on here for is to talk about where, sort of what patterns you're seeing and what people are
actually trying to build, you know, where developers are spending their time and energy and any
sort of, you know, trends there. But maybe let's start, I guess, by checking in on a framework
that you guys actually introduced, which I've loved and I've cribbed a couple of times now, which is
the sort of four wars of the AI stack. Because first, since I have you here, I'd love to hear
sort of like where that started jelling. And then maybe we can get into, I think, a couple of them that
are, you know, particularly interesting, you know, in light of some recent news. Yeah, so maybe I'll
take this one. So the Four Wars is a framework that I came up with around trying to recap all of
2023. I tried to write sort of monthly recap pieces. And I was trying to figure out like what makes
one piece of news last longer than another or more significant than another. And I think it's basically
always around battlegrounds. Wars are fought around limited resources. And I think probably the, you know,
the most limited resource is talent, but the talent expresses itself in a number of areas. And so I kind
of focus on those areas first. So the four wars that we cover are the Data Wars, the GPU
Rich/Poor War, the Multimodal War, and the RAG and Ops War. And I think you actually did a
dedicated episode to that. So thanks for covering that. Yeah, yeah. Not only did I do a dedicated
episode, I actually use that. I can't remember if I told you guys. I did give you big shoutouts,
but I used it as a framework for a presentation at Intel's big AI event that they hold each year,
where they have all their folks who are working on AI internally,
and it totally resonated.
That's amazing.
That's amazing.
Yeah.
So what got me thinking about it again is specifically this Inflection news that we recently
had, this sort of, you know, basically, I can't imagine that anyone who's listening
wouldn't have thought about it.
But, you know, Inflection is one of the big contenders, right?
I think probably most folks would have put them, you know, just a half step behind the
Anthropics and OpenAIs of the world in terms of labs,
but it's a company that raised $1.3 billion last year, less than a year ago.
Reid Hoffman's a co-founder, a co-founder of DeepMind.
So it's like this is not a small startup, let's say, at least in terms of perception.
And then we get the news that basically most of the team it appears is heading over to Microsoft
and they're bringing in a new CEO.
And, you know, I'm interested in kind of your take on how much that reflects,
like, setting aside, I guess, you know, all the other things that it might be about,
how much it reflects this sort of the stark, brutal reality of competing in the frontier model
space right now and just the access to compute.
There are a lot of things to say.
So first of all, there's always somebody who's more GPU rich than you. So Inflection is GPU
rich by startup standards. I think about 22,000 H100s, but obviously that pales compared to
Microsoft. The other thing is that this is probably good news, maybe,
for the startups. It's like, being GPU rich is not enough.
You know, like I think they were building something pretty interesting in Pi.
Their own model, their own kind of experience.
But at the end of the day, the interface that people consume as end users is really similar
to a lot of the others.
So, and we'll talk about GPT-4 and Claude 3 and all this stuff.
Sometimes when you're a startup, you're going to have a lot of success being GPU poor,
doing something that the GPU rich are not interested in, you know.
We just had our AI Center of Excellence at Decibel.
And one of the AI leads at one of the big companies was like, oh, we just save $10 million
and we use these models to do a translation, you know, and that's it.
It's not AGI.
It's just translation.
So I think like the Inflection part is maybe a wake-up call to a lot of startups and
saying, hey, you know, trying to get as much capital as possible, trying to get as many
GPUs as possible.
It's good.
But at the end of the day, it doesn't build a business.
And maybe what Inflection...
I don't, again, I don't know the reasons behind the Inflection choice, but if you say,
I don't want to build my own company that has $1.3 billion and I want to go do it at Microsoft,
it's probably not a resources problem.
It's more of strategic decisions that you're making as a company.
So, yeah, that was kind of my take on it.
Yeah.
And I guess on my end, two things actually happened yesterday.
It was a little bit quieter news, but Stability AI had some pretty major departures as well,
and you may not be considering it,
but Stability is actually also a GPU-rich company
in the sense that they were the first new startup in this AI wave
to brag about how many GPUs that they have,
and that you should join them.
And, you know, Emad is definitely a GPU trader
in some sense from his hedge fund days.
So Robin Rombach and like most of the Stable Diffusion 3 people
left Stability yesterday as well.
So yesterday was kind of like a big news day
for the GPU-rich companies,
both Inflection and Stability having sort of wind taken out of their sails.
I think, yes, it's a data point in favor of, like, just because you have the GPUs doesn't
mean you automatically win.
And I think, you know, kind of, I'll echo what Alessio says there.
But in general also, like, I wonder if this is like the start of a major consolidation wave
just in terms of, you know, I think that there was a lot of funding last year.
And, you know, the business models have not been worked out very well.
Even inflection couldn't do it.
And so I think maybe that's the start of a small
consolidation wave. I don't think, like, that's a sign of AI winter. I keep looking for
AI winter coming. I think this is kind of like a brief cold front. Yeah. It's super interesting.
So I think a bunch of stuff here. One is I think to both of your points, in some ways
there's already been this very clear demarcation between these two sides where like the GPU
poors, to use the terminology, like just weren't trying to compete on the same level. You know,
the vast majority of people who have started something over the last year, year and a half call it,
were racing in a different direction. They're trying to find some edge somewhere else. They're
trying to build something different if they're really trying to innovate. It's in different areas.
And so it's really just this very small handful of companies that are in this like very, you know,
it's like the Coheres and Jaspers of the world that like this sort of, you know, that are
just sort of a little bit less resourced than, you know, than the other set that I think that this
potentially even applies to. You know, everyone else is kind of
clearly demarcated into these two sides. And there's only a small handful kind of sitting
uncomfortably in the middle, perhaps. Let's come back to the idea of this sort of AI winter or,
you know, a cold front or anything like that. So this is something that I spent a lot of time
kind of thinking about and noticing. And my perception is that the vast majority of the folks who
are trying to call for sort of, you know, a trough of disillusionment or, you know, a shifting of the
phase to that are people
who either, A, just don't like AI for some other reason. There's plenty of that, you know,
people who are saying, look, they're doing way worse than they ever thought. You know,
there's a lot of sort of confirmation bias kind of thing going on. Or two, media that just needs a
different narrative because they're sort of sick of, you know, telling the same story.
Same thing happened last summer when every, every outlet jumped on the "ChatGPT had its first down
month" story to try to really like kind of hammer this idea that the hype was too much.
Meanwhile, you have, you know, just ridiculous levels of investment from enterprises, you know, coming in. You have, you know, huge, huge volumes of, you know, individual behavior change happening. But I do think that there's nothing incoherent sort of to your point, swyx, about that and the consolidation period. Like, you know, if you look right now, for example, there are, I don't know, probably 25 or 30 credible, like, build your own chatbot platforms that, you know,
a lot of which have, you know, raised funding.
There's no universe in which all of those are successful across, you know, even with
a, even with a total addressable market of every enterprise in the world, you know,
you're just inevitably going to see some amount of consolidation.
Same with, you know, image generators.
There are, if you look at a16z's top 50 consumer AI apps just based on, you know,
web traffic or whatever, there's still like, I don't know, a half dozen or 10 or something,
like some ridiculous number of like basically things like Midjourney or DALL-E 3.
And it just seems impossible that we're going to have that many, you know, ultimately as as sort of,
you know, going, going concerns. So I don't know. I think that there will be inevitable
consolidation because, you know, just it's also what kind of like venture rounds are supposed
to do. You're not, not everyone who gets a seed round is supposed to get to series A and not everyone
who gets to series A is supposed to get to series B. That's sort of the natural process.
I think it will be tempting for a lot of people to try to infer
from that something about AI not being as sort of big or as sort of relevant as it was
hyped up to be. But I kind of think that's the wrong conclusion to come to.
I would say the experimentation surface is a little smaller for image generation. So if you go back
maybe six, nine months, most people will tell you why would you build a coding assistant when like
co-pilot and GitHub are just going to win everything because they have the data and
they've all the stuff. If you fast forward today, a lot of people use cursor. Everybody was excited
about the Devon release on Twitter. There are a lot of different ways of attacking the market
that are not completion of code, the ID. And even cursors, like, they evolved beyond single line
to like chat, to do multi-line edits and all that stuff. Image generation, I would say, yeah,
just as from what I've seen, like maybe the product innovation has slowed down at the
Ux level and people are improving the models.
So the race is like, how do I make better images?
It's not like, how do I make the user interact with the generation process better?
And that gets tough.
You know, it's hard to like really differentiate yourselves.
So yeah, that's kind of how I look at it.
And when we think about multi-modality, maybe the reason why people got so excited about Sora
is like, oh, this is like a completely, it's not a better image model.
This is like a completely different thing, you know?
And I think the creative mind is always looking for something that impacts the viewer in a different way.
You know, like they really want something different versus the developer mind.
It's like, oh, I have this like very annoying thing.
I want better.
I have these like very specific use cases that I want to go after.
So it's just different.
And that's why you see a lot more companies in image generation.
But I agree with you that if you fast forward, there's not going to be 10 of them.
You know, it's probably going to be one or two.
Yeah, I mean, to me, that's why I call it a war.
Like, individually, all these companies can make a story that kind of makes sense,
but collectively they cannot all be true.
Therefore, they all, there is some kind of fight over limited resources here.
Yeah, so it's interesting.
We wandered very naturally into sort of another one of these wars,
which is the multimodality kind of idea,
which is, you know, basically a question of whether it's going to be these sort of big
everything models that end up winning,
or whether, you know, you're going to have really specific things, you know, like something, you know, DALL-E 3 inside of sort of OpenAI's larger models versus, you know, a Midjourney or something like that.
And at first, you know, I was kind of thinking like for most of the last call it six months or whatever, it feels pretty definitively both/and in some ways, you know, in that you're seeing just like great innovation on sort of the everything models.
But you're also seeing lots and lots happen at sort of the level of kind of individual
use cases. But then Sora comes along and just like obliterates what I think anyone thought,
you know, where we were when it comes to video generation. So how are you guys thinking about this
particular battle or war at the moment? Yeah. This was definitely a both/and story, and Sora
tipped things one way for me in terms of scale being all you need. And the benefit, I think,
of having multiple models being developed under one roof.
I think a lot of people aren't aware that Sora was developed in a similar fashion to DALL-E 3.
And DALL-E 3 had a very interesting paper out where they talked about how they sort of bootstrapped their synthetic data based on GPT-4 Vision and GPT-4.
And it was just all like really interesting.
Like if you work on one modality, it enables you to work on other modalities.
And all that is more beneficial
if it's all in the same house. Whereas the individual startups who don't, who sort of carve out
a single modality and work on that, definitely, you know, won't have the state-of-the-art stuff
helping them out on synthetic data. So I do think like the balance is tilted a little bit
towards the God model companies, which is challenging for the sort of dedicated modality
companies. But everyone's carving out different niches. You know, like we just interviewed
Suno AI, the sort of music model company.
And, you know, I don't see OpenAI pursuing music anytime soon.
Yeah, Suno's been phenomenal to play with.
Suno has done that rare thing where, which I think a number of different AI product
categories have done, where people who don't consider themselves particularly interested
in doing the thing that the AI enables find themselves doing a lot more of that thing, right?
Like, it'd be one thing if just musicians were excited about Suno and using it.
But what you're seeing is tons of people who just like music all of a sudden, like playing around
with it and finding themselves kind of down that rabbit hole, which I think is kind of like the
highest compliment that you can give one of these startups at their early days of it.
Yeah. I asked them directly in the interview about whether they consider themselves Midjourney
for music. He had a more sort of nuanced response there, but I think that probably the business
model is going to be very similar because he's focused on the B2C element of that. So yeah,
I mean, you know, just to tie back to the question about, you know, large multi-modality companies
versus small dedicated modality companies.
Yeah, I highly recommend people read the Sora blog post
and then read through to the DALL-E blog post
because they strongly correlated themselves
with the same synthetic data bootstrapping methods as DALL-E.
And I think once you make those connections,
you're like, oh, it is beneficial
to have multiple state-of-the-art models in-house
that all help each other.
And that's the one thing that a dedicated modality company cannot do.
So I want to jump, I want to kind of build off that
and move into the sort of like updated GPT-4-class
landscape because that's obviously been another big change over the last couple months.
But for the sake of completeness, is there anything that's worth touching on with sort of the
quality data or sort of RAG/Ops wars just in terms of anything that's changed, I guess, for you
fundamentally in the last couple months about where those things stand?
So I think we're going to talk about RAG for the Gemini and Claude discussion later.
And so maybe briefly discuss the data piece.
I think maybe the only new thing was this Reddit deal with Google for
like a $60 million deal just ahead of their IPO, very conveniently turning Reddit into an AI data company.
Also very interestingly, a non-exclusive deal, meaning that Reddit can resell that data to someone else.
And it probably does become table stakes.
A lot of people don't know, but a lot of the WebText dataset that originally started for GPT-1, 2, and 3
was actually scraped from Reddit, at least the sort of vote scores.
And I think that's a very valuable piece of information.
So, like, yeah, I think people are figuring out how to pay for data.
People are suing each other over data.
This war is definitely very, very much heating up.
And I don't think, I don't see it getting any less intense.
Next to GPUs, data is going to be the most expensive thing in a model stack company.
And, you know, a lot of people are resorting to synthetic versions of it, which may or may not be kosher
based on how far along or how commercially blessed the forms of creating the synthetic data are.
I don't know if, Alessio, you have any other interactions with like data source companies,
but that's my two cents.
Yeah.
Yeah, I actually saw Quentin Anthony from EleutherAI at GTC this week.
He's also been working on this.
I saw Teknium.
He's also been working on the data side.
I think especially in open source, people are like, okay, if everybody is putting the gates up,
so to speak, to the data, we need
to make it easier for people that don't have 50 million a year to get access to good
datasets. And Jensen at his keynote, he did talk about synthetic data a little bit. So I think that's
something that we'll definitely hear more and more of in the enterprise, which never bodes well,
because then all the people with the data are like, oh, the enterprises want to pay now? Let me,
let me put a pay-here Stripe link so that they can give me $50 million. But it worked for Reddit.
I think the stock is up 40% today after opening. So yeah, I don't
know if it's all about the Google deal, but obviously Reddit has been one of those companies
where, hey, you got all this great community, but like, how are you going to make money?
And, like, they tried to sell the avatars. I don't know if that's a great business for them.
No.
The data part sounds, as an investor, you know, the data part sounds a lot more interesting than
consumer cosmetics. Yeah. Yeah. So I think, you know, there's more questions around data.
You know, I think a lot of people are talking about the interview that Mira Murati did with the Wall Street Journal,
where she just basically had no good answer for where they got the data for Sora.
I think this is where, you know, it's in nobody's interest to be transparent about data.
And it's kind of sad for the state of ML and state of AI research.
But it is what it is.
We have to figure this out as a society, just like we did for music and music sharing, you know,
in sort of the Napster to Spotify transition.
And that might take us a decade.
Yeah, I agree.
I think that you're right to identify it, not just as a sort
of technical problem, but as one where society has to have a debate with itself. Because I think that
there's, if you sit rationally within it, there's great kind of points on all sides, not to be the sort of,
you know, person who sits in the middle constantly. But it's why I think a lot of these legal
decisions are going to be really important because, you know, the job of judges is to listen to
all this stuff and try to come to things and then have other judges disagree and, you know,
and have the rest of us all debate at the same time. By the way, as a total aside, I feel like the
synthetic data right now is like eggs in the 80s and 90s, like whether they're good for you or bad for you.
Like, you know, we get one study that's like synthetic data, you know, there's model collapse.
And then we have like a hint that Llama 2, you know, the most high-performance version of it, which was one they didn't release, was trained on synthetic data.
So maybe it's good.
I just feel like every other week I'm seeing something sort of different about whether it's good or bad for these models.
Yeah.
The branding of this is pretty poor.
I would kind of tell people to think about it like cholesterol.
There's good cholesterol, bad cholesterol,
and you can have good amounts of both.
But at this point, it is absolutely without a doubt
that most large models from here on out
will all be trained on some kind of synthetic data,
and that is not a bad thing.
There are ways in which you can do it poorly,
whether it's commercial, you know,
in terms of commercial sourcing
or in terms of the model performance.
But it's without a doubt that good
synthetic data is going to help your model. And this is just a question of where to obtain it
and what kinds of synthetic data are valuable. Even like AlphaGeometry, you know, was a really
good example from like earlier this year. If you're using the cholesterol analogy, then my,
then my egg thing can't be that far off. Yeah, exactly. Let's talk about the sort of the state of
the art and the GPT-4 class landscape and how that's changed. Because obviously, you know,
sort of the two big things or a couple of the big things that have happened
since we last talked, were one, you know, Gemini first announcing that a model was coming and then
finally it arriving. And then very soon after, a sort of a different model arriving from Gemini and
Claude 3. So I guess, you know, I'm not sure exactly where the right place to start with
this conversation is, but, you know, maybe very broadly speaking, which of these do you think
have made a bigger impact? Probably the one you can use. So, Claude. Well, I'm sure Gemini
is going to be great once they let me in. But so far, I haven't
even been able to. I use, so I have this small podcaster thing that I built for our podcast,
which does chapters creation, like named entity recognition, summarization, and all of that.
Claude 3 is better than GPT-4. Claude 2 was unusable. So I used GPT-4 for everything. And then when
Opus came out, I tried them again side by side and I posted it on Twitter as well.
Claude is very good. You know, it's much better, it seems to me, it's much better than GPT-4
at doing writing that is more, you know, I don't know, it's just got good vibes, you know,
like the GPT-4 text, you can tell it's like GPT-4, you know, it's like it always uses certain
types of words and phrases and, you know, maybe it's just me because I've now done it for
50 podcast episodes, so I've read like 75, 80 generations of these things next to each other,
but Claude is really good. I know everybody is freaking out on Twitter about it.
My only experience of this being much better has been on the podcast use case. But I know that, you know,
Karan from Nous Research is a very big Opus, pro-Opus person. So I think that's also,
it's great to have people that actually care about other models, you know. I think so far to a lot of
people, maybe Anthropic has been the sibling in the corner, you know, it's like,
Claude releases a new model and then OpenAI releases Sora and like, you know, there are like
all these different things. But yeah, the new models are good.
It's interesting. My perception is definitely that, just observationally,
Claude 3 is certainly the first thing that I've seen where lots of people,
no one's debating evals or anything like that. They're talking about the specific use cases that they have,
that they used to use ChatGPT for every day, you know, day in and day out, that they've now just switched over.
And that has, I think, shifted a lot of the sort of like vibe and
sentiment in the space, too. And I don't necessarily think that it's sort of a, a full, you know, sort of
full knock. Let's put it this way. I think it's less bad for OpenAI than it is good for Anthropic.
I think that because GPT-5 isn't there, people are not quite willing to sort of like, you know,
get overly critical of OpenAI, except insofar as they're wondering where GPT-5 is. But I do think that it
makes Anthropic look way more credible as a, you know, as a sort of player, you know,
as opposed to where they were.
Yeah.
And I would say the benchmarks veil is probably getting lifted this year.
I think last year people were like, okay, this is better than this on this benchmark,
blah, blah, blah, because maybe they did not have a lot of use cases that they did frequently.
So it's hard to like compare yourself.
So you defer to the benchmarks.
I think now as we go into 2024, a lot of people have started to use these models from, you know,
from very sophisticated things that they run in production to some utility that they have on their own.
Now they can just run them side by side.
And it's like, hey, I don't care that like the MMLU score of Opus is like slightly lower than GPT-4.
It just works for me, you know?
And I think that's the same way that traditional software has been used by people.
Like you just try it for yourself and see which one works best for you.
Like nobody looks at benchmarks outside of like sales white papers, you know?
And I think it's great that we're going more in that direction.
We have an episode with Adept coming out this weekend.
And in some of their model releases they specifically say, we do not care about benchmarks,
so we didn't put them in, you know, because we don't want to look good on them.
We just want the product to work.
And I think more and more people will go that way.
Yeah.
I would say, like, it does take the wind out of the sails for GPT-5, which I know we're curious about
later on.
I think anytime you put out a new state-of-the-art model, you have to break through in some way.
and what Claude and Gemini have done
is effectively take away any advantage
to saying that you have
a million token context window.
Now everyone's just going to be like,
oh, okay, now you just match the other two guys.
And so that puts an insane amount of pressure
on what GPT-5 is going to be
because it's just going to have,
like the only option it has now
because all the other models are multimodal,
all the other models are long context,
all the other models have perfect recall.
GPT-5 has to match everything and do more
to not be a flop.
Hello, friends, back again with part two.
If you haven't heard part one of this conversation, I suggest you go check it out.
But to be honest, they are kind of actually separable.
In this conversation, we get into a topic that I think Alessio and swyx are very well positioned
to discuss, which is what developers care about right now, what people are trying to build around.
I honestly think that one of the best ways to see the future in an industry like AI
is to try to dig deep on what developers and entrepreneurs are attracted to build,
even if it hasn't made it to the news pages yet.
So consider this your preview of six months from now, and let's dive in.
Let's bring it to the GPT-5 conversation.
I mean, so I think that that's a great sort of assessment of just how the stakes have been raised.
You know, is your, I mean, so I guess maybe I'll frame this less as a question,
just sort of something that I've been watching.
Right now, the only thing that makes sense to me with how fundamentally unbothered
and unstressed OpenAI seems about everything is that they're sitting on something
that does meet all that criteria. Because, I mean, even in the Lex Fridman interview that Altman
recently did, you know, he's talking about other things coming out first. He's talking about,
he's just like, he, listen, he's good and he could play nonchalant, you know, if he wanted to.
So I don't want to read too much into it. But, you know, they've had so long to work on this.
Like, unless we are like really meaningfully running up against some constraint, it just feels like,
you know, there's going to be some massive increase. But I don't know. What do you guys
think? Hard to speculate. At this point, they're pretty good at PR and they're not
going to tell you anything that they don't want to, and they can tell you one thing and change their
minds the next day. So it's really, you know, I've always said that model version numbers are just
marketing exercises. Like, they have something and it's always improving. And at some point,
you just cut it and decide to call it GPT-5. And it's more just about defining an arbitrary level
at which they're ready. And it's up to them on what ready means. We definitely
did see some leaks on GPT-4.5, as I think a lot of people reported. I'm not sure
if you covered it. So it seems like there might be an intermediate release, but I did feel,
coming out of the Lex Fridman interview, that GPT-5 was nowhere near. And, you know, it was kind of
a sharp contrast to Sam talking at Davos in February saying that, you know, it was his top priority.
So I find it hard to square. And honestly, like, there's also no point reading too many tea leaves
into what any one person says about something that hasn't happened yet or a decision that hasn't
been taken yet. So yeah, that's my two cents about it. Like, calm down. Let's just build.
Yeah, the February rumor was that we're going to work on AI agents. So I don't know, maybe they're
like whatever. Yeah, they had two agent, I think two agent projects, one desktop agent and one sort of
more general sort of GPTs-like agent. And then Andrej left. So he was supposed to be the guy on that.
What did Andrej see? What did he see? I don't know. What did he see?
I don't know. But again, it's just like the rumors are always floating around, you know, but I think like this is, you know, we're not going to get to the end of the year without GPT-4.5 or 5, you know, that's definitely happening, you know. I think the biggest question is like are Anthropic and Google increasing the pace, you know, like is the, is the Claude 4 coming out like in 12 months, like nine months? What's the deal? Same with Gemini. They went from like 1 to 1.5 in like
five days or something. So when's Gemini 2 coming out? Is that going to be soon? I don't know.
There's a lot of speculation. But the good thing is that now, you can see a world
in which OpenAI doesn't rule everything. So that's the best news that everybody got,
I would say. Yeah. And Mistral Large also dropped in the last month. And it's not quite
GPT-4 class, but very good from a new startup. So yeah, we now have a slowly changing
landscape. In my January recap, I was complaining that nothing's changed the landscape for a long
time. But now we do exist in a world, sort of a multipolar world where Claude and Gemini are legitimate
challengers to GPT-4 and hopefully more will emerge as well, hopefully from Meta.
Yeah. So let's actually talk about sort of the open source side of this for a minute.
So Mistral Large, notable because it's not available open source in the same way the other things are.
Although I think my perception is, like, the community largely recognizes that they want them to keep building open source stuff and they have to find some way to fund themselves if they're going to do that.
And so they kind of understand that they've got to figure out how to eat.
But we've got so, you know, there's Mistral.
There's, I guess, Grok now, which is, you know, Grok-1 from October is open sourced.
Yeah.
Yeah.
And I thought you meant Groq the chip company.
No, no, you mean Twitter Grok.
Although Groq, the chip company, I think is even more interesting in some ways.
But and then there's the, you know, obviously Llama 3 is the one that sort of
everyone's wondering about too. And, you know, my sense of that, the little bit that Zuckerberg was
talking about Llama 3 earlier this year, suggested that, at least from an ambition standpoint,
he was not thinking about how do I make sure that, you know, Meta, you know, keeps the open
source throne, you know, vis-a-vis Mistral. He was thinking about how you go after, you know,
how he, you know, releases a thing that's, you know, every bit as good as whatever OpenAI is on at
that point. Yeah, from what I heard in the hallways at GTC, Llama 3, the biggest model
will be 260 to 300 billion parameters. So that's quite large. That's not an open source model,
you know. You cannot give people a 300 billion parameter model and ask them to run it. You know,
it's very compute intensive. So I think the- It is a, it can be open-source. It's just,
it's going to be difficult to run, but that's a separate question of whether it's open-source.
It's more like, as you think about what they're doing it for, you know, it's not like empowering the
person running Llama on their laptop. It's like, oh, you can actually now use this to go after
OpenAI, to go after Anthropic, to go after some of these companies at like the middle
complexity level, so to speak. Yeah, so obviously, you know, we had Soumith Chintala on the podcast.
They're doing a lot here. They're making PyTorch better. You know, they want to, that's kind of like
maybe a little bit of a shot at Nvidia in a way, trying to get some of the CUDA dominance out of it.
Yeah, no, it's great. I love the Zuck destroying a lot of
monopolies arc. It's been very entertaining. Let's bridge into the sort of big tech side of this,
because this is obviously like, so I think actually when I did my episode, this was one of the,
I added this as an additional war that that's something that I'm paying attention to.
So we've got Microsoft's moves with Inflection, which I think potentially are being read as
a shift vis-a-vis their relationship with OpenAI, which also the sort of Mistral Large
relationship seems to reinforce as well. We have Apple potentially
entering the race finally, you know, giving up Project Titan and kind of trying to spend more effort
on this. Although, counterpoint, we also have them talking about it or there being reports of a deal
with Google, which, you know, is interesting to sort of see what their strategy there is. And then,
you know, Meta has been largely quiet. We kind of just talked about the main piece. But,
you know, there's, and then there's spoilers like Elon. I mean, you know, what of those things
has sort of been most interesting to you guys as you think about what's going to shake out for the rest
of this year? I'll take a
crack. So the reason we don't have
a fifth war for the big tech wars
is that's one of those things where I just
feel like we don't cover
differently from other
media channels, I guess.
So in our anti-interest list
we actually say, like we try not to cover
the big tech Game of Thrones, or it's
proxied through
all the other four wars anyway. So there's just a lot
of overlap. Yeah, I think absolutely
personally, the most interesting one is Apple
entering the race. They actually released, they
announced their first large language model that they trained
themselves. It's like a 30 billion parameter multimodal model. People weren't that impressed, but it was like
the first time that Apple has kind of showcased that, yeah, we're training large models in-house
as well. Of course, like, they might be doing this deal with Google. I don't know. It sounds
very sort of rumor-y to me. And it's probably, if it's on device, it's going to be a smaller model.
So something like a Gemma, it's going to be smarter auto-complete. I don't know what to say.
I'm still here dealing with like Siri, which probably hasn't been updated since
God knows when it was introduced. It's horrible. It makes me so angry.
So one, as an Apple customer and user, I'm just hoping for better AI on Apple itself.
But two, they are the gold standard when it comes to local devices, personal compute and trust.
You trust them with your data. And I think that's what a lot of people are looking for in AI,
that they love the benefits of AI. They don't love the downsides, which is that you have to send all your data
to some cloud somewhere.
And some of this data that we're going to feed AI
is the most personal data there is.
So Apple being one of the most trusted personal data companies,
I think is very important that they enter the AI race.
And I hope to see more out of them.
To me, the biggest question with the Google deal is like,
who's paying who?
Because for the browsers, Google pays Apple like $18, $20 billion every year
to be the default browser.
Is Google going to pay you to have Gemini
or is Apple paying Google to have Gemini?
I think that's like what I'm most interested to figure out.
Because with the browsers, it's like it's the entry point to the thing.
So it's really valuable to be the default.
That's what Google pays.
But I wonder if like the perception in AI is going to be like, hey, you just have to have a good local model on my phone to be worth me purchasing your device.
And that's going to drive Apple to be the one buying the model.
But then like Sean said, they're doing the MM1 themselves.
So are they saying we do models, but they're not as good
as the Google ones?
I don't know.
The whole thing is, it's really confusing,
but it makes for great meme material on Twitter.
Yeah, I mean, I think, like,
they are possibly more than OpenAI and Microsoft and Amazon.
They are the most full-stack company there is in computing.
And so, like, they own the chips, man.
Like, they manufacture everything.
So if there was a company that could, you know,
seriously challenge the other AI players,
it would be Apple.
And I don't think it's as hard as self-driving.
So like maybe they've just been investing in the wrong thing this whole time.
Wall Street certainly thinks so.
Wall Street loved that move, man.
There's a big sigh of relief.
Well, let's move away from sort of the big stuff.
I think to both of your points, it's going to...
Can I drop one factoid about this Wall Street thing?
I went and looked at when Meta went from being a VR company to an AI company.
And I think the stock, I'm trying to look up the details now, the stock has gone up
187% since Llama 1, which is $830 billion in market value created in the past year.
Yeah, if you haven't seen the chart, it's actually like remarkable if you draw a little
arrow on it.
It's like, no, we're an AI company now.
Forget the VR thing.
It is
interesting. No, I think, Alessio called it, it's sort of like Zuck's Disruptor Arc or whatever.
He really does. He is in the midst of a total, you know, I don't know if it's a redemption arc or it's just, it's something different where, you know, he's sort of the spoiler. Like, people loved him just freestyle talking about why he thought they had a better headset than Apple.
Like, even if they didn't agree, they just loved he was going direct to camera and talking about it for, you know, five minutes or whatever. So that's a fascinating shift that I don't think anyone had on their bingo card, you know, whatever two years ago.
We still didn't see him fight Elon, though.
Yeah.
I mean, hey, don't write it off.
You know, maybe just these things take a while to happen.
But we need to see him fight in the Coliseum.
No, I think, you know, in terms of like self-management, life leadership, I think he has,
there's a lot of lessons to learn from him, you know.
He might, you know, you might kind of quibble with like the social impact of Facebook.
But just himself as a, in terms of personal growth and perseverance through like a lot of change.
and everyone throwing stuff his way.
I think there's a lot to say about,
to learn from Zuck,
which is crazy because he's my age.
Yeah, right.
Awesome.
So one of the big things that I think you guys have,
you know, distinct and unique insight into being where you are
and where you work on is, you know,
what developers are getting really excited about right now.
And by that, I mean, on the one hand,
certainly, you know, like startups
who are actually kind of formalized and formed as startups.
But also, you know, just in terms of like
what people are spending their nights and weekends on, what they're, you know, coming to hackathons to do.
And, you know, I think it's a, it's such a fascinating indicator for where things are headed.
Like, if you zoom back a year, right now was right when everyone was getting so, so excited about AI agent stuff.
AutoGPT and BabyAGI and these things were like, if you dropped anything on YouTube about those, like instantly tens of thousands of views.
I know because I had like a 50,000 view video, like the second day that
I was doing the show on YouTube, you know, because I was talking about AutoGPT.
And so anyways, you know, obviously that's sort of not totally come to fruition yet.
But what are some of the trends in what you guys are seeing in terms of people's interest
and what people are building.
I can start maybe with the agents part.
And then I know Sean is doing the Fusion meetup tonight.
There's a lot of different things.
The Agent Wave has been the most interesting kind of like dream to reality arc.
So AutoGPT, I think they went
from zero to like 125,000 GitHub stars in six weeks.
And then one year later, they have 150,000 stars.
So there's kind of been a big plateau.
I mean, you might say there are just not that many people that can star it.
You know, everybody already starred it.
But the promise of, hey, I'll just give you a goal and you do it.
I think it's like amazing to get people's imagination going.
You know, they're like, oh, wow, this is awesome.
Everybody can try this to do anything.
But then as technologists, you're like, well, that's just like not possible.
You know, we would have like solved everything.
And I think it takes a little bit to go from the promise and the hope that people show you to then trying it yourself and going back to say, okay, this is not really working for me.
And David Luan from Adept, you know, in our episode, he specifically said, we don't want to do a bottoms-up product.
You know, we don't want something that everybody can just use and try because
it's really hard to get it to be reliable.
So we're seeing a lot of companies doing vertical agents that are narrow for a specific domain
and they're very good at something.
Mike Conover, who was at Databricks before, is also a friend of Latent Space.
He's doing this new company called Brightwave doing AI agents for financial research.
And that's it, you know, and they're doing very well.
There are other companies doing it in security, doing it in compliance, doing it in legal.
All of these things that like people, nobody just wakes up and say, oh, I cannot wait to go on auto GPD and ask it to do a compliance review of my thing.
You know, just not what inspires people.
So I think the gap on the developer side has been that the more bottoms-up hacker mentality is trying to build these very generic agents that can do a lot of open-ended tasks.
And then the more business side of things is like, hey, if I want to raise my next round, I cannot just sit around and mess around with super generic stuff.
I need to find a use case that really works.
And I think that that has worked for a lot of folks.
In parallel, you have a lot of companies doing evals.
There are dozens of them that just want to help you measure how good your models are doing.
Again, if you build evals, you need to also have a restrained surface area to actually figure
out whether or not it's good, right?
Because you cannot eval everything under the sun.
So that's another category where, from the startup pitches that I've seen,
there's a lot of interest in the enterprise. It's just like really
fragmented because the production use cases are just coming like now. You know, there are not
a lot of long established ones to test against. And so that's kind of on the virtual agents.
And then the robotics side has probably been the thing that surprised me the most at
Nvidia GTC: the amount of robots that were there. There were just robots everywhere.
Like both in the keynote and then on the show floor, you would have Boston Dynamics dogs running around.
There was this like Fox robot that had like a virtual face that like talked to you and
like moved in real time.
There were industrial robots.
Nvidia did a big push on their own Omniverse thing, which is like this digital twin of
whatever environment you're in that you can use to train the robot agents.
So that kind of takes people back to the reinforcement learning days.
But yeah, agents, people want them.
You know, people want them.
I gave a talk about the rise of the full-stack employee and kind of
this future: the same way full-stack engineers kind of work across the stack, in the future,
every employee is going to interact with every part of the organization through agents and AI-enabled
tooling. This is happening. It just needs to be a lot more narrow than maybe the first
approach that we took, which is just put a string in AutoGPT and pray. But yeah, there's a lot
of super interesting stuff going on. Yeah, Alessio covered a lot of stuff there. I'll separate
the robotics piece because I feel like that's so different from the software world. But yeah,
we do talk to a lot of engineers and, you know, this is our sort of bread and butter.
And I do agree that vertical agents have worked out a lot better than the horizontal ones.
I think, you know, the point I'll make here is just the reason AutoGPT and BabyAGI, you know,
it's in the name, like they were promising AGI.
But I think people are discovering that you cannot engineer your way to AGI.
It has to be done at the model level.
And all these engineering and prompt engineering hacks on top of it weren't really going to get us there
in a meaningful way without much further improvements in the models. I'll go so far as
to say even Devin, which is, I think, the most advanced agent that we've ever seen, still
requires a lot of engineering and still probably falls apart a lot in terms of practical usage,
or it's just way too slow and expensive for, you know, what is promised compared
to the video. So yeah, that's what happened with agents from last year. But I do, I do see like
vertical agents being very popular. And sometimes,
I think the word agent might even be overused sometimes.
People don't really care whether or not you call it an AI agent, right?
Like, does it replace boring, menial tasks that I do, that I might hire a human to do
or that the human who is hired to do it, like, actually doesn't really want to do?
And I think there's absolutely ways in sort of a vertical context that you can actually go
after very routine tasks that can be scaled out to a lot of, you know, AI assistants.
So, so yeah, I would sort of basically plus one what
Alessio said there. I think it's very, very promising, and I think more people should work on it,
not less. Like, there's not enough people. Like, this should be the main thrust of the AI
engineer is to look for use cases and go to production with them instead of just always working
on some AGI promising thing that never arrives. I can only add that, so I've been
fiercely making tutorials behind the scenes around basically everything you can imagine with AI.
We've probably done, we've done about 300 tutorials over the last couple months. And the verticalized
anything, right? Like, this is a solution for your particular job or role, even if it's way less
interesting or kind of sexy is so radically more useful to people in terms of intersecting
with how, like, those are the ways that people are actually adopting AI in a lot of cases. It's just
a thing that I do over and over again. By the way, I think that's the same way that even the
generalized models are getting adopted, you know? It's like, I use Midjourney for lots of stuff,
but the main thing I use it for is YouTube thumbnails every day.
Like day in, day out, I will always do a YouTube thumbnail, you know, or two with Midjourney, right?
And it's like you can start to extrapolate that across a lot of things.
And all of a sudden, you know, AI doesn't, it looks revolutionary because of a million small changes rather than one sort of big dramatic change.
And I think that the verticalization of agents is sort of a great example of how that's going to play out too.
Yeah.
So I'll have one caveat here, which is I think that because multimodal
models are now commonplace, like Claude, Gemini, OpenAI, all very, very easily multimodal,
Apple's easily multimodal, all this stuff, there is a switch for agents for sort of general desktop
browsing that I think people need to keep an eye on. It's not mature yet, but it is absolutely coming on
the way. And so just as we're starting to talk about this verticalization piece, because that is mature,
that is ready for people to work on, a lot of people are making really good money doing
that. The thing that's on the rise is this sort of drive-by vision version of the agent where
they're not specifically taking in text or anything. They're just watching your screen just like
someone else would and piloting it by vision. And, you know, in the episode with David that
will have dropped by the time that this airs, I think that is the promise of Adept.
And that is the promise of what a lot of these sort of desktop agents are. And that is the more
general-purpose system that could be as big as the browser, the operating system.
People really want to build that foundational piece of software in AI.
And I would see the potential there for desktop agents being that, that you can have
self-driving computers.
Don't write the horizontal piece out.
I just think we took a while to get there.
What else are you guys seeing that's interesting to you?
I'm looking at your notes and seeing a ton of categories.
Yeah.
I'll take the next two as like as one category.
which is basically alternative architectures, right?
The two main things that everyone following AI kind of knows now is, one, the diffusion
architecture and two, the, let's just say, the decoder-only transformer architecture
that is popularized by GPT. You can look on YouTube for thousands and thousands
of tutorials on each of those things.
What we are talking about here is what's next, what people are researching and what
could be on the horizon that takes the place of those other two things.
So first of all, I'll talk about transformer architectures and then diffusion.
So for transformers, the two leading candidates are effectively
RWKV and the state space models, the most recent one of which is Mamba, but there are
others, like StripedHyena and the S4/H3 stuff coming out of Hazy Research at Stanford.
And all of those are non-quadratic language models that scale, they promise to scale a lot
better than the traditional transformer.
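To make the "non-quadratic" point concrete, here is a rough back-of-the-envelope sketch. It is not taken from the episode, and it is not how RWKV or Mamba are actually implemented. It only assumes that full self-attention compares every token with every other token (work growing with the square of the context length), while a recurrent or state-space style model does one state update per token (work growing linearly). The function names are made up for illustration, and constant factors such as hidden size and layer count are ignored.

```python
def attention_pair_count(n_tokens: int) -> int:
    """Token-to-token comparisons in full self-attention: grows as n^2."""
    return n_tokens * n_tokens


def linear_scan_steps(n_tokens: int) -> int:
    """Sequential state updates in a recurrent / state-space style model: grows as n."""
    return n_tokens


def cost_ratio(n_tokens: int) -> float:
    """How many times more work the quadratic model does at this context length."""
    return attention_pair_count(n_tokens) / linear_scan_steps(n_tokens)


if __name__ == "__main__":
    # Hypothetical context lengths, chosen only to show the trend.
    for n in (1_000, 100_000, 1_000_000):
        print(f"{n:>9,} tokens: quadratic / linear work ratio = {cost_ratio(n):,.0f}x")
```

Under these simplifying assumptions the gap is just the context length itself, which is why the difference only starts to matter once contexts get very long.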
This might be too theoretical for most people right now, but it's going to be, it's going
to come out in weird ways where imagine if, like, right now, the talk of the town,
is that Claude and Gemini have a million tokens of context and like, whoa, you can put in, like,
you know, two hours of video now. Okay, but like, what if you put, what if we could, like,
throw in, you know, 200,000 hours of video? Like, how does that change your usage of AI?
What if you could throw in the entire genetic sequence of a human and, like, synthesize new
drugs? Like, how does that change things? Like, we don't know because we haven't had access to this
capability being so cheap before. And that's the ultimate promise of these two
models. They're not there yet, but we're seeing very, very good progress. RWKV and Mamba are probably
the two leading examples, both of which are open source, so you can try them today, and there's a lot
of progress there. The main thing I'll highlight for RWKV is that at the 7B level, they seem to
have beaten Llama 2 in all benchmarks that matter at the same size for the same amount of training
as an open source model. So that's exciting. You know, they're at 7B now. They're not at 70B. We don't
know if it'll scale. And then the other thing is diffusion. Diffusions and transformers are
kind of on a collision course. The original stable diffusion already used transformers in
parts of its architecture. It seems that transformers are eating more and more of those layers,
particularly the VAE layer. So that's the diffusion transformer, which is what Sora is built on.
The guy who wrote the diffusion transformer paper, Bill Peebles, is the lead tech guy on Sora.
So you'll just see a lot more diffusion transformer stuff going on. But
there's more sort of experimentation with diffusion.
I'm holding a meetup actually here in San Francisco
that's going to be like the state of diffusion,
which I'm pretty excited about.
Stability is doing a lot of good work.
And if you look at the architecture of how they're creating Stable Diffusion 3,
Hourglass Diffusion, and latent consistency models or SDXL Turbo,
all of these are like very, very interesting innovations
on like the original idea of what Stable Diffusion was.
So if you think that it is expensive to create
or slow to create Stable Diffusion or AI-generated art,
you are not up to date with the latest models.
If you think it is hard to create text in images,
you are not up to date with the latest models.
And people still are kind of far behind.
The last piece, which is the wild card I always kind of hold out,
which is text diffusion.
So instead of using auto-generative or auto-regressive transformers,
can you use text to diffuse?
So you can use diffusion models to diffuse
and create entire chunks of text all at once
instead of token by token.
And that is something that Midjourney confirmed today,
because it was only rumored the past few
months. But they confirmed today that they were looking into it. So all those things are like very
exciting new model architectures that are maybe something that you'll see in production two to three
years from now. So a couple of the trends that I want to just get your takes on, because they're
sort of something that that seems like they're coming up are one, sort of these wearable,
you know, kind of passive AI experiences where they're absorbing a lot of what's going on around
you and then kind of bringing things back. And then the other one that I wanted to see
if you guys had thoughts on was sort of this next generation of chip companies. Obviously,
there's a huge amount of emphasis on hardware and silicon and different ways of doing things.
But, you know, love your take on either or both of those.
So for wearables, I'm very excited about it. I want wearables on me at all times. I have two
right here to quantify my health. And I, you know, I'm all for them. But society is not
ready for wearables, right? Like, no one's comfortable with a device on, recording every single
conversation we have. Even all three of us
here as podcasters, we don't record everything that we say. And I think there's a social shift
that needs to happen. I am an investor in Tab. They are renaming to a broader vision, but they are
one of the sort of three or four leading wearables in this space, in sort of the AI pendants or
AI OS or AI personal companion space. I have seen two Humanes in the wild in San Francisco.
I'm very, very excited to report that there are people walking around with those things on their
chest. And it is as goofy as it sounds, it absolutely is going to fail, but God bless them
for trying. And I've also bought a Rabbit. So I'm very excited for all those things to arrive.
But yeah, people are very keen on hardware. I think the idea that you can have physical objects
that embody an AI that do specific things for you is as old as, you know, the sort of
Golem in sort of medieval times in terms of like how much we want our objects to be smart and
do things for us. And I think it's absolutely a great play. The funny thing is people are much
more willing to pay you up front for a hardware device than they are willing to pay like an $8
a month subscription recurring for software, right? And so the interesting economics of these
wearable companies is they have negative float in the sense that people pay deposits
upfront. Like, I paid like, I don't know, $200 for the Rabbit upfront and I don't get it
for another six months. I paid $600 for the Tab, and I don't get it for another six months.
And then they can take that money and sort of invest it in like their next events or their
next properties or ventures. And like, I think that's a very interesting reversal of economics
from other types of AI companies that I see. And I think, yeah, just the tactile feel of an AI,
I think is very promising. Let's see, I don't know if you have other
thoughts on the wearable stuff.
The Open Interpreter just announced their product four hours ago.
Yeah.
It's not really a wearable, but it's still like a physical device.
It's a push-to-talk mic to a device on your, on your laptop, right?
It's a $99 push-to-talk.
But again, go back to your point.
It's like people want to, people are interested in spending money for, like, things that they can hold.
You know, I don't know what that means overall for, like, where things are going.
but making more of this AI be a physical part of your life.
I think people are interested in that, but I agree with Sean.
I mean, I've been, I talk to Avi about this, but Avi's point is like most consumers
like care about utility more than they care about privacy, you know, like you've seen with
social media, but I also think there's a big societal reaction to AI that is like much
more rooted than the social media one, but we'll see.
But again, a lot of work, a lot of developers, a lot of money going into it.
So there's bound to be experiments being run.
On the chip...
Sorry, I'll just chip in one more thing and then we transition to the chips.
The thing I'll caution people on is don't overly focus on the form factor.
The form factor is a delivery mode.
There will be many form factors.
It doesn't matter so much as where in the data war does it sit.
It actually is context acquisition, and maybe a little bit of multimodality, because
context is king.
If you have access to data that no one else has,
then you will be able to create AI that no one else can create.
And so what is the most personal context?
It is your everyday conversation.
It is as close to mapping your mental train of thought
as possible without physically you writing down notes.
So that is the promise, the ultimate goal here,
which is like personal context.
It's always available on you, you know, low latency, all that stuff.
But that's the frame I want to give people that the form factors will change
and there will be multiple form factors,
but it's the software behind that
and the personal context
that you cannot get anywhere else
that'll win.
Yeah, so that was wearables.
On the chip side, yeah, Groq was probably the biggest release.
Jonathan... but it's not even a new release, because the company, I think,
was started in 2016, so it's actually quite old,
but it has now recently captured people's imagination
with their Mixtral 500 tokens a second demo.
Yeah,
I think so far,
the battle on the GPU side
has been either you go kind of like massive chip, like the Cerebras of the world, where one chip
from Cerebras is about two million dollars. You know, obviously you cannot compare
one chip versus one chip, but an H100 is like $40,000, something like that. The problem with
those architectures has been they want to be very general, you know, but they wanted to put
a lot of the RAM, the SRAM, on the chip. It's much more convenient when you're using large
language models, but the models outpace the size of the chips, and chips have a much longer,
you know, turnaround cycle.
Groq today is great for the current architecture.
It's a lot more expensive also as far as dollar per flop, but their idea is like, hey, when you
have very high concurrency, we're actually much cheaper, you know, you shouldn't just
be looking at the compute power.
For most people, this doesn't really matter, you know?
Like, I think the most interesting thing to me is, we've now gone
back with AI to a world where developers care about what hardware is running, which was not the
case in traditional software for like maybe 20 years, since the cloud got really big.
My thinking is that in the next two, three years, like, we're going to go back to that.
But like, people are not going to be sweating,
oh, what GPU do you have in your cloud?
What do you have?
It's like, yeah, you want to run this model.
We can run it at the same speed as everybody else.
And then everybody will make different choices, whether they want to have higher.
front-end capital investment and then better utilization.
Some people would rather do lower investment before and then upgrade later.
There are a lot of parameters.
And then there's the dark horses, right?
That is some of the smaller companies like Lemurian Labs, MatX that are working on,
maybe not a chip alone, but also like some of the actual math infrastructure and the
instructions on it that make them run.
There's a lot going on.
But yeah, I think the episode with Dylan will be interesting for people.
But I think we also came out of saying, hey, everybody has pros and cons.
There's no, it's different than the models where you're like, oh, this one is definitely better for me.
And I'm going to use it.
I think for most people, it's like fun, Twitter memeing, you know, but it's like 99% of people that tweet about the stuff are never going to buy any of these chips anyway.
So it's really more for entertainment.
Wow, I mean, this is serious business here
where you're talking about, you know, like,
the potential new Nvidia,
if anyone can take like 1% of Nvidia's business,
they are a serious startup that you should look at, right?
So that's, that's my take on MatX.
I'm more talking about like,
how should people think about it, you know?
It's like, I think like the end user
is not impacted as much.
So I disagree.
Yeah, I love disagreements because, you know,
who likes a podcast where all three people
always agree with each other.
You will see the impact of this
in the tokens per second over time.
This year, I have very, very credible sources all telling me that the average tokens per second,
right now we have somewhere between 50 to 100 as like the norm for people.
Average tokens per second will go to 500 to 2,000 this year from a number of chip suppliers
that I cannot name.
So like that is, that will cause a step change in the use cases.
Every time you have an order of magnitude improvement in the speed of something, you unlock new
use cases that become fun instead of a chore.
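As a rough illustration of why that speed jump matters, the sketch below turns the tokens-per-second figures quoted above (50 to 100 today, 500 to 2,000 predicted) into wall-clock generation time. The 1,000-token response length and the function name are assumptions picked for the example, and the calculation ignores network latency, queueing, and prompt processing time.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a response at a steady decoding speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Hypothetical response length, used only for illustration.
RESPONSE_TOKENS = 1_000

if __name__ == "__main__":
    # Speeds taken from the ranges mentioned in the conversation.
    for tps in (50, 100, 500, 2_000):
        seconds = generation_seconds(RESPONSE_TOKENS, tps)
        print(f"{tps:>5} tok/s -> {seconds:5.1f} s for {RESPONSE_TOKENS} tokens")
```

At these assumed numbers, an agent that chains ten such calls would go from minutes of waiting to a few seconds, which is the kind of change that lets work happen in the background rather than in front of the user.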
And so that's what I would caution this audience to think about, which is like, what can
you do in much higher AI speed?
It's not just things streaming out faster.
It is things working in the background a lot more seamlessly and therefore being a lot
more useful than previously imagined.
So that would be my two cents on that.
Yeah.
Yeah.
I mean, the new Nvidia chips are also much faster.
To me, that's true.
When it comes to startups, it's like,
are the startups pushing the performance on the incumbents, or are the incumbents still leading
and then the startups are like riding the same wave?
You know, I don't have yet a good sense of that.
It's like, you know, is next year's Nvidia release just going to be better than everything
that gets released this year, you know?
If that's the case, it's like, okay, damn Jensen, you know?
It's like the meme is like, I'm going to fight.
I'm going to fight Nvidia.
It's like, damn, Jensen got hands.
I'm like, he really does.
So, I'll see.
Well, awesome conversation, guys.
I guess just by way of wrapping up,
call it over the next three months
between now and sort of the beginning of summer.
What's one prediction that each of you has?
It can be about anything.
Could be big company, can be startup.
It could be something you have privileged information
that you know and you just won't tell us
that you actually know.
Does it have to be something that we think
it's going to be true or something that we think?
Because for me, it's like,
is Sundar going to be the CEO of Google?
Maybe not in three months, maybe like six months, nine months,
You know, people are like, oh, maybe Demis is going to be the new CEO. That was kind of like...
I was busy, like, fishing some DeepMind people and Google people for like a good guest for the pod,
and I was like, oh, what about Jeff Dean? And they're like, well, Demis is really like the person that runs
everything anyway, and this stuff. And it's like, interesting. And so I don't know. What about Sergey?
Sergey could come back. I don't know. Like, he's making more appearances these days. Yeah.
I don't... I bet we can just put it as like, you know, yeah,
My thing is like CEO change potential.
Again, three months is too short to make a prediction.
That's fine.
The time scale might be off.
Yeah, I mean, for me, I think the progression in vertical agent companies will keep going.
We just had the other day, Klarna talking about how they replaced like 700 of their customer support agents with the AI agent.
That's the beginning, guys.
Imagine this rolling out across most of the Fortune 500.
This is, and I'm not saying this is like a utopian scenario.
There will be very, very embarrassing and bad outcomes of this
where, like, humans would never make this mistake, but AIs did,
and like we'll all laugh at it or we'll be very offended by whatever, you know, bad outcome it did.
So we have to be responsible and careful in the rollout.
But yeah, this is rolling out.
You know, Alessio likes to say that this year's the year of AI in production.
Let's see it.
Let's see all these sort of vertical,
full-stack employees come out into the workforce.
Love it. All right guys. Well, thank you so much for sharing your thoughts and insights here and
I can't wait to do it again. Welcome back again. It's Charlie, your AI co-host. We're now in
part two of the special weekend episode collating some of swyx and Alessio's recent appearances.
If you're not active in the Latent Space Discord, you might not be aware of the many, many,
many in-person events we host, gathering our listener community all over the world.
You can see the Latent Space community page
for how to join and subscribe to our event calendar for future meetups.
We're going to share some of our recent live appearances in this next part,
starting with the Thursday Nights in AI meetup,
a regular fixture in the SF AI scene run by Imbue and Outset Capital,
primarily our former guest, Kanjun Qiu, Ali Rohde and Josh Albrecht.
Here's swyx.
Today, for those of you who have been here before, you know the general format.
So we'll do a quick fireside Q&A with swyx where we're asking him the questions.
Then we'll actually go to our rapid fire Q&A where we're asking really fast,
hopefully spicy questions.
And then we'll open it up to the audience for your questions.
So you guys, speak around the room, submit your questions,
and we'll go through as many of them as possible during that period.
And then actually, swyx brought a gift for us, which is two Latent Space t-shirts,
AI Engineer t-shirts, and those will be awarded to the two
spiciest question askers. And I'll let Josh decide on that. So we want to
get your spiciest takes, so please send them in during the event as we're
talking and then also at the end. All right, with that, let's get going. Okay,
welcome, swyx. Thank you for that intro. Thanks, everyone, for sure. How
does it feel to be interviewed rather than the interviewer? Weird. I don't know
what to do in this chair.
Yeah.
Where should I put my hands?
Yeah, exactly.
You look good.
And I also love asking follow-up questions.
And I tend to, like, sort of take over panels a lot.
If you ever see me on a panel, I tend to ask the other panelists questions.
Okay.
So I might ask you back.
This is like a free Imbue interview, so why not?
That's right.
That's right.
Yeah, so you interviewed Kanjun, the CEO of Imbue, before, but you didn't interview Josh, right?
No, no.
So maybe tonight.
Yeah.
Okay.
We'll look for different questions and look for alignment.
I love it.
All right, I just want to hear this story.
You know, you've completely exploded with Latent Space and AI Engineer.
And I know you also, before all of that, had exploded in popularity for your learning in public movement
and your dev tools work and dev relations work.
So who are you and how did you get here?
Let's start with that.
Quick story is, I'm Sean.
I'm from Singapore.
So it's my initials.
For those who don't know, a lot of Singaporeans are ethnically Chinese and we have Chinese names and English names.
So it's just my initials.
Came to the U.S. for college and I've been here for about 15 years, but half of that was in finance,
and then the other half was in tech. And tech is where I was most known just because I realized that
I was much more aligned towards learning in public, whereas in finance, everything's a trade secret.
Everything is zero-sum. Whereas in tech, like, you're allowed to come to meetups and conferences
and share your learnings and share your mistakes even. And that's totally fine. You, like, open-source your code.
it's totally fine. And even better, you contribute PR to other people's code, which is even better.
And I found that I thrive in that learning-in-public environment. And that kind of got me started.
I was an early hire, early developer relations hire at Netlify and then did the same at AWS,
Temporal and Airbyte. And so that's like the whole story. I can talk more about like developer
tooling and developer relations if that's something that people are interested in. But I think the more
recent thing is AI, and I started really being interested in it mostly because...
the proximate cause of starting Latent Space was Stable Diffusion, when you could run a large
model that could do sufficiently well on your desktop, where I was like,
okay, this is something qualitatively very different. And then we started
Latent Space. And this is something different, we have to talk about it on a podcast?
There we go. Yeah, it wasn't a podcast for like four months, and then
I had been running a Discord for DevTools investors because I also invest in DevTools,
and I advise companies on DevTools, DevRel things.
And I think it was the start of 2023, when Alessio and I were both like, you know,
I think we need to like get more tokens out of people.
And I was running out of original sources to write about.
So I was like, okay, I'll go get those original sources.
And I think that that's when we started a podcast.
And I think it's just the chemistry between us, the way we spike in different ways.
and also like honestly the kind participation of the guests to give us their time.
Like, you know, like getting George Hotz was a big deal.
And also shout out to Alessio for just cold emailing him for booking some of our biggest guests.
And just working really hard to try to tell the story that people can use at work.
I think that there's a lot of AI podcasts out there and a lot of AI kind of forums or fireside chats with no fire.
They always talk about like what's your AGI timeline, what's your p(doom).
Very, very nice hallway conversations for freshman year, but not very useful for work and,
like, you know, practically like making money and like, and thinking about, you know, changing
the everyday lives. I think what's interesting is obviously you care about the existential
safety of the human race. But in the meantime, we got to eat. So, so, so I think that's like
kind of Latent Space's niche. Like, we explicitly don't really talk about AGI. We explicitly don't
talk about things that are like a little bit too far out. Like, we don't do a ton of robotics, we
don't do a ton of like high frequency trading there's tons of machine learning in there but we just
don't do that because like we're like all right what are most software engineers going to need
because that's our background and that's the audience that we serve and I think just like being
really clear on that audience has been has resonated with people yeah you would never expect
a technical podcast to reach like a general audience like top 10 on the tech charts but you know
I've been surprised by that before and it's been successful. I don't know
what to say about that.
I think honestly, I kind of have this negative reaction
towards being classified as a podcast
because the podcast is downstream of ideas.
And it's one mode of conversation.
It's one mode of idea delivery.
But you can deliver ideas on a newsletter
or in person like this.
There's so many different ways.
And so I think I think about it more
as we are trying to start or serve an industry.
And that industry is the AI engineer industry,
which we can talk about more.
Yes, let's go into that.
So the AI engineer, you penned a piece
called the rise of the AI engineer.
You tweeted about it.
Andrej Karpathy also responded, largely agreeing
with what you said.
What is an AI engineer?
The AI engineer is the software engineer building
with AI, enhanced by AI.
And eventually it will be non-human engineers
writing code for you, which I know Imbue is all about.
You're saying eventually the AI engineer
will become a non-human engineer.
That will be a non-human engineer.
That would be one kind of AI engineer that people are trying to build, and it's probably
the furthest away in terms of being reality, because it's so hard.
Got it.
But there are three types of AI engineer, and I just went through the three.
One is AI-enhanced, where you use AI products like Copilot and Cursor, and two is AI
products engineer, where you expose AI capabilities to the end user as a software engineer,
like not doing pre-training, not being an ML researcher, not being an ML engineer, but just interacting
with foundation models and probably APIs from
foundation model labs.
What's the third one?
And the third one is the non-human AI engineer,
the fully autonomous dream, you know, coder.
How long do you think it is until we get to early, early?
This is my equivalent of AGI timelines.
I know, I know.
You set yourself up for this.
Lots of active, like, I mean, I have supported companies actively working on that.
I think it's more useful to think about levels of autonomy.
And so my answer to that is, you know,
perpetually five years away until,
until it figures it out.
No, but my actual anecdote,
the closest comparison we have to that is self-driving.
We're doing this in San Francisco,
for those who are watching on the live stream,
if you haven't come to San Francisco
and taken a Waymo ride,
just come, get a friend, take a Waymo ride.
I remember 2014, we covered a little bit of autos
in my hedge fund,
and I remember telling a friend,
I was like, self-driving cars around the corner,
like, this is it, like, you know,
parking will be a thing of the past,
and it didn't happen for the next 10 years.
But now most of us in San Francisco can take it for granted.
So I think you just have to be mindful that the rough edges take a long time.
And yes, it's going to work in demos, then it's going to work a little bit further out.
And it's just going to take a long time.
The more useful mental model I have is levels of autonomy.
So in self-driving, you have level 1, 2, 3, 4, 5, just the amount of human attention that you get.
At first, your hands are always on 10 and 2, and you have to pay attention to the
driving every 30 seconds, and eventually you can sleep in the car, right?
So there's a whole spectrum of that.
So what's the equivalent for that for coding?
Keep your hands on the keyboard and then eventually you kind of get off.
Well, you tab to accept everything.
Oh, that's good.
Yeah.
Doesn't that already happen?
Yeah, approved the PR.
Approved, this looks good.
That's the dream that people want.
Because really you unlock a lot of coding when people, non-technical people can file issues,
and then the AI engineer can sort of automatically write code, pass your tests, and
if it kind of works as advertised, then you can just kind of merge it.
And then you 10x, 100x the number of developers at your company immediately.
So that's the goal.
That's the Holy Grail.
We're not there yet.
But Sweep, Codegen.
There's a bunch of companies, Magic probably, are all working towards that.
And so the TLDR, like the thing that we covered, Alessio and I covered in the January recap that we did,
was that the basic split that people should have in their minds is the inner loop versus the outer loop for the developer.
Inner loop is everything that happens in your IDE between Git commits,
and outer loop is what happens when you push up your Git commit to GitHub, for example, or GitLab.
And that's a nice split, which means everything local, everything that needs to be fast,
everything that's kind of very hands-on for developers.
It's probably easier to automate or easier to have code assistance.
That's what Copilot is.
That's what all those things are.
And then everything that happens autonomously when you're effectively away from the keyboard with like a GitHub issue or something,
that is more outer loop where you're relying a lot more on autonomy, and our LLMs are maybe not smart enough to do that yet.
Do you have any thoughts on kind of the user experience and how that will change?
One of the things that has happened for me, kind of looking at some of these products and playing around with things ourselves, like, you know, it sounds good to have an automated PR.
Then you get an automated PR and you're like, I really don't want to review like 300 lines of generated code and like find the bug.
Well, then you have another agent that's a reviewer, and then you can tell it like, oh, go fix it.
It comes back with 400 lines and now...
Yes, there is a length bias to code, right?
And you do have higher passing rates in PRs.
This is a documented human behavior thing, right?
Send me two lines of code.
I will review the shit out of that.
I don't know if I can swear on this.
Send me 200 lines of code: looks good to me.
Yeah.
Guess what?
The agents are going to be perfectly happy to copy that behavior from us
when we actually want them to do the opposite.
So yeah, I think that the GAN model of code generation
is probably not going to work super well.
I do think we probably need just better planning from the start,
which is, I'm just repeating the Imbue thesis, by the way.
Just go listen to Kanjun talk about this.
She's much better at it than I am.
But yeah, I think the code review thing is going to be,
I think that what Codium, the two Codiums, the Israeli one.
This is the one with the E.
Yeah, Codium with the E.
They still have refused to rename.
I'm friends with both of them every month.
You're like, guys, let's all come to one room.
Yeah, like, you know, someone's got to fold.
Codium with the E has gone, like you've got to write the test first, right?
You write the, it's like a sort of tripartite relationship.
Again, this is also covered on a podcast with them, which is fantastic.
Like, you interview me, and through me, you interview, like the Avatar, the past avatars.
I've been watching the Netflix show, by the way, it's fantastic.
But like, so, so Codium is like, they've already thought this all the way through.
They're like, okay, you write the user story.
From the user story, you generate all the tests.
You also generate the code.
And if you update any one of those, they all have to update together.
And probably the critical factor is the test generation from the story
because everything else can just kind of bounce their heads off of those things
until they pass.
So you have to write good tests.
It's kind of like the eat-your-vegetables of coding, right?
Which nobody really wants to do.
And so I think it's a really smart tactic to go to market
by saying we automatically generate tests for you and start not great,
but then get better,
and eventually you get to the weakest point in the chain
for the entire loop of code generation.
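As a rough illustration of the loop described here (user story, then tests, then code that keeps retrying until the tests pass), the following is a minimal Python sketch. Everything in it is hypothetical: `tests_from_story` and `candidate_implementations` are hand-written stand-ins for what a model would generate, and this is not how Codium or any other product actually works.

```python
from typing import Callable, Iterable, Optional, Tuple

Impl = Callable[[int, int], int]
Test = Callable[[Impl], bool]


def tests_from_story() -> list:
    """Hypothetical: tests derived from the user story 'add two numbers'.

    In a real system these would be generated from the story by a model;
    here they are written by hand so the sketch runs on its own.
    """
    return [
        lambda f: f(2, 3) == 5,
        lambda f: f(-1, 1) == 0,
    ]


def candidate_implementations() -> Iterable[Impl]:
    """Hypothetical stand-in for successive model attempts at the code."""
    yield lambda a, b: a * b  # wrong attempt
    yield lambda a, b: a - b  # wrong attempt
    yield lambda a, b: a + b  # satisfies both tests


def first_passing(
    candidates: Iterable[Impl], tests: list, max_attempts: int = 5
) -> Optional[Tuple[int, Impl]]:
    """Return (attempt number, implementation) for the first candidate
    that passes every test, or None if the attempt budget runs out."""
    for attempt, impl in enumerate(candidates, start=1):
        if attempt > max_attempts:
            break
        if all(test(impl) for test in tests):
            return attempt, impl
    return None


result = first_passing(candidate_implementations(), tests_from_story())
```

The point of the sketch is that the tests are the fixed target: the quality of whatever comes out of the loop is bounded by the quality of the tests it was checked against.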
What do you think the biggest link is?
The weakest link?
Yeah, it's testing.
Yeah, yeah, yeah.
Do you think there's a way to, like,
are there some promising avenues you see forward
for making that actually better?
For making it better.
You have to have, like, good isolation,
and I think, like, proper serverless cloud environment
is integral to that.
It could be like a Fly.io, it could be like a Cloudflare Worker.
It depends how many resources your test environment needs.
And effectively, I was talking about this, I think, with maybe Rob earlier in the audience,
where every agent needs a sandbox.
If you're a code agent, you need a coding sandbox, but if you're whatever, like Imbue used
to have this like sort of Minecraft clone that was much faster.
If you have a model of the real world, you have to go generate some plan or some code
or some, whatever, test it against that real world
so that you can get this iterative feedback
and then get the final result back
that is somewhat validated against the real world.
And so you need a really good sandbox.
I don't think people,
and I think this is an infrastructure need
that humans have had for a long time.
We've never solved it for ourselves,
and now we have to solve it for about 1,000 times
larger quantity of agents than actually exists.
And so I think we eventually have to involve
a lot more infrastructure in order to serve these things.
So yeah, so for those who don't know, like I also have, so we're talking about the rise of the
AI engineer, I also have previous conversations about immutable infrastructure, cloud environments
and that kind of stuff.
And this is all of the kinds, like, like in order to solve agents and coding agents, we're
going to have to solve the other stuff too along the way.
And it's really neat for me to see all that tie together in my DevTools work that all these themes
kind of reemerged just naturally just because everything we needed for humans, we just need 100 times
more for agents.
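To make the "every agent needs a sandbox" idea concrete, here is a minimal Python sketch that runs a piece of generated code in a separate process, in a throwaway directory, with a timeout. The names `run_in_sandbox` and `SandboxResult` are hypothetical, and this is an assumption-laden toy: a subprocess with a timeout is not real isolation, and a production setup would need containers, microVMs, or a serverless environment of the kind mentioned above.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SandboxResult:
    ok: bool          # True if the process exited with status 0
    stdout: str
    stderr: str
    timed_out: bool   # True if the process was killed for running too long


def run_in_sandbox(code: str, timeout_s: float = 5.0) -> SandboxResult:
    """Run untrusted Python source in a child process inside a temp dir.

    NOTE: this only limits wall-clock time and working directory.
    It does not restrict network, filesystem, or memory access.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(code)
        try:
            proc = subprocess.run(
                [sys.executable, "-I", str(script)],  # -I: isolated mode
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return SandboxResult(False, "", "", True)
        return SandboxResult(proc.returncode == 0, proc.stdout, proc.stderr, False)


good = run_in_sandbox("print(1 + 1)")
bad = run_in_sandbox("raise SystemExit(3)")
```

An agent loop would call something like this once per attempt and feed `stdout`, `stderr`, and the pass/fail flag back into the next attempt, which is the iterative feedback described in the conversation.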
Let's talk about the AI engineer.
AI engineer has become a whole thing.
It's become a term and also a conference.
And a job title.
Tell us more about that.
What's going on there?
That is a very big, very big cloud of things.
I would just say, I think it's an emergent industry.
I've seen this happen repeatedly for front-end.
So the general term is software engineer or programmer.
In the 70s and 80s, they would not be like senior engineer.
They would just be engineer.
Like you, or you, I don't think they even call themselves engineer.
You don't have that ball.
What about the member of the technical staff?
Oh yeah, MTS, very, very, very elite.
But yeah, so like, you know, like these striations appear when the population grows and the technical
depth grows over time.
When it starts, not that important, and then over time it's just going to specialize.
And I've seen this happen for front end, for DevOps, for data, and I can't remember
what else I listed in that piece, but those are the main three that I was around for.
And I see this, I saw this happening for AI engineer, which is effectively, now a lot of
people are arguing that there is the ML researcher, the ML engineer, who sort of pairs with the
researcher, sometimes they're also called research engineer, and then on the other side of the
fence is just software engineers. And that's how it was up till about last year. And now there's
this specializing and rising class of people building AI-specific software that are not any of those
previous titles that I just mentioned. And that's the thesis of the AI engineer, that this is an emerging
category of startups, of jobs. I've had people from Meta, IBM, Microsoft, OpenAI, tell me that their
title is now AI engineer. Really?
They're hiring AI engineers. So like I can see that this is a trend. And I think that's what
Andrej called out in his post, that just mathematically, just the limitations
because of talent, research talent and GPUs, all these will tend to concentrate in a few
labs, and everyone else is just going to have to rely on them or build differentiation of
products in other ways, and those will be AI engineers. So mathematically there will be more
AI engineers than ML engineers. It's just the truth. Right now it's the other way.
Right now the number of AI engineers is like maybe 10x less. So I think that the
ratio will invert. And, you know, I think the goal of Latent Space and the goal of the
conference and anything else I do is to serve that growing audience.
To make the distinction clear, if I'm a software engineer, I'm like,
I want to become an AI engineer.
What do I have to learn?
Like, what additional capabilities does that type of engineer have?
Funny you say that.
I think you have a blog post on this very topic.
I don't actually have a specific blog post on how to, like, change classes.
I do think, I always think about these in terms of, yeah, Baldur's Gate and, you know,
D&D rule set number 5.1 or whatever.
But, yeah, so I kind of intentionally left that open to leave space for others.
I think when you start an industry, you need to, the specifications that work the best in industries
are minimally defined so that other people can fill in the blanks.
And I want people to fill in the blanks, I want people to disagree with me and with themselves
so that we can figure this out as a group.
Like I don't want to over-specify everything.
You know, like that's the only way to guarantee that it will fail.
I do have a take, obviously, because a lot of people are asking where to start.
And I think basically, so what we have is Latent Space University.
We just finished working on day seven today.
It's a seven-day email course where it basically, like, it is completely designed to answer the question of like, okay,
I'm an existing software engineer, I know how to code, but I don't get all this AI stuff. I've been living under a rock, or like it's just too overwhelming for me.
You have to pick for me or curate for me as a trusted friend, and I have one hour a day for seven days. What do you
slot into the bucket? So for us it's making sort of LLM API calls, it's image generation, it's code generation,
it's audio, ASR. What's ASR, audio speech recognition?
Yeah, yeah.
And then I forget what the fifth and sixth ones are,
but the last day is agents.
And so basically, I'm just like,
here are seven projects that you should do
to feel like you can do anything in AI.
You can't really do everything in AI
just from that small list.
But I think it's just like anything,
you have to go through a set list
of things that are basic skills
that I think everyone in this industry
should have, to be at least conversant in. If someone, if like a boss comes to you and goes like,
hey, can we build this? You don't even know if the answer is no. So I want you to move from like
unknown unknowns to at least known unknowns, and I think that's where you start being competent
as an AI engineer. So yeah, that's LSU, Latent Space University, just to trigger the Tigers.
So do you think in the future that an AI engineer is going to be someone's
full-time job? Like people are just going to be AI engineers? Or do you think it's going to be
more of a world where I'm a software engineer and like 20% of my time I'm using OpenAI's APIs,
and I'm working on prompt engineering and stuff like that and using Copilot?
You just reminded me that day six is open source models and fine tuning. I think it will be a
spectrum. That's why I don't want to be like too definitive about it. Like we have full-time front-end
engineers and we have part-time front-end engineers and then you dip into that community whenever
you want. But wouldn't it be nice if there was a collective name for that community so you could
go find it? You can find each other. And like honestly, like that's,
That's really it.
A lot of people, a lot of companies were pinging me for like, hey, I want to hire this
kind of person, but you can't hire that person, but I wanted someone like that.
And then people on the labor side were, they're pinging me going like, okay, I want to do more
in this space, but where do I go?
And I think just having that Schelling point of what an industry title and name is and then
sort of building out that mythology and community and conference, I think is helpful, hopefully.
And I don't have any prescriptions on whether or not it's a full-time job.
I do think over time it's going to become more of a full-time job.
And that's great for the people who want to do that,
and the companies that want to employ that.
But it's absolutely like you can take it part-time.
Like, you know, jobs come in many formats.
Yep, yep, that makes sense.
Yeah.
And then you have a huge World's Fair coming up.
Yeah.
Tell me about that.
So part of, I think, what creating an industry requires is to let people gather in one place.
And also for me to get high-quality
talks out of people, you have to create an event out of it, otherwise they don't do the work.
So last year we did the AI Engineer Summit, which went very well, and people can see that online,
and we're very happy with how that turned out. This year, we want to go four times bigger
with the World's Fair and try to reflect AI Engineering as it is in 2024. I always admired
two conferences in this respect. One is NeurIPS, which I went to last year, and documented on
the pod, which was fantastic. And two
is KubeCon from the other side of my life,
which is the cloud orchestration and DevOps world.
So NeurIPS is the one place that you go to,
I think it's the top conference.
I mean, there's others that you can consider.
But yeah, so NeurIPS is where the research scientists are the stars.
The researchers are the stars, PhDs are the stars.
Mostly, to be honest.
It's really funny to go to-
Especially these days.
Yeah, it was really funny to go into NeurIPS
and go like.
And the VCs trying to back them.
Yeah, there are lots of VCs.
Were you there?
This year.
Anyway, so in NeurIPS, research scientists are the stars, and I wanted, for AI engineers, for engineers
to be the stars, right, to show off their tooling and their techniques and the difficulty
moving all these ideas from research into production.
The other one was KubeCon where you could honestly just go and not attend any of the talks
and just walk the floor and figure out what's going on in DevOps, which is fantastic.
Because, yeah, so that curation and that bringing together of an industry is what I'm going for for the conference.
And yeah, it's coming in June.
The most important thing, to be honest, when I conceived of this whole thing was to buy the domain.
So we got ai.engineer.
People were like, engineer is a domain?
Yeah.
And funny enough, dot engineer was cheaper than dot engineering.
I don't understand why, but that's up to the domain people.
All right.
Josh, any questions on agents and visions?
Yeah, I think maybe you have a lot of experience and exposure talking to all these companies and founders and researchers and everyone that's on your podcast.
Do you feel like you have a good kind of perspective on some of the things that, like some of the kind of technical issues having seen, you know, like we were just talking about like for coding agents, like, oh, how, you know, the value of test is really important.
There are other things like for, you know, retrieval.
Like now, you know, we have these models coming out with a million tokens of context length,
10 million. Like, is retrieval going to matter anymore? Like, does the huge context matter? Like, what do you
think specifically about the long context thing sure yeah because you asked them more I was going to ask
a few other ones after that so go go for that one first yeah that's what I was going to ask for we
we can ask yeah okay let's let's talk about long context and then the other stuff so for those who don't
know, long context was kind of in the air last year, but really, really came into focus this year
with Gemini 1.5 having a million-token context and saying that it was in research for 10 million
tokens. And that means that you no longer have to really think about what you
retrieve, sorry, no longer really think about what you have to put into context. You can just kind of
throw the entire knowledge base in there, or books or film, anything like that, and that's fantastic.
A lot of people are thinking that it kills RAG, and I think, one, that's not true, because
of cost reasons: you still pay per token. So basically
Google is perfectly happy to let you pay for a million tokens every single time you make an API call.
But good luck, you know, having a $100 API call.
And then the other thing, it's going to be slow.
No explanation needed.
And then finally, my criticism of long context is that it's also not debuggable.
Like, if something goes wrong with the result, you can't do like the RAGAS decomposition of where the source of error is.
Like, you just have to like go like, it's the weights bro.
Like it's somewhere in there.
Sorry.
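The cost argument above is simple per-token arithmetic, sketched here in Python. The price is a made-up placeholder, not any provider's real rate, and `call_cost_usd` is a hypothetical helper; only the shape of the comparison (paying for the whole context on every call versus paying for a retrieved slice) comes from the conversation.

```python
def call_cost_usd(prompt_tokens: int, usd_per_million_tokens: float) -> float:
    """Input-side cost of a single API call under flat per-token pricing."""
    return prompt_tokens / 1_000_000 * usd_per_million_tokens


# ASSUMPTION: illustrative price only, not a quoted rate from any provider.
HYPOTHETICAL_USD_PER_MILLION = 10.0

# Stuffing a full 1M-token knowledge base into every call...
full_context = call_cost_usd(1_000_000, HYPOTHETICAL_USD_PER_MILLION)
# ...versus sending only a retrieved 4k-token slice (hypothetical size).
retrieved = call_cost_usd(4_000, HYPOTHETICAL_USD_PER_MILLION)

print(f"whole knowledge base in context: ${full_context:.2f} per call")
print(f"retrieved 4k-token slice:        ${retrieved:.2f} per call")
print(f"ratio: {full_context / retrieved:.0f}x")
```

Whatever the real price is, the ratio between the two options depends only on the token counts, which is why retrieval stays relevant even when very long contexts are available.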
I pretty strongly agree with this.
Why do you think people are making such crazy long context windows?
People love to kill RAG.
It's so much.
It's not going to kill it though because it's too expensive.
It's so expensive, like you said.
Yeah, I just call it a different dimension.
I think it's an option that's great when it's there.
Like when I'm prototyping, I do not ever want to worry about context.
And I'm going to call stuff a few times, and I don't want to run into errors.
I don't want to have to set up a complex retrieval system just to prototype something.
But once I'm done prototyping, then I'll worry about all the other RAG stuff.
And yes, I'm going to buy some system or build some system or whatever to go do that.
So I think it's just like an improvement in like one dimension that you need.
but the improvements in the other dimensions also matter
and it's all needed.
Like this space is just going to keep growing
in unlimited fashion.
I do think that this combined with multimodality
does unlock new things.
That's what I was going to ask about next.
It's like how important is multimodal?
Like great, you know, generating videos, sure, whatever.
Okay, how many of us need to generate videos that often?
It would be cool for TV shows, sure, but like, yeah.
I think it's pretty important.
The one thing that when we launched the Latent Space podcast,
we listed a bunch of interest areas.
So one thing I love about being explicit or intentional about our work is that you list the things that you're interested in
and you list the things that you're not interested in.
And people are very unwilling to have an anti-interest list.
One of the things that we were not interested in was multimodality last year.
Because I was just like, okay, you can generate images and they're pretty, but not a giant business.
I was wrong.
Midjourney is a giant, giant, massive business that
no one can understand or get into.
But also, I think being able to natively understand
audio and video and code, I consider code a special modality,
all that is very qualitatively different
than translating it into English first,
and using English as like a bottleneck pipe,
and then applying it in LLMs.
Like the ability of LLMs to reason across modalities
gives you something more than you could individually
by using text as the universal interface.
So I think that's
useful. So concretely, what does that mean? It means that, so I think the reference post for everyone
that you should have in your head is Simon Willison's post on Gemini 1.5's video capability, where he
basically shot a video of his bookshelf, just kind of scanning through it, and he was able to give
back a complete JSON list of the books and the authors and all the details that were visible there,
hallucinated some of it, which is, you know, another issue. But I think it just, like, unlocks
this use case that you just would not even try to code without the native video understanding
capability. And obviously, on a technical level, video is just a bunch of frames. So actually
it's just image understanding, but image within the temporal dimension, which this month, I think,
became much more of an important thing, like the integration of space and time in transformers.
I don't think anyone was really talking about that until this month. And now it's the only
thing anyone can ever think about for Sora and for all the other stuff. The last thing I'll say,
which is against this trend of like, every modality is important, just do all
modalities: I kind of agree with Nat Friedman, who actually kind of pointed out, just
before the Gemini thing blew up this month, which was like, why is it
that OpenAI is pushing DALL-E so hard? Why was Bing pushing Bing Image
Creator? Like, it's not apparent that you have to create images to create
AI, but every lab just seems to want to do this. And I kind of agree that it's not
on the critical path, especially for
image generation. Maybe image understanding, video understanding.
Yeah, consumption.
But generation, eh.
Maybe we'll be wrong next year.
It just catches you a bunch of flak with, like, you know, culture war things.
That's true.
All right, we're going to move into Rapid Fire Q&A, so we're going to ask you questions.
We've cut the Q&A section for time, so if you want to hear the spicy questions,
head over to the Thursday nights in AI video for the full discussion.
Next up, we have another former guest, Dylan Patel of SemiAnalysis, the inventor of the GPU
Rich/Poor divide, who did a special live show with us in March.
But that means you can finally, like, side-by-side A/B test your favorite boba shops?
We got Gong Cha, we got Boba Guys, we got the lemon, whatever it's called, so let us know what's your
favorite.
We also have Slido up to submit questions.
We already had Dylan on the podcast, and like, this guy
tweets and writes about all kinds of stuff.
So we want to know what people want to know more about
rather than just being self-driven.
But we'll do a state of the union maybe.
I know everybody wants to know about Groq.
Everybody wants to know whether or not
Nvidia is going to zero after Groq.
Everybody wants to know what's going on with AMD.
We got some AMD folks in the crowd too.
So feel free to interact at any time.
We have a portable mic.
Heckle, please.
Good comedians show their color
with the way they can handle the crowd
when they're heckled.
Do not throw boba.
Do not throw boba.
We cannot afford another podcasting set up.
Awesome.
Well, welcome everybody to the SemiAnalysis and Latent Space crossover.
Dylan texted me on Signal.
He was like, dude, how do I easily set up a meetup?
And here we are today.
Well, as you might have seen, there's no name tags.
There's a bunch of things that are missing.
But we did our best.
It was extremely easy, right?
Like, I text Alessio.
He's like, yeah, I got the spot.
Okay, cool.
Here's a link.
Send it to people.
sent it, and then showed up.
And there was zero other organization that I required.
So everybody's here.
A lot of SemiAnalysis fans we got in the crowd.
Everybody wants to know more about what's going on today.
And Groq has definitely been the hottest thing.
We just recorded our monthly podcast today.
And we didn't talk that much about Groq
because we wanted you to talk more about it.
And then we'll splice you into our monthly recap.
So let's start there.
Okay.
So you guys did a Groq spreadsheet.
So we broke out some Groq numbers because everyone was wondering, there's two things going on, right?
One, you know, how important, no, how does it achieve the inference speed
that has been demonstrated by Groq Chat?
And two, how does it achieve its price promise, that is, the public pricing of 27 cents per million tokens?
And there's been a lot of speculation or, you know, some numbers thrown out there.
I put out some tentative numbers and you put out different numbers.
But I'll just kind of lay that as the groundwork.
Everyone's very excited about essentially like five times faster token generation than any other LLM currently.
And that unlocks interesting downstream possibilities if it's sustainable, if it's affordable.
And so I think your question, reading your piece on Groq, which is on the screen right now, is: is it sustainable?
So like many things, this is VC funded, including this boba.
No, I'm just kidding. I'm paying for the boba.
Thank you, SemiAnalysis subscribers.
I hope he pays for it
for you right now. That's true. That's true. Alessio has the IOU, right? And that's all it is. But yeah,
like many things, you know, they're not making money off of their inference service. They're
just throwing it out there for cheap and hoping to get business and maybe raise money off of that.
And I think that's a fine use case. But the question is like, how much money are they losing?
Right. And that's sort of what I went through breaking down in this article that's on the
screen. And it's pretty clear they're like seven to 10x off
break-even on their inference API, which is like horrendous, like far worse than any other
sort of inference API provider. So this is like a simple, simple cost thing that was pulled up.
You can either inference at very high throughput or you can inference at very low latency.
With GPUs you can do both. With Groq, you can only do one. Of course, with Groq, you can do that
one faster, marginally faster than an inference latency optimized GPU server. But no one offers
inference latency optimized GPU servers
because you would just burn money.
It makes no economic sense to do so
until maybe someone's willing to pay for that.
So Groq's service, you know,
on the surface looks awesome
compared to everyone else's service,
which is throughput optimized.
And then when you compare
to the throughput optimized scenario, right,
GPUs look quite slow,
but the reality is they're serving,
you know, 64 or 128 users at once,
right?
They have a batch size,
how many users are being served at once.
Whereas Groq is taking 576 chips
and they're not really
doing that efficiently. They're serving a far, far fewer number of users, but extremely fast. Now,
that could be worthwhile if they can get the number of users they're serving at once
up, but that's extremely hard because they don't have memory on their chip, so they can't store
KV cache for, you know, all the various different users. And so the crux of the issue is just like,
hey, can they get that performance up as much as they claim they will, which is, you know,
they need to get it up more than 10x to make this like a
reasonable benefit. In the meantime,
Nvidia's launching a new GPU in two weeks.
That'll be fun at GTC.
And they're constantly pushing software as well.
So we'll see if Groq can catch up to that.
But the current verdict is, you know,
they're quite far behind, but it's hopeful, you know,
that maybe they can get there by, you know,
scaling their system larger.
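The throughput-versus-latency economics described here can be sketched with a few lines of Python. Only the 27 cents per million tokens price comes from the conversation; the hourly costs, batch sizes, and per-user speeds below are invented placeholders, not Groq or GPU measurements, and both function names are hypothetical.

```python
def cost_per_million_tokens(
    system_usd_per_hour: float,
    concurrent_users: int,
    tokens_per_sec_per_user: float,
) -> float:
    """Cost to generate one million tokens on a fully utilized system.

    Total throughput is (users served at once) x (tokens/sec per user),
    so serving few users very fast can still be expensive per token.
    """
    tokens_per_hour = concurrent_users * tokens_per_sec_per_user * 3600
    return system_usd_per_hour / tokens_per_hour * 1_000_000


def multiple_of_break_even(cost_per_million: float, price_per_million: float) -> float:
    """How many times the selling price the cost is (1.0 means break-even)."""
    return cost_per_million / price_per_million


PRICE_PER_MILLION = 0.27  # public pricing mentioned in the conversation

# ASSUMPTION: both systems below use made-up numbers for illustration only.
throughput_optimized = cost_per_million_tokens(
    system_usd_per_hour=10.0, concurrent_users=64, tokens_per_sec_per_user=30.0
)
latency_optimized = cost_per_million_tokens(
    system_usd_per_hour=100.0, concurrent_users=4, tokens_per_sec_per_user=400.0
)

for name, cost in [
    ("throughput-optimized", throughput_optimized),
    ("latency-optimized", latency_optimized),
]:
    print(f"{name}: ${cost:.2f}/M tokens, "
          f"{multiple_of_break_even(cost, PRICE_PER_MILLION):.1f}x the price")
```

The formula makes the lever explicit: with the system cost fixed, the only ways to lower cost per token are to serve more users at once or to generate faster per user, which is why the number of concurrent users matters so much in the argument.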
Yeah. I was listening back to our original episode
and you were talking about how NVIDIA basically adopted
this different strategy of just leaning on networking
GPUs together.
It seems like Groq has some minor version of that going on here with the Groq rack.
Is it enough?
What's Groq's next step here, like strategically?
Yeah, that's the next step is, of course, you know, so right now they connect 10 racks of chips together, right?
And that's the system that's running on their API today, right?
Whereas most people who are running, you know, Mistral are running it on two GPUs, right?
So one-fourth of a server.
And that rack is not, you know, obviously 10 racks is pretty crazy.
But they think that they can scale performance if they have this individual system be 20 racks.
I think they can continue to scale performance extra linearly.
So that'd be amazing if they could.
But I'm doubtful that that's going to be something that's scalable,
especially for, you know, larger models.
So there's the chip itself, but there's also a lot of work they're doing at the compiler level.
Do you have any good sense of how easy it is to actually work with LPU?
Is that something that's going to be a bottleneck for them?
So Ali's in the front right there, and he knows a ton about VLIW architectures.
But to summarize sort of his opinion, and I think many folks, is it's extremely hard to program
these sorts of architectures, right, which is why they have their compiler and so on and so forth.
But it's an incredible amount of work for them to stand up individual models
and to get the performance up on them, which is the thing
they've been working on, whereas, you know, GPUs are far more flexible, of course.
And so the question is, you know, can they, can this compiler continue to extract performance?
Well, theoretically, like, there's a lot more performance to run on the hardware, but they don't
have, you know, many, many things that people generally associate with programmable hardware, right?
They don't have buffers and many other things.
So it makes it very tough to do that, but that's what they're, you know, their relatively large
compiler team is working on.
Yeah.
So I'm not a GPU compiler guy, but I do want to clarify my understanding from what I read,
which is a lot of catching up to do.
The crux of it is some kind of speculative, the word that comes to mind is speculative routing
of weights and work that needs to be done or scheduling of work across the 10 racks of GPUs.
Is that the, is that like the bulk of the benefit that you get from the compilation?
So with the Groq chips, what's really interesting is like, with GPUs, you can issue certain instructions, and you will get a different result.
Like, depending on the time, I know a lot of people in ML have had that experience where, like, the GPU literally doesn't return the numbers it should.
And it's basically called nondeterminism.
And with Groq, their chip is completely deterministic.
The moment you compile it, you know exactly how long it will take to operate, right?
there is no, there is no, like, deviation at all.
And so, you know, they've, they're planning everything ahead of time, right?
Like, every instruction, like, it will complete in the time that they've planned it for.
And there is no, I don't know what the best way to state this is.
There's no variance there, which is interesting from, like, when you look historically,
they tried to push this into automotive.
Because automotive, you know, you probably want your car to do exactly what you issued it to do
and not have sort of unpredictability.
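To make the "planned ahead of time" idea concrete, here is a minimal sketch of static scheduling in Python. It is only an illustration: the instruction names and cycle counts in `LATENCY` are hypothetical and are not Groq's real numbers or compiler. The point is that when every instruction has a fixed latency, the start cycle of each instruction and the total runtime are known at compile time.

```python
# Hypothetical fixed instruction latencies in cycles (illustrative, not real Groq figures).
LATENCY = {"load": 4, "matmul": 10, "add": 1, "store": 4}


def compile_schedule(program):
    """Assign each instruction a fixed start cycle before anything runs."""
    schedule = []
    cycle = 0
    for op in program:
        schedule.append((cycle, op))
        cycle += LATENCY[op]  # fixed latency, so no runtime variance
    return schedule, cycle


program = ["load", "load", "matmul", "add", "store"]
schedule, total_cycles = compile_schedule(program)

for start, op in schedule:
    print(f"cycle {start:3d}: {op}")
print("total cycles:", total_cycles)
```

Compiling the same program twice gives the same schedule and the same total, which is the property being described; a GPU with caches and dynamic scheduling cannot promise that in the same way.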
But yeah, I'm sorry, I've lost track of the question.
It's okay.
I just wanted to understand a little bit more about what people should know about the compiler magic that goes on with Groq.
Like, you know, like, I think, I think from a software or like hardware point of view, that intersection of, I guess.
So chips have like, like, I'm going to steal this from someone here in the crowd, but chips have like five, you know, sort of there's like when you're designing a chip, there's, it's called PPA, right?
power, performance, and area.
It's kind of a triangle that you optimize around.
And the one thing people don't realize is there's a third P.
It's like PPAP.
And the last P is pain in the ass to program.
And that is very important for like people making AI hardware.
Right?
Like TPU, without the hundreds of people that work on the compiler
and JAX and XLA and all these sorts of things,
would be a pain in the ass to program.
But Google's got that, like, plumbing.
Now if you look across the ecosystem,
everything else is a pain in the ass to program compared to Nvidia.
And this applies to the Groq chip as well, right?
So, yeah, question is, like, can the compiler team get performance up anywhere close to theoretical?
And then can they make it not a pain in the ass to support new models?
We got a question from Ali.
What's the average VLIW bundle occupancy of Groq?
Bro.
Get out of here.
I don't know if he's setting you up or if he wants to chime in.
I think he's setting me up.
So, okay, what is VLIW for the rest of us?
It's very long instruction word, is basically what it means.
So, GPUs are relatively simple.
They're tiny little cores, very simple instructions.
There's a shitload of them, right?
CPUs, you know, they have a known instruction set, x86.
It's very complicated, but people have worked on it for decades.
VLIW processors are very unique in that sense.
And your question, Ali, I cannot answer that question.
I have no clue.
Is it documented anywhere online?
Anyway, so the systolic array, within the TPU, there's a bunch of stuff, but the actual
matrix multiply unit is called the MXU, and it's a VLIW architecture as well.
And I'm just trying to find...
Yeah, I just want to find something that makes me not sound like an idiot.
Sometimes I also like to ballpark things in terms of, like,
where a good median value should be and where a good high value should be.
Basically the point is, like, you're trading off. This is theoretically the most optimal architecture for
performance, power, and area, and, you know, not specifically Groq, but VLIW in general is
going to get you closer to optimal there. But then you're giving up, you know, that last
P, which is pain in the ass to program, is I think the most simple way to get into it.
There's, like, computer architecture books about this, but it's a little complicated.
Right. Can we talk about LPU, Cerebras,
Tenstorrent, and some of these other architectures?
How should people think about
maxed SRAM versus mix versus...
Yeah, yeah. So there's a lot of ML hardware out there,
new and old, there's old stuff that's trying to compete.
There's new stuff that's coming up, you know,
companies like MatX and Lemurian Labs and so on and so forth, you know.
But so there's like a continuum of like everyone before, say, two years ago,
that was doing ML hardware bet in one direction, right?
We're going to make an architecture
that has more on-chip memory than Nvidia.
Like, that was the general bet everyone made.
Right, and so, like, Groq made that bet.
They made it to the extreme.
They didn't have any off-chip memory at all,
only on-chip memory.
You have Cerebras, who did a similar thing,
except they were like, yeah, we're going to have on-chip memory,
but we're going to make a chip that's the size of a wafer, right?
Like, literally this big,
whereas an Nvidia chip is roughly this big.
So it's like this big,
it's the only chip in the world that's that big.
But again, same bet.
More on-chip memory, less off-chip.
Graphcore and SambaNova made a similar bet.
And basically everyone made that bet,
because they thought that's where ML would go.
Of course, models grew faster than anyone ever imagined.
Yeah, than the memory that was possible.
And so that very quickly became the wrong bet.
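A rough back-of-the-envelope sketch of why growing models broke the on-chip-memory bet. The numbers here are assumptions for illustration only: `sram_mb_per_chip=230` is an assumed per-chip SRAM size, two bytes per parameter assumes 16-bit weights, and the count covers weights only (no KV cache or activations). The helper `chips_needed` is hypothetical.

```python
import math


def chips_needed(params_billion, bytes_per_param, sram_mb_per_chip):
    """Chips required just to hold the weights in on-chip SRAM."""
    model_bytes = params_billion * 1e9 * bytes_per_param
    sram_bytes = sram_mb_per_chip * 1e6
    return math.ceil(model_bytes / sram_bytes)


# Assumed figures: 16-bit weights, 230 MB of SRAM per chip, no off-chip memory.
small = chips_needed(params_billion=7, bytes_per_param=2, sram_mb_per_chip=230)
large = chips_needed(params_billion=70, bytes_per_param=2, sram_mb_per_chip=230)

print("7B model: ", small, "chips")
print("70B model:", large, "chips")
```

Under these assumptions the chip count scales linearly with model size, so a 10x larger model needs roughly 10x the chips before any compute is done, which is the squeeze being described.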
And so now we're sort of seeing a new wave of startups
that are going to bet on the other side,
as well as many other architectural things,
because memory is not really the only architectural thing, of course.
And so, like, where to, like, place startups is very dependent on, like, hey, what are you doing differently than Nvidia?
And is Nvidia just going to implement that in their chip next year?
Or some version of that.
That's, like, pretty much the only things to think about when looking at, you know, hardware companies now.
Cool.
And I think the question is, like, there's the size of the models that got outrun.
But now you're doing all this work at the compiler level, but it's very transformer-based, everything they're doing on the optimization side.
How do you think about that risk?
Do you think it's okay for a hardware company to take, like, architectural risk in terms of, like, yeah, we assume Transformers in two years,
they'll still be pretty good,
when you're, like, depreciating some of these costs over, like, four, five years as a buyer?
Yeah, yeah.
That's the biggest challenge with, like, some of the specialized hardware.
I know my GPUs will be useful in four years or five years.
Maybe not, like, super useful, but they'll be useful for something.
But there's no way to know that my
hardware is going to be able to operate on whatever new model architecture comes out in the
next few years. Like, I like to joke, transformers are all you need, and, like, everything else is,
like, a waste of time. But, you know, I'm sure something better will come, right? And, you know,
you got to have, like, hardware is expensive, and you own it for many years, right? So you can't just,
like, buy whatever's best for today's workload one time and then assume that workload is going to stay
stagnant because that's a recipe to have your like hardware useless as soon as like things evolve right
like imagine if someone, like, had hardware for LTSMs in 2016 or whatever. Like, LSTMs? Yeah, LSTMs, sorry.
You look like an idiot because now it's not going to work for, you know, the next architecture
as soon as BERT came out, for example. So yeah, anything super, super specialized is always
at risk of being sort of obsoleted and useless. And that's a thought that, like, hey,
like Graphcore, their chips are pretty decent at GNNs,
so graph neural networks, they're actually pretty decent at that.
But no one cares.
So congratulations.
Like you won, you won the shortest midget, right?
Mentioning transformers are all you need
gives us a nice opportunity to bring out one of your old tweets,
but also mention Gemini.
Oh, God.
My old tweets, I'm scared.
No, no, not all of these.
Recent tweets.
So there's a lot of people talking about, like,
I think you had a tweet
commenting on Gemini 1.5
and the million-token context,
where basically everyone
was saying, like, okay, we need Mamba, we need RWKV, or we need some other alternative
architecture to scale to long context. And Google comes out and says, no, we just scaled
transformers to 10 million tokens, easy. You know, like, I think that kind of
reflects on your thesis there a little bit. I guess, yeah. I mean, I don't know if I have a
coherent thesis, but it's just a meme that you're putting on. It's sure fun to
meme on people who think that, like, I just have an intense hatred for RAG, right?
Like, retrieval augmented generation is, like, the most, like, I just have an intense, like, innate hatred for it.
You retweeted me defending RAG in the White House press release.
Yeah, yeah, yeah, yeah.
Okay.
But it's just fun.
It's all fun and games.
Yeah, yeah, it's all fun and games.
Yeah.
No, no, no, I retweeted you because you memed to the White House.
I don't know if y'all saw the meme.
Can you pull it up?
Like, the White House.
The White House put out this thing about, like...
They're getting very opinionated.
Memory safety.
I think it was effectively, like, C is bad and Rust is good.
It was, like, pretty wild that the White House put that out.
And I mean, like, whatever that is.
So, so.
So, like, they just got very opinionated about prescribing languages to people.
And so then I just, like, started editing them.
So I have, "Stop comparing RAG with long context and fine-tuning."
Wait, hold on.
You said I retweeted you defending it.
I thought you were hating on it.
And that's why I retweeted it.
Yeah, it's somewhat of a defense.
Because everyone was like, long context is killing RAG.
And then I had, future LLMs should be subquadratic.
That's another one.
And I actually messed with the fine print as well.
Let's see.
Power benefits of SRAM-dominant?
Yeah, yeah, so that's a good question.
So like, SRAM is on-chip memory.
Everyone's just using HBM.
If you don't have to go to off-chip memory,
that'd be really efficient, right?
Because you're not moving bits around.
But there's always the issue of you don't have enough memory.
So you still have to move bits around constantly.
And so that's the question.
So yeah, sure.
If you can not move data around as you compute,
it's going to be fantastically efficient,
but that's not really easy or simple to do.
What do you think is going to be harder in the future, like getting more energy at cheaper costs or like getting more of this hardware to run?
Yeah, I wonder.
So someone was talking about this earlier, but it's like, here in the crowd, and I'm looking right at him.
But he's complaining that journalists keep, you know, misreporting what data centers are doing to the environment, right?
Which I thought was quite funny because they're inundated by journalists talking about data centers, like destroying the world.
Anyways, you know, that's not quite the case.
But yeah, I don't know.
Like, the power is certainly going to be hard to get, but, you know, I think, if you just look at history, right, like humanity and especially America, like, power production and usage kept skyrocketing from, like, the 1700s to, like, 1970s.
And then it kind of flatlined from there.
So why can't we, like, go back to the, like, growth stage, I guess is, like, the whole, like, mantra of, like, accelerationists, I guess.
This is e/acc, yep.
Well, I don't think it's e/acc.
I think, like, Sam Altman wholly believes this too.
And I don't think he's e/acc.
So, but yeah, like, it's something to think about.
Like, the U.S. is going back to growing in energy usage, whereas for the last, like, 40 years, we were kind of flat on energy usage.
And what does that mean?
Like, yeah.
Fair enough.
There was another question on Marvell, but kind of the...
I think it's definitely, like, one of these three guys who are on the buy side that are asking this question.
I want to know if Marvell's stock is going to go up.
So Marvell, they're doing the custom ASIC for Groq. They also do the Trainium 2 and the Google
CPU. Yeah, any other chip that they're working on that people should keep in mind? It's like,
yeah, any needle moving, and it's like any stock moving. Yeah, exactly, exactly. They're working
on some more stuff. Yeah, I'll refrain from, yeah. All right. Let's see, other Groq stuff
we want to get through.
I don't think so.
All right, most of the other ones.
Your view on edge compute hardware,
any real use cases for it?
Yeah, I mean, I have like a really like anti-edge view.
Yeah, let's hear it.
Like, so many people are like,
oh, I'm going to run this model on my phone or on my laptop.
And I love how much it's raining.
So now I can be horrible and you people won't leave.
Like, I want you to try and leave this building.
Captive audience.
Seriously, should I start singing?
Like, there's nothing you can do.
You definitely, I'll stop you from that.
Sorry, so edge hardware.
Like, you know, people are like, oh, I'm going to run this model on my phone or my laptop.
It makes no sense to me because current hardware is not really capable of it.
So you're going to buy new hardware to run whatever on the edge, or you're going to just run very, very small models.
But in either case, you're going to end up with, like, the performance is really low, and then whatever you spent to run it locally,
like, if you spent it in the cloud, it could service 10x the users.
So you're kind of, like,
SOL in terms of, like,
economics of running things on the edge.
And then, like, latency, for LLMs, right,
internet latency
is not that big of a deal relative to the use of the model, right?
Like the actual model operating, whether it's on edge hardware or cloud hardware.
Cloud hardware is so much faster.
So, like, edge hardware is not really
able to, like, have a measurable, appreciable, like, advantage over cloud hardware.
This applies to diffusion models.
This applies to LLMs.
Of course, small models will be able to run, but not all, yeah.
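The latency argument can be sketched with some arithmetic. Every figure below is a made-up assumption for illustration (tokens per second on a phone versus a cloud accelerator, a 100 ms network round trip); `response_time_s` is a hypothetical helper, not a benchmark.

```python
def response_time_s(tokens, tokens_per_s, network_rtt_s=0.0):
    """End-to-end time: one network round trip plus token generation time."""
    return network_rtt_s + tokens / tokens_per_s


# Assumed, illustrative speeds: slow local generation vs fast cloud generation.
edge = response_time_s(tokens=500, tokens_per_s=10)
cloud = response_time_s(tokens=500, tokens_per_s=100, network_rtt_s=0.1)

print(f"edge:  {edge:.1f} s")
print(f"cloud: {cloud:.1f} s")
```

With these assumed numbers, the network round trip is a small fraction of the total, and generation speed dominates. The conclusion flips only if the edge device can generate at a rate close to the cloud's.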
What chance do startups like MatX, Etched, or five, six?
Haven't you interviewed them?
Why don't you answer?
Yeah, we have connections with MatX and Lemurian, and we haven't, no, but Gavin is friendly.
They didn't, yeah, yeah, they said they don't want to
talk publicly.
Yeah.
Oh, okay.
What they're doing.
At some point.
Like when they're,
when they open up,
we can.
Sure,
sure.
Yeah.
But do you think like,
I think the two,
three.
Let me answer the question.
What do you think of them?
There's a couple things.
It's like,
how do the other companies innovate against them?
I think when you do new silicon,
you're like,
oh, we're going to be so much better at this thing.
Or like much faster,
much cheaper.
But there's all the other curves going down on the macro environment at the same time.
So if it takes you, like, five years,
and before you were, like,
a lot better, five years later, once you tape the chip out, you're only comparing yourself
to the five-year advancement that the major companies had too. So then it's like, okay, we're going
to have like the C-300, whatever, from Nvidia by the time some of these chips come up.
What's after Z? What do you think is after Z in the roadmap? Because it's X, Y, Z.
Yeah. Anyways. Yeah, yeah. It's like the age-old problem. Like you build a chip, has some cool
thing, cool feature, and then like a year later, Nvidia has it in hardware, has implemented
some flavor of that in hardware or two generations out. Like, what idea are you going to have
that Nvidia can't implement is like really the question? It's like you have to be fundamentally
different in some way that holds through for, you know, four or five years. That's kind of the big
issue. But, but, you know, like those people have some ideas that are interesting and yeah,
maybe it'll work out, right? But it's going to be hard to fight Nvidia, who, one,
doesn't consider them competition. They're worried about, like, Google and Amazon's chip, right?
They're not, and I guess to some extent AMD's chip, but, like, they're not really worried about,
you know, MatX or Etched or Groq or, you know, Positron or any of these folks.
How much of an advantage do they have by working closely with, like, OpenAI and some of these other folks
and then already knowing where some of the architecture decisions are going? And since those companies
are, like, the biggest buyers and users of the chips. Yeah, I mean, like, you see, like, the most
important sort of AI companies are obviously going to tell hardware vendors what they want,
you know, OpenAI and, you know, so on and so forth, right? They're just going to obviously
tell them what they want. And the startups aren't actually going to get anywhere close to as
much feedback on what to do on, like, you know, very minute, low-level stuff. So that is a
difficulty here. Some startups like MatX obviously have people who built or worked on the
largest models like at Google, but then other startups might not have that advantage. And so
they're always going to have that issue of like, hey, how do I get the feedback or what's changing?
What do they see down the pipeline that I really need to be aware of and ready for when I design my hardware?
All right. Every hardware shortage has eventually turned into a glut. Will that be true of Nvidia chips? If so, when, but also why?
Absolutely. And I'm so excited to buy, like, H100s for, like, $1,000. No, that's not a thousand.
But, yeah, everyone's going to buy chips. It's just the way semiconductors work,
because the supply chain takes forever to build out.
And it's like a really weird thing.
So if the backlog of chips is a year,
people will order two years' worth of what they want for the next year.
It is like a very common thing.
It's not just like this AI cycle, but like microcontroller.
Like the automotive companies,
they order two years' worth of what they needed for one year
just so they could get enough, right?
This is just like what happens in semiconductors.
When lead times lengthen,
the purchases and inventory sort of, like, double.
So the Nvidia GPU shortage obviously is going to be rectified.
And when it is, everyone's sort of double orders will become extremely apparent, right?
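The double-ordering dynamic can be sketched numerically. The rule in `placed_orders` (order one extra year's worth of need for each year of lead time) is an assumed simplification of the "one-year backlog, order two years' worth" rule of thumb above, and the buyer names and quantities are hypothetical.

```python
def placed_orders(true_annual_need, lead_time_years):
    """Assumed rule of thumb: a one-year backlog leads buyers to order two years' worth."""
    return true_annual_need * (1 + lead_time_years)


# Hypothetical buyers and their real one-year needs, in chips.
true_needs = {"buyer_a": 8_000, "buyer_b": 2_000, "buyer_c": 500}
lead_time_years = 1

ordered = {name: placed_orders(need, lead_time_years) for name, need in true_needs.items()}
real_demand = sum(true_needs.values())
apparent_demand = sum(ordered.values())
phantom_demand = apparent_demand - real_demand

print("real demand:    ", real_demand)
print("apparent demand:", apparent_demand)
print("phantom demand: ", phantom_demand)
```

Under this rule, half of the order book is phantom demand that disappears once lead times shorten, which is the mechanism by which a shortage flips into a glut.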
And, you know, you see, like, random companies out of nowhere being like, yeah, we've got 32,000 H100s on order, or we've got 10,000 or 5,000.
And trust me, they're not all real orders, for one.
But I think the like bubble will continue on for a long time, right?
Like it's not it's not going to end like this year, right?
Like people need AI, right?
Like I think everyone in this audience would agree, right?
Like there's no, there's no like immediate like end to the to the bubble, right?
What's next?
Yeah, yeah.
Why?
I think it's just because the supply chain expands so much.
And then at the same time, there will be no immediate economic thing for everyone, right?
Like some companies will continue to buy, like an OpenAI or Meta will
continue to buy, but then, like, all these random startups, a lot of them will not be able to
continue to buy. So then that like kind of leads to like, they'll pause for a little bit. Or like,
I think in 2018, right, like memory pricing was extremely high. Then all of a sudden Google, Microsoft
and Amazon all agreed, I don't, you know, they won't, they won't say it's together, but they
basically all agreed, like, within the same week to stop ordering memory. And within like a month,
the price of memory started tanking, like, insane amounts.
Wow.
Right?
And, like, people claim, you know, all sorts of reasons why that was timed extremely well.
But it was, like, very clear, and people in the financial markets were able to make trades and everything.
People stopped buying, and it's not like their demand just dried up.
It's just, like, they had a little bit of a demand slowdown, and then they had enough inventory that they could, like, weather until, like, prices tanked.
Because it's such an inelastic good.
Yeah.
Thank you very much.
Is it?
Hey everyone. And so today we have a special guest, Milind, from Capital One, but I tend to like to introduce people with a bit of their background and then learn a little bit more about you on the personal side.
You got your PhD on a probabilistic framework for mapping audiovisual features to semantics. I feel like that is like the beginnings of, like, a multimodal AI model in some sense.
Do you have any sort of reflections on your PhD versus CLIP?
Thanks for having me. And so let me say
this: it almost feels like things go around in circles, right, in research and development.
And so at the right time and the right place, you kind of intersect back with some of the
topics. And then some other conditions that have happened suddenly make a big difference between
that taking off versus, you know, it may not be, you know, as intently pursued at any given
point of time. Right. So I have been in AI for now three decades. You know, you talked about my
PhD thesis, my bachelor's thesis was on implementing neural networks on India's, you know,
homegrown supercomputers back then. And so, you know, this whole notion of message passing
and distributed computing and computing weights, you know, and then bringing all of them back,
distributing the computations of the neural network, you know, forward pass. All of those things
were what we used to do, you know, for that particular supercomputing architecture we had back
then. And then my PhD, of course, was how to understand what's going on in a video, right? And use
multimodal cues to your point. You know, what has happened in the last couple of decades.
One, we have tremendous amount of data explosion. So when I was doing my PhD, I used to actually go to
Blockbuster and rent Arnold Schwarzenegger and Sylvester Stallone movies because I would like to get
multimodal concepts like explosions. And how do you actually build models in the audio stream
and the visual stream for something like an explosion.
So I remember going and doing all this digitization of tape
and then cleaning up the data
and then having some kind of a labeling tool
that I actually cooked up
and having a spouse of a friend of mine
to do the labeling for me.
So look at where we were back then
and now you have
Scale AI that basically goes and does labeling
for a lot of these models and so on and so forth.
So scale of data has changed.
That's one.
The second thing that has changed is we were looking at computing architectures that were much, much, much less rich in terms of what we had back then.
And, you know, the 2012 breakthrough by Hinton and his students, right, and using GPUs to really win the ImageNet competition, really helped take this field off, right, in a completely new direction and at a very, very large scale.
And the third thing is, of course, the GPU computing.
Right, back then, when I was doing my PhD, I did not have access to some of the amazing things that NVIDIA hadn't yet built.
And so it's, I think, really, the confluence of those three things which make all the difference between a lot of this research that happened in the late 90s and what's happening between the 2010 to now kind of timeframe.
But if you look at the intent, the intent was the same.
How do we understand what's in a video?
how do we understand the multimodal cues that come together to give us that semantic understanding
of what's in the video?
And so to that extent, the problems we were trying to solve are the same, but the tools that we
have now are amazingly, amazingly different and amazingly more powerful.
Are there any maybe research approaches or ML patterns that you tried that didn't work,
that you think will work today?
Would you advise anything that people haven't tried again?
I don't think there are many people that have done serious ML research before the GPU era.
I would say if you think about all the ML researchers working today, most of them are post-GPU.
Any like story that you remember that you were like, oh, this seems really promising,
but like there wasn't enough compute or anything like that.
The whole concept of modeling context, right?
So my thesis was about not how do you just detect isolated things in a video, right?
Like this is a car, that's an explosion and so on and so forth.
But more the context of, if I see
these things together, do they actually contextually make sense? Like, do I see the sky
above the land? And if I do that, then I have a higher confidence that this indeed is sky and
that indeed is land, right? And so on and so forth. So when we were trying to model that context,
we had extremely limited labeling and extremely limited corpus in terms of how we could do it.
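One way to picture the "sky above land" idea is a simple Bayesian update, where seeing a consistent spatial arrangement raises confidence in a label. This is only an illustrative sketch and not the method from the thesis being discussed; the function `posterior` and all probabilities are assumed for the example.

```python
def posterior(prior, p_context_given_label, p_context_given_not_label):
    """Bayes' rule: update confidence in a label after observing a context cue."""
    numerator = prior * p_context_given_label
    denominator = numerator + (1 - prior) * p_context_given_not_label
    return numerator / denominator


# Assumed numbers: an isolated detector is 60% sure a region is sky.
prior_sky = 0.6
# Context cue: the region sits above something detected as land.
# Assumed likelihoods: sky is usually above land; non-sky regions rarely are.
post_sky = posterior(prior_sky, p_context_given_label=0.9, p_context_given_not_label=0.2)

print(f"confidence before context: {prior_sky:.2f}")
print(f"confidence after context:  {post_sky:.2f}")
```

With these assumed likelihoods the confidence rises once the context agrees with the label. Estimating such likelihoods from data is where the limited labeling and small corpus would bite.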
Now, when I think back, I think that what better way to describe context than a multimodal
LLM, right, which is trained on as much of the data as there is on the internet in terms of
multi-modality.
And that to me would have been an amazing thing for me to have back then.
So I would say how to model context is a problem that is going to be evergreen.
It never goes out of fashion.
But how we are able to do it now versus then,
I see as one of the steps towards truly understanding, you know, what's happening, right, in the
multiple modalities.
The other part is the reasoning.
I think we are still in the very early innings of reasoning.
You know, we see this interesting evolution of, you know, how do you actually build a model of the world?
And Yann LeCun's work, you know, is very interesting in this sense to me.
He has been talking about it for a while, right?
So, but now I think we are getting to a point where a lot of those pieces may start coming together.
And I think solving that reasoning piece is a very, very critical step before we can actually
build truly intelligent machines.
And do you have any intuition on the part that video is going to play in it?
Because a lot of Yann's talking points also are around, you know, with V-JEPA and some of those
models, if you show what's going to happen next, like, that's part of a world model.
Like, is video going to be, like, a big part in it?
Like, do we need to get there to actually get a real world model?
Like, do you think text is enough to get a good shot at it?
I'm maybe biased in answering that question, given that I, you know, cut my teeth
in multimodality.
And given that the video modality in general, right, is a lot more challenging, you know,
whether it's just because of the sheer size of data, right, in terms of the number of pixels
that you need to process, whether it is because you are actually capturing the real world,
which tends to be far more complex.
You know, in the case of language, you know, humans over millennia have evolved this amazingly
concise codebook of how to describe things.
So there is a humongous amount of abstraction and rationalization and concise definition that has gone on in how languages evolved.
And so, you know, the number of words in the vocabulary of a language when you look at that, we are able to tell beautiful stories, right, with just those many, you know, words.
But if you look at what's out there in the real world, if you look at capturing that, right, whether it is through
the eyes of a robot as it's looking and trying to help you around in a room setting or a
building setting, right, or whether it's the traffic, right, which vehicles have to look at
when they're driving on the roads. The amount of variability, the amount of distortion of signal
that comes with that, right, is just remarkably difficult and remarkably rich to really analyze.
So I would say, Alessio, it's both. I feel that the video modality has to be understood for a
true understanding of the world model.
I would also say that's harder in some sense because of its inherent complexity than the
language part, which there's already a concise representation that people have come up with.
Awesome.
Sorry, Shawn.
I know we hijacked the intro, but it was a good rabbit hole to go into.
No, it's okay.
I'm also just stunned by just the sheer amount of background and history that Milind is
bringing to AI, you know?
Yeah, I guess I'm going to speed-run through the rest of it: you know, 14 years
at IBM, finally ending as chief scientist at IBM Research.
And then a stint at Cisco with cognitive systems, and CTO of Metropolis at Nvidia. What should
people know about, like, you know, how your interests and your trajectory have progressed through
your career?
You know, when I reflect, part of what I see is that there is a constant, right?
And the constant is how do you actually make AI work for, you know, name your favorite problem,
right?
That favorite problem changes maybe, right, from decade to decade, or even in the context of
my stay at IBM Research, but it's always been how do we build AI solutions,
AI platforms that solve real world problems.
So when we started out, it was the first video understanding platform that we built at IBM Research,
right?
We actually got a Wall Street multimedia award for it, innovation award for it in the early 2000s.
We helped with setting up the benchmark that's known as TRECVID, which, again, to Alessio,
your point, people only know ImageNet and thereafter.
Before ImageNet, there was TRECVID.
And so the first decade, Shawn, was really about, you know,
how do we understand what's in a video and how can then we turn that into meaningful use of AI technology
for media companies, you know, broadcasting corporations and so on and so forth, right?
The business to business kind of setting.
Now, of course, because we were at IBM, we did not focus on the consumer and then, you know,
YouTube happened, right here in the Valley.
And then you see the explosive content and applications
of AI to, you know, those kind of video understanding problems.
Then it was how do we actually make sense out of our IoT world?
So when you have signals coming from sensors everywhere, you know, whether there are sensors
embedded in your bridges or whether they are sensors embedded in your buildings,
how do you actually use all this sensory information to make good decisions to, A, you know,
observe the environments and, B, optimize those environments, right?
And so a large part of my second half of stay at IBM research was to come up with what is this research around smarter cities, smarter planet.
And that actually became an AI platform for helping optimize traffic.
One of the proudest things that I really fondly remember is how we use the data from telco data sources to understand how people move in a city and then use that information to build
optimal planning, whether it's for bus routes, whether it's for metro, we did some amazing work
in Istanbul, for example, completely different scale. And then the same kind of platform we applied
to helping cities in American Midwest, like Dubuque, optimize their public transport system
that they had so that they could actually make it more responsive to people's needs. So it's
amazing how you can go from these kind of instrumented environments with AI
to solutions that are not just in real time trying to optimize traffic, right, so that you don't
see too many red signals. You know, you can just go on a major thoroughfare with a whole bunch of
green signals to how do you optimize bus routes so that you can actually serve more people,
cut down on travel time, and at the same time also make the service efficient so you are using
fuel optimization. So there are incredible opportunities how AI can actually help the world. And so that's
been my journey. And from there, you know, Cisco was a brief stint where we were trying to apply
AI technology. I mean, funny enough, we were actually using Mikolov's work, right, just out of
Google research on embeddings, right? This was more than nine years ago. We were using embeddings
to help do better customer service, you know, for some of the very tricky problems they were
trying to solve. We were doing early warning, you know, over network data to figure out if there were
threats. So this was all like streaming data and analyzing streaming data.
And then from then on, it was back to, you know, Nvidia, right, where we built Metropolis.
Metropolis is amazing.
It's a platform, an AI platform, of course, right?
That's completely integrated with all the amazing underlying technology
Nvidia has, accelerated computing, applied to, you know, all kinds of video streaming use cases.
So that application developers who do not actually need to or want to understand the nitty-gritty
of how to build these complex convolutional neural networks, you know, ResNets, so on and so forth,
so that they could actually build a video-based traffic monitoring system for a city in a matter of days without taking like a year, right?
They could build something like the rival of an Amazon Go, you know, just go inside a retail store, pick up what you want to pick up,
and then you are billed for what you actually take from the shelves, right? Or in buildings, you know, how do you actually make it
so that when people enter buildings,
you know, the building knows who they are and automatically allows them to get in and out
or in parking garages, you know, how do you actually help people find the right parking spot?
Because I'm sure for all of you, this has happened, right, where you go into a parking lot
and it says N spots available and it's never actually N, right?
It's always less than N and so on and so forth.
So how do you actually find parking spots?
These are all very mundane day-to-day problems of how you optimally use your physical environment.
And we see a proliferation of cameras everywhere, right?
Like in Levi's Stadium, I think there are like a thousand cameras right now.
Just a small place, right?
And in a small city, you could have thousands of cameras, you know, for traffic.
So how do we actually make that kind of streaming, high-volume data a first-class citizen when people are solving, you know, business applications, without having to worry about the hard yards, right,
of how to understand and process and manage and store that video?
So Metropolis is that platform that allows you to do that.
And along with that, we started working on the digital twins.
So here is where it really becomes interesting, right?
You asked, you know, is this real?
So we could actually simulate a traffic intersection visually.
And we could actually use existing statistics that we would have calculated to understand
the flow of traffic in that intersection.
And then we could synthetically generate all kinds of anomalies.
For example, a vehicle that's driving in the opposite direction of traffic.
Now, you might be surprised, you know, how many times this actually happens
and how many lives can be saved if that is detected quickly, right,
and traffic up and down is alerted.
While it happens frequently, it doesn't happen frequently enough that you could actually
build our datasets, right?
So that's where the digital twin comes in and that's where the simulation comes in
and that's where the augmentation of the data set happens.
So you will be really amazed at how many real world problems can actually benefit
if we actually are able to build good digital twins of various environments.
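A minimal sketch of the augmentation idea described above, not Metropolis code: it assumes a toy intersection where each lane has a legal heading, samples normal traffic from assumed statistics, and injects synthetic wrong-way vehicles as labeled rare events. Every name here (`VehicleTrack`, `build_augmented_dataset`, the speed and noise parameters) is hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class VehicleTrack:
    lane_heading_deg: float     # legal direction of travel for the lane
    vehicle_heading_deg: float  # simulated direction the vehicle is moving
    speed_kmh: float
    is_wrong_way: bool          # ground-truth label known to the simulator


def simulate_normal_track(rng, mean_speed_kmh=40.0):
    """Sample one vehicle that follows its lane, with small heading noise."""
    lane = rng.choice([0.0, 90.0, 180.0, 270.0])
    heading = (lane + rng.gauss(0.0, 5.0)) % 360.0
    speed = max(0.0, rng.gauss(mean_speed_kmh, 8.0))
    return VehicleTrack(lane, heading, speed, False)


def simulate_wrong_way_track(rng, mean_speed_kmh=40.0):
    """Create a rare anomaly: a vehicle driving against the flow of its lane."""
    normal = simulate_normal_track(rng, mean_speed_kmh)
    flipped = (normal.vehicle_heading_deg + 180.0) % 360.0
    return VehicleTrack(normal.lane_heading_deg, flipped, normal.speed_kmh, True)


def build_augmented_dataset(n_normal, n_anomalies, seed=0):
    """Mix simulated normal traffic with synthetic wrong-way anomalies."""
    rng = random.Random(seed)
    tracks = [simulate_normal_track(rng) for _ in range(n_normal)]
    tracks += [simulate_wrong_way_track(rng) for _ in range(n_anomalies)]
    rng.shuffle(tracks)
    return tracks


def heading_error_deg(track):
    """Smallest angle between the vehicle heading and the legal lane heading."""
    diff = abs(track.vehicle_heading_deg - track.lane_heading_deg) % 360.0
    return min(diff, 360.0 - diff)


dataset = build_augmented_dataset(n_normal=1000, n_anomalies=50)
```

Because the simulator knows which tracks it flipped, the labels come for free, which is the point of generating the anomalies synthetically rather than waiting for them to occur on camera.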
Do you think that's part of the story, that the more IoT-driven AI, so to speak, has now become not hot anymore because it's been quote-unquote solved?
Like the things are actually in production and things are being used.
And maybe that's why it's less romanticized.
You know, it's more like a practical thing.
I always remember how, you know, Jensen used to tell us, you know, focus on playing the game, not the score.
So there are things that you really believe in, right?
For example, I really believe in Metropolis and how that platform actually
helps people, right, in various different verticals. I actually believed that video analysis would
have been a very useful thing to do way back in, you know, the late 90s, even if it was not as hot,
right, in terms of the eyeballs it was creating. I think if you are in this industry and if you
have looked at these different verticals and the problems they have and the ones you can solve
over a couple of decades, you tend to really stay focused on the why you are doing something
rather than how many people
seem to be interested in what you are doing
at any given time. You know, these things move in phases, right?
For example, remember that Jensen was saying
how amazing the GPU computing power was growing
over other forms of computing, right?
He has been saying that for a long time,
but now suddenly, like everybody is responding to it
because we have this amazing workload called the transformer, right?
And because we have this amazing use case
which is unleashed by OpenAI's November 2022
announcement of ChatGPT.
But people have been predicting that these kinds of workloads will happen.
Nobody can predict exactly when they will happen, right,
or what the trigger would be for capturing popular imagination.
So I would say the Metropolis story is similar, right?
The real world also always, to your point, tends to be harder, right?
Whenever you are working at the intersection of the physical world and the digital world
and the digitization of this instrumentation from the physical world to represent it,
these things are always on longer cycles and they're always more complicated than if your entire
journey started from a purely digital signal. So part of why you are seeing that is this
inherent complexity of mapping the physical to the digital. And part of that is just phases,
right, through which all these technologies travel.
Right. That makes a lot of sense. And especially
the play the game instead of looking at the score. What are other maybe leadership, innovation
things you learned from Jensen?
It's kind of amazing looking from the outside,
how early NVIDIA was to a lot of things
that people then catch up to later.
Any other fun stories from the six, almost seven years you spent there?
There are so many.
It's incredible.
The amount of things that one gets to learn,
you know, when working closely at a place like that,
you know, we could go on for hours.
I think the most important thing was what I told you,
which is, A, think of the scenario where
you are trying to figure out, you know, how does that accelerated computing stack get used?
Try to look at how the developers could use that.
Try to build something like that.
And then, you know, piece by piece, take out things so that you are only left with what should
go in the platform and do it by first principles.
You know, there was a huge emphasis on doing things by first principles.
There's a huge emphasis on being extremely intellectually honest, right, about what worked
and what didn't work.
There was always this notion of measuring ourselves against the best
possible, not against the competition, but against the best attainable, right? And so you are seeing that,
right? People are talking about the company now only being in a competition with itself.
That's always been the ethos, you know, at least whenever I was there, you know, whatever I was
observing, that was always the ethos.
And when you joined, Nvidia had just released the 1080,
which was an 11 teraflops card. The B200 is 20,000 teraflops at FP4. Did you internally always
know, and like realize, that AI was going to be such a compute-heavy story?
You know, the more compute, just the better models you were going to get?
Or is that something that kind of emerged as you kept pushing the laws of scaling?
It was both.
I think there was an underlying belief.
The moment people saw Hinton and his students use the GPU the way they did, I think from
that moment there was that realization.
And the field of AI evolved so dramatically from there, right?
Networks started getting deeper, the amount of compute that was required kept, you know,
growing very quickly.
And so there was enough validation data out there in the world, you know, in all the research
laboratories and all the industrial and academic research work, that it was very obvious, right?
For those who were actually carefully watching this, it was very obvious.
And as some of these technologies became superhuman, right, like image understanding, for example,
right, we saw as networks grew in size and some of these capabilities becoming superhuman,
it was very obvious, right?
So when people say AI is the killer workload, you know, that was something that was easy to see for NVIDIA from the inside.
And so there's always been the notion that, okay, how can they help the world with this kind of compute at the lowest power consumption, right?
And so that's always been like one of the ethos, you know, at NVIDIA, where they always do that.
And they always keep outcompeting themselves.
At GTC, they did just enter a new competition, which is kind of the cloud runtime competition,
with NIMs and all of that.
Any thoughts on that?
You obviously work in a massive organization
and you're probably the target customer
for a lot of these private model deployments and inference.
Do you think this is an interesting twist
from being such a hardware-heavy company
or how long has this been in the works?
I guess maybe you have some insight into it,
given that you worked there.
For a long time, you know,
Jensen has been saying that we are a platform company, right?
An accelerated computing platform company.
I don't want to talk about NIMs specifically,
because I wasn't involved in that.
But if you look at Metropolis, right, there is a similar story there.
So I don't think that this is as much of a surprise to me.
Now, there is a reason that I'm at Capital One, right, and not at Nvidia.
You can only do so much as a platform company.
You can integrate the underlying hardware, the underlying drivers, CUDA, you know, the software stack that makes it really, really energy efficient, compute efficient for these workloads,
but at the end of the day,
you are still dependent on someone else
to take this to market
and actually create value for someone.
There has to be an end customer
who benefits from this, right?
Or there has to be a developer, right,
that benefits from this
or an associate or employee in an enterprise
that benefits from this, right?
I used to do that when I was at IBM Research, right?
When we worked with telcos and when we worked with cities
and we worked with retailers and so on and so forth.
And so as it became very obvious that these kinds of transformer workloads are here to stay
and we are entering a new age with Gen AI of what AI can do for various problems,
I really felt that the way to work on this is if you are in an enterprise, right,
which actually has, you know, 100 million customers, right?
Which actually has those customer engagements and which drive these experiences.
And you have to be close to the data.
You cannot solve these problems in a data-agnostic
way from a platform side. And so I would say that that's the most exciting part now
for me, which is how do we actually build upon some of these foundational technologies and how
do we solve customer problems, right, and create great experiences for customers?
Awesome. Yeah, we should chat a bit about Capital One. So just to give people background,
you and I first met at NeurIPS last year. We also did a researcher dinner together. And actually,
your team at Capital One is like stacked.
You have like Abidjid who's working on pre-training optimization.
You have W. who used to work on Siri at Apple.
There's like a lot of like amazing AI and ML talent at Capital One research.
And I was surprised to hear you had an applied AI research team, honestly.
What's your role at Capital One?
What do you actually do?
Like what does the team work on?
People just think of Capital One as maybe like a credit card company,
but you obviously do a lot more than that.
I'm glad you asked me that question, Alessio.
So if you look at Capital One, right?
They have always been a very, very tech-first leader.
There is a large technology organization in Capital One, right?
14,000 engineers work at Capital One, right, as associates.
They have always had extremely good roots in machine learning, right?
They have always been data-driven in how they have approached the entire notion of changing banking for good, right, in terms of how they have helped their customers, you know, 100 million-plus customers, right?
And there is a founder-led culture there, right?
Again, it's kind of similar to Jensen's founder-led culture at NVIDIA.
And when they started looking at what was happening in this space,
they wanted to actually take a serious kind of approach to this, right?
Which was, let's bring some of the best talent on board and let's actually solve this problem
from first principles instead of just being a consumer, right, of some third-party technology.
And so when I heard about Capital One's aspirations in this space, it goes back to what I was telling you about, right?
Which is you can only do so much in building amazing AI platforms.
At the end of the day, you have to convert that into AI solutions.
You know, Gen AI platforms are the same, no different.
And so I joined as the senior vice president for technology for an organization that we call AI Foundations.
And this is actually a very dear term to me because it truly is foundational to what, you know,
Capital One will see the value realized in, right, in terms of the customer experiences,
in terms of the developer experiences, associate experiences, so on and so forth.
And so one of the things that we did very early on is we created this new job family
called applied research.
And again, very deliberate, right?
We are focusing our energies on building the best AI team in finance, but our focus is
on applied research.
And so there are certain skill sets that we said we wanted to really bring on board.
And that's, you know, you mentioned Sambith.
So all these people have started joining our organization.
And most of them are coming from big tech, right?
Nvidia, Amazon, Apple.
So the reason that they are attracted to this is the mission, you know, changing banking for good,
the opportunity, 100 million-plus customers and the data that it brings
along, the experiences that can be impacted.
And I think all of them realize that the next frontier of Gen AI is in how we actually
solve these problems.
So the only way we can get to solving those problems is when science and engineering,
when research and engineering work extremely closely and collaborate extremely closely.
And this is why we thought we needed to build the organization this way.
This is why we created this new job family.
In fact, we actually have just created an internship program for
applied researchers and we are going to actually have a cohort this summer, the first cohort this
summer. We are, you know, meeting talent where it is. So one of the things that we have done is we
have actually started a pilot location in San Jose because we know, you know, this is where
one of the hot hubs of talent is. We are extremely involved in academic research partnerships.
We are extremely involved in ICML, NeurIPS, you know, ICLR, being at these conferences. We aspire to
publish in those conferences, right? We aspire to collaborate with academia. In terms of, you know,
how we are trying to actually grow, we are being very thoughtful in terms of the talent that we are
bringing on. And to your point, people are surprised. People think like this is a FAANG oasis,
right, of some kind in a bank. When they look at the people that are working here, you know,
that's their first impression.
I'm actually kind of curious, like when you publish, what kind of
stuff do you seek to publish or what kind of trends in academia catch your eye the most?
I know you're not going to really talk about the research that you do internally inside of
Capital One, but what catches your eye in terms of what you saw at NeurIPS or ICML?
There are a bunch of things. We can't really talk a lot about some of the things that are
underway, but there are a few things that seem to be very exciting. I think this whole notion
of the world models is exciting to us, right? What does that mean in our world?
How does one build that kind of a world model in our world?
We are very excited about the trend we are seeing on smaller models.
We are very excited with the trend we are seeing in terms of mixtures of experts.
So those are the kind of things that are extremely attractive for us.
The other side of that is we want to see how responsible AI, trustworthy AI,
how does that actually pan out in terms of the research that's coming out from academia,
right? We want to see how temporal data can actually be brought to bear.
So that's one of the things that is of great interest to us.
In terms of NeurIPS, in terms of GTC, right, the other thing that we actually look out for is
how are these getting applied to different domains.
So there is a lot that we can learn, for example, in how these things get applied in the healthcare
domain, right?
Jensen talked about a number of partnerships at GTC in the healthcare realm.
He talked about the language of biology.
the language of the human body.
I think there is a lot to learn from cross-domain application of AI and Gen AI
that we actually actively look for when we go out there.
I think when you say temporal, I think there's like time series,
time-series neural networks and stuff like that.
I'm interested in like small models myself,
which is why I have been working on a lot of the stuff that I do.
It looks like there's a lot more research to be done there.
And maybe one way to tackle this as well is, you know,
what are your opinions on open source AI?
Obviously, it's a common good that everyone can use,
but do you think that it's useful,
the current trajectory of open source,
not being really fully open source,
but kind of commercially usable open weights?
Is that useful for enterprises like Capital One?
I believe so.
And I would actually step back even, you know,
without thinking of Capital One specifically, right?
You do want a healthy mix of open source and, you know,
proprietary,
because we do benefit when, you know, these things get the attention and the validation from different data sets internally and externally, right?
So organizations may actually be able to get these models, you know, do some customization, evaluate it against their data sets, form conclusions, right, based on the observations there.
And all of this is going to actually just improve where the state of the art will be.
Now, if you did not have open source, Sean, and
if we were only dependent on a couple of proprietary models,
I see that slowing down the overall state of the art.
I see that slowing down maybe even the adoption of some of these technologies.
Because we would like to look at this,
whatever the version of the weights that get released, etc.
We would like to actually look at this through the lens of a first principle approach
and examine, okay, what is it in this architecture that is working for us, right?
why is a certain architecture working for us, right?
What are the kind of data sources that went into the pre-training that seem to be relevant, right?
And so as an engineer, as a scientist, you are intellectually curious to always answer why something works or why something does not work.
I think proprietary takes away that completely from you.
And so I think in general, it's in the best interest of the world for open source, or some semi-open source, as you said, right, to be
available, you know, be benchmarked against proprietary models.
So we at least get a sense of what the bounds on performance are.
Yeah, they're always shifting all the time.
What about GPUs?
You know, one of our favorite pet topics is the race for GPUs to be NVIDIA's customer
and obviously to pad your stock accounts.
You know, do you have to be GPU rich?
Are you hoarding GPUs on your team?
I don't think we are hoarding GPUs.
I think we are using GPUs.
I think we are being prudent about how we are using
them. In terms of
architectures, right? I think this goes
back again, Sean, to the architectures.
The transformer workload is
such an interesting workload. The amount
of optimization that
has happened with the underlying
GPU and all the libraries that
NVIDIA has released makes it feasible
for a lot of experimentation.
I do anticipate
that with the announcements
of the newer releases that
went out at GTC, I do hope
that the ability of our
enterprises to innovate is not limited by these kinds of bottlenecks, right? And these are things
that over time, they work themselves out. So I really don't worry too much about the hoarding.
Neither do I worry too much about the capacity. I think these things eventually work themselves out.
Again, you're very focused on AI solutions, which I appreciate. A lot of folks just stop at the
research and never figure out how to make it useful to people. How does this all tie together
with the more product-driven side of the business?
Like, do you only do research that then ties into something you productize?
Do you have teams doing kind of like cutting edge stuff?
For example, there's a lot of buzz around alternative transformers, right?
Like state space models, like all these things.
It's unclear if they're going to be useful in production anytime soon.
How do you think about allocating resources in your teams?
You know, there's no one solution or one formula, Alessio, right?
And these things will change over time, for example, right?
So as we are building up this organization, you know, we do want to make sure that we are addressing some of the biggest challenges and biggest opportunities, you know, as soon as possible up front, right?
But at the same time, we also want to create a culture where we want to explore some of these alternate architectures.
You know, we want to have a higher degree of patience or tolerance for when these things pan out.
But everything said and done, we are still an applied research organization.
So we are going to prioritize things that have the potential to get applied.
And we are not going to be prioritizing things, you know, that are, say, deeply theoretical,
very open-ended 10-year horizon, right?
Those kind of opportunities may not make it to the top of our priority list, at least in the short term.
Yeah.
Do you see parallels and differences with AI research today and maybe before?
the GPT models, like early in those days.
Maybe people had similar opinions about GPT models then, you know?
Like, how have you seen the industry change and, like, also the interest that the
researchers have?
You know, you started at Nvidia, right?
And then there was, like, a lot of interest in images, kind of like image understanding.
Maybe text was like a small thing, you know, that people maybe didn't really take
seriously.
I'm curious if you see some areas that, like, you think are going to be really promising,
but that maybe people underappreciate and think are, like, just a small area that is not
going to have impactful production fruits. And similarly, if you thought that early on in GPTs,
there were areas that were overhyped that then didn't really pan out. Just curious to get your
thoughts on like where we are today. And like if you see similarities on overhype, underhype
in previous eras, and maybe the answer is no, you know, but I'm here to get your views.
When I was attending my first NeurIPS, there were a few hundred, several hundreds of us,
right? And you could probably fit all of them in a single large hall, right? We met at NeurIPS in December,
Alessio, and there were like 18,000 people there.
There has been a tremendous explosion in terms of the number of people that are interested,
the number of experiments that are being tried out.
And I would say we are very fortunate that there is this kind of interest right now.
The thing that you notice is it does go from, you know, seminal moment to
seminal moment, right?
So between AlexNet and the transformer paper in 2017, you would see that there was a huge
amount of innovation on the visual side, right? Convolutional, you know, we had ResNet,
all kinds of interesting architectures. And then after 2017, it became again significantly
focused on sequence to sequence, right? So the ideal sequence here, you know, machine
translation, therefore, was text. You know, we keep hearing about maybe there will be a
breakthrough moment on the visual side again, right, in another five years. And then suddenly,
you know, people will start spending more energy there. So I would like to actually go back to the modality, right? So it was the visual modality and now the text modality. And then maybe it's proteins, right? Maybe it's some modality in the healthcare space, right, where sudden breakthroughs come about in the next several years. And then you will suddenly see a whole bunch of people looking at gene sequences and so on and so forth. Right. And that interest will spike. So I actually find it
fascinating that we are growing as a community from where we started. I don't really see it as a
hype versus not hype. I really see it as more of the ability of this modality coupled with
the neural architecture that will be the most effective and efficient at that point of time
in attracting a lot of energy from the academic and industrial research community. So we can only
be better off because of it. So I don't see it as a hype as much as an
opportunity for learning more. I will add this and I think it is a good segue for the next part of
what you're going to talk about, Alessio, right, which is I get asked this question often, right?
Why did I move from Nvidia? Like, who leaves Nvidia? And the fact of the matter is that really,
if you have that unfulfilled desire in you to solve the actual end problem, you have to go towards
one of these verticals, right, whether it's healthcare, whether it is financial services. And you have
to be embedded in an organization that actually has a culture for fostering a big-tech-like, FAANG-like
work ethic, whether it's having the data, right, already being in the cloud, having the roots
in being data-driven and in machine learning. So you want to be in a place which allows for that kind
of creativity so that you can actually take Gen AI to its logical next step, right, which is
solving problems so that we change that financial services
domain for the good, right?
There's benefit to the end customers.
And so when you do that, though, you have to be mindful.
If you are working in places where, you know, you are building recommendation systems,
right, for, you know, what to buy or, you know, what to watch, there are a lot of ways
in which you may not have the best possible answer, the most accurate answer, and you are still
fine.
But when you are in one of these domains that matter, right, healthcare, financial services, right?
you really are solving a harder problem because the tolerance for error of your end customer is going to be much, much, much lower.
And what that does is it forces the research and development to go in specific directions.
You know, you may not be able to just take what's out there in open source and use it as is, right?
You may have to actually innovate beyond what's in the state of the art publication to make it work for this kind of domain.
Right. And so it takes a special class of applied researchers. It takes a special class of machine learning engineers, data scientists, right, that have the will. The last mile is hard. And you have to have that will, right, to solve that problem. And so in some cases, what ends up happening is that these roles and the work we will do may be harder than what you end up doing, you know, in a generic setting or at a platform level only or in a use case which actually doesn't have
this kind of extremely low tolerance for error.
So that becomes, you know, one of the main motivating functions for people to join us, right?
Who really want to actually take on these kinds of really hard challenges.
I was going through some of your team's papers just from 2023, some of them in NeurIPS, some in
ICML.
You've done a lot of work on, like, class imbalance in datasets.
So how do you make performant neural networks when like the data is kind of imbalanced in certain domains?
You're doing some research on transformer graphs versus like graph neural networks.
It's just kind of like a lot in there.
Any favorite project that you want to shout out, any interesting pivot that you saw come out of your team?
You know, I could choose one, but then it would basically end up being, you know, not fair to the others.
So only on that, you know.
It's like the what's your favorite child?
What's your favorite child?
There's no way to say, you know, one versus the other.
But there are a lot of interesting trajectories.
of exploration here, right?
If you look at the superset of what all these things are trying to do,
we are trying to look at data sets that are very unique to our domain.
We are trying to look at attributes that are important to us, time series, tabular,
these are things that are important, very important.
We are looking at the imbalance problems, right?
There are other things that we are looking at.
And so I wouldn't want to just highlight one.
There are a bunch of things that are of great interest to us
from our perspective. Let me just not pick one.
That makes sense. And yeah, just to wrap, we always like to ask our guests, who are you looking
for? You know, we got a lot of AI engineers, researchers in the audience, like, who are the type of
people that are going to have a good time working with you and what are some of the open
roles that you have? So let me start with the roles and then, you know, we can talk about
the kind of people who are going to have a great time with us, right? So we have a number of
open roles and they span the entire spectrum from applied researchers to data scientists to
machine learning engineers, AI engineers, right? People who are listening to you right now,
who are really interested, right, in what we are doing. We have roles at various
levels of seniority. We have individual contributor roles. That's one of the things that we have
really, really double-clicked on in the last several months since I've joined. And that was a thrust
also before I joined, but especially in the applied research field, right, we are
looking for, you know, what would be the equivalent of, you know, principal research scientists,
right, distinguished research scientists, individual contributors. We are looking for fresh graduates,
right, masters and PhD students, you know, just out of school, all the way to people who have,
you know, a decade or more of experience. So really there is a huge spectrum of talent that we
want to onboard and we are looking for. Now, what is the characteristic of someone who will
really come and enjoy here, right? I think I started giving that to you earlier when I said
people who are interested in actually solving the problem. That last mile is hard. So we really
want people to know that they have to have the stomach for that last mile. People who are
good at understanding what the product requirements are, they sometimes make the best applied
researchers, right? Because they can understand what our needs are and formulate problems,
change architectures.
We are looking for people who are very good at dealing with ambiguity.
A lot of what we are doing, a lot of the advancements we are seeing, they are empirical,
data driven, right?
And so we need to have that as a skill.
How do you actually build algorithms, systems and solutions that you can show improvement
on our data?
It's great if somebody is doing some architecture or some network is doing great on some of
the benchmarks, right, that get published outside.
But it's equally important.
It's actually more important that we are able to show to ourselves that these algorithms,
architectures do well on our own internal data sets and benchmarks and evaluation.
Right.
And so to that point, how do we actually come up with meaningful evaluation frameworks and
methodologies, right?
That itself is something of great interest to us.
So people in general who like to solve problems, problem solvers,
people who like to go from theoretical to practical and people who are not afraid that the empirical data may not bear out their best idea.
And they have to go and rethink it, right, and redo it.
Those are the kind of people that will really do well here.
Yeah, it was great to hear your story. There are not a lot of people in the world, I think, that have the same depth of experience in AI.
So this was awesome.
Yeah, a lot of the people you are looking for are also like the kind of AI engineers that we want to encourage.
So, yeah, thanks for sharing your thoughts.
Thank you, Sean.
Thank you.
