Sharp Tech with Ben Thompson - Nvidia Searches for a Moat at AI Woodstock, The Blackwell B200, Microsoft’s Deal with Inflection AI

Starting point is 00:00:04 Hello and welcome back to another episode of Sharp Tech. I'm Andrew Sharp and on the other line, Ben Thompson. Ben, how you doing? Doing okay, Andrew. How are you? I'm doing well. We got a lot to get to so I don't want to actually waste too much time with preamble here. I'm just going to say that people can email us at email at sharptech.fm.

Starting point is 00:00:26 We're going to do mostly AI today. So if you've got non-AI questions for Monday's show, send them in over the weekend and we can circle up next week. I'm also saving some fun TikTok emails that we'll wait to address if the Senate ever actually moves forward with a vote. So who knows how long we're going to be waiting before we address any of those emails. But we'll all find out together. But as for today's show, Ben, again, just a crazy week of AI news. So we're going to start with the Wall Street Journal, who writes, the Nvidia frenzy over artificial intelligence has come to this. Chief Executive Jensen Huang unveiled his company's latest chips on Monday in a sports arena at an

Starting point is 00:01:14 event one analyst dubbed the AI Woodstock. So before we get into the substance, can I just get your thoughts on the scene there? Because it really does feel like the church of Jensen is growing more and more with each passing week. So what did you think of the scene in the San Jose Sharks arena as Nvidia unveiled the, what is it, the Blackwell chips, the new generation here? Yeah, it was interesting.

Starting point is 00:01:41 I mean, on one hand, it's kind of cool, like, you know, nerdy sort of chip stuff, and it's sort of like this rock band sort of thing. Like, there was the zoom-out from the upper levels of this entire stadium being filled. There is a bit that makes me a little nervous.

Starting point is 00:02:00 It's like, you know, there's a good argument to be made that Nvidia in particular continues to be relatively, you know, I would certainly say as of a few months ago, undervalued. Even today, if you sketch out sort of the possibilities going forward and, you know, apply it to the sector as a whole, particularly if you really think through the potential implications of AI and the amount of productivity it can unleash, you know, if you were doing an expected value analysis that incorporates the upside potential and the mid, you know, you do like sort of a sensitivity analysis, whatever it might be, that upside is so high that today's expected value is really, really large and, you know, probably larger than even people are anticipating. Now, of course, you can do that with lots of things. You could have done that with the Internet back in the 90s, right? And all of those expected value calculations were correct.

Starting point is 00:02:59 It was exceptionally valuable. The issue, the challenge is timing. It's like, when does it actually happen? When does it actually become meaningful in the sort of way that people expected it to? And, you know, the sort of bubble, as it were, was in some sense the money and the expectations getting ahead of the reality and a lot of the infrastructure that had to be put in place to sort of make that happen. And there was definitely a sense of, you go back to the mid-2010s, I spent a lot of time saying there is no bubble. This is ridiculous.

Starting point is 00:03:31 All this is real, because I think a lot of the 2010s and a lot of valuations that happened there was actually the realization of the promise of the 90s. And that all proved to be correct. It was not a bubble. Now, it got pretty crazy during COVID for lots of reasons that, you know, have been discussed ad nauseam, including all the injection of liquidity sort of into the stock markets broadly, and valuations went crazy. And then there was definitely a correction that was not

Starting point is 00:03:58 very fun. But number one, that sort of inflation and collapse was nowhere close to what happened in sort of the late 2000s, although it has had impacts on things like tech employment and things along those lines. So I'm not sort of dismissing it at all. Yeah. But this sort of nature of this rock concert event, it's sort of, this is the first one, at least to me, that feels very bubbly. It's like, this is not normal. This is like, I don't think, again, I don't think we're there yet. But you can really see a path

Starting point is 00:04:35 from here to there. And, you know, you can imagine looking back at this. I drew an analogy to the Windows 95 sort of launch, which, again, the bubble didn't burst until another five or six years after that. Right, no, you threw out

Starting point is 00:04:48 the Windows 95 launch as one comp, and then you mentioned that iPhone launches used to feel like this in the early days of the iPhone. And so that's sort of the two sides of the spectrum there. You know, maybe it's Windows in 1995 or maybe it's the iPhone and it's going to become this mainstay on the landscape going forward. Right. Well, I mean, but I think it's fair to argue that it's going to be bigger than the iPhone, right?

Starting point is 00:05:12 Like broadly speaking, AI sort of as a whole, because we're not just talking, you know, Nvidia is a particular manifestation of the clearest sort of beneficiary of this wave right now, and whether they're the only one or will maintain their position is certainly something we can sort of get into. But this idea of AI being like the Internet or bigger than the Internet, of being sort of the, you know, as Jensen Huang would say, the next industrial revolution. Like you think through the implications of a tremendous amount of work being reduced to not zero marginal cost yet. Inference costs are expensive. Energy is sort of a big deal. But relatively speaking, the widespread societal impacts of that.

Starting point is 00:05:57 I mean, again, smartphones had a major societal impact. But how did that societal impact sort of manifest? Very heavily in the information space for sure, very heavily in sort of the outgrowth of that. I think the smartphone had a big impact on our politics. There's a big impact on people's day-to-day lives and all those sorts of things. But the jobs that we do are by and large kind of the same over the last 10 to 15 years, whereas AI is much more sort of industrial in its implications than maybe the smartphone was.

Starting point is 00:06:31 Yes. Well, I was just entertained by the scene. And it feels like for a while there last year, there was a constant stream of viral tweets where it was like, here are 12 ways that ChatGPT can change your entire life right now. And now, like six months later, I feel like every other week we get a viral thread that's like, here are Jensen Huang's rules for living and building a successful company. I mean, he has great rules for living, honestly. I know.

Starting point is 00:06:59 I enjoy it. Yeah, he used to come on Stratechery before sort of ChatGPT. And this was one of the bits I was sort of getting at with my article, which was, you know, GTCs used to be different. Like, they would talk about the chips, but they were really fun because they would talk about like 15 different things that were all pretty crazy and wild. Right. Can I just say one thing? You cited his 2022 GTC. And I remember I had just started working with you. And I was trying to process everything that Jensen Huang threw out at this 2022 GTC. Because that was back when NVIDIA was pitching the metaverse and all these different possibilities. And my brain was like melting trying to process it all. And his energy was infectious though, even though I wasn't totally, it wasn't all downloading for me. Right. Well, because I think part of the issue that's much clearer to see in retrospect is Nvidia believed they had this new way to compute that was super important and would make all these things possible. And all the stuff they were talking about was not unrealistic. But they felt they had to sort of help people get down this road. Look, you can do this new kind of computing. You can do this sort of parallelized computing that lets you do all these new sorts of things, this new sort of approach. But like we need to help you. Here, we'll make a library

Starting point is 00:08:21 for this, we'll make a library for that, we'll do X, Y, Z so you can get started, because it's too hard for people to even imagine the possibilities or know where to begin. And that's why you had, you know, a twice-yearly GTC. Most companies do like a once-a-year announcement, whatever. They're doing it twice a year and they're unveiling all these sorts of things. And it's like, well, how can they do that? Well, they're doing it because they've built the software ecosystem. They're just adding like, you know, a few APIs that unlock certain things.

Starting point is 00:08:49 It's all built on the same sort of foundation. And, you know, the contrast to this event was that this event was a chip launch. Why? Because you're doing generative AI. That's why. Like it's pretty crystal clear what this event was about. Now, they talked about other things. They had those sort of robots on stage.

Starting point is 00:09:09 They had a brief little bit about self-driving. They had all, you know, there were aspects of the old GTC in there. But it felt to me the nature of this was different because, as I sort of put it in there, it's pre and post sort of product-market fit, right? Everyone knows what parallelized, you know, computing can be good for. It can be good for training AI models and doing inference on AI models. That's a very good use case.

Starting point is 00:09:36 That's a use case people are interested in. And in that context, the message that Nvidia needs to deliver is not, hey, look at this, you should come build on this. It's more almost the aspect of, in old GTCs, it was Jensen talking to you, trying to convince you of these possibilities. In this case, it was actually Jensen standing there receiving the adulation of the crowd that is with him and that is already convinced. It was a very different sort of sensation. Yes, okay. So let's get into exactly what they announced.

Starting point is 00:10:13 Bloomberg writes, Chief Executive Officer Jensen Huang took the stage to show off the new Blackwell computing platform, headlined by the B200 chip, a 208 billion transistor powerhouse that exceeds the performance of Nvidia's already class-leading AI accelerators. The chip promises to extend Nvidia's lead on rivals at a time when major businesses and even nations are making AI development a priority. And Ben, nothing makes me feel more like a normie than some of these AI conversations. So my understanding, based on the presentation, is that the Blackwell B200 generation of GPUs are physically bigger and an upgrade over the H100 generation. And that creates the possibility for faster inference for LLM models, which essentially means faster response times and then a giant

Starting point is 00:11:11 reduction in cost and energy consumption. So two-part question. A, is that accurate? And then B, practically speaking, what would the B200 make possible that isn't already possible today? Well, first off, I would like to clip the sort of last 15 seconds there, send it to Andrew Sharp in 2020, and just let 2020 Andrew Sharp roast 2024 Andrew Sharp. Here we are. Yeah, here we are. Well, I mean, I would say the B200 does not strike me as being particularly transformative. There is a pun embedded in there, which I'll get to in a moment.

Starting point is 00:11:52 It's a bigger, faster chip. Or it's not actually that much faster. It's mostly a bigger chip. But that's fine. Like you go back to, say, again, to call back to the 90s, Intel was mostly just releasing faster chips every year. But that was actually really important because we were already building capabilities and programs that assumed there would be bigger, faster chips, and it was an enabling sort of thing.

Starting point is 00:12:15 There was no need to reinvent the wheel. There was just the need to do what they were doing sort of better and faster. Now, there are improvements. Like, this is the first chip where, I'm sure, I would imagine the development started before ChatGPT, but definitely a lot of the development has happened after that. There's specific sort of bits in there, like a transformer engine sort of thing that's tuned for the sort of transformers that are used in these models. So it is more performant beyond just sort of its increase in die size.

Starting point is 00:12:44 There are some interesting aspects about this size increase, though. Number one, this was supposed to be a three nanometer chip, which is the process that iPhone chips are built on. One of the problems, I've talked about this before, is the iPhone three nanometer process is actually a bit of a dead end. TSMC went down this road and realized they couldn't scale it beyond what it was, and so they sort of went back to the drawing board and restarted. And Apple is basically like, look, we're not waiting another year for three nanometer. We already waited a year longer than we wanted to. We will take that dead end and we will make chips on it.

Starting point is 00:13:22 And TSMC's like, hey, you're willing to pay for it. We will do that for you. But for almost everyone else that was going to be on three nanometer, it was delayed. And so three nanometer has been pretty delayed, relatively speaking, for TSMC. They were like on a two-year cycle. The faulty, not faulty, but dead-end three nanometer took three years, and the actual scalable three nanometer took more like four years. And that ended up being too late for Nvidia, in part because Nvidia has accelerated their release schedule.

Starting point is 00:13:52 Like they were releasing the, you know, chips every two, two and a half years. This is coming out, you know, about two years since the introduction of the A100, or rather the H100, which only came to market about 15 months ago or something, relatively recently. So Nvidia is definitely looking to move faster, but they're running into the fact that TSMC is moving slower. So it ended up not being three nanometer. It's four nanometer, which the current H100 is. And the implication of that is, I mentioned earlier, speed,

Starting point is 00:14:26 it's not really that much faster, sort of, if at all. And the way GPUs work is they're relatively very, very simple processors compared to CPUs, which are quite complicated. They just have a lot of them. And they are suited to jobs that are massively parallelizable, where you're doing the same operation all at the same time. And that's the case with AI. You're doing a ton of calculations.

Starting point is 00:14:51 This is a probabilistic sort of thing. You're calculating all the different possibilities, then choosing the one that's probably most likely. You have to do all those calculations at once. And so to get performance, the more you can spread that out, sort of the better. And so this is happening in multiple ways. So number one, the chip is bigger. Now, the chip being bigger is a problem for manufacturing because the larger the chip, the more surface area it takes, the more possibility there is for a mistake to happen.

Starting point is 00:15:18 And so what they're doing is actually two dies, two chips that are actually a little smaller than the H100, and they're tied together with this interlink, not dissimilar to what Apple is doing, I suspect, with the sort of, there are Max chips and Ultra chips, and I think it's the Ultra chips that are along these lines. You know, it will take some independent testing. Like the claim is this is totally coherent in that the processor doesn't care or know which die it's on. It's just totally seamless. There's this whole bit about processing where you have to know perfectly where the bits

Starting point is 00:15:55 are sort of at all times, that sort of coherency. And a lot of challenges come, like, for example, we go back to the Groq sort of thing. One of the reasons why Groq is fast is they use SRAM, in which you know perfectly the state of memory at all times because it's sort of locked in place. It's called static. The S is for static. That lets you go faster because there's no sort of quantum

Starting point is 00:16:21 or not quantum, like there's no uncertainty about where a particular bit is at any time. Regular memory has to be refreshed all the time or else it will lose its position. So it has to have a charge going through it. When the charge is going through it, you have a moment of uncertainty where you don't know the actual state of what things are in.

Starting point is 00:16:37 And so that slows things down. You have high bandwidth memory, which these chips are using to have massive throughput. That actually increases uncertainty. It increases latency and the time that this sort of happens. So there's all these sort of complexities that go into these different choices as far as architecture. It's really important, for this to be as performant as they claim, that there is total certainty between these two dies. So for now, we'll assume that's the case. Again, I'm sure people will work to sort of verify that.

Starting point is 00:17:05 So, but number one, your fastest possible processing is going to be on one chip. And so making that chip bigger, you can do more calculations and accomplish more because it's all in one place. But as Jensen Huang will tell you again and again, the idea here is not that you're just building a chip. You're building a system. And so for the larger system, Nvidia has this technology called NVLink, where it's super high bandwidth sort of connections between different chips. So you can have a bunch of chips that are linked together, and they have NVLink 5, a new version of it, and they have this NVLink Switch to get these larger systems. But the idea is you get these larger systems that are multiple chips. And now each chip can't know independently what sort of every other chip is doing because it's not that level of sort of tied together.

Starting point is 00:17:51 But when you're running these language models, you're doing a gazillion identical calculations at the same time. All those calculations in isolation are serial. You have to actually finish the calculation. But you can parallelize them so you can do them all at the same time. Then when they're all done, you compare them. Everyone sort of adjusts themselves based on the calculations. Then they do the next one. And you have to have this communication between all these different chips that sort of lets you go from A to B to C to D, but just do it like really, really fast.
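The compute, compare, adjust, repeat loop described here can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not how Nvidia's hardware or CUDA actually does it: the "chips" are threads, the model is a single weight `w` fit to made-up data from `y = 3x`, and the names `local_gradient`, `train`, and `shards` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def local_gradient(w, shard):
    # Serial work on one "chip": gradient of mean squared error for y = w * x.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def train(shards, w=0.0, lr=0.02, steps=200):
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        for _ in range(steps):
            # 1. Every chip computes on its own shard at the same time.
            grads = list(pool.map(lambda s: local_gradient(w, s), shards))
            # 2. Synchronization point: combine the results (an averaging step).
            avg = sum(grads) / len(grads)
            # 3. Everyone adopts the same updated weight before the next step.
            w -= lr * avg
    return w

# Four hypothetical chips, each holding a slice of data drawn from y = 3x.
shards = [[(x, 3.0 * x) for x in (i + 1, i + 2)] for i in range(4)]
w_final = train(shards)
print(round(w_final, 6))
```

Step 2 is the part that depends on the links between chips: nobody can start the next step until every result has been gathered and shared, which is why the speed of that communication matters so much.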

Starting point is 00:18:24 So they have these larger systems where you have multiple chips that all work together. And then they have entire racks that are like a bunch of these all glued together. And the idea is, from a programming perspective, this is very hard to program because you're trying to get the same thing happening on all these chips all at the same time, but using CUDA you can sort of address the entire rack as if it's one chip. And it will just figure it all out and do all this sort of calculations sort of at once. So the larger you can get, the more parallelizing you can do, the more.

Starting point is 00:18:58 So this happens for two parts of AI. Sorry, I'm going way into the weeds. No, no, no. I was mostly following along. I was underwater for a little while there. But now, so the larger the chip, the more powerful the model you can power would be my guess. Yeah. So one of the major constraints here is just sort of the speed of light, in that the more chips that you have,

Starting point is 00:19:21 it actually takes time to communicate with them. So number one, actually part of their system, it's wired up with copper, which the implication is if you're using fiber optics, you have to sequence the lights. And that is costly and it introduces more latency because you have to actually take the time to get stuff sort of on the same sort of rhythm, as it were. So everything you can do to more tightly link this stuff together, the more time you can spend actually calculating stuff. So the challenge here is you have these very simple, very fast serial processors, they calculate something. And the reality is then they kind of sit for a while. Now, sitting for a while is a matter of nanoseconds, but every moment that they're not calculating

Starting point is 00:20:06 is a waste of time, it's a waste of money and a waste of energy. So one of the huge challenges here is keeping these things fed. And so the larger you can make it and the more coherent you can make it, the more consistently you can fill it. And this is super important when it comes to training because these training runs are astronomical. So like, you know, GPT-4 was trained on like 18,000 or 19,000 A100s, for example, which Jensen Huang said was about 8,000 H100s, or maybe actually, no, might have been 30,000. I can't remember.

Starting point is 00:20:38 It's a very large number of A100s. Like, there's just huge challenges in sequencing that all up because you're doing all these calculations in parallel. Then everything has to communicate with each other. Okay, where are we at now? Okay, let's all do the next calculation. And actually you're wasting a ton of time in that coordination section before you actually do the next calculation.

Starting point is 00:21:00 And stuff breaks down, chips go bad. One of the technologies they introduce. The energy is crazy expensive. Sort of like self-checking sort of thing. Yeah. And it's expensive. It's complicated. It's hard.

Starting point is 00:21:10 And you're limited in the size of the model just by physics of how much you can do. This matters because at least to date with transformer-based models, there is a linear relationship between the size of the model and the quality of the model, which means the bigger you make the model, the better it is. That's actually all there is to it. Now, there's questions about whether that scaling law goes up forever. It seems to, we haven't hit the wall yet, which means to get better models, we need to make bigger models. To make bigger models, we have to overcome the laws of physics. We overcome the laws of physics by having chips that are bigger that can hold more internally.

Starting point is 00:21:50 You have a bunch of bigger chips together that are more tightly connected. You can do that. And so, you know, according to Huang, a GPT-4-level model takes 9,000 H100s. It takes 2,000 B100s. Now, the implication there is not that you're training GPT-4 models. It's that, what if we took that same 9,000 and trained a new model? Suddenly you have the potential to be four times the size or actually maybe even larger. So to the extent you're constrained by physics, you're constrained by the,

Starting point is 00:22:18 this speed of light, you're constrained by these coordination challenges. The way to get larger and advance is to have more parallelism and more efficiency within those constraints. That's what the B100 will do. And that's what the C100 or whatever the next sort of model will do. It will sort of keep moving down this route, even though it's not actually faster. This is where it's different than, like, back in the day, programming was by and large,

Starting point is 00:22:45 single-threaded. What that meant was there's just a string of commands and you have to execute every command and then do the next one, X, Y, Z. And so the way to do that better and faster was to have a faster processor. So in the 90s, Intel would just come out with faster and faster processors. You go from 33 megahertz to 66, from 66 to 100, to 150, and you get up to 1 gigahertz. We kind of topped out at like four gigahertz. And actually that was getting way too hot. They could never get that shrunk down.

Starting point is 00:23:13 We've kind of settled in probably around two to three gigahertz as sort of a sustainable number. Within that, then, they've had to do all these crazy things with CPUs to make them more efficient. They do things like branch prediction. We're going to actually calculate every possible one and then choose the right one. And if we get that wrong, we'll have to go back and restart. And they have these very long pipelines that are very deep and complex. And if you get it wrong, it's called a miss.
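To make the cost of a miss concrete, here is a small Python sketch of one textbook prediction scheme, a two-bit saturating counter. It is an illustration only: the conversation does not say which scheme any particular CPU uses, and the `simulate` function, the `flush_penalty` of 15 cycles, and the loop pattern are all made-up assumptions.

```python
def simulate(outcomes, flush_penalty=15):
    """Run a two-bit saturating-counter predictor over branch outcomes.

    Counter states 0-1 predict "not taken", 2-3 predict "taken".
    flush_penalty is a made-up cycle cost for refilling the pipeline.
    """
    counter = 2  # start weakly "taken"
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1  # wrong guess: the speculative work is thrown away
        # Nudge the counter toward what actually happened, clamped to 0..3.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses, misses * flush_penalty

# A loop branch: taken 9 times, then falls through once, repeated 10 times.
loop_pattern = ([True] * 9 + [False]) * 10
misses, wasted_cycles = simulate(loop_pattern)
print(misses, wasted_cycles)
```

With this pattern the predictor only guesses wrong on each loop exit, so most of the time the deep pipeline stays full; the longer the pipeline, the larger the penalty each of those wrong guesses would carry.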

Starting point is 00:23:38 And you have to go back and do it again and it slows you down. GPUs, again, they're the opposite. As CPUs have gotten more and more complicated, with longer and longer pipelines, the GPUs are super short and super sort of compact and they just do it really fast. And the answer is to do a bunch of them all at the same time. Now, again, it's really hard to program. It's hard to take a program, which is almost all programming to date, written for a CPU world where the CPU takes care of a lot of that, and actually write it for the parallel world. And the old GTC was trying to find the use cases that fit this new kind of computing.

Starting point is 00:24:12 Throwing use cases against the wall. Yeah, that's simply the vibe I got. We now have a use case. And so now it's just sort of accelerating that use case sort of going forward. Okay. So the other aspect of what Huang was saying that caught my eye and that you included in your article, the NIMs portion of the presentation. Can you explain how NIMs fit into Nvidia's announcements this week and also just the future

Starting point is 00:24:38 of the business model? So looking back, I was a big proponent of and wrote like, I've been, you know, an Nvidia bull sort of for a very long time, because of this CUDA layer, this idea that Nvidia is in many respects best understood as sort of the Apple of chip making. They are not just a chip company. They're an integrated sort of offering that integrates hardware and software. And they make

Starting point is 00:25:05 their software freely available, which is CUDA. And again, it's hard to program for this stuff. And so CUDA made it easier. It's by no means easy. Anyone that has to work with CUDA, it's really, really hard, right? But it's much easier than it used to be when you had to do all the parallelization yourself. And so to help make it easier, they would build all these frameworks on top of it, like if you want to do, you know, microbiology, not microbiology. What's the, what's the bio, geo, whatever. You want to do weather modeling.

Starting point is 00:25:33 You want to do sort of automotive. You want to do all these sorts of things. We'll give you the set of libraries that sort of goes into this and make it sort of a little bit more approachable, a little bit easier. But the assumption, and who Nvidia was pitching to at that time, were people who would program in CUDA. They would actually go and build out new applications. The implication of finding the use case for all this stuff is that the vast majority of people are not going to be programming in CUDA. The people that are going to be programming in CUDA are the ones that are actually building, like training these models and setting up these inference clusters. And that's a much smaller number of people whose decision making, they will put forward the investment to figure out a way to maximize their total cost of ownership.

Starting point is 00:26:24 So I think an analogy here, you go to something like x86, right, for Intel. And by and large, when Intel's playing in a consumer space and everyone's programming to that, they have this real software moat, which is, you can redo stuff. You can redo a bunch of low-level stuff to run on RISC or run on sort of Arm or whatever it might be. But why are you going to do that when the Intel processors are just sort of getting faster and faster and it's fine, right? And you can use an AMD processor or whatever it might be. You fast forward to the cloud era where you have these massive astronomical hyperscalers, and you have two problems for Intel. Number one, there's a lot of low-level stuff that was x86, but it was actually Intel-specific.

Starting point is 00:27:08 It really only worked with Intel chips. And so AMD is like, well, it's hard for us to break into the data center because who's going to reprogram all the stuff? Well, if you're an Amazon or you're a Microsoft or you're a Meta, it is well worth the expense to get your low-level stuff working on AMD because then you have a competitive landscape and you can choose sort of the best processor. That goes further to Arm. Like Amazon has spent massive amounts of money getting core-level stuff working on Arm so they can have their own Graviton Arm chips that they can make available at a much lower cost, yet with higher profit margins. And that's a real problem for Intel. When your buyers become very large, your buyers are heavily incentivized to undo your moat,

Starting point is 00:27:54 particularly if your moat is software. Like, they're no longer deterred by it's hard and will cost a lot of money, because the hard and cost a lot of money is weighed against spending billions and billions and billions of dollars, right? Suddenly the calculation changes. And so in the case of NVIDIA, this is the long-term sort of issue. The extent to which their sales go to these large cloud providers is the extent to which they're losing leverage in the relationship because those buyers are willing and capable

Starting point is 00:28:27 of getting rid of you, not getting rid of you, but getting rid of your moat, of making it so we're not CUDA-dependent. Yeah, guess what? Programming parallelism is really hard. We'll figure it out. We're not going to be completely dependent on you. And that was never really an issue before generative AI because it just wasn't a big enough market.

Starting point is 00:28:47 No one was motivated enough to do it. Now everyone's extremely motivated to do it. And so there's an issue where Nvidia's had this beautiful software moat, but that beautiful software moat was a function of the market not being that large. And that's why I called the article Waves and Moats. There's this wave. If you have a moat, a nice little bit of water around your castle, and you're hit with a tsunami, your moat's not very meaningful anymore, right? It's completely overwhelmed. Well, and I thought that your article captured that dynamic well,

Starting point is 00:29:19 because to the extent that the rock concert vibes signal that we're in the middle of a gold rush and Nvidia is at the center, that also signals that like the faster this industry grows, the more incentive that creates for competitors to undercut NVIDIA on price and go after the moat and even some of the hyperscalers to start developing their own GPUs if they can. So it almost looks like a race

Starting point is 00:29:46 where NVIDIA is now going to have to try to devise enough software solutions to try to create lock-in regardless of what competitors are offering. And then on the other side, you've got companies like AMD and Intel who are just going to be fighting tooth and nail to get a piece of this

Starting point is 00:30:02 market as well. Well, AMD in particular, but also all these companies are building their own AI chips, right? Everyone's trying to do this.

Starting point is 00:30:09 So, Nvidia is doing a number of things, all of which I sort of touched on in various ways today. So we'll start with NIM. The problem with CUDA as a layer of lock-in is that that is not

Starting point is 00:30:23 at the point of highest leverage. It's high leverage, but it's high leverage because it's used by developers. Developers are employed by companies. And they can do something else, right? It's annoying to switch, but they can switch and they can be directed to switch, right? Figure out something else. Figure out a way where we don't have a CUDA

Starting point is 00:30:44 dependency. And so that is happening. It's like people are trying to figure this out. This is too expensive. The part, who can you not tell what to do? Consumers, the high end, the people that are using it. This is the aggregator point. This goes back to our arguments and debates about antitrust relative to these big players. The challenge when it comes to an Apple or Google or an Amazon is that their power is not derived from a captive audience that can be directed where to go. Their power is derived from the individual decisions of millions and billions of users. And you can't easily tell them what to do.

Starting point is 00:31:22 And so you end up with these, you know, all this mess in the EU with Apple is just a replay of the mess of the EU and Google. It's a replay of the mess of the EU and Microsoft, where their problem is, the phrase I always use, they're pushing on a string. They're trying to make, their fundamental issue is that consumers are choosing an outcome that they believe leads to an anti-competitive market,

Starting point is 00:31:46 but the cause is the consumer choice. And so you're trying to undo consumer choice by regulating the underlying platform, and it's just, it's not going to work. And so that's where Nvidia wants to get. And that's a great business for Apple. That's a great business for Google, and that's what Nvidia is trying to create.

Starting point is 00:32:02 They want to get above the stack. They want to get above the models. And so what NIMs are is this idea of like, look, it's hard to actually make this model. It's hard to actually get this sort of thing. Like there's a lot of work and investment that goes into it. It would be better if there's sort of everyone's working together and sharing all these sorts of things. So imagine you have, you have SAP or you have ServiceNow, right? And you want to interact with it.

Starting point is 00:32:25 Well, you could build some sort of custom integration with it so your company can sort of talk to your ServiceNow and get sort of the answer. Wouldn't it be better if there was just a ServiceNow module? You could go get it and you can integrate it and it will link with all the other ones. Why do they link together? Because it's all natural language. Like you don't actually, like this potential is actually very real. This idea that you can have all these things and link them together. There's a huge coordination challenge with APIs because you have to actually get the APIs right.

Starting point is 00:32:53 You have to get it communicated because computers expect perfect instructions. If it's not perfect, it's a bug. The idea of AI is it can get close enough. It can figure it out; the probabilistic area overlaps to a sufficient degree that you can get the right answer. And so, Nvidia's, you know, what they're saying here, set aside our framing here, just look at it from a customer perspective, is, look, you get it, you, a company, understand AI is important.

Starting point is 00:33:21 You want to make this work. Do you want to go hire sort of Accenture or Deloitte to build out some sort of AI system for you? That's going to be difficult. They're probably not going to do a very good job. And then it's going to have to be upgraded and you're just going to be, no, let's all work together and get like sort of, you can just get this module and get this module and it'll just work, right?

Starting point is 00:33:40 And it'll be iterated. It's kind of like a cloud service. That's why it's called a microservice. This idea, it's an ongoing, kept up sort of thing that sort of all works together. And number one, I think there's a market for that. I think it makes sense. Number two, what is the key about this market that Nvidia is seeking to build? All the models they are going to host, all the NIMs will only run on

Starting point is 00:34:06 Nvidia chips. And there's your lock-in. There's no reason that needs to be the case. That is for sure, like someone else could build this market. Right. But that is more, that is broadly, there is no reason it has to be Nvidia only. It is Nvidia only because Nvidia is trying to rebuild a moat. And like there's just no sort of ifs, ands or buts about it.

Starting point is 00:34:28 And they're trying to do it now because right now, Jensen Huang could be on stage like a Taylor Swift concert and he can get people on board. And so it's just this all-out sprint, this all-out race to redig this new moat that's much further afield from the castle. And that in the long run, people who are so eager to get on board this AI thing and face real challenges in implementing it and figuring it out, this will be a much easier solution. Okay, so Google, speaking of Nvidia competition, Julia says, Ben and Andrew, you might remember me from my previous TikTok email about AI photo sharing and printing. I appreciated you guys reading that on air. And no, I'm not Sundar Pichai, just the PM who launched most of those features while at Google. As a side note, I totally forgot that we accused Sundar Pichai of ghostwriting Julia's email about Google on AI photos. That put a huge smile on my face. Thank you, Julia. Now I work elsewhere, she says. But my question today is, am I the only one who sees Nvidia's recent gains as a missed opportunity for Google TPUs? Do you think Google ever had a real shot at this market?

Starting point is 00:35:43 My take, Google has been working on a hardware problem since 2015. But because they never properly invested in the TPU software stack, their performance edge wasn't enough to ever get widespread adoption. So just real quick, I figured it's on topic. What do you think there? I talked about the difference between CPUs and GPUs. GPUs are much simpler, but they're still programmable. You can still sort of like the, remember, this all started with Nvidia gaming GPUs that were used for, like, image recognition.

Starting point is 00:36:15 And suddenly it was like, wow, this parallelization is really useful. We're going to develop this. But it's all been the same sort of story there. A TPU is even simpler. So now the even simpler means it's cheaper to produce and it's even more scalable than sort of a GPU is. Right. It's more efficient than GPUs, right? Well, yeah, but it's doing less and it's less capable. It's less flexible. That means

Starting point is 00:36:40 all this stuff's a trade-off. The fact that it's simpler and more scalable means it's even harder to program for. This idea that they never properly invested in the TPU software stack, arguably, yes, Google has a longstanding problem of having internal solutions that they do a very poor job of sort of externalizing, but it's also the case that just fundamentally speaking, TPUs always will be much harder to program for than GPUs. Just as GPUs are much harder and more difficult to program for than CPUs. Now, TPUs are probably closer to GPUs than GPUs are to CPUs, but the sort of tradeoff very much applies.

Starting point is 00:37:23 Secondly, Google is always, first and foremost, focused on their own needs. And so they do optimize for their internal sort of needs. And then they try to sell it outside. They've never been good at going out and understanding the customer and tuning it and understanding it. So yeah, those are all valid criticisms. But the other thing to remember is that at the end of the day, the better business, at least in theory, is hosting all this sort of stuff and offering it up as a service, right? Like the reason to buy GPUs, which are astronomically expensive, is because your payback period is still very short. It's actually shorter, I think, the lot of the process is you charge so much for it

Starting point is 00:38:00 on sort of a rental basis. And the idea is you're going to get mass usage, which fills up all, again, you want to keep them busy. You don't want them sleeping, right? And so there was a hilarious tweet going viral about someone like speculating about how GPT couldn't be as big as it was because of the cost of GPUs and talk about like 50 people using a GPU. No, these GPUs are filled constantly with tens of thousands or millions of people like, and part of it, the whole orchestration of channeling results and keeping these GPUs full is a huge sort of challenge and a massive problem and something that Microsoft and OpenAI had to work through like in the days and weeks after ChatGPT when they were overwhelmed with sort of demand.

Starting point is 00:38:38 But it's this massive pair. So all this is like, at the end of the day, having it all together and being differentiated in the top end with the software used to access it should be more sustainable. Like, Nvidia's business looks fantastic today, but there's a reason they're scrambling to rebuild this software moat. Because at the end of the day, Nvidia is winning because they're the fastest and they're getting a lot of margin for that.

Starting point is 00:39:05 But that's a hard thing to sustain in the very, very long run. Software is where a difference should happen. And aggregation and network effects is where true moats are really, really built. So Google is unique in that they've been at this for a long time. The fact they're so far down the road with TPUs is very impressive. But I think it's perfectly rational and appropriate for them to focus that on their own. Google wins not by becoming a chipmaker.

Starting point is 00:39:32 Google wins by Google services dominating this new era. And they can win in this new era because they have scale and cost advantages that make it viable to roll out. This is the context of why would Apple partner with Gemini for these large scale models? I suspect Google is the only entity in the world that could actually do this because they have this built out, right? And that's a pretty great and sustainable and sort of long run opportunity. This is why no one dismisses Google's opportunities. And in fact, many people still favor them in the long run for AI. And you go back to this expected value, the expected value of AI, if it manifests as the huge

Starting point is 00:40:13 sort of world-changing technology that it is, is going to be much greater than these profits Nvidia is making by selling chips. Like it's going to be in the sustainable long-run sort of services. That's what Google is focused on. And I think that's a very reasonable approach for them. It fits their company. It fits what they do. They're not suited to be a chip maker or chip seller.

Starting point is 00:40:33 It's fine. And they're very well placed because of TPUs and what they did with them. And, you know, if they had been trying to sell them, they'd be competing against Nvidia, which has a, you know, you think CUDA looks hard. Suddenly CUDA is the easiest thing in the world when it comes to, oh, after you TPUs and sort of sell this. Right. No, we're trying to, like, so.

Starting point is 00:40:52 I think it's fine. I don't think there's any flaw in Google's strategy with TPUs by and large. Now, the software should be better. It should be much easier for people to get started. I think Google's concern is they move slow and they don't get stuff going. It's been clear that transformers are a big thing. It should be much easier to get started. But they do have a lot of, it's worth noting, a lot of very large AI startups are on Google.

Starting point is 00:41:20 and the motivation to figure out how to use TPUs effectively, and they found the trade-off, the investment to get TPUs working is worth it in the long run. Interesting. Okay. Yeah. I mean, if I'm understanding correctly, though, software is critical to creating a market for any of this hardware, whether it's GPUs and CUDA or TPUs and software and being able to program to TPUs. Is that right? Okay. So in the early days of a market,

Starting point is 00:41:51 this is sort of classic Clay Christensen, the integrated sort of product wins because everyone's trying to figure out what to do. And so the more you can solve the problems internally and reduce the work that needs to be done externally, the better. So in the case of Nvidia, they have this integrated CUDA and sort of the chip. Once a market is defined,

Starting point is 00:42:10 what happens is you get modularization, where you get separation in the stacks because you want competition on the stacks because there's high motivation to reduce costs, so you want to increase sort of competition. There's high motivation for standardization because the powers that be are pushing for that to spur this sort of competition. And so in the long run, you would expect the development of the software layer being independent of the chip. And, you know, just sort of up and down the stack.

Starting point is 00:42:40 Right. This is the play for like AMD, for example, right? AMD software stinks. What AMD is focused on, and doing a somewhat good job of, being helped along massively by Microsoft and Meta in particular, who are highly motivated to find an alternative to Nvidia, is getting, like, PyTorch and the various open source stacks working well on AMD so that they can sort of substitute that in if need be. That's the goal. The system integrators, the big players, want to modularize everything so they can switch stuff in and out, thus spurring competition

Starting point is 00:43:16 in the market so you get lower prices and higher performance and you get the margin as a system integrator instead of the individual component maker getting the margin. Nvidia is fighting against that. They don't want to be just a chipmaker. They want to be a system provider themselves. So this is the huge fight and tension that is going on in the market right now. Okay. Well, speaking of companies with scale, we have the other big news of the week here.

Starting point is 00:43:43 Greg says, I just saw the news about Microsoft acquiring Inflection AI, which brought to mind your February 28th daily update, Mistral, Microsoft's investment, generative AI, and customer support. How does this development update your thinking on Microsoft's strategy or the foundation model market more broadly? Ben, what do you think?

Starting point is 00:44:05 There's one obvious high-level takeaway. And I wrote a long daily update about this yesterday that got into, like, the whole Inflection thing's very weird. I also had my browser crash. And so I called it at a point. Everything about this is weird. Well, so it's weird. So just a quick thing on Inflection.

Starting point is 00:44:26 Why is a startup buying 22,000 GPUs? Like, what, how are you getting leverage on that expense? And they didn't have an API. Like, it's not like they're getting other developers to use it. Like, they're trying to get a consumer product. The whole advantage of the cloud is that you only pay for what you use. So as you're trying to get penetration and product-market fit. And meanwhile, your monthly visitors are like,

Starting point is 00:44:52 Stratechery level. Like you're bearing this astronomical cost. It's like a foundry that builds up, spends $20 billion and then it gets no orders for chips. Well, can I summarize, let me try to summarize the mechanics of what happened there. And you can correct me if I'm wrong. But your breakdown on Stratechery was great. And it would be impossible to recapitulate in detail on this podcast, particularly like 45 minutes in. But what I took away from it was that Inflection raised a bunch of money and then spent all that money on 22,000 H100 GPUs from Nvidia for reasons that aren't completely clear. And then when Inflection failed, the GPUs they'd purchased were still incredibly valuable. So Microsoft appears to have entered into a licensing

Starting point is 00:45:41 agreement. Just to be clear, the GPUs, that was, so that was the meat of the article because it's just weird. Why are you buying this? And note, Nvidia was an investor. So Nvidia is very interested in them buying GPUs. And Nvidia's also an investor in CoreWeave, where apparently the GPUs were housed, but it's actually managed by Microsoft. It's just, the whole thing is weird.

Starting point is 00:46:01 It's, the market is so warped. Well, it's so warped because Nvidia controls it. And Nvidia is not selling these GPUs in a free market. Jensen Huang is choosing who gets to buy GPUs. And so, that's why my belief is, you go host DGX Cloud, you can get GPUs. Oh, CoreWeave, look at you, CoreWeave. Oh, you'll never do.

Starting point is 00:46:22 You'll never compete with us. Here's a whole bunch of GPUs. Have a bunch. Oh, Inflection. We'll invest a bunch of money, but instead of renting GPUs, you have to buy a whole bunch of them and host them on CoreWeave over here. Well, and what's craziest to me, I mean, so Microsoft appears to have entered into a licensing agreement that will make

Starting point is 00:46:39 Inflection products available on Azure, but will also allow Microsoft to use the 22,000 H100 GPUs. But it's crazy that Microsoft, a $3 trillion company, it speaks to how constrained the supply is for GPUs, if Microsoft needs to buy their way into GPU access through this like convoluted licensing agreement. Sorry. I don't want to dwell on the GPUs.

Starting point is 00:47:06 I don't know. No one knows the status of the GPUs in this deal. Okay. I don't think Microsoft did the deal for GPU access. It's more that the GPU business, it is really weird. And so I want to dive into it. And also my summary of Microsoft's motivations was like one tiny paragraph.

Starting point is 00:47:22 Microsoft wants a hedge against OpenAI. That's the answer. Like that's the actual most important takeaway. You could ignore the rest of my daily update. That's the only paragraph that matters. Microsoft, they looked sort of like doom in the eye in November and realized our entire company's strategy is being built around AI. It's being built as the Azure

Starting point is 00:47:44 go forward, we're spending billions and billions of dollars for GPUs to support this. Every one of our products is integrating this. This is the key to our company going forward. Our market valuation, $3 trillion company, the last $1.5 trillion or whatever it is, is predicated on AI. And we don't control it. And not only do we not control it, the entity that we're partnered with, which we have no influence over at the end of the day, at least from a legal perspective, is batshit insane. Like it's this nonprofit, which was kind of a farce all along, and no one knows who's in charge, and they're all at war with each other.

Starting point is 00:48:19 And what are we doing? How did we get ourselves in this position? Now, they muddled through that because at the end of the day, all that matters in AI right now is who has GPUs. Okay? Like, this is the takeaway of this episode as a whole. He who owns GPUs controls the world. That's not going to be the case forever. Like, at some point, we're going to have enough

Starting point is 00:48:41 GPUs in the world. Then the question is going to be energy, like, who actually can power all this sort of stuff. But in the long run, Microsoft had sufficient leverage over OpenAI to get through November. But they were at long term risk, and it was arguably reckless to be in that position, particularly in the long term. And so what we've seen since then is Microsoft giving OpenAI the middle finger and saying, you can't do anything. We've already established this. You are dependent on us, and we are going to do everything in our power to at least have a drop-in replacement and maybe replace you at some point.

Starting point is 00:49:18 And so what they did at this point, they acquired a team. They acquired a model, license and not acquire, whatever. Sorry, they didn't acquire. Sorry, have to use the right words. They hired a team. They licensed a model and they maybe have access to a whole bunch more GPUs that are already serving that model. This idea that those GPUs, that Perplexity is going to keep going to. Is not going to concern?

Starting point is 00:49:38 Perplexity. I keep saying Perplexity. I did a great update on them too. The idea that Inflection is going to be an ongoing concern is absurd. This is a company that's dead in the water. I didn't look into their fundraise when it happened. I usually don't focus on startups to that degree. It was clear at the time.

Starting point is 00:49:52 It didn't make any sense. Like, again, this massive expenditure for a consumer-oriented model, if you don't blow up like ChatGPT did, you're dead. And they didn't. And they are. But Microsoft's like, yeah, we need an alternative in place that we control. That's their motivation. Now, is it the right model?

Starting point is 00:50:08 Is it the right team? There's been clouds that have been around the sort of leader's sort of exits from various companies over time. You know, TBD, we'll see how that works out. But the motivations are crystal clear. Well, but doesn't Microsoft have a ton of power over OpenAI? For now. We're talking about the long run, right? OpenAI can at any point declare that the current model is AGI and Microsoft doesn't get access to it. Right?

Starting point is 00:50:36 That's in the deal. Yeah. That's what we went over in the Elon episode. I'm just, I recall from the Elon lawsuit, a lot of boasting from Satya about how much power they have over OpenAI. They do for now. But that's not just, you have to be thinking, if you're Satya Nadella, you need to be thinking about 2040, 2050, 2500 when presumably AI runs the world. You can't be dependent on, you know, OpenAI needs to make it to 2025, much less make it to sort of like 2500, right? And that's not sustainable. And maybe Microsoft always knew that and something like this was always the plan.

Starting point is 00:51:15 Maybe they just woke up to it in November. But this is very clear, you know, I would love to be a fly on the wall of future Microsoft-OpenAI meetings. They, the alignment between the two companies seems low and lowering sort of over time. You know, we'll see what happens there in the long run. I suspect there's a lot of tension and a lot of, you know, I think Microsoft's like, yeah, we, there was probably a lot of bending over by Microsoft for a long time to OpenAI's wishes because of making it work. They were the hot new thing. Yeah.

Starting point is 00:51:45 Well, it was in, they were. And it was essential to Microsoft. And I think after November it's Microsoft like, we can't do this. This is reckless to sort of be in this position. And I think this is the outcome of it. Now, speaking of warped markets, they should have just bought Inflection, right? That would make all this much cleaner. But we have a situation where your girl, Lina Khan, will sue

Starting point is 00:52:09 Meta for buying an exercise app for VR. And what does that mean? Of course they will win the case, but it will take two, three years, at which point the moment's passed. And so we're stuck in this ridiculous situation where they hire the whole team. They signed some sort of licensing deals that the investors, right, I can imagine investors in Inflection are like, yeah, this isn't going anywhere. We're going to have to write it off to zero.

Starting point is 00:52:34 What you do in a normal case, you would sell, they probably have a 1x liquidation preference. You sell the company for however much was put into it. All the workers are compensated by getting very good jobs at a big company. They'll have an earn-out of four years. This is how Silicon Valley works. You go out, you take a risk, you raise a bunch of money, and everyone's downside is protected. We talk about VC being a downside of zero.

Starting point is 00:52:56 Actually, it's mostly a downside of you get your money back, right? And you get your money back by the big company acquiring them. And whatever technology was built, maybe it's thrown away. Maybe it's actually rolled out to billions of people at once, which is great from a societal perspective. Pi could still take off here. I'm not giving up hope. Right.

Starting point is 00:53:12 Like what's being worried? I mean, it's, but everyone realizes that there is no sort of logic being applied to the prosecution of acquisition in tech. Well, but in effect, they still have acquired. They've essentially acquired this company. In a drastically less efficient way, in a way that by and large benefits insiders. I can imagine a reason this worked just because one of the outside investors in Inflection, or the biggest one, was Greylock, where

Starting point is 00:53:40 Reid Hoffman's on the... Yeah, Reid Hoffman's on the... was a co-founder. He's like chairman of the board. Happens to be on the board of Microsoft as well. By the way, everyone going forward, get Reid Hoffman, right? Because he'll smooth out an exit strategy, right? For you. And, you know, did Microsoft,

Starting point is 00:53:58 are they paying as much in this quote-unquote licensing deal as if Inflection was on the free market? Might Apple have been interested? Might Amazon have been interested? Who can say? Who knows? We're in this situation, like with GPUs, you have these convoluted deals that are bad for everyone, that are low transparency, that is not a free market because the limitation in supply means Jensen Huang gets to warp the market according to his preferences. In this case, because you have a regulator coming in and just being unrealistic and insisting on prosecuting everything, whether or not it's going to win, you don't have a free market and it's not good for anyone.

Starting point is 00:54:34 Well, this was a very interesting conversation because I was under the impression that Inflection was a failure after drunkenly purchasing 22,000 H100 GPUs. And then the GPUs wound up being valuable, sort of like how spectrum, like radio stations can fail, but the spectrum is insanely valuable. So everyone gets rich anyway. That's what I was imagining for the investors who put up $1.5 billion for Inflection. But Inflection as OpenAI insurance is way more intriguing and does make some sense based on the way the last 12 months have played out for OpenAI and Microsoft. Yeah, this is good feedback for my writing. Maybe I needed to repeat my one paragraph 10 times to make sure that it landed. It wasn't dismissed because it was just in there. No, that's the deal. The GPU aspect is wildly weird and like almost more entertaining. It's just so bizarre.

Starting point is 00:55:31 Where are the GPUs, right? But the other thing, these are now old GPUs. The B100 was just announced. It's not in the field yet, but it's coming. There is a broader bit by all accounts. Old GPUs do hold value. People are still grasping for access to A100,

Starting point is 00:55:47 which is now two generations old, just because it's so hard to sort of get access. The idea of Microsoft being at the back of the line of any line is crazy to consider, but Nvidia has so much power and can only produce so much that even companies like Microsoft have to wait. Microsoft's pretty high up. I think Nvidia's two largest customers are Microsoft and Meta. So they're not like, you know, and I think Nvidia has been very nice to customers.

Starting point is 00:56:12 They have been very aggressive in rolling out their preferred solutions and those sorts of things. Nvidia sells a lot less to AWS. Why? AWS doesn't want to implement Nvidia systems. They want Nvidia chips to go into their system, their networking, their control. And so AWS gets fewer Nvidia GPUs. It's Jensen's world right now. Every week on Twitter, advice for living.

Starting point is 00:56:33 Yeah, I mean, it's pretty hard-nosed. Like, it's pretty like bare-knuckle sort of like stuff that is going on under the surface here. And it's very logical. Again, Nvidia has a window of leverage right now to rebuild their moat. And they are doing everything they can to do just that because you have to not just think about the next five years or next five months. You have to think about the next amount of time. And it's hard to stay differentiated as a pure hardware maker. You need that sort of integration.

Starting point is 00:57:04 The one bit about this that does matter is I don't think it makes sense for Apple to build a GPT-4 or Gemini level sort of model. Just, they're so far behind. It's not suited to what they do well. And the cost to build out the infrastructure to serve it. If you think that we're going to a world where devices don't matter, then yeah, maybe they should. But then it's like, why does Apple even matter as a company? That's just sort of like wish fulfillment in that I want Apple to live forever. If we're in a world where devices don't matter, there's no reason for Apple to exist.

Starting point is 00:57:35 And I like that they are not building this out. There's also the bit there, they could not build it out because Nvidia's not going to sell them any GPUs because they hate each other. Which is a fun little twist. I hadn't considered that aspect of it. But yeah, we had that discussion of commoditize your complements on the episode Sunday night. And then like four hours later, Mark Gurman broke the news about Gemini and Apple, which is still technically a rumor, but makes a lot of sense given Apple's positioning. It doesn't make sense for Apple to build their own,

Starting point is 00:58:05 to build their own sort of at-scale model. And it makes perfect sense for Google. Google is the only company that could potentially meet this sort of need. What I think, I couldn't remember if we had talked about it on the podcast or not, what I think Apple is doing is they are building models, but they are super focused on models that run on device. And that's their leverage. They want to make their devices more differentiated,

Starting point is 00:58:29 more valuable. And there's definitely you can envision that the more we can get stuff on device, like we didn't get to the energy question, but energy is the overarching thing

Starting point is 00:58:38 hanging over all of this. We're rolling that over to next week for sure. It costs a lot. It costs a lot and that is the ultimate limiting factor. And to the extent you can put it on device,

Starting point is 00:58:50 it's actually as a whole less efficient, but from a, it's like you're externalizing your energy costs, right, to the consumers who are plugging into the wall

Starting point is 00:58:59 and using it. And by the way, get better performance in many respects. The response time's way faster. The latency is way lower because you're not having to do like a full trip around to a data center somewhere. But your capability is much worse. And I talk about the scaling bit. Right. The issue with the scaling bit is it requires a lot of memory. Memory, and memory is expensive. Memory locally. Just wherever the model is being hosted. And so you're limited in the amount of memory that's on a phone. Now, Apple's unified architecture is actually beneficial in this regard. They have more memory available to their GPUs than a traditional separate CPU GPU sort of architecture might. You know, so at least in theory, we'll see how it actually plays out. But, you know, Apple, I think, is focused on getting models that work locally and then for the really big stuff, offloading that to a partner and Google makes total sense

Starting point is 00:59:58 as a partner. Not as much sense as Meta, by the way. As I've noted, Meta is the perfect Apple partner. They're perfectly aligned. Yeah.

Starting point is 01:00:06 They just can't figure it out. Iron out those differences. Apple, fighting with Jensen Huang, fighting with Mark Zuckerberg. You know, everything, what goes around,

Starting point is 01:00:15 comes around. Should have been nicer. Well, one question as we close out our own personal AI Woodstock here. So when you say Google is maybe the only partner

Starting point is 01:00:25 that would have made sense other than Meta, is that because of Google's scale? Like they're the only one that could handle the iPhone audience if you're talking about introducing a model? First and foremost is scale. Google has the capability. Dylan Patel, all-time great coinage. He talked about the GPU rich and the GPU poor. Google is the GPU ultra-rich sitting on its yacht in the Mediterranean.

Starting point is 01:00:51 And their wealth is unquestioned. Their willingness to work is kind of the bigger issue, right? And they, because of TPUs, because of the, and the scalability, and they've been building this out for a long time, and they're ramping up and accelerating more and more, right? It's not a workplace or culture issue to put in a big order for a whole bunch more TPUs, right? Like, they are, and so they are getting the infrastructure in place and they, now it's probably going to be a challenge when and if this launches with an iPhone, right? We have, like, it's fascinating from a society standpoint. Apple, when they throw

Starting point is 01:01:27 their weight behind something and want to feature a model like Gemini, I guarantee you overnight, the adoption is going to explode. And I think it's going to become a more central part of the conversation for normies like me who don't know the difference between a TPU and a GPU or didn't before the AI gold rush here. But that is going to be a really big deal when it happens. For sure. Right now all this constraint is downstream of losers like me having a Perplexity app and a ChatGPT app and going to them and willingly using them, creating the data sets the world over. Yeah, that's right. Um, so I think only Google could do it from an infrastructure perspective. Number two, Google and Apple are just very aligned. Like, Google is a services company, Apple's a hardware

Starting point is 01:02:24 company. Google made a mistake in the smartphone era by thinking they were competing with Apple. They weren't. And they've course corrected over the last eight to 10 years. Apple, you know, Apple gets to rage against Facebook and personal data and give Google a protected place on the phone that's not subject to ATT. And then Google pays Apple. That money is made by personalized tracking, but it's laundered through Google, which is great. Apple doesn't have to take the brand hit of saying the British monarch was like an American Indian or whatever Gemini was doing. Oh, look at the Google model over there. They got to make sure it's branded heavily Google when Gemini is serving answers. Yeah, Gruber and I were already on this like, no, he's like, I don't think, I think the white label, like, no way. This is like, this is going to have the biggest Google. It'll blot out your whole screen. Yeah, exactly.

Starting point is 01:03:20 It's going to be great. Well, much to look forward to in 2024, you know, here we go. But I've enjoyed our tour through the AICs. I'm sure we'll get into more next week. I do want to talk about the energy consumption angle because that is a fascinating aspect that I feel like is going to become a central part of the AI conversation, perhaps as soon as the next couple of years here. But for now, send us questions, send us takes, email at sharptech.fm.

Starting point is 01:03:50 And Ben, I look forward to getting into more next week. Sounds good. I'll talk to you then.
