Bankless - DeepSeek R1 & The Short Case For Nvidia Stock | Jeffrey Emanuel
Episode Date: January 28, 2025

China's new DeepSeek AI model, which reportedly matches GPT-4's performance at 1/45th the cost, has rattled the AI hardware market and contributed to a 20% dip in Nvidia's stock price. Investor-technologist Jeffrey Emanuel argues that DeepSeek's efficiency gains aren't the only story, as his viral 12,000-word article "The Short Case for Nvidia Stock" also catalyzed the market's panic. In this episode, we explore how these converging factors could unbundle Nvidia's once-unassailable lead and drastically reshape AI compute economics.

Jeff's Article: https://youtubetranscriptoptimizer.com/blog/05_the_short_case_for_nvda
Jeff on X: https://x.com/doodlestein

BANKLESS SPONSOR TOOLS:
🪙FRAX | SELF SUFFICIENT DeFi https://bankless.cc/Frax
🦄UNISWAP | BUG BOUNTY PROGRAM https://bankless.cc/Uniswap-Bug-Bounty
⚖️ARBITRUM | SCALING ETHEREUM https://bankless.cc/Arbitrum
🛞MANTLE | MODULAR LAYER 2 NETWORK https://bankless.cc/Mantle
🌐CELO | BUILD TOGETHER AND PROSPER https://bankless.cc/Celo
🎮RONIN | THE FUTURE OF WEB3 GAMING https://bankless.cc/Ronin

✨ Mint the episode on Zora ✨
https://zora.co/collect/base:0x4be6cd4d402fed49eb2de95fbc8e737e8ffd3e7f/23?referrer=0x077Fe9e96Aa9b20Bd36F1C6290f54F8717C5674E

TIMESTAMPS
00:00 Start
05:20 Intro To Jeff
07:27 The Issue With DeepSeek
16:00 Inference Compute Explained
24:51 Nvidia's Competition
29:14 DeepSeek's Impact On Valuations
36:46 Criticism Around Jeff's Thesis
44:17 DeepSeek's 45x Upgrade
51:30 The Transformer Explained
01:03:15 Why Is Everyone Shocked?
01:12:07 Synthetic Data
01:21:30 Why Jeff's Article Went Viral

Not financial or tax advice. See our investment disclosures here: https://www.bankless.com/disclosures
Transcript
I basically was trying to help my organic search ranking of my little YouTube tool.
And then it's like in the process, I may have inadvertently contributed to $2 trillion
getting wiped off global equity markets.
Because, you know, the fact is all of the news headlines came out saying the stock
market crashed because of DeepSeek.
I'd like to point out that the DeepSeek V3 technical paper came out December 27th.
A month ago.
That's a month ago.
Even the newer model, the R1 model that does the chain of thought, that paper came out a week ago, and people were all over that.
So why suddenly on Monday did everything crash?
And I'd like to think it's because, I'm pretty sure it is, that I wrote this article in a way that sort of speaks to hedge fund managers so they can understand it.
And I published it, like, in the middle of the night on Friday,
and then it started taking off,
and then it got shared by Chamath,
who has, you know, whatever, 1.8 million followers, right?
And it's been viewed over 2 million times.
Naval Ravikant's account has 2.5 million.
And then, like, Gary Tan
and the Y Combinator account.
Between them, they have millions of followers.
And not only did they share it,
but they were, like, very effusive in their praise about,
this is really smart.
and that sent it soaring.
Everyone is talking about this new DeepSeek AI model from China
that is reportedly 45 times more cost-efficient than U.S.-based AI models
and charges 95% less money to use than ChatGPT.
As a result, Nvidia is down 20%, wiping out $600 billion in market value,
and both OpenAI and META's AI labs are scrambling to discover
how a relatively unheard-of Chinese AI lab was able to outperform their
very expensive models with a Chinese-grown model that cost just $6 million to train. The guest
on the show today is Jeffrey Emanuel, who actually thinks that this part of the story,
the DeepSeek AI model part, is over indexed on. And it's actually a confluence of other factors
that is contributing to the unbundling of Nvidia's market share. And it's not the release of
DeepSeek that triggered the 20% drawdown, but instead a 12,000-word article that he wrote on his
blog that quickly went from just a handful of readers to over two million readers
over the weekend, which coincided with the 20% drop in Nvidia's price when the market opened
on Monday. In this episode, Jeffrey and I go through his article and the reasoning behind why
Nvidia is under threat of getting unbundled by other chip suppliers, in addition to DeepSeek's
impact upon the entire resource supply chain of training and inference around LLMs. Let's go
ahead and get right into this episode with Jeffrey. But first, a moment to talk about some of these
fantastic sponsors that make this show possible. Are you ready to swap smarter? Uniswap apps are
simple, secure, and seamless tools that crypto users trust.
The Uniswap protocol has processed more than $2.5 trillion in all-time swap volume,
proving it's the go-to liquidity hub for swaps.
With support for a growing number of chains, including Ethereum Mainnet, Base, Arbitrum, Polygon, zkSync,
Uniswap apps are built for a multi-chain world.
Uniswap syncs your transactions across its web interface, mobile apps, and Chrome browser extensions,
so you're never tied to one device.
And with self-custody for your funds and MEV protection, Uniswap keeps your cryptocurrency
secure while you swap, anywhere, anytime.
Connect your wallet and swap smarter today with the Uniswap web app or download the Uniswap
wallet, available now on iOS, Android, and Chrome.
Uniswap, the simple secure way to swap in a multi-chain world.
With over $1.5 billion in TVL, the mETH Protocol is home to mETH, the fourth largest
ETH liquid staking token, offering one of the highest APRs among the top 10 LSTs.
And now, cmETH takes things even further.
This restaked version captures multiple yields across Karak, EigenLayer, Symbiotic, and many more,
making cmETH the most efficient and most composable LRT solution on the market.
Metamorphosis Season One dropped $7.7 million in COOK rewards to mETH holders.
Season Two is currently ongoing, allowing users to earn staking, restaking, and AVS yields,
plus rewards in COOK, mETH Protocol's governance token, and more.
Don't miss out on the opportunity to stake, restake, and shape the future of mETH Protocol with COOK.
Participate today at meth.mantle.xyz.
What if the future of Web3 gaming wasn't just a fantasy, but something you could explore today?
The blockchain already trusted by millions of players and creators is opening its doors to a new era of innovation starting February 12th.
For players and investors, Ronin is home to a thriving ecosystem of games, NFTs, and live projects like Axie and Pixels.
With its permissionless expansion, the platform is about to unleash new opportunities in gaming, DeFi, AI agents, and more.
Sign up for the Ronin wallet now to join 17 million others exploring the ecosystem.
And for developers, Ronin is your platform to build, grow, and scale.
With fast transactions, low fees, and proven infrastructure, it's optimized for creativity at scale.
Start building on the testnet today and prepare to launch your ideas, whether it's games,
meme coins, or an entirely new Web3 experience.
Ronin's millions of active users and wallets means tapping into a thriving ecosystem of 3 million monthly active addresses, ready to explore your creations.
Sign up for the Ronin wallet at wallet.roninchain.com and explore the possibilities. Whether you're a player, investor, or builder, the future of Web3 starts on Ronin.
Bankless Nation, very excited to introduce Jeffrey Emanuel,
both an investor and a technologist. He, however, is a very specific flavor of both of those things.
On the tech side, he is deeply informed about the research advances that come out of major AI
labs like OpenAI, Meta, Google. And on the investing side, he plays in the markets as a value
investor, one who dares to go short at times. Jeffrey released an article on his blog called
The Short Case for Nvidia Stock, which has been echoing across the tech industry as this new
DeepSeek model has fired a shot all the way from China,
across the bow of the U.S. AI industry, and has left the U.S.-based AI companies scrambling,
reeling both TradFi and crypto markets, as everyone learns to digest DeepSeek's impact upon the world.
Jeffrey, welcome to Bankless.
Thanks for having me.
Jeffrey, I really enjoyed your article.
I want to kind of start with the punchline.
I want to read one of the last paragraphs in your article that I really felt summed up everyone's analysis of how the new DeepSeek model has impacted the market.
So this is actually the second to last paragraph in your article.
You wrote: perhaps the most devastating to Nvidia's moat is DeepSeek's recent efficiency
breakthrough, achieving comparable model performance at approximately 1/45th the compute cost.
This suggests the entire industry has been massively over-provisioning compute resources.
Combined with the emergence of more efficient inference architectures through chain of thought
models, the aggregate demand for compute could be significantly lower than current projections
assume. The economics here are compelling. When DeepSeek can match GPT-4-level performance while
charging 95% less for API calls, it suggests either Nvidia's customers are burning cash unnecessarily or
margins must come down dramatically. To me, Jeffrey, that was the punchline for, I think,
what everyone felt in the market on Monday when Nvidia stock fell 17%. To me, I'm summing this up as
there is a tug-of-war between hardware and software. And with the emergence of DeepSeek,
the software side of this tug-of-war got a very large W.
That's my interpretation.
That's my analysis.
Check me on that.
How do you feel about that kind of conclusion?
You know, it's funny because DeepSeek is the part that everybody's the most focused on.
But I actually think the whole short thesis still works pretty well without that, for all the other reasons that we can discuss.
And the one issue with the DeepSeek thing is that, it's funny, there's this thing, Jevons
paradox, which is like nobody was talking about this until suddenly now everybody's saying
Jevons every other word.
And it's something that comes from energy economics, which is like you think you make things
more energy efficient, great, we're going to use less energy.
But then what ends up happening is that the price of energy goes down and everybody wants
to use more energy.
And so it actually increases demand for energy.
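The Jevons dynamic Jeff describes can be sketched with a toy constant-elasticity demand model. Every number here is hypothetical, chosen only to show how a big efficiency gain can raise total consumption when demand is elastic enough:

```python
# Toy sketch of Jevons paradox (all numbers hypothetical).
# An efficiency gain cuts the cost of a unit of useful work; if demand
# is elastic enough, total resource consumption rises instead of falling.

def total_consumption(cost_per_unit, elasticity, base_demand=100.0, base_cost=1.0):
    """Constant-elasticity demand: demand scales as (cost/base_cost)^-elasticity.
    Returns total resources consumed (demand x resource cost per unit)."""
    demand = base_demand * (cost_per_unit / base_cost) ** (-elasticity)
    return demand * cost_per_unit

before = total_consumption(cost_per_unit=1.0, elasticity=1.5)
# A 45x efficiency gain makes each unit of useful work 45x cheaper.
after = total_consumption(cost_per_unit=1.0 / 45, elasticity=1.5)

print(before, after)  # with elasticity > 1, consumption goes UP after the gain
```

With an elasticity below 1, the same code shows total consumption falling, which is why, as Jeff goes on to say, the Jevons outcome is not automatic.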
And so everyone's saying now that, oh, this DeepSeek thing is
wrong because of Jevons. And, you know, I am sympathetic to that, to a degree, but it's not always
so clear. And it's not like the Jevons stuff happens immediately. Like, there's often, you know,
sort of what causes booms and busts as these sort of temporary dislocations between anticipated
demand and realized demand. And really, you know, what I think people miss is that
the big decisions about CAPEX come down to a couple people like, you know, Mark Zuckerberg,
and a lot of it is sort of gut feel, like Masayoshi Son, like, is this a good time to just push on the accelerator?
And I think someone like Zuck has to take a step back and say, listen, I know my guys are really smart,
but maybe, you know, the answer is not necessarily to spend another, you know, $3 billion on Nvidia chips that are very
expensive, you know, where, I mean, literally, like, they're paying 40 grand for a GPU that's
costing Nvidia maybe, what, $3,500 to make. So it's, they're putting a lot of money in,
uh, in Vidia's pocket. And maybe they can, you know, pump the brakes just a little bit and then
see if they can sort of still, you know, because they projected that they needed a certain amount
of chips for their forecasted demand. So if they can, you know, and the DeepSeek stuff is all
public. So they can look at the technical report. They can start making these changes themselves
internally, theoretically, you know, at least for the next generation of models, their training.
And as a result, maybe they can kind of, you know, because it's, I think there is still some
skepticism on, you know, Wall Street that, like, are they going to see a return on this money?
Because it's not like anyone's paying to use all this, you know, meta-AI stuff yet.
And so I think it's a little,
I'm not convinced that the Jevons, oh, yeah, well, Jevons.
It's like, okay, well, let's see if that's actually the case.
But then, you know, really separately from that, like I was saying,
even if you remove DeepSeek entirely,
I believe that Nvidia in particular, and I want to clarify,
I'm about as bullish on AI, like 99th percentile, as anyone
you will ever meet. I live in the AI future all day every day. I have three Claude accounts. I'm,
like, you know, using this stuff nonstop all day every day. So I'm a huge believer.
Like, but Nvidia as a company, they, you know, this just goes back to my sort of training
in investing, is that you see this over and over again. With the one exception of a regulatory,
like, enforced monopoly, um,
you do not have companies that just get to print infinite profits, with, you know, triple-digit revenue growth and 90% gross margins,
without having everyone and their brother trying to figure out a way to beat them.
And that's what's happening.
And so you look at, you know, these companies, Cerebras and Groq with a Q, like, these companies already have extremely compelling hardware that, you know, largely does
get around the Nvidia moat, at least for inference, and, you know, in the case of Cerebras,
for, I think, for training too. And, you know, there's all these other threats, and I mean,
the other thing is, like, you know, normal companies of the scale of Nvidia tend to have
extremely diversified revenue sources, whereas Nvidia, all the high-margin data center
revenue is coming from, like, you know, five hyperscalers or something. Like, it's very much a
power law distribution. And I just, it's funny because when I started writing the article,
which I started writing because, you know, my friend who's a hedge fund guy asked me about it on
Friday. And I just started writing about it. After, as I was explaining it to him, I realized, like,
I should just write this up. And it's funny because it started out as, you know, if I was
forced to make the shortcase for Nvidia, here's what it would be. And by the time I had finished,
I was like, shit, this actually is a short. Because I wasn't, I wasn't,
Like, I knew there was a lot of custom silicon in the works,
but it was kind of eye-opening to me that every single hyperscaler customer
is literally making their own custom silicon, in some cases,
for both training and inference.
So it's like Amazon, Microsoft, OpenAI, Meta.
It's like, they're all doing this, and it's like, you don't, like,
and as soon as they get this stuff to work,
the other thing that's so important to remember is it doesn't necessarily have to be better
than Nvidia's stuff, right?
Right?
Because Nvidia is charging 10x what it costs them.
So if you can make it yourself for, you know, 1X what it costs,
then you can cut the price by 50% to your end customers.
You'll still make a huge margin.
And what matters to you as a hyperscaler is, you know,
how many requests you can handle to your APIs and stuff per dollar.
You know, you don't care if you need more chips.
Like, it's fine.
as long as you don't have to pay these inflated prices for them.
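Jeff's hyperscaler math can be put in back-of-the-envelope terms. The roughly $40,000 price and $3,500 build cost are the figures quoted in the conversation; the in-house chip numbers are purely hypothetical:

```python
# Back-of-the-envelope sketch of the custom-silicon argument.
# Nvidia figures are the rough ones quoted in the episode; the in-house
# chip is a hypothetical that is only half as capable per chip.

nvidia_price = 40_000          # what a hyperscaler pays per H100
nvidia_build_cost = 3_500      # rough cost for Nvidia to make one

# Assume your in-house chip costs about the same to make but is only
# half as good, so you need two of them per H100-equivalent.
inhouse_cost_per_h100_equiv = 2 * 3_500

savings = nvidia_price - inhouse_cost_per_h100_equiv
print(savings)  # 33000 saved per H100-equivalent of capacity
```

Which is the point: a chip that is far worse per unit can still win, because the hyperscaler cares about requests served per dollar, not per chip.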
And so I think all these – look, I mean, there are other parts of the thesis we can talk about.
But I actually think all of that stuff should be just as much of a focal point as the Deepseek news.
Yeah, maybe to go back and trace over your article, I see your article in two parts.
It's the moat of Nvidia and how it's being unbundled at the margins by the various set of companies, some of which you just mentioned.
And some of these modes are the fast GPU interconnect.
Nvidia has had this amazing ability to make their GPUs talk to each other with extreme bandwidth, as if they are one big unit, like one big GPU.
And that is getting unbundled by another company that is just making very large GPUs that reduce the need.
Well, not GPU.
They're making custom silicon, like, it's not really a GPU.
It's like this weird mega chip.
Like, I mean, it's funny because, you know, the
H100 is considered, like, an absolute unit when it comes to chip size, because it's, like, this massive freaking package.
But then the Cerebras thing is, like, they literally took an entire 300-millimeter wafer and just made the entire thing one enormous chip.
I mean, these chips are extremely expensive to make.
And but yeah, you don't need to worry about wiring things together if they're all on the same wafer, right?
And I actually just want to point out that even Nvidia
didn't make that technology.
They bought Mellanox,
this Israeli company,
that doubled their size.
I think they had 10,000 employees
by the time they bought Mellanox for $7 billion,
and that brought in, like, another,
you know,
about the same.
So it was a big,
really smart thing.
I mean,
if they hadn't bought that company,
like, they would not be in the dominant position
they are today with data center stuff.
But yeah, I mean,
everyone has been sort of relying on,
oh, yeah, but what about Interconnect?
Even if, like,
even if AMD could get their act together
and come out with a decent driver and come up with some alternative to CUDA,
they don't have the interconnect, so you can't use it for this.
And you hear that argument a lot.
And I think, well, you know, you're starting to see on the training side,
this company Cerebras with the wafer-scale chip.
But then also, you know, the other big news that started before DeepSeek
was, you know, the o1 model from OpenAI,
and that sort of unlocked this other new scaling law,
which is about inference time compute,
which is like it used to be almost all the processing power
was needed on the training side,
and then the inference was pretty fast.
But nowadays, with these models that do chain of thought,
you know, the more compute they use at the time you give them a request,
the better the answer they can give.
And so people are now saying, whoa,
so actually most of the compute might be on the inference side.
But the inference side is a very different, you know, compute problem. So, like, right now, they use the same GPUs for training and inference.
Okay.
Can we just quickly define training and inference?
Yeah.
Training is, like, actually making the model.
Like, you have, like, a zillion, you know, gigs worth of data of, like, text from the Internet, Wikipedia, blah, blah, blah, broken up into these tokens.
DeepSeek used 15 trillion of them.
And then you take thousands of GPUs and you basically learn how to condense all that data down to 99% less space in the weights.
And then in the process, the model basically learns this, like, coherent model of the world and how to understand things.
Because the only way to compress stuff that much without losing all the information is to understand it.
Whereas inference is you already have a trained model.
and now I want to ask it, you know, to write me an essay or do a logic problem for me.
And so the inference is a very different problem.
Like you don't need to have insane, you don't need thousands of GPUs to do it because you've already got the trained model.
You just need a couple GPUs maybe and you can get the answers.
And so.
So just to really trace that over one more time.
Like, training is, like, ChatGPT, OpenAI, creating their products, creating their models. Then, you know, I go on to ChatGPT,
and when I type in a query, I am doing inference.
And so there's really a weighting here of, just, like, a ton of compute up front to make the model once.
And then hopefully, like, a little amount of compute to run inference on it, which is just the daily requests.
And, like, in theory, there's, like, a tradeoff here between how much compute you do initially to train the model.
And hopefully that just makes all future inferences as efficient as possible.
But there's still compute on both sides.
It just makes the model smarter.
Yeah.
And that way you get better answers.
But what changed recently, it used to be that basically all the inference used a sort of moderate, you know,
fixed compute budget.
But now it's, like, open-ended.
Now, like, you know, o1 is, like, their flagship model from OpenAI.
If you pay $20 a month for ChatGPT Plus, you can use o1 for, you know, a certain number of requests per week.
If you pay 10 times as much, $200 a month, for ChatGPT Pro,
which I do and I recommend
anyone who uses this stuff a lot,
it's got o1 Pro.
It's the same model as regular
o1. The only difference is that
it takes much longer
to respond because
while it's doing inference, it's
using up far more of these intermediate
logic tokens, as it were, this
chain of thought, which is sort of like the
scratch pad of its internal
thinking process. And then
it gives you an answer, but the answer is
better. Like your code will work the very first time.
You won't have any kind of mistakes in your essay or whatever.
Can we go over this one more time?
So like it's the same, it's the same model.
Pro, the $200 a month version, and the $20 a month version are the same model.
But there's this extra step, there's an extra layer of things happening where the pro version
is running that same model over and over and over again in chunks.
And it is able to go back and trace over previous work to like check its work before it
actually gives you an output.
And you're saying that just because of this.
It's not an additional layer.
It's just like they just do it for longer.
It's like, it's because, basically, it's a dial.
You say, how much money do I want to spend generating tokens before I give the final answer?
And with pro, it's like, it would not be economical for them to use the amount of tokens that they use for pro for the plus.
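The "dial" can be written out as simple arithmetic. The token counts and per-token price below are entirely made up, just to show why a Pro-style request costs so much more to serve than a Plus-style one on the same model:

```python
# Hypothetical sketch of inference-time compute as a dial.
# Same model either way; Pro-style requests just spend far more
# intermediate "chain of thought" tokens before answering.

def serving_cost(thinking_tokens, answer_tokens, price_per_1k=0.06):
    """Cost to serve one request, at a made-up price per 1,000 tokens."""
    return (thinking_tokens + answer_tokens) / 1000 * price_per_1k

plus_cost = serving_cost(thinking_tokens=2_000, answer_tokens=500)
pro_cost = serving_cost(thinking_tokens=50_000, answer_tokens=500)

print(round(pro_cost / plus_cost, 1))  # ~20x more compute per request
```

The answer tokens are the same in both cases; the entire cost difference is the scratch-pad thinking the dial allows.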
In fact, Sam Altman said that, you know, it's funny, because everyone on Hacker News, in the industry, you know, all these developers were like, $200 a month? Get real.
How could that make sense?
and Sam Altman came out later and said,
believe it or not, we're actually losing money
charging $200 a month because
people are using it and it just uses
like insane amounts of compute.
And so it really flips
the equation in terms of
how much compute is being used for inference
versus training. And then
this is really relevant because,
like I said, with the
Nvidia GPUs, you buy an
H100, a data center
GPU, for 40 grand from
Nvidia. You're going to use the same
GPU to train the model and do inference on it. But this company, Groq with a Q,
everyone gets confused because of Grok with a K. Not the Twitter Grok. Not the Twitter one.
Right, exactly. But Groq with a Q should be better known, because this company is really,
I mean, they've got unbelievable technology. They basically said, we're not going to try
to solve training at all. We only care about inference. And so if you want to optimize the entire
stack for inference only, how might you approach that?
And the result of that is that they can do inference on, you know, like, a standard model,
like the Llama 3.3 70 billion, which, until DeepSeek came out, was the sort of
leading-edge open-source model.
And, you know, if you get a fancy desktop computer with one, let's say, Nvidia 4090 GPU,
which you can get for under $1,000 now, you could get, I don't
know, maybe 40 tokens per second, which is actually, like, good enough that you could use that as
your sort of home version of ChatGPT that works pretty well. When you try it on Groq, and anyone
can try this for free, you just sign up with your Google account, and you can do inference from
this model, and it's, like, insane. It's like, instead of, like, 40 or 50 tokens per second, it's, like,
1,500 per second. And so you click your thing, and you just, boom, there's your answer. And it's
like, whoa, that's pretty interesting. And so even though that Groq hardware costs, like, millions
for one server, if you have enough demand that you can just keep it busy all the time,
it's actually much cheaper to use.
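Taking the rough numbers from the conversation at face value (about 40 tokens per second on a single 4090 versus about 1,500 on Groq hardware) and a hypothetical server price, the utilization math looks something like:

```python
# Throughput and utilization sketch using the episode's rough figures.
# The Groq server price is hypothetical; the point is that expensive
# specialized hardware only wins per token if demand keeps it busy.

tokens_per_sec_4090 = 40
tokens_per_sec_groq = 1_500
print(tokens_per_sec_groq / tokens_per_sec_4090)  # 37.5x faster

groq_server_cost = 2_000_000                  # hypothetical
seconds_per_year = 365 * 24 * 3600
tokens_per_year = tokens_per_sec_groq * seconds_per_year  # at full utilization

# Hardware cost amortized over one year of tokens, per million tokens.
print(round(groq_server_cost / (tokens_per_year / 1e6), 2))
```

Halve the utilization and the per-token cost doubles, which is why this hardware only makes sense for an operator with enough aggregate demand to keep it saturated.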
And most importantly, you're not giving your money to Nvidia.
You're giving it to Groq, you know. So it's just an example of how people manage to, you know,
like, if you're trying to assault, like, a castle that has a big moat, instead of trying to cross
the moat and get, you know, shot up by arrows, why don't you, like, dig a hole under the moat,
or do, you know, the catapult to go over it?
You find creative ways to get around it.
And that's what's happening is everybody's been focused on, well, a frontal assault's not going to work.
And it's like, okay, but there are other ways to seize your castle.
And that's what you're seeing is all the ingenuity of the market, because the reason is the prize is so big that, you know, you too can make your company worth a trillion dollars if you can take a big piece of this pie.
Whereas that was not true in 2016.
It was like a backwater, you know.
And so it's just, the wheels take a lot of time to turn.
Like if you want to make your own custom, even if you're Amazon with infinite money to spend,
if you want to make your own chips, you know, what do you know about making silicon?
First you have to, you know, poach or hire the really brilliant people.
And then it's going to take them probably two or three years to design a really good chip.
And then you're going to have to, you know, come with giant sacks of cash to TSMC
and try to convince them to give you, like, volume
at their fabs, because they're, like, already just being, you know, inundated with money from
Nvidia and Apple and stuff. And it takes a while to get ramped up. But eventually the chips
start coming out. And, you know, the irony of it is, like, again, even though, you know,
none of these custom silicon chips are going to be as good as the Nvidia chips, the sort of
way they're made is pretty similar, in that they're both going to be using TSMC as the fab, and
they're both using the same machines from this Dutch company, ASML, that actually, like, does the
lithography. So it's like, yeah, they won't have the same brilliant design, maybe. But again,
they don't need, that's the thing people miss. It doesn't need to be as good. It could be
one-fifth as good, and it still makes sense for Amazon to use it, because they don't
have to pay a 90% gross margin to Nvidia. Because Nvidia has the luxury of having very
high margins. And what that creates is, like, well, if your product is 90% as good, but you
only take 10% of the margins,
then all of a sudden you're solving,
like, a lot of the market's problems.
And I'm saying when your margins are so high,
like just to put things into perspective,
companies that sell chips,
like in the semiconductor industry,
it's like generally not such a great industry.
It's very subject to boom and bust cycles
of like overcapacity and, you know.
And so if you look at another area like memory,
DRAM, which, you know,
everyone has it in their phones and their computers,
you know, you might think on the surface that this should be like this great business
because there's only basically three companies in the world that do it.
It's like Micron, Samsung, and SK Hynix.
I mean, there used to be like 15 memory companies,
but they all like either went bust or merged.
And so you would think it would be this oligopolistic thing with great pricing and margins.
But if you look at the history of it over the last 10, 15 years,
it's like it's very cyclical.
And at the very peak, when the supply-demand mismatch is really out of whack
and they can charge really high prices,
they make like a 60% gross margin.
And then,
but if you take the average over the cycle,
it's closer to like 20%.
And at the bottom of the cycle,
gross margins actually turn negative.
Negative, right, right, right.
And so then you look at Nvidia and you're like,
you have a 90-plus percent gross margin on data center stuff.
Their overall gross margin is more like 75%,
because they make much lower margins
on the consumer stuff,
like for playing video games
and that's because they have competition
from AMD.
You know, like that's what happens
in a competitive market.
And so, but my point is that
when your margins are that high,
it doesn't need to be 90% as good.
It could be literally like 40% as good
and it still is a no-brainer
for Amazon to switch as many workloads
over to their own thing,
because it's like, you know,
it's like when you buy, like, a handbag
from Hermès for, you know,
40 grand.
How much do you think it costs them to make?
Even though it's made by hand by some French guy,
like, you know, it's probably only, like,
two or three thousand bucks tops,
and then they're charging you $40,000 in the end.
And it's like very similar margins for the GPUs from Nvidia.
And so it's like you don't,
what matters is like,
and the users don't care.
They're submitting requests.
They want to use a model, Llama 3.3 70B,
but they don't care if an Nvidia card is doing
the inference on it. And so Amazon, you know, Amazon made their own CPUs called
Graviton. And they are very aggressive with the pricing of that to try to switch people over
from, if you normally use like an Intel or AMD CPU, try using one of our things and you'll
save a lot of money. And you're going to see that work. They're going to try to push people
over to their product by making it more, you know, they're going to basically split the savings,
you know, with the customers.
And I think that's a lot...
And so all that stuff, you know,
it's like death by a million cuts,
like the combination of the competition
from these different areas.
And then, of course, it's like AMD
does compete with them effectively
in consumer stuff,
but they've been completely absent
in this whole data center AI stuff,
which is, you know, it's this crazy,
like, I mean, they're going to be writing
like business school case studies
about how they squandered a trillion-dollar opportunity.
You can't get too mad at them
because they also, like, managed to kill Intel.
So it's like...
At the same time.
It's not like they're not good, too.
And it's so funny because Lisa Su,
the CEO of AMD, is, like, first cousins with Jensen Huang from Nvidia.
I did not know that.
Yeah, which is just like, how good are these genes in this family?
But so, yeah, I mean, but if they can get their act together,
and it's so funny because it's like, they're so out of it.
Like, I just don't understand it.
But, like, they were literally, like, people like George Hops,
the guy who's famous for jailbreaking in the iPhone.
and all the stuff. He's like literally by himself without any help from them writing his own stack
of that's like, you know, we'll be able to make these GPUs usable for doing at least, you know,
some training and inference. And so you might see even A&D coming up as a real competitor.
And yeah. Yeah. So, yeah, going back to tracing over like the broad strokes of your article,
I kind of break it out into two parts, two halves. There is the unbundling
of Nvidia's moat on the hardware side of things via hardware competitors, as you've kind of just
traced over. But then also the DeepSeek side of things is a rebalancing of the value of software
and algorithm design, maybe, is one way to put it. Maybe you can take us to the second half of that
equation: how did DeepSeek really impact people's understanding of the value of software
and its impact on the value of hardware? Well, so, you know, when you say, like, what is the
software side of the thesis,
It's not, it actually has very little to do with Deepseek.
What it has to do with is one of the sort of biggest source of
Nvidia's mode has been, because you know,
AMD has quite reasonably good, you know, chips.
So the reason is that Nvidia basically was very forward-thinking. When they noticed that this deep learning stuff was really taking off, back in 2012, they really figured out: we need to make it easy to use our chips for this sort of thing.
And so they have this system called CUDA. Because, you have to understand, these GPUs are insanely complicated.
I mean, in the old days,
you'd have one CPU with one core.
Now, CPUs are pretty complicated.
Like, the CPU in my computer has 32 cores,
but these NVIDIA GPUs are like,
they have like thousands of cores.
Right, that's their whole deal.
They have lots of cores.
And so it's like, if you were to try to write code naively to take your problem and break it up and send it to thousands of cores and reassemble it... like, no one can do that, you know, basically.
And so instead you describe your problem using these much more abstract, high-level concepts,
and then CUDA turns that into hyper-optimized code that runs really, really well on NVIDIA GPUs, but not anywhere else.
And CUDA is an NVIDIA-built software package to allow developers to use NVIDIA GPUs to their best degree possible.
Yeah, without being, like, you know, Einstein. Like, they can be very smart, but...
I mean, it's kind of like a driver. Is it a...
No, it's more like a framework. Yeah, the driver is a sort of separate layer.
But it allows the power of Nvidia GPUs to be expressed to more people without them having to be...
Yeah, it's like the difference between writing code in Python versus writing code in, like, assembler, which is the lowest level, you know. And actually, with CUDA, even... most people
actually don't even write CUDA directly. Most people use machine learning frameworks like, you know,
used to be TensorFlow, but it's been sort of totally replaced by something called PyTorch, which is sponsored by Meta. And so that's what most researchers use, PyTorch, which lets them think in terms of the math and, you know, as a researcher, say: oh, I have this loss function, I have this optimizer, and everything's, like, modular and plug-and-play. And then you write high-level Python code, which is very, very high-level. And then internally, PyTorch can run that on CUDA and then run it on a GPU from Nvidia very, very efficiently. But if you have an AMD GPU, it's not as easy to have your stuff run really, really fast using, like, PyTorch and stuff. And so a lot of people were saying that it doesn't matter what anyone else does in terms of chips.
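As an aside, that "loss function plus optimizer, everything modular and plug-and-play" structure he's describing can be sketched in a few lines. This is plain Python standing in for PyTorch, with a made-up one-parameter "model" and a toy optimizer, not the real torch API:

```python
# Toy illustration of the "modular, plug-and-play" structure PyTorch gives
# researchers: the loss function and the optimizer are interchangeable
# pieces. Plain Python stand-ins, not the real PyTorch API.

def mse_loss(pred, target):
    """Mean squared error and its gradient with respect to pred."""
    return (pred - target) ** 2, 2 * (pred - target)

class SGD:
    """Bare-bones gradient descent 'optimizer'."""
    def __init__(self, lr):
        self.lr = lr
    def step(self, param, grad):
        return param - self.lr * grad

def train(w, x, target, loss_fn, optimizer, steps):
    for _ in range(steps):
        pred = w * x                      # tiny one-parameter "model"
        loss, dloss_dpred = loss_fn(pred, target)
        grad_w = dloss_dpred * x          # chain rule back to the parameter
        w = optimizer.step(w, grad_w)
    return w

# Learn w so that w * 2.0 is approximately 6.0, i.e. w converges to 3.0.
# Swapping in a different loss_fn or optimizer needs no other changes.
w = train(w=0.0, x=2.0, target=6.0, loss_fn=mse_loss,
          optimizer=SGD(lr=0.05), steps=200)
print(round(w, 3))  # → 3.0
```

The point of the real framework is the same shape at scale: the researcher picks the loss and optimizer at this high level, and the framework handles lowering it to fast GPU code.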
If they don't have CUDA, you know, it's game over. And there are, I think, two big assaults on that. One is that you're seeing the rise of these even more high-level frameworks for expressing highly parallelized programming. And so you have this one, MLX, is one.
There's another one called Triton, and these are gaining, you know, momentum.
And with that, it's like, CUDA is just one target. You can write your stuff in MLX and then basically run it on an Nvidia GPU really, really fast. But you could also make another, you know, compilation target of MLX that could run on a completely different chip, like the one, you know, Amazon is making internally, the Trainium chip. And it's also a very high-level language. So maybe, instead of writing and, you know, targeting CUDA, maybe you should target MLX or Triton. And then you can still run it using CUDA, but you could also run it using these other things. And then you're not locked into using the really expensive Nvidia chips.
So that's one assault.
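The "one high-level description, many compilation targets" idea can be sketched like this. It's illustrative plain Python, not the actual MLX or Triton APIs, and the backend names are placeholders:

```python
# Toy sketch of the "one high-level description, many compilation targets"
# idea behind frameworks like MLX or Triton. Illustrative plain Python,
# not the real MLX/Triton APIs; backend names are placeholders.

def elementwise_add(n):
    """A high-level, hardware-neutral description: c[i] = a[i] + b[i]."""
    return {"op": "add", "size": n}

def compile_for(kernel, backend):
    """Pretend backend 'compiler': lowers the abstract op to a callable.
    A real framework would emit CUDA code, Trainium code, etc. here."""
    if backend in ("cuda", "trainium", "cpu"):
        return lambda a, b: [x + y for x, y in zip(a, b)]
    raise ValueError(f"no backend named {backend!r}")

kernel = elementwise_add(4)
a, b = [1, 2, 3, 4], [10, 20, 30, 40]

# Same abstract kernel, two different "targets" -- nothing in the
# high-level description locks you to one vendor's hardware.
run_cuda = compile_for(kernel, "cuda")
run_trn = compile_for(kernel, "trainium")
print(run_cuda(a, b))  # → [11, 22, 33, 44]
print(run_cuda(a, b) == run_trn(a, b))  # → True
```

That substitutability is exactly the threat to the lock-in: the program is written once against the framework, and the framework, not the programmer, decides which chip it runs on.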
And then the other one, I think, is this idea that,
and I haven't heard a lot of people talk about this. But one thing I'll tell you is, like, I use LLMs all the time for programming,
and they're just stunningly good at that now.
But what they're really, really good at is if you already have a working prototype
of code in Python or JavaScript or whatever,
so it can really understand what it is you're trying to do,
they're unbelievably good at porting that to another language.
So if you have this Python algorithm,
and you want to turn it into, like, Rust or Golang, they do that unbelievably well.
Like, maybe not on the first shot, but, you know, with a couple iterations, you can get it all working.
And so what that made me realize is that, you know, part of what the CUDA thing is about is that it's become a lingua franca.
Like everyone who's good at this kind of programming knows it.
And so they think in terms of CUDA concepts.
It's just the fastest way for them to express these algorithms.
And so I was thinking that, like, they could write their code in CUDA like they normally do. But then, instead of using it on an Nvidia GPU, they could use it almost as, what is called, a specification language, where it's just for documenting the algorithm in a very efficient, elegant way. And then they could feed that into an LLM and say: all right, now port this into this other framework, which will work really well with AMD GPUs or with, you know, Cerebras or something.
And I think you really explained this well in the article when you illustrated, there's, like, a job market for CUDA engineers.
Yeah.
And it's insular to the rest of like, you know, engineering jobs, engineers out there.
It's very special.
It's like there's its own independent, like, vertical of job market, and, like, the cost for these engineers.
And the way that you illustrated in the article is like, well, those walls break down.
And all of a sudden there's just, like, not really the same monopoly around CUDA.
No, no, it's not that. I think they'll still use CUDA. But the question is, like, can they use CUDA but then not use an NVIDIA GPU?
Right.
Which is where the moat comes from.
Right.
Nvidia gets at least part of its value from.
Yeah.
And now, you did bring up a point about, like, DeepSeek in a sense being software, because by writing smarter training software, they did reduce the demand. But I'd say that's sort of separate. That's kind of orthogonal, if you will, to this other stuff. Which, again, it's like, even if you took away the DeepSeek part of it, you can still see the big threats to the moat, software and hardware.
Now, let me just say, I just saw, right before we started talking, somebody said: here's why, you know, my thesis is all wrong. And they're saying that, well, the problem is that TSM, which is Taiwan Semiconductor, which builds all these chips, is basically the only one that can do it. I mean, not the only one, because, like, Samsung can also make pretty good chips, but, like, for the most part, yeah.
Yeah, they make all the Nvidia stuff and most of the Apple.
stuff. But by the way, I want to point out, again, it's like, yes, it would be best to do something in a four-nanometer process node, which is the smallest you can do. But, you know, you could use, like, a bigger, older process node, and your chips won't be as fast and they won't be as energy efficient. But you've got a lot of wiggle room, because you just don't need it to be as good. You just need it to be cheap. But, so anyway, the objection to my thesis is that these guys are booked solid. Even if you came to them with, like, you know, giant bags of money, they're booked solid. And the reason is because...
The manufacturer is booked solid. They're backed up. They have too many orders.
Yeah. For the next couple of years, they don't care how much money you give them, because they're all booked solid and they can't, you know, can't just instantly make a new...
Although, I will say, like, you know, Taiwan Semi built a fab in Arizona, and there was all this stuff about, oh, it's taking them so long and they can't hire good people. But you know what?
They finally did get it all up and running. And they could literally, if there was enough money in
there to do it, they could just copy paste the blueprints, get another big chunk of land
and like just replicate what they just did again. And they could do that. Like, and it wouldn't take
like, it wouldn't take that long. So in any case, that's the objection: that even if everything I said is true, these companies, Cerebras and Groq, and the hyperscalers like Amazon and, you know, Google and, blah, blah, blah, and Meta, won't even be able to make these chips in enough volume that it's going to dent Nvidia.
And my response to that is like,
okay, your analysis is essentially conceding that this is a highly transitory circumstance here. That they're just very temporarily going to have this advantage, and then, as soon as the additional capacity comes online or opens up, there's going to be this massive flood of alternative supply, which is going to pressure market share. Potentially, you know, even if the pie grows, the market share is going to go down. But most importantly, there's some stuff here that has nothing to do with technology. That's just basic, you know, economic, industrial-finance kind of thinking about how markets work. And the difference between
having basically a monopoly and having even one or two competitors is, like, the margins really can fall quickly. Because it's like, you know, if you have two office buildings that are, you know, 98% occupied, nobody's cutting rents. But if both of them start losing tenants, then, you know, every day that goes by that a floor is empty, they're just losing money. And so there's just a race to the bottom. And there's this critical threshold where, you know, once, let's say, the occupancy rate in a market for office space dips below, let's say, 80%, the rents... it's very nonlinear, you know. Like, if occupancy falls another 5%, rents are going to fall a hell of a lot more than 5% to make the market clear.
And I think you'll see that the margins can fall very, very quickly once there are real competitors.
And then the question is, okay, again, this is not about technology.
This is about how do you rationally value a stock?
And I mean, one of my favorite, I mentioned in my piece that, you know, I once won a prize
from this Value Investors Club website for a short idea.
This was like more than 10 years ago,
but I'll quickly tell you the story of it
because I think it's so relevant here,
which is that this was a company called PetroLogistics. PDH was the ticker. And they were a company that just had a single plant that took propane and turned it into propylene. And through this sort of random... you know, basically because the shale play happened and all this... I don't have to get into all the details. Suffice it to say,
they were earning an unbelievably high spread, much, much higher than historical, or, like, what they ever expected to earn when they built the plant.
They're earning so much that their profit in one year from running this plant was like 80% of the cost of building a new plant.
And it's not like rocket science to build one of these plants.
You can just go to a big construction company like Bechtel and say, I want a conversion plant for propane to propylene.
and they have off-the-shelf blueprints,
they'll make it for you,
guaranteed in a couple of years.
And sure enough,
this company was earning these high returns,
and people were putting a big multiple on the earnings,
because they're like, look at this,
the earnings have gone up so much.
But you could tell that all these other plants
were already under construction,
and you actually knew approximately when those plants would come online.
And so you could basically figure out,
all right, even if I grant you that they're going to continue earning these massive margins,
it's going to start stepping down in like a year, and then in 18 months it's really going to step
down. And in 24 months, it's going to be right back to the... So if I want to value this as, let's say,
what is the present value of the future cash flows discounted because of the time value of money,
I can do that. I can say big, big profits this year, a little bit less profits next year,
and then after that, normal profits. And add up the discounted cash flows, and you realize you can't put a big multiple on earnings that are not sustainable. And right now, if you tell me, oh, well, you're wrong because Nvidia's going to keep earning these huge profits for the next two or three years, it's like, dude, you're putting, like, a 30, 40x multiple on that. That's essentially implying that it's going to sustain at this rate, like, indefinitely. And that's just not how, you know, you should think about the value of a stock. And so this is why I want to just say, like, a lot of the Jevons paradox stuff, it's like, yeah, I am bullish on the aggregate. The amount of total demand
for inference is going to skyrocket. The pie is going to grow. That's a totally separate question from: will Nvidia be able to continue growing revenues at triple-digit percentages year over year at these
insanely high margins. That's a completely separate bit. And you need to answer that question
if you want to feel comfortable putting such a high multiple on that earning stream. You have to
know that it's going to sustain, and it seems actually quite likely that it won't sustain.
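The valuation arithmetic he's describing can be sketched with made-up numbers: discount a profit stream that steps down as competing capacity arrives, and compare it to slapping a 30x multiple on peak earnings. All figures below are invented for illustration, not estimates for Nvidia or PetroLogistics:

```python
# Hedged sketch of the valuation logic described above: discount a
# fading stream of profits and compare it to a naive "big multiple on
# peak earnings" price. All numbers are made up for illustration.

def present_value(cash_flows, discount_rate):
    """Sum of future cash flows, discounted for the time value of money."""
    return sum(cf / (1 + discount_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))

# Peak profits now, stepping down as competing capacity comes online,
# then a long tail of "normal" profits (ten years at 20/yr, say).
fading = [100, 70, 40] + [20] * 10
pv_fading = present_value(fading, discount_rate=0.10)

# What a 30x multiple on this year's peak earnings would imply instead.
naive_price = 30 * 100

print(round(pv_fading))  # fair value if margins mean-revert (~271 here)
print(naive_price)       # price if you assume peak earnings sustain (3000)
```

Even granting a few more fat years, the discounted value of a mean-reverting stream comes out an order of magnitude below the price implied by a big multiple on peak earnings, which is the whole point of the PDH story.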
I do want to dive headfirst into the DeepSeek efficiency gains part of this conversation, because I think that's kind of where we should go next. One thing that you wrote in your article, you said: the sum total of all of these innovations, these are innovations referring to the lab that made DeepSeek, when layered together, has led to the 45x efficiency improvement numbers that have been tossed around online, and I am perfectly willing to believe that these are in the right ballpark. Maybe you can just, like, explain the significance of this new ChatGPT-like model, DeepSeek, and how it got to be 45x more efficient, and what that 45x efficiency means for the industries that are the supply chain behind the usage of these models.
Sure. So look, I mean, it's funny. Like, in the West, we have this sort of resource curse, you know, almost, of, like, we have too much money. It's easier to just throw money at a problem than to try to be really clever. And so, you know, the joke, or the sort of parallel, I make is, like, when you look at people's houses in Saudi Arabia, they're not very energy efficient. And that's because they get subsidized power, because they have unlimited energy there. And so there's no, yeah, there's no point in wasting all this extra construction cost on double-paned glass and blah, blah, blah.
And it's a similar thing at, like, Meta and Google. They just have so much operating cash flow hitting, you know, every quarter.
They're like, fuck it.
Let's just, let's just hire more.
Money's not an object.
Yeah, yeah.
Let's pay our people $5 million a year, or whatever, a million a year, and let's just, you know, send over to Jensen another $3 billion. Whereas, you know, in China, they're not getting paid that much, that's for sure.
And, you know, they do have these export controls. Now, I know a lot of people say, oh, they're smuggling them in through Singapore. I'm sure that's happening. But like...
Smuggling chips.
Yeah. Because, first of all, under Biden, they made it so there's basically a slightly crippled version of the Nvidia GPU just for the China market, or export market, that's not as good as the H100.
But then also,
I mean, what people point to, which I think makes a lot of sense, is that something between 15 and 20 percent of Nvidia's revenue comes from the tiny nation-state of Singapore.
It's like, really? They're using that many GPUs there?
And it's like, because everyone knows that they're somehow
getting laundered and smuggled into
China. And so the question is, we don't even know how many
Nvidia GPUs are in China. And so
we don't really know how many DeepSeek used. But the point
is they don't have as many as we do. And it's not as easy for them to get
them. And so they have to...
Maybe the punchline you're making is, like, that Tony Stark Iron Man meme, of Tony Stark was able to build this in a cave. And that's China. They don't have an abundance of capital. They don't have an abundance of chips.
They have some chips. But they do have plenty of capital. They don't have the ability to...
And by the way, they're quickly... that's a whole other story, but, like, they poached some of the smartest guys at TSMC to make their national champion, SMIC, or whatever it's called. And they're obviously not there yet, but, like, they made a pretty good Huawei CPU.
And I wouldn't be surprised if... I mean, that's the other giant wild card that nobody's really taking into account. Like, don't count them out. They got some of the smartest people from Taiwan Semi over there. And it's like, they'll buy the machines from ASML too, and, you know.
Right.
So, but anyway,
what I wanted to say is that, you know, with their engineers, A, necessity is the mother of invention. But also, you know, in the West, we tend to have this sort of bifurcation in the market, where you're either in the, like, research track, in which case you have a PhD and you've written these papers and you're the guy who does stuff on the whiteboard or whatever. And often these people are not very good engineers. Like, there's a joke that these researchers are actually horrible at programming. They're good at math, horrible at writing optimized code. It's obviously not universally true. There are some people who are great at both. But so what happens
is usually the researchers think at this high level,
and then they make like a prototype,
and then they hand it over to these people who are more engineers,
like high-performance optimization guys,
people like John Carmack or Jeff Dean at Google,
who, you know, they're not going to invent the new optimizer
or like, you know, some new loss function for AI models.
But if you give them an algorithm,
they know how to make it run really fast,
you know, on a computer.
And so it's sort of like they,
the way we do it is this sort of two-step process in the West
where the researchers design the thing and prototype it, then hand it off to the engineering department that says: all right, we have this algorithm, how can we make it go fast? The DeepSeek guys are unbelievably good at both.
So it's like, instead of having it be two teams working one after the other, it's like they kind of inverted
it and they started out with, let's start out first with how can we saturate every ounce of
performance on these GPUs so that nothing is wasted. Because it's like it almost doesn't matter
how fast the GPU can calculate if it's waiting to get data that it needs to do the calculation,
then it's just sitting there idle. Okay. And there's a lot of this interconnect, right? There's a lot of
talking to each other. And so normally you have to dedicate a big chunk of your processing power just
to handling that communication overhead.
So they did a lot of really clever work with making the communication stuff as efficient
as possible, so there's very little overhead.
So they basically started with, rather than say, how do I make this algorithm go fast?
They said, how can I make a really, really fast algorithm that'll really run these GPUs as hard as I can,
and then sort of design a smart training system based on that.
So they sort of inverted things.
And so there's just this collection of sort of optimization tricks.
And by the way, I want to point out, like, many of these ideas were not invented by them. Many of them were actually published by American and other researchers, like Noam Shazeer, who just got rehired by Google for a zillion dollars. They bought his startup just to get him, because he's that smart. But it's implementing them in a clever way.
And so I'll just give you a couple examples of this. So, you know, this whole ChatGPT thing really exploded because there was this model design called the Transformer, which came out in 2017. This is probably the most cited paper in history now. It's called "Attention Is All You Need." And it combined the sort of regular neural nets that we'd been using for a while with something called the attention mechanism, which is this very clever way of kind of contextualizing the information, so that instead of always processing it the same way, it depends on its context. And you sort of automatically learn how to think about that context.
and storing all that data while you're training
is like one of the major things that use up memory.
And the memory, it's very important because you can't use
like, the system memory on a computer. You have to do everything on what's called the VRAM, the very fast memory on the GPU itself.
And that's pretty limited.
And so if you can save on the amount of memory you're using,
that's huge because not only can you do more,
with fewer GPUs, but you're also not transferring as much data because it's just smaller.
And so anyway, there are these things called KV, key-value, caches and indices that you need to
keep in memory while you're training a transformer model.
And they came up with this incredibly smart... I mean, this is probably the coolest thing in the whole paper, the DeepSeek V3 technical paper. They realized that, you know, it's very wasteful how it's done normally, that you're storing way more data than you need to, that only some very small subset of that data actually is meaningful. And in fact, by storing more than you need to, you're almost, like, overfitting to noise, and it's not necessary. And so they...
Maybe a simple way to explain this for listeners
who wants some extra help with that
is it's just maybe closer to how your brain works with attention
where when you're applying attention somewhere,
you're not thinking about every single thing under the sun all at once.
You're kind of focusing on what's necessary.
You can't go too far with the anthropomorphizing.
Like attention in this context means a very specific thing.
And it's not, I don't think it's going to help people understand it.
I can't remember if I heard this in your article, maybe a different one.
But it's like if a house has, you know, 20 different rooms and lights are on in every single room,
even though a person is only in one room.
And this new model only keeps the lights on for the specific room that the person is in
at that one given time.
It's some loose broad stroke pattern like that.
Sort of. I mean, it's basically, like, instead of just naively storing this massive amount of, like, key-value data... It's basically like, if you have the word "job," it's very different if you say "nice job" versus "I just got a new job" or, you know, "are you going to be able to handle that job for me?" And it's like, the word "job" has a certain representation in the model, but that representation has to be altered depending on its context. That's broadly what attention is about.
And that means that for every word, every token, you really have to store lots of different things depending on the context. And that's why it takes up so much memory. And they're able to store that in a very efficient way, basically by just storing a sort of subset of the data in a compressed representation. So that's one thing they did that saved a lot of memory.
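A rough back-of-envelope for why that KV cache matters so much, with illustrative model dimensions (made up, not DeepSeek's actual numbers), and what a compressed shared latent representation would save:

```python
# Back-of-envelope sketch of why the KV cache eats GPU memory, and why
# compressing it helps so much. All model dimensions below are
# illustrative, not DeepSeek's actual numbers.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value):
    # Per token, per layer, you store one K and one V vector per KV head.
    per_token = layers * kv_heads * head_dim * 2 * bytes_per_value
    return per_token * seq_len

# A hypothetical large model: 60 layers, 32 KV heads of dimension 128,
# a 32k-token context, 2 bytes per value (fp16-style).
full = kv_cache_bytes(60, 32, 128, 32_768, 2)

# Same model, but with K/V projected into a small shared latent vector:
# say 512 numbers per layer per token instead of 32 * 128 * 2 = 8192.
compressed = 60 * 512 * 2 * 32_768

print(f"full cache:       {full / 2**30:.1f} GiB")   # 30.0 GiB
print(f"compressed cache: {compressed / 2**30:.1f} GiB")
print(f"ratio: {full // compressed}x")               # 16x
```

And as he says next, the saving pays off twice: the cache fits in VRAM on fewer GPUs, and there's less data to shuttle between them.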
Another thing they did that's very smart
is what's called multi-token prediction.
So like usually these models,
they predict the next token,
the next word, basically,
based on the preceding tokens or words.
And, you know, one at a time.
And so it kind of is this bottleneck.
And they're like, well,
what if we try to do, let's say, two or three at a time?
Now, the problem with that is you can't really
predict the second token without knowing what the first token is. And so how can you start on the second token until you know the first one? But you can do what's called speculative decoding. And, you know, your speculative decoding might be wrong, in which case you wasted your time computing that second token. But what they did is they got very good at guessing what that second one would be, such that 95% of the time they get it right. And so basically, just from that, you can sort of double your throughput on inference. And by the way, that's part of the reason why they're able to charge so little for their API, because that's about inference cost. And so they said that one trick let them almost double throughput for no additional cost. So that's a very clever trick they did. And then they did another very clever trick with the... you know, these models are basically just a gigantic list of numbers, if you will,
called the parameters of the model.
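The multi-token prediction trick described a moment ago can be sketched as a toy simulation. The "models" here are fake stand-ins with a made-up 95% guess rate; the point is just the accounting: when the cheap guess is verified correct, you get two tokens out of one sequential pass:

```python
# Toy simulation of multi-token prediction with speculative decoding:
# cheaply guess the token after next, verify it, keep it only when the
# verifier agrees. The "models" are fake stand-ins, not real networks.
import random

random.seed(0)

def full_model(prev):
    """Hypothetical expensive 'real' model: the ground-truth next token."""
    return (prev + 1) % 100

def cheap_guess(prev):
    """Hypothetical fast draft predictor for the token *after* next.
    Pretend it's right about 95% of the time, as in the discussion."""
    truth = (prev + 2) % 100
    return truth if random.random() < 0.95 else (truth + 1) % 100

def generate(start, n_tokens):
    out, passes = [start], 0
    while len(out) < n_tokens + 1:
        passes += 1                    # one sequential pass of the big model
        guess = cheap_guess(out[-1])   # speculate on the token after next
        nxt = full_model(out[-1])
        out.append(nxt)
        # The same pass can verify the draft against the model's own
        # prediction for the following position; keep it only on a match,
        # so the output is always exactly what the big model would emit.
        if len(out) < n_tokens + 1 and guess == full_model(nxt):
            out.append(guess)
    return out[1:n_tokens + 1], passes

tokens, passes = generate(0, 1000)
print(passes)  # roughly half of the 1000 sequential steps otherwise needed
```

With a 95% acceptance rate, most passes yield two tokens instead of one, which is where the "almost double throughput for no additional cost" claim comes from.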
And they figured out a way
to store those parameters
in a much more kind of compressed form.
And like normally the way these models are trained
is they use more precision.
You can think of it almost as, like, more decimal places of accuracy. It's not actually how it works, but it's sort of close enough to understand conceptually.
And then often what they do is, once they've trained the model that way, to make it so that it can run on a cheaper GPU, they do what's called quantization, where they sort of truncate and round off the numbers a little bit. But that does hurt the accuracy, or, you know, the quality, the intelligence of the model.
And what the DeepSeek guys did is they managed to,
instead of having to train at a higher precision
and then quantize to a lower precision at the end,
they managed to figure out how to mostly do the entire process
end-to-end using the smaller representation.
And again, it's one of these things where the efficiency gains pay for themselves so many times over.
Because it's like, not only do you use less memory,
but the calculations go faster.
And, you know, and then you don't need to do as much inter-GPU communication
because there's less data.
And so it's like it's these efficiency gains pay off
in multiple different ways.
And so that's another thing they did.
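The train-high, quantize-later trade-off he's contrasting can be sketched with toy numbers. Real low-precision training uses formats like FP8, not decimal rounding; this is only the conceptual version, with made-up parameter values and byte costs:

```python
# Toy sketch of the precision trade-off described above: store the same
# parameters with fewer "decimal places" and see what you save and what
# you lose. Real low-precision training uses fp8 formats, not decimal
# rounding; this is just the conceptual version with made-up numbers.

params = [0.123456, -1.234567, 0.000987, 2.345678]

# "High precision": pretend each number costs 4 bytes (like fp32).
# "Quantized": round aggressively, pretend each costs 1 byte (like fp8).
quantized = [round(p, 2) for p in params]

high_bytes = 4 * len(params)
low_bytes = 1 * len(params)

max_err = max(abs(p - q) for p, q in zip(params, quantized))
print(f"memory: {high_bytes} bytes -> {low_bytes} bytes (4x smaller)")
print(f"worst rounding error: {max_err:.4f}")
```

Quantizing after training eats that rounding error as lost quality; the claimed DeepSeek approach is to do most of the training end-to-end in the small representation, so the model learns around the reduced precision instead of being truncated after the fact.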
I mean, there's just this whole laundry list
of like little tricks and optimizations they did
that when you add them all together,
and they're not additive, right?
They're multiplicative.
Like each thing, you know, if this thing doubles it
and this one increases it by 40%,
and this one doubles it also,
you're multiplying those multipliers, if you will.
And that's how you can get this very big number,
like 45 times, which by the way, we don't really know.
You know, we don't know for sure.
They could have lied about the number of GPU hours they use.
One thing is clear, though, that they are charging 95% less for inference.
So either they're losing money on that or they really can do at least the inference part
much cheaper than, you know, we can here in the West because, yeah.
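The compounding arithmetic is worth seeing explicitly. The factors below are invented for illustration, but they show how a handful of individually modest multipliers, stacked, lands you in very-big-number territory:

```python
# The compounding arithmetic described above: individually modest
# speedups multiply rather than add. These factors are made up for
# illustration, not DeepSeek's measured gains.

factors = {
    "KV cache compression": 2.0,
    "multi-token prediction": 1.9,
    "low-precision training": 2.0,
    "comms overhead reduction": 1.5,
    "other kernel tricks": 2.0,
}

total = 1.0
for name, f in factors.items():
    total *= f

# For contrast: what you'd get if the gains merely added up.
added = 1 + sum(f - 1 for f in factors.values())

print(f"multiplied: {total:.1f}x")   # → 22.8x from just five tricks
print(f"if additive: {added:.1f}x")  # → 5.4x
```

Five tricks already compound to over 20x; a longer laundry list of smaller multipliers is how a figure like 45x becomes plausible, even though, as he says, the exact number can't be verified from the outside.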
That 95% less money for inference, I think, is really the sticker shock number that is sending companies like Meta and OpenAI reeling. Like, Sam Altman had to put out a tweet.
No, actually, Meta was up, I think. Because, look, on the one hand, it's bad for Meta, in that they have spent so many billions of dollars on GPUs, and they pay so much money to their team to, like, come up with the Llama models and stuff like that. And then it sort of does make them look a little, like, foolish when these guys are able to beat them at their own game, you know, on a shoestring.
Right.
But at the same time, what they really care about is how much does it cost them to serve AI to all of their billions of users around the world?
And so it's actually good for them if they can cut their cost 95%.
That's great.
Who it's bad for is OpenAI and Anthropic, because it's going to put more pressure on their pricing.
Like, right now, OpenAI charges a fortune for the o1 model API. And even, you know, GPT-4o is much more expensive. And so they're going to probably have to respond by cutting, you know, their API prices significantly.
Which is their, that's where they get their profit from, right?
Well, they don't actually have profits. Both companies are deeply unprofitable at the, you know, consolidated level.
And I actually suspect even at sort of, you know, incremental marginal level,
they're not all that profitable because they're prioritizing revenue growth above sort of everything else.
I don't think it's a case where they lose money on every unit, you know, sold on the margin.
You know, any fast-growing company is going to post consolidated losses just because they're always spending on growth and the new model.
And so.
So the real question is, like, what if OpenAI and Anthropic completely stopped trying to do R&D and making new models, and just tried to milk the business they have now for money: would they be able to eke out a profit?
And I think, yeah, the answer is probably yes.
But if they have to cut their pricing by 80%, then it's very unclear.
You know, so that's where it starts to be pretty relevant.
The Arbitrum Portal is your one-stop hub to entering the Ethereum ecosystem.
With over 800 apps, Arbitrum offers something for everyone.
Dive into the epicenter of DeFi,
where advanced trading, lending, and staking platforms
are redefining how we interact with money.
Explore Arbitrum's rapidly growing gaming hub, from immersive role-playing games and fast-paced fantasy MMOs to casual luck-battle mobile games.
Move assets effortlessly between chains and access the ecosystem with ease via Arbitrum's expansive network of bridges and on-ramps.
Step into Arbitrum's flourishing NFT and creator space, where artists, collectors, and social apps converge, and support your favorite streamers, all on-chain.
Find new and trending apps
and learn how to earn rewards across the Arbitrum ecosystem
with limited time campaigns from your favorite projects.
Empower your future with Arbitrum.
Visit portal.arbitrum.io to find out what's next on your web3 journey.
Celo is transitioning from a mobile-first, EVM-compatible Layer 1 blockchain to a high-performance Ethereum Layer 2 built on OP Stack with EigenDA and one-block finality, all happening soon with a hard fork.
With over 600 million total transactions, 12 million weekly transactions, and 750,000 daily active users,
Celo's meteoric rise would place it among the top Layer 2s, built for the real world,
and optimized for fast, low-cost global payments.
As the home of stablecoins, Celo hosts 13 native stablecoins across seven different currencies, including native USDT on Opera MiniPay, with over 4 million users in Africa alone.
In November, stablecoin volumes hit $6.8 billion, made for seamless on-chain FX trading.
Plus, users can pay gas with ERC-20 tokens like USDT and USDC, and send crypto to phone numbers in seconds.
But why should you care about Celo's transition to a Layer 2? Layer 2s unify Ethereum; L1s fragment it. By becoming a Layer 2, Celo leads the way for other EVM-compatible Layer 1s to follow. Follow Celo on X and witness the great Celo happening, where Celo cuts its inflation in half as it enters its Layer 2 era, continuing its environmental leadership.
So, Jeffrey, I just want to kind of zoom out and sum everything up.
We have this new model, this DeepSeek model, which is 45 times more efficient
than, you know, ChatGPT or other competitive models.
That's caused a repricing in Nvidia because people think like, oh, wow, 45 times more efficient.
We just need much less hardware in order to make that outcome happen.
It's just we're getting more from less hardware.
And so maybe we've been overpricing the hardware.
And that's what has shocked the market with a repricing of Nvidia.
And then also now, OpenAI and Sam Altman are getting squeezed
because DeepSeek is charging 95% less money for inference requests.
But my broad question to you is like, well, isn't this the expected outcome?
Like, AI and AI technology is on a very steep curve.
And we're seeing, you know, breakthrough efficiency gains across the complete tech stack,
whether it's hardware or the models, we've always known, like AI is going to accelerate very,
very quickly.
And isn't this just what this looks like?
Isn't this kind of the expected outcome here?
like, of course we're going to get more efficient.
That's how technology works.
Like, why is everyone surprised?
I mean, it's clearly not the expected outcome because the stock wouldn't have moved so much.
I mean, it was the expected outcome for me, which is why I wrote my article.
But I think the answer is that everyone does expect progress, progress on the hardware front,
that every year the chips are going to get faster and bigger, progress on the algorithmic front,
that you're going to come up with a better way to train the models or do inference that it's going
to make things faster. I mean, when these LLMs first really came out a couple years ago,
they had a much more limited context window, like the amount of text you could put into them.
That has gone up dramatically. And originally everyone thought that was going to be really
hard to increase, because they thought it would dramatically blow up
the amount of memory required. But people came up with really brilliant
new algorithms to make it faster.
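To make the context-window memory worry concrete: naive attention materializes an n-by-n score matrix per head per layer, so memory grows quadratically with context length. A back-of-envelope sketch, where the head and layer counts are illustrative assumptions rather than any specific model's configuration (and modern kernels like FlashAttention avoid materializing these matrices at all):

```python
# Why long context windows looked memory-prohibitive: naive attention
# builds an n x n score matrix per head per layer, so memory grows
# quadratically with context length n. Head/layer counts below are
# illustrative assumptions, not any specific model's configuration.

def naive_attention_bytes(n_tokens, n_heads=32, n_layers=32, bytes_per_val=2):
    """Total bytes for fully materialized attention score matrices (fp16)."""
    return n_tokens ** 2 * n_heads * n_layers * bytes_per_val

for n in (2_048, 32_768, 131_072):
    gib = naive_attention_bytes(n) / 2**30
    print(f"{n:>7} tokens -> {gib:14,.1f} GiB of score matrices")
```

Doubling the context quadruples the memory in this naive accounting, which is why the algorithmic workarounds mattered so much.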
And so people do expect some level of algorithmic improvement,
some level of hardware improvement every year.
But they expect it to be, you know,
like a Moore's Law-type progression where it's somewhat predictable.
And what really catches people off guard are step function changes
where overnight it jumps.
So like if the news was that they tripled efficiency, you know,
that would be, I mean, can you imagine, like, if you made an air conditioner that was three times more energy efficient? You'd crush them, you would get huge market share.
Tripling something in any normal industry, like if you had something with triple the mileage for your car, that would do great.
But we've become so used to it in technology. But 45 times? Okay, now we're talking here.
That's really crazy.
And so when that happens overnight in a way that people didn't anticipate, that's when you get this sort of shocking thing.
And, you know, the thing is like, you know, there's this expression of being priced to perfection.
Like the Nvidia share price, it only looked reasonable to people who extrapolated these curves out.
And like, you have to be very careful when you extrapolate revenue growth that has been going at 120% year over
year. And again, it's not just revenues. It's about the margins. And they were basically saying
that the margins would maintain and the revenues would keep growing at this incredible rate.
And as a result, that's why every single investment bank,
basically, had a strong buy on Nvidia. All of them. They all got caught completely offside
with this thing. They were all like scrambling, honestly, to read my article and they're like,
you know, I got inbound requests from some investment banks to, like, help out, because nobody
even wants to talk to their analysts about this. They want to talk to experts. And so they're
scrambling to find experts, not that I'm even an expert, but compared to the equity analysts
on the sell side, apparently I am. And so it was not expected at all, like that it would
happen like a step function change like this. And that's what is just, like,
the body blow to the stock: hey, this thing was pricing in, you know, clear skies.
And then it's like, all of a sudden it's like, oh, there actually are these threats.
And then again, it's not just DeepSeek.
It's like the people were ignoring a lot of these other threats.
And I don't know why because they're literally, these are people where their full-time job is to cover
Nvidia for Goldman Sachs and Morgan Stanley.
And I don't know what the hell they're doing that they didn't, you know, how come they weren't
talking about, you know, the competitive threats to CUDA,
or, like, you know, Cerebras and Groq.
And maybe they mentioned it,
but they certainly didn't figure out
that this actually is going to be really important.
And with the step function change.
It's not just a step function improvement
because it's also a step function improvement
in a slightly different direction
than what the market was thinking, correct?
It's not, we aren't just skipping ahead on Moore's Law.
We're also going in a different direction.
Well, it's additive to everything else.
It's like you are going to have faster chips next year.
You are going to have more chips next year.
You know, like you always.
are going to have other algorithmic improvements on the margin. But on top of that now, every big
AI lab in the world is going to be, you know, the Llama team at Meta, the Anthropic guys.
You better believe Zuck has brought these guys into his office and said, we need to use every
one of the tricks these guys are using for Llama 4. Yeah. So like as a consumer of AI products,
It's great.
If you're not exposed to, you know,
NVIDIA, if you don't have open AI equity,
private equity, if you are just a consumer,
you're stoked.
Oh, God.
The products coming down the pipeline
are going to be sick in a very short order.
Oh, and not only just that,
but from a standpoint of like,
you'll be able to run this shit on your own computer.
Like, you get like a $1,000 Mac laptop.
You're going to be able to have, like, AGI-level
AI on your computer on tap, privately.
And it's like, it's the most miraculous
thing ever. I mean, no one would have believed this even a few years ago.
Is that why Apple is up on the week? Because I think I saw Apple being up three or four
percent when Nvidia was down its 20 percent. Apple is one of the guys that actually,
it's so funny because Amazon and Microsoft and Open AI, they're all like trumpeting in these
big press releases about their custom chips that they're making. And, you know, Apple's so different.
Apple's like so secretive. And it's like, but you know they have like one of the best silicon
teams in the world.
But they only announce something
if it's like they're ready to sell it to consumers.
If they're making chips internally for their own uses,
like no one even freaking knows about it.
And all the people who do know about it
are like signed up with NDAs like and no one talks about it.
And for all we know, they have pretty fucking awesome chips already.
And so, but they're essentially like users of AI.
You know, so it's good for them.
It means that they'll be able to use some of these tricks to make some of these models.
In fact, there's an app you can get, I think it's called Apollo, on the App Store, that lets you download these models.
And if you have like an iPhone 16 Pro or something, you can just run this thing.
And you could be on an airplane or whatever with no internet in a bunker somewhere and have essentially, you know, not quite AGI on tap, but like, you know, certainly like smarter than most college students on a lot of topics.
and it's wild to see it go.
You know, you could go into airplane mode
and be like asking it all these questions
about, you know, chemistry and physics and history
and it'll give you really good responses
at a reasonable, you know, pace.
And so, yeah, it's good for Apple.
It's good for Apple.
I think it's ultimately good for Meta,
which is why Meta stock wasn't down.
Right.
You know, so it's not, it's not a bad thing.
It's just bad insofar as, again, it's a recalibration.
But, you know, I do think it was excessive that in one day, you know,
two trillion of capital got wiped out.
But I'm not saying that you should be buying the dip in Nvidia, though,
because I think it did get ahead of itself and it could still, look, it could fall to two trillion.
And, you know, two trillion is still a lot of money.
Okay.
Like, this is a company that earned, you know, like five billion dollars like, you know, a few years ago.
So that's still, you know, quite a big valuation.
Jeffrey, there's one last conversation before I let you go is the conversation of synthetic data.
And this, I think, comes from just having stronger and better models,
creates this notion of synthetic data.
And this is also like part of the equation of like the rebalancing of how people value things.
Can you just walk us through this synthetic data conversation?
What is synthetic data?
What do different and stronger
models have to do with synthetic data? And what does it mean for the overall supply chain of AI?
Well, I'm not so sure that it, I mean, I think it's an important concept. I'm not sure how
much it applies to sort of those things. What it is is that, you know, when you're training
these models, the pre-training that actually makes the model smart, it's partly a function of
how much compute you apply, you know, how many GPUs and how fast they are, but it's also the amount
and quality of the data that you're training on.
When DeepSeek says we use 15 trillion tokens
in our training set, that's what they're talking about.
And the thing is, it's like there's only so much data
that's of high enough quality
that you'd even want to use it to train a model out there.
Like, if you take all of Wikipedia,
I don't know how many tokens that is,
but it's like not that many.
You know, it's like less than, you know,
it's measured in the low single-digit billions.
Not even close, actually.
Sorry, maybe billions, yeah.
But it's like if you take all the books out there, we're talking really like it's just like a couple trillion.
Like if you talked about like all the newspapers that have ever been written, it's a couple trillion.
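The scale gap being described here can be made concrete with back-of-envelope arithmetic. All of the figures below are rough assumptions for illustration, not measurements: English Wikipedia is on the order of a few billion words, and common BPE-style tokenizers average very roughly 1.3 tokens per English word.

```python
# Back-of-envelope: high-quality text vs. a modern training set.
# Corpus sizes and the tokens-per-word ratio are rough assumptions
# for illustration only.

TOKENS_PER_WORD = 1.3          # assumed average for BPE-style tokenizers

def est_tokens(n_words):
    """Rough token count from a word count."""
    return int(n_words * TOKENS_PER_WORD)

wikipedia_tokens = est_tokens(4.5e9)   # ~4.5B words, assumed
training_set_tokens = 15e12            # 15T tokens, as cited for DeepSeek

ratio = training_set_tokens / wikipedia_tokens
print(f"Wikipedia: ~{wikipedia_tokens / 1e9:.1f}B tokens")
print(f"A 15T-token training set is ~{ratio:,.0f}x larger than all of Wikipedia")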
But what you're saying is the quality data that's out there is a processable amount of data.
No, no, I'm saying that we're running out of data.
We're running out of data.
Like that people write smart books, like, you know, they're not writing, you know, the books fast enough, basically, to keep supplying us with more and more data.
And so that's a big wall that we've been facing.
Like, how are we going to keep improving the models if we're not going to be able to scale up the data that they're using?
And people say, oh, but you could just take every YouTube video.
But it's like, have you seen most YouTube videos?
It's not going to make your model smarter.
No, it's going to make it dumber.
No.
And so, but there is an exception to this rule.
And so the exception, now, so synthetic data is using an LLM to generate text and then turning around and training a new model on that text.
And so that sounds like very circular.
Like how, it's like me trying to teach myself in a room without a book or anything, just talking to myself and I'm going to teach myself.
Like, how is that supposed to work like in terms of getting new information?
Isn't that in a sense almost getting high on your own supply, that it's not going to
help you? And that is sort of true if, let's say, you're talking about, like, you know, the history
of the Peloponnesian War or something, you're not going to get anything new by just regurgitating
your own output. The exception to all that is if you're talking about, like, logic, math,
computer programs, because in those things, the big difference is that you can verify that what
you said is correct. So, you know, just like the rules of chess are very simple, but it's like almost
unlimited complexity of the possible chess games. It's the same thing. Like, there are so many possible
simple Python programs that are like 100 lines or less that, you know, we've only ever seen a tiny
subset of them. So you could come up with a, you could say, oh, I want to make a Python program that
does XYZ, generate a candidate, and then test and be like, well, when I run it, did I get that
output? And if you did, you know that the program's right. And so now you can say, okay, well, let me
add that to the training set. And it wasn't in the training set originally, but it is correct
and good. And so what you can do is you could start exploring the world of like all possible math
theorems and working out, you know, all these math proofs, verify that they're right and then add them
to the training set. And in that way, you could basically come up with lots of data that's known to
be super high quality. And that's why these models are getting better at logic and math
at a much faster rate than they're getting better at anything else.
Because you could sort of just keep cranking
and getting this synthetic training data
and then the scaling can just keep going forever.
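The generate-verify-train loop described above can be sketched in a few lines. This is a toy illustration: the hardcoded candidate strings stand in for programs sampled from an LLM, and real pipelines do this at vastly larger scale.

```python
# A toy sketch of verified synthetic data for code: sample candidate
# programs, execute them against known input/output pairs, and keep only
# the verifiably correct ones as new training data. The hardcoded
# candidates below stand in for samples drawn from an LLM.

def verify(program_src, test_cases):
    """Execute a candidate and check it against known input/output pairs."""
    namespace = {}
    try:
        exec(program_src, namespace)              # define solve()
        solve = namespace["solve"]
        return all(solve(x) == y for x, y in test_cases)
    except Exception:
        return False                              # broken candidates are rejected

# Task: solve(n) should return the sum 1 + 2 + ... + n.
candidates = [
    "def solve(n): return n * (n + 1) // 2",      # correct closed form
    "def solve(n): return n * n",                 # plausible but wrong
    "def solve(n): return sum(range(n))",         # off-by-one bug
]
tests = [(1, 1), (4, 10), (100, 5050)]

training_set = [src for src in candidates if verify(src, tests)]
print(f"kept {len(training_set)} of {len(candidates)} candidates")
```

The key property is exactly what the conversation highlights: correctness is checkable, so every surviving sample is high-quality by construction, unlike scraped web text.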
So that's why it's sort of funny that
which jobs are most at risk from AI,
you know, I think a lot of people thought it was like,
well, we're still going to need people who are like
really, really smart, quantitative stuff.
And it's like, I got news for you.
That's like the thing that they're going to become superhuman at before anything else.
Because like you're still going to want to read the history book by like a really smart human before you read, you know, the AI's history book.
But, like, the AI mathematician might, you know, be pretty good, you know, two years from now, like three years from now.
Jeffrey, we heavily talked about your article, which I'll have linked in the show notes.
and people want to go read that for themselves firsthand.
But also just tell us a little bit more about you,
like where you come from, what you do,
what else you're working on.
Sure.
So, in my day job for the last couple years,
I've been the founder and CEO of Pastel Network,
which is actually a crypto project.
PSL is our ticker.
We trade on a few exchanges like MEXC and Gate.
And so we started out as sort of an
NFT platform. It's an interesting project. It's based on the Bitcoin core proof-of-work concept,
but with all these additional layers to it. And sort of in the last year, we've done a big pivot
to decentralized AI inference. And so I've written just a tremendous amount of code in the last year
to essentially let you do inference across all sorts of modalities, all sorts of providers
of AI models, including totally uncensored models.
And you don't have to dox yourself by giving an email address and a credit card and your IP address.
You can just pay with crypto, and it's all pseudonymous.
And it's decentralized.
All the inference is being handled by these supernodes that anyone can start up themselves.
And you can even, I mean, the example I like to joke about is you can use one of these uncensored versions of the Lama models and say, like, how do I make meth at home?
And this will actually just tell you exactly the recipe, whereas, you know, good luck trying that on ChatGPT or Claude.
Is this, would you call this the sovereign AI sector?
Yeah, yeah.
But it's really sort of decentralized.
I mean, part of the thinking of it was that, you know, for me it's not necessarily on the consumer level,
like ChatGPT, although I did make something like that.
If you go to inference.pastel.network, you can try it all in a browser and you could do inference across all these models.
But it's also that to make it sort of an API that if you have another crypto project, like let's say you have a prediction market and you want to make it so anyone can, in a decentralized way, create their own prediction event.
But you want to have some rules around that.
Like you don't want people to make like assassination markets where they're predicting that somebody's going to die by a certain date.
So you need to have some kind of moderation.
You don't want to necessarily have it be that there's a moderator who has like power to delete stuff, right?
Because how's that decentralized?
So I think the better way to implement something like that is to have an LLM do it in a totally impartial way, where we have this prompt that says,
you're not allowed to do an event about, you know,
involving any of these subjects.
And then you have the user wants to create their prediction event.
They have to describe what is being predicted.
At the time they're trying to create that event in the system,
it's going to show it to an LLM.
The LLM's going to say yes or no.
And based on that, they're going to say,
no, you can't do this.
You have to change it.
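The gating flow just described could be sketched roughly as below. Note that `llm_judge` is a stand-in: in the real system it would be an inference call to a decentralized LLM evaluating the description against the policy prompt; here it is faked with a crude keyword screen so the sketch is self-contained and runnable, and the banned topics are made up for illustration.

```python
# Sketch of LLM-gated event creation for a decentralized prediction market.
# llm_judge stands in for a real LLM call: it would receive a policy prompt
# plus the event description and answer yes/no. Here it is simulated with
# a keyword screen so the example runs on its own.

BANNED_KEYWORDS = ["assassination", "death"]   # illustrative policy, assumed

def llm_judge(event_description):
    """Stand-in verdict: True means the event is allowed."""
    text = event_description.lower()
    return not any(word in text for word in BANNED_KEYWORDS)

def create_prediction_event(description):
    """Gate event creation on the (simulated) LLM verdict."""
    if llm_judge(description):
        return {"status": "created", "event": description}
    return {"status": "rejected", "reason": "violates moderation policy"}

print(create_prediction_event("Will ETH close above $4k this year?"))
print(create_prediction_event("Assassination of a public figure by June"))
```

The design point is that the verdict comes from a fixed prompt applied impartially at creation time, rather than from a human moderator with delete powers.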
Now, if you have this prediction market that's decentralized,
you can't really go and use OpenAI or Claude for this
because that requires an API key hooked up to a credit card
which means that's not decentralized.
It can't work like that.
It has to actually be decentralized.
So that's the idea of Pastel, is that they could use Pastel
and they can say with a straight face in all honesty
that this is decentralized right down the line
that this is fully decentralized
and that it can never be shut down by just,
turning off this one API key or credit card or, you know.
And so that's the basic idea.
And then I have some other side projects like my YouTube transcript optimizer,
which is where I published.
People are very confused.
Why is this not on Medium or something or a substack?
And I'm like, sorry.
It's so funny because I basically was trying to, you know,
help my organic search ranking of my little YouTube tool,
which, you know, I've generated like $1,000 or something.
of revenue from it.
And then it's like in the process, I may have inadvertently contributed to $2 trillion
getting wiped off global equity markets.
Because, you know, the fact is, I really, look, I don't want people to say that I'm some
megalomaniac here, but the fact of the matter is, all of the news headlines came out saying
the stock market crashed because of DeepSeek.
And it's like, I'd like to point out that the DeepSeek V3 technical paper, the one that
talked about the efficiency gains, came out December 27th.
A month ago.
That's a month ago.
And all these famous people like Andrej Karpathy were all over this, talking about this
weeks ago on Twitter. Even the newer model, the R1 model that does the chain of thought,
that paper came out a week ago, and people were all over that.
So why suddenly on Monday did everything crash?
And I'd like to think it's because I'm pretty sure it is that I wrote this article
in a way that sort of speaks to hedge fund managers.
So they can understand it.
And I published it like in the middle, you know,
the night on Friday.
And then it started taking off.
And then it got shared by Chamath,
who has, you know, whatever, 1.8 million followers.
Right?
And Chamath's post, it's been viewed over two million times.
Naval.
Naval's account has two and a half million.
And then, like, the Y Combinator guys,
Garry Tan,
and the Y Combinator account.
between them, they have millions of followers.
And not only did they share it, but they were like very effusive in their praise about
this is really smart.
And that went crazy.
And I can tell you, I have been inundated by, you know, requests from huge funds that
want to talk to me about this.
And I believe that it did, in fact, you know, as crazy as it sounds, precipitate the decline.
Obviously, I didn't cause it.
It was caused by the underlying situation.
But in terms of highlighting it, it didn't come from the, you know, investment banks.
And I think part of the problem is just people like are talking in different circles.
Like they're not like the people who are buying Nvidia, you know, with billions of dollars at a big fund,
are not reading the technical papers.
And they're not even necessarily reading the tweets from Andrej Karpathy.
You know, they're just relying on sort of this consensus
of where things are going.
And all it took was sort of a really in-depth explanation
that made sense to them.
And they were like, holy shit, I didn't know this.
And you know, can I say one other funny thing is I have,
because it's running on my own blog, I have Google Analytics.
I can see real time, like who's, you know, not who, but where they are.
And it's so funny because I just, when it started going,
at first I was so thrilled that 50 people were reading it at once.
And then it was like, before I knew it, it was like 1,500
people at any given moment. And it's a 60-minute read. It's 12,000 words, so it's not short.
And, but at first it was like mostly guys in New York, because that's where all the hedge funds are.
And, but then I noticed right before I fell asleep on Saturday night that, you know,
where the biggest place where people were reading it was San Jose. And I'm like, who? That sounds
like where Nvidia is based. And because there were like hundreds of people from San Jose
reading the thing at the same time.
And as of yesterday is when I last checked, over 2,000 people from San Jose read my article.
And, you know, the funny thing about Nvidia is that they've gone up so much that something like 80% of the employees have more than like $10 million worth of the stock.
And you know it's like the main thing that they talk about with their spouse and friends, like, man, I have a lot of the stock.
Should I keep staying on for this ride?
And they understand the technology, but maybe they
don't understand how to value a company.
Right.
And they read this, and this thing started passing around like wildfire.
And I was like, oh, my God, I bet you, Jensen's reading this, too.
And I think there's a lot of stock that sort of never hit the market because it was awarded
as RSUs and options to these people.
And it only takes a little bit of that on the margin to start causing imbalances.
And so I wouldn't be surprised if a lot of that sell pressure came from Nvidia employees.
But also it's like these big hedge funds
like control a lot of this.
A lot of fast money players
and they suddenly got spooked
and it's like,
so it's wild to think about that,
you know,
it could have actually been this sort of,
you know,
the Reichstag fire, if you will,
of setting off this whole course of events.
But I actually do believe
without it being so, you know,
I mean, I'm sure people will say,
no, this other guy wrote this
and this other guy wrote this.
And I was like, yeah,
but my thing went pretty freaking viral.
And from the right people.
And other people's stuff cited yours, your article.
Sure.
Or, you know, maybe they didn't.
I mean, I saw the guy, Ben Thompson from Stratechery.
It sort of sounded like he paraphrased my thing without giving me any credit, but whatever.
You know, it's, but, but I just think it's really funny how, like, there's headline stories today from the New York Times and the Wall Street Journal that both said, you know, they're always, they're always trying to assign causality to stuff.
And they said it was caused by this.
And I was like, not really, because it's a ludicrous concept: the 45x efficiency gains, that was known a month ago.
So you have to explain why there was a one-month lag.
Okay.
Whereas like this is very understandable that this spread like wildfire from thought leaders like Jamath and Naval.
I mean, Naval is like put on such a pedestal by the VC guys.
And you know, and the tech hedge fund guys look up to the VC guys.
like Andreessen Horowitz and all these guys
and the Ycombinator guys.
They're the experts, right?
And then you have those guys
saying that this is a great article
and it's like, well, okay.
And so of course that can very quickly convince
and it's not like you have to convince everyone.
You just have to convince, you know,
the guys at, like, Coatue
who are managing $70 billion
that they should maybe like sell a little bit
to get in front of this.
And you're, you know, that's all you need.
And so anyway,
I emailed both of the journalists to say, at least, you know, you should be aware that you may have gotten the causality a little wrong on this.
But anyway.
Well, Jeffrey, it's an honor to have the original source of the information on the podcast.
It was great to have you as a guest.
And perhaps as these AI wars, these Nvidia chip wars, continue, we didn't even get a chance to talk about USA versus China.
But as this progresses, maybe we can get you back on to just keep on commentating.
Yeah, I really appreciate you coming on.
Great.
Thanks a lot.
Bankless Nation, you guys know the deal.
Crypto is risky.
You could lose what you put in.
But it sounds like the traditional market
is also risky too.
But we're headed west.
This is the frontier.
It's not for everyone,
but we're glad you are with us
on the bankless journey.
Thanks a lot.
