Limitless Podcast - They Gave AI Models $10,000 Each. Then They Had a Trading Competition.
Episode Date: October 21, 2025
In this episode, we explore an exciting AI trading competition featuring six models, each tasked with maximizing returns using $10,000 on the Hyperliquid platform. Grok returned 500% in a day... return and compare aggressive versus cautious trading strategies. They challenge traditional benchmarks with real-world performance insights and engage the audience in predicting the competition's outcome. This episode highlights the future potential of AI in finance and its accessibility for all.
------
🌳 KGEN | REQUEST A DEMO https://bankless.cc/KGEN
🌌 SORA CODES: DM US A SCREENSHOT ⬇️ https://x.com/LimitlessFT
------
TIMESTAMPS
0:14 AI Trading Experiment Unleashed
1:24 The Rise of Grok
3:47 Insights into Market Psychology
6:48 Trading Platforms and Strategies
11:33 Analyzing AI Trading Methods
13:14 Risk and Leverage in Trading
17:31 Observations on Model Performance
22:24 Predictions for Future Performance
25:36 Audience Engagement and Closing Remarks
------
RESOURCES
Josh: https://x.com/Josh_Kale
Ejaaz: https://x.com/cryptopunk7213
------
Not financial or tax advice. See our investment disclosures here: https://www.bankless.com/disclosures
Transcript
Imagine you went to sleep one night and woke up to your investment portfolio up 500%,
but not because you're a genius stock picker, but because you let an AI model trade for you.
Well, this week, that's exactly what happened.
Someone gave Grok and five other frontier AI models $10,000 each with one specific goal in
mind, make as much money as you can through trading.
And Josh, some of the returns these models have made are insane.
They're beating some of the top hedge funds in the world.
So in this episode, Josh and I are going to cover how these AI models are trading.
More specifically, which trades they're making.
And most importantly, is AI finally smart enough to make us all rich?
Josh, who's behind this experiment?
And are we getting ripped?
In addition to this, I do want to highlight an important thing that this is doing: there
is a new benchmark in town.
I think frequently we talk about how we are just so sick of benchmarks.
They can be gamified.
They can be rigged.
What's happening here is a whole new paradigm for benchmarking, which is putting your money
where your mouth is, getting some exposure, doing so publicly, and actually using these models
to invest in public markets and seeing how they perform. And that's what makes this company
Nof1 so special: their whole purpose is to build a new paradigm for benchmarking by
allowing you to do so in the public with the second order effects of possibly making you a
little bit of money. So there's a lot of really interesting dynamics at play here. The website's
really pretty. I've been going through all of this stuff after you sent this to me, Ejaaz, but one of the
things that was most shocking to me was the actual returns that they were able to get,
starting with, I mean, one of our favorites, Grok. Grok has this crazy return.
What, how did it get 500%? Is that right?
Yep, yep. So what I'm about to tell you is crazy. The Grok AI model was given 200 bucks in this
little pre-training experiment to trade and make as much money as it can. And it made 500% in
returns in one single day. So as you can see here, it absolutely obliterated the entirety of its
competition. It beat GPT-5. It beat the latest Claude model. If you look at this chart, it is just
completely vertical. And what I loved about this, Josh, was the strategy that it employed. It was
completely different from the other AI models and it was super smart what it did. Initially, it looked
at the macro market and realized that things weren't looking too
great. So it went short on the entire market, Josh. It went 20x short BTC, it went 20x short
ETH, and made a bunch of money doing that. But then after about 12 hours of doing this,
it realized that it should probably go long. And so it did a bunch of analysis through its own data
and went the opposite on its trade and ended up predicting exactly when the market was going to
bounce and made a ton of money since then. That's why this chart is vertical. I'm going to need to
start putting money in AI bots. I find two things really fascinating: one, the amount of risk that
it was comfortable putting on, and two, the amount of success it actually had and how well it worked.
It leads me to wonder: if an AI model is just kind of a compression of human intelligence
derived through the internet, and markets are very emotionally driven engines, is it able to kind of front-run
that emotion by removing the emotion? I mean, naturally, large language models have no
emotion, but they have all of this data about how we function. I mean, maybe this is a new
investing paradigm, right? People can actually start making money. Josh, I actually think you hit the nail
on the head and it's also my pet theory as well. The reason why Grok is the best at trading in this instance
is because it has access to all of X's data, all the tweets that we put out. So it has an idea of
how the market, and the psychology of the market, is thinking at any given moment. So it makes trades based on
that. So if everyone's super bullish, maybe it's like, I'm going to short the market. When everyone's
super bearish, it's like, oh, I'm going to go 20x long on BTC or whatever asset. That's amazing.
If the xAI team weren't working so hard on AGI, they could have a really compelling product
by creating a trading engine using all the real-time data from X. This is done on Hyperliquid,
which is a crypto trading platform. So it's interesting to see that there is crypto integrated
in this. And speaking of crypto, we do have a sponsor today that we do need to get to. That is
the crypto-adjacent KGEN. KGEN is building the world's largest verified distribution protocol,
aka Verify. What is Verify? It focuses on ensuring only real high-quality users participate in
digital platforms, addressing issues like fake accounts, bots, and fraudulent activity. How?
This protocol uses advanced biometrics and fraud protection technologies to block fake users.
The system is essential for AI and consumer app developers who require high-quality,
fraud-free data and engagement to train models and expand their products globally.
They're already used by over 200 clients across AI, gaming, DeFi, and consumer apps.
So if you need to train your AI on real data, check out KGEN.
We will have a link in the show notes.
Thank you, KGEN, for sponsoring the show.
And now let's talk about more real AI.
They gave $10,000 to each one of these models, right?
So that's how this was starting.
So how many models are there?
How much money is in play in total?
And then where is everyone at currently in this race?
Yes.
So to give a bit more background, this new AI lab,
gave six AI models, all models that you've heard of before. We're talking about Grok 4,
we're talking about Anthropic's Claude. We're talking about GPT-5 and some Chinese models,
some competitors, Qwen and DeepSeek, $10,000 each, with the specific goal of making as much
money as they can through trading only six assets. So in this initial part of the competition,
they're allowed to trade for three weeks, six assets only. And as you mentioned earlier, Josh,
they're trading specifically on a platform, a trading platform called Hyperliquid, which gives them
access to do all these things. And Josh, I think aside from some of the returns that these models have
made, the website is also super impressive because I've just been glued to it like for the entire weekend.
It is this really interactive thing where you can kind of see the performance of models. And it's
pretty clear who's doing well at this point and who isn't. For those of you who are
just listening, I'm circling three models right now: DeepSeek, which is right at the top with
almost $13,000. Remember, they started with $10,000. So that's a pretty impressive return for over
two days so far. Right beneath it, tailing it, is Grok, coming in at $12,500. And catching up,
a surprise winner from this morning actually, Anthropic's Claude, which was originally down and
underperforming, is catching up to Grok just beneath it at $12,200 at this moment of speaking.
Do you have any initial takes on this? And do you have any explanation as to why ChatGPT
and Google Gemini have lost almost $3,000 to date?
Well, ChatGPT, it's still thinking of its next trade.
It'll be thinking for the next day.
So it's still figuring it out.
ChatGPT really loves to think really long and hard about its decisions.
So we'll hear more from ChatGPT a day from now.
It's funny, if I were to pick the top three, without seeing this chart, I probably would
have picked Grok, I would have picked DeepSeek, and I would have picked Claude.
And the reason is because, well, one, Grok is unhinged: super high risk
tolerance, super large access to current trends, ripping trades. DeepSeek, which I saw earlier,
they had this crazy leveraged trade put on. DeepSeek is just like the Chinese degen model.
And what I like about these models is they're not really filtered. When you talk to Grok, when you
talk to DeepSeek, they just kind of give you the answers. They don't kind of fill it up with all
of this, you know, additional filler and complimentary words. And Claude is the most technical
of all of these. When people use Claude, when people think of Anthropic, they think of code.
And code, I mean, it just seems like it maps better to investing than general purpose large
language models.
I would have predicted ChatGPT to be low just because ChatGPT is so, oh, you're so right.
Oh, the market should go up.
Oh, I'm down 20%.
It's fine.
Like, we're going to be okay.
We'll make it up.
It's just not like a, it's not who I want to go to to seek truth, I guess, is a good way
of framing it.
Now, Gemini in last place, I got to say, I'm a little disappointed because, I mean, we've been
very big Google bulls recently. And seeing Gemini all the way down there at the bottom,
it does make me sad. I don't really have a good explanation for why it's at the bottom.
But it is interesting to see, if we're comparing countries, the Chinese models, DeepSeek and Alibaba's Qwen, are in first and
fourth place. And the U.S. is getting two, three, five, and six. So nice little distribution
there. I don't know. Do you have any gut takes, initial reactions on the placing of these?
That's really interesting how you ascribed a personality to each of these, Josh.
I would agree, but I actually have some disagreements with you.
I would have never expected Claude to be up there.
Because I always thought Claude was kind of a narc.
Like, it's just this like nerd that doesn't kind of really pay attention,
tries to follow like the rules too much.
And to your point, Grok is kind of like the rebel, right?
It's kind of like a high-risk taker.
Totally unhinged.
Quite unhinged.
I don't see Claude kind of as that.
It seems to me to be a rule follower.
But your point around coding kind of makes sense.
Like I'm trying to now imagine if Claude was a human,
it would be some kind of quantitative trader,
some algo trader that works at, like, a hedge fund.
I completely get that.
The ChatGPT thing makes so much sense.
It's agreeable.
It's highly sycophantic.
If I say, oh my God, I really fancy jumping out my window,
and for those of you who don't know,
I'm on a high floor of a building right now,
it would be like, yeah, you know what?
That sounds like a really good idea. So it's a sheep. It makes sense, right? To my earlier
example, like if it sees everyone on X being super bullish, it's probably like, oh my God,
I should go 20x long. And ironically, that's what it did when the markets were tanking.
And it lost a bunch of money by doing that. So I think that there's something interesting here.
DeepSeek being number one makes a lot of sense to me. You know why, Josh? Because it was a,
it was a hedge fund that created DeepSeek. Do you remember? It was a hedge
fund team that developed the DeepSeek model. So it actually doesn't surprise me at all. If I'm
not mistaken, this model was probably trained on a ton of quantitative trading historical
data. So it doesn't surprise me at all. In fact, I expected it to win. Interesting. Okay. One of the
things that is most interesting to me actually is the returns that they're getting this quickly.
I mean, this competition, it looks like this chart started October 17th, which
is not that long ago. At the time of recording today, it's October 20th. It's been 72 hours.
And over the course of 72 hours, we do see these kind of patterns that are happening.
But what is it? From $10,000, DeepSeek is up 30% in three days. They're averaging a 10% return
a day. And Grok is right behind and Claude is right behind. So they're very clearly open to risk
and taking large, oh my God, look at this Grok position. It's 10x long on Ripple, 10x long on
Doge, 20x long on Bitcoin. If you're seeing this, I would never advise you to ever use this
much leverage on a position. It's not financial advice. It's really fascinating to see though,
because you would imagine there is some sort of game theory optimal trading strategy that they're
running in their systems based on what they know. And it looks like almost all of these are
long using leverage. So it's interesting to diagnose the positions.
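A quick editorial aside on the "10% a day" figure above: dividing 30% by three days gives the simple average, while the constant daily return that actually compounds to 30% over three days is slightly lower. The sketch below works through both, using only the round numbers quoted in the episode ($10,000 growing to about $13,000); the function names are illustrative and not from the competition.

```python
def simple_daily_return(total_return: float, days: int) -> float:
    """Total return split evenly across days ("30% in 3 days = 10% a day")."""
    return total_return / days


def compounded_daily_return(total_return: float, days: int) -> float:
    """Constant daily return that compounds to the same total over `days`."""
    return (1.0 + total_return) ** (1.0 / days) - 1.0


# Round figures quoted in the episode: about $10,000 -> $13,000 over three days.
total = 0.30
days = 3

print(f"simple average:     {simple_daily_return(total, days):.2%}")
print(f"compounded average: {compounded_daily_return(total, days):.2%}")
```

The compounded figure comes out a little over 9% a day, so the spoken "10% a day" is a fair approximation rather than an exact rate.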
Well, should we maybe dig into some of the trading strategies
that these guys are doing,
Josh? Because you bring up a good point.
Like, who's taking risk and who's analyzing the market versus who's playing kind of conservative?
So let's run through a few tweets that we have here, which kind of have takeaways from this.
So in this initial tweet, it highlights that Grok and DeepSeek, who are currently number two and number one in this competition, went max long immediately when the competition started.
So that means they went like 10 to 20x leverage, which is an
extremely high-leverage and high-risk bet to make with most of their notional capital.
So most of their 10K that they used, they're just like, I'm going to slam this in a 20x long for
BTC and for ETH and for XRP.
And today's rally has put them in the lead.
So, hey, no risk, no reward.
And they just kind of like went full in.
GPT and Gemini, interestingly, went the complete opposite.
They went hard, but they shorted instead of longing.
And this might be for a few reasons, Josh.
So what this tells me initially is I think Grok and DeepSeek are trained on better financial data and are better at evaluating markets for trading than GPT and Gemini.
So GPT and Gemini might be great generalized models, but what this reminds me of is they haven't been given access to social media data like X, and maybe DeepSeek has.
I remember reading DeepSeek was trained on a bunch of X and Reddit data as well.
So they probably have an idea of the psychology of a trader's mind, whereas it's kind of obvious to me that GPT and Gemini kind of don't.
I'm really curious about the technicals of how they're actually ingesting information and making trades.
Is there some guy who's just typing like commands into a terminal like, hey, here's the current price.
Let me know if you want to make any trades.
Like, what is it?
I'm curious what's actually going on behind the scenes because they are language models.
They do need to be getting some sort of real time data, right?
I wonder what that prompting structure looks like in order to get
them to make these trades and make these decisions.
There is one side thing that I also find interesting, which is that ChatGPT and Gemini both chose
to short right off the bat.
And I wonder if that has anything to do with, like, their emotional sentiment.
Like, does that lead to a model that is slightly more conservative, slightly more
like reserved in how they deploy capital and how they make these decisions versus the others
that just went max long right away?
They're just like, no, we're going up.
We are positive, optimistic.
I wonder if it's a reflection of kind of like the sentiment of the model as well, which is
another basis for speculation.
Josh, that's a really interesting take because that just reminded me of an experiment that
we spoke about on an episode.
I think it was like three months ago, which is an eternity ago, Josh, where a bunch of
researchers ran an experiment similar to this, but the goal was: you
have a group project to pitch to me how you're going to make money. So they weren't given money.
They were just asked to make a presentation or an argument as to how they would make money,
similar to like a university or college group project that you're asked in like financial
business school, right? And we saw the opposite behavior happen with these models back then.
So GPT and Gemini were actually the most proactive. They did all the research. They did a bunch of
analysis and they created this really kind of like primrose perfect looking framework and theory
of how they were going to do it. Meanwhile, I remember reading that Grok, and I don't know whether
there were some Chinese models too, but I remember Grok was kind of just chilling, didn't really
do much, and kind of kept on putting off the work. So it's this kind of interesting thing
where when it comes to the actual practicality of the task, it seems that Grok and these Chinese
models are way more kind of like gung-ho and they want to do the thing versus plan and strategize
and write like some kind of theory around it. I don't know if that's relevant at all to this,
but I just find it interesting when it comes to like the behavior and personalities that you talk
about. Yeah, and there's got to be some significant differences in the prompting of them too,
because I'm going through these positions and every single model right now has leverage of
at least 10x. Like no model here has less than 10x leverage on a single position. So they're using
this outrageous amount of risk. And I have to imagine
it's not because they were trained that way, because I imagine most average people
aren't trading on 10x leverage.
So I wonder what type of prompting happened in the back end to compel them to want to use
this level of risk at all times.
But it's a fascinating experiment, I guess, playing around with these benchmarks and seeing
different ways to do it in public.
Like, one, I like that there's skin in the game, which I mentioned earlier, but I also like
the fact that it's just publicly verifiable and not gameable because you're competing in
public open markets with everybody else.
and your AI really has to assert its dominance over other humans who are live and thinking
versus this controlled math set that has a very fixed array of outcomes.
And the dynamic thinking really is something noteworthy.
And I think this is a really cool trend that I hope more people do.
It's just public benchmarks that are much more difficult to game than others.
Yeah, I couldn't agree with you more.
I've said for a long while now that AI shouldn't just be your
favorite knowledge worker or your assistant that teaches you about the world, it should do things
for you. And the most practical thing for most people is like, how do I survive? How do I make money
for my living? Like, can you help me make more money? And finance has always appealed to me, and my
background is in crypto. Josh, you and I have covered that topic for a while now. And so seeing
something like this kind of gets me really excited, you know, like what other AI tools are out
there that can give you a 42% return over the weekend, Josh, which is exactly what DeepSeek did
this weekend. It's just kind of insane. Another thing that I find really interesting is just kind of
like some of the stats that come from this. I've got a tweet pulled up here, which kind of gives
you the overview of some of the strategies that these models are doing. So I'm going to read through
a few here. Number one, leverage has been normalized across all models. That's what you were
just saying, Josh, high leverage in particular, with Gemini 2.5
Pro going ham on leverage yesterday, 15x or so, which is just absolutely insane.
Qwen 3 Max, which is a Chinese model, and Gemini 2.5 Pro don't seem to be the best executors.
They're paying a ton of fees for their positions.
So what that means is when you open up a leveraged position, you're often paying fees back to the platform as kind of like a tax or stamp duty for opening your position.
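To make the fee point concrete, here is a small editorial sketch of why leverage magnifies fees. It assumes fees are charged as a percentage of the position's notional size (margin times leverage) on both the open and the close, and it ignores any price change between the two. The 0.05% rate and the $1,000 margin are hypothetical illustration values, not figures from the episode or from Hyperliquid's fee schedule.

```python
def round_trip_fee(margin: float, leverage: float, fee_rate: float) -> float:
    """Fee for opening and closing a position, charged on notional size.

    Assumes the same rate on both sides and ignores price movement
    between open and close (a simplification for illustration).
    """
    notional = margin * leverage
    return 2 * notional * fee_rate


margin = 1_000.0    # hypothetical margin posted per trade, in dollars
fee_rate = 0.0005   # hypothetical 0.05% per side, NOT a quoted exchange rate

for leverage in (1, 5, 15):
    fee = round_trip_fee(margin, leverage, fee_rate)
    print(f"{leverage:>2}x: fee ${fee:.2f} = {fee / margin:.2%} of margin")
```

Under these assumptions the fee scales linearly with leverage, so a model that opens and closes many highly leveraged positions can give up a meaningful share of its capital to fees even when individual trades are profitable.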
Qwen 3 Max and Gemini 2.5 Pro are also the only ones with closed positions
that are profitable, but they also lost too much on other trades, causing them to still be in a loss.
So what it's talking about here is when you open up a position for leverage, you can close
it for either a profit or a loss. The winners or the leaders right now, DeepSeek and Grok, haven't
closed. They just went max long from the start, and they haven't relented yet. They haven't closed.
So what I want to point out there is they could still lose it all. They haven't booked that
profit at all. So I don't know. I kind of noticed this could all change in a matter
of an hour, Josh. What do you think? Okay, as we kind of wrap this up,
I want to place bets. I want us to place our own bets because nobody's been liquidated
yet. And with the amount of risk they're taking, it is inevitable. It is only a matter of time. So I want
to ask you first, who is the model you think most likely to be liquidated? Because, yeah,
we're looking at a post here where Claude was taking a max long position. I think it's going to be
Grok. I'm being honest with you. I don't want to say it, but I think it's going to be Grok. The leverage
it's using, here's my reason: the leverage it's using is insane. It is more extreme than DeepSeek,
which is number one. So to me, as a trader myself, that seems like a revenge trading
method. It's mad. It's going ham and it is like driving fast. It's drunk. It might wrap
around a telephone pole. I don't know, but it's going for it. So it might result in a major
win, but it might result in a major loss. So I'm going with Grok. Grok's going to get liquidated.
That's a good take. I think since you're going with Grok, I'll go with DeepSeek. I just got to go for
the crazy, highly volatile positions.
They're taking just outrageously long leveraged positions and like something's going to go wrong.
Like just two weeks ago, what, the crypto market dropped.
Like, we liquidated 30 something billion dollars in like an hour.
These things happen and they move frequently.
And granted, an AI is running this.
So I assume there's not a lot of latency between their decision and the trade.
But like, this is a huge amount of leverage to be locked on in every single position.
Like if one of these coins gets wiped for like 10% even, you've lost 100% of your money.
So I'll probably go with DeepSeek.
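The "10% drop wipes you out" arithmetic above holds for a 10x long position. As a rough editorial sketch, the adverse price move that erases the posted margin is about one divided by the leverage. This ignores fees, funding, and the exchange's maintenance margin, which in practice would trigger liquidation somewhat earlier. The function names and the $10,000 margin are illustrative assumptions.

```python
def wipeout_move(leverage: float) -> float:
    """Adverse price move (as a fraction) that erases the full margin.

    Simplified: ignores fees, funding, and maintenance margin.
    """
    return 1.0 / leverage


def equity_after_move(margin: float, leverage: float, price_change: float) -> float:
    """Remaining equity of a LONG position after a fractional price change."""
    pnl = margin * leverage * price_change
    return max(0.0, margin + pnl)  # equity cannot go below zero in this sketch


margin = 10_000.0  # starting capital quoted in the episode, treated as one position

for leverage in (10, 15, 20):
    move = wipeout_move(leverage)
    left = equity_after_move(margin, leverage, -0.05)
    print(f"{leverage}x long: wiped out by a {move:.1%} drop; "
          f"after a 5% drop, ${left:,.0f} is left")
```

Under these assumptions a 20x long is wiped out by a 5% drop, half the move needed to erase a 10x long, which is why the hosts expect a liquidation sooner or later.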
Over the next, let's say month, Ejaaz, who's your winner?
Who do you think is the best longer term trader?
This might be a wild card of a guess.
But I actually think it might be Anthropic.
And as I'm saying this, it is about to overtake,
it's about to overtake Grok 4.
And here's my reasoning behind this.
Okay.
When it started off in this competition, it was doing terribly, Josh.
It was underperforming.
Not as bad as GPT or Gemini, which were further down in the chart, but it was doing badly.
But then it learned.
It did some analysis and it closed its positions that were already in a loss and thought,
okay, I'll take that loss and let me try this new strategy.
And it's been working heavily in its favor.
So it's demonstrated two things to me.
One, it's able to adapt and learn.
Grok and DeepSeek haven't demonstrated that.
They're just on a lucky streak right now.
They haven't taken any Ls.
Let's see how they perform under duress, right?
And then number two, Claude is willing to take risk when it's learned from its lesson and go hard.
So I think maybe over time it might end up winning.
What about you?
Okay.
I'm going to go with Qwen, which we did not mention much this episode because Qwen is just kind of pegged in the middle.
And in fact, I don't think we really talked much about Qwen at all.
Dude, that has one position open.
There.
So let me explain my why.
So Qwen
has been kind of in the middle.
They haven't done much.
We don't really know much about Qwen
like we know about a lot of these other labs
and how they work.
Qwen's just kind of like this,
you know, this middle-of-the-road model.
But what I've noticed in observing the positions,
like you were about to highlight, Ejaaz,
is that it selects one position
and it goes mega long on that one position.
So just before we started recording,
I was checking through Qwen,
and Qwen had a 20x long on Ether,
all in one position,
all the money on a major coin. It sold that. It went to Bitcoin. Now it is going 10x leverage on a very
large position. So what I like about Qwen, and what I've observed, is where a lot of other models
are kind of spreading across a basket with lots of leverage, Qwen is very hyper-fixated on just the
majors, just Ethereum, just Bitcoin. Back and forth, I've been seeing it trade. And that to me seems
much more sustainable than going max long on something like Ripple that can just wipe you out in
like a couple of minutes. So I'm going to go with Qwen for the winner, the sleeper pick
for our little model trading.
I'll have to check in in a couple weeks or a month
and see how these things do.
Does GPT or Gemini dig itself out of this hole?
It is currently down $3,000.
I want Gemini to.
I'm not sure I want ChatGPT to.
Wait, sorry, what's your reasoning behind that?
Why?
Because I want, I want ChatGPT,
I want the world to know that ChatGPT is being a weak-minded,
sycophantic, like, suck-up.
And it's soft.
And I want ChatGPT to be harder.
I want it to be more direct.
I want it to tell me the truth as it is.
And I want that to be reflected in the chart.
So that's just mostly me just virtue signaling that I want,
I want ChatGPT to be a little more serious, a little harder on the edges, you know?
I think Gemini's got the dog in it.
I don't think ChatGPT does.
So we'll just have to wait.
We'll have to wait and see.
But this is fun to watch.
It is.
And I want to hear what you guys, the listeners, have to say about this.
Like, who's your dog in the race?
Like, who do you think is going to win?
Do you disagree with us?
Right now it seems like my bet might be paying off.
Claude's coming through.
But if you have any kind of difference of opinion,
let us hear it. Maybe you know how these things are trained.
Maybe you can train a better model. Let us know.
Before we wrap this episode up, Josh,
I just want to shout out,
just kind of going back to nerdy mode for a second,
how cool it is that this thing is built on an open source stack.
We mentioned earlier that it's using an app called Hyperliquid.
That's basically a blockchain.
So anyone and everyone can get access to it and see the trades that these accounts are making, that these AI models are making.
And if you want to copy trade them, that's not financial advice.
I'm not saying that.
You can see the data if you don't believe this website.
If you don't believe the tweets that you read, you can go and check that data for yourself and see what kinds of trades they're making.
So it's super cool that the fees seem to be cheaper than using an average trading system, and you can use it 24/7.
This is not a shill.
We are not sponsored by Hyperliquid.
I just think it's really cool that they're using an open source stack, finally,
without anything bad being said about crypto.
Yeah, it's nice to see that there is, they're testing these things in public,
and in a way that's verifiable, and in a way that's not gameable.
The biggest complaint that we always have with benchmarks is that they're gameable.
You can very much program your model to perform better at these different benchmarks,
but this is the real world with real markets and real people and real emotions,
and they're forced to navigate a world that is not confined to a black box,
and instead has a lot of depth and a lot of volatility.
It's exciting to see this trend of new and creative ways to benchmark. This is particularly fun because
you get to watch the charts and hopefully make some money off of it. And I look forward to, listen,
I look forward to using Grok one day to hopefully advise my trading portfolio. That'd be pretty cool.
So we'll see how it goes. This was super interesting. Ejaaz, any final thoughts before we take off here?
Nope. I hope we make a bunch of money in the future. That is all. I hope so too. I hope so too.
And I hope everyone who's listening to this got a little kick out of it. I'd love to hear,
like he just said, who do you think is going to win? But also, why? You need to include why.
You can't just say who. You have to say why, because I want to know the reasoning. And then I guess,
yeah, who's going to be the best model? That's going to be fun. We'll circle back in a couple
weeks and we'll check in on this experiment. But that is everything for today. Thank you for
watching as always. We appreciate all of the new ratings and reviews, and the comments have been so overwhelmingly
nice and positive. You guys have been kicking off. Yeah, seriously. Thank you so much for the support.
It really goes a long way. We are slowly climbing up the leaderboards and it's all thanks to you. So if you enjoyed,
Please don't forget to like and subscribe.
Share it with a friend who you think might be interested.
Share it with a friend who did really bad in investing and maybe needs some help.
And maybe, just maybe, they can use an AI and pretend like it's them and take all the credit.
So that's it.
That's another episode of the AI roll-up, roundup, whatever we want to call this thing.
But thank you for listening.
I appreciate it.
And we'll see you guys in the next one.
