Limitless: An AI Podcast - They Gave AI Models $10,000 Each. Then They Had a Trading Competition.

Episode Date: October 21, 2025

In this episode, we explore an exciting AI trading competition featuring six models, each tasked with maximizing returns using $10,000 on the Hyperliquid platform. Grok returned 500% in a day... return and compare aggressive versus cautious trading strategies. They challenge traditional benchmarks with real-world performance insights and engage the audience in predicting the competition's outcome. This episode highlights the future potential of AI in finance and its accessibility for all.

------
🌳 KGEN | REQUEST A DEMO
https://bankless.cc/KGEN

🌌 SORA CODES: DM US A SCREENSHOT ⬇️
https://x.com/LimitlessFT

------
TIMESTAMPS
0:14 AI Trading Experiment Unleashed
1:24 The Rise of Grok
3:47 Insights into Market Psychology
6:48 Trading Platforms and Strategies
11:33 Analyzing AI Trading Methods
13:14 Risk and Leverage in Trading
17:31 Observations on Model Performance
22:24 Predictions for Future Performance
25:36 Audience Engagement and Closing Remarks

------
RESOURCES
Josh: https://x.com/Josh_Kale
Ejaaz: https://x.com/cryptopunk7213

------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures

Transcript
Starting point is 00:00:03 Imagine you went to sleep one night and woke up to your investment portfolio up 500%, not because you're a genius stock picker, but because you let an AI model trade for you. Well, this week, that's exactly what happened. Someone gave Grok and five other frontier AI models $10,000 each with one specific goal in mind: make as much money as you can through trading. And Josh, some of the returns these models have made are insane. They're beating some of the top hedge funds in the world. So in this episode, Josh and I are going to cover how these AI models are trading. Most importantly, what trades specifically are they making? And most importantly, is AI finally smart enough to make us all rich? Josh, who's behind this experiment? And are we getting ripped? In addition to this, I do want to highlight an important thing that this is doing: there is a new benchmark in town. I think frequently we talk about how we are just so sick of benchmarks. They can be gamified. They can be rigged. What's happening here is a whole new paradigm for benchmarking, which is putting your money where your
Starting point is 00:01:01 mouth is, getting some exposure, doing so publicly, and actually using these models to invest in public markets and seeing how they perform. And that's what makes this company, Nof1, so special: their whole purpose is to build a new paradigm for benchmarking by allowing you to do so in public, with the second-order effect of possibly making you a little bit of money. So there's a lot of really interesting dynamics at play here. The website's really pretty. I've been going through all of this stuff after you sent this to me, Ejaaz, but one of the things that was most shocking to me was the actual returns that they were able to get. Starting with, I mean, one of our favorites, Grok. Grok has this crazy return. What, how did it get 500%? Is that right?
Starting point is 00:01:40 Yep, yep. So what I'm about to tell you is crazy. The Grok AI model was given 200 bucks in this little pre-training experiment to trade and make as much money as it can. And it made 500% in returns in one single day. So as you can see here, it absolutely obliterated the entirety of its competition. It beat GPT-5. It beat the latest Claude model. If you look at this chart, it is just completely vertical. And what I loved about this, Josh, was the strategy that it employed.
Starting point is 00:02:15 It was completely different from the other AI models, and it was super smart what it did. Initially, it looked at the macro market and realized that things weren't looking too great. So it went short on the entire market, Josh. It went 20x short BTC, it went 20X short ETH, and made a bunch of money doing that. But then after about 12 hours of doing this, it realized that it should probably go long. And so it did a bunch of analysis through its own data. And went the opposite on its trade and ended up predicting exactly when the market was going to bounce and made a ton of money since then.
Starting point is 00:02:52 That's why this chart is vertical. I'm going to need to start putting money in AI bots. I find it really fascinating that. One, the amount of risk that it was comfortable putting on. And then two, the amount of success it actually had and how well it worked, it leads me to wonder, like if an AI model is just kind of a compression of human intelligence derived through the internet, markets are very emotionally driven engines. Is it able to kind of front run that emotion by removing the emotion?
Starting point is 00:03:19 I mean, naturally, large language models, they have no emotion, but they have all of this data about how we function. I mean, maybe this is a new investing paradigm, right? Or people can actually... I think you hit the nail on the head, and it's also my pet theory as well. The reason why Grok is the best at trading in this instance is because it has access to all of X's data, all the tweets that we put out.
Starting point is 00:03:41 So it has an idea of how the market and psychology of the market is thinking at any given moment. So it makes trades based on that. So if everyone's super bullish, maybe it's like, I'm going to short the market. When everyone's super bearish, they're like, oh, I'm going to go 20x long. on BTC or whatever asset.
Starting point is 00:03:58 That's amazing. If the xAI team wasn't working so hard on AGI, they could have a really compelling product by creating a trading engine using all the real-time data from X. This is done on Hyperliquid, which is a crypto trading platform. So it's interesting to see that
Starting point is 00:04:11 there is crypto integrated in this. And speaking of crypto, we do have a sponsor today that we do need to get to, one that is crypto-adjacent, named KGEN. KGEN is building the world's largest verified distribution protocol, aka Verify.
Starting point is 00:04:23 What is Verify? It focuses on ensuring only real, high-quality users participate in digital platforms, addressing issues like fake accounts, bots, and fraudulent activity. How? This protocol uses advanced biometrics and fraud protection technologies to block fake users. The system is essential for AI and consumer app developers who require high-quality, fraud-free data and engagement to train models and expand their products globally.
Starting point is 00:04:45 They're already used by over 200 clients across AI, gaming, DeFi, and consumer apps. So if you need to train your AI on real data, check out KGEN. We will have a link in the show notes. Thank you, KGEN, for sponsoring the show. And now let's talk about more real AI. They gave $10,000 to each one of these models. Right? So that's how this was starting. So how many models are there? How much money is in play in total? And then where is everyone at currently in this race? Yes. So to give a bit more background, this new AI lab gave six AI models, all models that you've heard of before. We're talking about Grok 4, we're talking about Anthropic's Claude. We're talking about ChatGPT 5 and some Chinese models, some competitors, Qwen and DeepSeek, $10,000 each. And with the specific goal of making as much money as they can through trading only six assets. So in this initial part of the competition, they're allowed to trade for three weeks, six assets only. And as you mentioned earlier, Josh, they're trading specifically on a trading platform called Hyperliquid,
Starting point is 00:05:43 which gives them access to do all these things. And Josh, I think aside from some of the returns that these models have made, the website is also super impressive because I've just been glued to it for the entire weekend. It is this really interactive thing where you can kind of see the performance of models. And it's pretty clear who's doing well at this point and who isn't. For those of you who are just listening, I'm circling three models right now, which is DeepSeek, which is right at the top with $12,000, almost $13,000. Remember, they started with $10,000. So that's a pretty impressive return for over two days so far. Right beneath it, containing it is Grok coming in at $12,500.
Starting point is 00:06:26 And catching up, a surprise winner from this morning actually, Anthropic's Claude, which was originally down and underperforming, is catching up to Grok just beneath it at $12,200 at this moment of speaking. Do you have any initial takes on this? And do you have any explanation as to why ChatGPT and Google Gemini have lost almost $3,000 today? Well, ChatGPT, it's still thinking of its next trade. It'll be thinking for the next day,
Starting point is 00:06:52 but still figuring it out. ChatGPT really loves to think really long and hard about its decision. So we'll hear more from ChatGPT in a day from now. It's funny. If I were to pick the top three, without seeing this chart, I probably would have picked Grok, I would have picked DeepSeek, and I would have picked Claude. And the reason is because, well, one, Grok is unhinged, super high risk tolerance, super, like, large access to current trends, rip and trades. DeepSeek, which I saw earlier, they had like this crazy leverage trade put on. Like, DeepSeek is just like the Chinese degen model. And what I like about these models is they're not really filtered. When you talk to Grok, when you talk to DeepSeek, they just kind of give you the answers.
Starting point is 00:07:34 They don't kind of fill it up with all of this, like, you know, additional filler and complimentary words. And Claude is the most technical of all of these. Like when people use Claude, when people think of Anthropic, they think of code. And code, I mean, it just seems like it maps better to investing than general purpose large language models. I would have predicted ChatGPT to be low, just because ChatGPT is so, "Oh, you're so right. Oh, the market should go up. Oh, oh, I'm down 20 percent. It's fine. Like, we're going to be okay. We'll make it up." It's just not like a, it's not who I want to go to to seek truth, I guess, is a good way of framing it. Now, Gemini in last place, I got to say I'm
Starting point is 00:08:09 a little disappointed because, I mean, we've been very big Google bulls recently, and seeing Gemini all the way down there at the bottom, uh, it does make me sad. I wonder, I don't really have a good explanation of why they're at the bottom. But it is interesting to see, if we're comparing countries, Qwen and Alibaba are first and fourth place, and the U.S. is getting two, three, five, and six. So nice little distribution there. I don't know. Do you have any gut takes, initial reactions on the placing of these? That's really interesting how you ascribed a personality to each of these, Josh. I would agree, but I actually have some disagreements with you. I would have never expected Claude to be up there
Starting point is 00:08:50 because I always thought Claude was kind of a narc. Like it's just this, like, nerd that doesn't kind of really pay attention, tries to follow, like, the rules too much. And to your point, like, Grok is kind of like the rebel, right? It's kind of like a high risk taker. Totally unhinged. Quite unhinged.
Starting point is 00:09:06 I don't see Claude kind of as that It seems to me as a rule follower But your point around coding kind of makes sense Like I'm trying to now imagine if Claude was a human it would be some kind of quantitative trader, some Algo trader that works at like a hedge fund. I completely get that. The chat GPT thing makes so much sense.
Starting point is 00:09:25 It's agreeable. It's highly sycophantic. If I say, oh my God, I really fancy jumping out my window. And for those of you who don't know, I'm like on a high floor of a building right now. It would be like, yeah, you know what? That sounds like a really good idea. So it's a sheep.
Starting point is 00:09:40 It makes sense, right? To my earlier example, like if it sees everyone on X being super bullish, it's probably like, oh my God, I should go 20x long. And ironically, that's what it did when the markets were tanking. And it, like, lost a bunch of money by doing that. So I think that there's something interesting here. DeepSeek being number one makes a lot of sense to me. You know why, Josh?
Starting point is 00:10:02 Why is that? It was a hedge fund that created DeepSeek. Do you remember? It was a hedge fund team that developed the DeepSeek model. So it actually doesn't surprise me at all. If I'm probably not mistaken, this model was probably trained on a ton of quantitative trading historical data. So it doesn't surprise me at all. In fact, I expected it to win.
Starting point is 00:10:26 Interesting. Okay. One of the things that is most interesting to me, actually, is the returns that they're getting this quickly. I mean, this competition, it looks like this chart started October 17th, which is not that long ago. At the time of recording today, it's October 20th. It's been 72 hours. And over the course of 72 hours, we do see these kind of patterns that are happening. But what is it?
Starting point is 00:10:48 From 10K, DeepSeek is up 30% in three days. They're averaging 10% return a day. And Grok is right behind and Claude is right behind. So they're very clearly open to risk and taking large amounts. Oh my God, look at this Grok position. It's 10x long on Ripple, 10x long on Doge, 20 times long on Bitcoin. If you're seeing this, I would never advise you to ever use this much leverage on a position. Not financial advice.
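As a side note on the arithmetic above, "30% in three days" only averages to 10% a day if the daily gains are added rather than compounded. The sketch below is purely illustrative: the function names are hypothetical and the 30% and three-day figures are the ones quoted in the conversation, not data pulled from the competition.

```python
# Illustrative arithmetic only; names are hypothetical and the inputs are the
# figures quoted in the episode (30% total return over 3 days).

def simple_daily_return(total_return: float, days: int) -> float:
    """Average daily return if gains are simply divided across the days."""
    return total_return / days


def compounded_daily_return(total_return: float, days: int) -> float:
    """Constant daily return that compounds to the same total return."""
    return (1.0 + total_return) ** (1.0 / days) - 1.0


total = 0.30  # up 30%
days = 3

print(f"simple average per day:  {simple_daily_return(total, days):.2%}")
print(f"compounded rate per day: {compounded_daily_return(total, days):.2%}")
```

The compounded daily rate comes out a little below the simple 10% average, because each day's gain builds on the previous day's larger balance.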
Starting point is 00:11:14 It's really fascinating to see, though, because you would imagine there is some sort of game theory optimal trading strategy that they're running in their systems based on what they know. And it looks like almost all of these are long using leverage. So it's interesting to diagnose the positions. Well, should we maybe dig into, like, some of the trading strategies that these guys are doing, Josh? Because you bring up a good point. Like, who's taking risk and who's analyzing the market versus who's playing kind of conservative. So let's run through a few tweets that we have here, which kind of like have takeaways from this. So in this initial tweet,
Starting point is 00:11:52 it highlights that Grok and DeepSeek, who are currently number two and number one in this competition, went max long immediately when the competition started. That means they went like 10 to 20x leverage, which is extremely high leverage and a high risk bet to make, with most of their notional capital. So most of their 10K that they used, they're just like, I'm going to slam this in a 20x long for BTC and for ETH and for XRP. And today's rally has put them in the lead. So, hey, no risk, no reward. And they just kind of like went full in. GPT and Gemini, interestingly, went the complete opposite. They went hard, but they shorted instead of longing. And this might be for a few reasons, Josh. So what this tells me initially is I think Grok and DeepSeek are trained
Starting point is 00:12:40 on better financial data and are better at evaluating markets for trading than GPT and Gemini. So GPT and Gemini might be great generalized models, but what this reminds me of is they haven't been given access to social media data like X and maybe DeepSeek. I remember reading DeepSeek was trained on a bunch of X and Reddit data as well. So they probably have an idea of the psychology of a trader's mind, whereas it's kind of obvious to me that GPT and Gemini kind of don't. I'm really curious on the technicals of how they're actually ingesting information and making trades. Is there some guy who's just typing like commands into a terminal like, hey, here's the current price.
Starting point is 00:13:21 Let me know if you want to make any trades. Like, what is it? I'm curious what's actually going on behind the scenes because they are language models. They do need to be getting some sort of real-time data, right? I wonder what that prompting structure looks like in order to initiate them to make these trades and make these decisions. There is one side thing that I also find interesting, which is that ChatGPT and Gemini both chose to short right off the bat. And I wonder if that has anything to do with their, just like their emotional sentiment. Whereas like, does that lead to
Starting point is 00:13:50 like a model that is slightly more conservative, slightly more like reserved in how they deploy capital and how they make these decisions versus the others that just went max long right away? They're just like, no, we're going up. We are positive, optimistic. I wonder if it's a reflection of kind of like the sentiment of the model as well, which is another basis for speculation. Josh, that's a really interesting take because that just reminded me of an experiment that we spoke about on an episode. I think it's like three months ago, which is an eternity ago, Josh, where a bunch of researchers ran an experiment similar to this, but the goal was you have a group project to pitch to me how you're going to make money. So they weren't given money. They were just
Starting point is 00:14:37 asked to make a presentation or an argument as to how they would make money, similar to like a university or college group project that you're asked in like financial business school, right? And we saw the opposite behavior happen with these models back then. So GPT and Gemini were actually the most proactive. They did all the research. They did a bunch of analysis. And they created this really kind of like primrose perfect looking framework and theory of how they were going to do it. Meanwhile, I remember reading that Grok and, I don't know whether there were some Chinese models, but I remember Grok was kind of just like chilling, didn't really do much, and it kind of kept on putting off the work. So it's this kind of interesting thing where when it comes into
Starting point is 00:15:22 the actual practicality of the task, it seems that Grok and these Chinese models are way more kind of like gung-ho and they want to do the thing versus plan and strategize and write like some kind of theory around it. I don't know if that's relevant at all to this, but I just find it interesting when it comes to like the behavior and personalities that you talk about. Yeah, and there's got to be some significant differences in the prompting of them too because I'm going through these positions and every single model right now has leverage of at least 10x. Like no model here has less than 10x leverage on a single position. So they're using this outrageous amount of risk and I have to imagine it's not because they're
Starting point is 00:15:57 trained that way because I imagine most of the average people, they're not trading on 10x leverage. So I wonder what type of prompting happened in the back end to compel them to want to use this level of risk at all times. But it's a fascinating experiment, I guess, playing around with these benchmarks and seeing different ways to do it in public. Like I like, one, that there's skin in the game that I mentioned earlier, but I also like the fact that it's just publicly verifiable and not gameable because you're competing in public open markets with everybody else,
Starting point is 00:16:25 and your AI really has to assert dominance over other humans who are live and thinking versus this controlled math set that has a very fixed array of outcomes. And the dynamic thinking really is it's something noteworthy. And I think this is a really cool trend that I hope more people do. It's just public benchmarks that are much more difficult to game than others. Yeah, I couldn't agree with you more. I've said for a long while now that AI shouldn't just be your favorite knowledge worker or your assistant that teaches you about the world, it should do things
Starting point is 00:17:00 for you. And the most practical thing for most people is like, how do I survive? How do I make money from my living? Like, can you help me make more money? And finance has always appealed to me, and my background is in crypto. Josh, you and I have covered that topic for a while now. And so seeing something like this kind of like gets me really excited, you know, like what other AI tools are out there that can give you a 42% return over the weekend, Josh, which is exactly what DeepSeek did this weekend. It's just kind of insane. Another thing that I find really interesting is just kind of like some of the stats that come from this. I've got a tweet pulled up here, which kind of gives you the overview of some of the strategies that these models are doing. So I'm going to read through a few here.
Starting point is 00:17:45 Number one, leverage has been normalized across all models. That's what you were just saying, Josh, high leverage in particular. With Gemini 2.5 Pro going ham on leverage yesterday, 15x or so, which is just absolutely insane. Qwen 3 Max, which is a Chinese model, and Gemini 2.5 Pro don't seem to be the best executors. They're paying a ton of fees for their positions. So what that means is when you open up a leverage position, you're often paying fees back to the platform as kind of like a tax or stamp duty for opening your position. Qwen 3 Max and Gemini 2.5 Pro are also the only ones with closed positions that are profitable, but they also lost too much on other trades, causing them to still be in a loss.
Starting point is 00:18:30 So what it's talking about here is when you open up a position for leverage, you can close it for either a profit or a loss. The winners or the leaders right now, DeepSeek and Grok, haven't closed. They just went max long from the start, and they haven't relented yet. They haven't closed. So what I want to point out there is they could still lose it all. They haven't booked in that profit at all. So I don't know. I've kind of noticed this could all change in a matter of an hour, Josh. What do you think? Okay, I want to, as we kind of wrap this up, I want to place bets. I want us to place our own bets because nobody's been liquidated yet. And with the amount of risk they're taking, it is inevitable. It is only a matter of time. So I want to ask you,
Starting point is 00:19:07 first, who is the model you think most likely to be liquidated? Because, yeah, we're looking at a post here where Claude was taking a max long position. I think it's going to be Grok. I'm being honest with you. I don't want to say it, but I think it's going to be Grok. The leverage it's using... Here's my reason. The leverage it's using is insane. It is more extreme than DeepSeek, which is number one. So to me, as a trader myself, that seems to me like a revenge trading method. It's mad. It's going ham and it is like driving fast. It's drunk. It might wrap around a telephone pole. I don't know, but it's going for it. So it might result in a major win, but it might result in a major loss. So I'm going with Grok. Grok's going to get liquidated.
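The point made just before this, that the leaders "haven't booked in that profit" because their positions are still open, is the difference between unrealized and realized profit and loss. The sketch below is a simplified illustration: the `Position` class, its field names, and the prices are all hypothetical, and it ignores fees, funding payments, and leverage.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of one position. Not Hyperliquid's API and not
# data from the competition; fees, funding, and leverage are ignored.


@dataclass
class Position:
    side: str           # "long" or "short"
    entry_price: float  # price at which the position was opened
    size: float         # number of units of the asset

    def pnl_at(self, price: float) -> float:
        """Profit or loss if the position were closed at `price`."""
        direction = 1.0 if self.side == "long" else -1.0
        return direction * (price - self.entry_price) * self.size


pos = Position(side="long", entry_price=100.0, size=2.0)

# While the position stays open, the gain is only on paper and moves with price.
print("unrealized at 110:", pos.pnl_at(110.0))  # paper gain
print("unrealized at 95: ", pos.pnl_at(95.0))   # the same open position, now a paper loss

# Closing at a given price is what locks the number in.
realized = pos.pnl_at(110.0)
print("realized if closed at 110:", realized)
```

An open long that looks like a winner at one price can turn into a loss an hour later, which is why a leaderboard built on open positions can reorder quickly.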
Starting point is 00:19:48 That's a good take. I think since you're going with Grok, I'll go with DeepSeek. I just got to go for the crazy, highly volatile positions. They're taking just outrageously long leveraged positions and like something's going to go wrong. Like just two weeks ago, what, the crypto market dropped. Like we liquidated 30 something billion dollars in like an hour. These things happen and they move frequently. And granted, an AI is running this. So I assume there's not a lot of latency between their decision and the trade.
Starting point is 00:20:15 But like, this is a huge amount of leverage to be locked on in every single position. Like if one of these coins gets wiped for like 10% even, you've lost 100% of your money. So I'll probably go with DeepSeek. Over the next, let's say month, Ejaaz, who's your winner? Who do you think is the best longer term trader? This might be a wild card of a guess. But I actually think it might be Anthropic. And as I'm saying this, it is about to overtake.
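The "10% move wipes you out" remark is the standard back-of-the-envelope rule for a 10x position: the loss on your margin is roughly the leverage multiplied by the adverse price move. The sketch below is a simplified illustration with hypothetical function names. It ignores fees, funding, and maintenance margin, and real exchanges typically liquidate somewhat before the full wipeout point.

```python
# Simplified leverage arithmetic; function names are hypothetical. Fees, funding,
# and maintenance margin are ignored, so real liquidations happen earlier.

def margin_loss_fraction(leverage: float, adverse_move: float) -> float:
    """Fraction of posted margin lost for a given adverse price move (capped at 100%)."""
    return min(1.0, leverage * adverse_move)


def wipeout_move(leverage: float) -> float:
    """Adverse price move that erases the whole margin under this simplified model."""
    return 1.0 / leverage


for lev in (10, 15, 20):
    print(f"{lev}x leverage: margin gone after a {wipeout_move(lev):.1%} move against you")

print(f"10x with a 10% drop loses {margin_loss_fraction(10, 0.10):.0%} of margin")
```

Under this simplified model, a 20x position needs only half as large a move as a 10x position to lose everything, which is why the hosts treat the highest-leverage models as the likeliest to be liquidated.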
Starting point is 00:20:46 It's about to overtake Grok 4. And here's my reasoning behind this. When it started off in this competition, it was doing terribly, Josh. It was underperforming. Not as bad as GPT or Gemini, who were like further down in the chart, but it was doing bad. But then it learned. It did some analysis and it closed its positions that were already in a loss and thought, okay, I'll take that loss and let me try this new strategy. And it's been working heavily in its favor. So it's demonstrated two things to me. One, it's able to adapt and learn. Grok and DeepSeek haven't demonstrated that. They're just on a lucky streak right now.
Starting point is 00:21:19 They haven't taken any Ls. Let's see how they perform under duress, right? And then number two, Claude is willing to take risk when it's learned from its lesson and go hard. So I think maybe over time it might end up winning. What about you? Okay. I'm going to go with Qwen, who we did not mention much this episode because Qwen just kind of pegged the middle. And in fact, I don't think we really talked much about Qwen at all. Dude, it has one position open. There. So let me explain my why. So, so Qwen has been like kind of in the middle.
Starting point is 00:21:51 They haven't done much. We don't really know much about Qwen. Like we know about a lot of these other labs and how they work. Qwen's just kind of like this, you know, this middle of the road model. But what I've noticed in observing the positions, like you were about to highlight, Ejaaz, is that it selects one position and it goes mega long on that one position. So just before we started recording, I was checking through Qwen and Qwen had a 20x long on Ether, all in one position, all the money on a major coin.
Starting point is 00:22:19 It sold that. It went to Bitcoin. Now it is going 10x leverage on a very large position. So what I like about Qwen and what I've observed is where a lot of other models are kind of spreading across a basket with lots of leverage, Qwen is very hyper fixated on just the majors, just Ethereum, just Bitcoin. Back and forth, I've been seeing it trade. And that to me seems much more sustainable than going max long on something like Ripple that can just wipe you out in like a couple of minutes. So I'm going to go with Qwen for the winner, the sleeper pick for our little model trading. And we'll have to check in in a couple weeks or a month and see how these things do.
Starting point is 00:22:52 Does GPT or Gemini dig itself out of this hole? It is currently down $3,000. I want Gemini to. I'm not sure I want ChatGPT to. Wait, sorry, what's your reasoning behind that? Why? Because I want, I want ChatGPT, I want the world to know that ChatGPT is being a weak-minded,
Starting point is 00:23:12 sycophantic, like, suck-up. And it's soft. And I want ChatGPT to be harder. I want it to be more direct. I want it to tell me the truth as it is. And I want that to be reflected in the chart. So that's just mostly me just virtue signaling that I want, I want ChatGPT to be a little more serious,
Starting point is 00:23:29 a little harder on the edges, you know? I think Gemini's got the dog in it. I don't think ChatGPT does. That's fair. We'll just have to wait. We'll have to wait and see. But this is fun to watch. It is.
Starting point is 00:23:40 And I want to hear what you guys, the listeners, say about this. Like, who's your dog in the race? Like, who do you think's going to win? Do you disagree with us? Right now, it seems like my bet might be paying off. Claude's coming through. But if you have any kind of difference of opinions, let us, let's hear it. Maybe you know how these things are trained. Maybe you can train a better model. Let us know. Before we wrap this episode up, Josh, I just want to shout out, just kind of going back to nerdy mode for a second, how cool it is that this thing is built on an open source stack. We mentioned earlier that it's using an app called Hyperliquid. That's basically a blockchain.
Starting point is 00:24:15 So anyone and everyone can get access to it and see the trades that these accounts are making, that these AI models are making. And if you want to copy trade them, that's not financial advice. I'm not saying that. You can. You know, you can see the data if you don't believe this website. If you don't believe the tweets that you read, you can go and check that data for yourself and see what kinds of trades they are making. So super cool. The fees seem to be cheaper than using an average trading system and you can use it 24-7. This is not a shill. We are not sponsored by Hyperliquid. I just think it's really cool that they're using an open source stack finally without seeing anything bad being said about crypto. Yeah, it's nice to see that they're testing these things in public,
Starting point is 00:24:51 and in a way that's verifiable and in a way that's not gameable. The biggest complaint that we always have with benchmarks is that they're gameable. You can very much program your model to perform better at these different benchmarks, but this is the real world with real markets and real people and real emotions, and they are forced to navigate a world that is not confined to a black box and instead has a lot of depth and a lot of volatility. So it's exciting to see this trend of new and creative ways to benchmark. This is particularly fun because you get to watch the charts and hopefully make some money off of it. And I look forward to, listen, I look forward to using Grok one day to hopefully advise my trading portfolio. That'd be pretty cool. So we'll see how it goes.
Starting point is 00:25:30 This was super interesting. Ejaaz, any final thoughts before we take off here? Nope, I hope we make a bunch of money in the future. That is all. I hope so too. I hope so too. And I hope everyone who's listening to this got a little kick out of it. I'd love to hear, like Ejaaz said, who do you think is going to win. But also why? You need to include why. You can't just say who. You have to say why because I want to know the reasoning. And then I guess, yeah, who's going to be the best model? That's going to be fun. We'll circle back in a couple weeks and we'll check in on this experiment. But that is everything for today. Thank you for watching as always. We appreciate all of the new ratings and reviews, and the comments have been so overwhelmingly nice and positive.
Starting point is 00:26:05 You guys have been kicking off. Yeah, seriously. Thank you so much for the support. It really goes a long way. We are slowly climbing up the leaderboards and it's all thanks to you. So if you enjoyed, please don't forget to like and subscribe. Share it with a friend who you think might be interested. Share it with a friend who did really bad in investing and maybe needs some help.
Starting point is 00:26:21 And maybe, just maybe they can use an AI and pretend like it's them and take all the credit. So that's it. That's another episode of the AI roll up, roundup, whatever we want to call this thing. but thank you for listening. I appreciate it. And we'll see you guys in the next one.
