Limitless Podcast - They Gave AI Models $10,000 Each. Then They Had a Trading Competition.

Starting point is 00:00:03 Imagine you went to sleep one night and woke up to your investment portfolio up 500%, not because you're a genius stock picker, but because you let an AI model trade for you. Well, this week, that's exactly what happened. Someone gave Grok and five other frontier AI models $10,000 each with one specific goal in mind: make as much money as you can through trading. And Josh, some of the returns these models have made are insane. They're beating some of the top hedge funds in the world. So in this episode, Josh and I are going to cover how these AI models are trading.

Starting point is 00:00:36 Most importantly, what trades specifically they're making. And most importantly, is AI finally smart enough to make us all rich? Josh, who's behind this experiment? And are we getting ripped? In addition to this, I do want to highlight an important thing that this is doing: there is a new benchmark in town. I think frequently we talk about how we are just so sick of benchmarks. They can be gamified.

Starting point is 00:00:56 They can be rigged. What's happening here is a whole new paradigm for benchmarking, which is putting your money where your mouth is, getting some exposure, doing so publicly, and actually using these models to invest in public markets and seeing how they perform. And that's what makes this company, Nof1, so special: their whole purpose is to build a new paradigm for benchmarking by allowing you to do so in public, with the second-order effect of possibly making you a little bit of money. So there's a lot of really interesting dynamics at play here. The website's really pretty. I've been going through all of this stuff after you sent this to me, Ejaaz, but one of the

Starting point is 00:01:28 things that was most shocking to me was the actual returns that they were able to get, starting with, I mean, one of our favorites, Grok. Grok has this crazy return. What, how did it get 500%? Is that right? Yep, yep. So what I'm about to tell you is crazy. The Grok AI model was given 200 bucks in this little pre-training experiment to trade and make as much money as it can. And it made 500% in returns in one single day. So as you can see here, it absolutely obliterated the entirety of its competition. It beat GPT-5. It beat the latest Claude model. If you look at this chart, it is just completely vertical. And what I loved about this, Josh, was the strategy that it employed. It was

Starting point is 00:02:15 completely different from the other AI models and it was super smart what it did. Initially, it looked at the macro market and realized that things weren't looking too great. So it went short on the entire market, Josh. It went 20x short BTC, it went 20x short ETH, and made a bunch of money doing that. But then after about 12 hours of doing this, it realized that it should probably go long. And so it did a bunch of analysis through its own data and went the opposite on its trade and ended up predicting exactly when the market was going to bounce and made a ton of money since then. That's why this chart is vertical. I'm going to need to start putting money in AI bots. I find it really fascinating: one, the amount of risk that

Starting point is 00:02:59 it was comfortable putting on. And then two, the amount of success it actually had and how well it worked. It leads me to wonder, like, if an AI model is just kind of a compression of human intelligence derived through the internet, and markets are very emotionally driven engines, is it able to kind of front-run that emotion by removing the emotion? I mean, naturally, large language models have no emotion, but they have all of this data about how we function. I mean, maybe this is a new investing paradigm, right? People can actually start making money. Josh, I actually think you hit the nail on the head and it's also my pet theory as well. The reason why Grok is the best at trading in this instance is because it has access to all of X's data, all the tweets that we put out. So it has an idea of

Starting point is 00:03:43 how the market and the psychology of the market is thinking at any given moment. So it makes trades based on that. So if everyone's super bullish, maybe it's like, I'm going to short the market. When everyone's super bearish, it's like, oh, I'm going to go 20x long on BTC or whatever asset. That's amazing. If the xAI team wasn't working so hard on AGI, they could have a really compelling product by creating a trading engine using all the real-time data from X. This is done on Hyperliquid, which is a crypto trading platform. So it's interesting to see that there is crypto integrated in this. And speaking of crypto, we do have a sponsor today that we do need to get to. That is crypto-adjacent, named KeyGen. KeyGen is building the world's largest verified distribution protocol,

Starting point is 00:04:22 aka Verify. What is Verify? It focuses on ensuring only real, high-quality users participate in digital platforms, addressing issues like fake accounts, bots, and fraudulent activity. How? This protocol uses advanced biometrics and fraud protection technologies to block fake users. The system is essential for AI and consumer app developers who require high-quality, fraud-free data and engagement to train models and expand their products globally. They're already used by 200-plus clients across AI, gaming, DeFi, and consumer apps. So if you need to train your AI on real data, check out KeyGen. We will have a link in the show notes.

Starting point is 00:04:55 Thank you, KeyGen, for sponsoring the show. And now let's talk about more real AI. They gave $10,000 to each one of these models, right? So that's how this was starting. So how many models are there? How much money is in play in total? And then where is everyone at currently in this race? Yes.

Starting point is 00:05:11 So to give a bit more background, this new AI lab gave six AI models, all models that you've heard of before. We're talking about Grok 4, we're talking about Anthropic's Claude. We're talking about ChatGPT-5 and some Chinese models, some competitors, Qwen and DeepSeek, $10,000 each. And with the specific goal of making as much money as they can through trading only six assets. So in this initial part of the competition, they're allowed to trade for three weeks, six assets only. And as you mentioned earlier, Josh, they're trading specifically on a trading platform called Hyperliquid, which gives them access to do all these things. And Josh, I think aside from some of the returns that these models have

Starting point is 00:05:50 made, the website is also super impressive because I've just been glued to it for, like, the entire weekend. It is this really interactive thing where you can kind of see the performance of models. And it's pretty clear who's doing well at this point and who isn't. For those of you who are just listening, I'm circling three models right now. There's DeepSeek, which is right at the top with almost $13,000. Remember, they started with $10,000. So that's a pretty impressive return for just over two days so far. Right beneath it, tailing it, is Grok, coming in at $12,500. And catching up, a surprise winner from this morning actually, Anthropic's Claude, which was originally down and underperforming, is catching up to Grok just beneath it at $12,200 at this moment of speaking.

Starting point is 00:06:39 Do you have any initial takes on this? And do you have any explanation as to why ChatGPT and Google Gemini have lost almost $3,000 to date? Well, ChatGPT, it's still thinking of its next trade. It'll be thinking for the next day. So it's still figuring it out. ChatGPT really loves to think really long and hard about its decisions. So we'll hear more from ChatGPT a day from now. It's funny, if I were to pick the top three, without seeing this chart, I probably would

Starting point is 00:07:04 have picked Grok, I would have picked DeepSeek, and I would have picked Claude. And the reason is because, well, one, Grok is unhinged: super high risk tolerance, super large access to current trends, ripping trades. DeepSeek, which I saw earlier, had like this crazy leverage trade put on. Like DeepSeek is just like the Chinese degen model. And what I like about these models is they're not really filtered. When you talk to Grok, when you talk to DeepSeek, they just kind of give you the answers. They don't kind of fill it up with all of this, you know, additional filler and complimentary words. And Claude is the most technical of all of these. Like when people use Claude, when people think of Anthropic, they think of code.

Starting point is 00:07:44 And code, I mean, it just seems like it maps better to investing than general purpose large language models. I would have predicted ChatGPT to be low just because ChatGPT is so, oh, you're so right. Oh, the market should go up. Oh, I'm down 20%. It's fine. Like, we're going to be okay. We'll make it up.

Starting point is 00:08:01 It's just not like a, it's not who I want to go to to seek truth, I guess, is a good way of framing it. Now, Gemini in last place, I got to say, I'm a little disappointed because, I mean, we've been very big Google bulls recently. And seeing Gemini all the way down there at the bottom, it does make me sad. I don't really have a good explanation of why it's at the bottom. But it is interesting to see, if we're comparing countries, DeepSeek and Alibaba are first and fourth place. And the U.S. is getting two, three, five, and six. So nice little distribution there. I don't know. Do you have any gut takes, initial reactions on the placing of these?

Starting point is 00:08:38 That's really interesting how you ascribed a personality to each of these, Josh. I would agree, but I actually have some disagreements with you. I would have never expected Claude to be up there. Because I always thought Claude was kind of a narc. Like, it's just this nerd that doesn't really pay attention, tries to follow the rules too much. And to your point, like Grok is kind of like the rebel, right? It's kind of like a high risk taker.

Starting point is 00:09:04 Totally unhinged. Quite unhinged. I don't see Claude kind of as that. It seems to me a rule follower. But your point around coding kind of makes sense. Like I'm trying to now imagine if Claude was a human, it would be some kind of quantitative trader, some algo trader that works at like a hedge fund.

Starting point is 00:09:20 I completely get that. The ChatGPT thing makes so much sense. It's agreeable. It's highly sycophantic. If I say, oh my God, I really fancy jumping out my window. And for those of you who don't know, I'm like on a high floor of a building right now, it would be like, yeah, you know what?

Starting point is 00:09:38 That sounds like a really good idea. So it's a sheep. It makes sense, right? To my earlier example, like if it sees everyone on X being super bullish, it's probably like, oh my God, I should go 20x long. And ironically, that's what it did when the markets were tanking. And it like lost a bunch of money by doing that. So I think that there's something interesting here. DeepSeek being number one makes a lot of sense to me. You know why, Josh? Because it was a hedge fund that created DeepSeek. Do you remember? It was a hedge fund team that developed the DeepSeek model. So it actually doesn't surprise me at all. If I'm not mistaken, this model was probably trained on a ton of quantitative trading historical

Starting point is 00:10:22 data. So it doesn't surprise me at all. In fact, I expected it to win. Interesting. Okay. One of the things that is most interesting to me actually is the returns that they're getting this quickly. I mean, this competition, it looks like this chart started October 17th, which is not that long ago. At the time of recording today, it's October 20th. It's been 72 hours. And over the course of 72 hours, we do see these kind of patterns that are happening. But what is it? From 10K, DeepSeek is up 30% in three days. They're averaging a 10% return a day. And Grok is right behind and Claude is right behind. So they're very clearly open to risk and taking large, oh my God, look at this Grok position. It's 10x long on Ripple, 10x long on

Starting point is 00:11:06 Doge, 20 times long on Bitcoin. If you're seeing this, I would never advise you to ever use this much leverage on a position. It's not financial advice. It's really fascinating to see though, because you would imagine there is some sort of game theory optimal trading strategy that they're running in their systems based on what they know. And it looks like almost all of these are long using leverage. So it's interesting to diagnose the positions. Well, should we maybe dig into some of the trading strategy that these guys are doing, Josh? Because you bring up a good point.
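The return figures quoted above (roughly $10,000 growing to about $13,000 over three days) can be checked with a short calculation. This is a minimal Python sketch, not anything the competition publishes: the function names are hypothetical and the dollar amounts are the rounded numbers mentioned in the conversation.

```python
def total_return(start: float, end: float) -> float:
    """Fractional gain from start to end, e.g. 0.30 for +30%."""
    return end / start - 1.0


def simple_daily_average(total: float, days: int) -> float:
    """Total return split evenly across days (what '10% a day' means here)."""
    return total / days


def compounded_daily_rate(total: float, days: int) -> float:
    """Constant daily rate that compounds to the same total return."""
    return (1.0 + total) ** (1.0 / days) - 1.0


# Rounded figures from the conversation: $10,000 start, about $13,000 after 3 days.
start, end, days = 10_000.0, 13_000.0, 3
total = total_return(start, end)
print(f"total return: {total:.1%}")
print(f"simple average per day: {simple_daily_average(total, days):.1%}")
print(f"compounded rate per day: {compounded_daily_rate(total, days):.2%}")
```

The simple average matches the "10% a day" said on air; the compounded daily rate comes out slightly lower, a little over 9%, because each day's gain builds on the previous day's balance.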

Starting point is 00:11:39 Like who's taking risk and who's analyzing the market versus who's playing kind of conservative. So let's run through a few tweets that we have here, which kind of have takeaways from this. So in this initial tweet, it highlights that Grok and DeepSeek, who are currently number two and number one in this competition, went max long immediately when the competition started. So that means they went like 10 to 20x leverage, which is an extremely, extremely high leverage and high risk bet to make with most of their notional capital. So most of their 10K that they used, they're just like, I'm going to slam this in a 20x long for BTC and for ETH and for XRP. And today's rally has put them in the lead.

Starting point is 00:12:19 So, hey, no risk, no reward. And they just kind of went full in. GPT and Gemini, interestingly, went the complete opposite. They went hard, but they shorted instead of longing. And this might be for a few reasons, Josh. So what this tells me initially is I think Grok and DeepSeek are trained on better financial data and are better at evaluating markets for trading than GPT and Gemini. So GPT and Gemini might be great generalized models, but what this reminds me of is they haven't been given access to social media data like X and maybe DeepSeek has. I remember reading DeepSeek was trained on a bunch of X and Reddit data as well.
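To make the long-versus-short distinction concrete, here is a minimal Python sketch of profit and loss on a leveraged position. It is a simplification under stated assumptions: the function name `leveraged_pnl` and the example figures ($1,000 of margin, 20x leverage, a 3% price rise) are hypothetical, and fees, funding payments, and liquidation are all ignored.

```python
def leveraged_pnl(margin: float, leverage: float, price_move: float, side: str) -> float:
    """Profit or loss in dollars for a leveraged position.

    margin     -- the trader's own capital backing the position
    leverage   -- position size as a multiple of margin (e.g. 20 for 20x)
    price_move -- fractional price change of the asset (+0.03 means +3%)
    side       -- "long" profits when price rises, "short" when it falls
    """
    if side not in ("long", "short"):
        raise ValueError("side must be 'long' or 'short'")
    direction = 1.0 if side == "long" else -1.0
    notional = margin * leverage  # total exposure controlled by the margin
    return direction * notional * price_move


# Hypothetical example: the same 3% rally seen from both sides at 20x.
for side in ("long", "short"):
    pnl = leveraged_pnl(margin=1_000.0, leverage=20.0, price_move=0.03, side=side)
    print(f"{side:>5}: {pnl:+.2f} USD on 1,000 USD of margin")
```

Under these assumptions the same rally that adds about 60% to the long's margin takes about 60% away from the short's, which is consistent with the leaderboard split described here between the models that went max long and the ones that shorted.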

Starting point is 00:13:01 So they probably have an idea of the psychology of a trader's mind, whereas it's kind of obvious to me that GPT and Gemini kind of don't. I'm really curious on the technicals of how they're actually ingesting information and making trades. Is there some guy who's just typing commands into a terminal like, hey, here's the current price. Let me know if you want to make any trades. Like, what is it? I'm curious what's actually going on behind the scenes because they are language models. They do need to be getting some sort of real-time data, right? I wonder what that prompting structure looks like in order to get

Starting point is 00:13:31 them to make these trades and make these decisions. There is one side thing that I also find interesting, which is that ChatGPT and Gemini both chose to short right off the bat. And I wonder if that has anything to do with their, just like their emotional sentiment. Like, does that lead to a model that is slightly more conservative, slightly more reserved in how they deploy capital and how they make these decisions versus the others that just went max long right away? They're just like, no, we're going up.

Starting point is 00:14:03 We are positive, optimistic. I wonder if it's a reflection of kind of like the sentiment of the model as well, which is another basis for speculation. Josh, that's a really interesting take because that just reminded me of an experiment that we spoke about on an episode. I think it's like three months ago, which is an eternity ago, Josh, where a bunch of researchers ran an experiment similar to this, but the goal was: you have a group project to pitch to me how you're going to make money. So they weren't given money.

Starting point is 00:14:37 They were just asked to make a presentation or an argument as to how they would make money, similar to like a university or college group project that you're asked to do in like financial business school, right? And we saw the opposite behavior happen with these models back then. So GPT and Gemini were actually the most proactive. They did all the research. They did a bunch of analysis and they created this really kind of like primrose perfect looking framework and theory of how they were going to do it. Meanwhile, I remember reading that Grok, and I don't know whether there were some Chinese models, but I remember Grok was kind of just chilling, didn't really do much, and kind of kept on putting off the work. So it's this kind of interesting thing

Starting point is 00:15:21 where when it comes to the actual practicality of the task, it seems that Grok and these Chinese models are way more kind of gung-ho and they want to do the thing versus plan and strategize and write some kind of theory around it. I don't know if that's relevant at all to this, but I just find it interesting when it comes to the behavior and personalities that you talk about. Yeah, and there's got to be some significant differences in the prompting of them too, because I'm going through these positions and every single model right now has leverage of at least 10x. Like no model here has less than 10x leverage on a single position. So they're using this outrageous amount of risk. And I have to imagine it's,

Starting point is 00:15:56 it's not because they were trained that way, because I imagine most average people are not trading on 10x leverage. So I wonder what type of prompting happened in the back end to compel them to want to use this level of risk at all times. But it's a fascinating experiment, I guess, playing around with these benchmarks and seeing different ways to do it in public. I like, one, that there's skin in the game, which I mentioned earlier, but I also like the fact that it's just publicly verifiable and not gameable because you're competing in

Starting point is 00:16:23 public open markets with everybody else, and your AI really has to assert its dominance over other humans who are live and thinking versus this controlled math set that has a very fixed array of outcomes. And the dynamic thinking really is something noteworthy. And I think this is a really cool trend that I hope more people follow. It's just public benchmarks that are much more difficult to game than others. Yeah, I couldn't agree with you more. I've said for a long while now that AI shouldn't just be your

Starting point is 00:16:54 favorite knowledge worker or your assistant that teaches you about the world, it should do things for you. And the most practical thing for most people is like, how do I survive? How do I make money for my living? Like, can you help me make more money? And finance has always appealed to me, and my background is in crypto. Josh, you and I have covered that topic for a while now. And so seeing something like this kind of gets me really excited, you know, like what other AI tools are out there that can give you a 42% return over the weekend, Josh, which is exactly what DeepSeek did this weekend. It's just kind of insane. Another thing that I find really interesting is just some of the stats that come from this. I've got a tweet pulled up here, which kind of gives

Starting point is 00:17:39 you the overview of some of the strategies that these models are using. So I'm going to read through a few here. Number one, leverage has been normalized across all models. That's what you were just saying, Josh, high leverage in particular, with Gemini 2.5 Pro going ham on leverage yesterday, 15x or so, which is just absolutely insane. Qwen 3 Max, which is a Chinese model, and Gemini 2.5 Pro don't seem to be the best executors. They're paying a ton of fees for their positions. So what that means is when you open up a leveraged position, you're often paying fees back to the platform as kind of like a tax or stamp duty for opening your position. Qwen 3 Max and Gemini 2.5 Pro are also the only ones with closed positions

Starting point is 00:18:23 that are profitable, but they also lost too much on other trades, causing them to still be at a loss. So what it's talking about here is when you open up a position with leverage, you can close it for either a profit or a loss. The winners or the leaders right now, DeepSeek and Grok, haven't closed. They just went max long from the start, and they haven't relented yet. They haven't closed. So what I want to point out there is they could still lose it all. They haven't booked in that profit at all. So I don't know. I've kind of noticed this could all change in a matter of an hour, Josh. What do you think? Okay, I want to, as we kind of wrap this up, because I want to place bets. I want us to place our own bets because nobody's been liquidated

Starting point is 00:19:02 yet. And with the amount of risk they're taking, it is inevitable. It is only a matter of time. So I want to ask you first, who is the model you think most likely to be liquidated? Because, yeah, we're looking at a post here where Claude was taking a max long position. I think it's going to be Grok. I'm being honest with you. I don't want to say it, but I think it's going to be Grok. Here's my reason: the leverage it's using is insane. It is more extreme than DeepSeek, which is number one. So to me, as a trader myself, that seems like a revenge trading method. It's mad. It's going ham and it is like driving fast. It's drunk. It might wrap around a telephone pole. I don't know, but it's going for it. So it might result in a major

Starting point is 00:19:44 win, but it might result in a major loss. So I'm going with Grok. Grok's going to get liquidated. That's a good take. I think since you're going with Grok, I'll go with DeepSeek. I just got to go for the crazy, highly volatile positions. They're taking just outrageously long leveraged positions and like something's going to go wrong. Like just two weeks ago, what, the crypto market dropped. Like, we liquidated 30 something billion dollars in like an hour. These things happen and they move frequently. And granted, an AI is running this.

Starting point is 00:20:11 So I assume there's not a lot of latency between their decision and the trade. But like, this is a huge amount of leverage to be locked on in every single position. Like if one of these coins gets wiped for like 10% even, you've lost 100% of your money. So I'll probably go with DeepSeek. Over the next, let's say month, Ejaaz, who's your winner? Who do you think is the best longer term trader? This might be a wild card of a guess. But I actually think it might be Anthropic.
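The "10% drop wipes out 100% of your money" remark follows from the leverage arithmetic: at leverage L, an adverse price move of about 1/L erases the margin. The Python sketch below illustrates that relationship only. The function names are hypothetical, and it assumes no fees, no funding payments, and no maintenance margin; real exchanges typically liquidate somewhat before the margin reaches zero.

```python
def wipeout_move(leverage: float) -> float:
    """Adverse fractional price move that erases the entire margin."""
    if leverage <= 0:
        raise ValueError("leverage must be positive")
    return 1.0 / leverage


def remaining_margin(margin: float, leverage: float, adverse_move: float) -> float:
    """Margin left after the price moves against the position, floored at zero."""
    loss = margin * leverage * adverse_move
    return max(margin - loss, 0.0)


# Leverage levels mentioned in the conversation.
for lev in (10, 15, 20):
    print(f"{lev}x: margin gone after a {wipeout_move(lev):.1%} adverse move")

# A 10% move against a 10x position backed by $10,000 leaves nothing.
print(remaining_margin(10_000.0, 10, 0.10))
```

By the same arithmetic, a 20x position has only half that cushion, which is why the hosts expect a liquidation sooner or later.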

Starting point is 00:20:41 And as I'm saying this, it is about to overtake. It's about to overtake Grok 4. And here's my reasoning behind this. Okay. When it started off in this competition, it was doing terribly, Josh. It was underperforming. Not as bad as GPT or Gemini, who were further down in the chart, but it was doing badly. But then it learned.

Starting point is 00:21:02 It did some analysis and it closed its positions that were already in a loss and thought, okay, I'll take that loss and let me try this new strategy. And it's been working heavily in its favor. So it's demonstrated two things to me. One, it's able to adapt and learn. Grok and DeepSeek haven't demonstrated that. They're just on a lucky streak right now. They haven't taken any Ls.

Starting point is 00:21:20 Let's see how they perform under duress, right? And then number two, Claude is willing to take risk when it's learned from its lesson and go hard. So I think maybe over time it might end up winning. What about you? Okay. I'm going to go with Qwen, who we did not mention much this episode because Qwen is just kind of pegged in the middle. And in fact, I don't think we really talked much about Qwen at all. Dude, that has one position open.

Starting point is 00:21:44 There. So let me explain my why. So Qwen has been kind of in the middle. They haven't done much. We don't really know much about Qwen, like we know about a lot of these other labs and how they work.

Starting point is 00:21:57 Qwen's just kind of like this, you know, this middle of the road model. But what I've noticed in observing the positions, like you were about to highlight, Ejaaz, is that it selects one position and it goes mega long on that one position. So just before we started recording, I was checking through Qwen

Starting point is 00:22:12 and Qwen had a 20x long on Ether, all in one position, all the money on a major coin. It sold that. It went to Bitcoin. Now it is going 10x leverage on a very large position. So what I like about Qwen and what I've observed is where a lot of other models are kind of spreading across a basket with lots of leverage, Qwen is very hyper-fixated on just the majors, just Ethereum, just Bitcoin. Back and forth, I've been seeing it trade. And that to me seems much more sustainable than going max long on something like Ripple that can just wipe you out in like a couple of minutes. So I'm going to go with Qwen for the winner, the sleeper pick,

Starting point is 00:22:47 for our little model trading. I'll have to check in in a couple weeks or a month and see how these things do. Does GPT or Gemini dig itself out of this hole? It is currently down $3,000. I want Gemini to. I'm not sure I want ChatGPT to. Wait, sorry, what's your reasoning behind that?

Starting point is 00:23:05 Why? Because I want, I want ChatGPT. I want the world to know that ChatGPT is being a weak-minded, sycophantic, like, suck-up. And it's soft. And I want ChatGPT to be harder. I want it to be more direct. I want it to tell me the truth as it is.

Starting point is 00:23:22 And I want that to be reflected in the chart. So that's just mostly me just virtue signaling that I want ChatGPT to be a little more serious, a little harder on the edges, you know? I think Gemini's got the dog in it. I don't think ChatGPT does. So we'll just have to wait. We'll have to wait and see. But this is fun to watch.

Starting point is 00:23:39 It is. And I want to hear what you guys, the listeners, have to say about this. Like, who's your dog in the race? Like, who do you think is going to win? Do you disagree with us? Right now it seems like my bet might be paying off. Claude's coming through. But if you have any kind of difference of opinions,

Starting point is 00:23:54 let us hear it. Maybe you know how these things are trained. Maybe you can train a better model. Let us know. Before we wrap this episode up, Josh, I just want to shout out, just kind of going back to nerdy mode for a second, how cool it is that this thing is built on an open source stack. We mentioned earlier that it's using an app called Hyperliquid that's basically a blockchain.

Starting point is 00:24:15 So anyone and everyone can get access to it and see the trades that these accounts are making, that these AI models are making. And if you want to copy trade them, that's not financial advice. I'm not saying that. You can see the data if you don't believe this website. If you don't believe the tweets that you read, you can go and check that data for yourself and see what kinds of trades they're making. So super cool that the fees seem to be cheaper than using an average trading system and you can use it 24/7. This is not a shill. We are not sponsored by Hyperliquid.

Starting point is 00:24:41 I just think it's really cool that they're using an open source stack, finally, without seeing anything bad being said about crypto. Yeah, it's nice to see that they're testing these things in public, and in a way that's verifiable, and in a way that's not gameable. The biggest complaint that we always have with benchmarks is that they're gameable. You can very much program your model to perform better at these different benchmarks, but this is the real world with real markets and real people and real emotions, and they're forced to navigate a world that is not confined to a black box,

Starting point is 00:25:12 and instead has a lot of depth and a lot of volatility. It's exciting to see this trend of new and creative ways to benchmark. This is particularly fun because you get to watch the charts and hopefully make some money off of it. And I look forward to, listen, I look forward to using Grok one day to hopefully advise my trading portfolio. That'd be pretty cool. So we'll see how it goes. This was super interesting. Ejaaz, any final thoughts before we take off here? Nope. I hope we make a bunch of money in the future. That is all. I hope so too. I hope so too. And I hope everyone who's listening to this got a little kick out of it. I'd love to hear, like he just said, who do you think is going to win? But also, why? You need to include why.

Starting point is 00:25:47 You can't just say who. You have to say why because I want to know the reasoning. And then I guess, yeah, who's going to be the best model? That's going to be fun. We'll circle back in a couple weeks and we'll check in on this experiment. But that is everything for today. Thank you for watching as always. We appreciate all of the new ratings and reviews, and the comments have been so overwhelmingly nice and positive. You guys have been kicking off. Yeah, seriously. Thank you so much for the support. It really goes a long way. We are slowly climbing up the leaderboards and it's all thanks to you. So if you enjoyed, please don't forget to like and subscribe. Share it with a friend who you think might be interested.

Starting point is 00:26:17 Share it with a friend who did really bad in investing and maybe needs some help. And maybe, just maybe, they can use an AI and pretend like it's them and take all the credit. So that's it. That's another episode of the AI roll-up, roundup, whatever we want to call this thing. But thank you for listening. I appreciate it. And we'll see you guys in the next one.

