Bankless - AI Finds 70% of Smart Contract Exploits | Alpin Yukseloglu
Episode Date: March 5, 2026

AI is getting dangerously good at smart contract security. Faster than crypto is ready for. Alpin Yukseloglu joins Bankless to break down EVMBench (built with OpenAI), a benchmark testing whether AI agents can detect, patch, and exploit real fund-draining bugs, and why the jump from ~12–13% exploit-finding to 70%+ could rewrite today’s security assumptions. We unpack what that “70%” really means, why crypto’s verifiability is an ideal training ground, why AI labs haven’t prioritized crypto data yet, and what a 24/7 blackhat vs whitehat AI arms race means for DeFi.

---

📣 SPOTIFY PREMIUM RSS FEED | USE CODE: SPOTIFY24
https://bankless.cc/spotify-premium

---

BANKLESS SPONSOR TOOLS:

🔮 POLYMARKET | #1 PREDICTION MARKET
https://bankless.cc/polymarket-podcast

🪐 GALAXY | INSTITUTIONAL DIGITAL FINANCE
https://bankless.cc/galaxy-podcast

⚡ EUPHORIA | REAL-TIME ONE-TAP TRADING
https://bankless.cc/euphoria

🌐 BRIX | EMERGING MARKET YIELD
https://bankless.cc/brix

🏅 BITGET TRADFI | TRADE GOLD WITH USDT
https://bankless.cc/bitget

🎯 THE DEFI REPORT | ONCHAIN INSIGHTS
https://thedefireport.io/bankless

---

TIMESTAMPS
0:00 AI’s exploit leap: 12% → 70% and the “superhuman auditors”
7:02 Staring at the singularity without losing your mind
10:31 Agency » doom: the Thiel framing
19:10 What’s most at risk (and what’s safer)
23:37 What EVMBench actually is (benchmark + harness)
27:03 Why exploiting is the key: killing false positives
29:24 AI gets “good at crypto” fast: verifiability
30:56 What “70% exploit rate” really means
33:32 Why AI labs avoided crypto (it’s not technical)
43:38 Blackhat vs whitehat: how the race plays out
47:21 Agents and “payments at the speed of light”
51:02 EVM vs Solana: network effects
56:18 AI formal verification as an endgame
58:06 EVMBench V2: expanding the frontier
59:54 Why Alpin stays in crypto

---

RESOURCES
Alpin Yukseloglu
https://x.com/0xalpo

EVMBench
https://paradigm.xyz/evmbench

---

Not financial or tax advice.
See our investment disclosures here: https://www.bankless.com/disclosures
Transcript
Bankless Nation, we are here with Alpin Yukseloglu. He is an investment and research partner at Paradigm.
Also the co-author of the paper titled EVMBench, an open benchmark for smart contract security agents, written in collaboration with OpenAI to measure the ability of AI agents to detect, patch, or exploit smart contract vulnerabilities.
We're going to talk about the way that AI and AI capabilities are going to impact our crypto ecosystem, our smart contracts.
Alpin, welcome to Bankless.
Hi, thanks for having me.
I want to start off this podcast with a very big question.
How at risk are we from AI?
How large of a threat does AI smart contract capabilities pose to our industry?
Yeah, I mean, in the long term, it's now increasingly clear that AI is going to be extremely, extremely good for crypto, especially on the security front, because we're going to get to a world where, because everything is much more secure, the ceiling on the industry is much higher.
So our partner Matt talks about how, if you have a grocery store that's run by a mom and pop, because they can't see everything in the store, there's a limit to how big they can get. But the moment you add security cameras in, that changes. So security has this effect of increasing the carrying capacity of an industry.
I think in the short term, it's up to us because the models are getting extremely good,
like strikingly good.
When we started working on EVMBench, which is a benchmark that consists entirely of
fund-draining critical bugs, around six months ago,
the models were able to find less than 20% of the bugs,
like around 12 to 13%.
And just over the course of while we were working on the benchmark,
this number went up to over 50%.
And in between when I drafted the launch tweet and when I had to actually hit send, with the release of GPT-5.3 Codex, it jumped up to over 70%.
So these things are just growing at a blistering pace, and it's very important that we position the industry in a way that we can
defensively protect against attacks. But in the long term, I think it massively increases the
carrying capacity of crypto. Yeah, I think what you're saying is in the long term, we get something
approaching perfect security. Yeah, right now we do not have perfect security. Let me ask you the same
question, but a little bit differently. Say only bad actors, only black hat actors have access to
AI capabilities.
In that context, how at risk is our industry?
Like, how exploitable are our smart contracts given the increase in AI capabilities?
Yeah, I mean, I think it's really hard to say when we approach super intelligence levels.
I do think, until we hit that point, like, right now the models are quite good, but they're not
better than the best human auditors.
So we already have existed in crypto under this threat model of extremely intelligent adversarial
actors that are constantly trying to break all of our software with all the money in it.
So in that sense, like, crypto is already quite hardened.
But it's just really hard to know when we talk about sort of a technology inflecting into
superintelligence.
This is very similar to how coding capabilities were increasing mostly linearly over
the last several years.
And in December last year, they crossed some threshold where they were better than sort of
the median engineer.
And a lot of stuff clicked for everyone, and it started becoming this aha moment and this sort of oh-crap moment.
And I think something very similar will probably happen with security, where right now it's increasing pretty rapidly, but still at a linear clip.
But it's not as good as the best human auditors yet.
So we don't feel it yet.
It hasn't actually broken any of our assumptions.
But once we hit a superhuman AI auditor, in maybe six to eight months, and I'm pretty confident at this point by the end of the year, this will just completely break all of our assumptions, and we'll need to go back and make sure that we're hardening all of the contracts that are housing the nearly $100 billion of assets in crypto.
Galaxy operates where digital assets and next generation infrastructure come together, serving institutions
end to end. On the market side, Galaxy is a leading institutional platform, providing access to
spot, derivatives, structured products, DeFi lending, investment banking, and financing. With more than
1,600 trading counterparties, Galaxy helps institutions navigate every phase of the market cycle.
The platform also supports long-term allocators through actively managed strategies and institutional
grade staking and blockchain infrastructure. That scale is real. Galaxy has over $12 billion
in assets on the platform and averaged a $1.8 billion loan book in late 2025, reflecting
deep trust across the ecosystem. Beyond digital assets, Galaxy is also building infrastructure
for an AI-powered future. Its Helios Data Center campus is purpose built for AI and high-performance
computing, with more than 1.6 gigawatts of approved power capacity, making it one of the largest
sites of its kind. From global markets to AI-ready data centers, Galaxy is serving the digital
asset ecosystem end to end.
Explore Galaxy at galaxy.com/bankless or click the link in the show notes.
Euphoria brings one tap trading
to the palm of your hand. Built on MegaETH,
Euphoria takes real-time price charts
and projects them over a grid of squares.
You tap the squares that you think the price
will enter in just five to 30 seconds in the future.
If the price goes into that quadrant,
you can pocket anywhere between 2 and
100x your trade. No other
application helps you trade faster and with more
leverage on market driving events
like FOMC meetings, presidential speeches,
or global macro events.
Thanks to MegaETH's real-time blockchain,
Euphoria is the way to get real-time
price interactions with the market.
On Euphoria, you'll be able to compete with friends
using Euphoria's real-time social trading experience,
allowing you to go head-to-head with your friends.
A great party trick if you project the app on a TV.
It'll be like the Mario Party of derivatives.
To trade on Euphoria, people can deposit stablecoins from any chain or do direct fiat transfers, and everything gets converted into MegaETH's native stablecoin USDM in the background.
Check it out at euphoria.finance and download the app, or find it in Telegram as a mini app.
In 2024, emerging markets generated over $115 billion in annual yield for investors,
with yields ranging between 10 to 40%.
These are some of the highest, most persistent yields on Earth.
The problem: DeFi can't access them.
Brix changes this.
Built on MegaETH, Brix takes emerging market money markets and sovereign carry
and turns them into composable primitives you can access straight from your wallet.
While DeFi investors earn 3 to 6% on stablecoins and T-bills, institutions have been harvesting
10 to 50% yields backed by sovereign monetary policy.
Brix connects these worlds with institutional-grade tokenization, local banking rails,
compliance across jurisdictions, and real-time stablecoin settlement.
Brix does the heavy lifting so DeFi can finally access real collateral and structured
products on top of real-world yield.
Even the best carry trades can be within reach.
Brix brings DeFi's promise to the emerging world and brings emerging market yield to your wallet.
Yield flow with Brix.
If we zoom out here, though, and we think about AI intelligence and its security capabilities
and its bug detection capabilities kind of going exponential, and we think about a super intelligent
AI, I don't even know how to think about security in general because it can envision
scenarios beyond human comprehension.
For instance, what if it thinks up a way to crack some of our cryptography with some math that we didn't even know existed, right?
I heard actually Justin Drake on a podcast recently talk about this.
It's not just the threat of quantum computers, which is kind of a real known threat.
And some of our, you know, encryption algorithms are under threat due to quantum computers.
But if we have a superintelligent AI,
I mean, who knows what it could have the ability
to actually hack and decrypt?
I mean, I guess my question is,
when it comes to super intelligent AI,
is security just like not even a thing we can prepare for?
I mean, how do we even think about it?
I mean, the way I would think about it is that right now this frontier is very illegible. And if you try to do this in-the-limit thinking, you end up in very odd places that may be, you know, very psychosis-inducing. I think the capacity to face the singularity and stay sane is a very important skill to develop. And I think, you know, the best we can do right now is get ourselves onto the frontier, into this sort of experimentally bound future,
where we're running the experiments ourselves, and be ready when those inflections happen
to be able to react. Because I think the model of like, there are only bad people in the world
and they're going to have access to this technology and they're going to break all of our systems,
I think this is what leads to the sort of psychosis around like are we all just completely screwed.
But that's not what's going to happen, right? We're all going to be in there together.
And when we're in this frontier, and as we have access to these frontier models, the sort of agent harnesses around them that are able to run these exploits, that are able to, for example, find undiscovered math or that are able to break existing cryptography, there will be both sides of it.
And right now, it's not clear whether this is going to be an offense or defense favoring technology.
I will say that there are still fundamental constraints in the world.
Like you can't break laws of physics.
There are systems that are chaotic.
Like, for example, the three-body problem where, you know, the fact that even if you have
superintelligence, you can't predict too many more ticks ahead because it's just a fundamentally
chaotic system.
So I do think that there's this, you know, the world is a complicated place and there are still
physical laws and constraints that will catch these things.
And practically, the best we can do is we have to push that frontier together instead of letting it just sort of happen to us.
I think that's a fair point.
There are physical constraints here, and a superintelligent AI might appear as a god to humans, but that does not give it, you know, godlike capabilities to break the physical laws of the universe, certainly.
But actually, I want to dig into this because I got the sense that maybe you have some insight
here because I've been doing a lot of staring into the abyss of the singularity and trying
to stay sane and I don't quite have a knack for it.
Like sometimes I feel like I'm staring and I am feeling myself go a little insane.
Like I just don't feel settled about it.
Is there some wisdom you can share, just like a pattern or something? What I was starting to get out of what you just said is, well, maybe the key is to take it a step at a time and not think about the far future in the limit. Just maybe think about the next day, the next month, the next year.
How do you stay sane when you stare at the singularity?
I mean, I think the core point is agency, right?
So Peter Thiel has this framing where acceptance and denial, most people relate to them
as opposites, but in many ways they're the same thing because both of them imply that
you're sort of everything is out of your control.
If you're fully accepting that something's going to happen, then you're not doing anything about it. And if you're fully denying that something's going to happen, you're also not doing anything about it.
In that sense, I think both the doomers and the accelerationists are wrong.
And the real answer is that we have agency over these outcomes
and that you yourself can bend the arc of the future.
And I think there's a lot of comfort and then a lot of stability in that
because if you believe that you have agency over the outcome,
then somehow like you're still in control.
Now, I will say that the current frontier, and I guess you can argue the frontier's always been this way, is experimentally bound, which means we don't know; we can't sit down and theorize about what is going to happen.
Generally, the way that we're going to figure out how things are going to happen is by experimenting and then by seeing the results.
And, you know, these AI models, and also the current frontier of technology, are grown, not manufactured. It's much more organically evolved, in a way that we can't really predict today. So even just armchair theorizing about what's going to happen will drive you insane, because you can't know. It's not bound by your ability to algorithmically figure out the future. So you have to get in the trenches and try things.
Yeah. There's some stoic acceptance, the fact that you can't know everything. And so you just
have to let that go. And then there's also a belief in agency. And I guess an applied belief,
I mean, how much of agency is just sort of a blind faith versus maybe like a practical faith of doing?
How do, for someone listening in general that feels themselves looking at the singularity and it feels frozen, I guess, by like just like not knowing what to do and doesn't feel like they have a lot of control and agency, is that something that they can develop?
Or is your advice, oh, you just have to have faith.
You just have to believe that you have more agency and that will become self-fulfilling.
You will have more agency.
I think faith is good, but it's not a particularly agency-inducing headspace to be in.
I mean, when we, for example, started having the thought that agents could get extremely good at exploiting smart contracts, which we're obviously heavily exposed to, you know, about eight months ago, we could have just sat down and been like, holy crap, are we screwed? Are we not screwed? And then it turned out there's another path, which is, well, you can go figure out to what extent these things are at risk.
And then also start making headway into the labs that are actually pushing this frontier
and maybe start getting crypto integrated into them, so that, as the fog of war clears and as we start to see, for example, that there are defensive measures that we can take, we're in a position to actually exercise those paths.
And I think there's something about the doomerism, and the general in-the-limit thinking about the singularity and about superintelligence, that just captures people, because you can sit down and think about it for a long time, and it'll make you feel very strong emotions. But at the end of the day, that's not going to be the thing that actually matters. You can go build things.
You can go work with the people who are pushing the frontier.
You can get in the trenches and you can start contributing.
And the information you gain from that is going to be much more grounded than whatever one can come up with in their head.
And this is, I mean, I think many of us in crypto are used to this because a lot of crypto was like this for most of its history.
It was extremely illegible.
It was very hard to pin down exactly what the use case was going to be.
And now we're starting to see, okay, we have this store-of-value use case, we had stablecoins that are compounding at this monstrous rate every year. The industry kind of gave birth to the whole market of prediction markets, which is sort of adjacent to crypto now, but it's just compounding at this insane rate. And five years ago, it would have been very hard to say this is exactly what's going to happen. And it took some combination of faith and agency to actually go and build those things. And I think, so culturally,
this has been a very big anchor point for Paradigm, because our entire firm is built around building and researching alongside the investment. And so if you talk to anyone on the Paradigm team or in our orbit,
the sense of groundedness you get
is anchored in the fact that we're actually,
we have consistent contact with the frontier
and there's sort of,
there's comfort in that, there's stability in that,
and there's agency that comes from that.
I think it is worth understanding and reflecting on the fact that the only reason staring into the void of the singularity is intimidating is because what all of these technologies are doing is providing everyone else with agency to produce the singularity in the first place. And so if that singularity is intimidating to you, you know, do something. There's work to be done. And I think the best path forward is through. Like, if you are intimidated by everyone else having agency because of these tools, you can have agency yourself.
Well, it's a polarizing technology, right?
It's like the threshold of agency you need
to be able to execute on something is going down.
So like if you're kind of on the fence,
you get snapped to zero or to you can do a lot of things.
Or to one, yeah.
And I think in that sense, like,
if one has the intuitive sense about themselves
that they might get,
they might be on the side of that fence
where they get squeezed to zero,
that can be extremely fear-inducing.
And actually the solution to that is to be much higher openness, right, adopt the technology much faster, be much more fluid about changing and adapting to the environment.
There's, I think, the Mount Arr team had mentioned at some point that there are times when speed is more important than cohesion. And I think in the current environment we're in, because the frontier is so unknown, and so unknowable to some extent, moving fast and adapting fast has a premium over being able to sit down and figure out exactly what's going to happen and put the pin in the right place. And this is somewhat paradoxical
because the more agency you have, the more it may seem like it matters that you do the game
selection right and you sort of pick the right game to play, et cetera. But practically and empirically
what's happening is that actually it's better to move fast and ship the thing within 24 hours
of inception than it is to sit for two weeks and try to figure out the exact right way to construct it
and then try to ship it then.
And I think that probably goes all the way down
to like many parts of one's life
where we're in an era of speed over cohesion.
Yeah, yeah, we are in the Just Do Things era.
I wasn't expecting, Alpin, for this to be such a philosophical way to open this episode up.
But now I think we can kind of like corral ourselves
and point our agency towards the topic at hand,
which is what happens when people feel high agency with AI
towards the security of our smart contracts?
Is it worth talking about what kind of contracts are most at risk or least at risk?
Is there some sort of category or knowledge landscape that we can understand that when AIs have very high smart contract agency,
that we should be paying attention to certain kinds of smart contracts over others?
Is there a conversation there?
I think not in terms of market category; that's really hard to say, like, you know, a DEX contract versus a lending market.
But I think, you know, simple contracts that have been around for a long time are probably in a better position than, for example, the 200th contract deployed on Binance Smart Chain or something like that, where, you know, in the past there's been a sheltering that's happened from being in a small market.
So if you were deploying something where the most amount of money that one could make,
if they fully exploited it, was sort of in the order of low thousands of dollars,
then you were sheltered by the fact that there are just much bigger fish, and the bad actors, and even the good actors, or the people who are trading, et cetera, you're just not in their Overton window.
But as the models get better,
because the cost of inference is so much lower
than the cost of an extremely talented security researcher,
that long tail might get shaken out very quickly.
So I think most at risk is probably small-cap or low-TVL protocols on long-tail chains that are still built on well-understood stacks like the EVM and Solidity.
And then I think there's just sort of an unknowable security risk for the major contracts, the OG DeFi contracts that are currently battle-tested but are still very complicated. And, you know, we'll see over the coming year or two to what extent those contracts are actually exposed.
Right. So the OG contracts that have
had a ton of Lindy and a ton of value locked over time that have been
tested by the market are like safer in the near term,
but nonetheless the people managing those contracts
will still need to have agency to be on the defensive
to make sure that they are winning the arms race
against the offensive types.
Well, the prize is much larger for exploiting those contracts.
So there will probably be this canary-in-the-coal-mine effect, where smaller protocols that are less secure but have a lot of assets in them fall first. And I think we'll have to look out for the first exploit that happens that is almost entirely from AI, and from there, the race will be on to start taking the defensive measures necessary.
Right.
And then, like, the long tail of contracts, as you said, testing in prod will no longer be a thing, because when the cost to exploit a $1,000 contract is, like, you know, $10 to $50 of tokens, then those contracts simply won't exist. Somebody will write a bot that says, hey Claude, hey OpenClaw, go hack me some contracts,
and then that thing will actually have the capacity to do that
because the people that didn't really think too hard
about their security because they didn't need to think too hard
about the security, those people will not have a good day.
Yeah. I think this is a general trend
that the long tail will get collected by people who can use AI well.
Like, for example, you can look at something like prediction markets
where there are markets where if you trade them to perfection,
the most amount of money you can make is maybe $50 to $100.
And like, it's not worth it for Jane Street to put a quant on those markets
because it's too expensive for them.
And it's bound by the cost of intelligence and the cost of attention.
But if you can trade those markets near perfectly for 10 cents of inference,
then you'll do it.
And in aggregate, maybe that long tail is pretty valuable.
So right now we're in a world where the long tail is sheltered by the fact that it's small.
And as AI gets better in all of these different domains,
we should just be assuming that all of that's going to get collected by people who are able to use these tools.
Let's talk about EVMBench.
This is the paper, the tool?
You guys call it a tool?
Is that right?
Yeah, it's a benchmark and also an agent harness.
So we maybe had two releases in conjunction.
The first evaluates the ability of an agent to exploit smart contracts.
And then the second one is sort of an agent harness that is like, you know, similar to an auditing agent.
So it can actually find the bugs.
And obviously the agent harness that we released is sort of not at the frontier of capabilities, because we don't want it to be used by blackhats.
But we have a UI that you can upload any smart contract into
that will do sort of a baseline check for bugs.
Can you define harness?
I'm pretty sure that's a technical term
that I think coders will be aware of, but I'm not.
Yeah.
So the core idea is the model labs will release these LLMs, right? So you'll have GPT-5.3, et cetera. And, you know, you can do the baseline test of just prompting the model, just asking ChatGPT, hey, is there a bug in this contract? And then, let's say that gets you to X percent on the benchmark. In addition to that, you can add a bunch of scaffolding around the model that says, hey, for example, here is an EVM that you can test against. You can deploy a contract and actually run an exploit and see if you're able to drain the money.
And it turns out that if you give agents these tools, this scaffolding that they can sit in, they perform much better. So the harness basically holds the agent, the model, and gives it superpowers that are specialized to the task.
Now, the interesting thing about the current arc of AI is that most of these tools that we add in flake off with time, because as the model gets better, it just absorbs the harness. The core example being Tesla's Full Self-Driving: Andrej Karpathy talks about how at the beginning, the majority of the code was hard-coded and handwritten, and very quickly it ramped up to now, I think, over 50% of it just being the model. They removed all the C++ code that was saying if X, then Y, and the model just figures out a way to do it. So right now, the agent harness, quote-unquote, is in that if-X-then-Y stage, hard-coded tools that we're adding in to give it these capabilities, but probably, in the fullness of time, it'll get absorbed by the agent as it gets better.
I see.
I see.
So like the harness is kind of like a bootloader to get it started, but then eventually data will
take over, data and experience will take over the actual internal like operations of the
machine.
Yeah, exactly.
Yeah.
I mean, right now the harness is super valuable in very counterintuitive ways. Like, for example, it turns out that just giving an agent an environment to test against, even if it barely uses it, leads the agent to think for longer and try harder, and thus get better results. So there's still so much low-hanging fruit, because the agents themselves are not fully well-parameterized and calibrated yet.
But yeah, I mean, definitely we'll get to a point in time when, you know, the next version of Codex or Opus will be able to just spin up an EVM on its own, and we won't need our harness for it.
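For readers who want a concrete picture of the harness idea described here, this is a minimal, self-contained sketch: hard-coded scaffolding that wraps a model and hands it a task-specific tool, in this case a toy EVM-like sandbox it can run exploits against. All names (ToySandbox, MockModel, harness) are illustrative stand-ins, not EVMBench's actual code.

```python
# Sketch of an agent "harness": scaffolding around a model that exposes
# task-specific tools. Everything here is a toy illustration.
from dataclasses import dataclass, field


@dataclass
class ToySandbox:
    """Stand-in for a forked-EVM test environment the agent can exploit against."""
    balances: dict = field(default_factory=lambda: {"vault": 1000, "attacker": 0})

    def run_exploit(self, amount: int) -> bool:
        # A deliberately buggy "contract": withdraws without any authorization check.
        if amount <= self.balances["vault"]:
            self.balances["vault"] -= amount
            self.balances["attacker"] += amount
            return True
        return False


class MockModel:
    """Stand-in for an LLM; proposes candidate exploit parameters."""
    def propose(self, contract_source: str):
        # A real model would reason over the source; the mock just guesses amounts.
        return [1500, 1000, 500]


def harness(model, contract_source: str):
    """The harness loop: let the model propose exploits, replay each one in a
    fresh sandbox, and only report a finding that actually drained funds."""
    for amount in model.propose(contract_source):
        sandbox = ToySandbox()  # fresh environment per attempt
        if sandbox.run_exploit(amount) and sandbox.balances["attacker"] > 0:
            return {"exploitable": True, "drained": sandbox.balances["attacker"]}
    return {"exploitable": False, "drained": 0}


result = harness(MockModel(), "contract Vault { ... }")
print(result)  # the second guess (1000) succeeds and drains the vault
```

The point of the sketch is the division of labor: the model only proposes, while the scaffolding supplies the environment and the success check, which is exactly the part a better future model could absorb.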
Okay, so what does the tool actually do?
Is the tool the thing, like the agent doing the exploiting or doing the patching?
Or is it just the benchmarking?
Like, talking to me about the actual utility here.
The core release, the core thing that we want to get out in the world is the benchmark.
It's how good are the models exploiting smart contracts?
There are three components to the benchmark.
The first is the ability to detect bugs.
The second is the ability to patch bugs.
And then the third, which is sort of the most interesting and novel contribution, is the ability to exploit bugs. This addresses what has been one of the biggest problems with previous attempts at having security-related agents, for example auditing agents: the problem of false positives.
So the agent comes to you and says, I found 50 bugs in the contract.
And maybe one of those 50 is an actual bug.
But it just is so time intensive for you to go through and figure out which ones are real that it's not better than a human auditor.
And what we did in the exploit component of the benchmark is we leaned on the fact that crypto is verifiable.
And we used this production grade EVM environment where we load in a bunch of chain state and we set up a bug environment and let the agent try to exploit it.
We leaned on this to lower the false positive rate down to basically zero.
So it got to a point where, if the agent tells you that it found a bug, it literally has a proof of concept that it can run against a production-grade EVM environment and drain money from a contract.
And this is sort of the core breakthrough of the paper
is that there's a verifiable environment
that actually leads to a very low false positive rate.
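A toy sketch of the verification idea described above: of the many "bugs" an agent might report, only the ones whose exploit actually drains funds when replayed in a fresh environment count as confirmed. The functions and findings below are hypothetical illustrations, not code from the paper.

```python
# Illustrative sketch of exploit-based verification driving false positives
# toward zero: a claimed finding only counts if its proof of concept works.

def drains_funds(exploit, initial_vault=1000):
    """Replay a claimed exploit against a fresh toy environment and check
    whether the attacker balance actually increased."""
    state = {"vault": initial_vault, "attacker": 0}
    exploit(state)
    return state["attacker"] > 0


# Three findings reported by a hypothetical auditing agent:
def real_bug(state):
    # Genuine fund-draining exploit: moves the whole vault to the attacker.
    state["attacker"] += state["vault"]
    state["vault"] = 0


def false_positive(state):
    # Plausible-sounding report with no working proof of concept.
    pass


def reverting_exploit(state):
    # An exploit path that a real EVM would simply revert; no funds move.
    pass


reported = {
    "real_bug": real_bug,
    "false_positive": false_positive,
    "reverting": reverting_exploit,
}
confirmed = [name for name, ex in reported.items() if drains_funds(ex)]
print(confirmed)  # only the finding with a working exploit survives
```

This is why the benchmark number is meaningful: "found a bug" is defined operationally as "produced an exploit that moves money", so a reviewer never has to triage 50 speculative reports by hand.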
That's the actual benchmark. It's like you guys have established that you can actually measure the thing effectively.
Yeah, exactly.
Because otherwise, if someone says,
oh, we found all of these bugs
and we got 90% on this benchmark,
you don't know what it means
because you have no way of knowing
if half of those are real or fake, right?
So the verifiability ended up being very important.
I think this is one of the reasons
why models are going to get extremely good at crypto very fast.
Because basically you can slice the future
related to AI into two categories.
One is the verifiable stuff and the other is the unverifiable stuff.
And the verifiable stuff is very easy for the models to learn because they have a very clear
training signal and they know exactly when they got it right.
And they can just keep running at that and improve and climb that hill.
Whereas the unverifiable stuff, it's like you don't know, there's no tests.
You don't know if you got it right or wrong.
So it's like, are you good at writing a poem?
Is your joke funny?
Like these are very difficult for the models to get good at.
And if you were to just take the whole universe of code
and look at which pocket was the most verifiable,
you probably would end up with a pocket that's almost entirely crypto, right?
The whole substrate is based on the concept of being verifiable,
which means that with very little data,
even though there isn't that much data in the form of contracts,
and there are generally no crypto people in these labs, or very few,
the models have gotten extremely good
because it's so verifiable.
And also just as the models get better,
like, for example,
Gemini famously learned an entire language just in context.
So as the models get better,
you know, the amount of data you need might be lower.
So I think the general trend and trajectory of,
you know, these models are going to get extremely good at crypto,
extremely fast.
I think we can bet on that.
So when the EVMBench paper says something like top models
are going from 20% to over 70% exploit rate,
you know, something like the newest ChatGPT Codex,
what does that mean?
20 to 70%?
Like, 70% of all smart contracts
it comes in contact with,
it can exploit?
Like, what does that mean practically?
Yeah.
Practically, it means we collected all of the historical
fund-draining critical bugs
from open audit contests like Code4rena.
Right.
So the set of bugs found are not like, oh, you know, you had a small issue where maybe someone could have
frozen the contract for a day or something like that. It's much more: you could have drained money
from this contract if you found this bug. And the 20% initially, or less than 20% initially,
meant that if you took a frontier model and you put in front of it all of the hardest
audit problems after its knowledge cutoff, it would not be able to find the vast
majority of them. And by the end of the benchmark, over 70% means that, you know,
if you just reran Code4rena and instead of GPT-4 it had GPT-5.3 Codex, GPT-5.3 Codex would have found
over 70% of the bugs, the critical fund-draining bugs, that human auditors found.
So throughout our history of human audits and finding bugs, it would have found 70%
of those. Of the critical ones, with some constraints.
Like, for example, we didn't go all the way back in history.
We started past the knowledge cutoff because we want to avoid contamination.
I see.
So this kind of gets you a rough benchmark of, like, basically, GPT-5.3 Codex is 70% as good as all of the human auditors out there.
Something like that.
Yeah, something like that.
Although these things are highly nonlinear.
Like, for example, there are dumb bugs that can lead to losing all the money in the contract,
but are actually not that hard to find.
So that's why I think there's more in the paper that is notable, namely that
there wasn't just one trick that the model figured out to get to 70%.
It wasn't, like, all re-entrancy, right?
This is a very diverse set of bugs that it was able to find.
But yes, it's like fundamentally the models are getting, you know, very close to being as good as the best human auditors.
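For reference, re-entrancy, the bug class mentioned above, has a simple control-flow shape. This toy sketch is in Python rather than Solidity, and the `Bank` class and callback are purely illustrative: the point is just that paying out before updating state lets a malicious callback withdraw repeatedly.

```python
# Toy illustration of the classic re-entrancy shape: the contract pays out
# *before* updating its bookkeeping, so a malicious receiver can re-enter
# withdraw() and be paid multiple times.

class Bank:
    def __init__(self):
        self.reserve = 100
        self.credit = {"attacker": 10}

    def withdraw(self, user, receive):
        amount = self.credit.get(user, 0)
        if amount and self.reserve >= amount:
            self.reserve -= amount
            receive()                     # external call happens first...
            self.credit[user] = 0         # ...state update happens last (bug)

bank = Bank()
depth = 0

def malicious_receive():
    # Re-enter until the re-entrancy budget is spent.
    global depth
    depth += 1
    if depth < 5:
        bank.withdraw("attacker", malicious_receive)

bank.withdraw("attacker", malicious_receive)
print(bank.reserve)   # 50: paid out 10 five times instead of once
```

The standard fix (checks-effects-interactions) is to zero the credit before making the external call, so the re-entered call sees an already-updated balance.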
One thing that's fascinating about having a benchmark is then like it seems like,
all the frontier labs love, love to compete on benchmarks, winning benchmarks, right?
Humanity's Last Exam. In fact, they almost, like, game some of the training towards these
benchmarks. And so if you have an attractive benchmark that kind of propagates even like socially
to all of the frontier labs, then it seems like it provides some sort of social incentive
to have them perform and train in order to, you know, compete against one another to exceed
each other on those benchmarks.
Is that kind of the flywheel you feel like has been set in motion with EVMBench?
I think it's an important, I mean, maybe the zoomed out point is that crypto in its history
has been very stigmatized and very illegible to the AI labs.
So the fact that there hasn't already been a massive push for crypto-related evaluations
is kind of absurd, because the labs today are entirely
bottlenecked on evaluations that are verifiable and economically important. And crypto ticks both of
those boxes. So I think it took Paradigm throwing a little bit of our weight around to get this
through into the labs. And I think, yes, our firm hope is that this will start the flywheel of
labs paying more attention to this technology. And we're going to continue doing work on this front as
well. Do you have an explanation as to why it's been so slow? Because it's also been perplexing to me,
because it's all open source. It's all out there. We already got all the data. Yeah. I mean,
we talked to Haseeb recently. His explanation was there's a lot of liability when it comes to
finance and crypto and having AI models trained on those data sets, right? It's like,
what if an AI model does exploit a bug whose fault is that? So,
maybe there's some risk associated with it.
There's also, you just mentioned kind of a stigma.
I think meme coin casino, speculation.
Certainly Peter from OpenAI, he had a negative experience that he associates
with crypto culture, with a bunch of people calling themselves part of the crypto industry
trying to front-run him and develop meme coins, and he just considers it shady.
Like, what are the reasons why the frontier labs have been so slow?
to train on this incredibly rich data set,
that as you said, it's perfect for training
because all of it can be verified.
My sense is that it's almost entirely a social thing.
I mean, in my peer group,
crypto is the biggest industry
that has remained the most contrarian.
And I think part of that is because it's very reputationally volatile.
And part of it is because there's this dynamic
where the best people in the industry,
the gap between the best people in the industry
and the median person in the industry
is much larger than anywhere else.
So if you, for example, don't have exposure
to the high quality pocket of crypto,
then all you see are the scams.
And it's like that can distort your view
such that you just completely dismiss the industry.
And historically, there's been a lot of alpha in that, right?
And I think a lot of us have benefited
from the fact that there's significant reputational volatility.
And if you aren't as sensitive to that as a person
in terms of your temperament,
you can do very well in crypto.
But I think it's a social thing,
and I think it's just this legibility point
about there just hasn't been
a brand that can bridge
the crypto and the AI world.
I think it's like, if something touches crypto,
it sort of in the AI world historically
has been tarnished.
And as a result, people just have tried to
avoid it altogether.
And this sort of created an opening
for something like EVMBench
to get built and shipped inside OpenAI
without any sort of competition.
I think all of the major model labs are going to be running on this benchmark
and probably any future versions of it.
Without significant competition inside the labs,
like there aren't like 30 to 50 crypto-related benchmarks or training environments
that people are shipping.
In some sense, it's actually agency-inducing for us
because they'll just defer to the crypto industry to just figure out what's valuable for them.
But I think it's fundamentally a social issue, and it's sort of tied to all of these dynamics around, like, you know, you see someone who gets extremely wealthy, who you don't think should get extremely wealthy.
Like, maybe it was like there's a lot of volatility in the industry and, like, there's some person who you don't respect who made a lot of money.
Like, these are all kinds of things that go on in the minds of, like, the AI researchers at these labs that lead them to think that the whole industry is a scam.
And, you know, obviously, this has made it an incredible environment to be invested.
in crypto because it's just not in the Overton window of anyone in the Valley.
But I think that's the core dynamic.
Interesting. Interesting.
So we know these bots, these LLMs, they are very good at writing code.
It's like the first thing that they got good at.
Is that the same?
Is it also true with writing EVM code?
Is there a gap there between writing the rest of the world's code and EVM specifically?
Yeah.
I mean, historically there has been.
And part of the reason, you know, one component of what motivated us to start this work,
was the realization, like, man, these models are so good at Python and so bad at Solidity
and so bad at, for example, Solana-related code. Honestly, anything that touches crypto.
And part of my expectation at the time was that we were going to have to go crowdsource a bunch
of data from the industry and spoon-feed it into the labs to get the models really good
at this. But it turned out that because the substrate is so verifiable and also because
there's sort of generality in these models,
they ended up getting quite good,
much faster than we expected,
with much less input than we expected.
So there's this dynamic where if you teach a model
poetry in English,
and then biology in Spanish,
it figures out how to write a poem in Spanish,
even though you never described
how to write poetry in Spanish to the model.
And I think that kind of dynamic is happening here as well,
where it's sort of like, quote-unquote,
learning the language of crypto
without as much direct training data.
And also, it's very hard to overstate the verifiability of the thing, right?
Most software is hard to verify.
You need human labelers to go in and check.
Is this thing correct?
Is it running?
Kind of the only threshold of verifiability you have is does the program compile
and does it pass the tests?
But the tests need to be written by a human, right?
You don't have this notion of like we have a bunch of state.
We can make assertions about the state.
We can, for example, set a model loose on a new contract, on a new EVM state that it's never seen before,
and make assertions about whether it's able to, quote, unquote, drain money from it.
Like, those concepts all would have otherwise needed to be hard-coded into the program.
But because it's crypto, and there are so many standards, it's verifiable.
And the models are just ramping up really quickly in capabilities.
Okay, so do you think that right now is the time that the
floodgates of crypto data to train these models are starting to open, and going
to be opened? And if so, what types of capabilities do you expect future models to
drop? I mean, will they have skills and, I guess, personas, connectors directly for crypto within
some of the core LLMs? Or what types of developments are you looking forward to?
There's a reason we started with security and not sort of general programming capability.
It's because it has this very nice shape of it's extremely economically valuable.
It's extremely sort of intelligence bound, right?
It's like you can't, yeah, it's intelligence bound.
And it's very easily verifiable.
So we know when an exploit has happened.
So security capabilities I expect will develop very quickly.
And then we've talked about sort of all the implications of that.
Other crypto-related capabilities, I think, are, for example, things in the domain of mechanism design
or around market-related fields.
Like, what is the mechanism for an exchange?
How do, if you have market of agents, right,
what is the best way through which they should coordinate with each other?
These are, I think, open fertile soil.
And then, of course, you can go down to the protocol layer, right?
You can say, well, how does a model land a transaction on the Ethereum blockchain?
And there's a security side at the Ethereum client level, right,
at the protocol client level, where, like, sure,
maybe there are $100 billion of assets sitting in open-source smart contracts.
But there's way more than that in ETH and SOL market cap that can be exploited if you're
able to find critical vulnerabilities in Geth, Reth, et cetera.
So I think going down to the protocol layer is going to be important.
Model capabilities around MEV and sort of extractive tactics, I think, will have the same
effect as long-tail hacks, right?
There's a bunch of stuff on chain that you can just collect if you're able to do the
end-to-end process of figuring out alpha in the market, constructing a trade and underwriting it,
and then submitting the transaction and landing it on chain reliably.
These are all things that actually the models are not that good at right now, but they will
get good at really quickly.
And all of that does seem long-term good for crypto.
However, in the here and now, the short term: when we read in the EVMBench paper
that top models are going from around 20%
to over 70% exploit rate,
I'm not sure whether to like,
I mean, feel good about that
or bad about that
because it sort of depends
what the intent is
and who is harnessing it.
So if it's white hat,
that's great.
I mean, that improves our ability
to find bugs and exploits
before attackers do.
However, if it's black hat,
that's not great,
because it improves their ability to find these before we do.
So it depends, I guess with this tool set,
it depends who is using the tools and the intent behind them
as to whether it is short to medium term, bearish or bullish.
I guess one thing it does is it does seem to inject some variance
and uncertainty into the market.
I'm feeling that everywhere with AI right now.
It's just like, it could be really good,
it could be kind of bad,
but one thing it's not going to be is boring. It's going to be highly variant outcomes. I guess when you think
about the capability that is being unlocked here and the ability for LLMs to detect exploits,
is that good or is that bad for crypto in the short to medium term? Yeah. Well, I mean, you mentioned
in the long term, crypto is positively levered to almost all of these developments. And as models get
extremely good at security, this will raise the ceiling for the whole industry.
And that's because it's going to be a survival of the fittest thing, right?
Because all the weak stuff gets just exploited.
I think it's up to us, like what the path there is, us being the industry.
It may be survival of the fittest.
It may be that we figure out a way to have the defense get ahead.
And I think, but the core point is that the amount of assets that can be sustained on these
networks is proportional to how secure they are. And in the long term, there's this benefit where
as security improves, more assets will be able to securely stay on chain. Now, in the short term,
I think this is one of those things where it's in our hands, right? I think it's bound by
the industry's agency on the best way to handle this. We don't know, like if we just let the
clock play forward, there's a lot of uncertainty. We don't know exactly who, whether the
attackers, you know, the black hats will get capabilities before the white hats do.
But we also are active participants in this market. And we can bend the arc of this such that,
for example, we make sure that if there are frontier models or unreleased models or there are
new developments and security relating to AI, that we get this into the top protocols, you know,
one version of the world that you can imagine the short to medium term is that you always have every
single contract being scanned by both adversarial actors and defensive actors 24-7. And when there's a
bug that surfaces, you know, whoever catches it first sort of will react accordingly. And then in that
world, it is just kind of more of a race between the good guys and the bad guys. And I think we have
a pretty great hand in terms of making sure that the good guys have the lead in that race.
At the end of this, once all of the weak contracts have been exploited,
or we've beefed up security enough such that they're not exploitable, I guess that gives us
an incredibly hardened financial system for the world, something that's ultra-secure.
There's almost like a, it's close to perfect, right?
It's like how many, how many nines?
I mean, it's maybe like four-nines, five-nines.
And it almost creates kind of a barbell model of security for the world's
financial assets. Like, the most secure financial assets will probably be in this dark forest
environment like on chain. How do we know that? Because the world has thrown everything it can
at it, including our most intelligent LLMs and it's still there. It's still standing, right?
Hasn't been exploited. So that'll be one side of the barbell. The other side, honestly,
is things that are completely outside of the digital world altogether, you know, like a
bar of gold or something. Actual gold, right? And then everything in the middle will be
pretty exploitable, pretty insecure. I wonder if that's what is on the horizon for
our world. Until the LLMs figure out how to synthesize gold, right? Right, right, right.
And send the robots after the gold, right? Yeah, I think, I guess that makes sense,
and I think that barbell view of the world might be how things play out. I guess the way that
I relate to crypto as an industry and also as a technology is that if you start from the first
principles vantage point of, let's say you want to do payments at the speed of light, right?
Like, I want to send you, Ryan, money from America to Europe or some other part of the world,
and I send it as fast as I send an email.
And the problem that you have there is that you don't know if I also sent that money to David,
right? You have this double spend problem.
And this was what Bitcoin solved, and it got the time for that transaction
down to about an hour.
And since then, we have had successive developments
that have increased both the speed of these transactions
and the expressivity of these transactions.
And I think, in that worldview,
this is not just some path-dependent thing that happened,
that the crypto industry emerged the way it did.
Actually, if you were to play it forward
from first principles, this is how it has to be.
You end up in this conclusion where, you know,
if you have agents that want to move at the speed of the internet,
and the current banking system was created before cars were invented,
that those agents are going to discover the crypto rails as the right way to transact.
And, you know, I think there was a concern maybe like six to eight months ago,
definitely for me, where it was not clear if the agents would get good enough
at crypto-related software, for example,
for them to be able to discover the current rails,
and maybe they'd have to reinvent them from scratch.
But over the last six, eight months, as we've been working on this with OpenAI,
it's become increasingly clear that, one, there are extremely strong network effects inside
of crypto.
And two, these agents are able to just learn, like, they just want to learn these verifiable
things.
And crypto is very high on that list.
So at this point, I think, you know, I've become extremely, extremely bullish on crypto as
a substrate for these agents.
and I think it's sort of an open game
as to who in crypto is going to win that.
But the shape of the technology fits it perfectly.
The EVM is by far the most common
programming environment in crypto,
Solidity being the most common language.
And then there are some long-tail languages,
like Cardano's Haskell, for example.
And as we know, AI loves data.
The more data an AI can get, the better it can be.
With these network effects around environments,
how does that play into AI?
Like, is the EVM going to be the favorite environment
for AIs to work in?
Does Solana's...
What's Solana's?
SVM.
The SVM.
Does Solana's SVM also cross the threshold,
or maybe the threshold framing is a false one?
What do you think?
I think it's actually very hard to say,
and currently it's up in the air.
We don't know.
Part of the reason why, I mean,
I can point to some of the bottlenecks we had
while we were developing EVMBench,
where we also have actually started work on a Solana-related component of it.
But for example, one challenge was that actually it requires a lot of human talent,
like human talent to be able to go in and construct these evals.
And that was just much easier for us to come by for Solidity.
And these things are really hard to build, right?
These are pretty heavy infrastructure.
So even small additions of friction lead to, you know,
when you need to cut scope in some capacity,
you kind of have to cut it in the direction of the stuff
that you can do more easily first.
That being said, I think the point about the fact that,
you know, one, crypto's verifiability is a huge edge.
And two, these models are becoming less and less data hungry
when it comes to learning new programming languages.
I think it may end up being more even of a playing field
than it might seem.
And I think, you know, at least we have an intent
and interest in actually going across ecosystems and down the stack to the protocol layer,
and making this a sort of more expansive crypto flagship benchmark.
But yeah, right now there are obviously network effects around the EVM.
One other example of a counterintuitive reason why an ecosystem like Solana might be able to catch up
is that, at first glance, it may seem really bad that most of the contracts there are closed
source. But if you take the worldview of, actually, if it's open source, it gets in the training
set, then the closed-source benchmarks and contracts for training might actually
be more valuable for a model's development, because they're not in the training set. Now, for example, we have
these sort of what are called canary tags in various parts of our evaluation that filter them out
of the training process of most models. So, you know, there are tricks you can do, but still,
it's possible that the bugs in EVMBench, over time,
leak into the pre-training of the models.
Whereas if it were closed source,
it would not leak in at all.
So my expectation is actually that there will be some asymmetry at first,
but the models will get really good at all of it.
And it will be sort of merit-based: the best will rise to the top.
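The canary tags mentioned above follow a simple convention, popularized by benchmarks like BIG-bench: stamp every benchmark document with a unique string, and well-behaved training pipelines filter out anything containing it. A minimal sketch (the GUID and document set below are made up for illustration):

```python
# Sketch of canary-tag filtering: benchmark documents carry a unique GUID,
# and data pipelines that honor the convention drop any document containing
# it, keeping test cases out of training corpora.
# (The GUID below is hypothetical; real benchmarks publish their own.)

CANARY = "EVMBENCH-CANARY-7f3a2b1c-0000-4d5e-9abc-123456789def"

docs = [
    "Ordinary Solidity tutorial text.",
    f"Benchmark exploit case #12. {CANARY}",
    "Another blog post about the EVM.",
]

training_set = [d for d in docs if CANARY not in d]
print(len(training_set))   # 2: the tagged benchmark case is filtered out
```

As the discussion notes, this only works for pipelines that cooperate; a closed-source contract never enters the corpus in the first place, which is the asymmetry being described.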
Are you sitting on USDT or stablecoins after taking profits and wondering where to deploy next?
What if you could access stocks, gold, and ETFs without ever leaving crypto?
That's what tokenized stocks on BitGet unlock.
Traditional markets still run limited hours, but capital is moving on chain 24/7.
On BitGet, you can trade tokenized stocks and ETFs 24/7 with up to 100x leverage,
all settled directly in USDT or USDC.
No brokerage accounts, no off-ramps, no platform switching.
BitGet has already processed over $18 billion in tokenized stock trading volume,
with most of that happening in the past month alone.
The platform now captures close to 90% of on-chain tokenized stock spot market share.
As gold and silver hit record highs, on-chain trading followed.
Over the past two weeks, volume in SLV, tied to silver, and IAU, linked to gold, surged on BitGet.
This is BitGet's universal exchange vision in action: crypto, equities, and real-world assets in one place, built with crypto-native speed and flexibility.
If you want to trade stocks the way you trade crypto, explore tokenized equities on BitGet.
Learn more by clicking the links in the show notes.
This is not investment advice.
Few people in crypto put real skin in the game when they make public top or bottom calls.
The DeFi report is one of them.
The week before the October 10th flash crash, Michael from the DeFi report emailed his entire newsletter,
saying he's going aggressively risk off and sold the majority of his book from crypto into cash.
This is when ETH was $4,000 and Bitcoin was $110K.
Michael runs the DeFi report, an industry-leading research platform built on data,
cycle awareness, risk management, transparency, and most importantly, skin in the game.
We like Michael at Bankless.
We like his analysis,
and that's why you hear him
on the Bankless podcast about once a month.
And the DeFi Report is giving Bankless listeners
one free month of access to the DeFi Report.
So if you're looking for some sharp data-driven analysis
to make better informed decisions around your portfolio,
you can learn why and how Michael called the top
and what he's doing next, all in the DeFi Report Pro.
Check it out.
There is a link in the show notes.
Ethereum's aspiration is to formally verify
its entire end-to-end tech stack in the fullness of time.
You know, granted, first we
have to get the Beam Chain, we have to do all the hard forks to get there, but ultimately we
want to do a formal verification of the entire Ethereum tech stack.
AI-based formal verification. Is that a real thing? How does AI capabilities work its way
into the conversation of formal verification? Yeah. Well, I mean, I think it's a real
thing. Last year, we invested in a company called Harmonic, which is a math foundation model company
co-founded by Vlad Tenev of Robinhood and Tudor Achim. I think part of their thesis, and I think
part of where the world is clearly going
is that there's more software that is being generated
than can be possibly reviewed by humans.
And formal verification is one way to quickly check
whether a component of software is actually doing
what it says it's doing.
And then obviously in the context of security,
it can, especially if the spec is written correctly,
it can be a step function change.
Now, it's not a silver bullet
in the sense that you still have to write the spec
for the formal verification.
So there's still surface for bugs to get in there.
But you can make the case that actually the surface for bugs
in writing a formal verification spec might be lower
than writing the code to start with.
And definitely with time,
I think all of the best models,
all of the best software will probably end up being formally verified.
And if you take the vantage point of an agent
and you have two options to choose from,
one of them is formally verified and one of them is not,
the formally verified one might just
gain preference because it has all these nice properties.
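To illustrate the "you still have to write the spec" point: in the sketch below, the spec is a conservation invariant, and a toy transfer implementation is checked against it over a small input space. Real formal verification proves the property for all inputs (for example, with an SMT solver or a proof assistant); this exhaustive check is only a finite stand-in, and both function names are hypothetical.

```python
# Toy version of spec-vs-implementation checking. The spec here is a
# conservation invariant: transfers never create or destroy tokens.
# Note the bug surface has moved into the spec itself, which is the
# caveat raised above.

def transfer(balances, src, dst, amount):
    b = dict(balances)
    if b.get(src, 0) >= amount:
        b[src] -= amount
        b[dst] = b.get(dst, 0) + amount
    return b

def spec_holds(balances, src, dst, amount):
    # Spec: total supply is unchanged by any transfer attempt.
    after = transfer(balances, src, dst, amount)
    return sum(after.values()) == sum(balances.values())

# Exhaustively check the spec over a small state space (a real tool would
# prove it symbolically for all states).
ok = all(
    spec_holds({"a": a, "b": b}, "a", "b", amt)
    for a in range(5) for b in range(5) for amt in range(7)
)
print(ok)   # True: the implementation satisfies the conservation spec
```

If the spec had been written incorrectly, say, only checking the sender's balance, the check could pass while real value leaks, which is exactly why writing the spec remains a source of bugs.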
There's a section in the EVMBench paper called Future Directions.
What does EVMBench V2 look like?
How does the project grow from here?
The top-level goal that we have is to help the model labs develop the crypto capabilities
of their models.
And I think that security is one component of that and maybe an increasingly urgent component
of that.
but there's so much that EVMBench does not touch.
So there are other ecosystems and stacks.
There is a protocol layer which we talked about
where maybe, arguably, it's more important
from a security standpoint
that the Ethereum protocol is secure
rather than any specific Solidity contract.
There are out of protocol components,
like how do you land a transaction on chain?
How do you deal with the mempool?
How do you deal with sort of the non-deterministic parts of crypto?
And then obviously there are components that are even farther on the verifiable and intelligence-bound trajectory,
like, for example, around cryptography and zero-knowledge proofs, et cetera.
And I think all of these are extremely fertile soil for future work.
So we're currently open to trying to source collaborators for future versions of EVMBench,
and we're obviously working on next steps for it ourselves.
And I think that with this direction, we finally have a foot in the door into the model labs for getting crypto capabilities into the frontier models.
And I think we should leverage that as an industry, and we should try to get these models as good at crypto as we possibly can.
Alpin, you're very smart, as evidenced by this paper and this conversation.
You have a high degree of agency, obviously.
You're definitely on the frontier.
you've chosen to stay in crypto
and not, kind of, leave and go to AI,
and you seem incredibly bullish on crypto,
even while it's contrarian at this moment in time.
Why crypto for you personally?
I've personally never had hard lines around industries in my mind.
We say "I work in X"
to make what we're doing legible to other people.
But I don't think that's the right way to relate to it.
I've spent all of this time in crypto because, one, it's been extremely intellectually
interesting. And two, as I mentioned, it's remained extremely contrarian
among my smartest friends, in ways where I can put my finger on exactly what they're missing.
I think that's sort of, that's kind of the best that one can ask for. I think that, you know,
we talked about how crypto is positively levered to the security developments in AI. But, you know,
you can make the case that it's positively levered to most of the developments in the world right now.
Like, for example, as the creation of new goods and intelligence,
et cetera, becomes commoditized, scarce assets become more valuable.
As geopolitical instability ensues, systems that are extra-sovereign, right,
outside of any jurisdiction, that are kind of the equivalent of end-to-end encryption for finance,
those have more space to thrive.
And I think that, you know, I grew up in Turkey, most of my family is still there.
I think that people who grew up in America, and in general in sort of a stable world,
do not have the sense for what can happen as the world destabilizes.
And I think that, you know, as many people in the country that I grew up in are starting to onboard to crypto rails
and sort of use that as a lifeboat, it's increasingly clear to me that this technology is on a sort of compounding trajectory
to do really massive things.
And so the combination of that, plus you look around
and no one's even talking about it, it's just really exciting.
Yeah, and you do seem convinced that the acceleration of AI
is going to benefit crypto, that it will be all boats rise together.
And I think that, well, there is some category of software industry
that AI doesn't seem to benefit, at least in the short run.
Anthropic drops a new security module,
and all the cybersecurity stocks drop like 10 to 15% in
one day. Why are you so convinced that AI's acceleration will be beneficial to crypto?
It's not, obviously, nothing is guaranteed right now. I think if we let everything run its course,
it may be bad for crypto. It may be good for crypto. We don't know. I guess the, the conviction
that I have is that if we push things in the direction that we want them to go in, that we can make
AI be extremely good for crypto. And also, there's the component of
this where I do strongly believe that if you were to rederive all of this from first principles,
you end up in a place that's very similar to where we currently landed with crypto.
And yeah, I just think that for all the reasons we've talked about, that for fundamental
reasons, crypto is extremely good for AI, and AI is extremely good for crypto.
So I think that nothing is guaranteed and we still have to exercise our agency,
but I think, for all the reasons we've discussed so far,
it's pretty clear to me that these things are going to converge in a positive way.
Well, let's end on a note of high agency and conviction.
Alpin, thank you so much for joining us today.
Cool. Thanks for having me.
Got to let you know, bankless listeners.
Of course, none of this has been financial advice.
You could lose what you put in. Hopefully an LLM out there,
a white hat, is protecting it.
We're headed west.
This is the frontier.
It's not for everyone.
But we're glad you're with us on the bankless journey.
Thanks a lot.
