Bankless - AI Finds 70% of Smart Contract Exploits | Alpin Yukseloglu
Episode Date: March 5, 2026

AI is getting dangerously good at smart contract security. Faster than crypto is ready for. Alpin Yukseloglu joins Bankless to break down EVMBench (built with OpenAI), a benchmark testing whether AI agents can detect, patch, and exploit real fund-draining bugs, and why the jump from ~12–13% exploit-finding to 70%+ could rewrite today’s security assumptions. We unpack what that “70%” really means, why crypto’s verifiability is an ideal training ground, why AI labs haven’t prioritized crypto data yet, and what a 24/7 blackhat vs whitehat AI arms race means for DeFi.

---

📣 SPOTIFY PREMIUM RSS FEED | USE CODE: SPOTIFY24
https://bankless.cc/spotify-premium

---

BANKLESS SPONSOR TOOLS:

🔮 POLYMARKET | #1 PREDICTION MARKET
https://bankless.cc/polymarket-podcast

🪐 GALAXY | INSTITUTIONAL DIGITAL FINANCE
https://bankless.cc/galaxy-podcast

⚡ EUPHORIA | REAL-TIME ONE-TAP TRADING
https://bankless.cc/euphoria

🌐 BRIX | EMERGING MARKET YIELD
https://bankless.cc/brix

🏅 BITGET TRADFI | TRADE GOLD WITH USDT
https://bankless.cc/bitget

🎯 THE DEFI REPORT | ONCHAIN INSIGHTS
https://thedefireport.io/bankless

---

TIMESTAMPS
0:00 AI’s exploit leap: 12% → 70% and the “superhuman auditors”
7:02 Staring at the singularity without losing your mind
10:31 Agency » doom: the Thiel framing
19:10 What’s most at risk (and what’s safer)
23:37 What EVMBench actually is (benchmark + harness)
27:03 Why exploiting is the key: killing false positives
29:24 AI gets “good at crypto” fast: verifiability
30:56 What “70% exploit rate” really means
33:32 Why AI labs avoided crypto (it’s not technical)
43:38 Blackhat vs whitehat: how the race plays out
47:21 Agents and “payments at the speed of light”
51:02 EVM vs Solana: network effects
56:18 AI formal verification as an endgame
58:06 EVMBench V2: expanding the frontier
59:54 Why Alpin stays in crypto

---

RESOURCES
Alpin Yukseloglu
https://x.com/0xalpo

EVMBench
https://paradigm.xyz/evmbench

---

Not financial or tax advice.
See our investment disclosures here: https://www.bankless.com/disclosures
Transcript
Bankless Nation, we are here with Alpin Yukseloglu. He is an investment and research partner at Paradigm.
Also the co-author of the paper titled EVMBench, an open benchmark for smart contract security agents, written in collaboration with OpenAI to measure the ability of AI agents to detect, patch, or exploit smart contract vulnerabilities.
We're going to talk about the way that AI and AI capabilities are going to impact our crypto ecosystem, our smart contracts.
Alpin, welcome to Bankless.
Hi, thanks for having me.
I want to start off this podcast with a very big question.
How at risk are we from AI?
How large of a threat does AI smart contract capabilities pose to our industry?
Yeah, I mean, in the long term, it's now increasingly clear that AI is going to be extremely, extremely good for crypto, especially on the security front, because we're going to get to a world where, because everything is much more secure, the ceiling on the industry is much higher.
So our partner Matt talks about how, if you have a grocery store that's run by a mom and pop, because they can't see everything in the store, there's a limit to how big they can get. But the moment you add security cameras in, that changes. So security has this effect of increasing the carrying capacity of an industry.
I think in the short term, it's up to us because the models are getting extremely good,
like strikingly good.
When we started working on EVMBench, which is a benchmark that consists entirely of
fund-draining critical bugs, around six months ago,
the models were able to find less than 20% of the bugs,
like around 12 to 13%.
And just over the course of while we were working on the benchmark,
this number went up to over 50%.
And in between when I drafted the launch tweet and when I had to actually hit send, with the release of GPT-5.3 Codex, it jumped up to over 70%.
So these things are just growing at a blistering pace, and it's very important that we position the industry in a way that we can
defensively protect against attacks. But in the long term, I think it massively increases the
carrying capacity of crypto. Yeah, I think what you're saying is in the long term, we get something
approaching perfect security. Yeah, right now we do not have perfect security. Let me ask you the same
question, but a little bit differently. Say only bad actors, only black hat actors have access to
AI capabilities.
In that context, how at risk is our industry?
Like, how exploitable are our smart contracts given the increase in AI capabilities?
Yeah, I mean, I think it's really hard to say when we approach super intelligence levels.
I do think, until we hit that point, like, right now the models are quite good, but they're not
better than the best human auditors.
So we already have existed in crypto under this threat model of extremely intelligent adversarial
actors that are constantly trying to break all of our software with all the money in it.
So in that sense, like, crypto is already quite hardened.
But it's just really hard to know when we talk about sort of a technology inflecting into
superintelligence.
This is very similar to how coding capabilities were increasing mostly linearly over
the last several years.
And in December last year, they crossed some threshold where they were better than sort of
the median engineer.
And a lot of stuff clicked for everyone, and it started becoming this aha moment and this sort of oh-crap moment.
And I think something very similar will probably happen with security, where right now it's increasing pretty rapidly, but still at a linear clip.
But it's not as good as the best human auditors yet.
So we don't feel it yet.
It hasn't actually broken any of our assumptions.
But once we hit a superhuman AI auditor, in maybe six to eight months, and I'm pretty confident at this point by the end of the year, this will just completely break all of our assumptions, and we'll need to go back and make sure that we're hardening all of the contracts that are housing the nearly $100 billion of assets in crypto.
Galaxy operates where digital assets and next generation infrastructure come together, serving institutions
end to end. On the market side, Galaxy is a leading institutional platform, providing access to
spot, derivatives, structured products, DeFi lending, investment banking, and financing. With more than
1,600 trading counterparties, Galaxy helps institutions navigate every phase of the market cycle.
The platform also supports long-term allocators through actively managed strategies and institutional
grade staking and blockchain infrastructure. That scale is real. Galaxy has over $12 billion
in assets on the platform and averaged a $1.8 billion loan book in late 2025, reflecting
deep trust across the ecosystem. Beyond digital assets, Galaxy is also building infrastructure
for an AI-powered future. Its Helios Data Center campus is purpose built for AI and high-performance
computing, with more than 1.6 gigawatts of approved power capacity, making it one of the largest
sites of its kind. From global markets to AI-ready data centers, Galaxy is serving the digital
asset ecosystem end to end.
Explore Galaxy at galaxy.com/bankless or click the link in the show notes.
Euphoria brings one tap trading
to the palm of your hand. Built on MegaETH,
Euphoria takes real-time price charts
and projects them over a grid of squares.
You tap the squares that you think the price
will enter in just five to 30 seconds in the future.
If the price goes into that quadrant,
you can pocket anywhere between 2 and
100x your trade. No other
application helps you trade faster and with more
leverage on market driving events
like FOMC meetings, presidential speeches,
or global macro events.
Thanks to MegaETH's real-time blockchain,
Euphoria is the way to get real-time
price interactions with the market.
On Euphoria, you'll be able to compete with friends
using Euphoria's real-time social trading experience,
allowing you to go head-to-head with your friends.
A great party trick if you project the app on a TV.
It'll be like the Mario Party of derivatives.
To trade on Euphoria, people can deposit stablecoins from any chain or do direct fiat transfers, and everything gets converted into MegaETH's native stablecoin USDM in the background.
Check it out at euphoria.finance and download the app, or find it in Telegram as a mini app.
In 2024, emerging markets generated over $115 billion in annual yield for investors,
with yields ranging between 10 to 40%.
These are some of the highest, most persistent yields on Earth.
The problem: DeFi can't access them.
Brix changes this.
Built on MegaETH, Brix takes emerging market money markets and sovereign carry
and turns them into composable primitives you can access straight from your wallet.
While DeFi investors earn 3 to 6% on stablecoins and T-bills, institutions have been harvesting
10 to 50% yields backed by sovereign monetary policy.
Brix connects these worlds with institutional-grade tokenization, local banking rails,
compliance across jurisdictions, and real-time stablecoin settlement.
Brix does the heavy lifting so DeFi can finally access real collateral and structured
products on top of real-world yield.
Even the best carry trades can be within reach.
Brix brings DeFi's promise to the emerging world and brings emerging market yield to your wallet.
Yield flow with Brix.
If we zoom out here, though, and we think about AI intelligence and its security capabilities
and its bug detection capabilities kind of going exponential, and we think about a super intelligent
AI, I don't even know how to think about security in general because it can envision
scenarios beyond human comprehension.
For instance, what if it thinks up a way to crack some of our cryptography with some math that we didn't even know existed, right?
I heard actually Justin Drake on a podcast recently talk about this.
It's not just the threat of quantum computers, which is kind of a real known threat.
And some of our, you know, encryption algorithms are under threat due to quantum computers.
But if we have a superintelligent AI,
I mean, who knows what it could have the ability
to actually hack and decrypt?
I mean, I guess my question is,
when it comes to super intelligent AI,
is security just like not even a thing we can prepare for?
I mean, how do we even think about it?
I mean, the way I would think about it is that right now this frontier is very illegible. And if you try to do this in-the-limit thinking, you end up in very odd places that may be, you know, very psychosis-inducing. I think the capacity to face the singularity and stay sane is a very important skill to develop. And I think, you know, the best we can do right now is get ourselves onto the frontier, into this sort of experimentally bound future,
where we're running the experiments ourselves, and be ready when those inflections happen
to be able to react. Because I think the model of like, there are only bad people in the world
and they're going to have access to this technology and they're going to break all of our systems,
I think this is what leads to the sort of psychosis around like are we all just completely screwed.
But that's not what's going to happen, right? We're all going to be in there together.
And when we're in this frontier, and as we have access to these frontier models, the sort of agent harnesses around them that are able to run these exploits, that are able to, for example, find undiscovered math or that are able to break existing cryptography, there will be both sides of it.
And right now, it's not clear whether this is going to be an offense or defense favoring technology.
I will say that there are still fundamental constraints in the world.
Like you can't break laws of physics.
There are systems that are chaotic.
Like, for example, the three-body problem where, you know, the fact that even if you have
superintelligence, you can't predict too many more ticks ahead because it's just a fundamentally
chaotic system.
So I do think that there's this, you know, the world is a complicated place and there are still
physical laws and constraints that will catch these things.
And practically, the best we can do is we have to push that frontier together instead of letting it just sort of happen to us.
I think that's a fair point.
There are physical constraints here, and a superintelligent AI might appear as a god to humans, but that does not give it, you know, godlike capabilities to break the physical laws of the universe, certainly.
But actually, I want to dig into this because I got the sense that maybe you have some insight
here because I've been doing a lot of staring into the abyss of the singularity and trying
to stay sane and I don't quite have a knack for it.
Like sometimes I feel like I'm staring and I am feeling myself go a little insane.
Like I just don't feel settled about it.
Is there some wisdom you can share, just like a pattern or something? What I was starting to get out of what you just said is, well, maybe the key is to take it a step at a time and not think about the far future in the limit. Just maybe think about the next day, the next month, the next year.
How do you stay sane when you stare at the singularity?
I mean, I think the core point is agency, right?
So Peter Thiel has this framing where acceptance and denial, most people relate to them
as opposites, but in many ways they're the same thing because both of them imply that
you're sort of everything is out of your control.
If you're fully accepting that something's going to happen, then you're not doing anything about it. And if you're fully denying that something's going to happen, you're also not doing anything about it.
In that sense, I think both the doomers and the accelerationists are wrong.
And the real answer is that we have agency over these outcomes
and that you yourself can bend the arc of the future.
And I think there's a lot of comfort and then a lot of stability in that
because if you believe that you have agency over the outcome,
then somehow like you're still in control.
Now, I will say that the current frontier, and I guess you can argue the frontier's always been this way, is experimentally bound, which means we don't know; we can't sit down and theorize about what is going to happen.
Generally, the way that we're going to figure out how things are going to happen is by experimenting and then by seeing the results.
And, you know, these AI models, and also the current frontier of technology, are grown, not manufactured. It's much more organically evolved, in a way that we can't really predict today. So even just armchair theorizing about what's going to happen will drive you insane, because you can't know. It's not bound by your ability to algorithmically figure out the future. So you have to get in the trenches and try things.
Yeah. There's some stoic acceptance, the fact that you can't know everything. And so you just
have to let that go. And then there's also a belief in agency. And I guess an applied belief,
I mean, how much of agency is just sort of a blind faith versus maybe like a practical faith of doing?
How do, for someone listening in general that feels themselves looking at the singularity and it feels frozen, I guess, by like just like not knowing what to do and doesn't feel like they have a lot of control and agency, is that something that they can develop?
Or is your advice, oh, you just have to have faith.
You just have to believe that you have more agency and that will become self-fulfilling.
You will have more agency.
I think faith is good, but it's not a particularly agency-inducing headspace to be in.
I mean, when we, for example, started having the thought that agents could get extremely good at exploiting smart contracts, which we're obviously heavily exposed to, you know, about eight months ago, we could have just sat down and been like, holy crap, are we screwed? Are we not screwed? And then it turned out there's another path, which is, well, you can go figure out to what extent these things are at risk.
And then also start making headway into the labs that are actually pushing this frontier
and maybe start getting crypto integrated into them, so that, as the fog of war clears and as we start to see, for example, that there are defensive measures that we can take, we're in a position to actually exercise those paths.
And I think there's something about the doomerism, and the general in-the-limit thinking about the singularity and about superintelligence, that just captures people, because you can sit down and think about it for a long time, and it'll make you feel very strong emotions. But at the end of the day, that's not going to be the thing that actually matters. You can go build things.
You can go work with the people who are pushing the frontier.
You can get in the trenches and you can start contributing.
And the information you gain from that is going to be much more grounded than whatever one can come up with in their head.
And this is, I mean, I think many of us in crypto are used to this because a lot of crypto was like this for most of its history.
It was extremely illegible.
It was very hard to pin down exactly what the use case was going to be.
And now we're starting to see, okay, we have this store-of-value use case, we had stablecoins that are compounding at this monstrous rate every year. The industry kind of gave birth to the whole market of prediction markets, which is sort of adjacent to crypto now, but it's just compounding at this insane rate. And five years ago, it would have been very hard to say this is exactly what's going to happen. And it took some combination of faith and agency to actually go and build those things. And I think, so culturally,
this has been a very big anchor point for Paradigm, because our entire firm is built around building and researching alongside the investment. And so if you talk to anyone on the Paradigm team or in our orbit,
the sense of groundedness you get
is anchored in the fact that we're actually,
we have consistent contact with the frontier
and there's sort of,
there's comfort in that, there's stability in that,
and there's agency that comes from that.
I think it is worth understanding and reflecting on the fact that the only reason staring into the void of the singularity is intimidating is because what all of these technologies are doing is providing everyone else with agency to produce the singularity in the first place. And so if that singularity is intimidating to you, you know, do something. There's work to be done. And I think the best path forward is through. Like, if you are intimidated by everyone else having agency because of these tools, you can have agency yourself.
Well, it's a polarizing technology, right?
It's like the threshold of agency you need
to be able to execute on something is going down.
So like if you're kind of on the fence,
you get snapped to zero or to you can do a lot of things.
Or to one, yeah.
And I think in that sense, like,
if one has the intuitive sense about themselves
that they might get,
they might be on the side of that fence
where they get squeezed to zero,
that can be extremely fear-inducing.
And actually the solution to that is to be much higher openness, right, adopt the technology much faster, be much more fluid about changing and adapting to the environment.
There's, I think, the Mount Arr team had mentioned at some point that there are times when speed is more important than cohesion. And I think in the current environment we're in, because the frontier is so unknown, and so unknowable to some extent, moving fast and adapting fast has a premium over being able to sit down and figure out exactly what's going to happen and put the pin in the right place. And this is somewhat paradoxical
because the more agency you have, the more it may seem like it matters that you do the game
selection right and you sort of pick the right game to play, et cetera. But practically and empirically
what's happening is that actually it's better to move fast and ship the thing within 24 hours
of inception than it is to sit for two weeks and try to figure out the exact right way to construct it
and then try to ship it then.
And I think that probably goes all the way down
to like many parts of one's life
where we're in an era of speed over cohesion.
Yeah, yeah, we are in the Just Do Things era.
I wasn't expecting, Alpin, for this to be such a philosophical way to open this episode up.
But now I think we can kind of like corral ourselves
and point our agency towards the topic at hand,
which is what happens when people feel high agency with AI
towards the security of our smart contracts?
Is it worth talking about what kind of contracts are most at risk or least at risk?
Is there some sort of category or knowledge landscape that we can understand that when AIs have very high smart contract agency,
that we should be paying attention to certain kinds of smart contracts over others?
Is there a conversation there?
I think not in terms of market category; that's really hard to say, like, you know, a DEX contract versus a lending market.
But I think, you know, simple contracts that have been around for a long time are probably in a better position than, for example, the 200th contract deployed on Binance Smart Chain or something like that, where, you know, in the past there's been a sheltering that's happened from being in a small market.
So if you were deploying something where the most amount of money that one could make,
if they fully exploited it, was sort of in the order of low thousands of dollars,
then you were sheltered by the fact that there are just much bigger fish, and the bad actors, and even the good actors, or the people who are trading, et cetera, you're just not in their Overton window.
But as the models get better,
because the cost of inference is so much lower
than the cost of an extremely talented security researcher,
that long tail might get shaken out very quickly.
So I think most at risk is probably small-cap or low-TVL protocols on long-tail chains that are still built on well-understood stacks like the EVM and Solidity.
And then I think there's just sort of an unknowable security risk for the major contracts, the OG DeFi contracts that are currently battle-tested but are still very complicated. And, you know, we'll see over the coming year or two to what extent those contracts are actually exposed.
Right. So the OG contracts that have
had a ton of Lindy and a ton of value locked over time that have been
tested by the market are like safer in the near term,
but nonetheless the people managing those contracts
will still need to have agency to be on the defensive
to make sure that they are winning the arms race
against the offensive types.
Well, the prize is much larger for exploiting those contracts.
So there will probably be this canary-in-the-coal-mine effect, where smaller protocols that are less secure but have a lot of assets in them fall first. And I think we'll have to look out for the first exploit that happens that is almost entirely from AI, and from there, the race will be on to start taking the defensive measures necessary.
Right.
And then, like, the long tail of contracts, as you said, testing in prod will no longer be a thing, because when the cost to exploit a $1,000 contract is, like, you know, $10 to $50 of tokens, then those contracts simply won't exist. Somebody will write a bot that says, hey Claude, hey OpenClaw, go hack me some contracts,
and then that thing will actually have the capacity to do that
because the people that didn't really think too hard
about their security because they didn't need to think too hard
about the security, those people will not have a good day.
Yeah. I think this is a general trend
that the long tail will get collected by people who can use AI well.
Like, for example, you can look at something like prediction markets
where there are markets where if you trade them to perfection,
the most amount of money you can make is maybe $50 to $100.
And like, it's not worth it for Jane Street to put a quant on those markets
because it's too expensive for them.
And it's bound by the cost of intelligence and the cost of attention.
But if you can trade those markets near perfectly for 10 cents of inference,
then you'll do it.
And in aggregate, maybe that long tail is pretty valuable.
So right now we're in a world where the long tail is sheltered by the fact that it's small.
And as AI gets better in all of these different domains,
we should just be assuming that all of that's going to get collected by people who are able to use these tools.
Let's talk about EVMBench.
This is the paper, the tool?
You guys call it a tool?
Is that right?
Yeah, it's a benchmark and also an agent harness.
So we maybe had two releases in conjunction.
The first evaluates the ability of an agent to exploit smart contracts.
And then the second one is sort of an agent harness that is like, you know, similar to an auditing agent.
So it can actually find the bugs.
And obviously the agent harness that we released is sort of not at the frontier of capabilities, because we don't want it to be used by blackhats.
But we have a UI that you can upload any smart contract into
that will do sort of a baseline check for bugs.
Can you define harness?
I'm pretty sure that's a technical term
that I think coders will be aware of, but I'm not.
Yeah.
So the core idea is the model labs will release these LLMs, right? So you'll have GPT-5.3, et cetera. And, you know, you can do the baseline test of just prompting the model, just asking ChatGPT, hey, is there a bug in this contract? And then, let's say that gets you to X percent on the benchmark. In addition to that, you can add a bunch of scaffolding around the model that says, hey, for example, here is an EVM that you can test against. You can deploy a contract and actually run an exploit and see if you're able to drain the money.
And it turns out that if you give agents these tools, this scaffolding that they can sit in, they perform much better. So the harness basically holds the agent, the model, and gives it superpowers that are specialized to the task.
Now, the interesting thing about the current arc of AI is that most of these tools that we add in flake off with time, because as the model gets better, it just absorbs the harness. The core example being Tesla's Full Self-Driving: Andrej Karpathy talks about how at the beginning, the majority of the code was hard-coded and handwritten, and very quickly it ramped up to now, I think, over 50% of it just being the model. They removed all the C++ code that was saying if X, then Y, and the model just figures out a way to do it. So right now, the agent harness, quote-unquote, is in that if-X-then-Y stage, hard-coded tools that we're adding in to give it these capabilities, but probably, in the fullness of time, it'll get absorbed by the agent as it gets better.
I see.
I see.
So like the harness is kind of like a bootloader to get it started, but then eventually data will
take over, data and experience will take over the actual internal like operations of the
machine.
Yeah, exactly.
Yeah.
I mean, right now the harness is super valuable in very counterintuitive ways. Like, for example, it turns out that just giving an agent an environment to test against, even if it barely uses it, leads the agent to think for longer and try harder, and thus get better results. So there's still so much low-hanging fruit, because the agents themselves are not fully well-parameterized and calibrated yet.
But yeah, I mean, definitely we'll get to a point in time when, you know, the next version of Codex or Opus will be able to just spin up an EVM on its own, and we won't need our harness for it.
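For readers who want a concrete picture of the harness idea described here, this is a minimal, self-contained sketch: hard-coded scaffolding that wraps a model and hands it a task-specific tool, in this case a toy EVM-like sandbox it can run exploits against. All names (ToySandbox, MockModel, harness) are illustrative stand-ins, not EVMBench's actual code.

```python
# Sketch of an agent "harness": scaffolding around a model that exposes
# task-specific tools. Everything here is a toy illustration.
from dataclasses import dataclass, field


@dataclass
class ToySandbox:
    """Stand-in for a forked-EVM test environment the agent can exploit against."""
    balances: dict = field(default_factory=lambda: {"vault": 1000, "attacker": 0})

    def run_exploit(self, amount: int) -> bool:
        # A deliberately buggy "contract": withdraws without any authorization check.
        if amount <= self.balances["vault"]:
            self.balances["vault"] -= amount
            self.balances["attacker"] += amount
            return True
        return False


class MockModel:
    """Stand-in for an LLM; proposes candidate exploit parameters."""
    def propose(self, contract_source: str):
        # A real model would reason over the source; the mock just guesses amounts.
        return [1500, 1000, 500]


def harness(model, contract_source: str):
    """The harness loop: let the model propose exploits, replay each one in a
    fresh sandbox, and only report a finding that actually drained funds."""
    for amount in model.propose(contract_source):
        sandbox = ToySandbox()  # fresh environment per attempt
        if sandbox.run_exploit(amount) and sandbox.balances["attacker"] > 0:
            return {"exploitable": True, "drained": sandbox.balances["attacker"]}
    return {"exploitable": False, "drained": 0}


result = harness(MockModel(), "contract Vault { ... }")
print(result)  # the second guess (1000) succeeds and drains the vault
```

The point of the sketch is the division of labor: the model only proposes, while the scaffolding supplies the environment and the success check, which is exactly the part a better future model could absorb.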
Okay, so what does the tool actually do?
Is the tool the thing, like the agent doing the exploiting or doing the patching?
Or is it just the benchmarking?
Like, talking to me about the actual utility here.
The core release, the core thing that we want to get out in the world is the benchmark.
It's how good are the models exploiting smart contracts?
There are three components to the benchmark.
The first is the ability to detect bugs.
The second is the ability to patch bugs.
And then the third, which is sort of the most interesting and novel contribution, is the ability to exploit bugs. This addresses what has been one of the biggest problems with previous attempts at having security-related agents, for example auditing agents: the problem of false positives.
So the agent comes to you and says, I found 50 bugs in the contract.
And maybe one of those 50 is an actual bug.
But it just is so time intensive for you to go through and figure out which ones are real that it's not better than a human auditor.
And what we did in the exploit component of the benchmark is we leaned on the fact that crypto is verifiable.
And we used this production grade EVM environment where we load in a bunch of chain state and we set up a bug environment and let the agent try to exploit it.
We leaned on this to lower the false positive rate down to basically zero.
So it got to a point where, if the agent tells you that it found a bug, it literally has a proof of concept that it can run against a production-grade EVM environment and drain money from a contract.
And this is sort of the core breakthrough of the paper
is that there's a verifiable environment
that actually leads to a very low false positive rate.
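A toy sketch of the verification idea described above: of the many "bugs" an agent might report, only the ones whose exploit actually drains funds when replayed in a fresh environment count as confirmed. The functions and findings below are hypothetical illustrations, not code from the paper.

```python
# Illustrative sketch of exploit-based verification driving false positives
# toward zero: a claimed finding only counts if its proof of concept works.

def drains_funds(exploit, initial_vault=1000):
    """Replay a claimed exploit against a fresh toy environment and check
    whether the attacker balance actually increased."""
    state = {"vault": initial_vault, "attacker": 0}
    exploit(state)
    return state["attacker"] > 0


# Three findings reported by a hypothetical auditing agent:
def real_bug(state):
    # Genuine fund-draining exploit: moves the whole vault to the attacker.
    state["attacker"] += state["vault"]
    state["vault"] = 0


def false_positive(state):
    # Plausible-sounding report with no working proof of concept.
    pass


def reverting_exploit(state):
    # An exploit path that a real EVM would simply revert; no funds move.
    pass


reported = {
    "real_bug": real_bug,
    "false_positive": false_positive,
    "reverting": reverting_exploit,
}
confirmed = [name for name, ex in reported.items() if drains_funds(ex)]
print(confirmed)  # only the finding with a working exploit survives
```

This is why the benchmark number is meaningful: "found a bug" is defined operationally as "produced an exploit that moves money", so a reviewer never has to triage 50 speculative reports by hand.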
That's the actual benchmark. It's like you guys have established that you can actually measure the thing effectively.
Yeah, exactly.
Because otherwise, if someone says,
oh, we found all of these bugs
and we got 90% on this benchmark,
you don't know what it means
because you have no way of knowing
if half of those are real or fake, right?
So the verifiability ended up being very important.
I think this is one of the reasons
why models are going to get extremely good at crypto very fast.
Because basically you can slice the future
related to AI into two categories.
One is the verifiable stuff and the other is the unverifiable stuff.
And the verifiable stuff is very easy for the models to learn because they have a very clear
training signal and they know exactly when they got it right.
And they can just keep running at that and improve and climb that hill.
Whereas the unverifiable stuff, it's like you don't know, there's no tests.
You don't know if you got it right or wrong.
So it's like, are you good at writing a poem?
Is your joke funny?
Like these are very difficult for the models to get good at.
And if you were to just take the whole universe of code
and look at which pocket was the most verifiable,
you probably would end up with a pocket that's almost entirely crypto, right?
The whole substrate is based on the concept of being verifiable,
which means that with very little data,
even though there isn't that much data in the form of contracts,
and there are generally no crypto people in these labs, or very few,
the models have gotten extremely good
because it's so verifiable.
And also just as the models get better,
like, for example,
Gemini famously learned an entire language just in context.
So as the models get better,
you know, the amount of data you need might be lower.
So I think the general trend and trajectory of,
you know, these models are going to get extremely good at crypto,
extremely fast.
I think we can bet on that.
So when the EVMBench paper says something like top models
are going from 20% to over 70% exploit rate,
you know, something like the newest ChatGPT Codex,
what does that mean?
20 to 70%?
Like, 70% of all smart contracts
it comes in contact with,
it can exploit?
Like, what does that mean practically?
Yeah.
Practically, it means we collected all of the historical
fund-draining critical bugs
from open audit contests like Code4rena.
Right.
So the set of bugs found are not like, oh, you know, you had a small issue where maybe someone could have
frozen the contract for a day or something like that. It's much more: you could have drained money
from this contract if you found this bug. And the 20% initially, or less than 20% initially,
meant that if you took a frontier model and you put in front of it all of the hardest
audit problems after its knowledge cutoff, it would not be able to find the vast
majority of them. And by the end of the benchmark, over 70% means that, you know,
if you just reran Code4rena and instead of GPT-4 it had GPT-5.3 Codex, GPT-5.3 Codex would have found
over 70% of the bugs, the critical fund-draining bugs, that human auditors found.
So throughout our history of human audits and finding bugs, it would have found 70%
of those. Of the critical ones, with some constraints.
Like, for example, we didn't go all the way back in history.
We started past the knowledge cutoff because we want to avoid contamination.
I see.
So this kind of gets you a rough benchmark of, like, basically, GPT-5.3 Codex is 70% as good as all of the human auditors out there.
Something like that.
Yeah, something like that.
Although these things are highly nonlinear.
Like, for example, there are dumb bugs that can lead to losing all the money in the contract,
but are actually not that hard to find.
So that's why I think there's more in the paper that is notable, namely that
there wasn't just one trick that the model figured out to get to 70%.
It wasn't, like, all re-entrancy, right?
This is a very diverse set of bugs that it was able to find.
But yes, it's like fundamentally the models are getting, you know, very close to being as good as the best human auditors.
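For reference, re-entrancy, the bug class mentioned above, has a simple control-flow shape. This toy sketch is in Python rather than Solidity, and the `Bank` class and callback are purely illustrative: the point is just that paying out before updating state lets a malicious callback withdraw repeatedly.

```python
# Toy illustration of the classic re-entrancy shape: the contract pays out
# *before* updating its bookkeeping, so a malicious receiver can re-enter
# withdraw() and be paid multiple times.

class Bank:
    def __init__(self):
        self.reserve = 100
        self.credit = {"attacker": 10}

    def withdraw(self, user, receive):
        amount = self.credit.get(user, 0)
        if amount and self.reserve >= amount:
            self.reserve -= amount
            receive()                     # external call happens first...
            self.credit[user] = 0         # ...state update happens last (bug)

bank = Bank()
depth = 0

def malicious_receive():
    # Re-enter until the re-entrancy budget is spent.
    global depth
    depth += 1
    if depth < 5:
        bank.withdraw("attacker", malicious_receive)

bank.withdraw("attacker", malicious_receive)
print(bank.reserve)   # 50: paid out 10 five times instead of once
```

The standard fix (checks-effects-interactions) is to zero the credit before making the external call, so the re-entered call sees an already-updated balance.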
One thing that's fascinating about having a benchmark is then like it seems like,
all the frontier labs love, love to compete on benchmarks, winning benchmarks, right?
Humanity's Last Exam. In fact, they almost, like, game some of the training towards these
benchmarks. And so if you have an attractive benchmark that kind of propagates even like socially
to all of the frontier labs, then it seems like it provides some sort of social incentive
to have them perform and train in order to, you know, compete against one another to exceed
each other on those benchmarks.
Is that kind of the flywheel you feel like has been set in motion with EVMBench?
I think it's an important, I mean, maybe the zoomed out point is that crypto in its history
has been very stigmatized and very illegible to the AI labs.
So the fact that there hasn't already been a massive push for crypto-related evaluations
is kind of absurd, because the labs today are entirely
bottlenecked on evaluations that are verifiable and economically important. And crypto ticks both of
those boxes. So I think it took Paradigm throwing a little bit of our weight around to get this
through into the labs. And I think, yes, our firm hope is that this will start the flywheel of
labs paying more attention to this technology. And we're going to continue doing work on this front as
well. Do you have an explanation as to why it's been so slow? Because it's also been perplexing to me,
because it's all open source. It's all out there. We already got all the data. Yeah. I mean,
we talked to Haseeb recently. His explanation was there's a lot of liability when it comes to
finance and crypto and having AI models trained on those data sets, right? It's like,
what if an AI model does exploit a bug whose fault is that? So,
maybe there's some risk associated with it.
There's also, you just mentioned kind of a stigma.
I think meme coin casino, speculation.
Certainly Peter from OpenAI, he had a negative experience that he associates
with crypto culture, with a bunch of people calling themselves part of the crypto industry
trying to front-run him and develop meme coins, and he just considers it shady.
Like, what are the reasons why the frontier labs have been so slow?
to train on this incredibly rich data set,
that as you said, it's perfect for training
because all of it can be verified.
My sense is that it's almost entirely a social thing.
I mean, in my peer group,
crypto is the biggest industry
that has remained the most contrarian.
And I think part of that is because it's very reputationally volatile.
And part of it is because there's this dynamic
where the best people in the industry,
the gap between the best people in the industry
and the median person in the industry
is much larger than anywhere else.
So if you, for example, don't have exposure
to the high quality pocket of crypto,
then all you see are the scams.
And it's like that can distort your view
such that you just completely dismiss the industry.
And historically, there's been a lot of alpha in that, right?
And I think a lot of us have benefited
from the fact that there's significant reputational volatility.
And if you aren't as sensitive to that as a person
in terms of your temperament,
you can do very well in crypto.
But I think it's a social thing,
and I think it's just this legibility point
about there just hasn't been
a brand that can bridge
the crypto and the AI world.
I think it's like, if something touches crypto,
it sort of in the AI world historically
has been tarnished.
And as a result, people just have tried to
avoid it altogether.
And this sort of created an opening
for something like EVMBench
to get built and shipped inside OpenAI
without any sort of competition.
I think all of the major model labs are going to be running on this benchmark
and probably any future versions of it.
Without significant competition inside the labs,
like there aren't like 30 to 50 crypto-related benchmarks or training environments
that people are shipping.
In some sense, it's actually agency-inducing for us
because they'll just defer to the crypto industry to just figure out what's valuable for them.
But I think it's fundamentally a social issue, and it's sort of tied to all of these dynamics around, like, you know, you see someone who gets extremely wealthy, who you don't think should get extremely wealthy.
Like, maybe it was like there's a lot of volatility in the industry and, like, there's some person who you don't respect who made a lot of money.
Like, these are all kinds of things that go on in the minds of, like, the AI researchers at these labs that lead them to think that the whole industry is a scam.
And, you know, obviously, this has made it an incredible environment to be invested.
in crypto because it's just not in the Overton window of anyone in the Valley.
But I think that's the core dynamic.
Interesting. Interesting.
So we know these bots, these LLMs, they are very good at writing code.
It's like the first thing that they got good at.
Is that the same?
Is it also true with writing EVM code?
Is there a gap there between writing the rest of the world's code and EVM specifically?
Yeah.
I mean, historically there has been.
And part of the reason, you know, one component of what motivated us to start this work,
was the realization, like, man, these models are so good at Python and so bad at Solidity
and so bad at, for example, Solana-related code. Honestly, anything that touches crypto.
And part of my expectation at the time was that we were going to have to go crowdsource a bunch
of data from the industry and spoon-feed it into the labs to get the models really good
at this. But it turned out that because the substrate is so verifiable and also because
there's sort of generality in these models,
they ended up getting quite good,
much faster than we expected,
with much less input than we expected.
So there's this dynamic where if you teach a model
poetry in English,
and then biology in Spanish,
it figures out how to write a poem in Spanish,
even though you never described
how to write poetry in Spanish to the model.
And I think that kind of dynamic is happening here as well,
where it's sort of like, quote-unquote,
learning the language of crypto
without as much direct training data.
And also, it's very hard to overstate the verifiability of the thing, right?
Most software is hard to verify.
You need human labelers to go in and check.
Is this thing correct?
Is it running?
Kind of the only threshold of verifiability you have is does the program compile
and does it pass the tests?
But the tests need to be written by a human, right?
You don't have this notion of like we have a bunch of state.
We can make assertions about the state.
We can, for example, set a model loose on a new contract, on a new EVM state that it's never seen before,
and make assertions about whether it's able to, quote, unquote, drain money from it.
Like, those concepts all would have otherwise needed to be hard-coded into the program.
But because it's crypto, and there are so many standards, it's verifiable.
And the models are just ramping up really quickly in capabilities.
Okay, so do you think that right now is the time that the
floodgates of crypto data to train these models are starting to open, and going
to be opened? And if so, what types of capabilities do you expect future models to
drop? I mean, will they have skills and, I guess, personas, connectors directly for crypto within
some of the core LLMs? Or what types of developments are you looking forward to?
There's a reason we started with security and not sort of general programming capability.
It's because it has this very nice shape of it's extremely economically valuable.
It's extremely sort of intelligence bound, right?
It's like you can't, yeah, it's intelligence bound.
And it's very easily verifiable.
So we know when an exploit has happened.
So security capabilities I expect will develop very quickly.
And then we've talked about sort of all the implications of that.
Other crypto-related capabilities, I think, are, for example, things in the domain of mechanism design
or around market-related fields.
Like, what is the mechanism for an exchange?
How do, if you have market of agents, right,
what is the best way through which they should coordinate with each other?
These are, I think, open fertile soil.
And then, of course, you can go down to the protocol layer, right?
You can say, well, how does a model land a transaction on the Ethereum blockchain?
And there's a security side at the Ethereum client level, right,
at the protocol client level, where, like, sure,
maybe there are $100 billion of assets sitting in open-source smart contracts.
But there's way more than that in ETH and SOL market cap that can be exploited if you're
able to find critical vulnerabilities in Geth, Reth, et cetera.
So I think going down to the protocol layer is going to be important.
Model capabilities around MEV and sort of extractive tactics, I think, will have the same
effect as long-tail hacks, right?
There's a bunch of stuff on chain that you can just collect if you're able to do the
end-to-end process of figuring out alpha in the market, constructing a trade and underwriting it,
and then submitting the transaction and landing it on chain reliably.
These are all things that actually the models are not that good at right now, but they will
get good at really quickly.
And all of that does seem long-term good for crypto.
However, in the here and now, the short term: when we read in the EVMBench paper
that top models are going from around 20%
to over 70% exploit rate,
I'm not sure whether to like,
I mean, feel good about that
or bad about that
because it sort of depends
what the intent is
and who is harnessing it.
So if it's white hat,
that's great.
I mean, that improves our ability
to find bugs and exploits
before attackers do.
However, if it's black hat,
that's not great,
because it improves their ability to find these before we do.
So it depends, I guess with this tool set,
it depends who is using the tools and the intent behind them
as to whether it is short to medium term, bearish or bullish.
I guess one thing it does is it does seem to inject some variance
and uncertainty into the market.
I'm feeling that everywhere with AI right now.
It's just like, it could be really good,
it could be kind of bad,
but one thing it's not going to be is boring. It's going to be highly variant outcomes. I guess when you think
about the capability that is being unlocked here and the ability for LLMs to detect exploits,
is that good or is that bad for crypto in the short to medium term? Yeah. Well, I mean, you mentioned
in the long term, crypto is positively levered to almost all of these developments. And as models get
extremely good at security, this will raise the ceiling for the whole industry.
And that's because it's going to be a survival of the fittest thing, right?
Because all the weak stuff gets just exploited.
I think it's up to us, like what the path there is, us being the industry.
It may be survival of the fittest.
It may be that we figure out a way to have the defense get ahead.
And I think, but the core point is that the amount of assets that can be sustained on these
networks is proportional to how secure they are. And in the long term, there's this benefit where
as security improves, more assets will be able to securely stay on chain. Now, in the short term,
I think this is one of those things where it's in our hands, right? I think it's bound by
the industry's agency on the best way to handle this. We don't know, like if we just let the
clock play forward, there's a lot of uncertainty. We don't know exactly who, whether the
attackers, you know, the black hats will get capabilities before the white hats do.
But we also are active participants in this market. And we can bend the arc of this such that,
for example, we make sure that if there are frontier models or unreleased models or there are
new developments and security relating to AI, that we get this into the top protocols, you know,
one version of the world that you can imagine the short to medium term is that you always have every
single contract being scanned by both adversarial actors and defensive actors 24-7. And when there's a
bug that surfaces, you know, whoever catches it first sort of will react accordingly. And then in that
world, it is just kind of more of a race between the good guys and the bad guys. And I think we have
a pretty great hand in terms of making sure that the good guys have the lead in that race.
At the end of this, once all of the weak contracts have been exploited,
or we've beefed up security enough such that they're not exploitable, I guess that gives us
an incredibly hardened financial system for the world, something that's ultra-secure.
There's almost like a, it's close to perfect, right?
It's like how many, how many nines?
I mean, it's maybe like four-nines, five-nines.
And it almost creates kind of a barbell model of security for the world's
financial assets. Like, the most secure financial assets will probably be in this dark forest
environment like on chain. How do we know that? Because the world has thrown everything it can
at it, including our most intelligent LLMs and it's still there. It's still standing, right?
Hasn't been exploited. So that'll be one side of the barbell. The other side, honestly,
is things that are completely outside of the digital world altogether, you know, like a
bar of gold or something. Actual gold, right? And then everything in the middle will be
pretty exploitable, pretty insecure. I wonder if that's what is on the horizon for
our world. Until the LLMs figure out how to synthesize gold, right? Right, right, right.
And send the robots after the gold, right? Yeah, I think, I guess that makes sense,
and I think that barbell view of the world might be how things play out. I guess the way that
I relate to crypto as an industry and also as a technology is that if you start from the first
principles vantage point of, let's say you want to do payments at the speed of light, right?
Like, I want to send you, Ryan, money from America to Europe or some other part of the world,
and I send it as fast as I send an email.
And the problem that you have there is that you don't know if I also sent that money to David,
right? You have this double spend problem.
And this was what Bitcoin solved, and it got the time for that transaction
down to about an hour.
And since then, we have had successive developments
that have increased both the speed of these transactions
and the expressivity of these transactions.
And I think, in that worldview,
this is not just some path-dependent thing that happened,
that the crypto industry emerged the way it did.
Actually, if you were to play it forward
from first principles, this is how it has to be.
You end up in this conclusion where, you know,
if you have agents that want to move at the speed of the internet,
and the current banking system was created before cars were invented,
that those agents are going to discover the crypto rails as the right way to transact.
And, you know, I think there was a concern maybe like six to eight months ago,
definitely for me, where it was not clear if the agents would get good enough
at crypto-related software, for example,
for them to be able to discover the current rails,
and maybe they'd have to reinvent them from scratch.
But over the last six, eight months, as we've been working on this with OpenAI,
it's become increasingly clear that, one, there are extremely strong network effects inside
of crypto.
And two, these agents are able to just learn, like, they just want to learn these verifiable
things.
And crypto is very high on that list.
So at this point, I think, you know, I've become extremely, extremely bullish on crypto as
a substrate for these agents.
and I think it's sort of an open game
as to who in crypto is going to win that.
But the shape of the technology fits it perfectly.
The EVM is by far the most common
programming environment in crypto,
Solidity being the most common language.
And then there are some long-tail languages,
like Cardano's Haskell, for example.
And as we know, AI loves data.
The more data an AI can get, the better it can be.
With these network effects around environments,
how does that play into AI?
Like, is the EVM going to be the favorite environment
for AIs to work in?
Does Solana's...
What's Solana's?
SVM.
The SVM.
Does Solana's SVM also cross the threshold,
or maybe the threshold framing is a false one?
What do you think?
I think it's actually very hard to say,
and currently it's up in the air.
We don't know.
Part of the reason why, I mean,
I can point to some of the bottlenecks we had
while we were developing EVMBench,
where we also have actually started work on a Solana-related component of it.
But for example, one challenge was that actually it requires a lot of human talent,
like human talent to be able to go in and construct these evals.
And that was just much easier for us to come by for Solidity.
And these things are really hard to build, right?
These are pretty heavy infrastructure.
So even small additions of friction lead to, you know,
when you need to cut scope in some capacity,
you kind of have to cut it in the direction of the stuff
that you can do more easily first.
That being said, I think the point about the fact that,
you know, one, crypto's verifiability is a huge edge.
And two, these models are becoming less and less data hungry
when it comes to learning new programming languages.
I think it may end up being more even of a playing field
than it might seem.
And I think, you know, at least we have an intent
and interest in actually going across ecosystems and down the stack to the protocol layer,
and making this a sort of more expansive crypto flagship benchmark.
But yeah, right now there are obviously network effects around the EVM.
One other example of a counterintuitive reason why an ecosystem like Solana might be able to catch up
is that, at first glance, it may seem really bad that most of the contracts there are closed
source. But if you take the worldview of, actually, if it's open source, it gets in the training
set, then the closed-source benchmarks and contracts for training might actually
be more valuable for a model's development, because they're not in the training set. Now, for example, we have
these sort of what are called canary tags in various parts of our evaluation that filter them out
of the training process of most models. So, you know, there are tricks you can do, but still,
it's possible that the bugs in EVMBench, over time,
leak into the pre-training of the models.
Whereas if it were closed source,
it would not leak in at all.
So my expectation is actually that there will be some asymmetry at first,
but the models will get really good at all of it.
And it will be sort of merit-based: the best will rise to the top.
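The canary tags mentioned above follow a simple convention, popularized by benchmarks like BIG-bench: stamp every benchmark document with a unique string, and well-behaved training pipelines filter out anything containing it. A minimal sketch (the GUID and document set below are made up for illustration):

```python
# Sketch of canary-tag filtering: benchmark documents carry a unique GUID,
# and data pipelines that honor the convention drop any document containing
# it, keeping test cases out of training corpora.
# (The GUID below is hypothetical; real benchmarks publish their own.)

CANARY = "EVMBENCH-CANARY-7f3a2b1c-0000-4d5e-9abc-123456789def"

docs = [
    "Ordinary Solidity tutorial text.",
    f"Benchmark exploit case #12. {CANARY}",
    "Another blog post about the EVM.",
]

training_set = [d for d in docs if CANARY not in d]
print(len(training_set))   # 2: the tagged benchmark case is filtered out
```

As the discussion notes, this only works for pipelines that cooperate; a closed-source contract never enters the corpus in the first place, which is the asymmetry being described.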
Are you sitting on USDT or stablecoins after taking profits and wondering where to deploy next?
What if you could access stocks, gold, and ETFs without ever leaving crypto?
That's what tokenized stocks on BitGet unlock.
Traditional markets still run limited hours, but capital is moving on chain 24/7.
On BitGet, you can trade tokenized stocks and ETFs 24/7 with up to 100x leverage,
all settled directly in USDT or USDC.
No brokerage accounts, no off-ramps, no platform switching.
BitGet has already processed over $18 billion in tokenized stock trading volume,
with most of that happening in the past month alone.
The platform now captures close to 90% of on-chain tokenized stock spot market share.
As gold and silver hit record highs, on-chain trading followed.
Over the past two weeks, volume in SLV, tied to silver, and IAU, linked to gold, surged on BitGet.
This is BitGet's universal exchange vision in action: crypto, equities, and real-world assets in one place, built with crypto-native speed and flexibility.
If you want to trade stocks the way you trade crypto, explore tokenized equities on BitGet.
Learn more by clicking the links in the show notes.
This is not investment advice.
Few people in crypto put real skin in the game when they make public top or bottom calls.
The DeFi report is one of them.
The week before the October 10th flash crash, Michael from the DeFi report emailed his entire newsletter,
saying he's going aggressively risk off and sold the majority of his book from crypto into cash.
This is when ETH was $4,000 and Bitcoin was $110K.
Michael runs the DeFi report, an industry-leading research platform built on data,
cycle awareness, risk management, transparency, and most importantly, skin in the game.
We like Michael at Bankless.
We like his analysis,
and that's why you hear him
on the Bankless podcast about once a month.
And the DeFi Report is giving Bankless listeners
one free month of access to the DeFi Report.
So if you're looking for some sharp data-driven analysis
to make better informed decisions around your portfolio,
you can learn why and how Michael called the top
and what he's doing next, all in the DeFi Report Pro.
Check it out.
There is a link in the show notes.
Ethereum's aspiration is to formally verify
its entire end-to-end tech stack in the fullness of time.
You know, granted, first we
have to get the Beam Chain, we have to do all the hard forks to get there, but ultimately we
want to do a formal verification of the entire Ethereum tech stack.
AI-based formal verification. Is that a real thing? How does AI capabilities work its way
into the conversation of formal verification? Yeah. Well, I mean, I think it's a real
thing. Last year, we invested in a company called Harmonic, which is a math foundation model company
co-founded by Vlad Tenev of Robinhood and Tudor Achim. I think part of their thesis, and I think
part of where the world is clearly going
is that there's more software that is being generated
than can be possibly reviewed by humans.
And formal verification is one way to quickly check
whether a component of software is actually doing
what it says it's doing.
And then obviously in the context of security,
it can, especially if the spec is written correctly,
it can be a step function change.
Now, it's not a silver bullet
in the sense that you still have to write the spec
for the formal verification.
So there's still surface for bugs to get in there.
But you can make the case that actually the surface for bugs
in writing a formal verification spec might be lower
than writing the code to start with.
And definitely with time,
I think all of the best models,
all of the best software will probably end up being formally verified.
And if you take the vantage point of an agent
and you have two options to choose from,
one of them is formally verified and one of them is not,
the formally verified one might just
gain preference because it has all these nice properties.
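To illustrate the "you still have to write the spec" point: in the sketch below, the spec is a conservation invariant, and a toy transfer implementation is checked against it over a small input space. Real formal verification proves the property for all inputs (for example, with an SMT solver or a proof assistant); this exhaustive check is only a finite stand-in, and both function names are hypothetical.

```python
# Toy version of spec-vs-implementation checking. The spec here is a
# conservation invariant: transfers never create or destroy tokens.
# Note the bug surface has moved into the spec itself, which is the
# caveat raised above.

def transfer(balances, src, dst, amount):
    b = dict(balances)
    if b.get(src, 0) >= amount:
        b[src] -= amount
        b[dst] = b.get(dst, 0) + amount
    return b

def spec_holds(balances, src, dst, amount):
    # Spec: total supply is unchanged by any transfer attempt.
    after = transfer(balances, src, dst, amount)
    return sum(after.values()) == sum(balances.values())

# Exhaustively check the spec over a small state space (a real tool would
# prove it symbolically for all states).
ok = all(
    spec_holds({"a": a, "b": b}, "a", "b", amt)
    for a in range(5) for b in range(5) for amt in range(7)
)
print(ok)   # True: the implementation satisfies the conservation spec
```

If the spec had been written incorrectly, say, only checking the sender's balance, the check could pass while real value leaks, which is exactly why writing the spec remains a source of bugs.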
There's a section in the EVMBench paper called Future Directions.
What does EVMBench V2 look like?
How does the project grow from here?
The top-level goal that we have is to help the model labs develop the crypto capabilities
of their models.
And I think that security is one component of that and maybe an increasingly urgent component
of that.
but there's so much that EVMBench does not touch.
So there are other ecosystems and stacks.
There is a protocol layer which we talked about
where maybe, arguably, it's more important
from a security standpoint
that the Ethereum protocol is secure
rather than any specific Solidity contract.
There are out of protocol components,
like how do you land a transaction on chain?
How do you deal with the mempool?
How do you deal with sort of the non-deterministic parts of crypto?
And then obviously there are components that are even farther on the verifiable and intelligence-bound trajectory,
like, for example, around cryptography and zero-knowledge proofs, et cetera.
And I think all of these are extremely fertile soil for future work.
So we're currently open to trying to source collaborators for future versions of EVMBench,
and we're obviously working on next steps for it ourselves.
And I think that with this direction, we finally have a foot in the door into the model labs for getting crypto capabilities into the frontier models.
And I think we should leverage that as an industry, and we should try to get these models as good at crypto as we possibly can.
Alpin, you're very smart, as evidenced by this paper and this conversation.
You have a high degree of agency, obviously.
You're definitely on the frontier.
you've chosen to stay in crypto
and not, kind of, leave and go to AI,
and you seem incredibly bullish on crypto,
even while it's contrarian at this moment in time.
Why crypto for you personally?
I've personally never had hard lines around industries in my mind.
We say "I work in X"
to make what we're doing legible to other people.
But I don't think that's the right way to relate to it.
I've spent all of this time in crypto because, one, it's been extremely intellectually
interesting. And two, as I mentioned, it's remained extremely contrarian
among my smartest friends, in ways where I can put my finger on exactly what they're missing.
I think that's sort of, that's kind of the best that one can ask for. I think that, you know,
we talked about how crypto is positively levered to the security developments in AI. But, you know,
you can make the case that it's positively levered to most of the developments in the world right now.
Like, for example, as the creation of new goods and intelligence,
et cetera, becomes commoditized, scarce assets become more valuable.
As geopolitical instability ensues, systems that are extra-sovereign, right,
outside of any jurisdiction, that are kind of the equivalent of end-to-end encryption for finance,
those have more space to thrive.
And I think that, you know, I grew up in Turkey, most of my family is still there.
I think that people who grew up in America, and in general in sort of a stable world,
do not have the sense for what can happen as the world destabilizes.
And I think that, you know, as many people in the country that I grew up in are starting to onboard to crypto rails
and sort of use that as a lifeboat, it's increasingly clear to me that this technology is on a sort of compounding trajectory
to do really massive things.
And so the combination of that, plus you look around
and no one's even talking about it, it's just really exciting.
Yeah, and you do seem convinced that the acceleration of AI
is going to benefit crypto, that it will be all boats rise together.
And I think that, well, there is some category of software industry
that AI doesn't seem to benefit, at least in the short run.
Anthropic drops a new security module,
and all the cybersecurity stocks drop like 10 to 15% in
one day. Why are you so convinced that AI's acceleration will be beneficial to crypto?
It's not, obviously, nothing is guaranteed right now. I think if we let everything run its course,
it may be bad for crypto. It may be good for crypto. We don't know. I guess the, the conviction
that I have is that if we push things in the direction that we want them to go in, that we can make
AI be extremely good for crypto. And also, there's the component of
this where I do strongly believe that if you were to rederive all of this from first principles,
you end up in a place that's very similar to where we currently landed with crypto.
And yeah, I just think that for all the reasons we've talked about, that for fundamental
reasons, crypto is extremely good for AI, and AI is extremely good for crypto.
So I think that nothing is guaranteed and we still have to exercise our agency,
but I think, for all the reasons we've discussed so far,
it's pretty clear to me that these things are going to converge in a positive way.
Well, let's end on a note of high agency and conviction.
Alpin, thank you so much for joining us today.
Cool. Thanks for having me.
Got to let you know, bankless listeners.
Of course, none of this has been financial advice.
You could lose what you put in. Hopefully an LLM out there,
a white hat, is protecting it.
We're headed west.
This is the frontier.
It's not for everyone.
But we're glad you're with us on the bankless journey.
Thanks a lot.
