Limitless Podcast - OpenRouter: The Only AI Tool You'll Ever Need | Founder Alex Atallah
Episode Date: August 4, 2025

In this episode, we chat with Alex Atallah, founder of OpenRouter AI, a platform that aggregates over 400 LLMs. He shares his transition from co-founding OpenSea to leading innovations in AI, addressing fragmentation in the AI model landscape. We discuss community engagement, model analytics, and the challenges of open-source vs. closed-source frameworks. Join us for insights on the future of AI and how user control can shape technological advancements at OpenRouter!
------
🌌 LIMITLESS HQ: LISTEN & FOLLOW HERE ⬇️
https://limitless.bankless.com/
https://x.com/LimitlessFT
------
TIMESTAMPS
0:00 Intro
2:06 Journey from OpenSea to OpenRouter
5:52 Exploring Frontiers of Technology
7:16 Patterns in New Opportunities
10:06 The Role of Enthusiast Communities
13:13 Early Innovations in AI
15:18 Insights on Model Development
19:17 Understanding OpenRouter's Functionality
24:13 Choosing the Right Model
27:04 Benchmarking and Performance Metrics
29:27 The Importance of Token Metrics
34:24 Collaborations with Major AI Players
35:20 Open Source vs. Closed Source Models
39:19 Future Trends in Model Adoption
43:06 The Role of Innovation in AI
46:23 Comparing Global AI Talent
50:29 Data Utilization Strategies
57:18 Future of AI Agents
1:01:20 OpenRouter's Vision for the Future
1:04:04 Trends in AI and NFTs
------
RESOURCES
Alex Atallah: https://x.com/xanderatallah
OpenRouter: https://openrouter.ai/
Josh: https://x.com/Josh_Kale
Ejaaz: https://x.com/cryptopunk7213
------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures
Transcript
What if I told you there was a single website you could go to where you can chat with any major AI model from one single interface? It's kind of like ChatGPT, but instead every prompt gets routed to the exact AI model that will do the best job for whatever your prompt might be.
Well, on today's episode, we're joined by Alex Atallah, the founder and CEO of OpenRouter AI. It's the fastest growing AI model marketplace, with access to over 400 LLMs, making it the only place that really knows how people use AI models and, more importantly, how they might use them in the future. It sits at the intersection of every single prompt that anyone writes and every model that they might ever use. Alex Atallah, welcome to the show. How are you, man?
Thanks, guys. Great. Thanks so much for having me on.
So it is a Monday. How does the founder of OpenRouter spend his weekend? Presumably, you know, out and about chilling, relaxing, not at all focused on the company?
Oh, I love weekends with no meetings planned. I just go to a coffee shop and have tons of hours stacked in a row to do things that require a lot of momentum build-up. So I did that at coffee shops on Saturday and Sunday. And then I watched Blade Runner again.
Okay. So when we were preparing for this episode, Alex, I couldn't help but think that you've had a pretty insane decade of startup foundership, right? OpenRouter is kind of like your second major thing that you've done, but prior to that, you were the founder and CTO of OpenSea, the biggest NFT marketplace out there, and now you're focused on one of the biggest AI companies out there. So it sounds like you're kind of at the pivot point of two of the most important technology sectors of the last decade. Can you just give us a bit of background as to, you know, how you ended up here? And more importantly, where you started. Walk us through the journey of OpenSea and how you ended up at OpenRouter AI.
Yeah, so I co-founded OpenSea with Devin Finzer at the very beginning of 2018, very end of 2017. It was the first NFT marketplace. And it was not dissimilar to OpenRouter, in that there was a really fragmented ecosystem of NFT metadata and media that gets attached to these tokens. And it was the first example of something in crypto that could be non-fungible, meaning it's a single thing that can be traded from person to person. Most things in the world are non-fungible. Like, a chair is non-fungible. A currency is fungible. Back in 2018, no one was really thinking about crypto in terms of non-fungible goods. And the problem with non-fungible goods was that there weren't any real standards set up. There were a lot of heterogeneous implementations for how to get a non-fungible item represented and tradable in a decentralized
way. So OpenSea organized this very heterogeneous inventory and put it together in one place. We came up with a metadata standard. We did a lot of work to really make the experience super good for each collection. And you see a lot of similarities with how AI works today too, where there's also a very heterogeneous ecosystem: a lot of different APIs and different features supported by language model providers. And OpenRouter similarly does a lot of work to organize it all. I was at OpenSea until 2022, when I was kind of feeling the itch to do something new. I left in August, and then ChatGPT came out a few months later. And my biggest question around that time was whether it was going to be a winner-take-all market, because OpenAI was very far ahead of everybody else. And we had Cohere Command. We had a couple of open-source models. But OpenAI's was the only really usable one.
I was doing little projects to experiment with the GPT-3 API. And then Llama came out in January. Really exciting: about a tenth of the size, it won on a couple of benchmarks, but it wasn't really chattable yet. And it wasn't until a few months later that a team at Stanford distilled it into a new model called Alpaca. Distillation means you take the model and you customize it, or fine-tune it, on a set of synthetic data, which they made using ChatGPT, as a research project. It was the first successful major distillation that I'm aware of. And it was an actually usable model.
I was on the airplane talking to it and was like, wow, if it only took $600 to make something like this, then you don't need $10 million to make a model. There might be tens of thousands, hundreds of thousands of models in the future. And suddenly this started to look like a new economic primitive, a new building block that people could use to design their own place on the internet. And there wasn't one. There wasn't a place where you could discover new language models and see who uses them and why. And that's how OpenRouter got started.
That's amazing. So one of the things that we're obsessed with on this channel in particular is exploring frontiers: how to properly see these frontiers, analyze them, and understand when they're going to happen. And when I was going through your history, you've had this talent consistently over time. Even as far back as early on, I read that you were hacking Wi-Fi routers at a hackathon. You were very early to that. You were early to NFTs. You were early to understanding AI and the impact that it would have. And what I'd love for you to explain is the thought process and the indicators you look for when exploring these new frontiers. Because clearly there's some sort of pattern matching going on. Clearly, you have some sort of awareness of what will be important and why it will be important, and then you insert yourself into that narrative. So are there patterns, are there certain things that you look for when searching for these new opportunities, that led you to make these decisions?
I think there's a lot to be said for finding enthusiast communities and seeing if you're going to join them.
Can you be an enthusiast with them?
Like, whenever something new comes out that has some kind of ecosystem potential, there are going to be enthusiast communities that pop up. And the internet has made it self-serve.
You could just join the communities.
Discord, I think, is an incredible, super underrated platform, because the communities feel kind of private. You don't feel like you're seeing somebody trying to, you know, advertise something for SEO juice. There's no SEO juice in Discord. It's just people talking about what they're passionate about, and it gets really niche. And when you find an interest group in Discord that has to do with some new piece of technology, something that's just being developed right now and doesn't really work very well at all, you get people who are just trying to figure out what to do with it and how to make it better. And I think that's the first core piece of magic that jumps to
mind. There's got to be a willingness to be weird. Because if you jump into any of these communities at face value, it's stupid. Like, oh, this is just a game, or it's a really weird game, I mean, it's not really a playable game, so I'm going to leave right now. And not only do you have to be weird, but you have to be creative. Like, okay, you know, this is just cats on the blockchain, and people are just trading cats back and forth. You can't look at the community as simply that. Think about what you could do with it. What is the unlock that wasn't achievable before? And I think there are people who are just good at this, who will join the communities and brainstorm live, and you can see everybody brainstorming in real time.
But another incredible example of this was the Midjourney Discord. It became the biggest server on Discord by far. And why did that happen? Well, it started with something weird, silly, maybe not super useful. But you could see all the enthusiasts remixing and brainstorming live, figuring out how to turn it into something beautiful and how to make it useful.
And then, you know, it just exploded. It's the most incredible niche community I think Discord has ever seen, because of how useless it started and how insanely exciting it became. I mean, I was playing around with this model called Big Sleep in 2021 that would generate images that looked kind of like DeviantArt. They're all animated images, and none of them really made sense, but you could get some really cool stuff, like potentially something you'd want to make your desktop wallpaper. And if you're really deep in some DeviantArt communities, you know, you kind of appreciate it. And so that was like, oh, there's a kernel of something here. And it took another year or two before Midjourney started to pick up. But that was, like...
Where were you seeing all of this, Alex? Like, where were you scouring? Just random forums, or just wherever your nose told you to go?
Basically, there's this Twitter account, I'm trying to remember what it's called, that posts AI research papers and kind of tries to show what you can do with them. And I discovered this Twitter account in like 2021. It wasn't at all related to crypto, but, you know, Big Sleep was the first thing I saw that used AI to generate things that could potentially be NFTs. So I started experimenting around with how much you could direct it to make an NFT collection that would make any sense. It was very, very difficult. But that was the first generative...
And this was before you were even thinking about starting OpenRouter, right?
Yeah, yeah. This was when I was full time at OpenSea.
Oh, yes, I've got it. It's AK, this Twitter account.
All right.
I really recommend it. They basically post papers and explore how each paper could be useful. They post animations. They make AI research kind of fun to engage with. And that was my first experience.
Okay.
So, I mean, that's a massive win for X, or Twitter as it was known back then, as a platform, right? It gave birth to kind of like two of the biggest technologies: crypto, or as it's also known, Crypto Twitter, and now, apparently, all the AI research stuff, which kind of put you on the path that led you to OpenRouter.
So if I've got this right, Alex, you were full time at OpenSea, a multi-billion dollar company with loads of important stuff to do there, but you still found the time to scour this fringe technology. Because that's what AI was at the time: prior to GPT-2 or GPT-3, no one really knew about this. And you were playing around with these generative AI models that would, you know, create this magical little substance, and maybe it came in the form of a picture or a weird little cat, and you kind of jumped into these niche forums of enthusiasts, as you say, and explored that further. And it sounds like you honed that even beyond your journey from OpenSea when you left.
I remember actually meeting you in this kind of abyss between you leaving OpenSea and starting OpenRouter, where you were brainstorming a bunch of these ideas. And I remember a snippet from our conversation in one of the WeWorks here, where you had whiteboarded a bunch of AI stuff. And one of those things was the whole topic of inference. And if I'm being honest, Alex, I had no idea what that word even meant back then. I was extremely focused on all the NFT stuff and all the crypto stuff; my background's in all of that. But I just found it fascinating that you always had your nose in some of the early communities. And I think that's a really
important lesson there. I want to pick up on something that you brought up when you described your path to OpenRouter, Alex. You said you were playing around with these early AI models: not the GPTs, before Claude was even created. You were playing around with these random models that you would find on forums, on Twitter, or on Reddit, right? And you would experiment with them. And I find it fascinating that back then, even when GPT became a thing, you were convinced that there would be hundreds of thousands of AI models. Would you say hundreds of thousands? Back then, that wasn't a normal view. Back then, everyone was like, you need hundreds of millions of dollars, or maybe it was tens of millions of dollars back then, and it was going to be a rich man's
game.
Yeah, it was basically the Alpaca project that put me over the edge on there being many, many, many models instead of just a very small number.
Can you explain what the Alpaca project is for the audience?
Yeah.
So, the Alpaca project. You know, after Llama came out, you really could not chat with it very well. It was a text completion model. There were a couple of benchmarks where it beat GPT-3, and it was about a tenth of the size of what most people thought GPT-3 was. So it was a pretty incredible achievement, but the user experience wasn't there. And the Alpaca project took ChatGPT and generated a bunch of synthetic outputs, and then they fine-tuned Llama on those synthetic outputs.
And this did two things to Llama: it taught it style and it taught it knowledge. The style is how to chat, which was the big user experience gap. And it made it smarter. Fine-tuning can transfer both style and knowledge, and the content of the synthetic data was reflected in the model's performance on benchmarks after that point. So if you can do that without revealing all the data that goes in, now there's a way you could sell data via API without just dumping all the data out to the world and never being able to monetize it again. So there's a brand new business model around data that emerges.
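The distillation recipe described here, sample synthetic outputs from a teacher and then fit a student on them, can be sketched with a deliberately tiny toy. Everything below is invented for illustration: the "teacher" is a plain numeric function standing in for a model like ChatGPT, and a closed-form least-squares fit stands in for fine-tuning.

```python
import random

# Toy "teacher": a stand-in for a large model whose weights stay private.
def teacher(x: float) -> float:
    return 2.0 * x + 1.0

# Step 1: generate a synthetic dataset by querying the teacher.
random.seed(0)
dataset = [(x, teacher(x)) for x in (random.uniform(-5, 5) for _ in range(200))]

# Step 2: "fine-tune" a small student on the synthetic pairs only.
# Here that is just a closed-form least-squares line fit.
n = len(dataset)
sx = sum(x for x, _ in dataset)
sy = sum(y for _, y in dataset)
sxx = sum(x * x for x, _ in dataset)
sxy = sum(x * y for x, y in dataset)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

def student(x: float) -> float:
    return slope * x + intercept

# The student now mimics the teacher without ever seeing the
# teacher's weights or its original training data.
print(round(slope, 3), round(intercept, 3))
```

The point mirrors the Alpaca story: the teacher's behavior transfers through its outputs alone, which is exactly why selling data via API without dumping it publicly becomes viable.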
And then there's the ability to work towards open intelligence: to build new architectures, test them more quickly, and fine-tune them quickly. Basically, you can build on top of the work of giants. You don't have to start from zero every time. A lot of the biggest developer experience innovations just involve giving developers a higher stair to start walking up, so they don't have to start at the bottom of the staircase every single time. And that was the big, generous gift that Llama gave the community.
And it wasn't the only company doing open-source models. Mistral came out with 7B Instruct a few months later. It was an incredible model. Then they came out with the first open-weight mixture of experts a few months after that. It felt like actual intelligence, but completely open. And all of these provide higher and higher stairs for other developers: basically, a way to crowdsource new ideas from the whole planet and let those new ideas build on top of really good foundations. So when that whole picture started to fall into place, it felt like it was going to be a huge inventory situation, kind of like NFT collections were a huge inventory situation. Obviously completely different market dynamics, a really different type of goal that buyers have. And so a lot of my early experimentation, like the Chrome extension I made called Window AI and a few other things, was just about learning how the ecosystem works, what makes it different, and what people really want, what developers really want.
So that leads us to OpenRouter itself,
right? So I kind of want you to help explain to the listeners who aren't familiar with OpenRouter what it does. Because for a lot of people, the way they interact with an AI is they send a prompt to their model of choice. They use ChatGPT, or they use the Grok app, or they're on Gemini, and they kind of live in these siloed worlds. And then the next step up from those people are the ones who use it professionally, the developers. They're interacting with APIs. Maybe they're not interfacing with the actual UI, but they're calling a single model. And OpenRouter kind of exists on top of all this, right? Can you walk us through how it works and why
so many people love using OpenRouter?
OpenRouter is an aggregator and marketplace for large language models. You can kind of think of it as, you know, a Stripe meets Cloudflare for language models.
It's a single pane of glass. You can orchestrate, discover, and optimize all of your intelligence needs in one place. You know, one billing provider gets you all the models; there are 470-plus now. All the models sort of implement features, but they do it differently. And there are also a lot of intelligence brownouts, as Andrej Karpathy calls them, where models just go down all the time. Even the top models, like Anthropic and Gemini and OpenAI.
So what we do is, you know... developers need a lot of choice. CTOs need a lot of reliability. CFOs need predictable costs. CSOs need complex policy controls. All of these are inputs to what we do, which is build a single pane of glass that makes models more reliable and lower cost, gives you more choice, and helps you choose between all the options for where to source your intelligence.
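As a concrete sketch of the "one billing provider gets you all the models" idea: OpenRouter exposes an OpenAI-compatible chat completions endpoint, and a request might look like the following. The model slug is a placeholder, the exact fields should be checked against OpenRouter's docs, and nothing is sent unless an API key is configured.

```python
import json
import os
import urllib.request

# Hedged sketch of an OpenRouter chat completion request; verify the
# endpoint and fields against the official docs before relying on it.
payload = {
    "model": "openai/gpt-4o",  # placeholder: one slug among 400+
    "messages": [{"role": "user", "content": "Say hello."}],
}

def build_request(api_key: str) -> urllib.request.Request:
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",  # one key, every model
            "Content-Type": "application/json",
        },
    )

# Only hits the network if a key is actually configured.
api_key = os.environ.get("OPENROUTER_API_KEY")
if api_key:
    with urllib.request.urlopen(build_request(api_key)) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Switching models is then just a one-line change to the slug, which is the "single pane of glass" point in practice.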
How does it work? Because, like, Ejaaz and I on the show frequently talk about benchmarks, right, where a certain model is the best at coding. And that implies that maybe you should go to that model for all of your coding needs, because it's the best at it. But it would appear that's not true if you're routing through a lot of different providers. So how do you consider which provider gets routed to when, and how to get the best result for what you're asking?
So we've taken a different approach so far, which is: instead of focusing on a production router that picks the model for you, we try to help you choose the model. We create lots of analytics, both on your account and on our rankings page, to help you browse and discover the models that power users are really using successfully on a certain type of workload. Because we think developers today primarily want to choose the model themselves. Switching between model families can result in very unpredictable behavior.
But once you've chosen your model, we try to help developers not need to think about the provider. There are sometimes dozens of providers for a given model, all kinds of companies, including the hyperscalers, like AWS, Google Vertex, and Azure, and scaling startups, like Together, Fireworks, DeepInfra, and a long tail of providers that offer very unique features or very exceptional performance. There are all kinds of differentiators for them. So what we do is collect them all in one place. And if you want a feature, you just get the providers that support it. If you want performance, you get prioritized to the providers that have high performance. If you're really cost-sensitive, you can prioritize the providers that are really low-cost today. We basically create all these lanes. There are innumerable ways you could get routed, but you're in full control of the overall user experience that you're aiming for. And that's what we found was missing from the whole ecosystem: just a way of doing that.
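To make the "lanes" concrete, here is what a request body with provider preferences might look like. The field names (`provider`, `sort`, `allow_fallbacks`, `data_collection`) are modeled on OpenRouter's provider-routing docs as I recall them, so treat this as illustrative rather than authoritative.

```python
# Hypothetical request body expressing routing preferences; the field
# names mirror OpenRouter's provider-preferences docs as I recall them
# and should be verified there before use.
payload = {
    "model": "meta-llama/llama-3.1-70b-instruct",  # placeholder slug
    "messages": [{"role": "user", "content": "hi"}],
    "provider": {
        "sort": "price",           # cost-sensitive lane: cheapest first
        "allow_fallbacks": True,   # permit failover to other providers
        "data_collection": "deny", # example data-policy control
    },
}
print(sorted(payload["provider"]))
```

A latency- or throughput-focused caller would change only the `provider` block, which is what "full control of the overall user experience" means here.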
And, you know, we get, on average, 5 to 10% uptime boosts over going to providers directly, just by load balancing and sending you to the top provider that's up and able to handle your request. And we really focus hard on efficiency and performance: we only add about 20 to 25 milliseconds of latency on top of your request, and it all gets deployed very close to your servers at the edge. So overall, we stack providers, we figure out what you can benefit from that everybody else is doing, and we just give you the power of big data as a developer accessing your model of choice.
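The uptime claim can be sketched as a simple priority-ordered failover loop (my illustration, not OpenRouter's actual implementation): if each provider is independently up 95% of the time, falling back across just two of them already pushes effective uptime toward 99.75%. The provider names and error convention below are invented for the sketch.

```python
# Toy failover router: try providers in priority order, falling back
# when one is down, so the caller sees better uptime than any single
# provider offers on its own.
def route(prompt, providers):
    for name, call in providers:
        try:
            return name, call(prompt)
        except ConnectionError:
            continue  # brownout: move to the next lane
    raise RuntimeError("all providers down")

def flaky(prompt):    # stands in for a provider mid-brownout
    raise ConnectionError("brownout")

def healthy(prompt):  # stands in for a provider that is up
    return f"echo: {prompt}"

name, out = route("hi", [("providerA", flaky), ("providerB", healthy)])
print(name, out)  # providerB echo: hi
```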
So it kind of allows you to harness the collective knowledge of everybody, right? You get all of the data, you have all of the queries, you know which yields the best result, and you're able to deliver the best product. Now, in terms of actual LLMs, Ejaaz has actually pulled this up just before, which is the leaderboard. And I'm interested in how you guys think about LLMs: which are the best, how to benchmark them, and how you route people through them. Do you believe that benchmarks are accurate, and do you reflect those in the way that you route traffic through these models?
In general, we have taken the stance that we want to be the capitalist benchmark for models: what is actually happening? And part of this is that I really think both the law of large numbers and the enthusiasm of power users are really, really valuable for everybody else.
Like, let's say you're routing to Claude 4 and you're based in Europe. All of a sudden, there might be a huge variance in throughput from one of the providers, and we're only able to detect that if some other users have discovered it before you. And so we route around the provider that's running kind of slow in Europe and send you, if your data policies allow it, to a much faster provider somewhere else. And that allows you to get faster performance.
So that's how large numbers help on the provider level. On the model selection level, like what you see on this rankings page here: when we put up a model, like the new model we put up today from a new model lab called Z.AI, the power users instantly discover it. We have this LLM enthusiast community that dives in and really figures out what a model is good for along a bunch of core use cases. The power users figure out which workloads are interesting, and then you can just see in the data what they're doing, and everybody can benefit from it. That's why we open up our data and share it for free on the rankings page here.
I'm seeing this one consistent unit across all these rankings, Alex, which is tokens, right? And Josh and I have spoken about this on the show before, but I'm wondering how you've chosen this specific unit to measure, you know, how good or effective these models are, how consumed or used they are. Can you tell us a bit more about why you've picked this particular unit and what it tells you, as the OpenRouter platform, about how a user is using a particular model?
Yeah, I mean, I think dollars is a good metric too. The reason we chose tokens is primarily because we were seeing prices come down really quickly. OpenRouter's been around since the beginning of 2023, and for most of that time I didn't want a model to be penalized in the rankings just because prices are going down really dramatically. Now, there's a paradox called Jevons paradox, which is that when prices decrease, like 10x, users' use of some component of infrastructure increases by more than 10x. And so maybe they wouldn't take a hit at all. But I thought there were some other advantages to using tokens too. Tokens don't have this penalty and don't rely on Jevons paradox, which can have a lot of lag. They're also a little bit of a proxy for time. You know, a model that is generating a lot of tokens, and doing so for a while across a lot of users, means that a lot of people are reading those tokens and actually doing something with them. And the same goes for input. If I really want to send an enormous number of documents and the model has a really, really tiny prompt price, I think that's still valuable and something that we want to see. We want to see that this model is processing an enormous number of documents. That's a use case. That should show up in the rankings.
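Illustrative arithmetic for the ranking choice (made-up numbers, not OpenRouter data): if a price cut triggers a Jevons-style usage boom, a token-based ranking registers the growth immediately, while a dollar-based ranking barely moves.

```python
# Toy numbers showing why a dollar ranking penalizes price cuts while
# a token ranking does not. All figures are invented for illustration.
price_before, price_after = 10.0, 1.0   # $ per 1M tokens, 10x price cut
tokens_before = 100                     # millions of tokens per day
tokens_after = 1_500                    # usage grew 15x after the cut

dollars_before = price_before * tokens_before
dollars_after = price_after * tokens_after

token_growth = tokens_after / tokens_before    # adoption as tokens see it
dollar_growth = dollars_after / dollars_before # adoption as dollars see it

# Tokens reflect the adoption jump at once; dollars lag far behind it.
print(token_growth, dollar_growth)  # 15.0 1.5
```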
And so we decided to go with tokens. We might add dollars in the future. But tokens, you know, don't have this Jevons paradox lag. And there wasn't anything else. Nobody was doing any kind of overall analytics; we didn't see any other company even do it until Google did a few months ago, when they started publishing the total amount of tokens processed by Gemini. So we'll see which use cases really need dollars, but tokens have been holding up pretty well.
Yeah. I mean, this dashboard is awesome. And I recommend anyone that's listening to this who can't see our screen to get on
OpenRouter's website and check it out. I've been following it for the last two weeks pretty rigorously, Alex. And what I love is you can literally see... So two weeks ago, Grok 4 got released, right? And Josh and I were making a ton of videos on it. We were using it for pretty much everything that we could. And then this other model came out of China maybe a few days after, called Kimi K2. And I was like, oh, yeah, whatever. This is just some random Chinese model. I'm not going to focus on it. And then I kept seeing it in my feed and I thought, okay, maybe I'll give this a go. And I went straight to OpenRouter to almost gauge the interest from a wider set of AI users. And I saw that it was skyrocketing, right? And then I saw that, you know, Qwen dropped their models last week. And again, I came to OpenRouter, and it preceded the trend, right? People had already started using it. So I love how you could describe OpenRouter as this kind of prophetic orb, basically, where the enthusiasts of the community itself can front-run very popular trends. And I think that's a very powerful moat.
And kind of on this path, Alex, I noticed that a lot of these major model providers see the value in this, right? So if I'm not mistaken, OpenAI kind of used your platform to secretly launch their frontier model before they officially launched it, right? Can you walk us through how that came about and, more importantly, why they wanted to do that and why they chose you?
OpenAI will sometimes give early access to models to some of their customers for testing, and we asked them if they wanted to try a stealth model with us, which we had never done before. It involved launching it under another name and seeing how users respond to it without having any bias or inclination for or against the model at the onset. It would be a new way of testing it, an experiment for both us and them. And they generously decided to take the leap of faith and try it. And we launched GPT-4.1 with them, and we called it Quasar Alpha. It was a million-token context length model, OpenAI's first very, very long context model. And it was also optimized for coding.
And there were a couple of incredible things that happened. First, we have this community of benchmarkers that run open-source benchmarks, and we give a lot of them grants to help fund the benchmarks, grants of OpenRouter tokens. They'll just run their suites of tests against all the models. And some of them are very creative. There's one that tests, like, the ability to generate fiction. There's one that tests whether a model can make a 3D object in Minecraft, called MCBench. There are a few that test different types of coding proficiency. There's one that just focuses on how good a model is at Ruby, because it turns out a lot of the models are not great at Ruby. There are a lot of languages that all the models are pretty bad at. And so we have this long tail of very niche benchmarks, and all the benchmarkers ran their benchmarks for free on Quasar Alpha and found pretty incredible results for most of them. So OpenAI got this feedback in real time. We kind of helped them find it, and they made another snapshot, which we launched as Optimus Alpha, and they could compare the feedback that they got from the two snapshots. And then two weeks later, they launched GPT-4.1 live for everybody. So it was an experiment for us, and we've done it again since with another model provider that's still working on it. And it's kind of a cool way of crowdsourcing benchmarks that you wouldn't have expected and also getting unbiased community sentiment.
That's great. So now, when we see a new model pop up and we want to test GPT-5, we know where to come to try it early. We'll see, because the rumor is it's coming soon. So we're on your watch list. But I do want to ask you about open source versus closed source, because this has been an important thing for us. We talk about this a lot, and you have a ton of data on it. I'm looking at the leaderboards: there are open-source models that are doing very well against closed source. What are your takes in general? How do you feel about open-source versus closed-source models, particularly around how you serve them to users?
Both types of models have supply problems, but the supply problems are very different. Typically, what we see with closed-source models is that there are very few suppliers, usually just one or two. Like with Grok, for example, there's Grok direct and there's Azure. With Anthropic, there's Anthropic direct, there's Google Vertex, there's AWS Bedrock, and then we also deploy it in different regions. We have an EU deployment for customers who only want their data to stay in the EU. And we do custom deployments for the closed-source models too, just to guarantee good throughput and high rate limits for people. The tricky part is the demand: the closed-source models are usually doing most of the tokens on OpenRouter. It's dominant, probably 70 to 80 percent closed-source tokens today.
But the open-source models have a much more.
more fragmented supply, like, sell side order book.
And like the rate limits for each provider is like a less stable on average.
It usually takes a while for the hyperscalers to serve a new close source,
a new open source model.
So the load balancing work that we do on open source models tends to be a lot more valuable.
The load balancing work that we do for closed source models
tends to be very focused on caching and feature awareness:
making sure you're getting clean cache hits
and only transitioning over to new providers when your cache is expired.
For open source models, there's way less caching.
Very, very few open source providers implement caching.
And so switching between providers becomes more common,
and we also track a lot of quality differences between the open source providers.
Some of them will deploy at lower quantization levels,
which is kind of a way of compressing the model.
Generally it doesn't have an impact on the quality of the output,
and yet we still see some odd things from some of the open source providers.
So we run tests internally
to detect those outputs, and we're building up a lot more muscle here,
so that they get pulled out of the routing lane and don't affect anyone.
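The routing idea Alex describes, preferring a provider while your prompt cache is warm, and pulling quality-flagged providers out of the lane, can be sketched roughly like this. This is not OpenRouter's actual code; the `Provider` class and `pick_provider` function are illustrative stand-ins.

```python
# A minimal sketch (assumption: not OpenRouter's real implementation) of the
# routing behavior described above: skip providers flagged by internal quality
# tests (e.g. over-aggressive quantization), and prefer the provider that still
# holds a warm prompt cache so we get clean cache hits.
import time

class Provider:
    def __init__(self, name, flagged=False):
        self.name = name
        self.flagged = flagged     # failed an internal quality check
        self.cache_expiry = 0.0    # unix time when this user's prompt cache expires

def pick_provider(providers, now=None):
    """Return the provider to route the next request to."""
    now = time.time() if now is None else now
    # Pull flagged providers out of the routing lane entirely.
    healthy = [p for p in providers if not p.flagged]
    if not healthy:
        raise RuntimeError("no healthy providers for this model")
    # Prefer a provider whose cache for this user is still warm, so the
    # prompt prefix isn't re-processed; otherwise fall back to any healthy one.
    warm = [p for p in healthy if p.cache_expiry > now]
    return warm[0] if warm else healthy[0]
```

Only when every warm-cache provider is gone (cache expired or provider flagged) does the router transition to a fresh provider, which matches the "only switch when your cache is expired" behavior described above.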
So closed source accounts for 80% or something like that, a very large amount.
Do you see that changing?
Because that post we just had: 9 out of the 10 fastest growing LLMs last week
were open source.
And every time it seems like China comes out with another model, it was Kimi K2 a week or two ago,
it kind of really pushes the frontier of open source forward.
And the rate of acceleration of open source seems to be as fast, if not faster than
closed source, where it's just making these improvements very quickly.
It has the benefit of being able to compound in speed because it's open source
and everyone can contribute.
Do you think that starts to change, where the percentage of tokens you're issuing are from
open source models versus closed source?
Or do you continue to see a trend where it's going to be Google, it's going to be
OpenAI that are serving a majority of these tokens to users?
In the short term, we're likely to see open source models continue to dominate the fastest
growing model category on open router.
And the reason for that is that a lot of users come for a closed source model,
but then decide they want to optimize later:
either they want to save on costs,
or try out a new model
that's supposed to be a little bit better
in some direction that their app or use case cares about.
Then they leave the closed source model
and go to an open source model.
So open source tends to be like a last-mile optimization thing.
Making a big generalization there
because the reverse can happen too.
And so because it's a last-mile optimization thing,
the jump from "this model is not being used at all"
to "this model is really being used by a couple of people who
have left Claude and want to try some new coding use case"
will look bigger than the growth of the closed source models,
which started at a really high base and don't have growth quite as dramatic.
The other part of your question, though,
was whether there's going to be a flippening of closed and open source,
or some sort of chipping away at that monopoly of closed source tokens.
It's hard to predict these things, because I think the biggest problem today with open source models
is that the incentives are not as strong.
For the model lab and the model provider,
there are established incentives
for how to grow as a company and attract good, high-quality AI talent,
and giving the model weights away
impairs those incentives.
Now, this is where we might see the decentralized providers
helping in the future.
A really good incentive scheme
that allows high-quality talent to work on
a model that remains open weights, at least,
could fix this.
I try to stay close to the decentralized providers
and learn a lot from them.
On the provider side, on running inference,
I think there are some really cool incentive schemes being worked on,
but on actually developing the models themselves,
I haven't seen too much, unfortunately.
So I think if we see one, it'll be on our radar.
And until we do, I'd personally doubt it.
TBD. Do you have personal takes on open source versus closed source?
Because this has been a huge topic we've been debating too:
the ethical concerns around alignment in closed source models versus open source.
When you look at the competitors, China, generally speaking, is associated with open source,
whereas the United States is generally associated with closed source.
And we saw Llama and Meta release the open source models, but now
they're raising a ton of money to pay a lot of employees a lot of money to probably develop
a closed source model. So it seems like the trends are kind of split between the U.S. and China. And I'm
curious if you have any personal takes, even outside of OpenRouter, of which you think serves
better for the long-term outlook on, I mean, the position of the United States or just the general
safety and alignment conversation around AI. I mean, like a very simple fundamental difference
between the two is that
an innovation in open source models
can be copied more quickly than an innovation
in closed source models.
So in terms of velocity
and how far ahead one is over the other,
that is a massive structural difference
that means that closed source models
should be theoretically always ahead
until a really interesting incentive scheme
develops like I mentioned before.
I don't see evidence that's going to change.
In terms of China versus the U.S., I think it's very interesting that China has not
had a major closed source model.
And I'm not aware of any reason
that's not going to be the case in the future.
My prediction is that there's going to be a closed source model from China.
It's possible that DeepSeek and Moonshot and Qwen and a few others have built up really sticky talent pools.
But generally with talent pools, after enough years have passed, people quit and go and create new companies and build new talent pools.
And so we should see some of that.
The AI space doesn't have the NDAs or non-competes that the hedge fund space has.
That might happen in the future too.
But assuming the current non-compete culture continues,
there should be more companies that pop up in China over time.
And I'm betting that some of them will be closed source.
My guess is that the two nations will start to look more similar.
Yeah, I guess, you know, that's why you have Zuck
dishing out $300 million to billion-dollar salary offers to a bunch of these guys, right?
One more question on China versus the U.S.
I kind of agree with you.
I didn't really expect China to be the one to lead open source anything, let alone the most important technology of our time.
What do you think is their secret sauce to building these models, Alex?
And I know this might be outside the forte of OpenRouter specifically,
but as someone who has studied this technology for a while now,
I'm struggling to figure out what advantage they had.
You know, they're discovering all these new techniques.
And maybe the simple answer is like constraints, right?
They don't have access to all of Nvidia's chips.
They don't have access to infinite compute.
So then maybe they're forced to kind of like figure out other ways around the same kinds of problems that Western companies are focused on.
But it's pretty clear that America, with all its funding, hasn't been able to make these frontier breakthroughs.
So I'm curious whether you are aware of some kind of technical
moat that Chinese AI researchers, or these AI teams that are featuring on OpenRouter day in and day
out, have over the U.S.
Well, I don't know.
There are certainly some that they've come up with.
DeepSeek had a lot of very cool inference innovations that they published in their paper.
But a lot of what they published in the original R1 paper were things that OpenAI had
done independently themselves
many months before.
So on the inference side and on some of the model side:
DeepSeek, we had talked to their team for years before R1 came out.
They had many models before that,
and they were always a pretty sharp, optimistic team for doing inference.
They came up with the best user experience for caching prompts
long before DeepSeek R1 came out.
And they had very good pricing.
They were by far the strongest Chinese team that we were aware of well before that happened.
And so I'm guessing there was like some talent accumulation that they were working on in China for people who wanted to stay in China.
And that's a huge advantage.
American companies are obviously not doing that.
You're very much on point that a lot of this is just based on talent.
A lot of AI is open and out there,
and very composable, like a big tree of knowledge.
There's a paper that comes out and it cites 20 other papers,
and you can go and read all of the cited papers.
And then you have kind of a basis for understanding the paper,
but you really have to go one level deeper
and read all the cited papers two levels down
to really understand what's going on.
And it's just that very few people can do that.
It takes a lot of years of experience
to actually apply that knowledge
and learn all these things that have not been written in any paper at all.
And there's just such a small number of people
who can really lead research on all the different dimensions
that go into making a model.
And the border between China and the U.S. is pretty defined.
You have to leave China and move to the U.S. and really establish yourself here.
So I do think there's country arbitrage.
There's the hedge fund background arbitrage.
There's hardware arbitrage:
there's a ton of hardware that's only available in China but not here, and vice versa.
That creates an opportunity.
And this will just continue to happen.
Yeah, I think this arbitrage is fascinating.
I read somewhere that there's like probably less than 200 or 250 researchers in the world
that are kind of like worthy of working at some of these frontier AI model labs.
And I looked into some of the backgrounds of the team behind Kimi K2,
which is this recent open source model out of China,
which kind of like broke all these crazy rankings.
I think it was like a trillion parameter model or something crazy like that.
And a lot of them worked at some of the top American tech companies.
And they all graduated from this one university in China.
I think it's Tsinghua, which apparently is like the Harvard of AI in China, right?
So it's pretty crazy.
But Alex, I wanted to shift the focus of the conversation to a point that you brought up earlier in this episode,
which is around data.
So here's the context: Josh and I have spoken about this at length, right?
We are obsessed with this feature on ChatGPT, which is memory, right?
And I know a lot of the other AI models have memory as well.
But the reason why we love it so much is I feel like the model knows me, Alex.
I feel like it knows everything about me.
It can personally curate any of my prompts.
It just gets me.
It knows what I want,
and it just serves it up to me on a platter, and off I go, doing my thing.
Now, OpenRouter sits on top of kind of like the query layer, right?
So you have all these people writing all these weird and wonderful prompts and routing them through to different AI models.
You hold all of that data, or maybe you have access to all of that data.
And I know you have something called private chat as well, where you don't have access to it.
So talk to me about what OpenRouter is thinking about doing with this data.
Because presumably, or in my opinion, you guys actually have the best moat, arguably better than ChatGPT,
because you have all these different types of prompts coming from all these different types of users for all these different types of models.
So theoretically, you could spin up some of the most personal AI models for each individual user if you wanted to.
Do I have that correct or am I speaking crazy?
No, that's true.
It's something we're thinking about.
By default, your prompts are not logged at all;
we don't store prompts or completions for new users by default.
You have to toggle it on in settings.
But a lot of people do toggle it on,
and as a result, I think we have by far
the largest multi-model prompt dataset.
But what we've done today,
we've barely done anything with it.
We classify a tiny, tiny, tiny subset of it,
and that's what you see in the rankings page.
But what could be done on a per-account level
is really three main things.
One, memory right out of the box.
You can get this today by combining OpenRouter
with a memory-as-a-service.
There are a couple of companies that do this,
like Mem0 and SuperMemory.
And we can partner with one of those companies
or do something similar and just provide a lot of distribution.
And that basically gets you a ChatGPT-as-a-service,
where it feels like the model really knows you
and the right context automatically gets added to your prompt.
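The memory pattern Alex describes can be sketched simply: before each request, pull relevant saved facts about the user and prepend them as system context. The in-memory store and keyword matching below are naive stand-ins for a real service like Mem0 or SuperMemory, not anyone's actual API.

```python
# A hedged sketch of "memory as a service": recall stored facts related to the
# prompt and inject them as a system message. The store is just a list of
# strings and the matching is naive word overlap; both are illustrative only.
def recall(store, prompt, limit=2):
    """Return up to `limit` stored facts sharing words with the prompt."""
    words = set(prompt.lower().split())
    scored = [(len(words & set(fact.lower().split())), fact) for fact in store]
    scored = [(score, fact) for score, fact in scored if score > 0]
    scored.sort(key=lambda pair: -pair[0])  # most overlapping facts first
    return [fact for _, fact in scored[:limit]]

def build_messages(store, prompt):
    """Assemble a chat request with recalled memories as system context."""
    memories = recall(store, prompt)
    messages = []
    if memories:
        messages.append({"role": "system",
                         "content": "Known about this user: " + "; ".join(memories)})
    messages.append({"role": "user", "content": prompt})
    return messages
```

The resulting `messages` list is in the standard chat-completions shape, so it could be sent to any model behind an OpenAI-compatible endpoint; the model then "knows you" without the user restating context.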
The second thing we can do is help you select the right model
more intelligently.
There's a lot of models
where there's like a super clear
like migration decision
that needs to be made.
And we can just see this very clearly in the data.
But right now, we just have some kind of communication channel
open with the customer,
and we can tell them: hey,
we notice you're using this model a ton.
It's been deprecated.
This model is significantly better;
you should move this kind of workload over to it.
Or like this workload,
you'll get way better pricing if you do this.
And that's the only sort of guidance
and opinionated routing we've done so far.
And it could be a lot more intelligent,
a lot more out of the box,
a lot more built into the product.
And then there's the last thing we can do.
I mean, there are probably tons of things
we're not even thinking about.
But getting really, really smart about how models and providers are responding to prompts, and showing you just the coolest data.
Telling you what kinds of prompts are going to which models, how those models are replying, and characterizing the reply in all kinds of interesting
ways: did the model refuse to answer? What's the refusal rate?
Did the model successfully make a tool call, or did it decide to ignore all the tools that
you passed in? That's a huge one. Did the model pay attention to its context?
Did some kind of truncation happen before you sent it to the model?
So there are all kinds of edge cases that cause developers' apps to just get
gummed up, and they're all detectable.
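The edge cases Alex lists (refusals, ignored tools, truncation) are each detectable from the completion itself. A toy classifier might look like the sketch below; the refusal markers are simplistic stand-ins for real detectors, and the `finish_reason`/`tool_calls` fields follow the common chat-completions response shape.

```python
# Illustrative sketch of per-reply analytics: flag a few detectable failure
# modes in one completion. The string heuristics are deliberately simple
# placeholders, not production-grade detectors.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "as an ai")

def classify_reply(reply, tools_passed=False):
    """Return a dict of detectable edge cases for one model completion."""
    text = (reply.get("content") or "").lower()
    return {
        # Did the model refuse to answer?
        "refused": any(marker in text for marker in REFUSAL_MARKERS),
        # Tools were passed in but the model answered in plain text instead.
        "ignored_tools": tools_passed and not reply.get("tool_calls"),
        # finish_reason == "length" means generation was cut off mid-output.
        "truncated": reply.get("finish_reason") == "length",
    }
```

Aggregating these flags across many requests gives exactly the kind of per-model refusal rates and tool-call success rates described above.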
I'm so happy you said that
because I have this kind of like hot take
but maybe not so hot take, which is,
I actually think all the frontier models right now
are good enough to do the craziest stuff ever
for each user. But we just haven't been able to unlock it
because they just don't have the context.
Sure, you can attach it to a bunch of different tools and stuff,
but if it doesn't know when to use the tool
or how to process a certain prompt
or if the users themselves don't know how to read
the output of the AI model,
like you just said,
we need some kind of analytics into all of this.
Otherwise we're just kind of walking around like headless chickens, right?
So I'm really happy that you said that.
One other thing that I wanted to get your take on
on the data side of things is,
I just think this whole concept or notion of AI agents
is becoming such a big trend, Alex.
And I've noticed a lot of frontier model labs
released new models
that kind of spin up several instances of their AI model
and they're tasked with a specific role, right?
Okay, you're going to do the research.
You're going to do the orchestrating.
You're going to look online via a browser, blah, blah, blah.
And then they coalesce together at the end of that little search
and refine their answer and then present it
to someone, right? You know, Grok does this, Claude does this, and a few other models.
I feel like with this data that you're describing, OpenRouter could offer that as a
feature, right? Which is essentially, you can now have super intuitive, context-rich agents
that can do a lot more than just talk to you or answer your prompts; they could probably
do a bunch of other actions for you. Is that a fair take, or is that something that
might be out of the realm of OpenRouter?
Our strategy is to be the best
inference layer for agents.
And what I think developers want
is control over how their agents work.
And our developers at least
want to use us as like a single pane of glass
for doing inference.
But they want to like see
and control the way an agent looks.
An agent is basically just something
that is doing inference in a loop
and controlling the direction it goes.
So what we want to do is just build incredible docs,
really good primitives that make that easy to do.
So I think a lot of our developers
are just people building agents.
And so what they want is for the primitives to be solved,
so that they can just keep creating new versions and new ideas
without worrying about
re-implementing tool calling over and over again.
It's a tough problem given how many models there are;
there's a new model or provider every day,
and people actually want them and use them.
So to standardize this,
make these tools really dependable,
that's kind of like where we want to focus
so that agent developers don't have to worry about it.
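Alex's definition earlier, "an agent is basically just something doing inference in a loop and controlling the direction it goes", can be sketched in a few lines. The `model` callable below is a stand-in for a real chat-completions call (for example to OpenRouter's OpenAI-compatible API); the response shape and tool registry are illustrative assumptions.

```python
# A minimal sketch of "inference in a loop": call the model, execute any tool
# it requests, feed the result back, and stop when it answers in plain text.
# `model` is any callable taking a message list and returning a reply dict.
def run_agent(model, tools, prompt, max_steps=5):
    """Drive an inference loop until the model stops requesting tools."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        reply = model(messages)
        call = reply.get("tool_call")
        if call is None:
            # Plain answer: the loop is done, direction chosen by the model.
            return reply["content"]
        # Execute the requested tool and feed the result back as context.
        result = tools[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("agent did not finish within max_steps")
```

With the loop, tool execution, and message bookkeeping solved once as primitives, an agent developer only supplies the `tools` dict and the prompt, which is the "don't re-implement tool calling over and over" point above.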
As we level up, getting closer and closer to AGI and beyond,
I'm curious what OpenRouter's endgame is,
if you have one.
What is the master plan where you hope to end up?
Because the assumption is that as these systems get more intelligent,
as they're able to make their own decisions
and choose their own tool sets,
what role does OpenRouter play in continuing to route that data through?
Do you have a kind of master plan, a grand vision of where you see this all heading to?
You're saying, as agents get better at choosing the tools that they use,
what becomes our role when the agents are really good at that?
Yes, yes.
And like, where do you see OpenRouter fitting into the picture?
And what would be the best case scenario for this future of OpenRouter?
Right now, Open Router is a bring-your-own-tool platform.
We don't have a marketplace of MCPs yet.
And I do think most of the most-used tools will be ones that developers configure themselves;
agents just work with whatever they're given access to.
I think a holy grail for OpenRouter relates to how the ecosystem evolves.
My prediction is that all the models
are going to be adding state and other kinds of stickiness that just make you want to stick with them.
So they're going to add server-side tool calls.
They're going to add, like, web search that is stateful, memory,
all kinds of things that try to prevent developers from leaving
and increase lock-in.
And open router is doing the opposite.
We want developers to not feel vendor lock-in.
We want them to feel like they have choice
and they can use the best intelligence,
even if they didn't look around for long.
You know, it's never too late to switch to a more intelligent model.
That would be a good long-term outcome for us.
And so what I think we'll end up doing is partnering with other companies, or building the tools ourselves
if we have to, so that developers don't feel stuck. There are a lot of ways the
ecosystem could evolve, but that's how I would put it in a nutshell.
Okay, now there's another personal question that I was really curious about, because I was also
right there with you in the crypto cycle when NFTs got absolutely huge; I was a big user of OpenSea.
And it was kind of this trend that went up and then went down. NFTs kind of fizzled out; they
weren't as hot anymore, and AI kind of took the wind from their sails. It's a completely separate
audience, but a similar thing, where now it's the hottest thing in the world. And I'm curious how
you see the trend continuing. Is this a cyclical thing that has ups and downs, or is this a one-way
trend towards up and to the right?
NFTs kind of follow crypto in an indirect way.
When crypto has ups and downs, NFTs generally lag a bit, but they have similar ups and downs.
And crypto is an extremely long-term play on building a new financial system,
and there are so many reasons, very entrenched reasons,
that that's not going to happen overnight.
And the reason AI, I think, moves a lot.
But one of the reasons that AI moves a lot faster is it's just about making computers behave more like humans.
So if a company already works with a bunch of humans,
then there's some engineering that needs to be done.
There's some thinking about how to scale this.
But in general, I think that after seeing what can be possible,
inference will be the fastest growing operating expense for all companies.
It'll be like: oh, we can just hire high-performing employees
at the click of a button.
They all perform predictably.
They're all AI.
We can measure them,
they work 24/7,
and they scale elastically.
It's not that hard;
it's not a huge mental model shift.
It's just a huge upgrade to the way
companies work today,
in most cases.
So it's just completely different from crypto, or from NFTs.
Other than both being new, they're fundamentally very different changes.
You're probably one of very few people in the world right now that has crazy insight into every single AI model.
Definitely more than the average user, right?
Like, I have three or four subscriptions right now and I think I'm a hot shot, but you get access to over 400 models on OpenRouter.
So an obvious question that I have for you is,
I'm not going to say in the next couple of years,
because everything moves way too quickly in this sector.
But over the next six months,
is there anything really obvious to you
that should be focused on within the AI sector?
Maybe it's like the way that certain models should be designed,
or perhaps it's at the application layer
that no one's talking about right now.
Because going on from the earlier part of our conversation,
you just pick these trends out really early.
And I'm wondering if you see anything.
And it doesn't have to be OpenRouter-related.
It could just be AI-related.
I've seen the models trending towards caring more about how resourceful they are
than what knowledge they have in the bank.
I think with the model labs,
I don't know how many of them really deeply believe that,
but a couple of them talk about it.
And I don't think it's really hit the application space yet.
Because people will ask chat GPT things.
And if the knowledge is wrong, they think the model is stupid.
And that's just kind of a bad way of evaluating a model.
Whatever knowledge a person has, whatever they can recall at a certain time,
is not a proxy for how smart they are.
The intelligence and usefulness of a model is going to trend towards how good it is at using tools.
And how good it is at paying attention to its context, a long, long context:
its total memory capacity and accuracy.
So I think those two things need to be emphasized more.
It might be that models pull all of their knowledge from online databases,
from real-time scraped indices of the web,
along with a ton of real-time updating data sources.
And they're always kind of like relying on some sort of database for knowledge,
but relying on their reasoning process for tool-calling.
You know, we probably spend the plurality of our time every week on tool calling and figuring out how to make it work really well.
Humans, like the big difference between us and animals is that we're tool users and tool builders.
And that's like where human acceleration and innovation has happened.
So how do we get models creating tools and using tools
very, very effectively?
There are very few benchmarks
and very little prior art.
There's tau-bench for measuring how good a model is at tool calling,
and maybe a few others.
There's SWE-bench for measuring how good a model is at multi-turn programming tasks.
It's very hard to run, though:
for Sonnet, it could cost like $1,000 to run it.
the user experience for kind of evaluating the real intelligence in these models is not good.
And so as much as we don't have benchmarks listed on OpenRouter today, I love benchmarks.
And I think the app ecosystem and developer ecosystem should spend a lot more time making very cool and interesting ones.
Also, we will give credit grants for all the best ones.
I highly encourage it.
Well, Alex, thank you for your time today.
I think we're coming up on a close now.
That was a fascinating conversation, man.
And I think your entire journey, from non-AI stuff
at OpenSea all the way to OpenRouter,
has been a great indicator of where these technologies are progressing
and, more importantly, where we're going to end up.
I'm incredibly excited to see where OpenRouter goes beyond just prompt
routing. I think some of the stuff you spoke about on the data side of things is going to be
fascinating and arguably one of your bigger features. So I'm excited for future releases. And as
Josh said earlier, if GPT-5 is releasing through your platform first, please give us some credits.
We would love to use it. But for the listeners of this show, as you know, we're trying to bring on
the most interesting people to chat about AI and frontier tech. We hope you enjoyed this episode.
As always, please like, subscribe, and share it with any of your friends who would find this interesting,
and we'll see you on the next one. Thanks, folks.
