The Infra Pod - Betting on Open Source Models to be the future (Chat with Benny, CTO of Fireworks AI)
Episode Date: April 28, 2026

In this episode of The Infra Pod, hosts Tim Chen (Essence VC) and Ian Livingstone (Keycard) sit down with Benny Chen, co-founder of Fireworks AI, to explore the evolving world of AI inference infrastructure.

Benny shares his journey from Meta, where capacity planning meetings made it clear GPUs were heading "up and to the right," to co-founding Fireworks AI before ChatGPT even launched. The conversation dives deep into why the team bet early on inference over training, how they approached model optimization from horizontal compiler techniques to per-model kernel tuning, and why model customization is the key to unlocking better-than-frontier performance for vertical use cases.

Benny discusses the reality of open source vs. closed models, the rise of agentic workloads, and why the real question isn't which model to use: it's which tasks have already been saturated. This episode is packed with technical insights on inference infrastructure, reinforcement learning for model customization, and what it means to truly adopt an AI-native engineering culture.

0:24 Benny's journey and founding Fireworks AI
3:23 Early conviction: betting on inference before ChatGPT
8:29 Pivoting from PyTorch training to text inference
15:42 Horizontal vs. per-model optimization strategies
11:14 Open source vs. frontier models: the real gap
32:35 How customers engage: PLG to hands-on customization
17:37 When to move off frontier models
33:42 The future of agentic memory and data sovereignty
32:35 Fireworks' differentiation in a crowded market
33:53 Spicy Future: AI doomers, bot management, and going fully out of loop
Transcript
Discussion (0)
Welcome back to the InfraPod.
This is Tim from Essence, and Ian, let's go.
Hey, this is Ian Livingstone, CEO and co-founder of Keycard,
making it possible for agents to get access to resources
without destroying your world.
I couldn't be more excited today.
I'm going to be joined by Benny Chen,
co-founder of Fireworks AI.
How are you doing, Benny?
Tell us about yourself.
And more importantly, what in the world convinced you to go and start a company,
which is one of the most insane things a human can do
in this economy, minimally.
Nice to see you, Ian.
Yeah, I'm Benny.
I'm one of the co-founders of Fireworks.
We serve and train open source models
and help people scale.
For me, starting a startup wasn't
so crazy of an idea, to be honest.
I've always wanted to do startups
and I had a lot of startup side gigs
while working at Meta.
And the big, big indicator for me
before I got out,
was that in all the capacity planning meetings I was in,
because I was on the hook to help the ads org plan for capacity,
all the GPU capacity curves were up and to the right.
And it was very clear to me that AI infrastructure was going to take off.
We started before ChatGPT came out,
so we didn't know it was going to take off in the GenAI sense,
but we knew it was going to take off.
In retrospect, all those numbers are peanuts
compared to the numbers we are talking about now, even for Meta.
But yeah, it was the middle of '22.
We knew something was going to come.
We wanted to jump into the action ourselves, so we started Fireworks.
It's been a roller coaster, but I think we've been really lucky,
and I really appreciate how things have been going.
We're here to help all the developers scale their open source model workloads.
And what was the unique insight you had in the process of deciding to start the company?
Like, hey, we can do this differently.
A reason you could be super successful where other people couldn't.
What were some of the early thoughts or ideas?
The early ideas, I don't think they were that contrarian, maybe in the sense that we had spent most of our careers optimizing machine learning workloads. We knew where the bottlenecks were, and I think we were way ahead of the market at that time, knowing what the market outlook and the Nvidia roadmap looked like, and how much we could help on both inference and training. Fast forward three or four years, there are many people and players in the market now, and whether it's a unique insight today is questionable. But back then it was definitely a unique insight. Honestly, it wasn't a unique insight amongst ourselves, but I was surprised it was a unique insight in the market: in all the capacity planning we were in, the inference workload was always bigger than the training workload. And there's a symbiotic relationship between training and inference. We spent most of our effort on inference, and that has really paid off.
Maybe another thing I can add on: when we started working on inference, we honestly didn't think that was a unique insight. Everyone involved in the process just assumed that was the right thing to do and that was what we were going to do. To be perfectly honest, at that stage everyone in the market was working on training, and somehow we were the odd one out. We were a little bit surprised, but we had the conviction that inference was the way to go, and inference was the one we wanted to focus on.
We still have a lot of workload in our system that's more like reinforcement learning, helping people customize their models so they can migrate off of frontier models. But a lot of it still has inference workload rooted in it, especially for reinforcement learning. And we genuinely believe that when we help people customize models, helping them set up the inference workload end to end is the place where we want to be.
Yeah, so I think it would actually be very interesting to talk about not just the history, but jumping into the inference game, maybe around 2023. I remember you guys started in 2022, right? And the early funding, I remember, started in 2023. Back when I first heard of Fireworks, it didn't really have the inference tagline I remember now. It was really around the PyTorch ecosystem. If I remember correctly, that was sort of the genesis, because you guys were working on PyTorch and trying to bring a bunch more infrastructure things around PyTorch. But if I remember correctly, inference wasn't the only thing, or the main focus, when you started Fireworks. So I'm really just curious, because you said you jumped into inference much earlier: what shifted you, so early after starting the company, into "we should go all in on inference"? And I'm also very curious what "all in on inference" even means, because I think vLLM was just getting started. There were open source projects. What did you guys do to jump into that game so much earlier?
I think that's a good question.
In the very, very beginning, we worked on a PyTorch training platform for recommendation systems. And when ChatGPT came out, like end of '22, everyone's question was: should we pivot, and if we want to pivot, how do we want to pivot? At that point, we saw the potential for generative AI workloads. We saw the potential for text-related workloads. We saw that the models were getting smarter and smarter at every iteration. And what it meant for us in terms of all-in on inference is that we were picking the text vertical in GenAI, and we were dead focused on making sure the text inference part of GenAI was effective. I do think there were some debates along the way, whether we wanted to continue our PyTorch training platform business versus focusing on just one workload for the whole company. But we came to the conclusion that we're a small team and we have to stay focused. So we had to pivot away from all the other businesses that were already starting to make some reasonable amount of money. But we had big ambitions, and we wanted to make sure we were able to get to a point where we could realize them. So text-focused inference workloads are what the whole company pivoted to for a good year or two before we expanded to other verticals.
And so obviously we're all using inference now, right? One way or another, our whole life is just inferencing everywhere. And so I'm very curious, because you guys started in 2023, which is probably the best time ever to really start this, right? You're early enough, things are just taking off. Like you said, text was the main use case. Now, going on your website and clicking on models or platform, you have vision, language, images, videos. The models exploded. The model data types exploded as well. So I'm very curious about your journey. You obviously started with text, started with maybe the big open source models back then, like Llama, and then it started to be LLaVA. How do you choose which models you should host and maybe even do more work on? Do you try to do per-model optimization or just very generic optimizations when it comes to getting these models' inference performance to be much better? Do you just do horizontal improvements and not focus too much on specific models, or do you also figure out which specific models you should really go and work on and optimize for? Just curious how that strategy came to fruition for you guys.
Yeah, that's a good question.
So initially we focused a lot on the horizontal improvements.
In our early blogs, we were talking about leveraging CUDA graphs,
which is a technique to lower the PyTorch graph into something that runs continuously on the GPU,
to minimize the amount of back and forth between GPU and CPU.
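The win Benny describes can be illustrated with a stdlib-only toy. This is an analogy, not real CUDA: `ToyDevice` and its launch counter are invented here, and the capture/replay of an op sequence merely stands in for what `torch.cuda.CUDAGraph` does on actual hardware.

```python
# Toy analogy for CUDA-graph capture/replay: in eager mode each op costs
# one CPU->GPU "launch"; a captured graph replays all ops for one launch.

class ToyDevice:
    def __init__(self):
        self.launches = 0          # counts CPU->GPU round trips

    def run_op(self, fn, x):
        self.launches += 1         # eager mode: one launch per op
        return fn(x)

class CapturedGraph:
    def __init__(self, device, ops):
        self.device = device
        self.ops = ops             # op sequence recorded once at capture

    def replay(self, x):
        self.device.launches += 1  # whole sequence replays in one launch
        for fn in self.ops:
            x = fn(x)
        return x

ops = [lambda x: x * 2, lambda x: x + 1, lambda x: max(x, 0)]

dev = ToyDevice()
y_eager = 5
for fn in ops:
    y_eager = dev.run_op(fn, y_eager)
eager_launches = dev.launches      # 3 launches for 3 ops

dev.launches = 0
g = CapturedGraph(dev, ops)
y_graph = g.replay(5)              # same math, single launch
```

The math is identical in both paths; only the number of CPU-to-GPU round trips changes, which is where the latency saving comes from.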
And also a bunch of more horizontal techniques, like compiler-based techniques, to help our own developers optimize models in-house. Once we were done with the horizontal approach, or once most of the performance gains you can get from horizontal work hit diminishing returns, we started focusing on each model: fusing different kernels, setting up the correct attention kernels for different things. So we started publishing blogs about FireAttention, where we were optimizing FP8 kernels, optimizing NVFP4 kernels, optimizing AMD kernels, things like that. A lot of those are not even just model-specific but also workload-specific. It depends on where most of the demand was coming from, and we were working with our customers to optimize those workloads.
Now, it is hard to tell upfront what the workload will be and where our customers' demand will be. So we have something called a 3D optimizer, where we have a database of previously collected profiles, and we search within the database to see what the best setup for a given workload is. That's sort of an automation to keep things running efficiently.
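A minimal sketch of what such a profile-database lookup might look like is below. All the profile fields, numbers, and the nearest-shape heuristic are invented for illustration; this is not Fireworks' actual 3D optimizer.

```python
# Hypothetical profile-database search: pick the deployment config whose
# previously measured profile best matches the incoming workload shape.

from dataclasses import dataclass

@dataclass(frozen=True)
class Profile:
    tensor_parallel: int     # GPUs per replica
    batch_size: int
    avg_prompt_len: int      # workload shape this profile was measured on
    avg_gen_len: int
    tokens_per_sec: float    # measured throughput for that shape

PROFILES = [
    Profile(1, 8,  256,  128, 4200.0),
    Profile(2, 16, 256,  128, 7600.0),
    Profile(2, 16, 4096, 256, 2900.0),
    Profile(4, 32, 4096, 256, 5100.0),
]

def best_config(prompt_len, gen_len):
    """Among profiles measured on the closest shape, take the fastest."""
    def shape_distance(p):
        return abs(p.avg_prompt_len - prompt_len) + abs(p.avg_gen_len - gen_len)
    closest = min(shape_distance(p) for p in PROFILES)
    candidates = [p for p in PROFILES if shape_distance(p) == closest]
    return max(candidates, key=lambda p: p.tokens_per_sec)

cfg = best_config(prompt_len=4000, gen_len=250)
```

The search first narrows to profiles measured on the closest workload shape, then picks the highest-throughput setup among them, which is the basic shape of any profile-guided configuration search.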
Honestly, this year the agents are getting so smart that I also wonder every day how we can automate better and serve our customers better. A lot of our work is model-specific. Going forward, how we can make the whole thing even more efficient with bots will be the interesting question to answer this year.
There's been this giant debate, and I'm just so interested to hear your take on this, Benny. There's been a long conversation about the foundation models as this all-encompassing, winner-take-all world. Why would you ever fine-tune anything? We can do inference-time context engineering and everything's great. Obviously Fireworks, as an inference platform, is coming at this from a very specific angle. What's your take, and what are you seeing in your customer base as to why people are moving against the grain, if you will, against this idea that it's just going to be OpenAI, Anthropic, and Google that own most of this compute workload?
Great question. When we are early enough in the adoption cycle, I do think it is safest for most startups and companies to procure from the IBMs, as the saying goes: no one gets fired for procuring from IBM, right? Because it was the mainframe that worked fairly well. I mean, there are so many analogies to mainframes, it's shocking. There's a giant machine that runs in a giant room. Now we literally have the NVL72 racks. It's a few tons that rolls in and serves one model. It's also very CapEx intensive. And we are still early in the adoption cycle, where it's so confusing that for a lot of customers, it's just safe to procure from the IBMs on the market. At the same time, we work with a lot of companies that have huge volume. And for those companies, it's almost impossible to procure from the IBMs, in the sense that they have so much going on that they need to make sure their unit economics are good. We're also at a phase where there's so much money pouring into the system that people are not really examining the unit economics. One day, reality will catch up with us. And we want to help the startups and the enterprises of the world get there, so that when reality catches up, their unit economics are good.
And the other aspect to it is model quality. A lot of people sort of assume that it's always best to rely on frontier models because the frontier models have the best setup and whatnot. But these models are like machines. And if you are able to program these machines effectively, you can make them way more effective at your job than using an out-of-the-box frontier model. For example, in one discussion I had with a healthcare provider, he was telling me why he customized the model on our platform: because the drug names are new. There are new drugs coming out every year, and you need to teach the model what these drug names even mean. When I saw the Latin names for those drugs, I thought it was just gibberish. I couldn't even understand what was going on. Every vertical has this kind of new knowledge coming in every day, and how to interpret that knowledge and how to use it is vertical-dependent.
So I do believe that for things like booking you a restaurant or handling personal assistant tasks for you, those will probably get saturated, and both open and closed-source models will work very, very well on those tasks. That's very common. For vertical-specific use cases, I genuinely believe there will be a lot of customization in the future, and there is a lot of potential for Fireworks to help these companies get to a point where it's not just about unit economics, but where the model just hands-down beats the frontier models. We saw that with our Genspark engagement. We saw that with our Vercel engagement, where we're just better than the frontier models. And we have more and more of these use cases coming up on our platform that really help the developers come up with better models than the frontier. That is very, very exciting for us, because it helps them unlock new use cases and drive more revenue, and it's not just a unit economics game.
So I think it would be interesting to talk about how you usually engage with customers, because there's a huge array of different types of customers, right? People can just go to Fireworks and directly sign up as a developer. Those are more your PLG type of developers, signing up and doing just plain inference. But you just mentioned you actually work hands-on with some customers to figure out how to get better performance than frontier models. But there are so many choices: how do we figure out what kind of fine-tuning we should do? What base model should I use? What type of options? And from my conversations, most people don't even know what to do at all. They have no real concept of where to even start. So oftentimes they definitely need a lot of guidance. So I'm just very curious: how do you structure this? What are the types of engagements you like to be on, and what might a typical engagement look like to help people be successful, getting to the model they want and running it on Fireworks?
Most of our customers go through the self-serve platform, for sure. So most customers just kick off jobs on their own. We do have API documentation, and we try to help. Honestly, one of the distinctions we're making is that helping people's Claude Code is sometimes more important than helping the humans themselves. I'm not sure if you saw Karpathy's recent tweet about auto research, right? That's also one of the patterns we found: making sure their Claude Code has a good experience means their human driver will have a better experience. So we do make sure our documentation is very clean and up to date, and then the agents will help drive a lot of the setup work. For most people, they go through either the UI or Claude Code and just use our platform as is. And then for some customers, we see a lot of potential, and we have these more hands-on engagements with them. For those, it's more of a case-by-case kind of setup.
I'm really quite interested to dig more into this in terms of what patterns you're seeing. Sounds like you're saying most people start with a generalized foundation model. They get something working, and then they go, okay, now we've got to optimize this, because there's a gap in accuracy or there's some performance issue or whatever. And now we're going to basically fine-tune our own model off of open source or whatever, and we need to run that someplace really efficiently and fast. Is that effectively what the workflow looks like? Proof of concept, to expansion, to let's go build something, then we get into production. Yeah, I would say that's 80% of the use cases and patterns we observe on our platform. The other 20% is people who start with open models directly. We've seen a lot of success with that as well recently.
I think if you check OpenRouter or other places for traffic, I mean, I cannot share what's on Fireworks, but there are public places where you can check, and open models are also doing really, really well. And it's shocking how many of the easy tasks have been saturated by open models. One nuance here is that a lot of people think the open models and the closed models are getting closer together. I think they're actually not getting any closer than last year. There's still a big gap to the frontier models. What has changed this year, though, is that the watermark has risen so high that, for example, if you just want to ask your OpenClaw to book your restaurant, the open models are just good enough. For all the AGI tasks, the open models are still not good enough, but they're good enough to book you a restaurant. So a lot of OpenClaws are going through open models. Gotcha. And so we're at this point
where, and we've always said this, but it's starting to become very clear that there's a certain set of tasks where additional intelligence or capability actually doesn't yield additional benefit. And there's a set of those tasks where you can just defer to these open models. It's cheaper. It's easier to fine-tune. It's more private, given the security concerns; you can run it yourself; better unit economics. Where do you sit on this debate? There was a period of time where open models did linearly follow the foundation models, almost one to one. Llama came out and somehow was the same as GPT-3.5, right? And it doesn't seem to be the case anymore. I'm curious, do you think the foundation models continue to have an upward trajectory in terms of capability and intelligence away from what's open? Or what do you think the trajectories of these two are, if we put capability on a graph over time?
Yeah, I think for the most sophisticated tasks, the frontier models are still better, for sure. I think what's different is that we are running out of things to ask. To be perfectly transparent, maybe 20% of my day is really, really intellectually challenging and very difficult. I have to answer some pretty difficult questions. But for the other 80%, honestly, I've made a decision and I just need to execute. I need to make sure these things happen. I need to set up unit tests. I need to set up CI. I need to set up all the scaffolding around my code. I don't know if that requires frontier intelligence. So I think there's a difference between whether there's still a gap versus all the tasks getting saturated. I actually think the tasks getting saturated will happen faster than the open and closed models closing the gap. And honestly, at the point where 90% of the tasks of the day get saturated, I don't know if the distinction between open and closed models matters anymore.
That makes complete sense. So your view, and I totally agree with this, follows the line of: generalized models are great, then we have to specialize, and specialization is where you get great unit economics. And there's a long tail of tasks where, if the user is typing in a chat box and I can't figure out how to do it, I need a generalized model. But the minute it becomes a repeatable task with a well-bounded success or goal space, it can become specialized, and we can drive costs down that way. I'm curious, as you look at the people you're talking to, maybe potentially your customers, what's the rate of specialization of specific agents? Are they building specialized sub-agents that use specialized models, or do we continue to see more generalized use of models with prompt engineering in multi-agent architectures inside companies as they build features?
On our platform, we're seeing more general models and fewer specialized agents. What I mean by that is, for a lot of the customers we have, the model is the product. Because a lot of times, as the model gets better, especially when the base model gets better, your scaffolding becomes less relevant. It's hard to share the examples we have internally, but for example, Anthropic recently shared that their Claude Code got better at compaction because the model is just smarter now and knows what to remember. So all your previous crutches around memory and whatnot become a little bit pointless if the model is just smart enough to know what to remember. At the same time, a lot of this "the model is smart enough to know what to remember" is constrained by the harness itself, where a lot of people are doing reinforcement learning on Claude Code itself. So I want to tease out the nuance here a little bit more, in the sense that a lot of people have products that look very general but actually use a very limited number of tools, and the model just learns how to use those tools very effectively. For Claude, it might be read file, write file, use terminal kind of stuff. Those tools are very, very general, but they are also very useful in certain settings. If you have a vision-intensive task, I don't know how useful those kinds of tools will be. So depending on the vertical our customers operate in, they set up the agent harness themselves, and we help them drive reinforcement learning on their harness, on their product itself, so that the model improves on their product. And the model becomes the product itself.
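Mechanically, "reinforcement learning on the harness" means rolling the policy out through the product's own tool loop, scoring the whole trajectory, and reinforcing choices from successful episodes. The toy below sketches that loop; the environment, tools, tabular policy, and update rule are all stand-ins invented for illustration, not the Fireworks API or a production RL algorithm.

```python
# Toy RL-on-the-harness loop: the "product" is a tool loop where the task
# needs read_file then write_file; successful trajectories are reinforced.

import random

TOOLS = ["read_file", "write_file", "run_terminal"]

def harness_step(state, tool):
    """Deterministic toy environment: the task needs read -> write."""
    if tool == "read_file" and state == "start":
        return "read_done"
    if tool == "write_file" and state == "read_done":
        return "success"
    return state                     # wrong tool: no progress

def rollout(policy, rng):
    state, trajectory = "start", []
    for _ in range(4):               # episode length cap
        weights = [policy[(state, t)] for t in TOOLS]
        tool = rng.choices(TOOLS, weights=weights)[0]
        trajectory.append((state, tool))
        state = harness_step(state, tool)
    reward = 1.0 if state == "success" else 0.0
    return trajectory, reward

rng = random.Random(0)
policy = {(s, t): 1.0
          for s in ["start", "read_done", "success"] for t in TOOLS}

for _ in range(500):                 # crude reinforcement of good episodes
    traj, reward = rollout(policy, rng)
    if reward > 0:
        for state, tool in traj:
            policy[(state, tool)] += 0.5   # reinforce successful choices

# After training, the policy should prefer read_file from "start".
best_first = max(TOOLS, key=lambda t: policy[("start", t)])
```

The point of the sketch is the shape of the loop, not the update rule: the model improves against the exact tool surface the product exposes, which is why the model and the product converge.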
And that's a very useful mental pattern, I think, for a lot of startups out there who still want to debate whether the model is the product or the agent scaffolding is the product. I really think this year the distinction is not that important. You have the harness, you improve the model underneath, and the model is the product. Model customization is going to get so easy this year that it really doesn't matter whether you want to make the distinction or not. It will become the same thing.
It will become the same thing.
So going back to this frontier models versus open source models question a bit: looking at the case studies you have, what's shown on your website is mostly the Cursor types, the companies that have their own models. They need to host them and work with you to get the best economics, right? Like Cursor's fast generations and stuff like that. Besides all the people whose whole business is the tool, where the model is a huge part of the business, there's actually a huge number of companies that just want to use AI, right? I want my developers to all be using AI. Every enterprise right now is going through this AI adoption transformation, brainstorming, right? It's really hard for folks to understand. They always start with the frontier models because they're the state of the art. They understand the economics, maybe not as great, but we need to really go for it. I'm curious, on your side, do you see when the shift might happen? Either for existing people going, okay, I'm using Claude and it's eating so much of my budget, or is privacy what starts folks considering, hey, we shouldn't just use frontier, we should start to specialize, or really consider having open source models be part of our stack? Besides the Cursor types, for just normal consumption types, do you see folks like that trying to go into open source models as well? And what is the typical motivation or the point where they start to
jump over? Yeah, we got a lot of good feedback on Twitter recently. I really appreciated that it was DHH who talked about using Kimi on Fireworks for their coding use case. A lot of it has to do with, one, privacy, and two, cost. Because, like I was describing earlier, if the task is saturated for 90% of your day, it really doesn't matter whether you're using Opus or Sonnet or Kimi. You can use Kimi on Fireworks and just get the best speed out of your coding experience. Developers are really smart, and they get to pick and choose the best unit economics and the best speed for their setup. And we're here to help. I think Anthropic also came out with a fast mode, right? But the fast mode, I think, is like five times more expensive. We had two colleagues in the company who turned on fast mode, and then the money printer just went on. I think they blew past their limits in like a week or something. So it is very, very expensive to use a frontier fast model all the time. And we're here to help developers who are interested in using alternative models that are much faster, served on Fireworks.
But coming back to what I think is the underlying motivation of your question, in terms of general adoption for a lot of enterprises: would they rather adopt Opus or GPT-5.4, or rather adopt open source models? I think it depends on the beliefs of the founder. A lot of founders we talk to fundamentally believe that they need to control their stack, and those founders are much more interested in open source models. And there are people whose fundamental belief is that it's totally okay to be a renter, and those people lean much more into frontier models. And I do think a lot of closed-source models have restrictions that you may or may not agree with, and data retention policies that you may or may not agree with. And we're here to help.
I'm really curious, just speaking of data retention: a lot of your customers are building features where the model is the feature, right? It drives their business. Cursor Tab is a great example. I'm not sure if that's how they use Fireworks, but it's an example of a model they spent a long time on, with the Composer model or whatever it's called. As you work with these types of data-sensitive customers, how do you think about things like memory, memory storage, and the gravity of memory in relation to the model in these kinds of use cases, especially as I think about mid-market companies and the enterprise? You're spending a lot of time working with companies that are selling effectively AI features or co-pilots, glorified co-pilots, to developers. And what we're seeing this year is a lot of those co-pilots trying to graduate into agents. And as things graduate into agents, the concept of memory comes up. And memory has huge data sovereignty and governance issues related to it. Many enterprises do not want the data to leave their premises or their control; it's a core component of their architecture. So I'm curious: how do you think about memory, memory's relation to inference, and what the future of memory, inference, and agents in general looks like in relation to memory?
Yeah, that's honestly a fantastic question that we think about every day. There are a few parts to memory I want to tease out, and maybe we can discuss them separately. One is more behavioral. Behavioral as in: I remember how to use this tool, so I have a short circuit in my brain to do certain things quickly. Either it's using shortcuts instead of clicking my mouse, or, for example, for certain API calls, I know how to fill in the dates in a certain format, like a timestamp versus other formats, so I don't have to wait for an error code to come back, correct it, and then try again. For these types of memory, we often find a lot of success in memorizing failures and learning how to use the tools more effectively; these are very friendly towards some sort of model customization. Even for very lightweight customization, just letting the model know, this is what golden employee behavior looks like, and then giving it the best trajectories the model has produced to reinforce what good behavior looks like. That I classify as behavior change, or character design, for your model or for your employee. And when we talk about having all these enterprises adopt these tools, oftentimes they try to prompt their way out of it and have a massive prompt of what to do when. That definitely can work. What we found is that sometimes, if you want character or behavior change, you should just customize the model.
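The failure-memorizing idea can be sketched in a few lines. The tool, its ISO-date rule, and the repair function below are all hypothetical, standing in for a model rewriting its own tool call using a remembered error; the date-format case mirrors the example from the conversation.

```python
# Hypothetical "memorize failures" loop: before calling a tool, consult a
# store of past errors for that (tool, argument) and apply the known fix
# up front instead of waiting for the same error to come back again.

import re

failure_memory = {}   # (tool, bad_value) -> error hint seen last time

def record_failure(tool, bad_value, hint):
    failure_memory[(tool, bad_value)] = hint

def call_tool(tool, value):
    """Toy tool that only accepts ISO dates (YYYY-MM-DD)."""
    if tool == "book_meeting" and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return ("error", "expected ISO date YYYY-MM-DD")
    return ("ok", value)

def call_with_memory(tool, value, fix):
    hint = failure_memory.get((tool, value))
    if hint:
        value = fix(value, hint)          # short circuit: apply known fix
    status, out = call_tool(tool, value)
    if status == "error":
        record_failure(tool, value, out)  # remember for next time
    return status, out

def fix(value, hint):
    # In a real agent the model rewrites the argument using the hint;
    # here the MM/DD/YYYY -> ISO repair is hard-coded for the demo.
    m, d, y = value.split("/")
    return f"{y}-{m}-{d}"

# First attempt fails and is memorized; the retry is corrected up front.
first = call_with_memory("book_meeting", "04/28/2026", fix)
second = call_with_memory("book_meeting", "04/28/2026", fix)
```

The first call pays for the error once; every later call with the same bad pattern skips the round trip, which is the behavioral short circuit Benny describes.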
The other is more factual, like, oh, what happened? And for what happened, we see a lot of companies adopting Markdown files and putting them into the repo, but more recently, just connecting to Slack and actual systems of record. Because Markdown files sometimes go stale; sometimes they go out of sync with what your product looks like. So it depends on what these people want to do. And we are here to help with setting up data connectors to different systems of record and making sure their models can work effectively with the system of record.
And then there are cases that are more of a team setting, a collective memory, where your whole team prefers to use a tool one way, but another team in the company prefers to use the same tool a different way. Those, I think, work very effectively just through Markdown files in different folders: okay, this team has this folder, and they want to use the tool this way. That's not really general behavior change; it's more of a collective memory record.
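The per-team folder idea can be sketched directly. The layout below, a `memory/<team>/TOOLS.md` convention with a company-wide fallback, is an assumption for illustration, not a standard:

```python
from pathlib import Path

def load_team_memory(repo_root: str, team: str) -> str:
    """Load a team's collective-memory Markdown file, falling back to a
    company-wide default when the team has no folder of its own.
    (The memory/<team>/TOOLS.md layout is a hypothetical convention.)"""
    candidates = [
        Path(repo_root) / "memory" / team / "TOOLS.md",
        Path(repo_root) / "memory" / "TOOLS.md",
    ]
    for path in candidates:
        if path.exists():
            return path.read_text()
    return ""
```

An agent working for a given team is then seeded with that team's file, so each team's preferred way of using a tool travels with its folder.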
I do think the most interesting discussion the market is having right now is on that second category, where agents connect with system-of-record databases. Is the product side of those systems of record still valuable, can they still charge huge amounts per seat, or do more and more of these open data platforms come in and challenge the old-school seat-based setup? We had a lot of success just connecting agents with Google Docs and Notion, letting the agent run, keeping track of all our engagements, and making sure those are properly accounted for with no loose ends. But I'm sure the system-of-record companies can figure something out and, you know, educate me on what their business model looks like 10 years from now.
Given that there has been increased competition around inference, I'm curious what your take is. How do you guys differentiate yourselves? What is your general strategy when people ask, why should I pick Fireworks versus all the other options? The number of options seems to keep going up every year. What has been the main focus for Fireworks, how do you stand up against the usual competition?
We're much more focused on what we focus on than on differentiation, and we're very focused on customized models. We're very focused on helping people do reinforcement learning or model customization so they can beat frontier models as much as possible. And for customized models, we are also very focused on better unit economics, because as soon as you control the customization stack, you can be more aggressive with different quantization techniques and different ways to reduce the cost of the final model you're training.
And I do think there are other model inference providers out there, and the CSPs are also getting into the game. The amount of money being thrown into the system is mind-boggling, to say the least. So I'm not naive enough to think you get to stand in one corner and not get bothered.
At the same time, I do think focus is very, very important.
And we did focus on model customization.
We're dead focused on making sure that we have a really good stack to customize models
and serve those customized models.
Awesome.
Well, we're jumping into our favorite section called Spicy Future.
Spicy Futures.
Tell us, what is your spicy hot take?
What are things that you believe that most people don't believe yet?
Most people in the Bay Area bubble or most people in the United States, I think.
Wherever you want to focus on.
We'll leave it open for you.
Yeah.
Because I think within, like, a 70-mile radius of the Bay Area and SF, there are so many doomers. So many people just think that everything will be done for, that there will be no work for anyone in the coming decade. And my argument is that for people in the 1800s, AGI is already here: the office job is the UBI program.
And I don't know how many real American farmers you know who still plant their own food, and even if they do, they probably use automation for that as well. So at least I'm not a doomer. I do think the people in the bubble severely underestimate the general drive and adaptability of people.
People always have ambitions.
They have goals to hit
and they have drive to achieve those ambitions.
Yeah.
I think people in the Bay Area shockingly minimize the amount of creativity that the general public has. People always find a way out. I myself was an immigrant when I was 10. My family moved to New Zealand, and then I went to UCLA for undergrad when I was 20. I met so many interesting immigrants along the way who barely spoke the language, could barely function in the society they moved to, and still figured a way out. They learned the trades they needed to learn and figured out what society needs, right? As long as there's demand, I feel like it's impossible to argue that AGI will always fulfill the demand. So as long as demand is not fulfilled, and as long as humans are creative and keep coming up with new demand, I'm sure things will be fine. I don't know if that's a contrarian take, but it's my honest take. The doomers are belittling everyone else a little too much.
AI is changing everyone's life right now, one way or another, regardless of where you are and how you think. So depending on what your normal day has been, the impact is real, right?
Yeah.
Maybe to take this spicy take in another direction: from the infra world, given that you talk to a lot of engineers and developers, are there certain topics where you feel people are still stuck? Maybe it's specialized models, maybe it's that they haven't even started vibe coding. Is there something in the engineering or infrastructure world where you feel people haven't fully been able to see the potential of AI yet? What are the typical things that separate the people who are fully AI-native, who have been adopting AI much better than others, maybe even within the radius we're in?
Yeah. A lot of this was the advice I was giving to some of the first-time managers at Meta: you have to learn how to empower people, and in this case, you have to learn how to empower your bot. A lot of people still prefer to do it themselves, because, honestly, it's fun. It's fun to write code yourself. But as a manager for the bot, maybe your most important job is to set up CI, set up unit tests, and make sure your bot can be effective, right?
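A minimal sketch of the "manager sets up the harness" idea: a gate that runs the checks before a bot-authored change is accepted. The specific check commands below are placeholders; the point is that the human's leverage lives in this harness, not in writing the patch.

```python
import subprocess

def gate_bot_change(repo: str, checks=None) -> bool:
    """Run each check command in the repo; accept the bot's change only
    if every one exits 0. The default commands are illustrative."""
    checks = checks or [["pytest", "-q"], ["ruff", "check", "."]]
    for cmd in checks:
        if subprocess.run(cmd, cwd=repo).returncode != 0:
            return False  # reject: a check failed
    return True  # all checks green, change is accepted
```

A CI job would call this on every bot-generated pull request and merge only on `True`, which is exactly the kind of setup work Benny is describing.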
But setting up CI is not pleasant work for many people. The mentality has to change. Now you're managing a bunch of bots, so you'd better behave like a manager: document all your knowledge instead of just keeping it in your head and trying to do everything yourself, like, hey, I'm having fun here, I'm setting things up so I can do my own work most effectively. No, the most effective way is to figure out how to set up 20 or 30 bots on your system and make sure they can be effective.
That mentality change, I think, is very, very hard for a lot of very good engineers, because we hire a lot of 10x engineers into the company, and then you tell them, hey, it's actually more important to think about how you can manage a team of bots. Well, that's a management skill, right? That's not really an engineering skill. So I think the real archetype people should be shooting for now is more like the tech lead manager, in Meta terms something like an E7 tech-lead-manager setup, where you're mostly figuring out how to empower a bunch of your employees and making sure the work gets done.
The other thing I always push on is a fully out-of-the-loop setup. I think a lot of people are still very used to having some bot as a crutch, automating maybe 50 or 60% of their work. I always push on: hey, what's the last 20%, so that a bot just answers the question and does your job completely? I push on that because, for certain automation and for certain important things in the company that everyone relies on, the human should be completely out of the loop. You should be able to just talk to a bot on Slack and get the work done. That, for me, is very, very important, because then you don't have to stay up, and you really transform your mentality into a manager's: how do I make sure this bot has all the context and all the tools it needs to be successful? Then you can scale beyond just automation. Otherwise, I do think people get stuck at automating part of their job, and that's not an optimal setup for the company.
Amazing.
Well, we have so many questions we could have asked, but given the time constraints: if people want to learn more about Fireworks, or maybe even use the product, where can they find you?
Yeah, fireworks.ai. That's our URL. And we publish interesting blogs every now and then. Most recently, we published a blog about numerical alignment for large mixture-of-experts models. That was a fun experience for me, so hopefully it's also a fun read for everyone listening to this pod. Please go take a look.
Awesome. Thanks so much.
Thank you. Thank you.
