The Infra Pod - Betting on Open Source Models to be the future (Chat with Benny, CTO of Fireworks AI)

Episode Date: April 28, 2026

In this episode of The Infra Pod, hosts Tim Chen (Essence VC) and Ian Livingstone (Keycard) sit down with Benny Chen, co-founder of Fireworks AI, to explore the evolving world of AI inference infrastructure.

Benny shares his journey from Meta — where capacity planning meetings made it clear GPUs were heading "up and to the right" — to co-founding Fireworks AI before ChatGPT even launched. The conversation dives deep into why the team bet early on inference over training, how they approached model optimization from horizontal compiler techniques to per-model kernel tuning, and why model customization is the key to unlocking better-than-frontier performance for vertical use cases.

Benny discusses the reality of open source vs. closed models, the rise of agentic workloads, and why the real question isn't which model to use — it's which tasks have already been saturated. This episode is packed with technical insights on inference infrastructure, reinforcement learning for model customization, and what it means to truly adopt an AI-native engineering culture.

0:24 Benny's journey and founding Fireworks AI
3:23 Early conviction: betting on inference before ChatGPT
8:29 Pivoting from PyTorch training to text inference
15:42 Horizontal vs. per-model optimization strategies
11:14 Open source vs. frontier models: the real gap
32:35 How customers engage: PLG to hands-on customization
17:37 When to move off frontier models
33:42 The future of agentic memory and data sovereignty
32:35 Fireworks' differentiation in a crowded market
33:53 Spicy Future: AI doomers, bot management, and going fully out of loop

Transcript
Starting point is 00:00:03 Welcome back to the Infra Pod. This is Tim from Essence, and Ian, let's go. Hey, this is Ian Livingstone, CEO and co-founder of Keycard, making it possible for agents to get access to resources without destroying your world. I couldn't be more excited today. We're going to be joined by Benny Chen, co-founder of Fireworks AI.
Starting point is 00:00:22 How are you doing, Benny? Tell us about yourself. And more importantly, what in the world convinced you to go and start a company, which is one of the most insane things that a human can do in this economy, minimally? Nice to see you, Ian. Yeah, I'm Benny. I'm one of the co-founders of Fireworks.
Starting point is 00:00:40 We serve and train open source models and help people scale. For me, starting a startup wasn't so crazy of an idea, to be honest. I've always wanted to do startups, and I had a lot of startup side gigs while working at Meta. And the big, big indicator for me,
Starting point is 00:01:00 before I got out, was that in all the capacity planning meetings I was in, because I was on the hook to help the ads org plan for capacity, all the GPU capacity curves were up and to the right. And it was very clear to me that AI infrastructure was going to take off. We started before ChatGPT came out, so we didn't know it was going to take off in the gen AI sense, but we knew it was going to take off.
Starting point is 00:01:27 In retrospect, all those numbers are now peanuts compared to the numbers we are talking about, even for Meta. But yeah, it was the middle of '22. We knew something was going to come. We wanted to jump into the action ourselves, so we started Fireworks. It's been a roller coaster, but I think we've been really lucky, and I really appreciate how things have been going. We're here to help all the developers scale their open source model workloads.
Starting point is 00:01:58 And what was, I mean, was there really some unique insight that you had in the process of deciding to start the company? Like, hey, we can do this differently, there's a reason we can be super successful where other people can't. What were some of the thoughts or early ideas? The early ideas, I don't think, were that contrarian, maybe in the sense that we spent most of our careers optimizing machine learning workloads. We knew where the bottlenecks were, and we were, I think, way ahead of the market at that time,
Starting point is 00:02:36 knowing sort of what the Magnificent 7's outlook was, knowing what the roadmap for Nvidia looked like, and how much we could help on both inference and training. Fast-forward three or four years, there are many people and players in the market now, and whether it's a unique insight today is, I think, questionable. But back then it was definitely a unique insight. Honestly, it wasn't a unique insight amongst ourselves, but I was surprised it was a unique insight in the market. At least in all the capacity planning we were in, the inference workload was always bigger than the training workload. And there's sort of a symbiotic relationship between training and inference. And we spent most of our effort on
Starting point is 00:03:25 inference. And that has really paid off. Maybe another thing I can add on: when we started working on inference, we honestly didn't think that was a unique insight. Everyone involved in the process just assumed that was the right thing to do and that was what we were going to do. So to be perfectly honest, at that stage, everyone in the market was working on training, and somehow we were the odd one out.
Starting point is 00:03:52 And honestly, we were a little bit surprised, but we had the conviction that inference is the way to go, and inference is the one we want to focus on. We still have a lot of workloads in our system that are more like reinforcement learning, which is helping people customize their models so they can migrate off of frontier models. But a lot of it still has inference workload
Starting point is 00:04:11 rooted in there, especially for reinforcement learning. And we genuinely believe that when we help people customize models, helping them set up the inference workload end to end is the place where we want to be. Yeah, so I think it would be actually very interesting to talk about not just the history, but
Starting point is 00:04:30 the jumping into the inference game, maybe around 2023. I remember you guys started in 2022, right? And the early funding, I remember, started in 2023. Back when I first heard of Fireworks, it didn't really have inference
Starting point is 00:04:46 as the tagline, as I remember. It was really around the PyTorch ecosystem. If I remember correctly, that was sort of the genesis, because I remember you guys were working on PyTorch and trying to bring a bunch more infrastructure things around PyTorch. But if I remember correctly,
Starting point is 00:05:01 inference wasn't the only thing, or the main focus, when you guys started Fireworks. And so I'm really just curious, because you said you jumped into inference much earlier, what shifted you, so early
Starting point is 00:05:16 after starting the company, into, you know, we should go all in on inference? And I'm also very curious about what all in on inference even means, because I think vLLM was just getting started. There were projects, like open source stuff. What did you guys do to jump into that game so much earlier? I think that's a good question.
Starting point is 00:05:34 In the very, very beginning, we worked on a PyTorch training platform for recommendation systems. And when ChatGPT came out, like the end of '22, everyone's question was: should we pivot, and if we want to pivot, how do we want to pivot? At that point, we saw the potential for generative AI workloads. We saw the potential for text-related workloads. We saw that the models were getting smarter and smarter with every iteration. And what it meant for us, in terms of all-in inference, is that we were picking the text vertical in gen AI, and we were dead focused on making sure the text inference part of gen AI is effective.
Starting point is 00:06:17 I do think there were some debates along the way about whether we wanted to continue our PyTorch training platform business versus being focused on just one workload for the whole company. But I think we came to the conclusion that we're a small team and we have to stay focused. So then we had to pivot away from all the other businesses that were already starting to make a reasonable amount of money. But we had big ambitions, and we wanted to make sure we were able to get to a point where we could realize those big ambitions.
Starting point is 00:06:50 So text-focused inference workloads are what the whole company pivoted to for a good year or two before we expanded to other verticals. And so obviously we're all using inference now, right? One way or another, our whole life is just inferencing everywhere, you know? And so I'm very curious, because you guys started in 2023, which is probably the best time ever to really start this, right? You're early enough, things are just taking off. Like you said, text was the main use case. Now, going on your website, you know, clicking on models or platform, you have vision, you know, language, you have images, videos.
Starting point is 00:07:26 Like, the models exploded. The model data types exploded as well. And so I'm very curious, I guess, about your journey with Fireworks. You obviously started with text, started with maybe the big open source models back then, which was Llama, and then it started to be LLaVA. I guess, how do you choose which models you should start to host and maybe even do more work on? I'm very curious, do you try to do per-model optimization or just very generic optimizations when it comes to getting these models' inference performance to be much better?
Starting point is 00:08:07 Do you try to just do horizontal improvements and not focus too much on very specific models? Or do you also figure out which specific models you should really go and work on and optimize for? Just curious how that strategy, if there ever was one, came to fruition for you guys. Yeah, that's a good question. So initially we focused a lot on the horizontal improvements. In our early blogs, we were talking about leveraging CUDA graphs, which is a technique to lower the PyTorch graphs into something that runs contiguously on the GPU, to minimize the amount of back and forth between GPU and CPU.
Starting point is 00:08:47 And also a bunch more horizontal techniques, like compiler-based techniques, to help our own developers optimize the models in-house. So once we were done with the horizontal approach, or once most of the performance gains you can get from horizontal techniques were hitting diminishing returns, we started focusing on each of the models and started doing things like fusing different kernels,
Starting point is 00:09:11 setting up the correct attention kernels for different things. So we started publishing blogs about FireAttention, where we were optimizing FP8 kernels, optimizing NVFP4 kernels, optimizing AMD kernels, things like that. A lot of those are not even just model-specific, but also workload-specific.
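The CUDA-graph idea Benny mentions — amortizing per-kernel CPU launch overhead by capturing a whole sequence of GPU ops and replaying it as one unit — can be illustrated with a toy model. Everything below is a stdlib sketch of the concept only; it is not the real PyTorch or CUDA API:

```python
class ToyGPU:
    """Toy device: every kernel launch pays one CPU->GPU round trip."""
    def __init__(self):
        self.launches = 0

    def run_kernel(self, name):
        self.launches += 1  # per-op launch overhead


def run_eager(gpu, kernels):
    """Eager-mode execution: one launch per op."""
    for k in kernels:
        gpu.run_kernel(k)


class CapturedGraph:
    """Graph-mode execution: capture the op sequence once,
    then replay the whole thing as a single launch."""
    def __init__(self, kernels):
        self.kernels = list(kernels)

    def replay(self, gpu):
        gpu.launches += 1  # one launch covers the entire captured graph


kernels = ["embed", "attn", "mlp", "norm"] * 8  # 32 small ops per step

eager_gpu = ToyGPU()
run_eager(eager_gpu, kernels)

graph_gpu = ToyGPU()
CapturedGraph(kernels).replay(graph_gpu)

print(eager_gpu.launches, graph_gpu.launches)  # → 32 1
```

The payoff is exactly this ratio: for decode steps made of many small kernels, collapsing 32 launches into 1 removes most of the CPU-side overhead between GPU ops.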
Starting point is 00:09:31 And it depends on where most of the demand was coming in from, and we were working with our customers to optimize those workloads. Now, it is hard to tell upfront what the workload will be and where our customers' demand will be. So we have something called a 3D optimizer, where we have a database of previously collected profiles,
Starting point is 00:09:56 and then we search within the database to see what the best setup is for a particular workload. That's sort of an automation to keep things going efficiently. Honestly, this year the agents are getting so smart that I also wonder every day how we can automate better and serve our customers better. But yeah, a lot of our work was model-specific. Going forward, how we can make the whole thing even more efficient with bots,
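A profile-database search like the one Benny describes can be sketched as a simple lookup: among previously measured configurations, pick the best one that satisfies the workload's constraints. The schema and numbers below are entirely hypothetical, not Fireworks' actual 3D optimizer:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """One previously collected benchmark run (hypothetical schema)."""
    model: str
    num_gpus: int          # how many GPUs the deployment uses
    batch_size: int
    tokens_per_sec: float  # measured throughput
    p50_latency_ms: float  # measured median latency


def best_setup(profiles, model, max_latency_ms):
    """Pick the highest-throughput profile for `model`
    that still meets the latency target."""
    candidates = [
        p for p in profiles
        if p.model == model and p.p50_latency_ms <= max_latency_ms
    ]
    return max(candidates, key=lambda p: p.tokens_per_sec, default=None)


# A tiny stand-in for the profile database.
db = [
    Profile("example-70b", 4, 8, 3200.0, 180.0),
    Profile("example-70b", 8, 16, 5100.0, 240.0),  # highest throughput, too slow per request
    Profile("example-70b", 8, 4, 2800.0, 95.0),
]

best = best_setup(db, "example-70b", max_latency_ms=200.0)
print(best.num_gpus, best.batch_size)  # → 4 8
```

The real system would search over many more dimensions (quantization, parallelism strategy, hardware generation), but the shape is the same: measure once, then answer new workload questions by querying stored profiles instead of re-benchmarking.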
Starting point is 00:10:29 that will be the interesting question to answer this year. There's been this giant debate, and I'm just so interested to hear your take on this, Benny. There's been a long conversation about the foundation models, this all-encompassing view of the world: why would you ever fine-tune anything? We can do inference-time context engineering, and everything's great. Obviously, Fireworks as an inference platform, you know,
Starting point is 00:10:55 is coming at this from a very specific angle. What's your take, and what are you seeing in your customer base as to why people are, you know, moving against the grain, if you will, against this sort of idea that it's just going to be OpenAI, Anthropic, and Google that own most of this compute workload? Great question. When we are early enough in the adoption cycle, I do think it is safest for most startups and companies to procure, sort of, the IBM. There's the old saying: no one
Starting point is 00:11:28 gets fired for procuring from IBM, right? Because it was the mainframe that worked fairly well. I mean, there are so many analogies to mainframes, it's shocking, right? Like, there's a giant machine that runs in a giant room. Now we literally have the NVL72 racks. It's a few tons that roll in and serve one model. It's also very capex-intensive. And we are still early in the adoption cycle, where things are so confusing that for a lot of customers, it's just safe to procure the IBMs on the market.
Starting point is 00:12:04 At the same time, we work with a lot of companies that have huge volume. And for those companies, it's almost impossible to procure the IBMs, in the sense that they have so much going on that they need to make sure their unit economics are good. We're also at a phase where there's so much money pouring into the system that people are not really examining the unit economics. One day, reality will catch up with us. And we want to make sure we help people, we help the startups and the enterprises of the world, get there so that when reality catches up, their unit economics are good. And the other aspect to it is also model quality. A lot of people sort of assume that it's always best to rely on the frontier models
Starting point is 00:12:50 because the frontier models have the best setup and whatnot. But these models are like machines, and if you are able to program these machines effectively, you can make them way more effective at your job than using an out-of-the-box frontier model. For example, in one discussion I had with a healthcare provider, he was telling me why he customized a model on our platform: because the drug names are new. There are new drugs coming out every year, and you need to teach the model what a drug name even means. When I saw the Latin names for those drugs, I thought it was just gibberish.
Starting point is 00:13:25 I couldn't even understand what was going on. Every vertical has this kind of new knowledge coming in every day, and how to interpret that knowledge, how to use that knowledge, is vertical-dependent. So I do believe that for things like booking you a restaurant or handling personal assistant tasks for you, those probably will get saturated, and both open and closed-source models will work very, very well on those tasks.
Starting point is 00:13:49 That's very common. For vertical-specific use cases, I genuinely believe that there will be a lot of customization in the future, and there is a lot of potential for Fireworks to help these companies get to a point where it's not just about unit economics, but also that the model is just hands-down better than the frontier models.
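Teaching a model new vertical vocabulary, like the drug names in Benny's healthcare example, typically starts with a small supervised fine-tuning set. Below is a minimal sketch of what such data might look like in the common chat-style JSONL format; the drug name "Zemvorastat" and all answers are invented for illustration, and the exact file format a given platform expects may differ:

```python
import json
import os
import tempfile

# Hypothetical supervised fine-tuning examples (invented drug, invented answers).
examples = [
    {"messages": [
        {"role": "user", "content": "What is Zemvorastat used for?"},
        {"role": "assistant",
         "content": "Zemvorastat is a (hypothetical) lipid-lowering drug ..."},
    ]},
    {"messages": [
        {"role": "user", "content": "Is Zemvorastat a brand name or a generic name?"},
        {"role": "assistant",
         "content": "In this (hypothetical) example it is the generic name ..."},
    ]},
]

path = os.path.join(tempfile.gettempdir(), "train.jsonl")
with open(path, "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")  # one JSON object per line

# Reading it back shows the structure a tuning job would consume.
with open(path) as f:
    rows = [json.loads(line) for line in f]
print(len(rows), rows[0]["messages"][0]["role"])  # → 2 user
```

The point of the format is that each line pairs vertical-specific input with the exact behavior you want reinforced, which is how new terminology gets into a model that the frontier labs have never seen.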
Starting point is 00:14:09 We saw that with our Genspark engagement. We saw that with our Vercel engagement, where we're just better than the frontier models. And we have more and more of these use cases coming up on our platform that really help developers come up with better models than frontier. And that is very, very exciting for us, because it helps them unlock new use cases and drive more revenue, and it's not just a unit economics game.
Starting point is 00:14:36 So I think it will be interesting to talk about how you usually engage with customers, because I think there's a huge array of different types of customers, right? People can just go to Fireworks and directly sign up as a developer. Those are more like your PLG-type developers, who sign up and do just plain inference. But you just mentioned you actually do hands-on
Starting point is 00:14:56 work with some customers to figure out how to get better performance than frontier models. But there are so many choices. How do we figure out what kind of fine-tuning we should do? What base model should I use? What type of options? And most people, I don't think, from my conversations, most people don't even know what to do at all.
Starting point is 00:15:15 You know, they have no real concept of what they'd even start with. So oftentimes, I think, they definitely need a lot of guidance. So I'm just very curious, how do you guys try to structure this? What are the types of engagements you like to be on,
Starting point is 00:15:38 and what might a typical engagement look like to help people be successful, you know, getting to the model they want and running it on Fireworks? Most of our customers go through the self-serve platform, for sure. So most of the customers just kick off jobs on their own. We do have API documentation, and we try to help. Honestly, I think one of the distinctions we're making is that helping people's Claude Code is sometimes more important than helping the humans themselves. I'm not sure if you saw Karpathy's recent tweet about auto-research, right?
Starting point is 00:16:03 That's also one of the patterns we found: just making sure their Claude Code has a good experience means that their human driver will have a better experience. So we do help by making sure our documentation is very clean and up to date. And then the agents will help drive a lot of the setup work. So for most people, they go through either the UI or Claude Code and just use our platform as is. And then for some customers, we do see a lot of potential, and we have these more hands-on engagements with them. And for those, it's more of a case-by-case kind of setup.
Starting point is 00:16:42 I'm really quite interested to dig more into this in terms of what patterns you're seeing. It sounds like you're saying most people start with a generalized foundation model. They then kind of get something working, and then they're like, okay, now we've got to optimize this, because there's a discrepancy in accuracy or there's some performance issue or whatever, and now we're going to, you know, basically fine-tune our own model off of open source or whatever, and we need to run that someplace really efficiently and fast. Is that effectively what the workflow looks like? It's like proof of concept, to expansion, to run, let's go build something,
Starting point is 00:17:13 then we get into production. Yeah, I would say that's 80% of the use cases and patterns we observe on our platform. The other 20% is people who start with open models directly. We saw a lot of success with that as well recently. I think if you check OpenRouter or other places for traffic, I mean, I cannot share what's on Fireworks, but there are public places where you can check, and open models are also doing really, really well. And it's shocking how many of the easy tasks
Starting point is 00:17:45 have been saturated by open models. One nuance here, maybe, is that a lot of people think the open models and the closed models are getting closer together. I think they're actually not getting any closer than last year; there's still a big gap to the frontier models. What has changed this year, though, is that the watermark has risen so high that, for example, if you just want to ask your OpenClaw to book your restaurant, the open models are just good enough. Now, for all the AGI tasks, the open models are still not good enough, but they're good enough to book you a restaurant.
Starting point is 00:18:17 So then a lot of OpenClaws are going through open models. Gotcha. And so we're at this point, and we've always said this, but it's starting to become very clear that there's a certain set of tasks where additional intelligence or capability actually doesn't yield additional benefit. And there's a set of new tasks where you can just defer to these open models. It's cheaper, it's easier to fine-tune,
Starting point is 00:18:38 it's more private, like the security concerns, you can run it yourself, better unit economics. Where do you sit on this sort of debate? There was a period of time where open models did linearly follow the foundation models, almost one to one. It would be like, Llama came out and somehow was the same as GPT-3, right? And that doesn't seem to be the case anymore.
Starting point is 00:18:59 I'm curious, do you think the foundation models continue to have an upward trajectory in terms of capability and intelligence, away from what's open? Or what do you think the trajectories of these two look like, capability on a graph, over time? Yeah, I think for the most sophisticated tasks, the frontier models are still better, for sure. I think what's different is that we are running out of things
Starting point is 00:19:34 to ask. And to be perfectly transparent, maybe 20% of my day is really, really intellectually challenging and very difficult; I have to answer some pretty difficult questions. But for the other 80%, honestly, I've made a decision, I just need to execute. I need to make sure these things happen. I need to set up unit tests. I need to set up CI. I need to set up all the rigs and scaffolding around my code. I don't know if that requires frontier knowledge. So I think there's a difference between whether there's still a gap and whether all the tasks are going to get saturated. I actually think the tasks getting saturated will happen faster
Starting point is 00:20:06 than the open and closed models closing the gap. And honestly, at the point where 90% of the day's tasks get saturated, I don't know if the distinction between open and closed models matters anymore. That makes complete sense. So your view, and I totally agree with this, follows the line of:
Starting point is 00:20:30 well, generalized models are great, then we have to specialize, and specialization is where you get great unit economics. And there's a long tail of tasks: if you have a long-tail task, and the user is typing into a chat box and I can't figure out how to do it, I need a generalized model. But the minute it becomes a repeatable task that has a very well or more bounded, let's say, success or goal space, it can become specialized, and we can drive costs down that way. I'm curious, as you look at the people you're talking to, maybe potentially your customers, what's the rate of specialization of specific agents, right?
Starting point is 00:21:01 Are they building specific specialized sub-agents that use specialized models, or do we continue to see more generalized use of models with prompt engineering and multi-agent architectures inside companies as they build features? On our platform, we're seeing more general models and fewer specialized agents. What I mean by that is, for a lot of the customers we have, the model is the product. Because a lot of the time, as the model gets better, especially when the base model gets better, your scaffolding becomes less relevant. It's hard to share the examples we have internally, but for example, Anthropic recently shared that their Claude Code
Starting point is 00:21:42 got better at compression because the model is just smarter now and knows what to remember. So then all your previous crutches around memory and whatnot become a little bit pointless if the model is just smart enough to know what to remember. At the same time, a lot of this "the model is smart enough to know what to remember" is constrained by the harness itself, where a lot of people are doing reinforcement learning on Claude Code itself. So I want to tease out the nuance here a little bit more, in the sense that a lot of people have products that look very general but actually use a very limited number of tools, and the model just learns how to use those tools very effectively. For Claude, it might be read file, write file, use-the-terminal kind of stuff.
Starting point is 00:22:29 Those tools are very, very general, but they are also very useful in certain settings. If you have a vision-intensive task, right, I don't know how useful those kinds of tools will be. So depending on the vertical our customer operates in, they set up the agent harness themselves, and we help them drive reinforcement learning on their harness, on their product itself, so that the model improves on their product. And the model becomes the product itself. And that's a very useful mental pattern, I think, for a lot of startups out there who still want to debate whether the model is the product or whether the agent scaffolding is the product. I really think this year the distinction is not that important. You have the harness,
Starting point is 00:23:16 you improve the model underneath, the model is the product. Model customization is going to get so easy this year that it really doesn't matter if you want to make the distinction or not. It will become the same thing. So going back to this frontier models versus open source models thing a bit. I think, you know, looking at the case studies you have and, you know, what's being shown on the website, it's mostly the Cursor types, you know, the companies that have their own models.
Starting point is 00:23:45 They need to host them and work with you to provide the best economics, right? It's those Cursor, you know, fast generations and stuff like that. I think besides all the people whose whole business is the tool, where the model is a huge part of the business, there's actually a huge number of companies that just want to use AI, right? I'm building my, I want my developers to all be using AI, right? Every enterprise right now is going through this AI adoption transformation, like brainstorming, right?
Starting point is 00:24:24 It's really hard for folks to understand. They always start with the frontier models because those have the most state of the art, right? They understand the economics, maybe not as great, but we need to really go for it. I'm curious, on your side, do you see when the shift might happen? Either for existing people, like, okay, I'm using Claude and it's using so much of my bandwidth,
Starting point is 00:24:41 or is privacy what starts folks to consider, hey, we should not just use frontier, we should start to specialize, or start to really consider having open source models be part of our stack? Besides the folks like the Cursor types, just the normal consumption types, do you see folks like that trying to go to open source models as well? And what is the typical motivation, or the point where they start to jump over? Yeah, we got a lot of good feedback on Twitter recently, which I really appreciate. It was DHH who talked about how they were using Kimi on Fireworks for their coding use case.
Starting point is 00:25:19 And a lot of it has to do with, one, privacy, two, cost. Because, like I was describing earlier, if the task is saturated for 90% of your day, it really doesn't matter if you're using Opus or Sonnet or Kimi. You can use Kimi on Fireworks and just get the best speed out of your coding experience. Developers are really smart, and they get to pick and choose the best unit economics and the best speed for their setup. And we're here to help. I think Anthropic also came out with a fast mode, right?
Starting point is 00:25:50 But the fast mode, I think, is like five times more expensive. And we had two colleagues in the company who turned on fast mode, and then the money printer just went on. Yeah, I think they blew past their limits in like a week or something. So it is very, very expensive to use the frontier fast model all the time. And we're here to help developers who are interested in using alternative models that are much faster, served on Fireworks. But coming back to, I think,
Starting point is 00:26:23 the underlying motivation of your question: in terms of general adoption, for a lot of enterprises, would they rather adopt Opus or, like, GPT-5.4, or rather adopt open source models? And I think it depends on the beliefs of the founder. A lot of founders we talk to fundamentally believe that they need to control their stack,
Starting point is 00:26:47 and those founders are much more interested in open source models. And there are people whose fundamental belief is that it's totally okay to be a renter, and those people lean much more into frontier models. And I do think a lot of closed-source models have restrictions that you may or may not agree with,
Starting point is 00:27:05 and data retention policies that you may or may not agree with. And we're here to help. I'm really curious, just speaking of data retention: a lot of your customers are building features where the model is the feature, right? It drives your business. Cursor Tab is a great example. I'm not sure if that's how they use Fireworks for it,
Starting point is 00:27:21 but it's an example of a model that they spent a long time on, their Composer model or whatever it's called. As you work with these types of data-sensitive customers, how do you think about things like memory, right? Memory storage, and the gravity of memory in relation to the model, in these sorts of use cases, especially with, I think, mid-market companies and the enterprise. Like, you know, you're spending a lot of time working with these companies
Starting point is 00:27:43 that are effectively selling AI features or co-pilots, you know, glorified co-pilots, to developers. And what we're seeing this year is a lot of those co-pilots are trying to graduate into agents. And as things graduate to become agents, the concept of memory comes up. And memory has huge data sovereignty and governance issues related to it. You know, many enterprises do not want the data to leave their premises or their control; it's a core component of their architecture. So I'm curious, how do you think about memory,
Starting point is 00:28:11 how you think about memory or location to inference and what the future of memory is, inference, and the future of agents are, in general, in relation to memory. Yeah, that's honestly, like, a fantastic question that we think about every day. There are a few parts to memory I want to tease out a little bit, and maybe we can discuss them separately. One is more like a behavioral. Behavioral as in I remember how to use this tool, so I have a short circuit in my brain to do certain things quickly. Either it's like using shortcuts
Starting point is 00:28:47 instead of clicking my mouse. Or, for example, for certain API calls, I know how to fill in the dates in a certain format, like timestamp versus other formats, so then I don't have to wait for an error code
Starting point is 00:29:03 to come back for me to correct and then try again. For these types of memory, we often find a lot of success in memorizing failures and in learning how to use the tools more effectively; these are very friendly towards some sort of model customization. Even very lightweight customization, just letting the model know, like, oh, this is what golden employee behavior looks like.
Starting point is 00:29:35 And then you give it the best trajectories the model has done and sort of reinforce what good behavior looks like. That I classify as more like behavior change, or character design for your model, or for your employee. And when we talk about having all these enterprises adopt these tools, oftentimes they try to prompt their way out of it and have a massive prompt on what to do when. That definitely can work.
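The lightweight customization Benny describes, reinforcing golden trajectories, could be sketched roughly like this. The record schema here (a `messages` list plus a `success` flag) is an illustrative assumption, not Fireworks' actual training format:

```python
# Sketch: keep only the "golden" (successful) agent trajectories and emit them
# as chat-style JSONL records for supervised fine-tuning. The trajectory schema
# (a "messages" list plus a "success" flag) is an assumption for illustration.
import json

def golden_trajectories_to_jsonl(trajectories: list[dict]) -> list[str]:
    """Turn successful trajectories into JSONL lines, one training example each."""
    lines = []
    for traj in trajectories:
        if not traj.get("success"):
            continue  # reinforce only what good behavior looks like
        lines.append(json.dumps({"messages": traj["messages"]}))
    return lines
```

Feeding the model its own best runs back as training data is the "character design" step; failed trajectories can be kept separately as negative examples for preference-style tuning.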
Starting point is 00:30:04 What we found is, sometimes if you want character or behavior change, you should just customize the model. The other part is more factual, or like, oh, what happened? And for what happened, we see a lot of companies adopting Markdown files and putting them into the repo,
Starting point is 00:30:20 but more recently, just connecting to Slack and actual systems of record. Because Markdown files sometimes go stale, sometimes go out of sync with what your product looks like. So it depends on what these people want to do.
Starting point is 00:30:34 And now we are here to help with setting up data connectors to different systems of record and making sure their models can work effectively with those systems of record. And then there are cases that are more of a team setting, like collective memory, where your whole team prefers to use a tool one way,
Starting point is 00:30:54 but another team in the company prefers to use the same tool the other way. Those, I think, are handled very effectively just through Markdown files in different folders, where, okay, this team has this folder and they just want to use the tool this way. Because that's not really a general behavior change, it's more like a collective memory record kind of thing.
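Per-team collective memory of the kind described here could be as simple as resolving a team-specific conventions file with a shared fallback; the `teams/<team>/AGENTS.md` layout is a hypothetical convention, not a standard:

```python
# Sketch: each team keeps its own conventions file; the agent loads the one for
# the team it is acting on behalf of, falling back to a shared default.
# The teams/<team>/AGENTS.md layout is a hypothetical convention.
from pathlib import Path

def load_team_conventions(root: str, team: str) -> str:
    """Return the team's tool-usage conventions, the shared default, or ""."""
    for candidate in (Path(root) / "teams" / team / "AGENTS.md",
                      Path(root) / "AGENTS.md"):
        if candidate.exists():
            return candidate.read_text()
    return ""  # no conventions recorded yet
```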
Starting point is 00:31:28 I do think the most interesting discussion the market is having right now is on that second category, where agents connect with system-of-record databases: whether the product side of those systems of record is still valuable and can still charge a huge amount of dollars per seat, or whether you have more and more of these open data platforms coming in and sort of challenging the old-school seat-based setup. We had a lot of success just connecting agents with Google Docs and Notion and just letting the agent run, keeping track of all our engagements, making sure those are properly accounted for, and making sure there are no loose ends. But yeah, I'm sure the system-of-record companies can figure something out and, you know, educate me on what their business model looks like 10 years from now.
Starting point is 00:32:06 Given that there has been increased competition around inferencing, I'm just curious what your take is. How do you guys, I guess, differentiate yourselves? What is your general strategy when people ask you, why should I pick Fireworks versus all the other options? And the number of options seems to just keep going up every year. What has been the main focus for Fireworks, for you guys? Like, hey, this is how we stand up against the usual competition. We're much more focused on what we focus on versus differentiation, and we're very focused on customized models.
Starting point is 00:32:42 We're very focused on helping people do reinforcement learning, or model customization, so they can beat frontier models as much as we can. And then for customized models, we're also very focused on better unit economics, because as soon as you control the customization stack, you can be more aggressive with different quantization techniques and different ways you want to reduce the cost of the final model that you're training. And I do think there are other model inference providers out there,
Starting point is 00:33:12 and the CSPs are also getting into the game. And the amount of money being thrown into the system is mind-boggling, to say the least. So I'm not naive enough to think that you get to stand in one corner and not get bothered. At the same time, I do think focus is very, very important. And we did focus on model customization.
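The quantization lever mentioned above can be illustrated with a toy example: symmetric int8 quantization stores each weight in one byte instead of four, at some precision cost. This is a deliberately simplified sketch, not Fireworks' serving stack:

```python
# Toy sketch of weight quantization: map float32 weights to int8 with one
# shared scale (symmetric, per-tensor). Real inference stacks use far more
# sophisticated schemes; this just shows the storage/precision trade-off.

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Return int8 codes and a scale such that w is approximately code * scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_int8(codes: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the int8 codes."""
    return [c * scale for c in codes]
```

Controlling the customization stack means a quantization scheme can be tuned per customized model and validated against the customer's own evals rather than generic benchmarks.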
Starting point is 00:33:34 We're dead focused on making sure that we have a really good stack to customize models and serve those customized models. Awesome. Well, we're jumping into our favorite section called Spicy Future. Spicy Futures. Tell us, what is your spicy hot take? What are things that you believe that most people don't believe yet? Most people in the Bay Area bubble, or most people in the United States, I think.
Starting point is 00:34:02 Wherever you want to focus on. We'll leave it open for you. Yeah. Because I think within the 70-mile radius of the Bay Area and SF, there are so many doomers. There are so many people who just think that, oh, everything will be done for. There will be no work for anyone in the coming decade. And my argument is that for people in the 1800s, the AGI is already here.
Starting point is 00:34:29 The office job is the UBI program. And I don't know how many real American farmers you know who still grow their own food, and even if they grow their own food, they probably use automation to do that as well. So at least I'm not a doomer. I do think the people in the bubble
Starting point is 00:34:49 severely underestimate the general drive and adaptability of people. People always have ambitions. They have goals to hit, and they have the drive to achieve those ambitions. Yeah. I think people in the Bay Area, like, shockingly minimize the amount of creativity that the general public has. And people always find a way out.
Starting point is 00:35:13 Like, I myself, I was an immigrant when I was 10. My family moved to New Zealand. And then I went to UCLA for undergrad when I was 20. And I met so many interesting immigrants along the way who barely speak the language and can barely function in the society they moved to, and they still figure a way out. They learn the trade they need to learn and figure out what the society needs, right? As long as there's demand, it's impossible, I feel like, to make an argument that AGI will always fulfill the demand.
Starting point is 00:35:52 So as long as the demand is not fulfilled, and as long as humans and our creativity keep coming up with new demand, I'm sure things will be fine. So I don't know if that's a contrarian take, but that's my honest take. The doomers are just belittling everyone else a little bit too much. AI is changing everyone's life right now, you know, one way or another, regardless of where you are and how you think. So I think that depending on what your normal day has been, the impact is real, right? Yeah. Maybe to take this spicy hot take in another direction: what is something that you,
Starting point is 00:36:25 maybe from like even from the infar world, given that you talk to a lot of engineers and developers, right? you know, is there certain topics that you often kind of like feel like they're still stuck in? Maybe it's a specialized model. Maybe like, okay, they shouldn't even start do vibe coding or something like that. Is there something that you even like an engineer world or infrastructure world that you have feel like, okay, people hasn't fully able to see the full potential of AI yet? What are typical things, you know, to be like fully AI native? Everyone's like fully ready.
Starting point is 00:36:57 Do you see what are the typical things people? what I've been doing to adopt AI much better than others. Maybe even in the radius that we're in, done much better. Yeah, I do think a lot of, this was the advice when I was giving to some of the first-time managers at Meta. It's like you have to learn how to empower people,
Starting point is 00:37:18 and in this case, you have to learn how to empower your bot. And a lot of people still prefer to do it themselves, because, honestly, it's fun. It's fun to write code yourself. But as a manager for the bot, maybe your most important job is to set up CI, set up unit tests, and make sure your bot can be effective, right?
Starting point is 00:37:36 But setting up CI is not pleasant work for many people. But the mentality has to change. Now you're managing a bunch of bots. So you'd better behave like a manager, and behave in a way where you document all your knowledge instead of just having something in your head and trying to do it yourself and being like, hey, I'm having fun here.
Starting point is 00:37:56 I'm trying to set things up so that I can do my work most effectively. No, the most effective way is to figure out how to set up 20 or 30 bots on your system and make sure they can be effective. That mentality change, I think, is very, very hard for a lot of very good engineers, because we have a lot of engineers who are like 10x engineers
Starting point is 00:38:16 whom we hire into the company, and then you tell them, hey, it's actually more important to think about how you can manage a team of bots. Well, those are management skills, right? Those are not really engineering skills. So I think the real archetype that people should be shooting for now
Starting point is 00:38:31 is more like the tech lead manager. In Meta terms, I think it's like the E7 kind of tech-lead-manager setup, where you're mostly figuring out how to empower a bunch of your employees and making sure they can do the work. The other thing I always push on is a full out-of-loop setup.
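The "manager for the bot" job described above, encoding CI and unit tests so bots can be effective, might look roughly like this as a merge gate; the dict-of-callables check interface is an illustrative assumption:

```python
# Sketch: the human encodes acceptance checks once; a bot-produced change is
# accepted only when every check passes, and failures are reported back by
# name. The dict-of-callables interface is an illustrative assumption.
from typing import Callable

def review_bot_change(change: dict,
                      checks: dict[str, Callable[[dict], bool]]) -> tuple[bool, list[str]]:
    """Run all CI-style checks on a bot's change; return (accepted, failures)."""
    failures = [name for name, check in checks.items() if not check(change)]
    return (not failures, failures)
```

The point of returning failure names is that the bot, not the human, consumes them on the next attempt.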
Starting point is 00:38:50 Because I think a lot of people are still very used to having some bot as a crutch, just making sure it automates maybe 50, 60% of your work. I always push on, like, hey, what's the last 20%, where you can have a bot just answer the question and do your job completely? And I push on that because I want to make sure that for certain automations, for certain important things in the company that everyone relies on, the human is completely out of the loop. You can just talk to a bot on Slack and just get
Starting point is 00:39:20 the work done. And that, for me, is very, very important, because then you don't have to stay up. And you really transform your mentality to be a manager: how do I make sure this bot has all the context and all the tools it needs to be successful? Then you can scale beyond just automation. Otherwise, I do think people will get stuck in automating part of their job, and that's actually not an optimal setup for the company. Amazing.
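The full out-of-loop setup described above could be sketched as a loop that runs the task, self-verifies, and publishes with no human approval step; `run`, `verify`, and `publish` are assumed callables for illustration, not a real API:

```python
# Sketch of full out-of-loop automation: the bot attempts the task, verifies
# its own result, and publishes only when verification passes. No human
# approval step. run/verify/publish are assumed callables for illustration.
from typing import Any, Callable

def out_of_loop(task: Any,
                run: Callable[[Any], Any],
                verify: Callable[[Any, Any], bool],
                publish: Callable[[Any, Any], None],
                max_attempts: int = 3) -> Any:
    """Complete a task end to end; raise if it can't be self-verified."""
    for _ in range(max_attempts):
        result = run(task)
        if verify(task, result):
            publish(task, result)
            return result
    raise RuntimeError(f"could not complete task {task!r} out of loop")
```

The verification step is what lets the human stay out of the loop: the bot's output is gated by the same checks a manager would have applied.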
Starting point is 00:39:50 Well, we have so many questions we could have asked, but I guess, due to the time constraints: tell me, if people want to learn more about Fireworks, you know, maybe even use the product, where can people find you and Fireworks? Yeah, fireworks. That's our URL. And we publish interesting blogs every now and then. Most recently, we published a blog about numerical alignment for large mixture-of-experts models.
Starting point is 00:40:17 That was a fun experience for me. So hopefully it's also a fun read for everyone listening to this pod. And please go take a look. Awesome. Thanks so much. Thank you. Thank you.
