a16z Podcast - Building AI Agents for Enterprise Operations

Starting point is 00:00:00 Voice was the unlock to many of the operations that are really needed to move the world if we talk about supply chain. This is not a supply chain specific problem that we are solving. It's actually an enterprise coordination problem. The bigger problem in the coming years for voice here is really knowing when to talk and what not to talk. So it's understanding all these nuances in the work more than making the latency faster or making the voices more realistic, which I don't think that's a limiting factor today. I feel like Happy Robot has always been at the forefront of kind of humanness.

Starting point is 00:00:34 Do you want the customers to know they're talking to an AI? Where does that go? I think it's super important that. Most AI demos happen in controlled environments. The real challenge begins when AI has to operate inside large organizations, where information is fragmented across systems, teams, emails, phone calls, and workflows that have evolved over years. Logistics and supply chains have become an early proving ground for these systems.

Starting point is 00:00:59 Success depends not just on model intelligence, but on coordination, context, and the ability to execute work reliably in the real world. Anish Acharya and Olivia Moore speak with Pablo Palafox and Luis Paarup from Happy Robot about voice AI, enterprise agents, and the challenges of deploying AI in operationally complex industries. Olivia and I are here with the two incredibly talented founders of Happy Robot, Pablo and Luis. Welcome, guys. Thank you, guys. Super excited. Very excited to have you. We're overdue to have this conversation. Well, look, we're here to kind of talk about the company and the incredible journey that you've

Starting point is 00:01:37 been on. I know when we first met you, there had been a lot of buzz amongst YC founders and other folks about how you guys are sort of at the edge of the technology and then really getting a lot of pull from a go-to-market perspective. So maybe take us back in time to the little office that had four or five people on 20th Street and what the origins of the company and the product were. 100%. So Luis and I met on

Starting point is 00:02:00 our second day of college, just to set the scene, and ever since we've been building stuff together. Our other co-founder, Javi, he happens to be my brother, so I've known him for a little while. We always wanted to build something together, right? So when we got into YC, we were looking for complex problems we could solve. Keep in mind that Luis and I had been literally building submarines for robotics competitions to find mannequins underwater. That is the sort of problem we were looking for. So when we decided on solving for that complexity, we looked at what Javi was doing as a CFO of the largest olive oil distributor in the world. He was literally moving tons of olive oil across the ocean.

Starting point is 00:02:39 And that was the complexity that drew us into logistics and supply chain. He literally had to hire interns to call drivers to see where they were, to see where the shipment was, because Walmart was asking him, where the hell is my shipment of olive oil? So that was the sort of problem that we wanted to tackle. And maybe you can talk about why we actually started with voice there. I guess we took it from a very tech-driven approach.

Starting point is 00:03:02 Really, the limiting factor back then was having an agent that could speak on the phone realistically. Like, we were at conferences, he was like traveling all around, like asking people, hey, if we were to create a voice agent that could pick up the phone and sell these loads and track these shipments, would you buy it? And it's like, dude, of course, this is a no-brainer. I just don't think you can do it. So it was more so like the idea market fit or product market fit made sense from the beginning. It was more so like, can we prove to ourselves that we can build this technology?

Starting point is 00:03:28 And you know, LLMs were picking up. We're talking about late 2023, probably. LLMs were, like, decent enough. ElevenLabs was picking up with the text-to-speech and everything was kind of working together, but we had to build something that could actually connect all the dots and actually make something work, no? That kind of shaped our company where we really had technology

Starting point is 00:03:46 and innovation as the core of our company, always pushing this frontier and solving it firsthand. So that's how we got started in the voice space. Amazing. One of my favorite memories of working with you guys is actually when we first met outside a very crowded coffee shop, and you called one of the live voice agents, and it was seamless,

Starting point is 00:04:03 and it did an incredible job in a very non-ideal environment. I feel like a lot of people might know Happy Robot from your amazing demo videos of the voice agent. And that's definitely not all the product is, but it's an important part of it. So maybe walk us through, like, why voice to start, and then what voice has maybe unlocked for you more broadly. Yeah. What Luis was saying is very important. Voice was the unlock to many of the operations that are really needed to move the world if we talk about supply chain.

Starting point is 00:04:33 So when we were going to these conferences and people were like, you know, we're going to build these things that talk on the phone. Negotiating rates on shipments was actually a big one. So we actually fine-tuned LLMs back then. Like we fine-tuned Mistral and Llama to actually make those voice agents faster because otherwise using something like GPT-4 at the time was extremely slow. And GPT-3.5 at the time was terrible at reasoning and actually negotiating. So we had to do a lot of tricks behind the scenes, build our own agent infrastructure, if you

Starting point is 00:05:06 will, but also build our own voice agent capabilities so that we could innovate faster than the competition. And that actually gave us a really good edge in logistics and transportation in the early days. So we started working with these freight brokers. Then we expanded to these freight forwarders, then ocean carriers, then trucking companies. And today we actually serve many of the largest companies in the space of supply chain. We were discussing before, no, nine of the top 10 freight brokers in the U.S., seven of the top 10 trucking companies,

Starting point is 00:05:34 like some of the largest fleets that actually move our goods everywhere in the U.S., which is crazy. Two of the largest ocean carriers, those big boats we see in the bay. That is the sort of customer that we needed to build for and where voice was the unlock for many of the operations. So it sounds like it wasn't just voice. It was also voice plus negotiation. So perhaps track and trace, which is customer support, and sales, which is sort of this negotiation, is where we started. And I think that forced us to build a deeper set of technology than we otherwise would have built. Maybe, Luis, take us on the technology journey a little bit.

Starting point is 00:06:05 Yeah. So before I tackle that, I guess one of the things that we had very clear from the beginning, when you're working on the frontier of technology, is really what you have to reinvent versus what already exists, no? And I think people might take an approach where they just reinvent everything just for the sake of it. Some people would just wrap around anything else and be more of a go-to-market thing. We started tackling the limiting factor, always. And again, back then, GPT, as Pablo mentioned, like 3.5 was relatively fast and not so good. So we had to fine-tune the LLM. Soon enough, all these good models came out and we realized that prompting was good enough: scratch that, let's do that. And always focusing on that limiting factor,

Starting point is 00:06:41 then voices, like the background noises, like supply chain is extremely messy. You're talking with drivers in their trucks with the radio on and background music and noises and accents. So always focusing on those limiting factors. On the negotiation part, something we got very often was, how do you prevent the bot from hallucinating a rate or, like, the max buy? It's like, dude, I'm building this thing and it's just hallucinating the max buy and it doesn't know how to negotiate. How are you guys able to do that?

Starting point is 00:07:08 And I think it's because you don't need to show the AI what it doesn't need to see. And I think we were very opinionated about this from the beginning, where we were building these proxy servers and actually exposing to the agent only the things it needs to see. And actually the max buy, the maximum amount of money the bot can actually negotiate up to, is not even exposed to the bot. We were not exposing that. We were doing external negotiation algorithms so that the bot would just ask for permission, literally the same way a human would, like, hey, let me ask my boss.

Starting point is 00:07:34 And it was really just calling a tool and asking for permission to do more. And we would inject back the rate, no? So those sort of things, instead of just putting it in the context and having the LLM just freestyle it, we do in a more deterministic approach. So it's always a mix of probabilistic plus deterministic, where you don't need to leave everything to the AI, no? It's building for the real world.
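
Editor's sketch of the "ask my boss" pattern described above, not Happy Robot's actual implementation. It assumes a hypothetical `RateAuthority` tool that lives outside the model: the agent only ever sees the rate it is currently allowed to offer, and the max buy stays on the deterministic side. All names and the step-based concession rule are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadPolicy:
    """Negotiation limits for one load. Kept server-side, never put in the prompt."""
    opening_rate: float
    max_buy: float
    step: float  # how much more the agent may offer per approved request


class RateAuthority:
    """Hypothetical tool the voice agent calls instead of knowing the max buy."""

    def __init__(self, policy: LoadPolicy) -> None:
        self._policy = policy
        self._approved = policy.opening_rate

    def request_increase(self, carrier_ask: float) -> dict:
        """Return only the rate the agent may quote next, never the ceiling itself."""
        if carrier_ask <= self._approved:
            return {"approved_rate": carrier_ask, "accepted": True}
        # Concede one step, capped deterministically at the hidden max buy.
        self._approved = min(self._approved + self._policy.step, self._policy.max_buy)
        accepted = carrier_ask <= self._approved
        rate = carrier_ask if accepted else self._approved
        return {"approved_rate": rate, "accepted": accepted}
```

Because the response carries only `approved_rate` and `accepted`, a caller who tries to jailbreak the agent into revealing its ceiling has nothing to extract: the number was never in the context window.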

Starting point is 00:07:53 The real world is messy. Those things are going to happen where someone tries to jailbreak the agent and get the maximum amount of money that they can get. But we needed to build those guardrails very early on so that we could actually go to the likes of C.H. Robinson or Uber Freight and all of these big players that would only trust us if we actually were building real technology.

Starting point is 00:08:15 It was pretty clear for us. We knew that we didn't want to focus on the long tail of logistics and transportation because it's a very tricky space, but we knew we needed to serve the enterprise in transportation and supply chain and logistics. So that was very clear. That shaped the type of products that we had to build, the type of primitives that we had to build. It's so interesting because Happy Robot was very early to both voice AI and enterprise agents more broadly, which is great. And also, it's like the ground has been shifting under our feet because the models themselves are kind of changing and evolving so rapidly. To your point about fine-tuning versus prompting versus kind of what to do next, maybe we can talk through a few

Starting point is 00:08:55 of the use cases you have where it's very clear that a smarter model by itself doesn't just do it and why you need to buy a platform. I can bring up the Kuehne+Nagel use case. We recently announced our partnership with the marquee freight forwarder, great partners. I was having a personal lunch with their head of air. Shout out to our friend Inve at Kuehne+Nagel, at his house. And what I learned from their operations is that this is not a simple customer service type of thing where you create a ticket in Zendesk and you're done, or you reply based on a knowledge base. Customer support for these real economy industries like logistics, transportation, freight forwarders, broader supply chain, even other industries like the telco space or the utility space,

Starting point is 00:09:43 it's not as easy as just replying based off of a knowledge base again. There's a lot that happens afterwards that really has to get done to provide that update to the customer. So for example, freight forwarding, Kuehne+Nagel, they are serving customers, very large customers, I cannot name who,

Starting point is 00:10:03 but imagine that you are a big customer of Kuehne+Nagel and you ask, hey, where is my shipment? What happens now is an agent has to turn around and go find it. That "go find it" is very complex. You need that coordination. You need basically an orchestration agent that is, okay, this is an air shipment.

Starting point is 00:10:21 So obviously relates to airlines. Who is the airline on this shipment? Okay, let me go to the airline's website. So we have browsing agents that go and scrape the website of the airline. Oh, bummer, it's not there. There's no update. Damn, I need to go send an email. Okay, I'm sending an email to the airline.

Starting point is 00:10:39 Two hours later, no reply. Okay, I need to reason that if they don't reply now, I'm going to miss my SLA with my customer. So now I need to call them. I'm going to keep calling them until someone picks up at the airline and tells me where the hell my shipment is. So that is the sort of coordination that we need to make happen for transportation and logistics, really.
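
Editor's sketch of the escalation ladder described above (website, then email, then phone), assuming a simplified model in which each channel either answers or costs a known wait before the next one is tried. The `Channel` type, the `track_shipment` function, and the rule of skipping channels that no longer fit in the SLA budget are illustrative assumptions, not the speakers' system.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Channel:
    name: str
    fetch: Callable[[str], Optional[str]]  # returns a status, or None if no answer
    wait_hours: float  # time spent waiting on this channel before escalating


def track_shipment(
    shipment_id: str, channels: List[Channel], sla_hours: float
) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Try channels in order, skipping any whose wait would blow the SLA."""
    elapsed = 0.0
    attempts: List[Tuple[str, str]] = []
    for channel in channels:
        remaining = sla_hours - elapsed
        if channel.wait_hours > remaining:
            # Not enough time left to wait on this channel: escalate past it.
            attempts.append((channel.name, "skipped"))
            continue
        status = channel.fetch(shipment_id)
        if status is not None:
            attempts.append((channel.name, "answered"))
            return status, attempts
        attempts.append((channel.name, "no answer"))
        elapsed += channel.wait_hours
    return None, attempts
```

With a 4-hour SLA, a silent website, a 2-hour email wait, and a phone call that gets through, the sketch walks the full ladder; with a 1-hour SLA it skips the email and goes straight to the phone.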

Starting point is 00:10:59 And that has shaped the type of product that we had to build at an IV. Yeah. No, I subscribe to everything. I guess another example on the negotiation, which is always like how we started and all these demos went viral. And I guess one point of how raw intelligence really wasn't enough is when you're negotiating, for example, loads and there's like 10 carriers or, like, 10 buyers calling in at the same time, you cannot have all those agents doing work independently, which is what happens to a certain degree with humans. They're on a floor and sure, they shout to each other and

Starting point is 00:11:36 they're like, hey, this is a very hot load. Please negotiate hard. I have someone interested, you know. All this information is really not in the model. So what we started doing is when you have inbound calls for the same load, you can start like sharing context across them. Like, hey, I have someone. They're very interested. Please push harder.

Starting point is 00:11:53 Like, this is a hot load. So all this information sharing is literally what you put in the context window at any point in time. Like general intelligence or the raw intelligence doesn't really know if someone else is calling on that load. So it's all that about like, what do you know of the business? What do you know about the negotiation strategies? Maybe you know that pushing harder on this load because it's like cross-border is going to be better or whatnot.
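
Editor's sketch of sharing context across concurrent inbound calls on the same load, as described above. It assumes a hypothetical in-memory `LoadInterestBoard`; a real deployment would need shared storage across agent processes. The wording of the injected note is also an assumption.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class LoadInterestBoard:
    """Tracks which calls are live on each load so agents can share context."""

    def __init__(self) -> None:
        self._active: DefaultDict[str, Set[str]] = defaultdict(set)

    def call_started(self, load_id: str, call_id: str) -> None:
        self._active[load_id].add(call_id)

    def call_ended(self, load_id: str, call_id: str) -> None:
        self._active[load_id].discard(call_id)

    def context_note(self, load_id: str, call_id: str) -> str:
        """Text to append to one call's context window; empty if nobody else is calling."""
        others = len(self._active[load_id] - {call_id})
        if others == 0:
            return ""
        return (
            f"{others} other caller(s) are currently interested in this load. "
            "Treat it as a hot load and push harder on the rate."
        )
```

The note is recomputed each turn, so the "hot load" signal disappears as soon as the competing call ends, much like the shout across the brokerage floor it stands in for.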

Starting point is 00:12:14 Like that's not general intelligence. That's very specific. And different enterprises operate differently. Like you cannot just build an agent, fine-tune it and have it work at any type of company. All those nuances are outside of the model. And it's that context layer that we're trying to create. No, and we can talk about how actually doing the work and executing the work is what gets you that. It's like learning by experience.

Starting point is 00:12:35 You do something and you learn and you explore that space of the context layer so that you can keep learning, no? Really interesting. So you talked about two different things there. Pablo, you just talked about a very cross-functional workflow. Luis, you talked about the complexity of really mastering sales, you know. You guys started as sales and support. And so what are some of the other surprises that you've had,

Starting point is 00:12:58 having started with more complexity, I think, than some of your competitors? So one thing that we heard from one of the largest tracking companies recently was, typically when we buy technology, we see where we can apply that. With you guys, we actually have a problem and we come to you guys with that problem because we know that as a platform that you've built, we can pretty much build any type of agent for our operations,

Starting point is 00:13:22 from sales through customer service, back office support and operations and even collections. So some of the use cases that customers came to us with were, hey, we have a huge collections problem. Can you build an agent to reach out to customers via email or voice and collect money? We're like, of course. We talked about these use cases

Starting point is 00:13:43 with one of our largest supply chain companies and customers where we need to call customers to recover duties on parcels. And today we're running campaigns of 20 to 50,000 daily outreach to customers, collecting duties on parcels that otherwise they would not get if they don't pay the duty on the parcel. So that that's,

Starting point is 00:14:06 that sort of surprises, if you will, we've gotten from customers. No, like, yeah, I also need to recruit drivers. Can you do that? We obviously can build an agent that not only just recruits drivers, actually connects to the operation so that now they know they can service a truck with a customer earlier because now they have a driver to move that. So there's all sorts of interesting connections between the functions. Maybe I'll give you another example.

Starting point is 00:14:34 We built an agent to reach out to maintenance shops to see when a truck was ready. You could just leave that agent in a silo and just have an agent that is proactively reaching out to those repair shops to see when the truck is ready. Well, it turns out that the sooner you know when the truck is ready, the sooner you can put it in the market to sell it as capacity for your customers to actually move things. So that was a very interesting realization of how sales, in this case, and maintenance were tightly connected. So that is the context that we talk about. There has to be an underlying context sharing across the different functions in a business, so that the whole business optimizes for a global maximum, if you will, or a global minimum, depending on what optimization problem you're trying to run, versus just minimizing the problem in one function, if that makes sense.

Starting point is 00:15:28 And then maybe can you talk about, like, how do you discover these workflows? Who discovers them? Who builds them? How do they get built? I mean, maybe Luis, talk a bit about that. Yeah. So we're very forward-deployed. So we understood early on that really to solve the customer's pain point,

Starting point is 00:15:45 we had to build software that adapts to their operations and not the other way around, which is like the old era before AI was you build something and ask people to like run their business, however you think they should be run. but we think it should be the other way around. So from the very beginning, we started like hiring and building this forward deployed motion with FDEs, forward deployed engineers, like everyone is talking about them now.

Starting point is 00:16:09 But I think it's about like really being customer obsessed and really focusing on like the value add and their problem and really sitting down with them and going to their offices and learning what they need. So sure, there's a lot of like synergies in the industry and what you learn from a customer might be relevant to another, but very soon we realized that there's not a one-size-fits-all. Even within inbound carrier sales, even within this workflow, in the enterprise,

Starting point is 00:16:36 maybe there's like a long tail where this might apply, but enterprises operate very, very differently. And that's why we build a platform that is flexible enough to adapt to anyone's operation. And it's because we were trying to plug and play what we built somehow with a customer to another one. It didn't really work. Like they want something different.

Starting point is 00:16:54 They want to change the procedure. They want to call these tools. They want to escalate whenever the carrier is not vetted and someone else wants to do it automated. So we really had to build almost like horizontal technology because of the variety of all the nuances in this industry, no? And that's how we create a platform that is not optimized for like specific tasks, but more so optimized for like doing work.

Starting point is 00:17:15 So our primitives are around workflows and data and integrations and, you know, SOPs, prompts. You don't see like particular tasks being modeled because that, that's almost too opinionated. And customers don't want, like, opinion, like their vendors forming opinions how to run their operations. Like, they've been running it for a long time.

Starting point is 00:17:35 They know more about their business than I do. I just come with the technology, and I just want to, like, solve their problems, no? Yeah, absolutely. I feel like the forward deployed motion has been crucial for AI application layer businesses. And also it's prompted a lot of questions about, what are margins, what is, like, a service versus a product,

Starting point is 00:17:54 but kind of where is the long-term alpha and moat? Maybe walk us through how you think about productizing the work that your FDEs do, which I think is kind of a unique strength of Happy Robot. Where does the forward-deployed motion start and end? Do you do custom work for one customer? I would love to understand how that works. Maybe let's start in the beginning. Yeah.

Starting point is 00:18:17 I was the first forward-deployed engineer without knowing it, I guess. Yeah. Which is pretty much what any founder would do, you know? Like, you just go to your customers, spend a week there and just chase down the people that are actually doing the thing that you want to help them automate, right? So I did that and I would be like pinging these guys like, dude, you need to build this thing because it's going to make my life a lot easier and it's actually going to be replicable across customers because I've seen it. So please build it. And he would be like,

Starting point is 00:18:42 really? Do I need to build that? So there was that good tension between kind of that forward deployed motion and the product team. So we kept going with these like separate worlds for a little bit where I would be leading the FDE team and the deployment strategists that we realized at some point we actually needed. That was a bit of a realization, a bit of a parenthesis here. We started just with forward deployed engineers,

Starting point is 00:19:05 and then the customers were like, wait, you have these people building, but who is managing? I'm like, I guess that's like the deployment strategist to some degree. So the deployment strategist is a figure that scopes the problem so that the forward deployed engineer can spend more time on building,

Starting point is 00:19:21 although now what we see is that the right FDE or deployment strategist, they have to be very cross-functional. Close parenthesis on the type of profile. So what ended up happening is we were too disconnected between, like, the forward deployed world and the product team.

Starting point is 00:19:37 So we realized that that needed to be part of Luis's world so that the FDE team would actually be an extension of product, which is what they should always be. It's an extension of product so that we can implement product faster. We can gather the feedback faster

Starting point is 00:19:53 from the customers, and hence capture that context faster than anyone else. So it's a bit of this iterative loop that, well, Luis really realized we needed. Yeah, I'll add to that. I mean, if we go with like first principles, what are we doing? We're deploying agents across different functions and channels in the enterprise. So our product is built for the deployment of an agent. Like we really understand the deployment lifecycle because we work very closely with our customers and we are actually deploying these agents,

Starting point is 00:20:26 and something cool that happens is that you have, to a certain degree, your user in your house, because we're building for the FDE for the most part. And it's not entirely true. Of course, some enterprises, they really appreciate having a platform and we can talk about that later, about how interesting the mix of coming with a platform they can also use and an FDE that they can trust.

Starting point is 00:20:46 It's actually something very rare, and they mentioned that. But I guess to the point of like the deployment lifecycle, we really understand what, what it takes to deploy an agent, no? There's a scoping phase, there's a building, there's testing, there's monitoring, there's like a self-learning loop. So I guess the point is

Starting point is 00:21:01 every feature we're building in the product is optimized for that deployment lifecycle, and the only way to know if that works is being very close to the deployment, no? And FDEs are doing these deployments, or they're getting feedback from the other team. So actually, more than the FDEs being very close to the product,

Starting point is 00:21:19 I think it's more so like our product is a combination of a platform and a forward deployed motion, and it would really not exist otherwise. And there's like this conversation about like services and stuff. The difference is that the forward deployed engineers are like catalysts or accelerators to value,

Starting point is 00:21:33 but what we're leaving with the customer are like agents running. There's a platform. Once the FDEs have done the work, they leave a working thing. So it's almost like you spend that time, you deploy a thing, but you're not delivering like an output

Starting point is 00:21:47 that the FDE has done. You're literally delivering the agent working on a platform. So it's a very different distinction of pure services versus like a forward deployed implementation plus the platform running the value forever, hopefully. Awesome. I feel like another thing that has been a topic of discussion is kind of what is the value of

Starting point is 00:22:09 systems of record in the AI era? Does every application company need to become one of those? And I know you at Happy Robot have a view on kind of systems of record versus maybe systems of action or systems of execution. So we'd love to talk through kind of your view on that topic. Maybe I'll start quickly. We see ourselves as that layer of execution,

Starting point is 00:22:31 really. That's where the magic happens. You have to start doing the work to capture that context. So it's very important that we start with executing work, with getting the thing done, implementing one agent,

Starting point is 00:22:43 connecting them through that context layer. But the context layer happens after you're actually doing the work. So more than ever, the importance is on the execution layer. So for us, and Luis can comment on that more, that data piece is a very important piece,

Starting point is 00:22:59 but it happens after the agents. So what we've built is Twin. Twin for us is really that data layer where we connect the systems of record of the customer, your CRM, your ERP, your transportation management system, whatever it is, your Snowflake instance, and where agents can also populate or store their own context.

Starting point is 00:23:22 It's almost Happy Robot-native data points. So we've basically created this data layer that holds both customer records and Happy Robot agent-created records, if you will. Yeah, I think there's an interesting tension in how much time you need to spend ahead of deploying the agents on cleaning the data versus just deploying the agents

Starting point is 00:23:45 and cleaning the data through doing the execution. And I think it's a mix. I think what we realize is these agents are creating a lot of information that really hasn't been captured before. And it really doesn't fit in any of these systems. Because it's more like high dimensional, semantic, almost like memory intelligence. So I guess the point is many enterprises, I guess, are waiting to clean their data sources so that they can power this workforce of agents. And I think by doing the work and by actually having agents,

Starting point is 00:24:17 execute the work, you're going to clean the data as you go. Because humans are great, of course, but they have a lot of limitations. Like they can't be in two places at the same time. They drop a lot of threads. They're not very diligent in putting the data in the right system. Like sometimes you forget, sometimes you write it down. So actually you can clean all your data sources and then you can still run with humans and it's actually going to probably get dirty very, very soon. The good thing about AI is it's very diligent about where it puts data. So through the process of executing work, you're going to progressively start cleaning all your data sources because you're going to get visibility into all these things, no? So not only are you connecting

Starting point is 00:24:54 the data, like the systems of record, like rows and columns and different entities, it's more so creating relationships across them. So again, the shipment in the TMS is just a record. Like, that's really not the IP. That record might exist in many different enterprises. Like, it's latitude, rate, whatever it is. Like, that really doesn't mean anything. What means something is what the enterprise is going to do with that. Like once it gets into the system, how their processes are built, how their humans are going to deal with that. So all that is really not in the system.

Starting point is 00:25:28 It's more so, like, in people's brains. A lot of this context is like tribal knowledge the operators hold. And to a certain degree, it's super fragmented, no? So actually, by doing the work, we're going to learn a lot about this more conversational record or intelligence, but also we're going to start, like, cleaning the end systems of record just by doing the work very consistently, no? You know, Luis, that's such an interesting topic because it's sort of like my intuition,

Starting point is 00:25:53 my naive intuition is that information about execution is maybe ephemeral or the value that decays over time. But I think what we're describing is how the value actually compounds over time and maybe that actually enriches the information in the system of record. Which one is true and why? Yeah, I mean, I think you're, so what you're doing by doing the execution, as I said, is one, creating a better understanding about the relationships

Starting point is 00:26:16 of all these different entities. So you're starting to connect the TMS, the CRM, the ERP, the Snowflake, the Notion page you have, the docs, everything is so disconnected. You're going to start connecting it, but you're also going to start enriching the relationship to how to deal with those particular records, no? So I guess the compounding comes from like two angles. One is having clear or cleaner data sources, like literally the data points, which is going to make everyone's life easier, but also understanding how to relate those different entities across the business, no? So I think it compounds from multiple angles. And then how much are you, initially I imagine you're capturing the way work is done on day zero,

Starting point is 00:26:54 but over time you're changing the way that work is done. What is that interaction like? Yeah, and I think if you think about it from a context perspective, the FDEs are really just seeding this state graph. Like if you try to model the business as a world model or a model of the business, you need to seed it somehow. You can't just put the agent to work from zero. But then there's a point where there's a flywheel

Starting point is 00:27:19 where like the second and third and fifth deployment takes less time. But I think the FDEs are the ones like going to the business and starting to seed all this context layer and actually leaving it there for like learning and the second and the third one. So there's always this cold start problem. And we talk about like fine-tuning SLMs in the future

Starting point is 00:27:38 like reinforcement learning and all that stuff. I think that really doesn't make sense if you don't have the basics and you don't have the first and second agent in production. And that's why these are so important to, like, actually start this flywheel. They would go there, interview

Starting point is 00:27:52 the operators, get all the specs and actually put those first agents to work. And from there, the system is going to start learning and getting all these contexts and sharing it across functions and across channels, no? I feel like if I was trying to train someone to do my job, the context that they need

Starting point is 00:28:08 does not live in Salesforce or any traditional software system. It probably would live in meeting transcripts and emails and casual conversations and even things that software can't capture, hasn't captured. I know you guys have this concept of the pyramid of complexity and how starting with some of these primitives allows you to get into more and more complex work over time. Maybe we could walk through some examples of the type of work that Happy Robot agents can do. So the pyramid of work, as we define it, is essentially the easy, repeatable,

Starting point is 00:28:44 low-hanging-fruit type of work at the bottom. Think about an easy B2B sales call, an easy customer service type of operation, some payment collection type of work, kind of the highly repeatable, easily automatable type of work. One thing that we've already talked about here is how those actually interconnect,

Starting point is 00:29:07 which is very important. Like you might have these disconnected or siloed functions today in a company, but it's very important to keep in mind that those are actually very connected. And going back to the pyramid of work, what you have at the top is the deep, complex work that is highly strategic, that is almost the information that the CEO of that company needs to make decisions. So when we think about the work that we're doing with our customers, we might start somewhere at the bottom of the pyramid, but very,

Starting point is 00:29:44 fast we're going up the pyramid by combining those agents from sales and customer service and collections, combining the context as Luis was saying so that you build on top and you build on top of every layer so that every decision you make is based off of more context across the board. When you're talking to that customer that has a complaint, you might want to remember that you already upsold them last month. And sometimes human agents might not even remember that. When you're talking to a driver that had an issue at his delivery two weeks ago, you might want to remember that from the operations team, because maybe now you're more lenient with the rate that you are giving them.

Starting point is 00:30:23 Those things are highly interconnected and you need to build on top of them so that you grow into the strategic type of decisions. Yeah. And I would add that, in my opinion, the real economic leverage and value for the enterprises really lives at the top of the pyramid. Like those are the decisions that are lower volume. Like if you think about it at the base, you have much more volume. At the top, you have fewer decisions that are actually going to drive the outcomes of the enterprises.

Starting point is 00:30:50 And we keep talking and hearing about like outcome-based pricing or consumption-based pricing and whatnot. I think really if you reach the top of the pyramid is where you really make decisions that drive the revenue of the company. But you cannot start at the top. Like those decisions are highly contextualized. The same way you can probably not be the CEO of a company if you don't understand anything happening below. So actually the only way to get to the top and make those decisions is by actually capturing all the context underneath. And that's where everyone is getting stuck. Like, everyone is focusing at that base. It makes sense. It's already to a certain point being

Starting point is 00:31:24 commoditized. Like, those are simpler tasks and people keep talking about, like, you know, like, AGI and general models being able to, like, automate that work, maybe. But the point is, if you get stuck at a corner of that base, you're never going to climb that pyramid of complexity, because in order to climb, you need to actually capture context across channels and across functions. We've mentioned this a bunch of times now. When I was explaining the example of a negotiation, I was talking about phone calls, but what if you get an email from another carrier actually putting an option? Like all of a sudden, what if the voice agents don't know that there's an email coming through for the same load?

Starting point is 00:32:00 Like it's the same information. It doesn't matter the channel, no? And also what you learn from that carrier is the same, like the same customer you have or the same carrier you have when you're tracking a load or doing all these things. So if you focus on automating this part of the base, that one corner for everyone, you're probably not going to be able to climb this pyramid of complexity. So it's about creating a unified understanding of the business

Starting point is 00:32:21 in order to start climbing that pyramid of complexity and going to like the deeper complex decisions that actually drive economic value for the enterprises. Really interesting. Maybe you guys talk about how that opportunity has set you up to be pulled into other markets. Now we're starting to see pull in financial services, utilities, telecommunications. So why is the work that we've done in supply chain applicable to these other markets?

Starting point is 00:32:46 With DHL, we've deployed over 40 agents across 80 countries, agents that are sharing context across regions and functions. What I realized, what the team realized when working with DHL and many others like Kuehne+Nagel or CMA CGM, second largest ocean carrier in the world, was, wow, this is not a supply chain specific problem that we are solving. It's actually an enterprise coordination problem. When we think about ourselves as a startup

Starting point is 00:33:18 of like 120 people, you know, we might have like some like miscommunications here and there, but really we don't have a coordination problem in the company. You can easily reach out to the people involved and you just ask questions. That doesn't happen in a company as big as DHL or FedEx or Deutsche Telekom or T-Mobile or Telefónica, these massive enterprises that have hundreds of thousands of people

Starting point is 00:33:43 just coordinating work. We recently started working with one of the largest utility companies in LatAm and Europe. They have over 10 million customers, tens of thousands of employees across the world. How on earth are they going to know, in real time, how to best serve their customers when they themselves don't even have the tools to interconnect quickly and to share context across them quickly?

Starting point is 00:34:09 So what we realized is we were not really solving for a supply chain problem. We were solving for the coordination problem of the enterprise. Think about a utility, receiving a customer call with someone complaining about a leaky boiler. First of all, you should already know that that customer already had the problem 10 days ago. That's for sure.

Starting point is 00:34:27 Second of all, you should also know that the technician you sent was not the right technician. So now in this second attempt to fix that boiler, you need to send the right technician and the technician that is best suited for that particular boiler type. So that is now on the operation side potentially, or you could frame that as an operation type of problem, no, versus when I started with the customer calling in,

Starting point is 00:34:50 that's more of a customer service type of problem, right? Again, to the point of how these functions are interconnected. But what happens after that technician is dispatched to the customer's house? Well, now you have an additional layer of coordination between a customer and the technician and the company that is lending the trucks to send that technician. That is that coordination problem that we saw in these industries in the real economy. Operationally complex businesses like utilities, oil and gas, telcos. So we're now seeing this pull from the market. We're already working with, in POCs, three of the largest telcos in the world.

Starting point is 00:35:26 We're being pulled into home and auto insurance because the sort of coordination problem of dispatching a tow truck to help you when your car breaks down is very similar to when a trucking company has a broken truck. Those sorts of problems are repeatable across the real economy, if you will, when there's this coordination problem across customers, partners, and your own employees. I think this market, too, has, you know, broad-based voice-first customer support agents. There's the models themselves in voice trying to move into business, being agentic. And then there's more verticalized solutions that can move more horizontally.

Starting point is 00:36:07 How do you think about, like, what is a Happy Robot-shaped problem and where does that expand into over time versus what are problems that are maybe less interesting for you to tackle longer term? Yeah, I would say highly communicational. Like, and actually more than communication, like an interface to the external world, meaning also like browsing a website to like retrieve the ETA of a shipment is some sort of interaction with the outside world. Voice to a certain point is a soft API as we were talking about. Same as an email is a soft API or a website is a soft API.

Starting point is 00:36:44 Like when you're exchanging information between systems, of course an API programmatically makes more sense, but sometimes that's not the case. So however we can help move the flow of information between systems via voice, email, browsing a website or whatever it takes. And also when there's this high complexity, when the decisions are like contextualized and the SOPs are not super clear, no? I think that's the bigger point where sometimes

Starting point is 00:37:16 the enterprise doesn't really know themselves. Like people don't know what they know. You can ask them what they're doing and it's like, well, I'm doing this, but they really don't know the specificity of what they're doing. So it's actually through doing this execution of work that we're learning a lot about how these companies operate. So when the SOPs are not clear and it's like super communication driven,

Starting point is 00:37:36 I think that's where we shine. Really cool. Luis, I want to actually pick your brain a little bit about the voice models themselves. Many of the other companies that we may overlap with rely on ElevenLabs, which is a fabulous technology; we're, of course, investors in Eleven. You guys have done a bunch of your own model work. Why, what are the kind of tradeoffs of, you know, a vertical model versus a horizontal model?

Starting point is 00:37:58 Maybe take us through a bit of that. Yeah, I mean, Eleven is great. We actually used them for a long time and they're great, of course. I guess to the point before, we're always focusing on the limiting factor and seeing what do we need to do to solve the current problems of the market. I guess we started very soon realizing how there was a problem in turn-taking detection. Like end of turn is probably the biggest problem in voice AI. And we realized that very early on because everyone was focusing on making the latency

Starting point is 00:38:32 And that's fine. But I don't really think that's the bottleneck right now to deployment of these agents, not even the intelligence. Like model capability is high enough. Like we're using models in certain use cases that were released like two years ago. Right. Like sure, like everyone is like pushing the frontier

Starting point is 00:38:48 and increasing context windows and making more reasoning budget. And PhDs doing customer support now. Exactly. Like everyone is waiting for someone to release like a 10 trillion token context window to like do whatever. Like, we were using models from one year and a half ago to call drivers and ask if they're going to make it on time. You don't need PhD-level intelligence for that. I guess the point is, as we make models faster, we realize how important the conversation handling and the flow of the conversation is.

Starting point is 00:39:14 If you think about it, the faster the models get, the more you're going to interrupt. And the harder it's going to be to have a normal conversation. And actually, if you think about it, the bigger problem in the coming years for, like, voice AI is really knowing when to talk and when not to talk. And sometimes you need to speak fast. Sometimes you need to wait because a person is not done talking. Sometimes you might need to stop and think. And that's something that the models are not today very good at, like really stopping and knowing when a question is hard and when they need to like probably trigger a reasoning thread that is more async and just think about it and say something like, um, and really be thinking, not something you put in the

Starting point is 00:39:51 prompt because it's cool, but just literally have them think, no? So it's all about understanding the conversation, when is it my time to talk and what should I say, no? So we invest a lot in this end of turn interruption handling, filler detections, background noises, like if my mom is speaking at the back of the car, the bot doesn't need to know or interrupt, no? So it's understanding all these nuances in the work more than making the latency faster, which of course can be improved, or making the voices more realistic, which again, I don't think that's the limiting factor today. Yeah, it's interesting. It feels like we're at the point where the models are so good that as they get better, especially with voice, it actually takes us

Starting point is 00:40:35 further away from humanness in some cases. Like the latency is too fast, or the interruption handling is too sensitive. Like if someone says a filler word, you don't necessarily want the model to react. You want it to keep talking. I feel like Happy Robot has always been at the forefront of kind of humanness, how do you think about how that shapes product development? How do you think about what that looks like five years from now? Do you want the customers, the end customer, to know they're talking to an AI? Do you want it to feel like a perfectly human experience? Where does that go?

Starting point is 00:41:15 I think it's super important that the experience remains as human as possible, even if you say that it's an AI. We're now live with hundreds of thousands of end customers or end users talking to our agents, not only via email or chatbot or website, whatever it is, but mostly through voice. Like voice is one of our primary channels. And one thing we saw is even if you say it's an AI, even if you disclose at the beginning, hey, Mr. Driver, I'm an AI agent. I'm calling you because I need to know where you are.

Starting point is 00:41:53 At the beginning, they might be like, what did you just say? But then very, very soon they forget. And they forget in a good way because they are now just having a normal conversation with a system that is smart enough to not make their life or their day even harder than it was already before. So I think the conversationalness, the human-like capabilities are very important to make technology work. So for us, the product is shaped around that experience. Some people were telling us at the beginning, like, no, you don't need these agents to sound superhuman.

Starting point is 00:42:32 Why are you investing so much on the text-to-speech? Why do you care if the agent just mispronounces a load number, a shipment number? Like, what do you mean? That's the whole point. You want the experience to be as good as possible. So it's very important that we continue building towards a really human-like experience. Again, voice is obviously a primary channel for us, but even across the board, like, everything should feel human. Everything should feel just a very natural exchange of information as we were discussing before.

Starting point is 00:43:03 We're just trying to build an AI workforce that is almost colleagues to the employees in these companies so that they almost collaborate together. That is very important to the DNA that we're building in Happy Robot. It almost goes with the name, if you will. Like there's that human-like sense in the product we build for our customers. Yeah. Well, you know, Pablo, to build on that, it also strikes me that you make the employees, the human employees of many of your customers also more human. Insofar as, you know, I think it was Keeley was telling me a story about DHL and Home Depot

Starting point is 00:43:40 and the folks that had previously spent all week on a phone trying to just schedule deliveries with Home Depot were now taking folks out for dinner and building deeper relationships. Maybe talk a bit about what is the future of sort of humans and agents working together in these enterprises. It's a bright future. It's a very cool future because a lot of the work that we're helping our customers automate is work that no one really wants to do. Think about collecting payments from customers. Would you really want to be calling your customer to be like, hey, like, you know, like this invoice is past due, man, like, are you going to pay?

Starting point is 00:44:16 Who wants to be doing that, right? Who wants to be calling a list of dormant accounts to see who would want to ship with us or who would want to be picking up a call from an angry customer whose delivery was late or whose technician broke the boiler or whose technician didn't fix the router? Those are the sorts of problems that agents can help your human teams alleviate so that, again, your humans can actually take that steak dinner with your customer and work on building up the relationship, not on fixing the operational problems. That's the problem space we're looking at.

Starting point is 00:44:56 The operational complexity that these businesses have that no one really wants to do, but that has to get done. Thanks so much for joining us today, guys. We know you are very busy serving a lot of very happy customers, and there's so many more exciting things to come for Happy Robot. Thank you so much. Thank you for supporting us all the way. Thanks for listening to this episode of the A16Z podcast. If you like this episode, be sure to like, comment, subscribe, leave us a rating or review,

Starting point is 00:45:24 and share it with your friends and family. For more episodes, go to YouTube, Apple Podcasts, and Spotify. Follow us on X at A16Z and subscribe to our substack at A16Z.com. Thanks again for listening, and I'll see you in the next episode. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see A16Z.com forward slash disclosures.
