Latent Space: The AI Engineer Podcast - Brex’s AI Hail Mary — With CTO James Reggio

Starting point is 00:00:00 We have like three pillars for AI strategy. We have our corporate AI strategy, which is how are we going to adopt and like buy AI tooling across the business and basically every single function to be able to 10x our workflows. And we have our operational AI strategy, which is how are we going to buy and build solutions that enable us to lower our cost of operations as a financial institution? And then the final pillar is the product AI pillar, which is like, are we going to introduce new features that enable Brex to be a part of the corporate AI pillar of our customers. It's like we want to build features and be a solution that somebody else is saying to their board, hey, we adopted Brex, and this is part of our corporate AI strategy. Hey, everyone, welcome to the Latent Space podcast.

Starting point is 00:00:53 This is Alessio, and I'm joined by swyx, editor of Latent Space. Hey, hey, hey, and we're here with James Reggio, CTO at Brex. Welcome. Hey, thank you for having me. Thanks for visiting from up in Seattle, where I've been a little bit. It's cold up there. Yeah, and we have an atmospheric river hitting the city right now, so a lot of blowing. Yeah, well, yeah, we're getting the full-on winter effect right now.

Starting point is 00:01:15 Well, you're here to talk about the sort of AI transformation within Brex. There are a lot of interesting tidbits that we were going to draw from your article, but also your background. You've got a wide array of experience from Stripe to Banter to Convoy. And I think also mostly I'm interested in your journey as one of the rare people that have transitioned from like a mobile engineering leader to a CTO, which I think is also a bit more rare. I used to have this comment in the past that there's a career ceiling for people who work on client-only things, where usually they don't hit CTO, whereas they typically promote the backend people, the backend cloud infra people, to CTO. Yeah, you know, it's something that I hear fairly frequently because there aren't that many

Starting point is 00:01:59 folks with a front-end background who reach this level of leadership. And it's exciting for me to be able to represent that group. But I'll say that even though my resume kind of reflects that I've been more on the front end of things, it's probably more my experience as a founder a couple times over that actually helped me get to this level of my career working for somebody else. Becoming the CTO is very much a leadership and, like, general business role as much as it is a technical role. And so I think it was more the skills that I built from starting companies and trying to build those up that made me a decent fit and enabled me to get the nod from Pedro to take this on as my predecessor left about two years ago. Yeah.

Starting point is 00:02:36 One thing, I'm curious to get you guys' commentary, this is a little bit broad, unscheduled, but a lot of startups are bragging about how many ex-founders they have. And yes, to some extent, you want people with the founder mentality and agency, which is what you did, to be your employees and to take initiative in the company. But also, I wonder if it's becoming an anti-signal sometimes. I don't know if you've thought about this. I think it's more about the churn for me, especially when people are hiring ex-founders. It's like, if you're truly of the founder gene,

Starting point is 00:03:07 it's kind of hard to just stay somewhere as, like, an IC for too long. And then it's like, all right, I joined this thing and then in one year I'm back to being a founder. I'm curious for you. I'm sure you thought about leaving and, like, doing another company, say.

Starting point is 00:03:19 In fact, that was the alternative I was considering. Even at the time that I got the phone call where they made me the offer to become CTO, I was thinking about leaving to go start a company. And, you know, I think what's interesting about it is we actually launched sort of like a new recruiting and employee value proposition for Brex a couple months ago called Quitters Welcome, where we actually intentionally are leaning into this idea that we have a disproportionate

Starting point is 00:03:44 number of folks who go on to become founders or heads of a department when they leave our company, and we celebrate that. It's actually something that I'm very proud of. And it means that we welcome in people who want to get a different experience. I think that there's certainly a lot of founders who don't make it, don't scale their own businesses to the scale that we've achieved at Brex, so there's something to be learned when they come in. And then we're very happy to, like, support people on their way out. And so I actually really like hiring former founders or future founders.

Starting point is 00:04:15 The one value proposition I find that's most relevant, because a lot of the folks we're hiring as AI engineers are kind of folks that are either, like, winding down their companies or considering maybe running an AI startup, the thing that resonates the most with them is that we oftentimes can give them problems to solve that are interesting, problems that maybe they even want to build their own startup around, but with instant distribution, right? Like, that is the allure. It's like, you can come into this business and build, like, financial AI applications

Starting point is 00:04:43 and instantly have them deployed to roughly 40,000 customers across, you know, the Fortune 100, down to, you know, tens of thousands of startups. So that's what is, I think, appealing to founders, but the challenge then is making sure that we set them up for success in an environment that still feels a little bit like the startup that they might build themselves versus like something that's too corporate. Yeah, instead of doing your own company and then coming to you and being like,

Starting point is 00:05:06 can I integrate into Brex? Yeah, get all the data. Yeah, exactly. How's the engineering team structure? Yeah, so we have about 300 people in engineering, like 350 total across EPD. And for the most part, we structure around our product domains.

Starting point is 00:05:24 And so this means that Brex is a corporate card. It's also a corporate bank account, expense management, travel, and accounting. And so we actually have sort of full-stack product domains that are roughly like 30, 40 people for each of those that have everything from like the low-level infrastructure up to the web and mobile experiences. That's generally like the structure of our engineering organization. And then we have naturally like an organization that focuses on infrastructure,

Starting point is 00:05:52 security, IT. And then there are two additional centers of excellence that we've kind of built that kind of violate that org design, where we've felt the need to put more focus or like operate slightly differently. And AI is one of those areas where we have another team of just roughly about 10 people who are focused primarily on LLM applications. And we wanted to create a bit of a separation there because the way that we were thinking about this, and this is actually something we did this summer, is we paused and asked ourselves, on our AI journey towards like infusing our product with AI and

Starting point is 00:06:28 generating customer value, we asked ourselves, like, what would a company that was founded today to disrupt Brex look like? And then we tried to basically use the answer to that question to form this team internally. So it's a little bit off to the side. Ideally, everybody kind of comes up to speed and contributes, you know, LLM features, but we have this sort of off on the side right now in a centralized manner. What's the difference in AI adoption for those teams? So like, are the people on the LLM team, like, much bigger Cursor users, Claude Code users, or like, do you see similar diffusion? It's actually fairly uniform across the entire engineering department. It's actually kind of funny, like, one of our largest Cursor users is actually an engineering manager. So, like,

Starting point is 00:07:13 and I think that this also just, like, speaks to our core value of operate at all levels, where we want all of our EMs and everybody in leadership to still basically do the job that they're managing, manage the work. So it actually is, I think, the journey of getting everybody into using agentic coding was not sort of exclusive to like the AI group.

Starting point is 00:07:33 Yeah. In fact, I think this podcast was actually set up because I cold-outreached to Pedro because he tweeted this. I assume this is the center of X. He says, I started a new company inside Brex to build the future of agentic finance.

Starting point is 00:07:46 No BS, just builders building 996 and pushing production-grade agents to 30,000 finance teams, now 40,000. And then he actually has like a little job description, which I think is really interesting. I'll skip that and go straight to: Brex accelerated growth 5x and cut burn 99% in the past 18 months. I assume that's a mix of internal AI automation and other stuff, but basically I wanted to put some headline numbers up front to impress people, yeah, before we dig into the details. Yeah, absolutely. And you're correct, that's the team that we have, this like AI team.

Starting point is 00:08:17 It's actually, what was that, a very young team? Yeah, it's very young. I mean, it's been really interesting. The composition of the team is very young, like AI-native 20-year-olds who basically grew up with the tech, kind of paired off with more like staff-level software engineers that have been around for a while who can kind of navigate the existing code bases and

Starting point is 00:08:35 like understand the product and the customer deeply. Like we've formed a couple of really tight-knit pods in the AI org where it's like three people. Generally somebody who has like more of a product, customer-focused background, that like staff engineer who knows where the skeletons are, and then like a much younger

Starting point is 00:08:51 or like AI-native engineer who can just do things with agents that like the rest of us dinosaurs maybe can't even dream of. I think part of it is like sometimes too much experience or too much knowledge of how to solve a problem can actually be an impediment to thinking differently about it and thinking about it from like an AI-first lens. But yes, we've been slowly growing that team, just in the same way that, like a pre-seed startup, you want to be very, very careful about talent density and like very deliberate, like only hire when you absolutely need it. And so, yeah, at this point, it's just about 10 people.

Starting point is 00:09:27 And I think it was probably four or five people. I think everybody was actually in the photo that was attached to that tweet when Pedro put that out a couple months ago. Yeah, we'll put it up. It's a photo at 1.20 a.m. on a Friday. Yes. Oh, yeah, yeah. Because we always do, we always do like Friday demos.

Starting point is 00:09:42 And like that's a time for everybody to get like kind of an exec review time. And so... Everyone in Seattle? Those folks were all in Seattle. But they're actually geographically distributed. We have a couple folks here, a couple in Sao Paulo, a couple in Seattle. At Decibel, we have this, like, AI Center of Excellence, which are basically the people running these teams across companies. Yep.

Starting point is 00:10:02 How do you make the other engineers not feel like they're not special? I think that's something that I hear a lot, is like, hey, you know, why are these people working on all the cool LLM things? And like, I'm stuck working on, you know, the KYC integration with whatever. You know what I mean? It's like, how do you build that culture? You know, it's interesting. I thought that that would be more of a problem, but the benefit of having really optimized our engineering culture around business impact actually causes it to cut in the other direction. Some folks don't want to work on the AI products because it doesn't have as much clear, direct, like, business impact right now. It doesn't impact revenue directly. And so I think, for the most part, we've enabled folks who have a strong desire to work on AI products to join that team.

Starting point is 00:10:47 Like, somebody transferred out of our expense management organization to come over there because they're really passionate about taking, like, their knowledge of, like, policy evaluation and bringing it into the AI team. But for the most part, I think everybody understands like how their work ladders up. And there's some like friendly rivalry because, like, the folks who, say, work on our card product, they drive 60% of our direct revenue. And so they're pretty happy with that. And they don't feel like they're being left out. And I will also say, as you probably saw in this piece that we put out with First Round, there are a lot of smaller applications of LLMs peppered throughout all of our product and operations teams.

Starting point is 00:11:27 It's just some of the more novel, like, agentic layer that sits on top of Brex that has been put together, like, in this sort of isolated team. So it's not like folks aren't getting to build with LLMs or use LLMs on a daily basis. Yeah, maybe run people through the Brex agent platform. We'll put the diagram in the video, where you have the LLM gateway, you have like the MCP layer,

Starting point is 00:11:47 we just had David, the creator of MCP, right before you. So this is very timely. Yeah. How did you start building that? What's the architecture? Yeah, the architecture, you know, I think simple is elegant. And we've had basically an LLM gateway and a basic hand-rolled platform from the very early days. In fact, right before being tapped to become CTO, I was leading like an AI labs team

Starting point is 00:12:10 internally. In the wake of like the announcement of ChatGPT, you know, everybody saw this new technology and asked, hey, what are we going to do with it? And so one of the first things that we did, I think January 2023, that would have been, was try to put together some internal infrastructure that made it possible for us to deploy, manage, version, and eval prompts, and then be able to manage, like, data egress and model routing and have some very basic, like, observability and cost monitoring in an LLM gateway. That's infrastructure that we stood up, and it still continues to power a lot of those smaller, more, let's say, like, precise applications of LLMs. So, like, for instance,

Starting point is 00:12:51 we set up a completely automated pipeline for evaluating customer applications to get them onboarded instantly to Brex, which is something that used to require human intervention, either for underwriting or KYC. But now we basically have a series of agents, and particularly like research agents, that will go and do the work that humans would normally do. And so that's running on top of this hand-rolled framework. And then for the agents on Brex that we announced in our fall release, which is like this agentic layer that we're building that sort of sits on top of Brex and can embody workflows that a finance team would normally hire humans for, we've actually started using Mastra for that as like the kind of primary framework for accelerating us.
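
The gateway responsibilities James lists (versioned prompts, model routing, a data egress check, basic cost monitoring) can be pictured with a small sketch. This is not Brex's code: the class name, the routing rule, the card-number check, and the prices are all assumptions made up for illustration, and the model call is a stub so the example runs without any provider SDK.

```typescript
// Hypothetical sketch, not Brex's code: every name and number is illustrative.
type ModelId = "small-model" | "large-model";

interface PromptVersion {
  version: number;
  template: string; // uses {{name}} placeholders
}

interface CallRecord {
  promptName: string;
  promptVersion: number;
  model: ModelId;
  estimatedCostUsd: number;
}

// Made-up prices per 1,000 characters, purely for the cost log.
const PRICE_PER_1K_CHARS: { [M in ModelId]: number } = {
  "small-model": 0.001,
  "large-model": 0.01,
};

class LlmGateway {
  private prompts: { [name: string]: PromptVersion[] } = {};
  readonly log: CallRecord[] = [];

  // The model call is injected so the sketch runs without any provider SDK.
  constructor(private callModel: (model: ModelId, prompt: string) => string) {}

  // Registering under an existing name creates the next version.
  registerPrompt(name: string, template: string): number {
    const versions = this.prompts[name] || [];
    const version = versions.length + 1;
    versions.push({ version: version, template: template });
    this.prompts[name] = versions;
    return version;
  }

  // Model routing: a toy rule that sends long prompts to the larger model.
  private route(prompt: string): ModelId {
    return prompt.length > 200 ? "large-model" : "small-model";
  }

  // Runs the latest version of a prompt unless a version is pinned.
  complete(name: string, vars: { [key: string]: string }, pinned?: number): string {
    const versions = this.prompts[name] || [];
    const wanted = pinned === undefined ? versions.length : pinned;
    const chosen = versions.filter((v) => v.version === wanted)[0];
    if (!chosen) throw new Error("unknown prompt or version: " + name);

    const prompt = chosen.template.replace(
      /\{\{(\w+)\}\}/g,
      (_match: string, key: string) => vars[key] || ""
    );

    // Data egress check: refuse to send anything that looks like a card number.
    if (/\d{16}/.test(prompt)) throw new Error("blocked: possible card number");

    const model = this.route(prompt);
    const output = this.callModel(model, prompt);
    this.log.push({
      promptName: name,
      promptVersion: chosen.version,
      model: model,
      estimatedCostUsd:
        ((prompt.length + output.length) / 1000) * PRICE_PER_1K_CHARS[model],
    });
    return output;
  }
}

// Usage with a stub model that just echoes which model was chosen.
const gateway = new LlmGateway((model, prompt) => "[" + model + "] " + prompt);
gateway.registerPrompt("merchant-summary", "Summarize the merchant {{merchant}}.");
console.log(gateway.complete("merchant-summary", { merchant: "Acme Travel" }));
```

Keeping routing, egress checks, and logging in one place is what lets many small LLM features share the same guardrails; a real gateway would presumably sit behind a network API rather than an in-process class.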

Starting point is 00:13:38 We actually have built everything in TypeScript, which is another like technology choice that answers the question of like what would we do if we started Brex today, but isn't the case for all of our existing back-end code, which is either Kotlin or Elixir. And then we have a mix of pgvector, Pinecone, and like, I think what we've seen is we're always re-evaluating the tech and framework choices as we go, because the half-life of code has declined so significantly with agentic coding. It's actually quite easy for us and for anyone else to kind of try on for size a variety of different pieces of tech to figure out what is going to be most economic for solving the problem.

Starting point is 00:14:17 The pick of Mastra, that's a new choice, an interesting one. Yeah, I mean, I think that the main reason that we adopted Mastra is that the ergonomics of Mastra are quite similar to the internal LLM framework that we built two and a half years ago. Whereas, like, LangChain was available at the time, two and a half, three years ago, it didn't quite feel right to us. It kind of addressed the things that weren't the pieces that we needed to address, which was like being able to have really simple observability and logging, tracing. LangChain didn't do it?

Starting point is 00:14:58 I mean, at that time, it didn't. I think it was really, I think it was. Or they fixed that. Yeah, no, they certainly did. But so we did, I'm trying to remember, because this is now ancient history, we evaluated LangChain, churned off of it, built our own thing. And then as we were looking, we kind of wanted to deprecate this internal framework that we built

Starting point is 00:15:17 because at the end of the day, it's not leveraged for us to maintain that. And Mastra ended up fitting the bill for the feature set that we were looking for. And I think what's been interesting is about half of the applications that we're building right now on the agent layer are running on Mastra. And then the other half are actually still running on like yet another internally developed framework, which is a framework that's focused more on networks of agents. So sort of multi-agent orchestration versus more like strict, like, you know, single turn or like workflows,

Starting point is 00:15:51 which are easier to build with like either LangGraph or Mastra. Tell us about your multi-agent framework. I mean, what are the design considerations? Why is this the first we're hearing about it? Yeah, yeah. So it's funny. A big reason why we haven't written more about this is that it continues to evolve quite a bit. I feel like we actually had a blog post that we were going to put out in conjunction with the fall release,

Starting point is 00:16:14 talking about how we built this. And by the time that we finished, you know, the blog post and had all the packaging ready, it was already like halfway outdated. And so the way that this started to emerge, this multi-agent network approach to implementation, was when we were trying to scale up our sort of consumer-grade Brex assistant. So if you think about like Brex and our customers, there's really like two very broad personas that we serve. We serve members of a finance team who are generally going to be in roles like accountant or controller or head of T&E.

Starting point is 00:16:46 For those folks, they are going to be interacting with agents that are much more specific to their roles. But then the other broad cohort of users we have are like employees of companies that have deployed Brex. So you go join a new company. That company uses Brex. You get your Brex card. And our goal for employees is for Brex to completely disappear.

Starting point is 00:17:07 Like the best UI, UX for Brex is just the card. Like every single thing that you have to do in the software beyond just swiping the card is like an opportunity for AI to eliminate some work for you. And so what we thought was the right approach to solving for that was to embody like an executive assistant for every employee. Because I, as an executive at Brex, I have an EA. And she knows enough about me. She has access to my calendar, my email, has all the context on when I'm traveling and for what business purposes. And so she's basically able to do everything that I would be obligated to do in Brex, be it like booking travel or like doing expense documentation. And so what we wanted to do is we wanted to build like that

Starting point is 00:17:47 EA connected to the same data sources and see if we couldn't simulate that behavior so that basically your interface to Brex is SMS and the card. And when we started building that out, you know, the most naive like architecture for that would be to have an agent with a variety of tools and maybe do some RAG to ensure that it has like appropriate context for the conversation. But what we were finding is that the wide range of different product lines that exist on Brex made it difficult for one, like, agent to perform well, being responsible for everything from like expense management to finding and booking travel to answering policy and procurement questions. And so that's when we started breaking down the problem

Starting point is 00:18:31 into a variety of sub-agents that sit behind an orchestrator. And obviously this is something that can be implemented using LangGraph, or Mastra even has the notion of these as like agent networks. But what we found is that it was easier for us when it came to being able to build evals for the system. We kind of just hit the eject button and built our own framework, which is one in which we have agents that are able to basically DM with other agents and have multi-turn conversations amongst themselves to coordinate to complete a task, or like to complete an objective. And what's been nice about that is it means that, like, you can have your Brex assistant. There's, like, one single point of contact between you as an employee

Starting point is 00:19:18 and the Brex product. And then behind your assistant, if the company has, like, expense management turned on, you have that. If they have reimbursements, there's another agent for that. If they have travel, there's another agent for that. That actually also then facilitates, like, our conception here is that, you know, it's like generally, like, software encapsulation patterns, like sort of projected into the agent space. It also makes it easier for us to have,

Starting point is 00:19:39 like, the team that owns and understands travel, like, be the ones to go and iterate on that without needing to worry about, like, regressing the total system or needing, like, one team to own every single possible action you could take as an employee. And I'll say that, like, I'm still of the mindset that somebody will build a great framework, and we may ultimately migrate to it, or it might be us that ultimately open-sources this, right? But for us, like, this has worked out quite well in, like, lieu of, like, a couple other approaches that we tried along the way that just didn't perform well, which was to, you know, overload the agent with a variety of tools or contextual, like,

Starting point is 00:20:17 context switching where we try to say, oh, this conversation looks like it's more about reimbursement. So let's, like, update the prompt with more reimbursement context. Like, that was, that was another approach that we took that didn't perform as well as actually having a reimbursement agent that it would collaborate with. What about MCPs as, like, Salby? Oh, yeah, some other pattern. The key thing there is that we, there's actually a lot of value in having like multi-turn conversations from like the orchestrator or the assistant to like the sub-agent,

Starting point is 00:20:45 whereas like, you know, a tool call is basically just like one RPC. And so oftentimes what will happen is, you know, let's say the user reaches out to their Brex assistant and says, hey, like, how much am I allowed to expense per person for dinner tonight? I'm taking my team out. And, you know, your assistant's going to then reach out to the policy agent. Maybe the policy agent needs to know, in order to answer that question, whether this was like a customer event, a team event, or whether you're traveling.

Starting point is 00:21:17 And so it can't just answer the question. So it's going to reply back to the assistant and say, hey, I need you to ask this clarifying question. And so then the assistant will return to the user with that clarifying question, and they'll basically have this sort of multi-turn conversation across multiple agents versus it just being encapsulated in like a single call-and-response tool call. And so there are still, like, all the sub-agents have a ton of tools. But I think of like the MCP and tool usage as being like the interface to all of our conventional imperative systems, not the AI space.
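
The dinner-policy exchange above can be sketched as a tiny agent-to-agent loop. Everything here is hypothetical: the `Agent` interface, the `Assistant` class, the dollar limits, and the keyword matching that stands in for an LLM are assumptions for illustration, not Brex's framework. The point it shows is that a sub-agent may answer with a question, and the assistant relays it to the user and continues the same thread.

```typescript
// Hypothetical sketch of agents "DMing" each other over a multi-turn thread.
type Reply =
  | { kind: "answer"; text: string }
  | { kind: "clarify"; question: string };

interface Message {
  from: string;
  text: string;
}

interface Agent {
  name: string;
  // Receives the whole thread so far and returns the next reply.
  receive(thread: Message[]): Reply;
}

// Made-up per-person dinner limits, keyed by event type.
const DINNER_LIMITS_USD: { [eventType: string]: number } = {
  team: 75,
  customer: 150,
  travel: 60,
};

// Stand-in for an LLM-backed policy agent: it answers only once some message
// from the assistant mentions a known event type, otherwise it asks.
const policyAgent: Agent = {
  name: "policy",
  receive(thread) {
    const incoming = thread.filter((m) => m.from !== "policy");
    for (const eventType of Object.keys(DINNER_LIMITS_USD)) {
      for (const message of incoming) {
        if (message.text.toLowerCase().indexOf(eventType) !== -1) {
          return {
            kind: "answer",
            text: "Up to " + DINNER_LIMITS_USD[eventType] +
              " USD per person for a " + eventType + " dinner.",
          };
        }
      }
    }
    return {
      kind: "clarify",
      question: "Is this a team event, a customer event, or travel?",
    };
  },
};

class Assistant {
  constructor(
    // Only products the company has enabled get a specialist agent.
    private specialists: { [topic: string]: Agent },
    private askUser: (question: string) => string
  ) {}

  // A real assistant would pick the topic itself; here the caller passes it.
  handle(userText: string, topic: string, maxTurns: number = 3): string {
    const agent = this.specialists[topic];
    if (!agent) return "That product is not enabled for your company.";
    const thread: Message[] = [{ from: "assistant", text: userText }];
    for (let turn = 0; turn < maxTurns; turn++) {
      const reply = agent.receive(thread);
      if (reply.kind === "answer") return reply.text;
      // Relay the clarifying question to the user, then continue the thread.
      thread.push({ from: agent.name, text: reply.question });
      thread.push({ from: "assistant", text: this.askUser(reply.question) });
    }
    return "I could not resolve that, so I am escalating to a human.";
  }
}

// Usage: the user's canned reply stands in for an SMS response.
const assistant = new Assistant({ policy: policyAgent }, (question) => {
  console.log("assistant asks user: " + question);
  return "It's a team event.";
});
console.log(
  assistant.handle("How much can I expense per person for dinner tonight?", "policy")
);
```

Because each specialist only sees its own thread, a team could change the policy agent's behavior and test it in isolation, which is the encapsulation benefit described earlier in the conversation.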

Starting point is 00:21:52 Yeah, that's the conversation we were having earlier, whether or not it should be an agent-to-agent call as well. Yeah. Or like, yeah, there should be like a chat back. Exactly, exactly. And that's the thing. It's like, okay, and one of the ways that we actually grafted this into Mastra before we built our own framework was to make every sub-agent a tool.

Starting point is 00:22:11 And then the input was just natural language. The output was natural language. And if you needed to have multi-turn, you would basically just put the full, like, prior conversation in as you kept calling the sub-agent as a tool. And it's just like, at that point, you're like, okay, the ergonomics are kind of, the framework is fighting me on this. It's actually helpful for us to basically conceive of it as an org chart.
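
For contrast, the earlier workaround of wrapping a sub-agent as a tool might look roughly like this. The tool type, the reimbursement stand-in, and the helper name are hypothetical and do not use Mastra's actual API; the sketch only shows why the caller ends up replaying the whole transcript on every call.

```typescript
// Hypothetical sketch: a sub-agent exposed as a stateless tool (text in, text out).
type Tool = (input: string) => string;

// Stand-in for a reimbursement sub-agent. A real one would call an LLM; this
// one only checks whether the text it was handed mentions an attached receipt.
const reimbursementTool: Tool = (input) => {
  if (input.toLowerCase().indexOf("receipt attached") !== -1) {
    return "Submitted the reimbursement.";
  }
  return "Do you have a receipt for this expense?";
};

// The workaround: because the tool keeps no state, the caller replays the
// whole transcript on every call to simulate a multi-turn conversation.
function callWithTranscript(tool: Tool, transcript: string[], message: string): string {
  transcript.push("assistant: " + message);
  const reply = tool(transcript.join("\n"));
  transcript.push("sub-agent: " + reply);
  return reply;
}

const demoTranscript: string[] = [];
console.log(
  callWithTranscript(reimbursementTool, demoTranscript, "Reimburse my taxi ride from last night.")
);
console.log(callWithTranscript(reimbursementTool, demoTranscript, "Yes, receipt attached."));
console.log(demoTranscript.length + " transcript lines are now replayed on each call");
```

The input grows with every turn and the orchestrator has to own the bookkeeping, which is the ergonomic friction James describes before the team moved to agents that message each other directly.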

Starting point is 00:22:34 And like it's the agent org chart with, you know, my EA DMing other specialists and having brief conversations to support me as their client. That was a really good deep dive. Thanks for indulging. I feel like you guys are not afraid to make your own tech, which I think is a competitive advantage. I really like that culture. Maybe we should go a bit breadth-first as well. Of course. I think we also deep dived a little bit too much in one area. There's,

Starting point is 00:23:02 and we'll put up the chart. But I'm also very interested in like the sort of internal agent stuff, the operational stuff, and just the general platform scope. So please feel free to just like go into your spiel on it. Yeah, of course. So one of the things that I was trying to do at the beginning of the year as CTO, you know, I think it really fell to me to articulate what our AI strategy was as a business. You know, every member of our board is, like, hey, what's your AI strategy? And while we were doing a lot of things, we literally go, he's got it. Well, yeah. And if I didn't, I'd be in trouble. I think he also was counting on me, given that I was doing the AI organization before CTO, to have one. But a big part of it was like, we were doing a lot with LLMs. It was more like these little one-off features and, you know, hey, like maybe mix in some suggestions here or maybe do a little bit of ops automation over here. But it

Starting point is 00:23:57 wasn't easy to kind of create like a verbal framework of all of these investments. And without that framework, we weren't able to like set a vision or a roadmap for investments. So what we did at the beginning of the year is we took everything that was going on as well as all of our ambitions, all of the good ideas, as well as like the problems we were trying to tackle as a business this year, threw it all on the table, and saw if there were some ways to cluster it into a framework that made sense to the business, to our board, to ourselves. And we came up with, I think this is not particularly novel, but it has helped us quite a bit. We have like three pillars for AI strategy. We have our corporate AI strategy, which is how are we going to adopt

Starting point is 00:24:38 and like buy AI tooling across the business and basically every single function to be able to 10x our workflows. And we have our operational AI strategy, which is how are we going to buy and build solutions that enable us to lower our cost of operations as a financial institution. Because, I think it's fairly intuitive, like, financial institutions like ours face a lot of regulatory expectations, and there's just like a high ops burden for running our business. And so it's sort of like a lot of kind of internal use cases, like being able to do like fraud detection, underwriting, KYC, being able to handle dispute automation on card transactions. Those types of operational investments are our ops AI pillar. And then the final pillar is the

Starting point is 00:25:22 product AI pillar, which is like, are we going to introduce new features that enable Brex to be a part of the corporate AI pillar of our customers. It's like we want to build features and be a solution that somebody else is saying to their board, hey, we adopted Brex and this is part of our corporate AI strategy. And so it kind of has this nice little feedback loop

Starting point is 00:25:43 and we basically within the company split, you know, did a little bit of divide and conquer where folks in IT and on our people team were more or less spending more of the effort driving on corporate AI. really like looking for making the procurement decisions, like creating a culture of experimentation, where we spotlight and incentivize people

Starting point is 00:26:04 for trying to sort of improve their personal workflows using AI. And then the pieces that I've been more involved in have been operational and product. And we were just talking about products here, which is like the agents on Brex and stuff. But I think that the operational AI investments have been some of the most sort of immediately impactful

Starting point is 00:26:21 to the business because we have hundreds of people who work in our operations organization. And it's actually something, that differentiates us because our CSAT and the quality of our support and service is very, very high. It's something we're very proud of. And so trying to figure out how can we automate a significant portion of this and use LLMs in a way that doesn't degrade the customer experience? And then also kind of addresses like, what is the future of the roles of the people who we already have working full time for us?

Starting point is 00:26:51 So this is where Camilla, our COO, who kind of co-wrote the piece with First Round with me, she's been leaning in really aggressively to help every member of the operations organization start rethinking their role as being not people who kind of execute against an SOP, but people who are going to, like, build prompts, build evals, and, like, become more AI-native in, like, the way that they do work. And so a lot of the engineering we've done has been to enable folks, say, in fraud and risk to be able to refine prompts and add additional automation to their workflows. Yeah, and it's a secret fourth pillar, the platform.

Starting point is 00:27:32 Yeah, yeah, exactly. That is the thing that ties it all together, exactly, is the platform. And I think what's been really nice is that even though the platform is kind of a loose, loose term because it consists of a wide variety of technologies, as I said, like we haven't been too religious or dogmatic about everybody

Starting point is 00:27:50 needing to be on one particular thing. What we've seen is that by making a variety of sort of ergonomic options for building with LLMs available, it really has made it easier for us to make a quick leap forward on operational AI. As soon as we put our mind to it,

Starting point is 00:28:06 we said, look, no, we want to hit 80% automated acceptance rate for all startup and commercial businesses that apply for Brex. Like, we want a decision within 60 seconds that's fully touchless, no humans involved. We're able to break that down

Starting point is 00:28:20 and then actually build the agents, build the tools on top of that platform really quickly, and a lot of those tools are the same tools that our product AI agents use as well. I was pretty sold on the ConductorOne. I don't know if this is under exactly this bucket, the ConductorOne one. Oh, yeah.

Starting point is 00:28:36 Provisioning command. I was like, yep, I want that. Yeah, that was actually, I'd love to talk about that. So that's actually on the corporate side. And I think that this goes back to maybe another intuitive, but I'd say like bold decision that we made, which is that we're not going to, we're not going to try to pick winners in the horse race between the foundational model providers or the agentic coding tools or like basically anywhere

Starting point is 00:28:58 where there's an active horse race. What we do instead of, like, trying to pick a single solution is we will procure, like, a small number of seats for, like, multiple solutions, and then we'll give employees the ability to pick whichever one they want to use. And so, for instance, we allow employees to basically go into Slack and use ConductorOne to get a ChatGPT, a Claude, or a Gemini license, and basically you can just, like, build your own stack where you pick your, like, chat provider. As a dev, you can pick, you know, between, like, Cursor, Windsurf, Claude Code credits, and you can basically craft your stack to your preference and easily switch between them. And what that

Starting point is 00:29:39 does for us too is, obviously we have sort of enterprise agreements in place for all of them for the sake of, like, you know, the privacy and non-training guarantees, but it's fun because when we go to renew these contracts, we can basically resist the need to, like, do a wall-to-wall deployment. We can say, hey, look, like, usage trends, our employees are voting with their feet, they're voting with their dollars, and, you know, maybe your tool isn't as hot as it was a year ago. Does it give you a dashboard of what people are choosing? Yeah, actually, we'll look at that. We were looking at that as we're going into budgeting for next year. It's very interesting. I would love to see that, what's, you know, anything that's, like, really up, anything that's really down. It's fascinating how different the landscape is every three months. And I think one of the interesting challenges we had early on was getting folks to just, like, try these tools, try to incorporate, like, agentic coding. You know, like, early on, I'd say, like, 12 to 18 months ago now, like, get folks to just take the time to try a new workflow.

Starting point is 00:30:41 And now at this point, I think what we're seeing is, like, even if, you know, a new model hits the scene, like when Codex came out and everybody was like, oh, Codex is better at codegen, it's a little bit slower. Like, I find fewer folks are, like, kicking the tires on new things because, like, they're just so comfortable with the ergonomics of their current workflow that, you know, some folks are just like, I want to stick with Claude Code because I know it now. I've been working with it for, like, nine months, so I don't need to keep switching. I don't feel the incessant need to keep trying new things because, like, I'm an iPhone

Starting point is 00:31:17 person and I'm just, like, going to stay with an iPhone even though there's some really sexy Android hardware out there. Do you have one of the big numbers, like, 80% of all of our code is written by AI? Or how do you measure it internally? Yeah, no, not really. I mean, what we do is we'll measure, like, the attributions on the number of commits that have the, like, "co-authored with." And we pull some of those stats, but, in fact,

Starting point is 00:31:43 I don't index on those at all. And honestly, like, I don't know how I'd, like, honestly calculate that number. Yeah, I agree. Yeah. And so,

Starting point is 00:31:53 so, the thing that we're really just, you know, we're at the point now with, like, our AI, agentic coding journey, where now we're trying to solve

Starting point is 00:32:03 the second-order effects of, like, a little bit too much slop, maybe a little, not enough... Yeah, exactly, not enough, like, rigor in code reviews. We're trying to,

Starting point is 00:32:13 the adoption is there, and now we have to figure out, like, how to mature in our usage of these tools so that quality or, like, long-term maintainability doesn't suffer, as well as, like,

Starting point is 00:32:25 maybe one of the other facets of being able to generate a lot more code more quickly is like the the drift between team members as far as like understanding of the code that's in their services

Starting point is 00:32:37 increases as, like, everybody's moving faster and more independently. That is another sort of risk that we're starting to see. Like, you know, in incident response, where folks don't know a service as well as they used to because it's changed so much in the past couple months because everybody's moving more quickly. Yeah, this has been a major topic for me this year, on codebase understanding and slop, because obviously it's so much easier to generate code, but then now we have to review it. And to some extent, you can't really fight AI with more AI.

Starting point is 00:33:06 You can't just be like, oh, just throw an AI reviewer on the AI code and you've solved it. And so you do need to just scale human attention. And I think that's something I've been pushing a little bit in terms of, like, well, every engineer is just going to own more code. Yep. Period. And be parachuted in and be expected to ramp up and be productive and also fix bugs.

Starting point is 00:33:28 And if you're on, you know, PagerDuty or whatever, too. Because, I mean, everyone's going to try to be more efficient and you're supposed to see ROI, productivity. Because if you don't, then what's the whole point of this? Exactly. Exactly. And I think it's funny, you're going back to the point of, you know, you could add AI on top to solve the problems that the AI introduces. And you just keep, that's like an endless chain. And so.

Starting point is 00:33:51 But I mean, the CodeRabbits of the world, the Graphites of the world would say, yes, actually, you can. And so that's a little bit of the tension there. Yeah. You know, I've been thinking a lot about how the craft of engineering is evolving. And I will say that I feel further away from being able to predict what it looks like than I did this past summer when I spent a bunch of time. I actually basically went on leave for a month and joined the AI team that we were building just to go and build alongside them.

Starting point is 00:34:23 I felt like it was really important for me to deeply understand the problems in the tech. And so that was me. I was writing, pushing code effectively 996. And I went through so many different moments of realization of, like, oh, my God, this is going to change everything, to, oh, my God, this is just amplifying all the good and the bad in the industry, to, oh, my God, engineers are not going to have a job anymore. And so I don't have any, like, I felt like I had all the predictions back then. And at this point now, I'm just very interested to watch the phenomenon continue to unfold in front of us. And I will say, I was chatting with a bunch of really bright, you know, college juniors and seniors at a dinner we hosted last night. And all these folks are about

Starting point is 00:35:09 to enter the industry, basically having kind of come up in the era of agentic development and LLMs. And I asked them, like, what is your workflow when you're, like, building a project? How do you use agents versus, like, when you decide you're going to actually just write code by hand? And I was surprised to hear the consensus was that most people there were using agents to collaborate on, like, building a design document and, like, collaborating on the architecture of the solution that they want to build, and then they'd be asking it to, like, emit

Starting point is 00:35:41 a doc or an implementation plan, but then they'll go and write a lot of the code themselves still. So it's a little bit more of the rubber-duck, co-architect use case that was most prevalent in that group. I was very surprised by that. I'm impressed. The kids are all right. Yeah, I know. They still want to actually

Starting point is 00:35:59 write the code themselves. That's interesting. Yeah, what we hear from, like, the Gen Zs at OpenAI is they just YOLO everything into Codex. Yeah, I would say most of the code I generate is like, yeah, but I spend a lot of time on the doc. It's curious, like, when you're, like, younger in your career, it's like you don't really have all the mental models of the different patterns to instruct. I feel like there's, like, overreliance, especially if you're doing the design doc, you know. I feel like most of the senior engineers will spend more time on that. It's like, even things like, you know, what columns should you index, depending on, you know, what queries we usually run on this table and things like that.

Starting point is 00:36:35 It's hard for any AI to know that, you know? And it's like, I feel like the role of, like, the more senior engineer should actually be more of this. It's like spending time teaching the AI, and then the AI can teach the junior people in a way. Yeah, yeah. And everything looks like mentorship and management at the end of the day, right? It's like you're breaking down tasks, you're supervising work, you're giving feedback. Like, it's basically management. Except that agents are really bad at memory still.

Starting point is 00:37:05 They basically have zero memory. And it's 2025. What's going on? Yeah. Yeah, what's your internal stack for, like, preferences? There's, like, kind of, you know, explicit preferences you can use with, you know, AGENTS.md and all that stuff. There's implicit preference with linter rules and things like that, in a way,

Starting point is 00:37:25 where it's like, it just happens. You don't have to tell it. How do you structure that? Oh, and are you talking about for agentic coding, or memory within our LLM platform? Yeah, yeah, for, like, the coding specifically. And then we can kind of talk about, you know, the whole Brex platform. Yeah, just, nothing special, just a lot of explicit rules. AGENTS.md files.

Starting point is 00:37:43 Yeah, and then, in linting, we still have, like, traditional linters in place for the couple of different language toolchains. And then we're big fans of Greptile, and we use them for basically all of, sort of, the smarter-than-linting, like, agentic code review. That's been the one solution that we've aligned around that has served us extremely well. Yeah. Greptile. Yeah, I know. We're huge fans. They've built something really

Starting point is 00:38:08 impressive. And I think the thing that constantly blows my mind about it is the way that they're able to just have a really impressive signal to noise ratio. Like the comments that it leaves are very very high signal. I never regret going through all like

Starting point is 00:38:26 65 comments that it leaves on my diffs because it catches so many things. Yeah. I found the Codex review to be really good. I don't use Codex for code generation, but the review product is, like, very good for some reason. I used to have, when I was working in Rails, there was, like, this project called Danger Systems.

Starting point is 00:38:43 Oh, yeah. It was kind of like a semantic linter. Exactly. I feel like there should be more of that now. It's kind of like, the rules are one thing at generation, but I want something in my CI that is like, enforce these rules and call out where they're broken.

Starting point is 00:38:54 And then I can just copy-paste that into an agent. But yeah, when we started building this new agent code, like, as we were saying, we were answering the question: what would you do if you built, you know, a Brex disruptor today? And it's like, it wouldn't be to pick Kotlin and Elixir as the back end. And so we actually went with the full, like, TypeScript stack, and we were building on all, like, public interfaces and really trying to make sure that this agent layer was, like, arm's length from the good and the bad of the core of our product. And one thing, I think, what we did early on,

Starting point is 00:39:31 and I don't actually know if this is still true because, again, the team keeps sort of iterating. But we were having good luck using Claude Code, like, in a GitHub Action to basically go and do more of that Danger-style, like, code review. So have a prompt for it that went through all of the different facets that were more conceptual versus, like, rigidly enforceable by a linter

Starting point is 00:39:52 and have it leave a big comment at the end with your conformance to the idiomatic coding patterns of the new repo. I wanted to spend some time. You said you wanted to deep dive on operational agents. Customer support, onboarding, KYC, fraud, delinquent accounts, disputes. This is, I imagine, the bulk of it. Yes. Of the work. Anywhere where there's a good story about maybe when you started out, it was going to be this way,

Starting point is 00:40:17 and then you discovered through building or through customer contacts that it had to go a different direction. And so that difference in beliefs is something that people can learn from. The thing that immediately comes to mind is that we believed at the beginning that using RL for credit decisions would actually be, like, the way that we would end up, or, like, credit and underwriting, like, how much of a limit should we give to this business, that reinforcement learning would be the way that we would go about building a model that effectively would decision in the way that a human underwriter would. And it turns out that, we made this big investment.

Starting point is 00:40:58 We were working with, like, an outside company that specializes in this. And the performance we ended up getting was inferior to just building, like, a web research agent. And so I think what we took away, what has been most evident in operational AI, is that in operations, you need to be able to break down problems really granularly and be able to form SOPs that humans can repeatedly follow and thus can be audited, because so much of the responsibility in operations is to have auditable, repeatable processes that help to ensure that we're operating in a compliant manner.

Starting point is 00:41:39 And that actually translates just so cleanly to LLMs that we haven't needed to use too many sophisticated techniques in operational AI. It's been relatively simple, like, you know, tool-use agents, or maybe, even, a lot of problems can be solved with, like, a single-turn chat completion. And so the fact that we didn't, well, we did one sort of attempt to over-engineer and use more sophisticated techniques.

Starting point is 00:42:07 And we discovered that, in fact, the solutions are a bit more plain and less technically sophisticated. The challenge is really articulating and refining prompts to reflect the execution of the SOP, and, like, reflect all the sort of institutional knowledge that isn't written down so that agents can properly replace, like, the humans or the contractors we would have making these decisions. How do you decide what is worth, like, spending a lot of time building versus what you think, some of these models are just, because some of these tasks are so generic. They're not really about Brex.

Starting point is 00:42:39 Yep. Like, you can assume the models will be good at it versus some of them are, like, very specific to you. We kind of prioritize, like, the tasks that are most common for the broadest number of customers. and some of them are fairly, fairly intuitive, like being able to research a customer to look to assess legitimacy of the business and whether that business would fit our ideal customer profile for onboarding,

Starting point is 00:43:08 because there's certain types of businesses that we either legally cannot serve or we are not comfortable being able to serve. So that's the type of really kind of basic research and, like, a relatively straightforward problem that isn't hyper-Brex-specific. The things that are a little bit more specific to us or companies in our sector

Starting point is 00:43:29 would be preparing documentation for a network card dispute. Like, if you go and dispute a transaction on your personal card, you will provide evidence to your card issuer. The card issuer then has to put together, like, a three- or four-page Word document that goes to the card network

Starting point is 00:43:47 and then eventually goes to the acquiring bank and all of that is like much more specific to our business. It's a huge operational overhead for us. And that's something that we decided to automate later because it's not as, it's not on the critical path of like serving the vast number of our customers. Like disputes are expensive, but not very common operational process.

Starting point is 00:44:09 And so they're lower on the stack. And I think we're getting there right now. But this year has basically been us just kind of, like, looking at every single process, just kind of stack ranking. And I will say, like, the thing that got us started down this path was we wanted to expand our ideal customer profile to support, like, a wider variety of commercial businesses, which tend to be businesses that aren't growing as quickly. So they're not, like, tech startups, which have a lot of growth, and they're not enterprises, which also tend to have a lot of growth. It's more like a law firm or a dentist office, these types of, like, solid businesses that we should be able to serve and underwrite,

Starting point is 00:44:48 but the cost to onboard them and the cost to serve, if you have all the humans in the loop, make them ROI negative. And so that was the first sort of use case of AI within our ops organization that then led to us really understanding we could automate much more than that. Is this Brex going back into SMBs? Ah, that's a good question. Yeah, yeah. So never let that die. You know, the way we've thought about this is we want to always, like, offer our product to customers where we believe we have, like, an offering that is

Starting point is 00:45:24 well suited to the needs of those businesses. And I would say that still, for very small businesses, our offering isn't, it's not built for that. It's built for companies that have some degree of scale, typically have at least sort of one person, if not a couple people, on their finance team. So we consider these to be more, like, the commercial segment. And so it rhymes with SMB, but our approach back then was a little bit more naive. And I would say we also, we were just going for, like, a volume game there. Our internal controls were not as strong.

Starting point is 00:46:03 We didn't have as much experience, like, underwriting those businesses. And so it really ended up being a huge burden for the business, almost existential, for us to have those tens of thousands of customers that all were ROI negative. So we're trying to basically scale to serve more businesses outside of tech and outside of, like, the upmarket segment, but do it thoughtfully. So I think right now our minimum threshold is, like, a million dollars a year in annual revenue or, like, $10,000 or more per month in card transactions as kind of being, like, the low end of our ICP, which is obviously not what you would think when you think of a small business. Like, small businesses tend to still be

Starting point is 00:46:46 smaller than that. Oh, wow, that's really small. Okay. Yeah. Yeah. Mid-market. Yeah, exactly. And it's funny, it's just, like, the names of these segments, you know, it's like what we consider, I don't know. Yeah, no, I think, like, that's, it's like, yeah, it's like lower mid-market. And it's funny, though, because what we call enterprise may be, you know, a business that a salesperson might call mid-market, right? Like, because it just depends on the scale of yourself as a business when you use these terms. And all of these things are built in the Brex agent platform, like, all these automations that people build? Yes, exactly.

Starting point is 00:47:19 Yeah, and in fact, most of the operational AI is running on that original platform that we have. And we built it, one element of it that I didn't mention is that most of the UI/UX for this platform is built in Retool. And so, like, you can basically go into Retool, and there's, like, a prompt manager, a tool manager, an eval manager. And that's sort of where much of this was built. And the goal with that was, again, to make it more accessible, more ergonomic to get started, but a secondary effect of having a more, like, visual

Starting point is 00:47:52 set of tools for this is it's enabled members of the ops organization to go and do prompt refinement themselves. So you don't need engineers to go and refine the prompts or even, like, test new foundational models when they come out. I think that that's another fun thing: when a new model drops, folks will go into the platform and basically run the evals on the new model and kind of see, can we get better performance here? Does this have different latency or different cost characteristics?
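
As a rough illustration of the "run the evals on the new model" step described above, here is a minimal Python sketch. Everything in it is hypothetical: the `benchmark` helper, the `ModelReport` fields, the flat per-call price, the example cases, and the two stub models are assumptions made for illustration, not Brex's platform or any vendor's API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ModelReport:
    model: str
    accuracy: float        # fraction of eval cases answered correctly
    mean_latency_s: float  # average wall-clock seconds per call
    total_cost_usd: float  # assumed flat price per call times number of calls


def benchmark(
    model: str,
    call: Callable[[str], str],
    cases: Sequence[tuple[str, str]],
    cost_per_call_usd: float,
) -> ModelReport:
    """Run every (prompt, expected) case through `call` and summarize."""
    if not cases:
        raise ValueError("eval set is empty")
    correct = 0
    elapsed = 0.0
    for prompt, expected in cases:
        start = time.perf_counter()
        answer = call(prompt)
        elapsed += time.perf_counter() - start
        if answer.strip().lower() == expected:
            correct += 1
    n = len(cases)
    return ModelReport(model, correct / n, elapsed / n, cost_per_call_usd * n)


# Hypothetical eval set: does this applicant fit the ideal customer profile?
CASES = [
    ("Law firm, $2M annual revenue", "yes"),
    ("Dentist office, $1.5M annual revenue", "yes"),
    ("Sole proprietor, $40K annual revenue", "no"),
]


def current_model(prompt: str) -> str:
    # Stub standing in for the model in production: approves everything.
    return "yes"


def candidate_model(prompt: str) -> str:
    # Stub standing in for a newly released model: checks the revenue unit.
    return "yes" if "M annual revenue" in prompt else "no"


reports = [
    benchmark("current", current_model, CASES, cost_per_call_usd=0.002),
    benchmark("candidate", candidate_model, CASES, cost_per_call_usd=0.004),
]
for r in reports:
    print(f"{r.model}: accuracy={r.accuracy:.2f} "
          f"latency={r.mean_latency_s:.6f}s cost=${r.total_cost_usd:.3f}")
```

The point of keeping accuracy, latency, and cost in one report is that a new model can win on one axis and lose on another, which is the trade-off the question "can we get better performance here?" is really asking about.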

Starting point is 00:48:22 Yeah, you want the domain experts or the people directly using the tool, not the engineers who are somewhat removed from the tool. Yeah, I do want to highlight to listeners that a lot of the Brex agent platform is just things that every company should have, basically. Prompt management system,

Starting point is 00:48:37 which we talked about, where the domain experts are doing it; multi-model testing; evaluation and benchmarking frameworks; API integrations for automated workflows; MCP-based architecture shared with Brex's external AI products. This one is obviously very Brex-specific. One thing I did want to highlight that I was semi-impressed by, because nobody,

Starting point is 00:48:55 very few people talk about this, is a knowledge base for understanding Brex's business. Yeah. So do you want to expand on that? Yeah, and this is an area where we've only scratched the surface, but a big challenge that we face is that the world knowledge, or the knowledge that's built into the model about, you know, what GPT-5 thinks Brex does and how it thinks our business operates,

Starting point is 00:49:20 is actually quite different from what our business offers today or how our product works. And so we've had to work on building a corpus of sort of product documentation, process documentation, and, like, curate this set of information to basically ground a variety of our LLM applications, including, like, the Brex assistant, which is, like, the assistant that employees will talk to. Like, we don't want it to

Starting point is 00:49:46 hallucinate features that we don't have or, like, give wrong information there. And similarly, like, some of the operational agents need to be grounded on, like, what our ICP is, because if you ask GPT-5 right now, like, what types of businesses does Brex onboard, or, like, what types of businesses does Brex serve, it might not give an accurate answer to that question. It might say, hey, we're a corporate card for startups, which is what we did, you know, seven years ago. It might say we only serve enterprises. And so that has been an interesting challenge. I think what we've been trying to do there is, I'm actually going to be spending time with folks talking about this next week internally, about, like, can we refresh our strategy and kind of unify it, because we have a lot of product documentation that's internal for, like, our operations and go-to-market teams. We have a bunch of product documentation that's external for our customers. We have a lot of go-to-market sort of enablement material that's more sales-pitchy. And we have documentation that is put into Sierra, which is the chat assistant that we use for frontline support.

Starting point is 00:50:57 Like, all of this ideally could draw from the same source, but right now it's a little bit fragmented. It's just something that we're trying to invest in, though, because I think at the end of the day, the duplication of efforts is just, like, wasteful, and it's absolutely necessary to get this right. Just to de-duplicate: Sierra, meaning the Bret Taylor startup. Yes, exactly. I would expect that, since you built so many other agents, that's one you can build yourself.

Starting point is 00:51:21 That's, like, solving problems that are not differentiated enough for us. I think what's interesting about Sierra that has been really helpful is that, again, it's really easy, like, the UI and UX of basically administering a Sierra agent is something that's really accessible for the ops and CX strategy team, which are, like, it's much more low-code and more sort of workflow- and DAG-oriented. And we have engineers kind of going and giving it tools to take actions. But for the most part, like, it's nice to not have to build the UX for somebody to manage something like that.

Starting point is 00:51:58 And I think the fact that Sierra speaks the language of customers. Yeah, exactly. Speaks the language of CX. They can do all the reporting and the telemetry and stuff that our, you know, VP of CX would like to see. You know, it's just one fewer thing that we have to build.

Starting point is 00:52:12 What about evals? How do you build evals? Who manages them? Well, it depends on the application. So on the operational AI side, those evals are basically

Starting point is 00:52:24 baked into the platform around every prompt or every agent. And for the most part, I think most of these use cases kind of come online, like the V1 of, like, our commercial underwriting agent,

Starting point is 00:52:37 or the V1 of our startup KYC agent, are co-developed between, like, a subject matter expert in ops and, like, an engineer, and they're going to kind of co-develop an initial eval set. But then from there, generally in ops you're always doing QA, be it, like, on humans or on the LLM decisions. And so, as part of our QA feedback loop, whenever there is a mistake, that's almost always going to result in, like, another eval being written as, like, a regression test. So all of that within ops AI is pretty straightforwardly managed. On the product AI side, that's where it starts getting a little bit more challenging because the multi-agent network is quite challenging to evaluate.
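
The QA feedback loop described above, where each caught mistake becomes a permanent regression eval, could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions: the `EvalCase` and `EvalSet` types, the case ids, the example applications, and the `stub_agent` decision function are all hypothetical and are not taken from Brex's platform.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    input_text: str
    expected: str  # the decision QA determined to be correct


@dataclass
class EvalSet:
    cases: list[EvalCase] = field(default_factory=list)

    def add_from_qa_finding(self, case_id: str, input_text: str, corrected: str) -> None:
        """Turn a mistake caught in QA into a permanent regression case."""
        if any(c.case_id == case_id for c in self.cases):
            return  # already recorded; keep the set free of duplicates
        self.cases.append(EvalCase(case_id, input_text, corrected))

    def failing(self, decide: Callable[[str], str]) -> list[str]:
        """Return the ids of cases where the agent disagrees with QA."""
        return [c.case_id for c in self.cases if decide(c.input_text) != c.expected]


def stub_agent(application: str) -> str:
    # Stand-in for an LLM-backed decision: approves anything mentioning revenue.
    return "approve" if "revenue" in application else "reject"


evals = EvalSet()
evals.add_from_qa_finding("qa-001", "dental office, $2M annual revenue", "approve")
evals.add_from_qa_finding("qa-002", "shell company, revenue unverifiable", "reject")
# Re-reporting the same finding is a no-op.
evals.add_from_qa_finding("qa-002", "shell company, revenue unverifiable", "reject")

print(evals.failing(stub_agent))
```

Here the stub agent fails the second case, so a change to the prompt or model that fixes it can be checked against every earlier finding at the same time.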

Starting point is 00:53:22 And so what we do there is we try to adopt some of the state of the art for multi-turn evals, where we will basically have an agent embody the user, and basically the end-user agent is given an objective. Then we basically have it run a multi-turn conversation and then use an LLM judge at the end to do all of the different assessments. The one other thing that we do technique-wise that is interesting is, sometimes you don't want to do, like, you know, I think these multi-turn evals are kind of like integration tests. They sometimes test more than what you want to assess. And so sometimes what we'll do is we'll also pre-can, like, an initial preamble to a conversation, or maybe a couple turns will be handwritten,

Starting point is 00:54:09 and we'll basically set the eval to start there, and we'll see if we're able to isolate certain behaviors. So it's still, like, a work in progress. And I'd say, like, at the end of the day, a lot of it is just periodic human review and, like, looking at cases that we've detected. As we go to, like, summarize,

Starting point is 00:54:33 like, what we'll do is we'll reflect on a conversation after a certain amount of time has passed, where we'll summarize it, like, extract assessments, like, did it seem like the user accomplished their objective? And we'll just manually review a lot of the cases

Starting point is 00:54:47 where that's failed and write an eval for it. Are all the evals supposed to pass? Or do you have a set of evals that are like, someday the model will be good enough? And, like, how does that change over time? Yeah, it's interesting.

Starting point is 00:54:59 I don't know if we have any that are like, oh, someday I hope it'll be good enough to do this, but it's like there are the evals that are blocking because they would indicate, like, a regression, an unacceptable regression. So these tend to be just accuracy-related evals, but then there are others that are more about, like, tone and coherency and these types of things where they're more subjective

Starting point is 00:55:22 and we're just looking at those over time as a metric. But the team is actually, interestingly, I think we're going to get a big update on, like, how the team is thinking about evals tomorrow in, like, our Friday review. So this is an area where I'd say the largest change we needed to make in how we're executing, from sort of being, like, a lab or an incubator earlier this year to, like, where we are now, where we've shipped and, like, we're trying to increase the rigor, has been around, like, avoiding regressions and having more and more

Starting point is 00:55:58 increasingly robust evals. Yeah, I work with a company called BRCIIs that does user simulations. And I think that's what's been interesting. Some of these things they just don't expect, like the customer does not expect the model to do, but they want to track the saturation of the model in a way, if that makes sense. And I feel like most companies know what they don't want to happen. But it's almost like they cannot quite articulate,

Starting point is 00:56:22 oh, I want in the future the model to be able to do this. It can't do it today, but I'll keep running this eval. That's actually really, really interesting to me. And I'm going to take that away and start thinking about this because there are going to be certain, I mean, we've already seen this, where users will ask the assistant for help with things that we don't support yet, or we haven't implemented yet. It's like, those are opportunities actually for us to, like, effectively write a test that's going to be failing for weeks or months and eventually will go green, but is a way for us to actually kind of show, like, the progression of sophistication of the assistant. I really like that as an idea. Yeah, I wonder how you also catch hallucinations and things

Starting point is 00:57:03 that it doesn't have. That's usually the problem, is, you know, it'll pretend like it can assist with something. Like, one thing that is really annoying, that has been tough to prevent, is that the assistant, because it is used to speaking to other agents that can support it in, like, accomplishing various tasks, if you ask it to help with a task that it thinks it probably should have an agent to work with, it'll just hallucinate. It'll always be like, oh yes, I will, like, you know, I'll reach out to the finance team on your behalf to pass this question along. But it's not doing anything. There's, like, no finance team.

Starting point is 00:57:39 There's no way for it to do that. This is something that comes up a lot. It's like, would you like me to ask the finance team? And there's no actual tool for that. Do you put guardrails for that? Yeah, yeah. That was something that we had to. Like your regexes?

Starting point is 00:57:50 Oh, no, we don't. I think we've been able to just beat that out of its system with a system prompt. But we don't have many guardrails in place right now, just around a couple of potential things that could get us into trouble. Yeah, really, it's reasonable. Yeah, it's surprising. When I, I guess, two years ago

Starting point is 00:58:11 was first kicking around the idea of all these things, I would have said that guardrails would probably be more prevalent, especially in finance use cases. But surprisingly, they're not. Yeah, and that was actually a feature

Starting point is 00:58:24 I believe we built into the LLM gateway early on, the sort of last-chance, like, hard-coded. Yeah, exactly. Here's some regexes and double this tilt. Yeah, exactly. Or just, you know, in the way that if you go far afield on ChatGPT,

Starting point is 00:58:38 you just get the inline 500 error. It doesn't even tell you that it can't tell you, it just craps out. We kind of built a couple of those circuit breakers, or the ability to put those circuit breakers in, and I don't believe we're using them for anything. One last thing I want to get your thoughts on was AI fluency levels, which you guys have a framework for: user, advocate, builder, native. And everyone goes through it, including Camilla. And I just think it's interesting.

Starting point is 00:59:03 I think it's a model that other people are thinking about adopting, but they're worried about rolling it out. That everybody's going to be bad. And also, how do you have this in-house training course that you keep up to date? Just tell us more about it. Yeah. So in the operations org,

Starting point is 00:59:22 they're actually even further ahead than engineering on this front, as far as trying to create learning pathways for this. And I think that part of the reason why they're ahead of us is that in operations, they have to do a lot of training at scale. Training is a very big part of how people build the aptitude around their job function within ops, whereas in EPD, a lot of it is getting hands-on, building experience, doing a lot, getting mentored, getting code review.

Starting point is 00:59:54 But it's been really neat, because I think we managed to create an environment by speaking openly about the transformation that we saw would happen in this industry, with AI displacing a lot of the operations and CX roles, and we were just honest about it. In the same breath that we said, hey, a lot of these job responsibilities will go away, we also said we don't anticipate that meaning that your job has to go away. It's just that your job has to change. And so the fluency framework, and then the training and support, and the positive culture where we celebrate people making progress, has been really helpful for

Starting point is 01:00:40 avoiding a culture of fear, or of, oh, you have to do this or it's going to go in your performance evaluation. I think it does. Well, it's not as rote as, oh, how much are you using AI and is it enough? It's more that I think we've built a pretty positively framed culture, where we'll do spot bonuses for people who have particularly novel uses of AI in their day-to-day. In our company all-hands every two weeks, we'll do an AI spotlight. And it's very rarely somebody in EPD. For the most part it's folks in ATMs, ops, finance, the people organization showing off how they're building agents, you know, in ChatGPT or on Glean, or how they just found some new use case that they thought was helpful. So at the end of the day, we've hired a bunch of really smart people, and I have full confidence that this type of work is within the reach of anybody who's motivated to challenge themselves.

Starting point is 01:01:38 And so we've done that. And in engineering, there's one other thing that I want to call out, because I think that this is kind of fun, which is that we adapted our interview loop to be more AI-native, sort of agentic-coding native. We had a coding question and a system design question that we have basically revamped into a project, where we'll give you a brief before you come on site and then an additional spec when you start. We expect you to use agentic coding to complete the task. In fact, it's kind of impossible to get all the way through it if you don't. And so we're evaluating your knowledge, we're kind of watching how you work. We're evaluating whether you understand the code that's coming out. We're kind of probing at you as you go.

Starting point is 01:02:20 But what we did in order to bootstrap the process of all of our existing engineers getting familiar with agentic coding is that, as soon as we had the interview ready to ship, we said everybody in engineering, including all the managers, is going to have to go through this interview. And so we re-interviewed everybody internally. And it's one of those things where

Starting point is 01:02:40 we didn't keep a score. I don't have any data on who passed or failed or what they scored. But what we found is that as people would take it, it would actually cause them to have moments of realization, where it was like, oh, I can up-level my skills, or I want to be better at this. And so we're trying to find a variety of techniques that push the culture along. And as I reflect on the year, because this is the year where we really put all the effort into it, I'm really satisfied to see the extent to which everybody's leaning in on a daily basis. Going back to it, even I was shocked when we were looking at our Cursor logs that the number one user is an engineering manager in our infra org. That is super cool to me. It means that folks have taken this to heart and found ways of doing their job differently. I guess I had a closing question, or I guess a parting question.

Starting point is 01:03:37 And this is broadening out from Brex. Yeah. You interface with other engineering leaders all the time. Is there anything we did not cover that other CTOs are having as top of mind today, like, their number one problem is underscore? The thing I find myself discussing with folks, and I don't want to shy away from scary topics. In fact, we were just on one that was adjacent, which is, how do you evaluate somebody's progression towards being more AI-native? The cousin to that question is, will we need as many people to operate our businesses? Are there layoffs coming? How are we thinking about

Starting point is 01:04:17 headcount growth? Junior versus senior. Junior versus senior, yes, exactly, like level mix. And I still have more questions than I have answers there. I think what has been really interesting is that I view agentic development as being something that amplifies all the good just as much as it amplifies all the bad. It amplifies sloppiness, poor architectural thinking, misunderstanding of the requirements. For all of the acceleration of good outcomes, it also accelerates bad outcomes. And what has been interesting is that, when you sum that all together, there's less of an obvious capacity increase.

Starting point is 01:05:04 It's more nuanced than that. And so I'm not looking at headcount planning for next year as being something like, oh, well, because AI is giving us so much more leverage, we don't need as many people. Actually, the thing I'm really proud of in my tenure as CTO is that we haven't grown engineering at all. What we've done is we've grown the business significantly, but we've been able to build greater efficiencies in how we execute, how we think about building, how we roadmap, what we choose to do and what not to do, so that we're able to serve significantly more customers with more lines of business without needing to grow engineering. I think that's kind of the way that we're going to just continue on this road.

Starting point is 01:05:47 It's like, I like having 300 engineers. I would love to, a year from now, have 300 engineers, but be 30, 50, 100% more efficient. That is the thing that comes up with other engineering leaders. And the other part of that conversation is, how much is AI getting blamed for the sort of ordinary performance-oriented RIF?

Starting point is 01:06:08 You know, if Microsoft is letting go of four thousand people, as a business that has, what, 150,000 employees, I believe, is that really AI causing that, or is it them just using it as a way to avoid some harder perf management decisions? I'm not entirely sure, but I'm listening more than I'm speaking on this topic, because every time I feel like I have a pretty firm point of view, some new anecdote or experience comes in that challenges or invalidates it. Yeah. Well, you know, I take these signals as my job: to go find people who think they have answers and surface them. And you may or may not disagree, but at least you have something to use as a straw man in your work.

Starting point is 01:06:51 Exactly, exactly. And I think as an industry, it's just early innings on this transformation. So I'm looking forward to listening to this podcast episode a year from now and seeing what we got right, what we got wrong, and what's different, because so much changes quarter over quarter. I do think the AI CoE is a very well-established pattern. I think the internal platform is a very well-established pattern. And this fluency thing is something that people are figuring out, that I think you guys are ahead on.

Starting point is 01:07:22 I'm happy to hear it. I'll be my feedback. Yeah. Any final call to action for things that you want to buy? Like, what should people build for you? Like problems you're trying to solve that you would love people to reach out to help with. The call that I'd make is for folks who are interested in multi-agent networks to get in touch with us, because I do feel like this is something where we're innovating

Starting point is 01:07:44 in service of our customers, and where I feel like the frameworks, the tooling, and the research are there. There are actually quite a lot of interesting papers and things that we lean on, but I would love to see more of that encoded in what's available at large in the industry, because my intuition has been that trying to craft LLMs into deterministic workflows and DAGs is kind of underselling the power that they have to actually plan and execute in a more sophisticated, fluid way. And I just want to see the industry lean in more on these agent-to-agent interactions. Okay. So I'll dive in a little bit here. Yeah. I have a minor opinion. You keep using the word networks.

Starting point is 01:08:36 Yeah. Is that a reference to a specific paper, or is it your term for it? It's just our term. And I think that's actually the term that Mastra uses as well. Initially we used to call them agent runtimes internally, and then we just, yeah, switched to networks. And then I think the other thing I wanted to get a clarification on is, is it mostly a full agent talking with a full agent? Or is there kind of an orchestrator boss agent talking to a sub-agent? And I think that does matter for a subset of people who are building all these things, because when you say multi-agents,

Starting point is 01:09:10 sometimes people don't agree what that means. Yeah, so it's a tree more than it is a graph. So it is like, yeah. When you say network, it feels more of a graph. Yeah. But it seems more directional as a tree. Like there is a hierarchy. There's a hierarchy, yeah.

Starting point is 01:09:25 But there are some violations of that. One of the interesting use cases, and this is where having an assistant for every employee plus having agents that run and embody members of the finance team is really powerful, is one that we brought to market: one of the finance team agents that we launched is an audit agent. The audit agent embodies the work that a lot of larger finance teams will do to look for patterns of waste, fraud, or abuse, or systematic avoidance

Starting point is 01:10:04 of policy that isn't as obvious with a single expense. You can evaluate a single expense and the metadata around it to see if it's within policy or not. But what if you start seeing an employee often make a large number of $74 transactions when receipts are required at $75? Or what if you see that there's actually a fair number of DoorDash expenses during business hours from this individual, on days that in-office lunch is provided? Or maybe you see ride-share patterns where you have to look at a broader context. So we built this audit agent that can ingest your SOP and also ingest your. This is a box.

Starting point is 01:10:44 This is the customer's. Exactly. Yep. And what it does then is it's basically always looking for potential violations. And it is extremely zealous. It wants to have a minimum number of false negatives, so it will raise a large number of potential violations. And then a separate agent, a review agent, will then apply wisdom, the wisdom of, like,

Starting point is 01:11:07 is this important enough to follow up on? Is the dollar amount in question high enough? Does this user seem to have high-compliance behavior more generally? It makes a judgment call about whether it's worthy enough to take that violation and make it into a case. And once it's made into a case, generally what happens is that you need to get more information from the individual. So if humans were doing this, there'd be some outsourced team that's looking for all the potential violations, then you'd have some full-time employee on the finance team who's looking at all the violations. Oh, these are the ones that are important.

Starting point is 01:11:39 We need to follow up on it. Now, what they do is they hand it off to somebody who will go and Slack that employee and be like, hey, what's going on here? And so what we have is, the audit agent looks for violations. The review agent decides whether it's worthy enough to turn into a case. And then from there, when the case is filed, that will trigger an event to the Brex assistant for that employee. And any additional information about the business justification can be

Starting point is 01:12:04 collected, or maybe the assistant already knows, because in its conversation history with the employee it learned something about why this expense looked out of policy. And so the network becomes interesting when you have the finance team agents communicating with the assistants for various employees. And then behind there, you have other subagents. And so then you start seeing more of a graph emerge. But when you look at just what serves the employee, it looks more like a tree. Amazing. Well, I didn't know you were going to go into that level of detail. Yeah, yeah, sorry about that. No, no, no, no. I'm actually really glad I asked.
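The audit, review, and follow-up flow described above can be sketched as a small pipeline. This is a minimal illustration under loud assumptions: in the system discussed, each stage is an LLM agent, while here plain rules stand in for them. Only the $75 receipt threshold comes from the conversation; `NEAR_MARGIN`, `MIN_REPEATS`, `min_total`, and every function and class name are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

RECEIPT_THRESHOLD = 75.00  # receipts required at $75, as in the example above
NEAR_MARGIN = 2.00         # assumption: "just under" means within $2 of the threshold
MIN_REPEATS = 3            # assumption: how many near-threshold expenses form a pattern


@dataclass
class Expense:
    employee: str
    amount: float
    merchant: str


@dataclass
class Violation:
    employee: str
    reason: str
    total: float


def audit_agent(expenses):
    """Zealous pass: flag every employee with repeated just-under-threshold spend."""
    near = defaultdict(list)
    for expense in expenses:
        if RECEIPT_THRESHOLD - NEAR_MARGIN <= expense.amount < RECEIPT_THRESHOLD:
            near[expense.employee].append(expense)
    return [
        Violation(employee, "repeated expenses just under the receipt threshold",
                  sum(item.amount for item in items))
        for employee, items in near.items()
        if len(items) >= MIN_REPEATS
    ]


def review_agent(violations, min_total=250.0):
    """Judgment pass: keep only violations large enough to become cases."""
    return [v for v in violations if v.total >= min_total]


def notify_assistant(case):
    """Stand-in for the event sent to the employee's assistant."""
    return (f"Assistant for {case.employee}: please collect the business "
            f"justification for {case.reason} (${case.total:.2f}).")


expenses = (
    [Expense("alice", 74.00, "Cafe")] * 4      # $296.00 just under the threshold
    + [Expense("bob", 73.50, "Taxi")] * 3      # $220.50, flagged but below min_total
    + [Expense("carol", 74.00, "Cafe"), Expense("carol", 120.00, "Hotel")]
)

violations = audit_agent(expenses)   # high recall: alice and bob
cases = review_agent(violations)     # higher precision: alice only
messages = [notify_assistant(case) for case in cases]
print(messages)
```

Splitting detection from review mirrors the design choice in the conversation: the first stage is tuned to miss as little as possible, and the second decides what is worth an employee's attention.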

Starting point is 01:12:39 Like, that is very impressive and I hope you do more content about that. Yeah, absolutely. We're really excited about it. I think it's been good to finally figure out a use for agents and have the technology be as robust as it is to start realizing this vision, because it's something that we kind of dreamt of a couple of years ago. To your earlier point, the tech just wasn't there when we were trying to make a similar concept work with GPT-3.5.

Starting point is 01:13:05 I was like, no, we were hallucinating tool calls back in that day. Awesome, man. Thanks so much for joining us. Yeah, I really enjoyed it. Happy holidays, guys. Thank you for having me. Thank you.
