Latent Space: The AI Engineer Podcast - Brex’s AI Hail Mary — With CTO James Reggio
Episode Date: January 17, 2026

From building internal AI labs to becoming CTO of Brex, James Reggio has helped lead one of the most disciplined AI transformations inside a real financial institution where compliance, auditability, and customer trust actually matter.

We sat down with Reggio to unpack Brex’s three-pillar AI strategy (corporate, operational, and product AI) [https://www.brex.com/journal/brex-ai-native-operations], how SOP-driven agents beat overengineered RL in ops, why Brex lets employees “build their own AI stack” instead of picking winners [https://www.conductorone.com/customers/brex/], and how a small, founder-heavy AI team is shipping production agents to 40,000+ companies. Reggio also goes deep on Brex’s multi-agent “network” architecture, evals for multi-turn systems, agentic coding’s second-order effects on codebase understanding, and why the future of finance software looks less like dashboards and more like executive assistants coordinating specialist agents behind the scenes.

We discuss:
* Brex’s three-pillar AI strategy: corporate AI for 10x employee workflows, operational AI for cost and compliance leverage, and product AI that lets customers justify Brex as part of their AI strategy to the board
* Why SOP-driven agents beat overengineered RL in finance ops, and how breaking work into auditable, repeatable steps unlocked faster automation in KYC, underwriting, fraud, and disputes
* Building an internal AI platform early: LLM gateways, prompt/version management, evals, cost observability, and why platform work quietly became the force multiplier behind everything else
* Multi-agent “networks” vs single-agent tools: why Brex’s EA-style assistant coordinates specialist agents (policy, travel, reimbursements) through multi-turn conversations instead of one-shot tool calls
* The audit agent pattern: separating detection, judgment, and follow-up into different agents to reduce false negatives without overwhelming finance teams
* Centralized AI teams without resentment: how Brex avoided “AI envy” by tying work to business impact and letting anyone transfer in if they cared deeply enough
* Letting employees build their own AI stack: ChatGPT vs Claude vs Gemini, Cursor vs Windsurf, and why Brex refuses to pick winners in fast-moving tool races
* Measuring adoption without vanity metrics: why “% of code written by AI” is the wrong KPI and what second-order effects (slop, drift, code ownership) actually matter
* Evals in the real world: regression tests from ops QA, LLM-as-judge for multi-turn agents, and why integration-style evals break faster than you expect
* Teaching AI fluency at scale: the user → advocate → builder → native framework, ops-led training, spot bonuses, and avoiding fear-based adoption
* Re-interviewing the entire engineering org: using agentic coding interviews internally to force hands-on skill upgrades without formal performance scoring
* Headcount in the age of agents: why Brex grew the business without growing engineering, and why AI amplifies bad architecture as fast as good decisions
* The future of finance software: why dashboards fade, assistants take over, and agent-to-agent collaboration becomes the real UI

James Reggio
* X: https://x.com/jamesreggio
* LinkedIn: https://www.linkedin.com/in/jamesreggio/

Where to find Latent Space
* X: https://x.com/latentspacepod

Timestamps
00:00:00 Introduction
00:01:24 From Mobile Engineer to CTO: The Founder's Path
00:03:00 Quitters Welcome: Building a Founder-Friendly Culture
00:05:13 The AI Team Structure: 10-Person Startup Within Brex
00:11:55 Building the Brex Agent Platform: Multi-Agent Networks
00:13:45 Tech Stack Decisions: TypeScript, Mastra, and MCP
00:24:32 Operational AI: Automating Underwriting, KYC, and Fraud
00:16:40 The Brex Assistant: Executive Assistant for Every Employee
00:40:26 Evaluation Strategy: From Simple SOPs to Multi-Turn Evals
00:37:11 Agentic Coding Adoption: Cursor, Windsurf, and the Engineering Interview
00:58:51 AI Fluency Levels: From User to Native
01:09:14 The Audit Agent Network: Finance Team Agents in Action
01:03:33 The Future of Engineering Headcount and AI Leverage

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
Transcript
We have like three pillars for AI strategy.
We have our corporate AI strategy, which is how are we going to adopt and like buy AI tooling across the business and basically every single function to be able to 10x our workflows.
And we have our operational AI strategy, which is how are we going to buy and build solutions that enable us to lower our cost of operations as a financial institution?
And then the final pillar is the product AI pillar, which is like, are we going to introduce new features
that enable Brex to be a part of the corporate AI pillar of our customers.
It's like we want to build features and be a solution that somebody else is saying to their board,
hey, we adopted Brex, and this is part of our corporate AI strategy.
Hey, everyone, welcome to the Latent Space podcast.
This is Alessio, and I'm joined by swyx, editor of Latent Space.
Hey, hey, hey, and we're here with James Reggio, CTO at Brex.
Welcome.
Hey, thank you for having me.
Thanks for visiting from up in Seattle where I've been a little bit.
It's cold up there.
Yeah, and we have an atmospheric river hitting the city right now, so a lot of blowing.
Yeah, well, yeah, we're getting the full-on winter effect right now.
Well, you're here to talk about the sort of AI transformation within Brex.
There's a lot of interesting tidbits that we were going to draw from your article, but also your background.
You've got a wide array of experience from Stripe to Banter to Convoy.
And I think also mostly I'm interested in your journey as one of the rare people that have transitioned from like a mobile
engineering leader to a CTO, which I think is a bit more rare. I used to have this comment
in the past that there's a career ceiling for people who work on client-only things, where usually
they don't hit CTO, whereas they typically promote the backend people, the backend, cloud infra people,
to CTO.
Yeah, you know, it's something that I hear fairly frequently because there aren't that many
folks with a front-end background who reach this level of leadership. And it's exciting for me to be able to
represent that group. But I'll say that even though my
resume kind of reflects that I've been more on the front end of things, it's probably more my experience
as a founder a couple times over that actually helped me get to this level of my career working for
somebody else. Becoming the CTO is very much a leadership and general business role as
much as it is a technical role. And so I think it was more the skills that I built from starting
companies and trying to build those up that made me a decent fit and enabled me to get the nod from Pedro
to take this on when my predecessor left about two years ago. Yeah.
One thing, I'm curious about you guys' commentary; this is a little bit broad, unscheduled,
but a lot of startups are bragging about how many ex-founders they have.
And yes, to some extent, you want people with the founder mentality and agency,
which is what you did, to be your employees and to take initiative in the company.
But also, I wonder if it's becoming an anti-signal sometimes.
I don't know if you've thought about this.
I think it's more about the churn for me, especially when people are hiring ex-founders.
It's like, if you're truly of the founder gene,
it's kind of hard to just stay somewhere
as an IC for too long.
And then it's like, all right,
I joined this thing and then in one year
I'm back to being a founder.
I'm curious for you.
I'm sure you thought about leaving
and like doing another company, and stayed.
In fact, that was the alternative
I was considering. Even at the time that I got the phone call
where they made me the offer to become CTO,
I was thinking about leaving to go start a company.
And, you know, I think what's interesting about it,
we actually launched sort of like a new recruiting
and employee value proposition for Brex a couple months ago called Quitters Welcome,
where we actually intentionally are leaning into this idea that we have a disproportionate
number of folks who go on to become founders or heads of a department when they leave our company,
and we celebrate that. It's actually something that I'm very proud of.
And it means that we welcome in people who want to get a different experience.
I think that there's certainly a lot of founders who don't make it,
don't scale their own businesses to the scale that we've achieved at Brex,
so there's something to be learned when they come in.
And then we're very happy to, like, support people on their way out.
And so I actually really like hiring former founders or future founders.
The one value proposition I find that's most relevant,
because a lot of the folks we're hiring as AI engineers
are kind of folks that are either, like, winding down their companies
or considering maybe running an AI startup,
the thing that resonates the most with them is that we oftentimes can give them
problems to solve that are interesting, problems that maybe they would even want to
build their own startup around, but with instant distribution, right? Like, that is
the allure. It's like, you can come into this business and build, like, financial AI applications
and instantly have them deployed to roughly 40,000 customers across, you know, the Fortune 100,
down to, you know, tens of thousands of startups. So that's what is, I think, appealing
to founders. But the challenge then is making sure that we set them up for success in an environment
that still feels a little bit like the startup
that they might build themselves
versus like something that's too corporate.
Yeah, instead of doing your own company
and then coming to you and being like,
can I integrate into Brex?
Yeah, get all the data.
Yeah, exactly.
How's the engineering team structure?
Yeah, so we have about 300 people in engineering,
like 350 total across EPD.
And for the most part,
we structure around our product domains.
And so this means that Brex is a corporate card.
It's also a corporate bank account,
expense management, travel, and accounting.
And so we actually have sort of full stack product domains that are roughly like 30, 40 people
for each of those that have everything from like the low level infrastructure up to the
web and mobile experiences.
That's generally like the structure of our engineering organization.
And then we have naturally like an organization that focuses on infrastructure,
security, IT.
And then there are two additional centers
of excellence that we've kind of built that kind of violate that org design where we've felt the need to
put more focus or like operate slightly differently. And AI is one of those areas where we have
another team of just roughly about 10 people who are focused primarily on LLM applications.
And we wanted to create a bit of a separation there because the way that we were thinking
about this and this is actually something we did this summer is we paused and asked ourselves
on our AI journey towards like infusing our product with AI and
generating customer value, we asked ourselves, like, what would a company that was founded today
to disrupt Brex look like? And then we tried to basically use the answer to that question to form
this team internally. So it's a little bit off to the side. Ideally, everybody kind of comes up to
speed and contributes, you know, LLM features, but we have this sort of off on the side right now in a
centralized manner. What's the difference in AI adoption for those teams? So like, are the people on the
LLM team, like, much bigger Cursor users, Claude Code users, or, like, do you see similar diffusion?
It's actually fairly uniform across the entire engineering department. It's actually
kind of funny, like, one of our largest Cursor users is actually an engineering manager.
And I think that this also just, like, speaks to our core value of operate at all levels, where we want
all of our EMs and everybody in leadership to still basically do the job that they're managing,
manage the work.
So it actually is,
I think the journey of getting everybody
into using agentic coding
was not sort of exclusive
to like the AI group.
Yeah.
In fact, I think this podcast was actually set up
because I did cold outreach to Pedro
because he tweeted this.
I assume this is the center of X.
He says,
I started a new company inside Brex to build the future
of agentic finance.
No BS, just builders building 996
and pushing production-grade agents
to 30,000 finance teams, now 40,000.
And then he actually has like a little job description, which I think is really interesting.
I'll skip that and go straight to: Brex accelerated growth 5x and cut burn 99% in the past 18 months.
I assume that's a mix of internal AI automation and other stuff, but basically I wanted to
put some headline numbers up front to impress people, yeah, before we dig into the details.
Yeah, absolutely. And you're correct, that's the team that we have, this like AI team.
It's actually a very young team? Yeah, it's very young. I mean, it's been
really interesting. The composition of the team is
very young, like AI native
20-year-olds who basically grew up with
the tech, kind of paired off with
more like staff-level software engineers
that have been around for a while who can kind of navigate
the existing code bases and
like understand the product and the customer
deeply. Like we've formed
a couple of really tight-knit pods
in the AI org where it's like three people.
Generally somebody who has like more of a product
customer focus background, then like a
staff engineer who knows where the skeletons
are, and then like a much younger
or like AI-native engineer who can just do things with agents that like the rest of us
dinosaurs maybe can't even dream of. I think
part of it is like sometimes too much experience or too much knowledge of how to solve
a problem can actually be an impediment to thinking differently about it and thinking about it
from like an AI first lens. But yes, we've been slowly growing that team just in the same way
that like a pre-seed startup, you want to be very, very careful about talent density.
and like very deliberate, like only hire when you absolutely need it.
And so, yeah, at this point, it's just about 10 people.
And I think it was probably four or five people.
I think everybody was actually in the photo that was attached to that tweet
when Pedro put that out a couple months ago.
Yeah, we'll put it up.
It's a photo at 1.20 a.m. on a Friday.
Yes.
Oh, yeah, yeah.
Because we always do, we always do like Friday demos.
And like that's a time for everybody to get like kind of an exec review time.
And so...
Everyone in Seattle?
Those folks were all in Seattle.
But they're actually geographically distributed.
We have a couple folks here, a couple in Sao Paulo, a couple in Seattle.
At Decibel, we have this, like, AI Center of Excellence, which is basically the people running these teams across companies.
Yep.
How do you make the other engineers not feel like they're not special?
I think that's something that I hear a lot, is like, hey, you know, why are these people working on all the cool LLM things?
And like, I'm stuck working on, you know, the KYC integration with whatever.
You know what I mean?
It's like, how do you build that culture?
You know, it's interesting. I thought that that would be more of a problem, but the benefit of having really optimized our engineering culture around business impact actually causes it to cut in the other direction.
Some folks don't want to work on the AI products because it doesn't have as much clear direct like business impact right now.
It doesn't impact revenue directly. And so, for the most part, we've enabled folks who have a strong desire to work on AI products to join that team.
Like, somebody transferred out of our expense management organization to come over there because they're
really passionate about taking like their knowledge of like policy evaluation and bringing it into the
AI team. But for the most part, I think everybody understands like how their work ladders up. And
there's some like friendly rivalry because like the folks who, say, work on our card product, they drive 60% of our direct revenue.
And so they're pretty happy with that. And they don't feel like they're being left out. And I will also say,
As you probably saw in this piece that we put out with First Round,
there is a lot of smaller applications of LLMs
peppered throughout all of our product and operations teams.
It's just some of the more novel, like,
agentic layer that sits on top of Brex that has been put together,
like in this sort of isolated team.
So it's not like folks aren't getting to build with LLMs
or use LLMs on a daily basis.
Yeah, maybe run people through the Brex agent platform.
We'll put the diagram in the video where you have
the LLM gateway, you have like the MCP layer.
We just had David, the creator of MCP, right before you.
So this is very timely.
Yeah.
How did you start building that?
What's the architecture?
Yeah, the architecture, you know, I think simple is, is elegant.
And we've had basically an LLM gateway and a basic hand-rolled platform from the very early days.
In fact, right before being tapped to become CTO, I was leading like an AI labs team
internally. In the wake of like the announcement of ChatGPT, you know, everybody
saw this new technology and said, hey, what are we going to do with it?
And so one of the first things that we did, I think January 2023, that would have been,
was try to put together some internal infrastructure that made it possible for us to
deploy, manage, version, and eval prompts and then be able to manage, like, data egress and model routing
and have some very basic, like, observability and cost monitoring in an LLM gateway.
But that's infrastructure that we stood up, and it still continues to power a lot of those
smaller, more, let's say, precise applications of LLMs. So, for instance,
we set up a completely automated pipeline for evaluating customer applications to get them
onboarded instantly to Brex, which is something that used to require human intervention,
either for underwriting or KYC. But now we basically have a series of agents and, particularly
like research agents that will go and do the work that humans would normally do. And so that's
running on top of this, this hand-rolled framework. And then for the agents on Brex that we announced
in our fall release, which is like this agentic layer that we're building that sort of sits on top
of Brex and can embody workflows that a finance team would normally hire humans for, we've actually
started using Mastra for that as like the kind of primary framework for accelerating this.
We actually have built everything in TypeScript, which is another like technology choice that
answers the question of like what would we do if we started Brex today, but isn't the case for
all of our existing back-end code, which is either Kotlin or Elixir. And then we have a mix of
pgvector and Pinecone. And I think what we've seen is we're always re-evaluating
the tech and framework choices as we go because the half-life of code has declined so significantly
with agentic coding. It's actually quite easy for us and for anyone else to kind of try on for
size, a variety of different pieces of tech to figure out what is going to be most
economic for solving the problem.
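Editor's sketch: the gateway responsibilities described above (versioned prompts, model routing, basic cost observability) can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not Brex's implementation. The names `Gateway`, `CallRecord`, `register_prompt`, and `cost_by_model` are hypothetical, the backends are plain callables standing in for real model APIs, and the per-character price is an invented unit. Data egress controls are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class CallRecord:
    """One logged gateway call, kept for observability and cost reporting."""
    prompt_name: str
    prompt_version: int
    model: str
    cost_usd: float


@dataclass
class Gateway:
    # Model name -> (backend callable, hypothetical price per 1,000 characters).
    backends: Dict[str, Tuple[Callable[[str], str], float]]
    # Use case -> model name. This is the routing table.
    routes: Dict[str, str]
    # Prompt name -> template versions; list index 0 holds version 1.
    prompts: Dict[str, List[str]] = field(default_factory=dict)
    log: List[CallRecord] = field(default_factory=list)

    def register_prompt(self, name: str, template: str) -> int:
        """Store a new version of a prompt and return its version number."""
        versions = self.prompts.setdefault(name, [])
        versions.append(template)
        return len(versions)

    def call(self, use_case: str, prompt_name: str,
             version: Optional[int] = None, **variables: str) -> str:
        """Render a prompt version, route it to a model, and log the cost."""
        versions = self.prompts[prompt_name]
        chosen = version if version is not None else len(versions)
        text = versions[chosen - 1].format(**variables)
        model = self.routes[use_case]
        backend, price_per_1k_chars = self.backends[model]
        output = backend(text)
        cost = (len(text) + len(output)) / 1000 * price_per_1k_chars
        self.log.append(CallRecord(prompt_name, chosen, model, cost))
        return output

    def cost_by_model(self) -> Dict[str, float]:
        """Aggregate logged cost per model."""
        totals: Dict[str, float] = {}
        for record in self.log:
            totals[record.model] = totals.get(record.model, 0.0) + record.cost_usd
        return totals
```

Because every call goes through `call`, pinning a prompt to an older version or re-pointing a use case at a different model is a one-line change for the caller, which is roughly the leverage the speaker attributes to having this layer early.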
Let's click on Mastra. That's a new choice, an interesting one.
Yeah, I mean, I think that the main reason that we adopted Mastra is
that the ergonomics of Mastra are quite similar to
the internal LLM framework that we built two and a half years ago. LangChain was
available at the time, two and a half, three years ago.
It didn't quite feel right to us; it kind of addressed things that weren't the pieces that we needed to address,
which was like being able to have really simple observability and logging, tracing.
LangChain didn't do it?
I mean, at that time, it didn't.
I think it was really, I think it was.
Or they fixed that.
Yeah, no, they certainly did.
But so we did, I'm trying to remember, because this is now ancient history, we evaluated LangChain,
turned off of it, built our own thing.
And then as we were looking,
we kind of want to deprecate this internal framework that we built
because at the end of the day,
it's not leveraged for us to maintain that.
And Mastra ended up fitting the bill for the feature set that we were looking for.
And I think what's been interesting is about half of the applications
that we're building right now on the agent layer are running on Mastra.
And then the other half are actually still running on like yet another
internally developed framework, which is a framework that's focused more on networks of agents.
So sort of multi-agent orchestration versus more like strict, like, you know, single turn or like workflows,
which are easier to build with either LangGraph or Mastra.
Tell us about your multi-agent framework.
I mean, what are the design considerations?
Why is this the first we're hearing about it?
Yeah, yeah.
So it's funny.
A big reason why we haven't written more about this is that it continues to evolve quite a bit.
I feel like we actually had a blog post that we were going to put out in conjunction with the fall release,
talking about how we built this.
And by the time that we finished, you know, the blog post and had all the package ready,
it was already like halfway outdated.
And so the way that this has started to emerge is this multi-agent network approach to implementation
was when we were trying to scale up our sort of consumer-grade Brex assistant.
So if you think about like Brex and our customers, there's really like two very broad personas that we serve.
We serve members of a finance team who are generally going to be in roles like
accountant or controller or head of T&E.
For those folks, they are going to be interacting with agents that are much more specific
to their roles.
But then the other broad cohort of users we have are like employees of companies that
have deployed Brex.
So you go join a new company.
That company uses Brex.
You get your Brex card.
And our goal for employees is for Brex to completely disappear.
Like the best UI, UX for Brex is just the card.
like every single thing that you have to do in the software beyond just swiping the card is like an
opportunity for AI to to eliminate some work for you. And so what we thought was the right approach
to solving for that was to embody like an executive assistant for every employee. Because I as an
executive at Brex, I have an EA. And she knows enough about me. She has access to my calendar, my email,
has all the context on when I'm traveling and for what business purposes. And so she's basically
able to do everything that I would be obligated to do in Brex, be it like booking travel or
like doing expense documentation. And so what we wanted to do is we wanted to build like that
EA connected to the same data sources and see if we couldn't simulate that behavior so that
you basically, your interface to Brex is SMS and the card. And when we started building that out,
you know, the most naive like architecture for that would be to have an agent with a variety of
tools and maybe maybe do some some rag to ensure that it has like appropriate context for the
conversation. But what we were finding is that, um, the wide range of different product lines
that exist on Brex made it difficult for one agent to perform well, being responsible
for everything from like expense management to finding and booking travel to answering policy
and procurement questions. And so that's when we started breaking down the problem
into a variety of sub-agents that sit behind an orchestrator.
And obviously this is something that can be implemented using LangGraph, or Mastra even has a notion of these as, like, networks.
But what we found is that it was easier for us when it came to being able to build evals for the system.
We kind of just hit the eject button and built our own framework, which is one in which we have agents that are able to basically DM with other agents and have multi-turn conversations amongst themselves to coordinate to complete a task
or like to complete an objective.
And what's, what's been nice about that is it means that, like,
you can have your Brex assistant.
There's, like, one single, one single, like, point of contact between you as an employee
and the Brex product.
And then behind your assistant, if the company has, like, expense management turned on,
you have that.
If they have reimbursements, there's another agent for that.
If they have travel, there's another agent for that.
It actually also then facilitates, like, our conception here is that, you know,
it's like generally, like, software encapsulation
patterns, like sort of projected into the agent space. It also makes it easier for us to have,
like, the team that owns and understands travel, like, be the ones to go and iterate on that
without needing to worry about, like, regressing the total system or needing, like, one team
to own every single possible action you could take as an employee. And I'll say that, like,
I'm still of the mindset that somebody will build a great framework, and we may ultimately
migrate to it, or it might be us that ultimately open-sources this, right?
But for us, like, this has worked out quite well in, like,
lieu of, like, a couple other approaches that we tried along the way that just didn't perform well,
which was to, you know, overload the agent with a variety of tools or contextual, like,
context switching where we try to say, oh, this conversation looks like it's more about reimbursement.
So let's, like, update the prompt with more reimbursement context.
Like, that was, that was another approach that we took that didn't perform as well as actually
having a reimbursement agent that it would collaborate with.
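Editor's sketch: the "network" idea described above, one assistant as the single point of contact with specialist agents registered only for the products a company has turned on, might look roughly like this. Everything here is an illustrative assumption: `Agent`, `Network`, `dm`, `build_network`, and the keyword routing table are hypothetical names, and the `respond` callables stand in for LLM-backed agents. It is not the framework described in the episode.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (sender name, text)


@dataclass
class Agent:
    name: str
    # Stand-in for an LLM call: receives the whole thread and returns a reply.
    respond: Callable[[List[Message]], str]


@dataclass
class Network:
    agents: Dict[str, Agent] = field(default_factory=dict)
    # One persistent thread per pair of agents, so conversations can be multi-turn.
    threads: Dict[Tuple[str, ...], List[Message]] = field(default_factory=dict)

    def add(self, agent: Agent) -> None:
        self.agents[agent.name] = agent

    def dm(self, sender: str, recipient: str, text: str) -> str:
        """Append a message to the pair's thread and return the recipient's reply."""
        key = tuple(sorted((sender, recipient)))
        thread = self.threads.setdefault(key, [])
        thread.append((sender, text))
        reply = self.agents[recipient].respond(thread)
        thread.append((recipient, reply))
        return reply


# Hypothetical specialists; each would be owned by the matching product team.
SPECIALISTS = {
    name: Agent(name, lambda thread, n=name: f"{n} agent handled: {thread[-1][1]}")
    for name in ("travel", "reimbursements", "expenses")
}

# Naive keyword routing, standing in for the assistant's own model-driven choice.
KEYWORDS = {"flight": "travel", "hotel": "travel",
            "reimburse": "reimbursements", "receipt": "expenses"}


def build_network(enabled_products: List[str]) -> Network:
    """Register only the specialists for products the company has enabled."""
    network = Network()
    for product in enabled_products:
        network.add(SPECIALISTS[product])
    return network


def assistant(network: Network, user_text: str) -> str:
    """Single point of contact: pick a specialist and DM it."""
    for keyword, specialist in KEYWORDS.items():
        if keyword in user_text.lower():
            if specialist not in network.agents:
                return f"{specialist} is not enabled for your company"
            return network.dm("assistant", specialist, user_text)
    return "I can help with that directly."
```

The encapsulation point from the discussion shows up in `SPECIALISTS`: changing how the travel agent responds touches one entry, and the assistant and the other agents are unaffected.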
What about MCPs as, like, sub-agents?
Oh, yeah, some other pattern.
The key thing there is that there's actually a lot of value in having like multi-turn
conversations from like the orchestrator or the assistant to like the sub-agent,
whereas like, you know, a tool call is basically just like one RPC.
And so oftentimes what will happen is, you know, let's say the user reaches out
to their Brex assistant and says, hey, like, how much am I allowed to expense
per person for dinner tonight?
I'm taking my team out.
and the, you know, your assistant's going to then reach out to the policy agent.
Maybe the policy agent needs to know, in order to answer that question, maybe it needs to know
whether this was like a customer event, a team event, or whether you're traveling.
And so it can't just answer the question.
So it's going to reply back to the assistant and say, hey, I need you to ask this clarifying question.
And so then the assistant will return to the user with that clarifying question,
and they'll basically have this sort of multi-turn conversation across multiple agents
versus it just being encapsulated in like a single call-and-response tool call.
And so there are still, like, all the sub-agents have a ton of tools.
But I think of like the MCP and tool usage as being like the interface to all of our conventional
imperative systems, not the AI space.
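Editor's sketch: the clarifying-question flow in the dinner example can be written as a small loop in which the policy agent may answer or ask for more context, and the assistant relays the question to the user. This is a sketch under stated assumptions. `Reply`, `policy_agent`, `ask_user`, and the dollar limits in `DINNER_LIMITS` are all hypothetical, and simple keyword matching stands in for the agent's reasoning.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical per-person dinner limits in USD; illustrative numbers only.
DINNER_LIMITS = {"team": 75, "customer": 150, "travel": 100}


@dataclass
class Reply:
    text: str
    needs_clarification: bool = False


def policy_agent(thread: List[str]) -> Reply:
    """Answer once the thread mentions an event type; otherwise ask for it."""
    for message in reversed(thread):
        for event_type, limit in DINNER_LIMITS.items():
            if event_type in message.lower():
                return Reply(f"Up to ${limit} per person for a {event_type} dinner.")
    return Reply("Is this a team event, a customer event, or travel?",
                 needs_clarification=True)


def assistant(question: str, ask_user: Callable[[str], str],
              max_turns: int = 3) -> str:
    """Relay between the user and the policy agent until the agent can answer."""
    thread = [question]
    for _ in range(max_turns):
        reply = policy_agent(thread)
        if not reply.needs_clarification:
            return reply.text
        # The sub-agent cannot answer yet, so its question goes back to the user
        # and the user's answer is appended to the same conversation.
        thread.append(ask_user(reply.text))
    return "I could not resolve this; please contact your finance team."
```

A single tool call has no natural place for the `needs_clarification` branch, which is the distinction the speaker draws between a sub-agent conversation and one RPC.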
Yeah, that's the conversation we were having earlier, whether or not it should be an agent-to-agent
call as well.
Yeah.
Or like, yeah, there should be like a chat back.
Exactly, exactly.
And that's the thing.
It's like, okay, and one of the ways that we actually grafted this into Mastra
before we built our own framework was to make every sub-agent a tool.
And then the input was just natural language.
The output was natural language.
And if you needed to have multi-turn, you would basically just pass the full, like,
conversation
as you kept calling the sub-agent as a tool.
And it's just like at that point, you're like, okay, the ergonomics are kind of...
the framework is fighting me on this.
It's actually helpful for us to basically conceive of it as an org chart.
And like it's the agent org chart with, you know, my EA DMing other specialists
and having brief conversations to support me as their client.
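Editor's sketch: the earlier workaround, exposing each sub-agent as a stateless tool with natural-language input and output, can be illustrated to show why it felt awkward. The caller has to keep the history and resend the whole conversation on every call. The names `reimbursement_tool` and `ToolWrappedSubAgent` are hypothetical, and the tool body is a placeholder for a model call. This does not use or reproduce any Mastra API.

```python
from typing import Callable, List


def reimbursement_tool(transcript: str) -> str:
    """Stateless stand-in for a sub-agent exposed as a tool (text in, text out)."""
    turns = transcript.strip().split("\n")
    return f"(reply after reading {len(turns)} turns)"


class ToolWrappedSubAgent:
    """Caller-side wrapper that fakes multi-turn by resending the full history."""

    def __init__(self, tool: Callable[[str], str]) -> None:
        self.tool = tool
        self.history: List[str] = []
        self.chars_sent = 0  # Grows with every call because history is resent.

    def say(self, text: str) -> str:
        self.history.append(f"assistant: {text}")
        payload = "\n".join(self.history)
        self.chars_sent += len(payload)
        reply = self.tool(payload)
        self.history.append(f"reimbursements: {reply}")
        return reply
```

The bookkeeping in `say` is exactly the part a network framework with persistent agent-to-agent threads takes off the caller's hands.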
That was a really good deep dive.
Thanks for indulging.
I feel like you guys are not afraid to make your own tech, which I think is a competitive advantage.
I really like that culture.
Maybe we should go a bit breadth first as well.
Of course.
I think we also deep dive a little bit too much in one area. There's,
and we'll put up the chart. But I'm also very interested in like the sort of internal agent stuff,
the operational stuff, and just the general platform scope. So please feel free to just like go into
your spiel on it. Yeah, of course. So one of the things that I was trying to do at the beginning of the
year as CTO, you know, I think it really fell to me to articulate what our AI strategy was as a business.
You know, every member of our board is,
like, hey, what's your AI strategy? And while we were doing a lot of things, we literally go, he's got it.
Well, yeah. And if I didn't, I'd be in trouble. I think he also was counting on me, given that I was leading the AI organization before becoming CTO, to have one.
But a big part of it was like, we were doing a lot with LLMs. It was more like these little one-off features and, you know, hey, like maybe mix in some suggestions here or maybe do a little bit of ops automation over here. But it
wasn't easy to kind of create like a verbal framework of all of these investments. And without
that framework, then we weren't able to like set a vision or a roadmap for for investments. So
what we did at the beginning of the year is we took everything that was going on as well as
all of our ambitions, all of the good ideas, as well as like the problems we were trying to
tackle as a business this year, threw it all on the table to see if there were some ways to
cluster it into a framework that made sense to the business, to our board, to ourselves. And we came
up with, I think this is not particularly novel, but has helped us quite a bit. We have like three
pillars for AI strategy. We have our corporate AI strategy, which is how are we going to adopt
and like buy AI tooling across the business and basically every single function to be able to
10x our workflows. And we have our operational AI strategy, which is how are we going to buy and
build solutions that enable us to lower our cost of operations as a financial institution,
because, I think it's fairly intuitive, financial institutions like ours face a lot of
regulatory expectations, and there's just like a high ops burden for running our business.
And so it's sort of like a lot of kind of internal use cases, like being able to do like fraud
detection, underwriting, KYC, being able to handle dispute automation on card transactions. Those
types of operational investments are our ops AI pillar. And then the final pillar is the
product AI pillar, which is like, are we going to introduce new features that
enable Brex to be a part of the corporate AI
pillar of our customers.
It's like we want to build features
and be a solution that somebody else is saying
to their board, hey, we adopted Brex
and this is part of our corporate AI strategy.
And so it kind of has this nice little feedback loop
and we basically within the company
split, you know, did a little bit of divide and conquer
where folks in IT and on our people team
were more or less spending more of the effort
driving on corporate AI,
really like making the procurement decisions,
like creating a culture of experimentation,
where we spotlight and incentivize people
for trying to sort of improve their personal workflows
using AI.
And then the pieces that I've been more involved in
have been operational and product.
And we were just talking about products here,
which is like the agents on Brex and stuff.
But I think that the operational AI investments
have been some of the most sort of immediately impactful
to the business because we have hundreds of people
who work in our operations organization.
And it's actually something
that differentiates us because our CSAT and the quality of our support and service is very,
very high.
It's something we're very proud of.
And so trying to figure out how can we automate a significant portion of this and use LLMs in a way that doesn't degrade the customer experience?
And then also kind of addresses like, what is the future of the roles of the people who we already have working full time for us?
So this is where Camilla, our COO, who kind of co-wrote the piece with First Round with me,
she's been leaning in really aggressively to help every member of the operations organization
start rethinking their role as being not people who kind of execute against an SOP,
but people who are going to, like, build prompts, build evals and, like, become more AI-native
in, like, the way that they do work.
And so a lot of the engineering we've done has been to enable folks, say, in
fraud and risk to be able to refine prompts and add additional automation to their workflows.
Yeah, and it's a secret fourth pillar, the platform.
Yeah, yeah, exactly.
That is the thing that ties it all together, exactly, is the platform.
And I think what's been really nice is that even though the platform is kind of a loose
term because it consists of a wide
variety of technologies, as I said,
like we haven't been too religious
or dogmatic about everybody
needing to be on one particular thing.
What we've seen is that
by making a variety of sort of ergonomic
options for building with LLMs available,
it really has made it easier
for us to make a quick leap forward
on operational AI.
As soon as we put our mind to it,
we said, look, no, we want to hit
80% automated acceptance rate
for all startup and commercial businesses
that apply for Brex.
Like we want a decision within 60 seconds
that's fully touchless, no humans involved.
We're able to break that down
and then actually build the agents,
build the tools on top of that platform really quickly
and a lot of those tools are the same tools
that our product AI agents use as well.
I was pretty sold on the ConductorOne thing.
I don't know if this is under exactly this bucket.
Oh, yeah.
Provisioning command.
I was like, yep, I want that.
Yeah, that was actually,
I'd love to talk about that.
So that's actually on the corporate side.
And I think that this goes back to maybe another intuitive, but I'd say like bold decision that we made,
which is that we're not going to try to pick winners in the horse race
between the foundational model providers or the agentic coding tools or like basically anywhere
where there's an active horse race.
What we do instead of like trying to pick a single solution is we will procure like a small number of seats
for multiple solutions, and then we'll give employees the ability to pick whichever one they want to
use. And so, for instance, we allow employees to basically go into Slack and use ConductorOne to
get a ChatGPT, a Claude, or a Gemini license, and basically you can just build your own stack
where you pick your chat provider. As a dev, you can pick, you know,
between like Cursor, Windsurf, Claude Code credits, and you can basically craft your stack to
your preference and easily switch between them. And what that
does for us too, obviously we have sort of enterprise agreements in place for all of them for the sake of, you know, the privacy and non-training guarantees, but it's fun because when we go to renew these contracts, we can basically resist the need to do a wall-to-wall deployment. We can say, hey, look at the usage trends. Our employees are voting with their feet, they're voting with their dollars, and, you know, maybe your tool isn't as hot as it was a year ago.
Does it give you a dashboard of what people are choosing?
Yeah, actually, we'll
look at that. We were looking at that as we're going into budgeting over next year. It's very
interesting. I would love to see that, what's, you know, anything that's like really up,
anything that's really down. It's fascinating how different the landscape is every three
months. And I think one of the interesting challenges we had early on was
getting folks to just like try these tools, try to incorporate like agentic coding.
You know, like early on, I say like 12 to 18 months ago now, like get folks to just take
the time to try a new workflow.
And now at this point, I think what we're seeing is like, even if, you know, a new model
hits the scene, like when Codex came out and everybody was like, oh, Codex is better at codegen,
it's a little bit slower.
Like, I find fewer folks are like kicking the tires on new things because like they're just
so comfortable with the ergonomics of their current workflow that, you know, some folks
are just like, I want to stick with Claude Code because I know it now. I've been
working with it for like nine months, so I don't need to keep switching.
I don't feel the incessant need to keep trying new things. I'm an iPhone
person and I'm just like going to stay with an iPhone even, you know, even though there's some
really sexy Android hardware out there. Do you have one of the big numbers, like 80% of all of our
code is written by AI? Or how do you measure it internally? Yeah, no, not really. I mean,
what we do is we'll measure like the attributions on the number of commits that have the, like,
"Co-authored-by" trailer.
And we pull some of those stats,
but, in fact,
I don't index on those at all.
And honestly, like,
I don't know how I'd,
like, honestly calculate that number.
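A minimal sketch of the kind of commit-attribution stat described here, assuming AI tools stamp their commits with a `Co-authored-by:` trailer. The marker names in `AI_MARKERS` and the sample commit messages are hypothetical, and the resulting share is only a rough proxy, not a measure of how much code AI actually wrote.

```python
import re

# Hypothetical substrings that identify AI co-authors; adjust to the tools in use.
AI_MARKERS = ("claude", "cursor", "copilot")

# Matches one "Co-authored-by: Name <email>" trailer per line, case-insensitively.
TRAILER = re.compile(r"^co-authored-by:\s*(.+)$", re.IGNORECASE | re.MULTILINE)


def is_ai_assisted(message: str) -> bool:
    """True if any Co-authored-by trailer in the message names a known AI tool."""
    return any(
        marker in author.lower()
        for author in TRAILER.findall(message)
        for marker in AI_MARKERS
    )


def ai_commit_share(messages: list[str]) -> float:
    """Fraction of commit messages that carry an AI co-author trailer."""
    if not messages:
        return 0.0
    return sum(is_ai_assisted(m) for m in messages) / len(messages)


# Made-up commit messages for illustration only.
sample = [
    "Fix rounding bug\n\nCo-authored-by: Claude <noreply@anthropic.com>",
    "Update README",
    "Add retry logic\n\nCo-authored-by: Jane Doe <jane@example.com>",
    "Refactor parser\n\nco-authored-by: Cursor Agent <agent@example.com>",
]
print(ai_commit_share(sample))
```

In practice the messages would come from the repository history, for example by splitting the output of `git log --format=%B`. The number only counts commits that a tool chose to tag, which is one reason not to index on it.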
Yeah, I agree.
Yeah.
And so the thing is,
you know,
we're at the point now
with the, like,
our AI,
agentic coding journey,
where now we're trying to solve
the second order effects
of like a little bit too much slop,
maybe a little,
not enough
yeah exactly
not enough like
rigor in code reviews
we're trying to
the adoption is there
and now we have to figure out
like how to mature
in our usage of these tools
so that
quality or like long-term
maintainability doesn't suffer
as well as like
maybe one of the other facets
of being able to generate
a lot more code more quickly
is like the
drift between
team members as far as like
understanding of the code
that's in their services
increases as everybody's moving faster and more independently.
That is another sort of risk that we're starting to see.
Like, you know, an incident response where folks don't know,
they don't know a service as well as they used to because it's changed so much in the past
couple months because everybody's moving more quickly.
Yeah, this has been a major topic for me this year on codebase understanding and
slop because obviously it's so much easier to generate code, but then now we have to review it.
And to some extent, you can't really fight AI with more AI.
you can't just
throw an AI reviewer on the AI code and say you've solved it.
And so you do need to just scale human attention.
And I think that's something I've been pushing a little bit in terms of like,
well, you're just going to, like every engineer is just going to own more code.
Yep.
Period.
And be parachuted in and be expected to ramp up and be productive and also fix bugs.
And if you're on, you know, PagerDuty or whatever, too, just because, I mean,
everyone's going to try to be more efficient and you're supposed to see ROI on productivity.
Because if you don't, then what's the whole point of this?
Exactly.
Exactly.
And I think it's funny, you're going back to the point of, you know, you could add AI on top to solve the problems that the AI introduces.
And you just keep, that's like an endless chain.
And so.
But I mean, the CodeRabbits of the world, the Graphites of the world would say, yes, actually, you can.
And so that's the little bit of the tension there.
Yeah.
You know, I've been thinking a lot about how the craft of engineering is evolving.
and I will say that I feel further away from being able to predict what it looks like
than I did this past summer when I spent a bunch of time.
I actually basically went on leave for a month and joined the AI team
that we were building just to go and build alongside them.
I felt like it was really important for me to deeply understand the problems in the tech.
And so that was me.
I was writing, pushing code effectively 996.
And I went through so many different moments of realization of like, oh, my God, this is going to change everything to, oh, my God, this is just amplifying all the good and the bad in the industry to, oh, my God, engineers are not going to have a job anymore.
And so I don't have any, like, I felt like I had all the predictions back then.
And at this point now, I'm just very interested to watch the phenomenon continue to unfold in front of us.
And I will say, I was chatting with a bunch of really bright, you
know, college juniors and seniors at a dinner we hosted last night. And all these folks are about
to enter the industry, basically having kind of come up in the era of agentic development and
LLMs, and I asked them, like, what is your workflow when you're like building a project?
How do you use agents versus like when you decide you're going to actually just write code by
hand? And I was surprised to hear the consensus was that most people there were using agents
to collaborate on like building a design document
and like collaborating on the architecture
of the solution that they want to build
and then they'd be asking it to like emit
a doc or an implementation plan
but then they'll go and write a lot of the code themselves still
so it's a little bit more of the
rubber duck co-architect
use case that was most prevalent in that group
I was very surprised by that
I'm impressed the kids are all right
yeah I know they still want to they still want to actually
write the code themselves that's interesting
Yeah, what we hear from like the Gen Zs at OpenAI, they just YOLO everything into Codex.
Yeah, I would say most of the code I generate is like, yeah, but I spend a lot of time on the doc.
It's curious, like, when you're like younger in your career, it's like you don't really have all the mental models of the different patterns to instruct.
I feel like there's like overreliance, especially if you're doing the design doc, you know.
I feel like most of the senior engineers will spend more time on that.
It's like, even things like, you know, what columns should you index?
depending on, you know, what queries we usually run on this table and things like that.
It's hard for any AI to know that, you know?
And it's like, I feel like the role of like the more senior engineer should actually be more of this.
It's like spending time teaching the AI and then the AI can teach the junior people in a way.
Yeah, yeah.
And it, everything, everything looks like mentorship and management at the end of the day, right?
It's like you're breaking down tasks, you're supervising work, you're giving feedback.
like it's basically management.
Except that agents are really bad at memory still.
They basically have zero memory.
And it's 2025.
What's going on?
Yeah.
Yeah, what's your internal stack for like, uh, preferences?
There's like, kind of like, you know, explicit preference you can use with, uh, you know,
AGENTS.md and all that stuff.
Uh, there's implicit preference with lint rules and things like that in a way,
where it's like, it just happens.
You don't have to tell it.
How do you structure that?
Oh, and are you talking about for agentic coding or memory within our LLM platform?
Yeah, yeah, for like the coding specifically.
It's like, and then we can kind of talk about, you know, the whole Brex platform.
Yeah, just nothing special, just a lot of explicit rules.
.md files.
Yeah, and then, in linting, we still have like traditional linters in place for the couple of different language toolchains.
And then we're big fans of Greptile and we use them for basically all of sort of the, um, smarter than linting, uh, like,
agentic code review. That's been the one
solution that we've aligned around that
has served us extremely well.
Yeah.
Greptile. Yeah, I know. We're
huge fans. They've built something really
impressive. And I think the thing that constantly
blows my mind about it is
the way that they're able to
just have a really impressive signal
to noise ratio. Like the
comments that it leaves are very
very high signal.
I never regret going through all like
65 comments that it leaves on
my diffs because it catches so many things.
Yeah.
I found the Codex review to be really good.
I don't use Codex for code generation,
but the review product is like very good for some reason.
I used to have, when I was working in Rails,
there was like this project called Danger Systems.
Oh, yeah.
It was kind of like a semantic linter.
Exactly.
I feel like there should be more of that now.
It's kind of like the rules are one thing,
at generation,
but I want something in my CI that is like,
enforce these rules and call out where they're broken.
And then I can just copy paste that in an agent.
But yeah, when we started building this new agent codebase, like as we were saying, like
we were answering the question, what would you do if you built, uh, you know, a Brex disruptor today? And it's
like, it wouldn't be to pick Kotlin and Elixir as the back end. And so we actually went with the
full like TypeScript stack, and we were building on all like public interfaces and, um, really
trying to make sure that this agent layer was, uh, like arm's length from the good and the bad
of the core of our product.
And one thing, I think what we did early on,
and I don't actually know if this is true
because, again, the team keeps sort of iterating.
But we were having good luck using Claude Code,
like in a GitHub Action, to basically go and do more
of that Danger-style like code review.
So have a prompt for it that went through
all of the different facets that were more conceptual
versus, like, rigidly enforceable by a linter
and have it leave a big comment at the end on
your conformance to the idiomatic coding patterns of the new repo.
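A rough sketch of that Danger-style review step, assuming the prompt is assembled from a list of conceptual conventions plus the diff. The `CONVENTIONS` entries, the `complete` callable, and the stub model are all hypothetical stand-ins; nothing here reflects how Claude Code or any real GitHub Action is actually invoked.

```python
from typing import Callable

# Hypothetical conventions; a real list would come from the repo's own guidelines.
CONVENTIONS = [
    "Agent code talks to the core product only through public interfaces.",
    "Every new tool ships with an eval covering its main path.",
]


def build_review_prompt(diff: str, conventions: list[str]) -> str:
    """Assemble one prompt that walks the model through each convention."""
    rules = "\n".join(f"{i}. {rule}" for i, rule in enumerate(conventions, 1))
    return (
        "Review the diff against each convention below. "
        "For every convention, answer PASS or FAIL with a one-line reason, "
        "then finish with a single summary comment.\n\n"
        f"Conventions:\n{rules}\n\nDiff:\n{diff}"
    )


def review(diff: str, conventions: list[str], complete: Callable[[str], str]) -> str:
    """`complete` is whatever function sends a prompt to a model and returns text."""
    return complete(build_review_prompt(diff, conventions))


# Stub model so the sketch runs offline; a CI job would swap in a real client.
def stub_model(prompt: str) -> str:
    return "1. PASS\n2. FAIL: no eval added\nSummary: add an eval for the new tool."


comment = review("+ def new_tool(): ...", CONVENTIONS, stub_model)
print(comment)
```

Keeping the conventions as data rather than hard-coding them in the prompt makes it easier to see in review when a rule was added or changed.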
I wanted to spend some time. You said you wanted to dive in on operational agents.
Customer support, onboarding, KYC, fraud, delinquent accounts, disputes.
This is, I imagine, the bulk of it.
Yes.
Of the work.
Anywhere where there's a good story about maybe when you started out, it was going to be this way,
and then you discovered through building or through customer contacts that it had to go a different direction.
And so that difference in beliefs is something that
people can learn from. The thing that immediately comes to mind is that we, uh, we believed at the
beginning that using RL for credit decisions would actually be the way that we
would end up, or like credit and underwriting, like how much of a limit should we give to
this business, um, that reinforcement learning would be the way that we would go about, um,
building a model that effectively would decision in the way that, um, a human underwriter would.
And it turns out that, we made this big investment.
We were working with, like, an outside company that specializes in this.
And the performance we ended up getting was inferior to just building, like, a web research agent.
And so I think what we took away, what has been most evident in operational AI is that in operations,
you need to be able to break down problems really granularly and be able to form SOPs.
that humans can repeatedly follow and thus can be audited
because so much of the responsibility in operations
is to have auditable, repeatable processes
that help to ensure that we're operating in a compliant manner.
And that actually translates just so cleanly to LLMs
that we haven't needed to use too many sophisticated techniques
in operational AI.
It's been relatively simple, like,
you know, tool-using agents, or maybe
even a lot of problems can be solved with like a single-turn chat completion.
And so the fact that we, well, we did one sort of attempt to over-engineer
and use more sophisticated techniques.
And we discovered that, in fact, the solutions are a bit more plain and less technically
sophisticated.
The challenge is really articulating and refining prompts to reflect the execution of
the SOP, and, like, reflect all the sort of institutional knowledge that isn't written down
so that agents can properly replace, like, the humans or the contractors we would have making these decisions.
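To make the SOP idea concrete, here is a small sketch, assuming an SOP is a fixed list of written steps and every step's instruction and output is logged for audit. The step names, the `"pass"`/`"fail"` outputs, and the rule-based `run` functions are hypothetical; in a real system each `run` would wrap a model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    instruction: str            # the written SOP text, which doubles as the prompt
    run: Callable[[dict], str]  # stand-in for an LLM call; returns "pass" or "fail"


@dataclass
class AuditRecord:
    step: str
    instruction: str
    output: str


def run_sop(steps: list[Step], case: dict) -> tuple[str, list[AuditRecord]]:
    """Run each step in order, recording what was asked and what came back."""
    trail: list[AuditRecord] = []
    for step in steps:
        output = step.run(case)
        trail.append(AuditRecord(step.name, step.instruction, output))
        if output == "fail":
            # Stop at the first failed step and hand the case to a person.
            return "refer_to_human", trail
    return "approve", trail


# Hypothetical rule-based stand-ins for model calls.
steps = [
    Step("legitimacy", "Confirm the business has a verifiable web presence.",
         lambda c: "pass" if c["has_website"] else "fail"),
    Step("icp_fit", "Confirm the business type is one we can serve.",
         lambda c: "fail" if c["industry"] == "prohibited" else "pass"),
]

decision, trail = run_sop(steps, {"has_website": True, "industry": "dental"})
print(decision, [r.step for r in trail])
```

Because each step is small and logged, a reviewer can audit a decision the same way they would audit a human following the SOP.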
How do you decide what is worth, like, spending a lot of time building versus what you think?
Some of these models are just, because some of these tasks are so generic.
They're not really about Brex.
Yep.
Like, you can assume the models will be good at it versus some of them are, like, very specific to you.
We kind of prioritize, like, the tasks that are most common for the broadest number of customers.
and some of them are fairly intuitive,
like being able to research a customer
to assess legitimacy of the business
and whether that business would fit our ideal customer profile
for onboarding,
because there's certain types of businesses
that we either legally cannot serve
or we are not comfortable being able to serve.
So that's the type of really kind of basic research
and like a relatively straightforward problem
that isn't hyper-Brex-specific.
The things that are a little bit more specific
to us or companies in our sector
would be preparing documentation
for a network card dispute.
Like if you go and dispute a transaction
on your personal card,
you will provide evidence to your card issuer,
the card issuer then has to put together
like a three or four-page Word document
that goes to the card network
and then eventually goes to the acquiring bank
and all of that is like much more specific to our business.
It's a huge operational overhead for us.
And that's something that we decided to automate later
because
it's not on the critical path of like serving the vast number of our customers.
Like disputes are an expensive,
but not very common operational process.
And so they're lower on the stack.
And I think we're getting there right now.
But this year has basically been us just kind of like looking at every single process,
just kind of stack ranking.
And I will say like,
The thing that got us started down this path was we wanted to expand our ideal customer profile to support more business, like a wider variety of commercial businesses, which tend to be businesses that aren't growing as quickly.
So they're not like tech startups, which have a lot of growth and they're not usually like, they're not enterprises, which also tend to have a lot of growth.
It's more like a law firm or a dentist's office, these types of like solid businesses that we should be able to serve and underwrite,
but the cost to onboard them and the cost to serve, if you have all the humans in the loop, make them ROI negative.
And so that was the first sort of use case of AI within our ops organization that then led to us really understanding we could automate much more than that.
Is this Brex going back into SMBs?
Ah, that's a good question.
Yeah, yeah.
So never let that die.
You know, the way we've thought about this is we want
to always like offer our product to customers where we believe we have, like, an offering that is
well suited to the needs of those businesses. And I would say that still for very small,
uh, businesses, our offering isn't, it's not built for that.
It's built for companies that have some degree of scale, typically have at least sort of one person,
if not a couple people in their finance team. So we consider these to be more like the
commercial segment. And so
it rhymes with SMB, but our approach back then was a little bit more naive.
And I would say we also, we were just going for a volume, like a volume game there.
Our internal controls were not as strong.
We didn't have as much experience like underwriting those businesses.
And so it really ended up being a huge burden for the business, almost existential,
for us to have those tens of thousands of customers that all were
ROI negative. So we're trying to basically scale to serve more businesses outside of tech and
outside of like the upmarket segment, but, but do it thoughtfully. So I think right now our
minimum threshold is like a million dollars a year in annual revenue or like $10,000 or more
per month in card transactions as kind of being like the low end of our ICP, which is obviously
not what you would think when you think of a small business. Like small businesses tend to still be
smaller than that. Oh, wow, that's really small. Okay. Yeah. Yeah. Mid-market. Yeah, exactly. And it's
funny, it's just like the names of these segments, you know, it's like what we consider. I don't know.
Yeah, no, I think like that's, it's like, yeah, it's like lower mid-market. And it's funny, though,
because what we call enterprise may be, you know, a
business that another salesperson might call mid-market, right? Like, because it just depends on the scale of
yourself as a business when you use these terms. And all of these things are built in the Brex agent
platform, like all these automations that people build?
Yes, exactly.
Yeah, and in fact, most of the operational AI is running on that original platform that we have.
And one element of it that I didn't mention is that most of the UI/
UX for this platform is built in Retool.
And so, like, you can basically go into Retool, and there's like a prompt manager,
a tool manager, an eval manager.
And that's sort of where much of this was built.
And the goal with that was, again, to make it more accessible.
more ergonomic to get started, but a secondary effect of having a more like visual
set of tools for this is it's enabled members of the ops organization to go and do prompt
refinement themselves. So you don't need engineers to go and refine the prompts or even like
test new foundational models when they come out. I think that that's another fun thing,
when a new model drops, folks will go into the platform and basically run the evals on the
new model and kind of see,
can we get better performance here?
Does this have different latency
or different cost characteristics?
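A minimal sketch of that "run the evals on the new model" comparison, assuming each model is wrapped as a function that returns an answer plus its cost. The `(answer, cost_usd)` signature, the eval cases, and both stub models are hypothetical and stand in for real gateway calls.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected: str


@dataclass
class Report:
    accuracy: float
    avg_latency_s: float
    total_cost_usd: float


def run_suite(model: Callable[[str], tuple[str, float]], cases: list[EvalCase]) -> Report:
    """Run every case through `model` and aggregate accuracy, latency, and cost."""
    if not cases:
        raise ValueError("need at least one eval case")
    correct, cost, elapsed = 0, 0.0, 0.0
    for case in cases:
        start = time.perf_counter()
        answer, call_cost = model(case.prompt)
        elapsed += time.perf_counter() - start
        cost += call_cost
        correct += answer.strip().lower() == case.expected
    n = len(cases)
    return Report(correct / n, elapsed / n, cost)


# Hypothetical eval set and stub models for illustration.
cases = [
    EvalCase("Dental office, 12 employees", "approve"),
    EvalCase("Entity on sanctioned list", "decline"),
]


def current_model(prompt: str) -> tuple[str, float]:
    return "approve", 0.002


def new_model(prompt: str) -> tuple[str, float]:
    return ("decline" if "sanctioned" in prompt else "approve"), 0.004


for name, model in [("current", current_model), ("new", new_model)]:
    report = run_suite(model, cases)
    print(name, report.accuracy, round(report.total_cost_usd, 4))
```

Putting the three numbers side by side is what lets a domain expert, rather than an engineer, decide whether a new model is worth the switch.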
Yeah, you want the domain experts
or the people directly using the tool,
not the engineers who are somewhat removed
from the tool.
Yeah, I do want to highlight to listeners
that a lot of the Brex agent platform
are just things that every company should have,
basically. Prompt management system,
which we talked about where the domain experts are doing it,
multi-model testing, evaluation and benchmarking frameworks,
API integrations for automated workflows,
MCP-based architecture
integrated with Brex's external AI products.
This one is obviously very Brex-specific.
One thing I did want to highlight that I was
semi-impressed by, because
very few people talk about this,
is a knowledge base for understanding Brex's business.
Yeah.
So do you want to expand on that?
Yeah, and this is an area where we've only scratched the surface here,
but a big challenge that we face is that the world knowledge
or the knowledge that's built into the model about, you know,
what GPT-5 thinks Brex does and how it thinks our business operates
is actually quite different from what our business offers today
or how our product works.
And so we've had to work on building a corpus
of sort of product documentation, process documentation,
and, like, curate this set of information
to basically ground a variety of our LLM applications,
including, like, the Brex assistant,
which is like the assistant that employees will talk to. Like, we don't want it to
hallucinate features that we don't have or like give wrong information there. And similarly, like
some of the operational agents need to be grounded on like what our ICP is, because if you ask
GPT-5 right now like what types of businesses does Brex onboard or like what types of
businesses does Brex serve, it might not give an accurate answer
to that question. It might say, hey, we're a corporate card for startups, which is what we did, you know, seven years ago.
It might say we only serve enterprises. And so that has been an interesting challenge. I think what we've been trying to do there is, I'm actually going to be spending time with folks talking about this next week internally, about like, can we refresh our strategy and kind of unify it, because we have a lot of product documentation that's internal for like our operations and go-to-market teams. We have a bunch of product documentation that's external for our customers.
We have a lot of go-to-market sort of enablement material that's more sales pitchy.
And we have documentation that is put into Sierra, which is the chat assistant that we use for frontline support.
Like all of this ideally could draw from the same source, but right now it's a little bit fragmented.
It's just something that we're trying to invest in though because I think at the end of the day,
the duplication of efforts is just wasteful, and
it's absolutely necessary to get this right.
Just to disambiguate, Sierra meaning the Bret Taylor startup?
Yes, exactly.
I would expect that.
You built so many other agents.
That's one you can build yourself.
That's like solving problems that are not differentiated enough for us.
I think what's interesting about Sierra that has been really helpful is that, again,
it's really easy for, like the UI and UX of basically administering a Sierra agent
is something that's really accessible for the ops and CX strategy team,
which are like, it's much more low code and more sort of workflow and DAG-oriented.
And we have engineers kind of going and giving it tools to take actions.
But for the most part, like it's nice to not have to build the UX for somebody to manage something like that.
And I think the fact that Sierra speaks the language of customers.
Yeah, exactly.
Speaks the language of CX.
They can do all the reporting and the telemetry and stuff that our, you know, VP of CX
would like to see.
You know,
it's just one fewer thing
that we have to build.
What about evals?
How do you build evals?
Who manages them?
Well, it depends on,
it depends on the application.
So on the
operational AI side,
those evals are basically
baked into the
platform around
every prompt or every agent.
And for the most part,
I think most of these
use cases kind of come online,
like the V1 of like our commercial
underwriting agent,
or the V1 of
our startup KYC agent, are co-developed between like a subject matter expert in
ops and like an engineer, and they're going to kind of co-develop an initial eval set. But then from
there, generally in ops you're always doing QA, be it like on humans or on the LLM decisions.
And so as part of our QA feedback loop, whenever there is a mistake, that's
almost always going to result in like another eval being written as like a regression test.
So all of that within ops AI is pretty straightforwardly managed.
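One way to picture that QA feedback loop, as a sketch only: each QA finding where the reviewer disagreed with the agent becomes a permanent regression case. The `QAFinding` and `EvalSet` types, the case IDs, and the decision strings are hypothetical and not Brex's actual schema.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class QAFinding:
    case_id: str
    agent_input: str
    agent_decision: str
    reviewer_decision: str


@dataclass
class EvalSet:
    # case_id -> (input, expected decision)
    cases: dict[str, tuple[str, str]] = field(default_factory=dict)

    def add_from_qa(self, finding: QAFinding) -> bool:
        """Record a regression case only when the reviewer overruled the agent."""
        if finding.agent_decision == finding.reviewer_decision:
            return False
        self.cases[finding.case_id] = (finding.agent_input, finding.reviewer_decision)
        return True

    def failures(self, agent: Callable[[str], str]) -> list[str]:
        """IDs of regression cases the given agent still gets wrong."""
        return [cid for cid, (inp, want) in self.cases.items() if agent(inp) != want]


evals = EvalSet()
evals.add_from_qa(QAFinding("kyc-001", "shell company, no website", "approve", "decline"))
evals.add_from_qa(QAFinding("kyc-002", "dental office", "approve", "approve"))


def old_agent(text: str) -> str:
    return "approve"


def fixed_agent(text: str) -> str:
    return "decline" if "no website" in text else "approve"


print(evals.failures(old_agent), evals.failures(fixed_agent))
```

The eval set then only grows from real mistakes, which keeps it tied to the decisions QA actually cares about.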
On the product AI side, that's where it starts getting a little bit more challenging because the multi-agent network is quite challenging to evaluate.
And so what we do there is we try to adopt some of the state of the art for multi-turn evals, where we will basically have an agent embody the user, and basically the end user agent is given an objective.
Then we basically have it run a multi-turn conversation and then use an LLM judge at the end to
do all of the different assessments.
The one other thing that we do technique-wise that is interesting is sometimes you don't want to do,
like, you know, I think these multi-turn evals are kind of like integration tests.
They sometimes test more than what you want to assess.
And so sometimes what we'll do is we'll also pre-can like an initial preamble to a conversation,
or maybe a couple turns will be handwritten,
and we'll basically set the eval to start from there
and we'll see if we're able to isolate certain
behaviors.
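The multi-turn setup described here can be sketched as a simple loop, assuming three pluggable pieces: a simulated user with an objective, the assistant under test, and a judge that scores the finished transcript. All three functions below are hypothetical scripted stand-ins for model calls, and `preamble` shows the pre-canned-turns trick.

```python
from typing import Callable, Optional

Turn = tuple[str, str]  # (role, text)


def run_conversation(
    user_agent: Callable[[str, list[Turn]], Optional[str]],
    assistant: Callable[[list[Turn]], str],
    judge: Callable[[str, list[Turn]], bool],
    objective: str,
    preamble: tuple[Turn, ...] = (),
    max_turns: int = 4,
) -> tuple[bool, list[Turn]]:
    """Simulate a conversation, optionally starting from handwritten turns."""
    transcript = list(preamble)
    for _ in range(max_turns):
        user_msg = user_agent(objective, transcript)
        if user_msg is None:  # the simulated user considers itself done
            break
        transcript.append(("user", user_msg))
        transcript.append(("assistant", assistant(transcript)))
    return judge(objective, transcript), transcript


# Scripted stand-ins so the sketch runs without any model.
def scripted_user(objective: str, transcript: list[Turn]) -> Optional[str]:
    if transcript and transcript[-1][0] == "assistant" and "booked" in transcript[-1][1]:
        return None
    already_asked = any(role == "user" for role, _ in transcript)
    return "Yes, go ahead." if already_asked else objective


def scripted_assistant(transcript: list[Turn]) -> str:
    last = transcript[-1][1]
    return "Your flight is booked." if "go ahead" in last else "Shall I book the 9am flight?"


def keyword_judge(objective: str, transcript: list[Turn]) -> bool:
    return any(role == "assistant" and "booked" in text for role, text in transcript)


objective = "Book me a flight to NYC"
passed_full, full = run_conversation(scripted_user, scripted_assistant, keyword_judge, objective)

# Pre-canned opening turns isolate just the confirmation behavior.
canned = (("user", objective), ("assistant", "Shall I book the 9am flight?"))
passed_iso, isolated = run_conversation(
    scripted_user, scripted_assistant, keyword_judge, objective, preamble=canned
)
print(passed_full, passed_iso, len(full), len(isolated))
```

Starting from a handwritten preamble means the eval exercises only the turn of interest, so a failure points at one behavior instead of the whole conversation.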
So it's still like a work in progress.
And I'd say like at the end of the day,
a lot of it is just periodic group human review
and like looking at cases where we've detected,
as we go to like summarize,
like what we'll do is we'll reflect on a conversation
after a certain amount of time
has passed, where we'll summarize it,
like extract assessments,
like did it seem like the user
accomplished their objective?
And we'll just manually
review a lot of the cases
where that's failed
and decide whether to write an eval for it.
Are all the evals supposed to pass?
Or do you have a set of evals
that are like,
someday the model will be good enough?
And like, how does that change over time?
Yeah, it's interesting.
I don't know if we have any that are like,
oh, someday I hope it'll be good enough to do this,
but it's like there are the evals that are blocking
because they would indicate like a regression,
an unacceptable regression.
So these tend to be just accuracy-related evals,
but then there are others that are more about like tone and coherency
and these types of things where they're more subjective
and we're just looking at those over time as a metric.
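The split between blocking and tracked evals could look something like this sketch, assuming each result carries a kind and a score. The `EvalResult` fields, the eval names, and the thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    name: str
    kind: str               # "blocking" (accuracy) or "tracked" (tone, coherency)
    score: float            # 0.0 to 1.0
    threshold: float = 1.0  # only enforced for blocking evals


def gate(results: list[EvalResult]) -> tuple[bool, list[str], dict[str, float]]:
    """Return (ok_to_ship, names of failed blocking evals, tracked metrics)."""
    blocking_failures = [
        r.name for r in results if r.kind == "blocking" and r.score < r.threshold
    ]
    tracked = {r.name: r.score for r in results if r.kind == "tracked"}
    return not blocking_failures, blocking_failures, tracked


# Made-up results for illustration.
results = [
    EvalResult("policy_lookup_accuracy", "blocking", 1.0),
    EvalResult("reimbursement_amount_accuracy", "blocking", 0.9),
    EvalResult("tone", "tracked", 0.72),
]
ok, failures, tracked = gate(results)
print(ok, failures, tracked)
```

Only the blocking failures stop a release; the tracked scores are simply recorded so they can be charted over time.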
But the timing is actually interesting.
I think we're going to get a big update on like how the team
is thinking about evals tomorrow in, like,
our Friday review. So this is an area where I'd say the largest challenge, like the largest
change we needed to make in how we're executing, from sort of being like a lab or an incubator back
earlier this year to like where we are now, where we've shipped and like we're trying
to increase the rigor, has been around like avoiding regressions and having
increasingly robust evals. Yeah, I work with a company called BRCIIs that does user simulations.
And I think that's what's been interesting.
Some of these things they just don't expect,
like the customer does not expect the model to do,
but they want to track the saturation of the model in a way,
if that makes sense.
And I feel like most companies know what they don't want to happen.
But it's almost like they cannot quite articulate,
oh, I want in the future the model to be able to do this.
It can't do it today, but I'll keep running this eval.
That's actually really, really interesting to me.
And I'm going to take that away and start thinking about this
because there are going to be certain, I mean, we've already seen this, where users will ask the assistant for help with things that we don't support yet, or we haven't implemented yet.
Those are opportunities actually for us to effectively write a test that's going to be failing for weeks or months and eventually will go green, but is a way for us to actually show the progression of sophistication of the assistant.
I really, I really like that as an idea.
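One way to picture the "failing for weeks, eventually green" idea is to mark some evals as aspirational, so a failure there counts as "not yet" instead of a regression. This is a sketch under that assumption; the status labels and eval names are invented for illustration.

```python
def classify(passed, aspirational):
    """Map one eval outcome to a status label."""
    if aspirational:
        # Expected to fail today; a pass means the assistant has caught up.
        return "graduated" if passed else "not_yet"
    return "pass" if passed else "regression"


def summarize(runs):
    """runs: iterable of (name, passed, aspirational) tuples."""
    statuses = {name: classify(passed, asp) for name, passed, asp in runs}
    blocked = any(s == "regression" for s in statuses.values())
    aspirational = [s for s in statuses.values() if s in ("graduated", "not_yet")]
    graduated = sum(s == "graduated" for s in aspirational)
    progress = graduated / len(aspirational) if aspirational else 0.0
    return statuses, blocked, progress


# Hypothetical run: one supported capability, two not-yet-supported requests.
runs = [
    ("lookup_card_limit", True, False),
    ("message_finance_team", False, True),
    ("book_group_travel", True, True),
]
statuses, blocked, progress = summarize(runs)
print(statuses, blocked, progress)
```

The `progress` fraction is the number to chart over time; only `regression` statuses would block a release.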
Yeah, I wonder how you also catch hallucinations and things
that it doesn't have. That's usually the problem. You know, it'll pretend like it can
assist with something. Like one thing that is really annoying that has been tough
to prevent is that the assistant, because it is used to speaking to other agents that can
support it in accomplishing various tasks, if you ask it to help with a task that it thinks
it probably should have an agent to work with, it'll just hallucinate that it does, like,
oh yes, you know, I'll reach out to the finance team on your behalf to
pass this question along, but it's not doing anything.
There's like no finance team.
There's no way for it to do that.
This is something that comes up a lot.
It's like, would you like me to ask the finance team?
And there's no, there's no actual tool for that.
Do you put guardrails in for that?
Yeah, yeah.
That was something that we had to.
Like regexes?
Oh, no, we don't.
I think we've been able to just beat that out of its system with a system prompt.
But the, we don't have as many guard rails in place right now,
just around a couple of potential
things that could get us into trouble.
Yeah, really, it's reasonable.
Yeah, it's surprising
when I, I guess, two years ago
was first kicking around the idea
of all these things.
I would have said that probably guardrails
would be more prevalent,
especially in finance use cases.
But surprisingly, they're not.
Yeah, and that was actually part of what we,
that was like a feature
I believe we built into the LLM gateway
early on, like the sort of
last chance, like,
like hard-coded.
Yeah, exactly.
Here's some regexes and a deny list.
Yeah, exactly.
Or just, you know, in the way that if you go far afield on ChatGPT,
you just get the inline 500 error.
It doesn't even tell you that it can't tell you, it just craps out.
We kind of built a couple of those circuit breakers, or the ability to put
those circuit breakers in, and I don't believe we're using them for anything.
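A rough sketch of what a hard-coded, last-chance circuit breaker in a gateway could look like: the model output is checked against a few regexes and, on a match, the caller gets a generic error instead of the text. The patterns, function names, and error message are all hypothetical; the transcript does not describe Brex's actual implementation.

```python
import re


class CircuitBreakerTripped(Exception):
    """Raised when model output matches a hard-coded block pattern."""


# Hypothetical patterns; a real deployment would choose its own.
BLOCK_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),             # looks like a US SSN
    re.compile(r"guaranteed returns", re.IGNORECASE),
]


def guard(text, patterns=BLOCK_PATTERNS):
    """Return text unchanged, or raise if any block pattern matches."""
    for pattern in patterns:
        if pattern.search(text):
            raise CircuitBreakerTripped(pattern.pattern)
    return text


def gateway_complete(call_model, prompt):
    """call_model is any callable that takes a prompt and returns a string."""
    try:
        return guard(call_model(prompt))
    except CircuitBreakerTripped:
        # Fail closed with a generic error; do not explain what tripped.
        return "Something went wrong. Please try again."


print(gateway_complete(lambda p: "Your card limit is $5,000.", "limit?"))
print(gateway_complete(lambda p: "SSN on file: 123-45-6789", "ssn?"))
```

Failing closed with an opaque error mirrors the "it just craps out" behavior described above, at the cost of a worse user experience when the breaker trips.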
One last thing I want to get your thoughts on was AI fluency levels,
which you guys have a framework of user, advocate, builder, native, and everyone goes through
it, including Camilla.
And I just think it's interesting.
I think it's a model that other people are thinking about adopting,
but they're worried about rolling it out.
That everybody's going to be bad.
And also, like, how do you have, like,
this in-house training course that you keep up to date?
Just tell us more about it.
Yeah.
So in the operations org,
they're actually even further ahead than engineering on this front
as far as trying to
create learning pathways for this.
And I think that part of the reason why they're ahead of us is that in operations, they have
to do a lot of training at scale.
Like training is a very big part of how people build the aptitude around their job function
within ops, whereas in EPD, a lot of it is sort of getting hands on, building
experience, doing a lot and getting mentored, getting code review.
But it's been really neat because I think we created an environment,
we managed to, by speaking openly about the transformation that we saw would happen in this industry
towards AI displacing a lot of the operations and CX roles, and we were just honest
about it. And in the same breath that we said, hey, a lot of these job responsibilities
will go away, we also said we don't anticipate that meaning that your job has to go away. It's just
that your job has to change.
And so the fluency framework and then the training and support and like the positive sort
of culture where we celebrate people making progress has been really helpful for like
avoiding a culture of fear or like, oh, you have to do this or
it's going to go in your performance evaluation.
I think it does.
Well, it's not as rote as, oh, how much are you using AI and is it enough?
It's more, I think we've built a pretty positively framed culture where we'll do spot bonuses for people who have particularly novel uses of AI in their day to day.
In our company all hands every two weeks, we'll do an AI spotlight.
And it's very rarely somebody in EPD. For the most part it's folks in ATMs, ops, finance, the people organization showing off how they're building agents, you know, in ChatGPT or on Glean, or how they just found some new use case that they thought was helpful.
So we're trying to create, I think at the end of the day, like, we've hired a bunch of really smart people who, like, I have full confidence that this type of work is within the reach of anybody who's motivated to, like, sort of challenge themselves.
And so we've done that.
And in engineering, there's one other thing that I want to call out, because I think that this is kind of fun, which is that we adapted our interview loop to be more AI, sort of agentic coding native.
So we had a coding and a system design question that we basically have revamped into a project where we'll give you a brief before you come on site and then an additional sort of spec when you start.
You know, we expect you to use agentic coding to complete the task.
In fact, it's kind of impossible to get all the way through it if you don't.
And so we're evaluating, you know, your knowledge, like we're kind of watching how you work.
We're evaluating whether you understand the code that's coming out.
We, you know, we're kind of probing at you as you go.
But what we did in order to kind of
bootstrap the process of all of our existing engineers
getting familiar with agentic coding is that,
as soon as we had the interview ready to ship,
we said everybody in engineering,
including all the managers, is going to have to go through this interview.
And so we re-interviewed everybody internally.
And it's like, it's one of those things where it's like,
it's not a, we didn't like keep a score or like,
or like, you know, I don't have any data on like who passed or failed or what they,
what they scored.
but what we found is, as people would take it, it would actually cause them to have moments of realization where it was like, oh, I can up-level my skills, or, I want to be better at this. And so we're trying to find a variety of techniques that kind of push the culture along. And as I reflect on the year, because this is the year where we really put all the effort into it, I'm really satisfied to see the extent to which everybody's leaning in on a daily
basis. Going back to, like, even I was shocked when we were looking at our Cursor logs that
the number one user is an engineering manager in our infra org. It's like, that is super
cool to me. It means that folks have taken this to heart and found ways of
doing their job differently. I guess I had a closing question or I guess a parting question.
And this is broadening out from Brex. Yeah. And this is just you interface with other engineering
leaders all the time. Did we not cover anything that other CTOs are
having as top of mind today, like, their number one problem is underscore?
The thing I find myself discussing with folks that, and I don't want to shy away from,
like, scary topics. In fact, we're just kind of on one that was adjacent, which is, like,
how do you evaluate somebody's, like, progression towards being more AI-Native?
The cousin to that question is it's like, will we need as many people to operate our businesses?
Like, are there layoffs coming? How are we thinking about, like,
headcount growth. Junior versus senior. Junior versus senior, yes, exactly, like level mix. And
I still have more questions than I have answers there. I think what has been really interesting
is that I view agentic development as being something that amplifies all the good, just as much
as it amplifies all the bad. It amplifies sloppiness, poor architectural thinking,
misunderstanding of the requirements.
Like there are, for all of the acceleration of good outcomes, it also accelerates bad outcomes.
And like what has been interesting is that there has been, when you sum that all together,
there's less of an obvious capacity increase.
It's more nuanced than that.
And so I'm not looking at headcount planning, as we think about it next year, as being something like,
you know, oh, well, because AI is giving us so much more leverage, we don't need as many people.
We've actually, the thing I'm really proud of in my tenure as CTO is that we haven't grown engineering at all.
What we've done is we've grown the business significantly, but we've been able to build, like, greater efficiencies in how we execute, like how we think about building, how we roadmap, what we choose to do and what not to do, that we're able to serve significantly more customers with more lines of business.
without needing to grow engineering.
I think that's kind of the way
that we're going to just continue on this road.
It's like, I like having 300 engineers.
Like, I would love to just, you know,
a year from now have 300 engineers,
but we're still, you know, 30, 50, 100% more efficient.
That is the thing that comes up with other engineering leaders
and the other part of that conversation
is like how much is AI getting blamed
for this sort of ordinary performance-oriented RIF?
You know, like if Microsoft is letting go of like four
thousand people, as a business that has, what, 150,000 employees, I believe, is that really
AI causing that, or is it them just using it as a way to avoid some harder perf management
decisions? I'm not entirely sure, but I'm listening more than I'm speaking on this topic because
every time I feel like I have a pretty firm point of view, some new anecdote or experience
comes in that kind of challenges or invalidates it. Yeah. Well, you know, I take these signals as
my job to go find people who think they have answers and surface them.
And you may or may not disagree, but at least you have something to use as a strawman in your work.
Exactly, exactly.
And I think as an industry, it's just early innings on this transformation.
So I'm looking forward to seeing, you know, listening to this podcast episode a year from now
and seeing, you know, what we got right, what we got wrong and what's different
because so much changes quarter over quarter.
I do think an AI CoE is a very well-established pattern.
I think an internal platform is a very well-established pattern.
And this fluency thing is something that people are figuring out that I think you guys are ahead on.
I'm happy to hear it.
That'd be my feedback.
Yeah.
Any final call to action for things that you want to buy?
Like, what should people build for you?
Like problems you're trying to solve that you would love people to reach out to help with.
The call that I'd make is for folks who are interested in multi-agent networks,
to get in touch with us because I do feel like this is something where we're innovating
in service of our customers and where I feel like the frameworks, the tooling, and the research
is there. There's actually quite a lot of like interesting papers and things that we lean on,
but I would love to see more of that encoded in what's
available at large in the industry, because I feel like my intuition has been that trying to craft
LLMs into deterministic workflows and DAGs is kind of underselling like the power that they
have to actually plan and execute in a more sophisticated like fluid way. And I, and I just want to
see like the industry lean in more on, on these agent to agent interactions. Okay. So I'll
dive in a little bit here. Yeah. I have a minor opinion. You keep using the word networks.
Yeah. Is that a reference to a specific paper or it's your
term for it. It's just our term. And I think that's actually the term that Mastra
uses as well. Yeah, initially we used to call them agent runtimes internally
and then we just, yeah, switched to networks. And then I think the other thing I wanted to
get a clarification on is, is it mostly a full agent talking with a full agent? Or is there
kind of like an orchestrated boss agent talking to a sub agent? And I think that does matter for a
subset of people who are building all these things.
because when you say multi-agents,
sometimes people don't agree what that means.
Yeah, so it's a tree more than it is a graph.
So it is like, yeah.
When you say network, it feels more of a graph.
Yeah.
But it seems more directional as a tree.
Like there is a hierarchy.
There's a hierarchy, yeah.
But there are some violations of that.
Like one of the interesting use cases,
and this is where like the power of having an assistant for every employee
plus having agents that run.
and embody members of the finance team is really powerful because there's this interesting use case
that we brought to market, which is that one of the finance team agents that we launched
is an audit agent where an audit agent kind of embodies the work that a lot of larger finance
teams will do to look for patterns of waste, fraud, or abuse or like systematic avoidance
of policy that isn't as obvious with a single expense.
You can evaluate a single expense in the metadata around it to see if it's within policy or not.
But what if you start seeing an employee often make a large number of like $74 transactions when receipts are required at $75?
Or what if you see certain things like, oh, okay, there's actually a fair number of DoorDash expenses during business hours from this individual on days that in-office lunch is provided.
Or maybe you see ride share patterns where you have to look at a
broader context.
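The $74-versus-$75 example can be illustrated with a simple cross-expense check: no single expense is out of policy, but a cluster just under the receipt threshold is worth flagging. This is a deterministic sketch, not the audit agent itself; the $2 margin and the minimum count of three are assumptions.

```python
from collections import Counter

RECEIPT_THRESHOLD = 75.00  # receipts required at or above this amount
NEAR_MARGIN = 2.00         # assumed: how close counts as "just under"
MIN_COUNT = 3              # assumed: how many near-misses form a pattern


def flag_near_threshold(expenses, threshold=RECEIPT_THRESHOLD,
                        margin=NEAR_MARGIN, min_count=MIN_COUNT):
    """expenses: iterable of (employee, amount). Returns flagged employees."""
    counts = Counter(
        employee
        for employee, amount in expenses
        if threshold - margin <= amount < threshold
    )
    return sorted(e for e, n in counts.items() if n >= min_count)


# Hypothetical data: each expense passes a single-expense policy check.
expenses = [
    ("ana", 74.00), ("ana", 74.50), ("ana", 73.99), ("ana", 75.00),
    ("ben", 74.00), ("ben", 20.00),
]
print(flag_near_threshold(expenses))
```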
So we built this audit agent that can ingest your SOP and also ingest your...
This is a box.
This is customers.
Exactly.
Yep.
And what it does then is it's basically always looking for potential violations.
And what it does is it is extremely zealous.
Like it wants to have a minimum number of false negatives.
So it will raise a large number of potential violations.
And then a separate agent, a review agent will then apply wisdom, the wisdom of like,
Is this important enough to follow up on? Is the dollar amount in question high enough?
Does this user seem to have like a high compliance behavior more generally? It makes a judgment call
about whether it's worthy enough to take that violation and make it into a case. And once it's made into a case,
generally what happens is that you need to get more information from the individual. So if humans were doing this,
there'd be some outsourced team that's like looking for all the potential violations,
then you have some full-time employee on the finance team who's
looking at all the violations.
Oh, these are the ones that are important.
We need to follow up on it.
Now, what they do is they hand it off to somebody who will go and slack that employee and
be like, hey, what's going on here?
And so what we have is, like, the audit agent looks for violations.
The review agent decides whether it's worthy enough to turn into a case.
And then from there, when the case is filed, that will trigger an event to the Brex
assistant for that employee.
And any additional information about the business justification can be
collected, or maybe the assistant already knows, because in its conversation history with the
employee it learned something about why this expense looked out of policy. And so you start having
the network becomes interesting when you have the finance team agents communicating with
the assistant for various employees. And then behind there, you have other other subagents. And so
then you start seeing like more of a graph emerge. But when you look at just what serves the
employee, it looks more like a tree.
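The three-stage flow described above (a zealous audit agent, a review agent that applies judgment, and a case that triggers an event to the employee's assistant) can be sketched as plain functions. In the system being discussed these stages are LLM agents; the rules, thresholds, compliance scores, and message text below are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    employee: str
    amount: float
    reason: str


def audit_agent(expenses, receipt_threshold=75.0, margin=2.0):
    """Zealous detection: flag anything that might be a violation."""
    return [
        Violation(employee, amount, "just under receipt threshold")
        for employee, amount in expenses
        if receipt_threshold - margin <= amount < receipt_threshold
    ]


def review_agent(violations, compliance_score, min_total=100.0, trusted=0.9):
    """Judgment: open a case only if the dollar total is high enough and
    the employee does not already show high compliance."""
    totals = {}
    for v in violations:
        totals[v.employee] = totals.get(v.employee, 0.0) + v.amount
    return [
        {"employee": employee, "total": total}
        for employee, total in sorted(totals.items())
        if total >= min_total and compliance_score.get(employee, 0.0) < trusted
    ]


def notify_assistants(cases, send_event):
    """Follow-up: each case triggers an event to that employee's assistant."""
    for case in cases:
        send_event(
            case["employee"],
            f"Please share the business justification for "
            f"${case['total']:.2f} in flagged expenses.",
        )


expenses = [("ana", 74.0), ("ana", 74.5), ("ben", 74.0), ("cy", 74.0), ("cy", 74.0)]
compliance = {"ana": 0.5, "ben": 0.5, "cy": 0.95}  # hypothetical scores
events = []
violations = audit_agent(expenses)
cases = review_agent(violations, compliance)
notify_assistants(cases, lambda employee, msg: events.append((employee, msg)))
print(len(violations), cases, events)
```

Keeping detection and judgment separate lets the first stage minimize false negatives while the second keeps the finance team's case queue short.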
Amazing. Well, I didn't know you were going to go into that level of detail.
Yeah, yeah, I'm sorry about that. No, no, no, no. I'm actually really glad I asked.
Like, that is very impressive and I hope you do more content about that.
Yeah, absolutely. We're really excited about it. I think it's been good to finally figure out
a use for agents and have the technology be as robust as it is to start
realizing this vision, because it's something that we kind of dreamt of a couple years ago.
And the tech, to your earlier point,
the tech just wasn't there when we were trying to
make a similar concept
work with GPT-3.5.
It was like, no, we were hallucinating tool calls
back in that day.
Awesome, man. Thanks so much for joining us.
Yeah, I really enjoyed it.
Happy holidays, guys. Thank you for having me.
Thank you.
