The Infra Pod - AI agents are on the rise with the crew! Chat with Joao (CEO of Crew AI)
Episode Date: March 10, 2025

In this episode of the Infra Pod, hosts Tim Chen and Ian Livingston welcome Joao, CEO and founder of Crew AI. The discussion dives deep into the world of AI agents, exploring the journey of Crew AI, the intricacies of multi-agent systems, and the future possibilities of automation using AI. Joao shares his personal journey, founding stories, and hot takes on the evolving landscape of AI, including the rise of open-source solutions like DeepSeek.

00:00 Introduction to the Infra Pod
00:28 Meet Joao, CEO of Crew AI
00:31 The Journey of Crew AI's Early Days
02:15 From Closed-Source to Open-Source
05:07 Joao's Passion for AI Agents
07:03 Defining Agents and Crew AI's Vision
09:39 Successful Use Cases of AI Agents
14:01 Cutting-Edge Agent Applications
17:49 How Multi-Agent Systems Work
22:23 Designing Effective Agent Flows
24:33 Control and Flexibility in Agent Operations
30:29 Future of Generalized Agent Platforms
37:02 Quality Assurance in Agent Outputs
40:46 Spicy Takes on AI's Future
Transcript
Welcome to the InfraPod.
This is Tim from Essence and Ian, let's go.
And this is Ian Livingston, lover of agents,
couldn't be more excited about the future of AI.
Today we have Joe, CEO and founder of Crew AI on the podcast.
Joe, tell us a little about yourself.
What got you into thinking about agents, and what was the trigger for you to start a company?
What brought Crew AI to reality?
Tell us your story.
All right.
Well, first of all, thank you so much for having me.
Very excited to be here.
And I got to say, Crew AI has been a wild journey.
I never expected it to get as big as it is, but it's funny, as it starts to take steps, the road
it's on kind of reveals itself, and things that didn't make sense start to make sense, and you see further and further.
So it's amazing. I take none of this for granted, and I really appreciate the support that we get.
Crew AI, honestly, was a very organic process. Everything started with my wife.
My wife came back to me one day and she was like,
hey, you're doing all this cool stuff,
like with AI in general, right?
You're doing these RAG applications,
you're doing the super advanced kind of like embedding things.
And this was back like, I don't know,
two years ago, three years ago.
So it was early days; people were not doing RAG.
And my wife was like,
we should be more public about this.
Like, there's people that can learn from this.
And I thought maybe I should post on LinkedIn. And I'm great at shitposting on Twitter,
not as great at posting on LinkedIn.
So I was like, all right,
I'm going to have to figure this one out. And I started to build agents to do it, and guess what, it worked.
I love it. I started to see all the metrics
go up. I would post every two days or so, and I could just put my random
ideas into the thing and it would spit it out as a nice concept. So that's kind of
how the first version of Crew AI came to be.
And was that still closed source? Like, at what point did you, because my understanding
is that first it was an open-source framework.
What's the journey from, I'm now growing my LinkedIn following, I'm mastering the art
of the LinkedIn shitpost, which is in fact a special category of shitposting.
What got you from, okay, I built this, this worked for me,
to, now I'm going to open-source this so other people can do it as well?
Yeah, so funny enough, I did this one and it was working great.
It was like three, four agents, working great, and I was like, all right, I got the hook.
I was like, I want to do more agents.
I was away with my wife, and I was trying to think, all right,
I want to reuse a bunch of this stuff, so I'd better use a framework.
So I looked online and there was nothing
out there that really checked the boxes for me. So I was like, all right, I'm going to
build something. I remember I was building it on my computer, and it was our
anniversary, so I traveled with my computer, and when she wasn't looking I would wake up early and
do some of the coding. And I got the first version of Crew AI out there, and it was open
source from the get-go. I put it out there, and I started to use it to build my own agents and basically tell the story
about every automation that I was building. So I was building something to help me with stocks.
I was building something to help me with other social media. I was basically doing all those things.
And that's how the framework really took off. And then like in December,
we started to see a bunch of people adopting, and in January things kind of skyrocketed. That's
when I remember going to some meetups in the Bay Area, and I would have
people from companies like Oracle come up to me there and say, hey, I work at
Oracle, we're actually using Crew AI in production. Can you help us? And that was
the aha moment for me. I was like, wait, what? I'm using this thing to write LinkedIn posts,
and you're using it for monitoring your CI system and automatically fixing errors. So that's
where the lightbulb moment happened. I was like, all right, if I want to give that level of
support to this kind of company, this cannot be only open source. It needs to be a proper business,
so I have the resources to deploy on these problems.
And that's how the company came to be.
And so I'm very curious about the beginning of it, even though we've been talking
about it.
You know, Mr. Elliott was on our podcast last time and talked about meeting you the first
time.
And he said you were basically obsessed with the idea of agents.
There were so many ideas flying around your head.
You know, maybe you're just the kind of guy who gets obsessed with anything you're
truly in love with in general.
But I'm curious, talk about like what got you into this idea of an agent and what you
thought was an agent.
And why do you have so many ideas in the first place around this thing?
Yeah, so ideas have always been easy and hard on me.
My wife can tell you all about it.
I get these crazy ideas, and I just go non-stop.
I dive into them and I can't stop.
And that has been the case for many projects.
And a bunch of IoT stuff, a bunch of side projects.
I have been the kind of engineer that I believe is more rare nowadays.
And maybe that's a hot take: a lot of
the new crop of engineers are more worried about money, and I consider myself more of an OG.
Like, I like engineering. I'm going to be coding all the way until I die, because I love it.
So I was doing a lot of that early on, and all these projects come to mind, and I think in one way or another
I was always trying to build agents, but never as we know them now. So I remember trying to build all sorts
of bots and automations, things that I could chat with in the terminal, and
trying to use Twilio's API so that I could message with them. I don't know how many bots I
built over the years, just because I wanted to have that initial one. I had been in this arena before. So I think when it finally clicked for me, with the LLMs
and the embeddings and seeing some of this stuff, it went from, oh, this will
not happen, it's just super hard, to, I can do this. I had to do it.
I was like, all right, I have no other option here.
I have to start creating these automations.
But I think I always have been an automation kind of guy.
And maybe this is another hot take:
the best engineers are the lazy ones,
the ones that are trying to basically automate everything in their way.
Right.
And so, what is the idea of an agent?
Because I think even today, people are still asking me, what the heck is an agent?
And I think everybody has a different idea what it is.
And there's like an academic term, there's a technical term, there's a purist term,
whatever.
I feel like the whole world has basically just find-and-replaced LLM
with agent at this point, you know? So I personally can't even tell the difference sometimes.
So what is an agent? And what does Crew do?
I hear you. And I think honestly, there are a few things that don't help.
Like I was talking with someone from Gartner the other day,
and they used the term agent washing.
Like a bunch of people talking about, oh, this is agentic, this is agentic, but not really.
And that kind of like, I think it only contributes to the confusion.
But at the end of the day, the way that I look at agents, and I'm going to
keep it very simple: agents have got to have agency.
What you want is the AI to dictate the flow of the program.
If you think about traditional software, you have that
predictability. You always have strong typing in traditional software: you know what
is coming in, you know what's happening inside, you know what's going out. When you think
about agents, you don't know what's going in. The input to GPT can be a recipe or
a PhD thesis. You don't know what is happening in the model, it's mostly a black box, and
you don't know what's going out. But that works, and the majority of people use it, because
there's a place in the world for applications like that.
So I think at the end of the day, if you're building an agent, I would say ask yourself if what you're
building has agency: if it can self-heal and self-coordinate when it finds
blocks along the way. If it doesn't, then it's probably not an agent.
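To make that distinction concrete, here is a minimal illustrative sketch, in Python, of the difference between a program whose author dictates the control flow and an agent where the model does. This is not CrewAI code; `call_llm` is a hypothetical stand-in for any chat-completion API.

```python
# Illustrative only: who dictates the control flow?
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API call."""
    raise NotImplementedError

# Traditional software: the programmer dictates the flow.
def summarize_pipeline(doc: str) -> str:
    cleaned = doc.strip()                      # fixed step 1
    return call_llm(f"Summarize:\n{cleaned}")  # fixed step 2

# Agent: the model dictates the flow, choosing its own next action.
def agent_loop(goal: str, tools: dict) -> str:
    context = f"Goal: {goal}\nTools: {list(tools)}"
    while True:
        decision = call_llm(
            f"{context}\nReply 'tool: <name>' to act, or 'final: <answer>'."
        )
        if decision.startswith("final:"):
            return decision.removeprefix("final:").strip()
        name = decision.removeprefix("tool:").strip()
        # The model chose this branch; the program just follows along.
        context += f"\nObservation: {tools[name]()}"
```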
And so in your world, coming from this place of, I just want to automate everything in my life, and then LLMs come about.
What are the best use cases today for agents? You know, Crew is this broad platform, you have
all these integrations and the tools you support, and you have this concept of multi-agent.
Help us understand where the places are that you see people having a lot of success,
the types of automations they're building where this stuff really works and it's phenomenal,
and where, you know, maybe it's still too early or missing some pieces.
Yeah, so Crew AI has an open-source framework that we consider a product. We have a team dedicated 100% of the time to building it.
I'm a strong believer in open source.
So there is that.
That is now used by close to half of the Fortune 500.
Like, there's a lot of companies using it out there.
It's insane.
And we run around 20 million crews a month now.
Each crew has between two and, the highest number that I have seen, 21 agents.
So we're talking about tens of millions of agents a month now. We also have an enterprise version
of that, which we sell either as a self-hosted solution, where people can run it on their own cloud,
or even on-prem if they want to do that, and we have some partners. Or we have a cloud version that people can use in more of a self-serve motion. So that's Crew AI as a company, the
framework and the software. Now, I think that up to this point in the industry, what
we have been seeing in terms of use cases is a lot of people experimenting
in 2024. I mean, we were expecting to see a lot of that, a lot of prototyping.
I see a lot of things going into production as well, and we have hundreds of customers now with
use cases deployed, but a lot of it was kind of like understanding and then all those questions
pop up, right? Oh, what about memory? What about RAG? What about graphs versus flows? How do you
think about the embeddings,
and how does this all play together?
So I think there has been all these conversations
in the industry now.
And then again, maybe this is a hot take,
but I think all that is becoming very commoditized.
Yes, memory.
So what, you need it.
Yes, you can do that in 20 different ways.
In Crew AI, you can do it better.
But yes, that's it.
I think what is happening now is companies are
realizing, hey, we're going to have thousands of agents running these organizations three months
from now. Do we want them to be legacy applications on their GitHub? Or do we want to have a control
plane where we can manage the agents, the authentication, the scoping, the tools? And that's
what we're building. So that's kind of like the vision.
In terms of use cases, a lot of it is companies
starting early with what we call
low-precision use cases.
So not user-facing, kind of back-office automations,
things that they can get out pretty quickly.
Sales, marketing, and back office
are usually the first boxes checked.
And then as they get confident on that, they expand into high precision use cases.
So, for example, I have seen filling out IRS forms; we can have a whole conversation
about that.
It's a very complex problem.
Then, basically handling pricing-change approvals is a major use case with one of
our customers as well.
And what people are calling agentic OCR, where you're not only processing docs,
but classifying them, running inference, and taking actions on them.
So there's a lot of more advanced use cases that we're seeing in that area now that are super interesting.
The world is on fire with agents, as you clearly know.
And your crew is the gasoline on that fire, and everything in the middle too.
I think it's a super interesting topic because I don't think we fundamentally even know the
limitations of agents as much yet, and the possibilities of agents as well.
And I'm very curious, because you talked about some of the use cases, which I think are just
a small sample of points, right?
Can you maybe talk about what do you think is like pushing the boundary of what agents
can do now?
Because I think a lot of people are now talking about agents that are doing full automation,
right?
Yes.
Replacing humans, or, you know, just basically going and applying, generating a resume,
and just truly being like an employee of sorts.
But I think while we're all in on this idea
that agents can do that, we actually are not so sure
of the type of actual functionality they can actually bring.
So can you maybe talk about a few cases that are
the cutting edge of what agents are doing,
maybe within the Crew AI platform or so?
Sure. And by the way, can I share my screen?
Because it'd be easier if I do,
because I can show you some videos as well.
We do, yeah, yeah, feel free to share.
We usually just take the audio for the main episode,
but we kind of cut some snippets, so.
Yeah, if you have some things,
we can do it, yeah.
Let's do audio first. So, all right. What I would say is, we have crews at customers
that have, quote unquote, replaced teams, but it's not that these people were let go.
They're basically now doing work that they can do way better. So for example, we have a very interesting pricing use case
where they had data scientists that, before approving
the price changes that I was talking about,
would actually do a bunch of queries to
compare their prices across different marketplaces,
across different regions, check integrations
with things like AlphaSense and a few other things,
before they actually approved some of those changes.
And now they have agents doing the whole thing end to end.
So it was an entire team of people that are super capable.
They know how to write queries. They're more junior data scientists.
They can now be deployed doing something else.
So we're definitely seeing some of that.
But most of the crews are not automating jobs just yet. They're automating processes, right? So maybe
you have three or four agents that are automating pull request reviews. In
Crew AI, for example, in our company, we have a four-to-one agents-to-employee ratio. We
have a lot of agents running. A lot of our use cases: we have agents, for example,
doing custom marketing material per customer,
onboarding customers into the platform, researching,
reviewing the pull request code that we get,
being able to answer support tickets,
take phone calls to answer support things as well.
So there's a lot going on internally
and I can show you some of that.
I think the most advanced use case that I have seen to this day is from a big GSI, basically
one of those big ones like PwC, Capgemini, Deloitte and all that. They're working with a big media
company, and what they're trying to do is, while there is live footage of a
game going on TV, they have agents that use fine-tuned video models to cut this
video, then lay audio over it, then figure out captions and post that on social
media. That was one of the more complex use cases that I have seen.
I was like, wow, you're basically doing editing, live, and kind of directing how you want this to be.
So I think those are some of the more complex ones.
So let's turn to the technical. Typically, when you're building with Crew, you're building a
single agent that you're giving a prompt, some optionality, some tools. But your tagline here is the universal multi-agent platform.
What is it about multi-agent that makes this stuff work, and why is it better
than a single agent? And what's the situation where, for example, in your
LinkedIn example, when you first built it, was that a single agent that
was good at one thing, or did you actually have a mixture-of-experts style approach?
So what's the decision point on that?
And how did you arrive at that conception?
And then from there, I'd love to dive into,
okay, how does this all actually work?
And what makes it work?
Why?
I love that.
Yes, there are so many different pieces
that make these agents work.
Because if you think about it, from the get-go
it's simple, right?
I want an LLM in a loop.
All right, that's okay.
But then you start thinking,
well, I need it to use tools.
I need it to be able to tap into other stuff. So, all right, I'm going to add a tool layer. Now, if they have tools,
maybe we should have caching, because I don't want them to use the same tools over and over again.
All right, so we have caching. Well, we also probably want them to remember things, since they
have caching. So we need a long-term memory. Then we might need a short-term memory. Then what is
a long-term memory? Well, maybe it's a vector database that we're performing RAG
against, with a combination of something else. Then that piece starts getting a little
big, and you're like, all right, my agent's now done. But then you start to go into some
use cases like, well, I actually want to remove PII and personal information before this request
goes out. So I'm going to need a sanitization layer. All right, let me do that. So things can start getting very complex as you get into production with agents.
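A rough sketch of how those layers accrete around the basic loop. All of the helper names here are hypothetical, invented for illustration; this is not how CrewAI implements these layers internally.

```python
# Hypothetical sketch of the layers described above, not CrewAI internals.
import re

tool_cache: dict = {}        # caching layer: avoid repeating identical tool calls
long_term_memory: list = []  # memory layer: in practice a vector DB queried via RAG

def cached_tool_call(tool, *args) -> str:
    key = (tool.__name__, args)
    if key not in tool_cache:
        tool_cache[key] = tool(*args)
    return tool_cache[key]

def sanitize_pii(text: str) -> str:
    """Sanitization layer: scrub PII before a request leaves the system."""
    text = re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[SSN]", text)
    return re.sub(r"\S+@\S+", "[EMAIL]", text)

def run_agent_step(llm, prompt: str) -> str:
    recalled = "\n".join(long_term_memory[-5:])   # short-term window over memory
    answer = llm(sanitize_pii(f"{recalled}\n{prompt}"))
    long_term_memory.append(answer)               # remember for later steps
    return answer
```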
And I have been beating the drum on multi-agents versus
single agents for a while, just because multi-agents offer you so many more advantages.
And that's inherently because of how these models work.
So if you think about these models,
and I know that we can get more technical here:
an LLM is an AI model, right?
If you think about more traditional models,
where you had classification models in the past,
or even prediction models,
you basically have a set of features,
the data that you know, and you have a new data point
that you're trying to predict.
So you give it enough examples, and it figures out
the mathematical formula that predicts
that outcome with a certain degree of precision.
Now, LLMs are not that different
in a theoretical sense, because you still have
features, but the features are all the tokens, all the words that have been
typed so far, and you're trying to use that in order to predict what would be
the most appropriate next token. What that means is that, and I don't
think a lot of people that use LLMs understand this, a lot of the quality of
your output depends on how you write what comes before.
It's going to use what you typed so far to define what it's going to write next. So if you go into
ChatGPT and you ask, hey, give me a stock analysis on Tesla, it's going to give you an
answer. But if you say, you're a FINRA-approved investor, give me an analysis of Tesla stock,
you're going to get a way better one. Because
again, it's using other features, right? So when you're working with multi-agents, you get to use
that on steroids, where you can have agents that are specialized in one thing versus specialized in
another. And that goes beyond the prompting, because in Crew you can actually have agents running on
different LLMs entirely. So you can have a coding agent using Sonnet,
while a reviewer agent uses GPT-4, and maybe another agent for PII information uses a local
model like DeepSeek R1, for example, and there you go. So these are some of the
benefits that you get. I could keep talking about multi-agents. And I remember some of the
skepticism that I got early on was like, well, if one agent hallucinates
like 0.5% of the time,
don't five agents hallucinate like 95% of the time?
But what you get to do, if you do this right,
is that they actually fact-check each other.
So you can have one agent hallucinate
something, and the other is like,
I don't think that's right.
And that gives the first one the feedback
to fix it. So there are some
interesting dynamics in there as well.
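Here is a minimal sketch of that pattern using CrewAI's documented Agent/Task/Crew primitives. The roles, task text, and model identifier strings below are placeholders chosen for illustration; check the CrewAI docs for the exact model strings your version expects.

```python
from crewai import Agent, Task, Crew

# Two specialized agents, each running on a different LLM.
coder = Agent(
    role="Senior Python Engineer",
    goal="Write clean, working code for the requested feature",
    backstory="A pragmatic engineer who ships small, tested changes.",
    llm="anthropic/claude-3-5-sonnet-20241022",  # assumed model string
)
reviewer = Agent(
    role="Code Reviewer",
    goal="Fact-check and critique the coder's output",
    backstory="A meticulous reviewer who catches hallucinated APIs.",
    llm="openai/gpt-4o",                         # assumed model string
)

write = Task(
    description="Implement a function that parses RFC 3339 timestamps.",
    expected_output="A Python function with a short docstring.",
    agent=coder,
)
review = Task(
    description="Review the implementation and flag anything incorrect.",
    expected_output="A bullet list of issues, or 'LGTM'.",
    agent=reviewer,
)

# The reviewer sees the coder's output, giving the fact-checking dynamic.
crew = Crew(agents=[coder, reviewer], tasks=[write, review])
print(crew.kickoff())
```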
That sounds like an amazing group of agents that can live harmoniously together.
I'm very curious, because from a programmer's point of view, we used to be able to
understand the whole control flow and data flow of a program, right? Because
that would actually almost guarantee functionality, right?
Like, I want a library or a function call or a microservice
to do certain things.
It should orchestrate and do certain other things, right?
All of that was pretty much hand-coded, you know,
and it feels like the Stone Age now, right?
Now you have this beautiful thing called agents that, like you mentioned, have agency.
And then there are multiple agents, with their own agency,
trying to work together.
And this becomes like a jazz band.
I don't even know what's going to happen here.
I'm very curious, because at one level,
if you're trying to actually achieve a functionality,
like you said, video editing, you know,
we used to have to understand each step
and what it was going to output,
and what each agent, if you think of it as a microservice,
is trying to accomplish.
And you piece it together.
And at this point, I don't even know what each agent will actually output,
even though you're giving it a prompt.
I wonder, how do you help folks understand
where the level of human design input should be?
Are we just going to design some high-level agents, let them flow widely, and just kind of guide them at a very high level?
Or are there very specific outputs and inputs, and very specific logic you're
trying to instruct each agent to do? Are there trade-offs here? Because at this point
it's such a black box, right? I don't even know exactly what's going to happen. What are some of
the things you've learned working with so many agents at this point, about the granularity of
us humans trying to design the process of agents working together?
Is there such a thing as being too high-level, too abstract, where it just goes anywhere?
Or too specific, where it only does that much?
Yeah, I think you're on point.
It's funny, because you have two axes, right?
On one axis, you want these agents to still be flexible, being able to handle different
use cases, or whatever gets thrown their way. But you also want to have consistent
quality. So whatever they do, you want to make sure they're producing good quality
at the end. So that's where things become a little like, all right, how do you do that
when what we're building is this super fuzzy application? So for what we call high-precision use cases,
usually there's a lot of code involved still, in the form of functions that you don't need
agents for. For example, if you just pull data from somewhere every time, you don't need
an agent to do that. You can write the code, the code pulls the data, and then it passes the data into
an agent to do something. But then there's also a lot of guardrails and validations, right? So we have this idea
in Crew where, at the task level, whenever an agent finishes a task, you can implement guardrails
that you write programmatically, which will check that data and send it
back to the agent in case it doesn't pass. But the way that we see it is, there are use cases where you're
going to have more autonomy and then you're good with crews, with agents,
kind of like doing their own thing and that's okay.
There are use cases where you're going to want to add more restrictions on that.
So you're going to have guardrails,
you're going to have before hooks or after hooks, and that's one thing.
But then on the other side,
if you want to have a lot more control,
there's Crew AI Flows.
And Flows are basically a way for you to use
event-based actions with agents if you want to.
So with Flows, it's more the traditional
if-this-then-that that you would get in programming,
but it's all event-based, so consumers and listeners.
And if at any point in time during that execution you want to throw an agent at something,
you can do that natively. So what we see is, for more low-precision use cases that can actually
use their agency, things like, oh, I want to write an email, help me with a press
release, research someone, create a report, whatever that might be, you might go straight with agents and crews. But if you want to, for example, fill out IRS forms, you
probably want to use Flows for that. And one example for that specific use case: it's basically a
huge financial institution, and they have to fill out those forms every so often. And funny enough,
the forms are like 70 pages long of just content they need to fill out.
But fear not, it comes with an instruction manual, and that manual alone has 620
pages. What do you do in there? If you just throw agents at that,
you can screw up financial information and how you fill
that out. So for
those use cases, we're using a mix of agents and flows, where we use flows to extract each
page individually and extract all the fields that you have on that page. And then we pass it
to an agent to perform RAG queries against the instruction manual to
understand how it should fill each field.
And then these agents perform RAG on the database
to extract that information from there,
and then they fill it out.
And the other thing is that in use cases like that,
there's no way around it.
We still need humans in the loop, right?
That's the biggest thing.
A lot of these high precision use cases,
you need humans to validate things.
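A hedged sketch of that "flows for control, agents for fuzziness" split, modeled on CrewAI Flows' event-based decorators as documented at the time of writing; the extraction logic and crew definition are placeholders, and you should verify decorator names against your installed version.

```python
from crewai import Agent, Task, Crew
from crewai.flow.flow import Flow, listen, start

def build_form_crew() -> Crew:
    """Hypothetical one-agent crew that fills a single form page."""
    filler = Agent(
        role="Form Specialist",
        goal="Fill each field using the instruction manual",
        backstory="Performs RAG against the 620-page manual.",
    )
    fill = Task(
        description="Fill out the fields for this page: {page}",
        expected_output="A mapping of field name to value.",
        agent=filler,
    )
    return Crew(agents=[filler], tasks=[fill])

class FormFillingFlow(Flow):
    @start()
    def extract_pages(self):
        # Deterministic step: plain code, no agent needed to pull data.
        return ["page 1 fields", "page 2 fields"]  # placeholder extraction

    @listen(extract_pages)
    def fill_fields(self, pages):
        # Fuzzy step: hand each extracted page to a crew of agents.
        crew = build_form_crew()
        return [crew.kickoff(inputs={"page": p}) for p in pages]

FormFillingFlow().kickoff()
```

The task-level guardrails mentioned above follow the same spirit: a plain function attached to a task that validates the output and sends it back to the agent on failure; see the CrewAI docs for the exact hook (for example, a guardrail argument on Task) in your version.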
Speaking of control flow and the user in the loop: on the podcast and in other
conversations, I've often used the continuum from driving, to cruise control, to adaptive
cruise and lane keeping, which is basically very close to self-driving but not exactly,
and then self-driving, and how cars move basically back and forth along that continuum.
It seems like that analogy also applies very much to all these automations at the broad layer. So what do you think the decision points are for when you actually bring
these things back to a human? At what point do I bring a human back into the loop? How far do I let
an agent go down a road, or let an agent generate a plan? Like we were saying, it's a broad space, and
there are so many questions I have here, but let's answer that question. What do you think the
decision points are that say,
okay, agent, go bring the human back in?
And is that something a human decides,
or is that something the agent decides?
So what we're seeing is most people are enforcing,
like for this specific task,
I want to make sure that a human gets involved
before things move along.
So that's the most common.
And at the end of the day,
I think the companies that have been most successful are the ones that are promoting their employees
into managers of these agents. The employees themselves are still responsible for
accuracy, presentation, quality, putting a nice pretty bow on it and everything. But they
now have agents to help them do their work. It's just another tool in their tool set.
But what we're seeing most of the time
is, during the implementation of the automation,
they say like, hey, I want specific human approval
in these three different spots.
And then at the end, you always have something as well.
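Concretely, pinning approval to specific tasks looks something like the sketch below. CrewAI's docs describe a per-task human-in-the-loop flag, shown here as human_input=True; the agent and task text are invented for illustration.

```python
from crewai import Agent, Task, Crew

pricing_analyst = Agent(
    role="Pricing Analyst",
    goal="Propose price changes across marketplaces and regions",
    backstory="Compares prices and integrations before proposing changes.",
)

propose = Task(
    description="Draft price-change proposals for flagged SKUs.",
    expected_output="A table of SKU, old price, new price, and rationale.",
    agent=pricing_analyst,
    human_input=True,  # execution pauses here for a human to approve or edit
)

Crew(agents=[pricing_analyst], tasks=[propose]).kickoff()
```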
Well, how do you think this changes over time?
Okay, so the final question I have actually
is about generalization of agent flows, right?
Today with Crew, how general are the automations?
I mean, you can build a very specific flow. For example, you have
this one flow around pricing that you talked about, and it's obviously one little
section of someone's day-to-day job that they wanted to automate, and they did it using Crew,
and that was very successful, and that makes a lot of sense.
But how generalized do you think agents become? Do you think we have many agents that are finely
scoped, or do you think we end up with broad agents, broad flows, that can handle many, many,
many different types of tasks? And what's your mental model for how to think about the
scoping of task support, if you will, and these delegation points?
Yeah, so the vision that we have at Crew
is more of the former,
where you have agents with smaller scope.
Now, the way that users interact with that,
the way that it feels to them, is the latter.
And what I mean by that is,
you have, behind the scenes, these agents
with smaller scopes and smaller access to a set of
tools, but when you're actually tasking the system to do something, you can ask about anything.
It's behind the scenes that these agents are going to be picked apart. It's like, all right,
you're here to perform this. We need these kinds of agents, these kinds of tools. Let's put them
together and let's get them to do something. So right now, most of the use cases are more
specific, like, hey, there's one specific process I want automated.
But where I think this will go in the future is that we then expand to, well, I have a pool of agents, a pool of tools, a pool of authentication resources, all those resources that I manage.
Theoretically, I can throw anything at it, and they should be able to self-organize and get it done.
And that's kind of like part of what we're doing as well.
So your broad view long-term though,
is that based from the user perspective,
it feels generalized from the agent builder's perspective,
they're task specific,
and then there's some intermediary,
which is Crew AI's platform that helps, you know,
the user figure out what agents to task with work.
And that's the future of these platforms.
And so in many ways,
I'm assuming that your potential vision is basically,
you become the Google of work
for a company or a person, right?
It's like, oh, I need to do this.
Crew AI is going to go and figure it out
with the available agents that you have,
deep integrate knowledge in different places.
That makes a lot of sense.
The initial company use case that we had
was something like that.
Because there was a lot of interest from the early days,
a bunch of people wanting to chat,
and we would be in situations where,
I got to jump into a call
and then I have another call right after that
and another one,
and I know nothing about those companies, right?
How do I prep for that?
So the first crew that we started to use internally
was a prep-for-meeting crew,
and we kick it off straight from Slack.
You have a Crew AI integration in Slack,
and you can do like, hey, this is the meeting.
This is the person.
This is the context.
And then these agents just go online
and research everything, right?
So it would be funny, because I would jump
into these meetings out of nowhere,
and I would know a lot about the company.
I was like, oh, that's amazing.
The new factory that you folks just opened in Australia.
I'm very happy about that.
It feels like you're being bullish on APAC.
And they're like, yes, for sure. And there you go.
That's amazing.
You've got Crew running in your glasses or something.
Now suddenly you're like the superhuman.
You know, I actually want to ask this.
When I think about the future of agents, I think we've been talking a lot about
the specifics: how to get more functionality, more evaluation, more judging.
But a huge aspect right now, I feel, that a lot of people are trying to explore is this web agent,
this sort of ability to browse the web, this ability to call APIs, this ability to even start paying each other through agents.
Like, I see a lot more action-taking things, like actual browsing-the-web stuff.
I'm curious how you think about this world, given that the agent world is so broad.
You can do pure research, you can do scraping data, a lot of content stuff.
But then there's also a lot of action-taking.
Do you find that action-taking through agents has been pretty useful and easy to
really get started with?
Are there any limitations you see, where a lot of people can only really support these
kinds of function calling at this point, and we still need more, I don't know, research
or boundary-pushing? Because I feel like we're so early, we don't really know
where the limitations of this are.
Yeah, I think one of our most successful use cases is actually an action-taking one.
I think action-taking is hard to do, but if you
do it, it unlocks a lot of value, right? Especially if you can measure the
accuracy of it. So for that use case specifically, we actually tracked the
results with the customer for about a month, and then at the end we compared:
the humans were doing the work in parallel, and the agents were doing it in parallel, for a month.
What does this look like in that case?
We got 100% accuracy.
I believe we got a little lucky in there. 100% is not
something that you're going to get every time.
I would expect like 99, 98, but that was pretty good,
and that became a major use case for them.
But I think action-taking is something where, if you can do it with agents for your use case,
and you can get it to do it right using the tools that are available, then it unlocks
major value.
Because what really clogs the machine a lot of the time
is having to have those people say, let's do A, let's do B. So I have been
seeing some use cases around that. But they're more rare. It's people that are more advanced,
people that are more comfortable, people that have been building agents for a while, so they
understand how this is going to perform. And they have a lot of evaluation behind the scenes
to make sure that if something goes south, it alerts you and everything.
And that's one of the big features that people like in the platform: the ability to set these alert triggers in case things go haywire.
I'm curious how you
think about the quality assurance aspect of the agents. Is it today such that the
automations you are making basically have a human approving or disapproving the outcome, and so this
isn't really an issue? What's your prescription or pattern for thinking about: I have
a series of agents, they're producing some quality of
answer X with some error rate.
How do we move that bar up?
Do you think that's actually a problem that agent builders encounter today
with Crew? Or is it not so much a problem, because, hey, actually the
types of use cases we're building for don't necessarily always have a user at the end of the flow
anyway?
I'm kind of curious how you think about sort of the QA aspect.
You know, there was tons of talk last year, in 2024, about eval frameworks.
Like it was like every company was building an eval framework.
Everyone was talking about evals, eval, eval, eval, endless evals.
I'm curious what your thought process is, because you're actually layers
and layers above what we traditionally thought eval frameworks were supposed to be doing.
Yes. And I think you got that on point, right? There are different kinds of evals. There are evals
at the prompt level; that's okay, and if you're doing agents, you've got to check that box, but you've
got to step up a few layers as well. So if you see this, this is one of the crews that we run that
helps onboard users. And you can see that we have an actual quality score and a hallucination score that we keep track of.
And by the way, this is
an opinionated approach,
but you can also override this
with your own metrics,
or even add new ones if you want to keep track of them.
So a lot of this is making sure
that we're keeping a certain quality threshold
and basically doing some sampling around that
to make sure that things are okay
and hallucinations as well.
Now, as I said, you can implement custom ones, and that usually helps a lot when engineers
are building very specific use cases where they're like, oh, in my use case, I can't have
any link that is hallucinated.
So I want to make sure that I double-check all this.
You can add custom trackers for that as well.
But what we're seeing is that this ability for you to
track not only at the prompt level, but then at the task level and then at the
crew level, is very important, especially for deviations, right? Because if you have an
agentic automation that gives you a constant quality that is acceptable for you without
human intervention, then
your first thought is, all right, I don't want that to deviate. And that's
how we set up a lot of these thresholds and alerts, because if
things change, or you try different models or anything like that,
you can make sure that you act on it.
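As an illustration of the sampling-plus-thresholds idea (this is not the CrewAI platform API, just a sketch of the mechanism): score a sample of crew outputs with an LLM judge and alert when quality deviates below an acceptable bar.

```python
import random

QUALITY_THRESHOLD = 8.0  # acceptable score out of 10 for this use case
SAMPLE_RATE = 0.1        # judge a 10% sample rather than every run

def judge_quality(output: str) -> float:
    """Hypothetical LLM-as-judge call returning a 0-10 quality score."""
    raise NotImplementedError

def track_run(output: str, alert) -> None:
    if random.random() > SAMPLE_RATE:
        return  # not sampled this time
    score = judge_quality(output)
    if score < QUALITY_THRESHOLD:
        alert(f"Quality deviated: {score:.1f} < {QUALITY_THRESHOLD}")
```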
But I think that right now, again, there are use cases where people are involved, and there are
more use cases where people are not. And the ones where people are not are usually the ones
that drive a lot of value.
That is really helpful.
So I think a fun one for me was: we
implemented agents for ourselves, a crew of agents to do PR reviews automatically. We
have an amazing open source community. A lot of people open up PRs. It's very hard to review them all.
So we built agents to do it.
And for me, what was amazing is, we shipped it,
then I went to a series of meetings,
and when I got back two hours later,
it had already reviewed a bunch of PRs.
And then you go into those PRs,
and you see people taking the actions
that the agents told them to.
Like, hey, you should change this,
like this is not cool, and all that.
And people were reacting to that.
So that alone has been a no brainer.
And there's no human in the loop whatsoever.
The agents are reviewing all of our repos 24/7 now.
That's amazing.
All right, sir.
Here's our favorite section of our podcast.
We call this spicy future.
Tell us, sir, what is your spicy hot take of the world? I assume it's agent-related, but it doesn't really have to be.
We'll let you pick what you want.
Damn, all right.
That's a hard one.
I think we had so many hot takes throughout the whole thing.
I would say, one, I think open source is going to win long term.
Maybe that's a hot take.
I think open source is going to win.
And I think we have seen this before.
And I think it's going to be one of those cases where open source will
beat closed source.
Two, for a lot of people that are fearful of being replaced
at the workplace, I think it will take a while still, given what we're seeing out there.
And I also think that people don't understand that they don't have a lot of control over
that. So they should focus on the things that they do have control over, like learning more about
agents. And I would say those are probably the biggest hot takes
right now. I don't know how people will feel about those.
I'm curious; I'm going to ask you a question which some would consider spicy.
Now we have agents that can use browsers as a tool.
What do you think the impact is on the way people will think about building products
in the future?
Do you think people continue to build websites, or is there something completely different?
Like, if we get to a world where agents are using the website, and the browser is designed for humans, would I still fund a massive front-end team
to build for browsers, when I could just give agents an API? I'm kind of curious what your
vision of the future of tool calling with browsers is, and where that takes us.
Such a great question. What is happening with browsers right now is funny, right? Because
what you have is, one, AI is moving so fast
that it's not willing to wait for its own protocols, right? It's not going to wait
for that. So it's using whatever is in front of it: browsers, keyboards, and mouses.
People are working around those things. Even though that sucks for throughput,
right? The ability for you to get information right now; agents could be way more efficient than that.
Now there's a question of whether you want them
to be more efficient than that
at the cost of not being able to observe them.
And another question: if you can get them
to be very good at that,
then you don't need to change anything out there.
That's what you're asking, right?
Because if you had to rebuild every single surface out there
to comply with an agentic protocol, that would be super expensive and hard to do.
But if you get agents to be able to navigate them in a very smart way, then you now get access to
everything that we have ever built for humans. Now, I do think that in the medium term,
yes, we're still going to fund front-end teams, for sure.
In the long term, I think there are going to be versions
of something like RSS, like what we had back in the day,
but a version of that for agents.
We already have something, for example, for our own docs.
If you go to our docs and you append, I think, llms.txt, you get an LLM-friendly version
of our documentation that you can copy and paste into a model that you might be chatting with.
So I think there's going to be more and more of that.
But I think what actually will happen is agents will get very good at navigating the common
interfaces that we use.
So people don't need to rebuild them, and agents get to be more useful faster.
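If a docs site follows that llms.txt convention, grabbing the model-friendly dump is a one-liner; the URL below is an assumption about where CrewAI's version lives, so adjust it to the actual docs path.

```python
import urllib.request

url = "https://docs.crewai.com/llms.txt"  # assumed path; verify against the docs
llm_friendly_docs = urllib.request.urlopen(url).read().decode()
print(llm_friendly_docs[:500])  # paste the full text into the model you're chatting with
```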
Well, you know, we could talk about so many spicy hot takes in the world.
Actually, we do want to talk maybe a bit more about open source, because that's one of my favorite
topics as well.
You know, DeepSeek has created all the hype and craziness right now.
Every single hour it's DeepSeek, DeepSeek, DeepSeek.
I wonder, do you see that open source has already been taking over
a lot of people's existing use cases?
Because before DeepSeek really happened, we'd probably been seeing open source being tried by a lot of enterprises, but a lot of companies
still just use OpenAI and Anthropic, you know,
from my own sampling.
And I wonder, do you see DeepSeek
as truly a transformational moment,
where on-par quality is finally here
with such a small model?
Or is it like, I'm already seeing this everywhere,
and this is just a public announcement, pretty much?
What have you been seeing?
Has open source already taken over?
Or are we just going to see it move faster now?
So I think it hasn't taken over yet, but it is stepping up.
I think, honestly, I was not expecting to see something as close to o1 coming out of open source this fast.
So I was very, very impressed with this.
I think that if we keep up this pace, even if it doesn't completely take over,
it's going to be almost a one-to-one ratio, right?
Like, you can get o1, you can get DeepSeek.
o3 will come out at some point; we're going to have a new model that's going to come out
at some point that's going to be at par with o3.
So I think open source is definitely out there. And the reason why I like this is, one, it
forces everybody's hand to lower prices, which I think drives a lot of innovation,
because people can build more use cases with that. I think it also forces people to be more
creative, because everyone's going to try to get an edge, and people are going to
think about how they can do different things. So if anything, I think that will push innovation even further. And I do like that.
Now, I don't think it has taken over just yet. The main companies that we see leveraging
open-source models are usually doing it because of data constraints, right? They want to self-host
the model so data is not getting out of their premises during their agent executions. So they
want their agents to be using a local
version of, not DeepSeek necessarily, because DeepSeek is brand new,
but other models. Now, I've got a lot of mixed messages from the market in general.
I had customers telling me, we are not going to get anywhere close to this, because it's coming
from China, and we don't want to get associated with it, because of all the political reasons and all that.
And we had other companies, even financial institutions, saying,
yes, we're actually thinking about using them, and I'm super strong on this,
because it is amazing, and yada yada yada.
So I think the jury is still out on how adoption will play out in the Western world.
But what I'm seeing is governments stepping away from it.
There are a few branches of government that
have already put statements out that they're not going to use it.
And companies seeing this as an opportunity for them
to cut down costs and get better models.
So I think the jury is still out on what's going to happen with
DeepSeek specifically.
But I think it's a major win for open source and the community in general.
Awesome.
Well, we could keep going and going and going, you know; we have so many agent questions in
our brains and want to ask more, but we've got to stop at some point.
Where do people find you and Crew?
The whole world probably already knows Crew AI and has been trying it, but just
in case somebody hasn't really tried it,
where can we find you?
I would say, one, I like to build in public a lot.
So probably the best way for you to tag along on my journey
is X or LinkedIn, where you can basically find me
at joaomdmoura, so J-O-A-O-M-D-M-O-U-R-A.
And it's the same handle for both LinkedIn and X.
And that's where we post a lot about building in public.
And if you want to know more about the project,
both the enterprise and the open source,
you can go to crewai.com, and you're
going to learn all about it there.
And yes, thank you so much if you are a user,
and thank you so much if you are considering becoming a user.
Awesome.
This is so fun.
Thanks so much, Joe.
Thank you so much for having me, everyone.