The Infra Pod - AI agents are on the rise with the crew! Chat with Joao (CEO of Crew AI)
Episode Date: March 10, 2025

In this episode of the Infra Pod, hosts Tim Chen and Ian Livingston welcome Joao, CEO and founder of Crew AI. The discussion dives deep into the world of AI agents, exploring the journey of Crew AI, the intricacies of multi-agent systems, and the future possibilities of automation using AI. Joao shares his personal journey, founding stories, and hot takes on the evolving landscape of AI, including the rise of open-source solutions like DeepSeek.

00:00 Introduction to the Infra Pod
00:28 Meet Joao, CEO of Crew AI
00:31 The Journey of Crew AI's Early Days
02:15 From Closed-Source to Open-Source
05:07 Joao's Passion for AI Agents
07:03 Defining Agents and Crew AI's Vision
09:39 Successful Use Cases of AI Agents
14:01 Cutting-Edge Agent Applications
17:49 How Multi-Agent Systems Work
22:23 Designing Effective Agent Flows
24:33 Control and Flexibility in Agent Operations
30:29 Future of Generalized Agent Platforms
37:02 Quality Assurance in Agent Outputs
40:46 Spicy Takes on AI's Future
Transcript
Welcome to the InfraPod.
This is Tim from Essence and Ian, let's go.
And this is Ian Livingston, lover of agents,
couldn't be more excited about the future of AI.
Today we have Joe, CEO and founder of Crew AI on the podcast.
Joe, tell us a little about yourself.
What got you into thinking about agents, and what was the trigger for you to start a company?
What brought Crew AI to reality?
Tell us your story.
All right.
Well, first of all, thank you so much for having me.
Very excited to be here.
And I got to say, Crew AI has been a wild journey.
I never expected it to get as big as it is, but it's funny, as it starts to take steps, the road
it's on kind of reveals itself, and things that didn't make sense start to make sense, and you see further and further.
So it's amazing. I take none of this for granted, and I really appreciate the support that we get.
Crew AI, honestly, was a very organic process. Everything started with my wife.
My wife came back to me one day and she was like,
hey, you're doing all this cool stuff,
like with AI in general, right?
You're doing these RAG applications,
you're doing the super advanced kind of like embedding things.
And this was back like, I don't know,
two years ago, three years ago.
So it was early days; people were not doing RAG.
And my wife was like,
we should be more public about this.
Like, there's people that can learn from this.
And I thought maybe I should post on LinkedIn. And I'm great at shitposting on Twitter,
not as great at posting on LinkedIn.
So I was like, all right,
I'm going to have to figure this one out. And I started to build agents to do it, and guess what, it worked.
I love it. I started to see all the metrics
go up. I would post every two days or so, and I could just put my random
ideas into the thing and it would spit it out as a nice concept. So that's kind of
how the first version of Crew AI came to be.
And was that still closed source? Like, at what point did you, because my understanding
is that first it was an open-source framework.
What's the journey from, I'm now growing my LinkedIn following, I'm mastering the art
of the LinkedIn shitpost, which is in fact a special category of shitposting.
What got you from, okay, I built this, this worked for me,
to, now I'm going to open-source this so other people can do it as well?
Yeah, so funny enough, I did this one and it was working great.
It was like three, four agents, working great, and I was like, all right, I got the hook.
I was like, I want to do more agents.
I was away with my wife, and I was trying to think, all right,
I want to reuse a bunch of this stuff, so I'd better use a framework.
So I looked online and there was nothing
out there that really checked the boxes for me. So I was like, all right, I'm going to
build something. I remember I was building it on my computer, and it was our
anniversary, so I traveled with my computer, and when she wasn't looking I would wake up early and
do some of the coding. And I got the first version of Crew AI out there, and it was open
source from the get-go. I put it out there, and I started to use it to build my own agents and basically tell the story
about every automation that I was building. So I was building something to help me with stocks.
I was building something to help me with other social media. I was basically doing all those things.
And that's how the framework really took off. And then like in December,
we started to see a bunch of people adopting, and in January things kind of skyrocketed. That's
when I remember going to some meetups in the Bay Area, and I would have
people from companies like Oracle come up to me there and say, hey, I work at
Oracle, we're actually using Crew AI in production. Can you help us? And that was
the aha moment for me. I was like, wait, what? I'm using this thing to write LinkedIn posts,
and you're using it for monitoring your CI system and automatically fixing errors. So that's
where the lightbulb moment happened. I was like, all right, if I want to give that level of
support to this kind of company, this cannot be only open source. It needs to be a proper business,
so I have the resources to deploy on these problems.
And that's how the company came to be.
And so I'm very curious about the beginning of it, even though we've been talking
about it.
You know, Mr. Elliott was on our podcast last time and talked about meeting you the first
time.
And he said you were basically obsessed with the idea of agents.
There were so many ideas flying around your head.
You know, maybe you're just the kind of guy who gets obsessed with anything you're
truly in love with in general.
But I'm curious, talk about like what got you into this idea of an agent and what you
thought was an agent.
And why do you have so many ideas in the first place around this thing?
Yeah, so ideas have always been easy and hard on me.
My wife can tell you all about it.
I get these crazy ideas, and I just go non-stop.
I dive into them and I can't stop.
And that has been the case for many projects.
And a bunch of IoT stuff, a bunch of side projects.
I have been the kind of engineer that I believe is more rare nowadays.
And maybe that's a hot take: a lot of
the new crop of engineers are more worried about money, and I consider myself more of an OG.
Like, I like engineering. I'm going to be coding all the way until I die, because I love it.
So I was doing a lot of that early on, and all these projects come to mind, and I think in one way or another
I was always trying to build agents, but never as we know them now. So I remember trying to build all sorts
of bots and automations, things that I could chat with in the terminal, and
trying to use Twilio's API so that I could message with them. I don't know how many bots I
built over the years, just because I wanted to have that initial one. I had been in this arena before. So I think when it finally clicked for me, with the LLMs
and the embeddings and seeing some of this stuff, it went from, oh, this will
not happen, it's just super hard, to, I can do this. I had to do it.
I was like, all right, I have no other option here.
I have to start creating these automations.
But I think I always have been an automation kind of guy.
And maybe this is another hot take:
the best engineers are the lazy ones,
the ones that are trying to basically automate everything in their way.
Right.
And so, what is the idea of an agent?
Because I think even today, people are still asking me, what the heck is an agent?
And I think everybody has a different idea what it is.
And there's like an academic term, there's a technical term, there's a purist term,
whatever.
I feel like the whole world has basically just find-and-replaced LLM
with agent at this point, you know? So I personally can't even tell the difference sometimes.
So what is an agent? And what does Crew do?
I hear you. And I think honestly, there are a few things that don't help.
Like I was talking with someone from Gartner the other day,
and they used the term agent washing.
Like a bunch of people talking about, oh, this is agentic, this is agentic, but not really.
And that kind of like, I think it only contributes to the confusion.
But at the end of the day, the way that I look at agents, and I'm going to
keep it very simple: agents have got to have agency.
What you want is the AI to dictate the flow of the program.
If you think about traditional software, you have that
predictability. You always have strong typing in traditional software: you know what
is coming in, you know what's happening inside, you know what's going out. When you think
about agents, you don't know what's going in. The input to GPT can be a recipe or
a PhD thesis. You don't know what is happening in the model, it's mostly a black box, and
you don't know what's going out. But that works, and the majority of people use it, because
there's a place in the world for applications like that.
So I think at the end of the day, if you're building an agent, I would say ask yourself if what you're
building has agency: if it can self-heal and self-coordinate when it finds
blocks along the way. If it doesn't, then it's probably not an agent.
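To make that distinction concrete, here is a minimal illustrative sketch, in Python, of the difference between a program whose author dictates the control flow and an agent where the model does. This is not CrewAI code; `call_llm` is a hypothetical stand-in for any chat-completion API.

```python
# Illustrative only: who dictates the control flow?
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API call."""
    raise NotImplementedError

# Traditional software: the programmer dictates the flow.
def summarize_pipeline(doc: str) -> str:
    cleaned = doc.strip()                      # fixed step 1
    return call_llm(f"Summarize:\n{cleaned}")  # fixed step 2

# Agent: the model dictates the flow, choosing its own next action.
def agent_loop(goal: str, tools: dict) -> str:
    context = f"Goal: {goal}\nTools: {list(tools)}"
    while True:
        decision = call_llm(
            f"{context}\nReply 'tool: <name>' to act, or 'final: <answer>'."
        )
        if decision.startswith("final:"):
            return decision.removeprefix("final:").strip()
        name = decision.removeprefix("tool:").strip()
        # The model chose this branch; the program just follows along.
        context += f"\nObservation: {tools[name]()}"
```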
And so in your world, coming from this place of, I just want to automate everything in my life, and then LLMs come about.
What are the best use cases today for agents? You know, Crew is this broad platform, you have
all these integrations and the tools you support, and you have this concept of multi-agent.
Help us understand where the places are that you see people having a lot of success,
the types of automations they're building where this stuff really works and it's phenomenal,
and where, you know, maybe it's still too early or missing some pieces.
Yeah, so Crew AI has an open-source framework that we consider a product. We have a team dedicated 100% of the time to building it.
I'm a strong believer in open source.
So there is that.
That is now used by close to half of the Fortune 500.
Like, there's a lot of companies using it out there.
It's insane.
And we run around 20 million crews a month now.
Each crew has between two and, the highest number that I have seen, 21 agents.
So we're talking about tens of millions of agents a month now. We also have an enterprise version
of that, which we sell either as a self-hosted solution, where people can run it on their own cloud,
or even on-prem if they want to do that, and we have some partners. Or we have a cloud version that people can use in more of a self-serve motion. So that's Crew AI as a company, the
framework and the software. Now, I think that up to this point in the industry, what
we have been seeing in terms of use cases is a lot of people experimenting
in 2024. I mean, we were expecting to see a lot of that, a lot of prototyping.
I see a lot of things going into production as well, and we have hundreds of customers now with
use cases deployed, but a lot of it was kind of like understanding and then all those questions
pop up, right? Oh, what about memory? What about RAG? What about graphs versus flows? How do you
think about the embeddings,
and how does this all play together?
So I think there has been all these conversations
in the industry now.
And then again, maybe this is a hot take,
but I think all that is becoming very commoditized.
Yes, memory.
So what, you need it.
Yes, you can do that in 20 different ways.
In Crew AI, you can do it better.
But yes, that's it.
I think what is happening now is companies are
realizing, hey, we're going to have thousands of agents running these organizations three months
from now. Do we want them to be legacy applications on their GitHub? Or do we want to have a control
plane where we can manage the agents, the authentication, the scoping, the tools? And that's
what we're building. So that's kind of like the vision.
In terms of use cases, a lot of it is companies
starting early with what we call
low-precision use cases.
So not user-facing, kind of back-office automations,
things that they can get out pretty quickly.
Sales, marketing, and back office
are usually the first boxes checked.
And then as they get confident on that, they expand into high precision use cases.
So, for example, I have seen filling out IRS forms; we can have a whole conversation
about that.
It's a very complex problem.
Then, basically handling pricing-change approvals is a major use case with one of
our customers as well.
And what people are calling agentic OCR, where you're not only processing docs,
but classifying them, running inference, and taking actions on them.
So there's a lot of more advanced use cases that we're seeing in that area now that are super interesting.
The world is on fire with agents, as you clearly know.
And your crew is the gasoline on that fire, and everything in the middle too.
I think it's a super interesting topic because I don't think we fundamentally even know the
limitations of agents as much yet, and the possibilities of agents as well.
And I'm very curious, because you talked about some of the use cases, which I think are just
a small sample of points, right?
Can you maybe talk about what do you think is like pushing the boundary of what agents
can do now?
Because I think a lot of people are now talking about agents that are doing full automation,
right?
Yes.
Replacing humans, or, you know, just basically going and applying, generating a resume,
and just truly being like an employee of sorts.
But I think while we're all in on this idea
that agents can do that, we actually are not so sure
of the type of actual functionality they can actually bring.
So can you maybe talk about a few cases that are
the cutting edge of what agents are doing,
maybe within the Crew AI platform or so?
Sure. And by the way, can I share my screen?
Because it'd be easier if I do,
because I can show you some videos as well.
We do, yeah, yeah, feel free to share.
We usually just take the audio for the main episode,
but we kind of cut some snippets, so.
Yeah, if you have some things,
we can do it, yeah.
Let's do audio first. So, all right. What I would say is, we have crews at customers
that have, quote unquote, replaced teams, but it's not that these people were let go.
They're basically now doing work that they can do way better. So for example, we have a very interesting pricing use case
where they had data scientists that, before approving
the price changes that I was talking about,
would actually do a bunch of queries to
compare their prices across different marketplaces,
across different regions, check integrations
with things like AlphaSense and a few other things,
before they actually approved some of those changes.
And now they have agents doing the whole thing end to end.
So it was an entire team of people that are super capable.
They know how to write queries. They're more junior data scientists.
They can now be deployed doing something else.
So we're definitely seeing some of that.
But most of the crews are not automating jobs just yet. They're automating processes, right? So maybe
you have three or four agents that are automating pull request reviews. In
Crew AI, for example, in our company, we have a four-to-one agents-to-employee ratio. We
have a lot of agents running. A lot of our use cases: we have agents, for example,
doing custom marketing material per customer,
onboarding customers into the platform, researching,
reviewing the pull request code that we get,
being able to answer support tickets,
take phone calls to answer support things as well.
So there's a lot going on internally
and I can show you some of that.
I think the most advanced use case that I have seen to this day is from a big GSI, basically
one of those big ones like PwC, Capgemini, Deloitte and all that. They're working with a big media
company, and what they're trying to do is, while there is live footage of a
game going on TV, they have agents that use fine-tuned video models to cut this
video, then lay audio over it, then figure out captions and post that on social
media. That was one of the more complex use cases that I have seen.
I was like, wow, you're basically doing editing, live, and kind of directing how you want this to be.
So I think those are some of the more complex ones.
So let's turn to the technical. Typically, when you're building with Crew, you're building a
single agent that you're giving a prompt, some optionality, some tools. But your tagline here is the universal multi-agent platform.
What is it about multi-agent that makes this stuff work, and why is it better
than a single agent? And what's the situation where, for example, in your
LinkedIn example, when you first built it, was that a single agent that
was good at one thing, or did you actually have a mixture-of-experts style approach?
So what's the decision point on that?
And how did you arrive at that conception?
And then from there, I'd love to dive into,
okay, how does this all actually work?
And what makes it work?
Why?
I love that.
Yes, there are so many different pieces
that make these agents work.
Because if you think about it, from the get-go
it's simple, right?
I want an LLM in a loop.
All right, that's okay.
But then you start thinking,
well, I need it to use tools.
I need it to be able to tap into other stuff. So, all right, I'm going to add a tool layer. Now, if they have tools,
maybe we should have caching, because I don't want them to use the same tools over and over again.
All right, so we have caching. Well, we also probably want them to remember things, since they
have caching. So we need a long-term memory. Then we might need a short-term memory. Then what is
a long-term memory? Well, maybe it's a vector database that we're performing RAG
against, with a combination of something else. Then that piece starts getting a little
big, and you're like, all right, my agent's now done. But then you start to go into some
use cases like, well, I actually want to remove PII and personal information before this request
goes out. So I'm going to need a sanitization layer. All right, let me do that. So things can start getting very complex as you get into production with agents.
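A rough sketch of how those layers accrete around the basic loop. All of the helper names here are hypothetical, invented for illustration; this is not how CrewAI implements these layers internally.

```python
# Hypothetical sketch of the layers described above, not CrewAI internals.
import re

tool_cache: dict = {}        # caching layer: avoid repeating identical tool calls
long_term_memory: list = []  # memory layer: in practice a vector DB queried via RAG

def cached_tool_call(tool, *args) -> str:
    key = (tool.__name__, args)
    if key not in tool_cache:
        tool_cache[key] = tool(*args)
    return tool_cache[key]

def sanitize_pii(text: str) -> str:
    """Sanitization layer: scrub PII before a request leaves the system."""
    text = re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[SSN]", text)
    return re.sub(r"\S+@\S+", "[EMAIL]", text)

def run_agent_step(llm, prompt: str) -> str:
    recalled = "\n".join(long_term_memory[-5:])   # short-term window over memory
    answer = llm(sanitize_pii(f"{recalled}\n{prompt}"))
    long_term_memory.append(answer)               # remember for later steps
    return answer
```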
And I have been beating the drum on multi-agents versus
single agents for a while, just because multi-agents offer you so many more advantages.
And that's inherently because of how these models work.
So if you think about these models,
and I know that we can get more technical here:
an LLM is an AI model, right?
If you think about more traditional models,
where you had classification models in the past,
or even prediction models,
you basically have a set of features,
the data that you know, and you have a new data point
that you're trying to predict.
So you give it enough examples, and it figures out
the mathematical formula that predicts
that outcome with a certain degree of precision.
Now, LLMs are not that different
in a theoretical sense, because you still have
features, but the features are all the tokens, all the words that have been
typed so far, and you're trying to use that in order to predict what would be
the most appropriate next token. What that means is that, and I don't
think a lot of people that use LLMs understand this, a lot of the quality of
your output depends on how you write what comes before.
It's going to use what you typed so far to define what it's going to write next. So if you go into
ChatGPT and you ask, hey, give me a stock analysis on Tesla, it's going to give you an
answer. But if you say, you're a FINRA-approved investor, give me an analysis of Tesla stock,
you're going to get a way better one. Because
again, it's using other features, right? So when you're working with multi-agents, you get to use
that on steroids, where you can have agents that are specialized in one thing versus specialized in
another. And that goes beyond the prompting, because in Crew you can actually have agents running on
different LLMs entirely. So you can have a coding agent using Sonnet,
while a reviewer agent uses GPT-4, and maybe another agent for PII information uses a local
model like DeepSeek R1, for example, and there you go. So these are some of the
benefits that you get. I could keep talking about multi-agents. And I remember some of the
skepticism that I got early on was like, well, if one agent hallucinates
like 0.5% of the time,
don't five agents hallucinate like 95% of the time?
But what you get to do, if you do this right,
is that they actually fact-check each other.
So you can have one agent hallucinate
something, and the other is like,
I don't think that's right.
And that gives the first one the feedback
to fix it. So there are some
interesting dynamics in there as well.
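Here is a minimal sketch of that pattern using CrewAI's documented Agent/Task/Crew primitives. The roles, task text, and model identifier strings below are placeholders chosen for illustration; check the CrewAI docs for the exact model strings your version expects.

```python
from crewai import Agent, Task, Crew

# Two specialized agents, each running on a different LLM.
coder = Agent(
    role="Senior Python Engineer",
    goal="Write clean, working code for the requested feature",
    backstory="A pragmatic engineer who ships small, tested changes.",
    llm="anthropic/claude-3-5-sonnet-20241022",  # assumed model string
)
reviewer = Agent(
    role="Code Reviewer",
    goal="Fact-check and critique the coder's output",
    backstory="A meticulous reviewer who catches hallucinated APIs.",
    llm="openai/gpt-4o",                         # assumed model string
)

write = Task(
    description="Implement a function that parses RFC 3339 timestamps.",
    expected_output="A Python function with a short docstring.",
    agent=coder,
)
review = Task(
    description="Review the implementation and flag anything incorrect.",
    expected_output="A bullet list of issues, or 'LGTM'.",
    agent=reviewer,
)

# The reviewer sees the coder's output, giving the fact-checking dynamic.
crew = Crew(agents=[coder, reviewer], tasks=[write, review])
print(crew.kickoff())
```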
That sounds like an amazing group of agents that can live harmoniously together.
I'm very curious, because from a programmer's point of view, we used to be able to
understand the whole control flow and data flow of a program, right? Because
that would actually almost guarantee functionality, right?
Like, I want a library or a function call or a microservice
to do certain things.
It should orchestrate and do certain other things, right?
All of that was pretty much hand-coded, you know,
and it feels like the Stone Age now, right?
Now you have this beautiful thing called agents that, like you mentioned, have agency.
And then there are multiple agents, with their own agency,
trying to work together.
And this becomes like a jazz band.
I don't even know what's going to happen here.
I'm very curious, because at one level,
if you're trying to actually achieve a functionality,
like you said, video editing, you know,
we used to have to understand each step
and what it was going to output,
and what each agent, if you think of it as a microservice,
is trying to accomplish.
And you piece it together.
And at this point, I don't even know what each agent will actually output,
even though you're giving it a prompt.
I wonder, how do you help folks understand
where the level of human design input should be?
Are we just going to design some high-level agents, let them flow widely, and just kind of guide them at a very high level?
Or are there very specific outputs and inputs, and very specific logic you're
trying to instruct each agent to do? Are there trade-offs here? Because at this point
it's such a black box, right? I don't even know exactly what's going to happen. What are some of
the things you've learned working with so many agents at this point, about the granularity of
us humans trying to design the process of agents working together?
Is there such a thing as being too high-level, too abstract, where it just goes anywhere?
Or too specific, where it only does that much?
Yeah, I think you're on point.
It's funny, because you have two axes, right?
On one axis, you want these agents to still be flexible, being able to handle different
use cases, or whatever gets thrown their way. But you also want to have consistent
quality. So whatever they do, you want to make sure they're producing good quality
at the end. So that's where things become a little like, all right, how do you do that
when what we're building is this super fuzzy application? So for what we call high-precision use cases,
usually there's a lot of code involved still, in the form of functions that you don't need
agents for. For example, if you just pull data from somewhere every time, you don't need
an agent to do that. You can write the code, the code pulls the data, and then it passes the data into
an agent to do something. But then there's also a lot of guardrails and validations, right? So we have this idea
in Crew where, at the task level, whenever an agent finishes a task, you can implement guardrails
that you write programmatically, which will check that data and send it
back to the agent in case it doesn't pass. But the way that we see it is, there are use cases where you're
going to have more autonomy and then you're good with crews, with agents,
kind of like doing their own thing and that's okay.
There are use cases where you're going to want to add more restrictions on that.
So you're going to have guardrails,
you're going to have before hooks or after hooks, and that's one thing.
But then on the other side,
if you want to have a lot more control,
there's Crew AI Flows.
And Flows are basically a way for you to use
event-based actions with agents if you want to.
So with Flows, it's more the traditional
if-this-then-that that you would get in programming,
but it's all event-based, so consumers and listeners.
And if at any point in time during that execution you want to throw an agent at something,
you can do that natively. So what we see is, for more low-precision use cases that can actually
use their agency, things like, oh, I want to write an email, help me with a press
release, research someone, create a report, whatever that might be, you might go straight with agents and crews. But if you want to, for example, fill out IRS forms, you
probably want to use Flows for that. And one example for that specific use case: it's basically a
huge financial institution, and they have to fill out those forms every so often. And funny enough,
the forms are like 70 pages long of just content they need to fill out.
But fear not, it comes with an instruction manual, and that manual alone has 620
pages. What do you do in there? If you just throw agents at that,
you can screw up financial information and how you fill
that out. So for
those use cases, we're using a mix of agents and flows, where we use flows to extract each
page individually and extract all the fields that you have on that page. And then we pass it
to an agent to perform RAG queries against the instruction manual to
understand how it should fill each field.
And then these agents perform RAG on the database
to extract that information from there,
and then they fill it out.
And the other thing is that in use cases like that,
there's no way around it.
We still need humans in the loop, right?
That's the biggest thing.
A lot of these high precision use cases,
you need humans to validate things.
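A hedged sketch of that "flows for control, agents for fuzziness" split, modeled on CrewAI Flows' event-based decorators as documented at the time of writing; the extraction logic and crew definition are placeholders, and you should verify decorator names against your installed version.

```python
from crewai import Agent, Task, Crew
from crewai.flow.flow import Flow, listen, start

def build_form_crew() -> Crew:
    """Hypothetical one-agent crew that fills a single form page."""
    filler = Agent(
        role="Form Specialist",
        goal="Fill each field using the instruction manual",
        backstory="Performs RAG against the 620-page manual.",
    )
    fill = Task(
        description="Fill out the fields for this page: {page}",
        expected_output="A mapping of field name to value.",
        agent=filler,
    )
    return Crew(agents=[filler], tasks=[fill])

class FormFillingFlow(Flow):
    @start()
    def extract_pages(self):
        # Deterministic step: plain code, no agent needed to pull data.
        return ["page 1 fields", "page 2 fields"]  # placeholder extraction

    @listen(extract_pages)
    def fill_fields(self, pages):
        # Fuzzy step: hand each extracted page to a crew of agents.
        crew = build_form_crew()
        return [crew.kickoff(inputs={"page": p}) for p in pages]

FormFillingFlow().kickoff()
```

The task-level guardrails mentioned above follow the same spirit: a plain function attached to a task that validates the output and sends it back to the agent on failure; see the CrewAI docs for the exact hook (for example, a guardrail argument on Task) in your version.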
Speaking of control flow and the user in the loop: on the podcast and in other
conversations, I've often used the continuum from driving, to cruise control, to adaptive
cruise and lane keeping, which is basically very close to self-driving but not exactly,
and then self-driving, and how cars move basically back and forth along that continuum.
It seems like that analogy also applies very much to all these automations at the broad layer. So what do you think the decision points are for when you actually bring
these things back to a human? At what point do I bring a human back into the loop? How far do I let
an agent go down a road, or let an agent generate a plan? Like we were saying, it's a broad space, and
there are so many questions I have here, but let's answer that question. What do you think the
decision points are that say,
okay, agent, go bring the human back in?
And is that something a human decides,
or is that something the agent decides?
So what we're seeing is most people are enforcing,
like for this specific task,
I want to make sure that a human gets involved
before things move along.
So that's the most common.
And at the end of the day,
I think the companies that have been most successful are the ones that are promoting their employees
into managers of these agents. The employees themselves are still responsible for
accuracy, presentation, quality, putting a nice pretty bow on it and everything. But they
now have agents to help them do their work. It's just another tool in their tool set.
But what we're seeing most of the time
is, during the implementation of the automation,
they say like, hey, I want specific human approval
in these three different spots.
And then at the end, you always have something as well.
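Concretely, pinning approval to specific tasks looks something like the sketch below. CrewAI's docs describe a per-task human-in-the-loop flag, shown here as human_input=True; the agent and task text are invented for illustration.

```python
from crewai import Agent, Task, Crew

pricing_analyst = Agent(
    role="Pricing Analyst",
    goal="Propose price changes across marketplaces and regions",
    backstory="Compares prices and integrations before proposing changes.",
)

propose = Task(
    description="Draft price-change proposals for flagged SKUs.",
    expected_output="A table of SKU, old price, new price, and rationale.",
    agent=pricing_analyst,
    human_input=True,  # execution pauses here for a human to approve or edit
)

Crew(agents=[pricing_analyst], tasks=[propose]).kickoff()
```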
Well, how do you think this changes over time?
Okay, so the final question I have actually
is about generalization of agent flows, right?
Today with Crew, how general are the automations?
I mean, you can build a very specific flow. For example, you have
this one flow around pricing that you talked about, and it's obviously one little
section of someone's day-to-day job that they wanted to automate, and they did it using Crew,
and that was very successful, and that makes a lot of sense.
But how generalized do you think agents become? Do you think we have many agents that are finely
scoped, or do you think we end up with broad agents, broad flows, that can handle many, many,
many different types of tasks? And what's your mental model for how to think about the
scoping of task support, if you will, and these delegation points?
Yeah, so the vision that we have at Crew
is more of the former,
where you have agents with smaller scope.
Now, the way that users interact with that,
the way that it feels to them, is the latter.
And what I mean by that is,
you have, behind the scenes, these agents
with smaller scopes and smaller access to a set of
tools, but when you're actually tasking the system to do something, you can ask about anything.
It's behind the scenes that these agents are going to be picked apart. It's like, all right,
you're here to perform this. We need these kinds of agents, these kinds of tools. Let's put them
together and let's get them to do something. So right now, most of the use cases are more
specific, like, hey, there's one specific process I want automated.
But where I think this will go in the future is that we then expand to, well, I have a pool of agents, a pool of tools, a pool of authentication resources, all those resources that I manage.
Theoretically, I can throw anything at it, and they should be able to self-organize and get it done.
And that's kind of like part of what we're doing as well.
So your broad view long-term though,
is that based from the user perspective,
it feels generalized from the agent builder's perspective,
they're task specific,
and then there's some intermediary,
which is Crew AI's platform that helps, you know,
the user figure out what agents to task with work.
And that's the future of these platforms.
And so in many ways,
I'm assuming that your potential vision is basically,
you become the Google of work
for a company or a person, right?
It's like, oh, I need to do this.
Crew AI is going to go and figure it out
with the available agents that you have,
deep integrate knowledge in different places.
That makes a lot of sense.
The initial company use case that we had
was something like that.
Because there was a lot of interest from the early days,
a bunch of people wanting to chat,
and we would be in situations where,
I got to jump into a call
and then I have another call right after that
and another one,
and I know nothing about those companies, right?
How do I prep for that?
So the first crew that we started to use internally
was a prep-for-meeting crew,
and we kick it off straight from Slack.
You have a Crew AI integration in Slack,
and you can do like, hey, this is the meeting.
This is the person.
This is the context.
And then these agents just go online
and research everything, right?
So it would be funny, because I would jump
into these meetings out of nowhere,
and I would know a lot about the company.
I was like, oh, that's amazing.
The new factory that you folks just opened in Australia.
I'm very happy about that.
It feels like you're being bullish on APAC.
And they're like, yes, for sure. And there you go.
That's amazing.
You've got Crew running in your glasses or something.
Now suddenly you're like the superhuman.
You know, I actually want to ask this.
When I think about the future of agents, I think we've been talking a lot about
the specifics: how to get more functionality, more evaluation, more judging.
But a huge aspect right now, I feel, that a lot of people are trying to explore is this web agent,
this sort of ability to browse the web, this ability to call APIs, this ability to even start paying each other through agents.
Like, I see a lot more action-taking things, like actual browsing-the-web stuff.
I'm curious how you think about this world, given that the agent world is so broad.
You can do pure research, you can do scraping data, a lot of content stuff.
But then there's also a lot of action-taking.
Do you find that action-taking through agents has been pretty useful and easy to
really get started with?
Are there any limitations you see, where a lot of people can only really support these
kinds of function calling at this point, and we still need more, I don't know, research
or boundary-pushing? Because I feel like we're so early, we don't really know
where the limitations of this are.
Yeah, I think one of our most successful use cases is actually an action-taking one.
I think action-taking is hard to do, but if you
do it, it unlocks a lot of value, right? Especially if you can measure the
accuracy of it. So for that use case specifically, we actually tracked the
results with the customer for about a month, and then at the end we compared:
the humans were doing the work in parallel, and the agents were doing it in parallel, for a month.
What does this look like in that case?
We got 100% accuracy.
I believe we got a little lucky in there. 100% is not
something that you're going to get every time.
I would expect like 99, 98, but that was pretty good,
and that became a major use case for them.
But I think action-taking is something where, if you can do it with agents for your use case,
and you can get it to do it right using the tools that are available, then it unlocks
major value.
Because what really clogs the machine a lot of the time
is having to have those people say, let's do A, let's do B. So I have been
seeing some use cases around that. But they're more rare. It's people that are more advanced,
people that are more comfortable, people that have been building agents for a while, so they
understand how this is going to perform. And they have a lot of evaluation behind the scenes
to make sure that if something goes south, it alerts you and everything.
And that's one of the big features that people like in the platform: the ability to set these alert triggers in case things go haywire.
I'm curious how you
think about the quality assurance aspect of the agents. Is it today such that the
automations you are making basically have a human approving or disapproving the outcome, and so this
isn't really an issue? What's your prescription or pattern for thinking about: I have
a series of agents, they're producing some quality of
answer X with some error rate.
How do we move that bar up?
Do you think that's actually a problem that agent builders encounter today
with Crew? Or is it not so much a problem, because, hey, actually the
types of use cases we're building for don't necessarily always have a user at the end of the flow
anyway?
I'm kind of curious how you think about sort of the QA aspect.
You know, there was tons of talk last year, in 2024, about eval frameworks.
Like it was like every company was building an eval framework.
Everyone was talking about evals, eval, eval, eval, endless evals.
I'm curious what your thought process is, because you're actually layers
and layers above what we traditionally thought eval frameworks were supposed to be doing.
Yes. And I think you got that on point, right? There are different kinds of evals. There are evals
at the prompt level; that's okay, and if you're doing agents, you've got to check that box, but you've
got to step up a few layers as well. So if you see this, this is one of the crews that we run that
helps onboard users. And you can see that we have an actual quality score and a hallucination score that we keep track of.
And by the way, this is
an opinionated approach,
but you can also override this
with your own metrics,
or even add new ones if you want to keep track of them.
So a lot of this is making sure
that we're keeping a certain quality threshold
and basically doing some sampling around that
to make sure that things are okay
and hallucinations as well.
Now, as I said, you can implement custom ones, and that usually helps a lot when engineers
are building very specific use cases where they're like, oh, in my use case, I can't have
any link that is hallucinated.
So I want to make sure that I double-check all this.
You can add custom trackers for that as well.
But what we're seeing is that this ability for you to
track not only at the prompt level, but then at the task level and then at the
crew level, is very important, especially for deviations, right? Because if you have an
agentic automation that gives you a constant quality that is acceptable for you without
human intervention, then
your first thought is, all right, I don't want that to deviate. And that's
how we set up a lot of these thresholds and alerts, because if
things change, or you try different models or anything like that,
you can make sure that you act on it.
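As an illustration of the sampling-plus-thresholds idea (this is not the CrewAI platform API, just a sketch of the mechanism): score a sample of crew outputs with an LLM judge and alert when quality deviates below an acceptable bar.

```python
import random

QUALITY_THRESHOLD = 8.0  # acceptable score out of 10 for this use case
SAMPLE_RATE = 0.1        # judge a 10% sample rather than every run

def judge_quality(output: str) -> float:
    """Hypothetical LLM-as-judge call returning a 0-10 quality score."""
    raise NotImplementedError

def track_run(output: str, alert) -> None:
    if random.random() > SAMPLE_RATE:
        return  # not sampled this time
    score = judge_quality(output)
    if score < QUALITY_THRESHOLD:
        alert(f"Quality deviated: {score:.1f} < {QUALITY_THRESHOLD}")
```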
But I think that right now, again, there are use cases where people are involved, and there are
more use cases where people are not. And the ones where people are not are usually the ones
that drive a lot of value.
That is really helpful.
So I think a fun one for me was: we
implemented agents for ourselves, a crew of agents to do PR reviews automatically. We
have an amazing open source community. A lot of people open up PRs. It's very hard to review them all.
So we built agents to do it.
And for me, what was amazing is, we shipped it,
then I went to a series of meetings,
and when I got back two hours later,
it had already reviewed a bunch of PRs.
And then you go into those PRs,
and you see people taking the actions
that the agents told them to.
Like, hey, you should change this,
like this is not cool, and all that.
And people were reacting to that.
So that alone has been a no brainer.
And there's no human in the loop whatsoever.
The agents are reviewing all of our repos 24/7 now.
That's amazing.
All right, sir.
Here's our favorite section of our podcast.
We call this spicy future.
Tell us, sir, what is your spicy hot take of the world? I assume it's agent-related, but it doesn't really have to be.
We'll let you pick what you want.
Damn, all right.
That's a hard one.
I think we had so many hot takes throughout the whole thing.
I would say, one, I think open source is going to win long term.
Maybe that's a hot take.
I think open source is going to win.
And I think we have seen this before.
And I think it's going to be one of those cases where open source will
beat closed source.
Two, for a lot of people that are fearful of being replaced
at the workplace, I think it will take a while still, given what we're seeing out there.
And I also think that people don't understand that they don't have a lot of control over
that. So they should focus on the things that they do have control over, like learning more about
agents. And I would say those are probably the biggest hot takes
right now. I don't know how people will feel about those.
I'm curious; I'm going to ask you a question which some would consider spicy.
Now we have agents that can use browsers as a tool.
What do you think the impact is on the way people will think about building products
in the future?
Do you think people continue to build websites, or is there something completely different?
Like, if we get to a world where agents are using the website, and the browser is designed for humans, would I still fund a massive front-end team
to build for browsers, when I could just give agents an API? I'm kind of curious what your
vision of the future of tool calling with browsers is, and where that takes us.
Such a great question. What is happening with browsers right now is funny, right? Because
what you have is, one, AI is moving so fast
that it's not willing to wait for its own protocols, right? It's not going to wait
for that. So it's using whatever is in front of it: browsers, keyboards, and mouses.
People are working around those things. Even though that sucks for throughput,
right? The ability for you to get information right now; agents could be way more efficient than that.
Now there's a question of whether you want them
to be more efficient than that
at the cost of not being able to observe them.
And another question: if you can get them
to be very good at that,
then you don't need to change anything out there.
That's what you're asking, right?
Because if you had to rebuild every single surface out there
to comply with an agentic protocol, that would be super expensive and hard to do.
But if you get agents to be able to navigate them in a very smart way, then you now get access to
everything that we have ever built for humans. Now, I do think that in the medium term,
yes, we're still going to fund front-end teams, for sure.
In the long term, I think there are going to be versions
of something like RSS, like what we had back in the day,
but a version of that for agents.
We already have something, for example, for our own docs.
If you go to our docs and you append, I think, llms.txt, you get an LLM-friendly version
of our documentation that you can copy and paste into a model that you might be chatting with.
So I think there's going to be more and more of that.
But I think what actually will happen is agents will get very good at navigating the common
interfaces that we use.
So people don't need to rebuild them, and agents get to be more useful faster.
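If a docs site follows that llms.txt convention, grabbing the model-friendly dump is a one-liner; the URL below is an assumption about where CrewAI's version lives, so adjust it to the actual docs path.

```python
import urllib.request

url = "https://docs.crewai.com/llms.txt"  # assumed path; verify against the docs
llm_friendly_docs = urllib.request.urlopen(url).read().decode()
print(llm_friendly_docs[:500])  # paste the full text into the model you're chatting with
```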
Well, you know, we could talk about so many spicy hot takes in the world.
Actually, we do want to talk maybe a bit more about open source, because that's one of my favorite
topics as well.
You know, DeepSeek has created all the hype and craziness right now.
Every single hour it's DeepSeek, DeepSeek, DeepSeek.
I wonder, do you see that open source has already been taking over
a lot of people's existing use cases?
Because before DeepSeek really happened, we'd probably been seeing open source being tried by a lot of enterprises, but a lot of companies
still just use OpenAI and Anthropic, you know,
from my own sampling.
And I wonder, do you see DeepSeek
as truly a transformational moment,
where on-par quality is finally here
with such a small model?
Or is it like, I'm already seeing this everywhere,
and this is just a public announcement, pretty much?
What have you been seeing?
Has open source already taken over?
Or are we just going to see it move faster now?
So I think it hasn't taken over yet, but it is stepping up.
I think, honestly, I was not expecting to see something as close to o1 coming out of open source this fast.
So I was very, very impressed with this.
I think that if we keep up this pace, even if it doesn't completely take over,
it's going to be almost a one-to-one ratio, right?
Like, you can get o1, you can get DeepSeek.
o3 will come out at some point; we're going to have a new model that's going to come out
at some point that's going to be at par with o3.
So I think open source is definitely out there. And the reason why I like this is, one, it
forces everybody's hand to lower prices, which I think drives a lot of innovation,
because people can build more use cases with that. I think it also forces people to be more
creative, because everyone's going to try to get an edge, and people are going to
think about how they can do different things. So if anything, I think that will push innovation even further. And I do like that.
Now, I don't think it has taken over just yet. The main companies that we see leveraging
open-source models are usually doing it because of data constraints, right? They want to self-host
the model so data is not getting out of their premises during their agent executions. So they
want their agents to be using a local
version of, not DeepSeek necessarily, because DeepSeek is brand new,
but other models. Now, I've got a lot of mixed messages from the market in general.
I had customers telling me, we are not going to get anywhere close to this, because it's coming
from China, and we don't want to get associated with it, because of all the political reasons and all that.
And we had other companies, even financial institutions, saying,
yes, we're actually thinking about using them, and I'm super strong on this,
because it is amazing, and yada yada yada.
So I think the jury is still out on how adoption will play out in the Western world.
But what I'm seeing is governments stepping away from it.
There are a few branches of government that
have already put statements out that they're not going to use it.
And companies seeing this as an opportunity for them
to cut down costs and get better models.
So I think the jury is still out on what's going to happen with
DeepSeek specifically.
But I think it's a major win for open source and the community in general.
Awesome.
Well, we could keep going and going and going, you know; we have so many agent questions in
our brains and want to ask more, but we've got to stop at some point.
Where do people find you and Crew?
The whole world probably already knows Crew AI and has been trying it, but just
in case somebody hasn't really tried it,
where can we find you?
I would say, one, I like to build in public a lot.
So probably the best way for you to tag along on my journey
is X or LinkedIn, where you can basically find me
at joaomdmoura, so J-O-A-O-M-D-M-O-U-R-A.
And it's the same handle for both LinkedIn and X.
And that's where we post a lot about building in public.
And if you want to know more about the project,
both the enterprise and the open source,
you can go to crewai.com, and you're
going to learn all about it there.
And yes, thank you so much if you are a user,
and thank you so much if you are considering becoming a user.
Awesome.
This is so fun.
Thanks so much, Joe.
Thank you so much for having me, everyone.