This Week in Startups - Multi-Agent AI, Open Protocols & Startup Acceleration with Saurabh Tiwary | AI Basics

Starting point is 00:00:00 All right, everybody, welcome back to another episode of AI Basics. Doing this in partnership with our friends over at Google Cloud. Go check out their report, The Future of AI: Perspectives for Startups. It features insights from 23 leading voices in AI, and it's available to you for free at goo.gle/futureofai. And today we're excited to welcome Saurabh Tiwary. He is the VP and general manager of Cloud AI at Google.

Starting point is 00:00:35 Saurabh leads Cloud AI products like Vertex, Kaggle, and the agent development stack. And his team works closely with DeepMind to bring AI research into production. All right, Saurabh, it is a crazy time for AI. Thanks for taking a break. No, thanks for having me. And really nice to be here on the podcast. My lord, have you ever seen a pace like this in our industry? I mean, it is crazy.

Starting point is 00:00:59 It has been crazy for quite some time now. I guess you probably feel it as well that the pace of things happening is accelerating, if you can imagine that. Which is really the crazy part of it. We see all this amazing innovation over the last two or three years as AI becomes available to a broader group of individuals,

Starting point is 00:01:20 whether it's consumers, business executives, consultants, and of course developers and people building products. And I think today, really good idea to talk about agents. Yep.

Starting point is 00:01:31 Everybody's been obsessed for a couple of years now with chatting and prompting, but agents promise a much different future. Maybe you could explain to the audience what the paradigm shift is between a chat room

Starting point is 00:01:44 and talking to an LLM, doing deep research, all these great features versus having an agent and setting up an agent. Yeah, sure. So I think even the definition of agents has been shifting

Starting point is 00:01:56 as the technology has been improving. It's kind of like the shifting goalpost kind of a thing. People earlier were like, I will wrap an LLM with a prompt and that becomes my agent. Then there were things like workflows where you have predictable outcomes and you were kind of like codifying it through LLMs. That became agents. And now I think we are at a point where we are basically asking these agents to do indeterminate tasks, right? Meaning take a set of actions, look at the output, and based on that, define what the next set of actions should be.
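The loop described here (take an action, look at the output, decide the next action) can be sketched in a few lines. This is a minimal illustration, not any product's implementation: the two arithmetic tools and the `choose_next_action` function are hypothetical stand-ins, with a simple rule playing the role an LLM call would play in a real agent.

```python
from typing import Callable, Optional

# Hypothetical tools; a real agent would call email, search, ticketing, etc.
def add(a: float, b: float) -> float:
    return a + b

def multiply(a: float, b: float) -> float:
    return a * b

TOOLS: dict[str, Callable[[float, float], float]] = {"add": add, "multiply": multiply}

def choose_next_action(goal: float, current: float) -> Optional[tuple[str, float]]:
    """Stand-in for the model call: inspect the latest output, pick the next step."""
    if current == goal:
        return None  # goal reached, nothing left to do
    if current * 2 <= goal:
        return ("multiply", 2.0)
    return ("add", 1.0)

def run_agent(goal: float, start: float = 1.0, max_steps: int = 20) -> tuple[float, list[str]]:
    """Act, observe, decide again. The number of steps is not known up front."""
    current = start
    trace: list[str] = []
    for _ in range(max_steps):  # hard cap so a bad policy cannot loop forever
        action = choose_next_action(goal, current)
        if action is None:
            break
        tool, arg = action
        current = TOOLS[tool](current, arg)  # the observed output feeds the next decision
        trace.append(f"{tool}({arg}) -> {current}")
    return current, trace

result, trace = run_agent(10.0)
```

The difference from a fixed workflow is that the sequence of tool calls is not written down in advance; it is chosen step by step from what the previous step returned.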

Starting point is 00:02:32 And so that leads to very interesting possibilities, if you can imagine in this abstract kind of way, like, for example, look at my email, see if there is something important. The definition of important is a little bit vague and broad. Pick those important emails, see what tasks are assigned, try to do those tasks, etc., just as an example over there. When does that future arrive? When do we go into, you know, open our email box and see, hey, by the way, overnight, you had seven emails.

Starting point is 00:03:03 Two of them were critical. And here's some research. We prepared a reply for you. And here are the tasks that we've already delegated. One of the tasks went to the sales team. Another went to customer support. And another one went to operations and finance. When will we start to see that future emerge in your mind?

Starting point is 00:03:21 So I feel some of the early foundations are already being laid right away. Things are happening. What is happening is, and particularly for these ambiguous tasks, as models keep on improving, what needs to happen is that these tasks need to be done at high enough level of quality, because otherwise you or I am not going to delegate these really, really important tasks to those agents, right? Yeah. And so as the quality of the foundation models is improving, we are now seeing these bits and pieces getting connected, and that's why all this excitement about agents, that we are going from one step task to two steps and three steps and four steps and so on. And so those things are actually happening. There are products,

Starting point is 00:04:04 for example, we have something called Agentspace, which is our enterprise offering, where we are actually building these types of automations. And so what are the precursors that you need to make this work and to trust it? So is there a step that you see organizations doing in order to build trust that the agent is going to do the right thing? So I think at the end of the day, it boils down to quality, meaning if you are executing or calling that agent to do certain tasks, how often does it finish, right? And let's say the accuracy is like 50% or so, it's like,

Starting point is 00:04:41 it's correct one out of two times, not a great thing. Once you go into a territory of like 80, 90% accuracy, that's when things start getting interesting. Another thing that we are observing from our customers is that the ambition of our customers is also increasing. So let's say you have a two-step task and you can get to 80 or 90 percent. Then people want to add a bit more complexity and a bit more complexity and so on and so forth. So I think agents will continue to evolve in terms of what they are doing and their capabilities, toward this full-on ambition that we talk about on social media and other places about what the agents of the future should do,

Starting point is 00:05:20 but I think some of those basic steps are already being taken. A lot of companies are already deploying agents into production environments today. We're starting to do it ourselves internally when a founder sends us an update and they say, hey, here's what's going on in my startup. As investors, we have to read that update. Sometimes they organize it really concisely. Hey, here's our growth. Here's our runway.

Starting point is 00:05:42 Here's our spend. Here's the number of employees. Here's the open positions. Other times, it's just rambling, right? And we don't have the authority or ability to say, oh, fill out a form when they send an update. So now we have AI look at it and then normalize and say, hey, here are the data points in there. Here are the ones that aren't in there. Now, we don't have it reply and say, hey, you're missing your cash on hand. You're missing February's revenue.

Starting point is 00:06:05 But it will be able, it can now tell us what's missing so that we can save that step. And I would say, arguably that saves us 15 to 30 minutes and increases our accuracy, but we're still not ready to have it go back and talk to the founder. Yeah, yeah, yeah. As the quality of these agents improves, right, we would start having a little bit more confidence as we go along, and then we start delegating more and more tasks to the agents. All right.
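The "tell us what's missing" step can be illustrated with a small sketch. Everything here is an assumption for illustration: the field list is taken loosely from the data points named in the conversation, and a keyword check stands in for the AI model that does the actual reading, which the episode does not describe in detail.

```python
import re

# Hypothetical checklist, loosely based on the data points named in the episode.
REQUIRED_FIELDS = {
    "revenue": r"\brevenue\b",
    "runway": r"\brunway\b",
    "spend": r"\b(spend|burn)\b",
    "employees": r"\b(employees|headcount)\b",
    "cash on hand": r"\bcash on hand\b",
}

def find_missing_fields(update_text: str) -> list[str]:
    """Return the checklist items that the update never mentions.

    A keyword match is a crude stand-in for the model-based extraction
    described in the conversation; it only shows the shape of the output.
    """
    text = update_text.lower()
    return [name for name, pattern in REQUIRED_FIELDS.items()
            if not re.search(pattern, text)]

update = "Revenue grew 12% in March. We have 14 employees and 9 months of runway."
missing = find_missing_fields(update)
```

The result goes to the investor rather than back to the founder, which matches the level of delegation described here: the tool flags gaps, and a person decides whether to follow up.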

Starting point is 00:06:30 Now, to do this, there's going to need to be infrastructure. I know you guys are working on this ADK. Maybe explain ADK and then agent-to-agent protocols, because in this situation that I described of reading a founder's update and then saying, hey, we're missing a couple of pieces of data here that we could use, on the other side, if that was an email exchange or a Slack exchange, you know, iMessage, it could come

Starting point is 00:06:54 any number of ways. The founder could have an agent who says, hey, yeah, we got that information for you. It's right here. So maybe we could talk about those two things. ADK and then A2A, the agent-to-agent protocol. Yeah, so ADK is our open source framework for building agents. And it is actually open source in the sense that you can go download it, do whatever you want with it in terms of building multi-agent systems using ADK.

Starting point is 00:07:22 It has a lot of nice properties like debuggability. As I was mentioning, quality is really important. And so when founders or developers are building agents, they need to ensure they have the right tools for tracing of, like, how the conversation goes, debuggability, latency, stuff like that. So it has a lot of tools within the agent development kit itself, ADK stands for agent development kit. So within itself to help build agents, it is not tied to Google per se. So you can take an agent built on ADK, use any LLM of your choice, any large language model of your choice, and you can run it in any cloud or within your on-prem environment as well. So it's a fully open agent development kit.

Starting point is 00:08:04 Similarly, we have this agent-to-agent protocol, and this is where our thought process was that, yes, we have this agent development kit through which developers or founders can go build agents. But at the end of the day, there will be many, many different platforms, frameworks on which people will be building agents. And we can't just hope that everyone converges to this ADK format, for example,

Starting point is 00:08:28 even though we prefer that and we would like people to, but I don't think that's practical to assume everyone will be. And so here what we did was we built this A2A or agent-to-agent protocol so that if you have fully functioning agents that you might have pre-built, whether using LLMs or whether using traditional rule-based systems, the agent-to-agent protocol can help you talk across agents. So that now if you think about the multi-agent kind of ambition that we have,

Starting point is 00:08:56 where you have many task-specific agents and you can call them on demand to do certain things, you can do it in a very interoperable way, independent of where those agents have been built, how they have been built, and on what platform are they running. Yeah, and this protocol, it's very new. Yeah, we're in year one of this, I think. I mean, we announced it at Cloud Next. The growth has been stupendous.
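The interoperability idea (call task-specific agents on demand without caring how each one was built) can be sketched in plain Python. This is a toy under stated assumptions, not the A2A wire format: the `AgentCard` and `Registry` names, the skill names, and the in-memory lookup are all hypothetical, and a real deployment would exchange messages over the network as the protocol specifies.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AgentCard:
    """Hypothetical, simplified self-description that an agent publishes."""
    name: str
    skills: list[str]
    handle: Callable[[str, dict], dict]  # (skill, payload) -> result

class Registry:
    """Toy directory: callers look agents up by skill, not by implementation."""
    def __init__(self) -> None:
        self._cards: list[AgentCard] = []

    def register(self, card: AgentCard) -> None:
        self._cards.append(card)

    def call(self, skill: str, payload: dict) -> dict:
        for card in self._cards:
            if skill in card.skills:
                return {"agent": card.name, "result": card.handle(skill, payload)}
        raise LookupError(f"no registered agent offers skill {skill!r}")

def billing_agent(skill: str, payload: dict) -> dict:
    """A traditional rule-based agent: a table lookup, no model involved."""
    paid = {"1001": True, "1002": False}
    return {"invoice": payload["invoice"], "paid": paid.get(payload["invoice"], False)}

def crm_agent(skill: str, payload: dict) -> dict:
    """Stand-in for an LLM-backed agent; the caller cannot tell the difference."""
    owners = {"acme": "account-exec@example.com"}
    return {"owner": owners.get(payload["customer"])}

registry = Registry()
registry.register(AgentCard("billing", ["invoice_status"], billing_agent))
registry.register(AgentCard("crm", ["account_owner"], crm_agent))

status = registry.call("invoice_status", {"invoice": "1002"})
owner = registry.call("account_owner", {"customer": "acme"})
```

The caller depends only on the shared description and message shape, which is the property a common protocol is meant to provide across vendors and frameworks.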

Starting point is 00:09:20 Cloud Next was in March of this year. Oh, so we're in three months into this. Yeah, actually April, April of this year. So it's like month two. When we announced, we had about 50 to 60 partners, different companies who are supporting this protocol. Now that number has grown to about 120. Just yesterday, we donated the A2A protocol to the Linux Foundation, so that it has a completely open governance body

Starting point is 00:09:46 in terms of building and adding new features into the protocol. And almost, I think all the hyperscalers, all the major companies, ServiceNow, Salesforce, Microsoft, Amazon, they are all part of supporting and they have put their weight behind the agent-to-agent protocol. And this is critical. If we can get the entire ecosystem to support a standard, then if in this hypothetical example,

Starting point is 00:10:10 I don't know, we have a billing agent that goes and checks invoices and which ones have been paid, which ones haven't been paid, it could ping Salesforce and say, well, who's the account executive here? Let's let them know that this hasn't been paid. And then who's the customer and who's the support person at that customer? Let's ask their agent to check. Hey, has invoice number one, two, three, four, five, six, seven been paid or not? Do you have that? It could go back and say, yes, it's been sent. Okay, then we could escalate that call ticket. And these are things that would be done by humans, right? These are chores that humans really don't like to do. Nobody likes to look up stuff. I mean,

Starting point is 00:10:45 in the old days, you'd have to send somebody down to, I don't know, the files in the basement to go find the invoice, pull it, bring it upstairs. And there was a runner who did that. The equivalent here is, I think, pretty clear. Clerical work, chores, they're going to go away. And that frees up humans to do whatever the actual business of that business is, the more important high-order bits. Yeah? Yeah, yeah. Completely makes sense. And actually, we are already, I mean, to your, it's very interesting. Like, we are actually working with SAP for some of these procurement-related agents and working with Microsoft so that we can have an orchestration of agents across these different

Starting point is 00:11:23 very highly tactical or precise agents, right, which understand that particular domain really, really well and can execute really well at high enough quality and do these higher-level tasks on top of them. And there'll have to be some permissions here. So there's a whole layer of permissions that will need to be granted. You know, just like if you invited somebody to collaborate in a workspace or a document, if you're using a Google Sheet, you might invite somebody from another organization to it and you approve it. This will have to happen between the

Starting point is 00:11:54 procurement departments and the billing departments between companies, but they're not going to be sending money. That's in the future. And if they do, a human would be in the loop. So maybe you could talk a little bit about the concept of human in the loop here. And is that part of the standard where it says, hey, you know what? This seems important. Human in the loop time. Yeah, we actually both within the ADK as well as in the, so the A2A protocol just talks about the protocol of like two agents talking to each other and how do you register. The agent development kit actually has capabilities for human in the loop, right, so that you can intervene, you can have function calls or an interrupt where a human can jump in, give feedback that it's okay to proceed.
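The human-in-the-loop interrupt described here can be sketched as an approval gate in front of sensitive actions. This is a generic illustration, not the ADK API: the action names, the `SENSITIVE_ACTIONS` set, and the `approve` callback are all hypothetical.

```python
from typing import Callable

# Hypothetical set of actions that should never run without a person signing off.
SENSITIVE_ACTIONS = {"send_payment", "change_payslip"}

def execute(action: str, params: dict, approve: Callable[[str, dict], bool]) -> str:
    """Run an action, pausing for human approval when the action is sensitive.

    `approve` stands in for the interrupt: in practice it might open a ticket,
    send a chat message, or block until a reviewer clicks a button.
    """
    if action in SENSITIVE_ACTIONS and not approve(action, params):
        return f"blocked: {action} needs human approval"
    return f"done: {action}"

def deny_all(action: str, params: dict) -> bool:
    return False

def allow_all(action: str, params: dict) -> bool:
    return True

lookup = execute("lookup_invoice", {"invoice": "1002"}, deny_all)
payment_denied = execute("send_payment", {"amount": 100}, deny_all)
payment_allowed = execute("send_payment", {"amount": 100}, allow_all)
```

Routine lookups pass straight through, while anything on the sensitive list waits for a person, which mirrors the "this seems important, human in the loop time" behavior raised in the question.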

Starting point is 00:12:35 But the point that you are raising is really important, right? Like security, like if we envision this view of where agents are going, right? We are providing them a lot of capabilities to go and do and execute tasks on our behalf. Now, almost the way I think about it is almost all the challenges that were there in the app ecosystem on Play Store or App Store. You have similar kind of issues, right, meaning you submit code, you check and you make sure that the apps are doing the right things or not. And also when you install it on the phone, you have permissions and so on, right? Similar kind of issues will pop up over here with respect to all the different agents which are there: whom should you trust versus not, and what kind of controls or ability to make changes

Starting point is 00:13:23 do you want to provide them? Because even if it is a trustworthy agent, you don't want it to, I don't know, change the pay slip of an employee, for example, and stuff like that. Yeah, we don't want it changing passwords. You don't need to change the passwords. You don't need to change people's Social Security numbers. In fact, you don't even need to look at that. We'll keep it at a high level. When we're looking at this system a year or two from now, what will success look like for you and for the standard? What will success look like? What will be the common tasks that will have been

Starting point is 00:13:53 accomplished? And then what do you think is going to take more time? I mean, as I said, like, this will be, will be a journey. I think in terms of the complexity of the task, definitely this example that we said about like an SAP agent or a ServiceNow agent orchestrating across multiple different things and doing complex tasks, I think those are going to happen. Multimodality will become very, very powerful. Explain what that means for folks, yeah. Yeah, so multimodality meaning right now a lot of like when we talk to these LLMs, generally people think about text-based operation, right? You type something, you get something back, right? Multimodality is where you have speech, image, video, text, all coming into the model and the model is able to reason on top of it.

Starting point is 00:14:40 So, for example, when a receipt is submitted for expense approval, does it look like a receipt? Or is it actually just a set of numbers which are submitted as an example, right? That you are just processing text as an example. So multimodality, I feel, will be a very, very powerful construct. And the models are already capable, and we are already seeing some progress along that domain. So the effectiveness of the agents will become much stronger as we leverage multimodality. Yeah, we started a couple of years ago where different sales training software started to incorporate AI, started listening to, you know, video conferences, and then

Starting point is 00:15:20 coaching people, hey, you know, this other agent or this other account executive was doing a great job asking these questions. You might want to incorporate those. And to have an agent then go correlate, hey, here were five calls with potential partners, customers. These three actually followed through and bought the product. These two didn't. We're going to look back on the phone calls and the emails and we ping Gmail. We ping Google Meet or Zoom.

Starting point is 00:15:49 And we put all that together. Here's why we think we lost that customer. Or here's the customer support calls. Here's the tone of their voice and the despair they had and the frustration they had with, you know, your solution. back to multimodal here. It could be fascinating to be putting these data sets together and then just getting coaching.

Starting point is 00:16:08 The coaching piece of it to me is so interesting. Yeah, actually, I will share an example. Like one of our customers, Shopify, they actually are leveraging multimodality for their customers who are setting up storefronts, and they have their dashboard, right? The admin dashboard that they have for setting up the storefront.

Starting point is 00:16:28 Now it has lots of buttons and so on, and someone who is new to it may be stuck on, how do I change the price or do something else, add a new product or something. What they are doing is they are sharing that dashboard as part of the agentic workflow with the LLM. It is able to look at the screen of the user and say, well, click that button, not that one, the one next to it

Starting point is 00:16:50 and stuff like that, right? So that becomes really, really powerful. It reminds me of my early days in tech support where I had somebody on the phone when I was in PC support in the late 80s, early 90s. And I said, okay, you know, we're installing this on Windows. Just click any key. And the woman said to me, I see a space bar, I see a control key, I see an enter key,

Starting point is 00:17:10 I don't see an any key. And I said, oh, that means you can pick the key of your choice. Like, so let's hit the space bar. And she said, that's incredibly frustrating. If you want me to click the any key, why don't you have a key marked any? And I said, you're 100% right. But just coaching on those things, watching the path in which a

Starting point is 00:17:29 customer navigates your store and then finding out where they're frustrated or on the back end where they're trying to figure out how do I find taxes and file my taxes. It's going to be incredible for engagement on these products and then even getting to the product development team. Hey, by the way, these are the three times and here are the paths where customers are most frustrated with your product. Yeah. And here's how to fix them. And then eventually, hey, let's talk to the UX agent. Let's talk to the developer agent, have them suggest a fix, and then human-in-the-loop it, somebody puts the bug fix in. I mean, can you imagine the world we're going to be living in in 18 months? Yeah, yeah, yeah. No, this is, I mean, and we are moving along that particular direction right away.

Starting point is 00:18:13 I just love this idea. So I saw somebody had connected, I'm not sure which programs it was, but, you know, a copilot for a developer with either Figma or Canva or something. And the two of them started talking to each other, and they had to build a lot of glue to make this happen. And this is pre-standards. But they got the two of them talking to each other, where Figma or Canva started designing a website, the coder started building it. And they were going back and forth, essentially talking to each other. Yeah. And you could just watch them work. Pretty interesting. Yeah. If you think about it, like the progress that we make is because of the iterative design or iterative work that we do, right? And if we can provide that capability to agents and provide that

Starting point is 00:18:57 agency to them, it opens up just new opportunities. All right, listen, this is incredible. Google's an amazing partner. Thank you so much for coming on. For more insights, check out Google Cloud's The Future of AI: Perspectives for Startups. Tons of predictions, real-world examples, and startup advice from 23 AI leaders. Get some great ideas in there, some great techniques, some tactics, some strategy. Go to goo.gle/futureofai to read the full report. Thanks again for listening. Go to thisweekinstartups.com/basics to see all our basics. We got legal basics up there. We've got accounting basics. And now we've added AI basics because this is the fundamental feature that's

Starting point is 00:19:38 going to drive startups and commerce and our lives over the next decade. Thanks so much for joining us and we'll see you next time. Yeah. Thank you.
