Screaming in the Cloud - Creating GenAI Teammates with Amit Eyal Govrin
Episode Date: September 3, 2024

Much of the discourse surrounding GenAI has centered on replacement, but what if tools focused on harmony instead? In this episode of Screaming in the Cloud, Kubiya CEO Amit Eyal Govrin explains why his company is flipping the script on AI. Amit and Corey discuss the perks and shortcomings of today's automation, how Kubiya functions as a teammate alongside its human counterparts, and the GenAI trends that aren't getting the attention they deserve. If you're worrying about your job security in the current AI climate, this discussion may help put your fears at ease.

Show Highlights:
(0:00) Intro
(0:47) Chronosphere sponsor read
(1:21) What Amit and Kubiya are building
(5:34) Pros and cons of automation
(9:10) Building a virtual teammate
(12:39) Implementing AI with nuance
(16:16) Real world applications of the tech
(18:09) Firefly ad read
(18:43) The value of human review in the world of AI
(21:10) Complexities (or lack thereof) of GenAI
(24:36) What people are sleeping on when it comes to GenAI
(28:08) Where you can learn more about Kubiya

About Amit Eyal Govrin:
Amit is the CEO of Kubiya, helping the industry break through the Time-To-Automation Paradox. An early pioneer in the FinOps domain, he held an executive position at Cloudyn (now Azure Cost Management), is an advisor and early investor at Zesty, and led DevOps partnerships at AWS.

Links Referenced:
Kubiya: kubiya.ai

Sponsor:
Chronosphere: https://chronosphere.io/?utm_source=duckbill-group&utm_medium=podcast
Transcript
So essentially, go and assign that entire end-to-end role that can also perform in high velocity, high accuracy, and high predictability.
And at the end of the day, be fully audited for compliance reasons.
And of course, that frees up the humans and that frees up your team's time to go and to innovate.
Welcome to Screaming in the Cloud.
I'm Corey Quinn.
Unless you've been hiding under a rock somewhere,
you've probably heard a fair bit about Gen AI lately
and how it is the savior slash doom slash hype cycle to beat them all.
My guest today is Amit Eyal Govrin, who is the CEO at Kubiya.
First, thank you for joining me.
I appreciate your taking the
time. Thank you for having me here, Corey. Complicated environments lead to limited
insight. This means many businesses are flying blind instead of using their observability data
to make decisions. And engineering teams struggle to remediate quickly as they sift through piles
of unnecessary data. That's why Chronosphere is on a mission to help you take
back control with end-to-end visibility and centralized governance to choose and harness
the most useful data. See why Chronosphere was named a leader in the 2024 Gartner Magic
Quadrant for observability platforms at chronosphere.io. So you have obviously been a little bit on one side of the Gen AI is-it-great-is-it-terrible divide.
Given that you have a company that is selling something directly in the space, let's take
that head on.
What are you building?
What I'm not building is an AI solution.
I'm building an outcome.
And the outcome is actually maybe if we take one step back, we can talk about
what I'm trying to solve for and kind of the paradox I'm trying to shatter. And then we can
talk about where Gen AI can enable that. That's a good way of approaching it. So often it feels
like people are raising giant rounds just because, all right, I basically wrote a Python script that
step one, import OpenAI, and step two is, we'll figure it out. Then they're
shocked, simply shocked, when a feature enhancement OpenAI puts out destroys their company. Who would
have predicted it could speak PDF one day? And yet, here we are. So what is the problem you're
aiming at? What outcome are you going for? That's a fair statement, by the way. I'll just
acknowledge that. Let's be honest. Gen AI, most people discovered it under a rock about two years
ago when OpenAI officially announced themselves to the world as ChatGPT. Clearly, we've been doing
this a little bit longer. We're working with all the up-to-date models, training our own models,
doing all the things that you would expect an AI company to do.
But that's not the topic of today's discussion.
Clearly, that's for people to geek out with afterwards if they want to look at our docs.
What we're actually looking to solve, Corey, because that's really why people are listening.
There's a concept called the time-to-automation paradox.
Are you familiar with it, Corey?
I think it's better if you explain it to everyone, because even if I am, I guarantee you someone is
not. Actually, I want you to do the selling for me. Okay. On some level, if you wind up with a
question of is the juice worth the squeeze longer term, the idea of how long do you spend automating
a thing versus how many times do you do the thing? If you're spending three days to automate something you do once a quarter,
that takes five minutes to do.
Is that worth it?
Well, the answer is, of course, it depends.
That's a loose definition of it.
You probably have a better one.
Actually, that's a perfect layperson definition of it.
But it's really the effort versus the outcome: is the effort and time to automation, the amount of time it takes to write the script, the Terraform landing zone, the configuration file, and obviously to maintain that golden path use case, congruent with the number of times it gets used, essentially the output you receive?
And oftentimes you'll find that it's not.
Oftentimes the level of effort and determination, and obviously the ongoing maintenance it takes to automate an end-to-end process, and the output that you receive aren't necessarily going to relate to business outcomes.
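The effort-versus-output calculus Amit describes can be reduced to a simple break-even check. A minimal sketch in Python; the function name and the overhead numbers are illustrative, not anything from Kubiya:

```python
def automation_pays_off(hours_to_automate: float,
                        maintenance_hours_per_year: float,
                        minutes_per_manual_run: float,
                        runs_per_year: int,
                        horizon_years: float = 1.0) -> bool:
    """True when the time saved over the horizon exceeds the time invested."""
    time_invested = hours_to_automate + maintenance_hours_per_year * horizon_years
    time_saved = (minutes_per_manual_run / 60.0) * runs_per_year * horizon_years
    return time_saved > time_invested

# Corey's example: three days (~24 working hours) to automate a
# five-minute task done once a quarter -- the juice isn't worth the squeeze.
print(automation_pays_off(24, 4, 5, 4))    # False
# A five-minute daily chore (~250 workdays) automated in a day can pay off.
print(automation_pays_off(8, 2, 5, 250))   # True
```

The real decision also weighs defect rates and on-call load, as Corey notes, but the basic shape of the trade-off is this simple.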
And that's typically where you see a lot of organizations with a very clearly defined automation strategy coming in with: we're going to set up an internal developer platform, name it Backstage or any other deviation of that. We're going to go and set up some kind of self-service platform. They go and spend all this effort, oftentimes even the headcount associated with it, just to find out, after a year's worth of toil, that they managed to set up seven golden path use cases. Then the first time a developer tries to self-serve into one of these automations, they encounter some kind of configuration or access or permission issue, and it goes right back into the ticket queue. Back to the on-call engineer: guess what, buddy? Enough is enough.
Enough of this nonsense.
I need a human to help.
And it repeats itself in various different formats and flavors across organizations.
But that's the sum of it.
The paradox, just like Jevons paradox in many respects: if it takes longer to automate something than the output is worth, it's not going to get done. And what you find is many organizations start with the strategy, end up going down the line, just to find out everyone's doing ad hoc scripting until, well, what happens?
You do tend to cut yourself to ribbons on the edge
cases though of, okay, this only takes five minutes every morning to do. Why spend a week automating
it? Well, it gets done every morning, sure, but look at the defect rate when humans are doing this.
Sometimes people are hungover.
Sometimes someone is out sick.
If you can get a better outcome via automation,
that does tend to put a thumb on the scale from time to time.
But yeah, directionally, you're spot on.
No argument there.
And I think, with the output and the outcome basis we're referring to, it all comes down to the amount of effort versus the output.
If it was as easy to automate an end-to-end process as it is to have a conversation with, say, Bob, and Bob's your on-call engineer.
And every time you need something from a platform, you just go to Bob and say, hey, Bob, can you go ahead and configure this
resource for me? Can you go ahead and grant me access or permission via this IAM policy, or create a policy with elevated permissions, or get approval for this resource that requires it? If you could do this as easily as having a conversation with Bob, guess what,
Corey? It's going to get done every single time.
So what we've created at Kubiya, and this is why you mentioned Gen AI at first, and you kind of set this up for this answer, is this: instead of Bob, you have an AI you can have a full-on, bi-directional conversation with, and it is access-aware, permission-aware, and able to meet the users in the exact channels where they already communicate and collaborate: Slack, Teams, Jira, Kanban boards. It's a personalized experience, and it releases Bob to actually do the real work, which is setting up
the infrastructure to tee up the company for the next Gen AI product that they're going to roll out themselves. Really, it's all about outcome and rewards. If you're going to
want to go and put all this effort, it better be worth the reward unless it's as easy as having a
conversation. Then you break away from that. The counterpoint, of course, is this has been teased at and people have done a number of
experiments, myself included, of, all right, I want a Python script to do X.
Go ahead and build that out for me.
And ChatGippity gets some parts right, some parts wrong.
But it is very far away from being something that I could accept out the gate as meeting
the acceptance criteria that I have for it,
it feels like on some level,
to dive right back into your paradox,
that I would spend more time supervising the thing
than just writing the quick script myself
in some of those cases.
How do you get around that?
So you hit the most important point.
It's not just about doing the automation
because that's half the battle.
It's about doing it in a way that's expected,
controllable, and fully auditable.
And we're actually
allowing that. We're actually using
Terraform as our backend in many
respects to give the user the ability
to control every aspect of
this interaction with
essentially
the teammate that's configured
with Terraform. So you get to control the environment variables,
you get to put the output, you get to control the permissions,
and you get to control every single aspect of it.
So as an operator, you're in full control.
As an end user, you're interacting and having that LLM type of experience
that people are accustomed to with ChatGPT
and other type of
chatbots that they're comfortable with. So you're getting the best of both worlds.
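As a rough mental model of the operator-side controls Amit lists (environment variables, outputs, permissions), here is a hedged Python sketch. Kubiya's actual backend is Terraform, and every field name below is invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class TeammateConfig:
    """Operator-defined configuration for an AI teammate.

    Hypothetical schema: the real Terraform provider will differ.
    The point is that the operator, not the model, defines the envelope."""
    name: str
    allowed_actions: set          # the only operations the teammate may perform
    env: dict = field(default_factory=dict)    # controlled environment variables
    channels: list = field(default_factory=list)  # where end users interact

    def authorize(self, action: str) -> bool:
        # The teammate can only do what the operator explicitly granted.
        return action in self.allowed_actions

bob_ai = TeammateConfig(
    name="infra-teammate",
    allowed_actions={"create_sqs_queue", "grant_jit_access"},
    env={"AWS_REGION": "us-east-1"},
    channels=["#devops-requests"],
)
print(bob_ai.authorize("create_sqs_queue"))  # True
print(bob_ai.authorize("delete_vpc"))        # False
```

The end user still gets the free-form LLM conversation; the operator gets a declarative, auditable boundary around it.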
I want to dig in a bit on the idea of talking about this as a virtual teammate,
specifically from the perspective of, I don't know about you, but I have something of a potty
mouth when I'm berating Siri when he gets something wrong, or Alexa when basically I ask for anything
and then I'm followed up with a, by the way, buy some more pants or whatever it is they're
trying to sell this week. If I talk to an actual colleague like that, HR is inviting me to a
meeting in which I'm not offered coffee. And very shortly afterwards, I'm not allowed back in the
office ever again. So there's a question of how much is this an accelerational tool for folks that are getting value from it
versus how much is this actually intended to be a full-on member, the fourth person on a three-person dev team?
Is this a, I guess, employee replacer? Is it an augment?
Where on that spectrum do you see it landing?
So just like you would say across all sorts of revolutions, industrial revolutions, where the humans weren't replaced by the smart assembly line, they became supervisors of the smart assembly line and managed to go and to reinvent their position.
And you could go down the list, from how you did things on-prem to how you do them in the cloud, and how you went through all the different revolutions. At the end of the day,
AI is a tool. It's an enabler. It's a megaphone. If your entire role in an organization is to move
a pencil from right to left, then likely AI will replace you. I'm sorry to say that. But if you
actually have the capability of supervising, of becoming essentially a supervisor of agents, then think about it this way: up until now, you've been an individual contributor. Now you actually get to supervise your teammates. So you're a DevOps manager all of a sudden. That becomes a completely different job title. And of course, you get to see everything through and have the highest and best use of your time freed up for the things that AI isn't prepared to do.
And so what you're talking about is moving commoditization a bit further up the stack.
Similarly to, it used to be you had to run a bunch of compiler commands to get a web server
to run your application. Then it was just a yum or apt install. Then in time it became,
oh, now just Docker run
and you get the whole thing prepackaged, ready to go.
And you're spending more of your time
trying to get the application to do what you want it to do
and not get the application set up in the first place.
That's exactly what we're saying.
You don't want to have to do the repetitive work
that otherwise would have been better suited for AI.
Free up your plate.
You have plenty of work to do.
You probably have a
big backlog that you haven't even gone around to because you're behind the eight ball every single
day. The moment you start, there are hair-on-fire drills. Do more with less is the persistent
rallying cry of our current industry and, honestly, our entire system. There are case studies being taught at Harvard and every single business school. It's all about Blockbuster and Netflix, right?
Don't be Blockbuster.
Don't be left behind.
If you know how to reinvent yourself and adjust,
AI is going to be the biggest enabler,
the biggest career boost you could ever have.
Otherwise, if you feel that you're perfectly fine
with stacking DVDs and dropping them in the inbox
every single Sunday when people have to return them,
you're going to be left behind
and the streaming kind of movement
will take you by storm.
That's kind of what we're saying.
Don't be left behind by AI.
Have AI be the enabler for you in your careers.
I don't necessarily disagree with the premise.
I think that it is fairly clear at this point to most
folk that there is value that can be derived from Gen AI. Whether it is this wild transformation
of society to a perfect utopia, I'm a little bit of a skeptic. But it's similar to, oh, I insist
on doing long division the old way because I'm not a fan of these newfangled things called
calculators. Yeah, it acts as a tool that accelerates.
But understanding when to apply it,
how to validate the output that comes out of it and to ensure that it's not insane
is going to be something I think
that we're stumbling through as a society.
And in many cases, the hallucination problems
aren't making a strong case for,
let's turn the air traffic control system
over to the Gen AI and hope for the best.
I think that it's a matter of nuance,
similar to before this,
developers would wind up using Stack Overflow,
the world's premier copy and paste website,
and use that to solve problems on an iterative basis.
You can amalgamate that
into various coding assistants and chatbots.
Having them actually go ahead and do the implementation
seems like the next logical step.
But as always, there's going to be some question around the margins: is this going to be something that we can actually trust? And if so, how far?
So the beauty of what we're trying to accomplish here is that we're not letting AI take over the entire end-to-end workflow and orchestrate the entire process. You can't avoid hallucinations.
By the way, that's a feature within large language models, right?
It's a statistical-based approach.
Every single answer will deviate from the other answer every single time.
What we're actually advocating for is to make it very controllable, very predictable, and that's where the Terraform code comes into play. The AI enablement is essentially the natural language interaction that you have, where it can go and abstract away the business logic of your intent and then work within that predefined, pre-gated workflow. So it's essentially combining the best of both worlds, both the known and expected structure,
along with all the things we know and love about large language models.
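The "best of both worlds" idea, free-form language in, pre-gated workflow out, can be sketched like this. The keyword matching below stands in for the LLM's intent extraction; the important property is that execution is confined to a predefined whitelist:

```python
# Map free-form requests onto predefined, pre-gated workflows.
# The workflow names and keywords are illustrative, not Kubiya's.
WORKFLOWS = {
    "provision_queue": ["sqs", "queue"],
    "grant_access": ["access", "permission", "iam"],
}

def route_intent(message: str) -> str:
    """Return a whitelisted workflow name, or escalate to a human."""
    text = message.lower()
    for workflow, keywords in WORKFLOWS.items():
        if any(k in text for k in keywords):
            return workflow
    # Anything unrecognized stays out of automation entirely.
    return "escalate_to_human"

print(route_intent("can you spin up an SQS queue for staging?"))  # provision_queue
print(route_intent("please rotate the TLS certs"))                # escalate_to_human
```

However creative the phrasing, the model can only ever select from workflows the operator already gated, which is what makes the output predictable.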
I think that that's a fascinating approach.
I mean, something that I've always done when I've been asking large language models is
I won't ask for the answer because, okay, you're going to give me an answer.
Maybe it's right, maybe it's wrong.
But regardless, you certainly sound very confident in what it is that you're saying.
What I'll ask instead is for a script to go ahead and do the thing to get the answer out of it.
Because from my perspective, that gives me two great paths. One, I can see how it's doing that
and potentially catch weird issues it's making along the way. And two, okay, that was great. I
want to iterate on that now. I'm not going back to square one or trying to find the chat that generated that and then have it go ahead and pick up where I left off.
I find that the show your work stage and breaking it down into stages means that when it starts to
go off the rails around step 17, you can go back to 16 and try again to get things moving along
again. I think that that aligns with the approach that you're taking. It's a very modular
approach and the ability to go
into insert your own tools, your own scripts,
your own code as part of
this to inject that into the process.
Make sure that you control every aspect
of your workflow. It's essentially your own words orchestrating a complex process that otherwise would have taken disparate tools and processes within an organization to accomplish. It's just done in a highly condensed time to automation, which is the beautiful part about it. We can go into use cases
if it helps. By all means, give me an example use case. Let's talk about something real rather
than the ephemeral vision of the developer of tomorrow. Let's talk about something that
someone might actually do. One of our favorite golden use cases, if you may, is one of our customers came to us to
enable a self-service infrastructure or resource provisioning platform, all within Slack, which
is obviously where they meet their users.
So the concept is a user comes in, asks for, I don't know, a new SQS queue, for example.
And this is as part of an application that they want to copy over from one of their other resources.
So the teammate first verifies the identity of the user, verifies that they have permission, maybe even creates a just-in-time policy in order to enable that user to do so, but then also backtraces what the cost of this resource would be, because there is budget enforcement that has to come into play.
So if it costs more, say, than $100 a day, that requires some additional layers of approval,
which, again, the teammate could also go and get the right approvals for that.
So the ability to go and then to both enforce budget, enforce policy, and create least
privilege automation without needing to assign a role, that's already a big win for this organization.
At the end of the day, they also care about cost. So not only are you enforcing the budget, they also have a cleanup process: after 30 days, imagine a TTL, which you could actually configure as three hours, three days, or 30 days, it would go and automatically destroy that resource and bring it back to where it was before.
So you're never over provisioning or over resourcing. And it's all done as a simple
conversation. This same process, if you would have copied that over to the way they currently do it,
would take a matter of three to five days and five different people involved in the process to provision and deprovision that resource. With us, it's less than a minute.
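A hedged sketch of the decision flow in that use case, identity check, a budget gate above $100/day, and a TTL for automatic cleanup. The function and parameter names are invented; the real integrations (Slack identity, IAM, the approval chain) are stubbed out as callables:

```python
from datetime import datetime, timedelta, timezone

DAILY_BUDGET_LIMIT = 100.0  # dollars/day before extra approval is required

def handle_provision_request(user, resource, est_cost_per_day, ttl_days,
                             has_permission, get_approval):
    """Verify permission, enforce the budget gate, and attach a TTL."""
    if not has_permission(user, resource):
        return {"status": "denied", "reason": "no permission"}
    if est_cost_per_day > DAILY_BUDGET_LIMIT and not get_approval(user, resource):
        return {"status": "denied", "reason": "budget approval required"}
    # TTL: the resource is scheduled for automatic destruction, so nothing
    # stays over-provisioned after the requester is done with it.
    destroy_at = datetime.now(timezone.utc) + timedelta(days=ttl_days)
    return {"status": "provisioned", "resource": resource,
            "destroy_at": destroy_at.isoformat()}

result = handle_provision_request(
    "alice", "sqs-staging-queue", est_cost_per_day=12.0, ttl_days=30,
    has_permission=lambda u, r: True,   # stand-in for identity verification
    get_approval=lambda u, r: False,    # stand-in for the approval chain
)
print(result["status"])  # provisioned
```

The cheap request sails through; the same request at $150/day would be denied until the approval stub returns True.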
Complicated environments lead to limited insight. This means many businesses are flying blind instead of using their observability data to make decisions. And engineering teams struggle to remediate quickly as they sift through piles of unnecessary data. That's why Chronosphere is on a mission to help you take back control with end-to-end visibility and centralized governance to choose and harness the most useful data. See why Chronosphere was named a leader in the 2024 Gartner Magic Quadrant for observability platforms at chronosphere.io. I think there's also significant value
in being able to spit these things out
that then go through a somewhat normal
production process where,
okay, great, this works in a test account.
It can go ahead and spin things up,
whatever, ideally there are guardrails
somewhere around it to prevent it
from doing the psychotic things
that make the headlines.
But okay, once that's done, great.
Then having it be vetted as
it gets promoted to higher environments and have humans weighing in on that does seem like a
reasonable control. Because objectively, it's not that big a deal when you have a Gen AI system hallucinating or being wildly inappropriate, unless you're deploying that directly to customers without any form of human review along the way.
I think if you put a chatbot on your website and make it authorized to cut deals on your behalf, and it does horrific things, I think you're unhinged. I think if there's a human review that
goes through it to validate it's on brand, that it is doing what you want it to do, well, that
seems like a much more reasoned, rational approach. Maybe I'm just old and I have perspectives on these things
that don't necessarily align with the rest of the industry,
but here we are.
And I fully agree, Corey.
So from my perspective,
and this is why I want to make sure we're on the same page,
it's not by accident that we called it a teammate.
It's not a co-pilot.
A co-pilot just watches over your shoulder,
does code completion.
By design, it's limited to the human-in-the-loop interaction you're involved in.
Here, we're talking about the concept of delegation. We're saying delegation is the new automation. If you could go ahead and just instruct one of your teammates to go and solve the entire Jira ticket queue and come back to you and report the mean time to resolve every single ticket, and trust that no longer do you need a
human to do it, but you only need a human to supervise the outcome and to make sure that it's
fully audited and compliant, then you just saved potentially dozens of hours from the human's
work week. And at the same time, being able to free that human
up to do quite a few more important things that they have on their plate. So essentially, go and
assign that entire end-to-end role that can also perform in high velocity, high accuracy, and high
predictability, and at the end of the day, be fully audited for compliance reasons. And of course,
that frees up the humans, and that frees up your team's time to go and to innovate.
I think that that's probably a very fair way
of splitting the difference.
Now, the obvious question I have
that I did allude to at the beginning
that so many companies have seen is,
is this effectively a three-line Python script
that starts with import OpenAI
and you're going to be shocked,
simply shocked when that company doesn't hold still and releases something new?
What is the moat, for lack of a better term?
Well, the complexity of the infrastructure goes beyond this discussion or my acumen as a CEO, to be fair.
We're using over a dozen different language models.
Some of them we're fine-tuning ourselves, some of them we're training ourselves, and some of them are GPT-4o, Anthropic models, and so forth.
But at the end of the day, everything is broken down into multi-agent systems. So every single operation or task may be invoking a different language model. Just to give you an idea: if you encounter an operation that requires interrogating a resource and doing a Q&A, that would invoke a different language model, and probably a different agent, than if you're asking it a question or asking it to provision something.
So you would potentially have three different paths you could go by,
depending on the context of what you give it in the question.
And that, for example, is handled by a classifier agent that knows how to classify the right agent you would be routed to, so the request can go down the correct path.
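The classifier-agent pattern Amit describes can be sketched minimally. The agent names and the keyword-based classifier are illustrative; in the real system each branch would invoke a different language model:

```python
# A minimal multi-agent router. Each agent is a stand-in for a separately
# configured model, which is what keeps their capabilities isolated.
AGENTS = {
    "qa": lambda task: f"[qa-agent] answering: {task}",
    "interrogate": lambda task: f"[interrogation-agent] inspecting: {task}",
    "provision": lambda task: f"[provisioning-agent] provisioning: {task}",
}

def classify(task: str) -> str:
    """Pick the agent a request should be routed to (illustrative rules)."""
    text = task.lower()
    if any(w in text for w in ("create", "provision", "spin up")):
        return "provision"
    if any(w in text for w in ("why", "status", "describe")):
        return "interrogate"
    return "qa"

def dispatch(task: str) -> str:
    return AGENTS[classify(task)](task)

print(dispatch("provision a new S3 bucket"))
```

Because the approver, policy, and TTL agents are separate like this, instructions given to one agent can't be brute-forced into leaking what another agent controls.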
So as an example, that's just one element of this.
We can go into different workflows and
go into how you go and seek approval. We have multiple agents. So an approver agent isn't
necessarily the policy agent and isn't necessarily the TTL agent. So each way you have kind of a
Chinese wall between these agents. So you can't brute-force your instructions and try to get the information you want out of it that otherwise
would have been under some form of access control. There's a lot of excitement around Gen AI,
and I get it. The first time I saw ChatGippity do something, it was magic. It was, oh, wow,
I'm watching the future unfold. And it's rare you get those moments where you get to see it.
Like, it reminds me of the first time
I walked to an Apple store and played with an iPhone.
And it was, oh my God, this is so much better
than the crappy BlackBerry I was using.
There are those transformative moments in time.
Now, whether it's worth the massive uprooting of everything and hurling money down the well after it, I don't know.
But from what you're doing and from what I've seen of it,
I think that you're definitely building something interesting. What that turns into and how that
winds up manifesting, I think definitely will remain to be seen, but that's the nature of
anything. So I side with everybody who hates hearing about the next Gen AI company raising their $200 million round based on a pipe dream.
Then there's a huge difference, and there's levels to this, Corey.
So there's a huge difference between people putting together a demo, people putting together
a POC from the demo, and then people going into production in an enterprise-grade environment.
And this is effectively where we've already arrived.
We have, and we can talk about that,
but I'm not sure if this is before or after our embargo,
but we have enterprises that are effectively
in production working with us
and enjoying the fruits of their teammates.
Yeah, the proof is always going to be in the pudding. You can tell beautiful stories. I mean, I love the sound of my own voice. That's why I have two podcasts.
But you can only go so far
before having actual customers pony up
and saying, yes, this is valuable.
This is something that we are investing in.
And whether you think it's hokey or not,
we're going to be spending a boatload of money on it.
I mean, Kubernetes is a great example of this.
I thought that was significantly overhyped in some circles,
but everyone's using it at this point.
I was clearly wrong.
I'm wrong a lot.
That's the best part about being me.
I know Amazon likes to say leaders are right a lot,
but no, no, no.
I like being aggressively wrong,
but then adjusting my opinion in light of new information.
What do you think folks right now are, I guess, misunderstanding the most about Gen AI's opportunity? What are they sleeping on that they perhaps shouldn't be, without descending into full-on boosterism?
Go big or go home, right, Corey? That's a motto. Gen AI is a very powerful technology, but it's not an end-all-be-all. It's not the panacea, right?
You need to have a clearly defined pain that you're going to solve. You need to have a clearly defined path and a clearly defined architecture to get there. Until that all aligns and all the stars align, everything else is vaporware. It really is. And this is where, you know, we kind of talk about separating the men from the boys. There are very few companies in production working with enterprises on Gen AI applications. We're one of those. We're not the only ones, I assure you; there are others coming up. But at the end of the day, the proof is in the pudding. We give a
guarantee on our product. We even let them opt out after three months if they don't necessarily
enjoy that experience, because we are outcome-based. If you're going to go and enjoy the fruits of our labor, you're going to pay. If not, take your money and we'll part ways. Come back to us in a year, when you think you're ready, when you think you have a better way of doing it.
Yeah, I have little interest personally in taking money from people that aren't seeing
value in return for that. I'd rather lead to a good outcome because then it turns out it's a hell of a lot
easier to sell to an existing customer than it is a new one. But if you wind up basically leaving
them feeling fleeced, you don't really have much of an option to sell a part two. Word of mouth doesn't travel very well when that happens, right? No. And what is it, like, bad news travels 10 times
faster than good news? Yeah, I've seen that all the time. Whenever I'm cynical on Twitter,
I wind up getting an awful lot of traction out of it.
But if I say this is surprisingly great, no one cares.
No one wants to hear positivity.
They want to hear the overwhelming negativity aspect.
And we're now doing our best as a society
to algorithmically boost it.
But here we are.
That's why we've been very cautious not to overhype
what we've been doing until we have proof points
and social proof for this.
I'm not going to pick on Devin.
They did an excellent job trying to be pioneers in this space.
But let's face it, they probably should have been a little bit more cautious before they release their videos and kind of the boosting about what they're doing, because at the end of the day, they fell where a lot of companies are falling.
Having a controllable, autonomous software engineer requires you to have actual control measures in place. I don't think they've done that.
Maybe they will with the $200 million
they just raised.
You know, best of luck to them.
We don't have the privilege
of raising $200 million,
but we have the privilege
of knowing exactly what we're doing
and how to go into,
and how to tame the large language models so they behave the way we want.
I really want to thank you for taking the time to speak with me about this.
If people want to learn more, where should they go?
Well, I guess you can look at my shirt, but I don't know if it's visible here. Kubiya: K-U-B-I-Y-A dot A-I.
Happy to have anybody ask questions.
We have a chatbot on our website, but you could also sign up. Because of course you do. For the wait list, that is, and we're happy to answer your questions. We have a support channel as well. So very happy to take questions, and I appreciate your time.
Of course, and we will put links to that all in the show notes. Amit Eyal Govrin, CEO at Kubiya.
I'm cloud economist Corey Quinn, and this is
Screaming in the Cloud. If you enjoyed this podcast, please leave a five-star review on
your podcast platform of choice. Whereas if you hated this podcast, please leave a five-star
review on your podcast platform of choice, along with an angry, insulting comment that I will just
assume was written by a malfunctioning chatbot.