Screaming in the Cloud - Creating GenAI Teammates with Amit Eyal Govrin
Episode Date: September 3, 2024

Much of the discourse surrounding GenAI has centered on replacement, but what if tools focused on harmony instead? In this episode of Screaming in the Cloud, Kubiya CEO Amit Eyal Govrin explains why his company is flipping the script on AI. Amit and Corey discuss the perks and shortcomings of today's automation, how Kubiya functions as a teammate alongside its human counterparts, and the GenAI trends that aren't getting the attention they deserve. If you're worrying about your job security in the current AI climate, this discussion may help put your fears at ease.

Show Highlights:
(0:00) Intro
(0:47) Chronosphere sponsor read
(1:21) What Amit and Kubiya are building
(5:34) Pros and cons of automation
(9:10) Building a virtual teammate
(12:39) Implementing AI with nuance
(16:16) Real world applications of the tech
(18:09) Firefly ad read
(18:43) The value of human review in the world of AI
(21:10) Complexities (or lack thereof) of GenAI
(24:36) What people are sleeping on when it comes to GenAI
(28:08) Where you can learn more about Kubiya

About Amit Eyal Govrin:
Amit is the CEO of Kubiya, helping the industry break through the Time-To-Automation Paradox. An early pioneer in the FinOps domain, he held an executive position at Cloudyn (now Azure Cost Management), is an advisor and early investor at Zesty, and led DevOps partnerships at AWS.

Links Referenced:
Kubiya: kubiya.ai

Sponsor:
Chronosphere: https://chronosphere.io/?utm_source=duckbill-group&utm_medium=podcast
Transcript
So essentially, go and assign that entire end-to-end role that can also perform in high velocity, high accuracy, and high predictability.
And at the end of the day, be fully audited for compliance reasons.
And of course, that frees up the humans and that frees up your team's time to go and to innovate.
Welcome to Screaming in the Cloud.
I'm Corey Quinn.
Unless you've been hiding under a rock somewhere,
you've probably heard a fair bit about Gen AI lately
and how it is the savior slash doom slash hype cycle to beat them all.
My guest today is Amit Eyal Govrin, who is the CEO at Kubiya.
First, thank you for joining me.
I appreciate your taking the
time. Thank you for having me here, Corey. Complicated environments lead to limited
insight. This means many businesses are flying blind instead of using their observability data
to make decisions. And engineering teams struggle to remediate quickly as they sift through piles
of unnecessary data. That's why Chronosphere is on a mission to help you take
back control with end-to-end visibility and centralized governance to choose and harness
the most useful data. See why Chronosphere was named a leader in the 2024 Gartner Magic
Quadrant for observability platforms at chronosphere.io. So you have obviously been a little bit on one side of the Gen AI is-it-great-is-it-terrible divide.
Given that you have a company that is selling something directly in the space, let's take
that head on.
What are you building?
What I'm not building is an AI solution.
I'm building an outcome.
And the outcome is actually maybe if we take one step back, we can talk about
what I'm trying to solve for and kind of the paradox I'm trying to shatter. And then we can
talk about where Gen AI can enable that. That's a good way of approaching it. So often it feels
like people are raising giant rounds just because, all right, I basically wrote a Python script that
step one, import OpenAI, and step two is, we'll figure it out. Then they're
shocked, simply shocked, when a feature enhancement OpenAI puts out destroys their company. Who would
have predicted it could speak PDF one day? And yet, here we are. So what is the problem you're
aiming at? What outcome are you going for? That's a fair statement, by the way. I'll just
acknowledge that. Let's be honest. Gen AI, most people discovered it under a rock about two years
ago when OpenAI officially announced themselves to the world as ChatGPT. Clearly, we've been doing
this a little bit longer. We're working with all the up-to-date models, training our own models,
doing all the things that you would expect an AI company to do.
But that's not the topic of today's discussion.
Clearly, that's for people to geek out with afterwards if they want to look at our docs.
What we're actually looking to solve, Corey, because that's really why people are listening.
There's a concept called the time-to-automation paradox.
Are you familiar with it, Corey?
I think it's better if you explain it to everyone, because even if I am, I guarantee you someone is
not. Actually, I want you to do the selling for me. Okay. On some level, if you wind up with a
question of is the juice worth the squeeze longer term, the idea of how long do you spend automating
a thing versus how many times do you do the thing? If you're spending three days to automate something you do once a quarter,
that takes five minutes to do.
Is that worth it?
Well, the answer is, of course, it depends.
That's a loose definition of it.
You probably have a better one.
Actually, that's a perfect layperson definition of it.
But it's really the effort versus the outcome: is the effort and time to automation, the amount of time it takes to write the script, the Terraform landing zone, the configuration file, and obviously to maintain that golden path use case, congruent with the number of times it gets used, essentially the output you receive?
And oftentimes you'll find that it's not.
Oftentimes the level of effort and determination, and obviously the ongoing maintenance it takes to automate an end-to-end process, and the output that you receive aren't necessarily going to relate to business outcomes.
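The effort-versus-output calculus Amit describes can be reduced to a simple break-even check. A minimal sketch in Python; the function name and the overhead numbers are illustrative, not anything from Kubiya:

```python
def automation_pays_off(hours_to_automate: float,
                        maintenance_hours_per_year: float,
                        minutes_per_manual_run: float,
                        runs_per_year: int,
                        horizon_years: float = 1.0) -> bool:
    """True when the time saved over the horizon exceeds the time invested."""
    time_invested = hours_to_automate + maintenance_hours_per_year * horizon_years
    time_saved = (minutes_per_manual_run / 60.0) * runs_per_year * horizon_years
    return time_saved > time_invested

# Corey's example: three days (~24 working hours) to automate a
# five-minute task done once a quarter -- the juice isn't worth the squeeze.
print(automation_pays_off(24, 4, 5, 4))    # False
# A five-minute daily chore (~250 workdays) automated in a day can pay off.
print(automation_pays_off(8, 2, 5, 250))   # True
```

The real decision also weighs defect rates and on-call load, as Corey notes, but the basic shape of the trade-off is this simple.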
And that's typically where you see a lot of organizations with a very clearly defined automation strategy coming in with: we're going to set up an internal developer platform, name it Backstage or any other deviation of that. We're going to go and set up some kind of self-service platform. They go and spend all this effort, oftentimes even the headcount associated with it, just to find out, after a year's worth of toil, that they managed to set up seven golden path use cases. Then the first time a developer tries to self-serve into one of these automations, they encounter some kind of configuration or access or permission issue, and it goes right back into the ticket queue. Back to the on-call engineer: guess what, buddy? Enough is enough.
Enough of this nonsense.
I need a human to help.
And it repeats itself in various different formats and flavors across organizations.
But that's the sum of it.
The paradox, just like Jevons paradox in many respects: if it takes longer to automate something than the output is worth, it's not going to get done. And what you find is many organizations start with the strategy, end up going down the line, just to find out everyone's doing ad hoc scripting until, well, what happens?
You do tend to cut yourself to ribbons on the edge
cases though of, okay, this only takes five minutes every morning to do. Why spend a week automating
it? Well, it gets done every morning, sure, but look at the defect rate when humans are doing this.
Sometimes people are hungover.
Sometimes someone is out sick.
If you can get a better outcome via automation,
that does tend to put a thumb on the scale from time to time.
But yeah, directionally, you're spot on.
No argument there.
And I think, with the output and the outcome basis we're referring to, it all comes down to the amount of effort versus the output.
If it was as easy to automate an end-to-end process as it is to have a conversation with, say, Bob, and Bob's your on-call engineer.
And every time you need something from a platform, you just go to Bob and say, hey, Bob, can you go ahead and configure this
resource for me? Can you go ahead and grant me access or permission via this IAM policy, or create a policy with elevated permissions, or get approval for this resource that requires it? If you could do this as easily as having a conversation with Bob, guess what,
Corey? It's going to get done every single time.
So what we've created at Kubiya, and this is why you mentioned Gen AI at first, and you kind of set this up for this answer, is this: instead of Bob, you have an AI you can have a full-on, bi-directional conversation with, and it is access-aware, permission-aware, and able to meet the users in the exact channels where they already communicate and collaborate: Slack, Teams, Jira, Kanban boards. It's a personalized experience, and it releases Bob to actually do the real work, which is setting up
the infrastructure to tee up the company for the next Gen AI product that they're going to roll out themselves. Really, it's all about outcome and rewards. If you're going to
want to go and put all this effort, it better be worth the reward unless it's as easy as having a
conversation. Then you break away from that. The counterpoint, of course, is this has been teased at and people have done a number of
experiments, myself included, of, all right, I want a Python script to do X.
Go ahead and build that out for me.
And ChatGippity gets some parts right, some parts wrong.
But it is very far away from being something that I could accept out the gate as meeting
the acceptance criteria that I have for it,
it feels like on some level,
to dive right back into your paradox,
that I would spend more time supervising the thing
than just writing the quick script myself
in some of those cases.
How do you get around that?
So you hit the most important point.
It's not just about doing the automation
because that's half the battle.
It's about doing it in a way that's expected,
controllable, and fully auditable.
And we're actually
allowing that. We're actually using
Terraform as our backend in many
respects to give the user the ability
to control every aspect of
this interaction with
essentially
the teammate that's configured
with Terraform. So you get to control the environment variables,
you get to put the output, you get to control the permissions,
and you get to control every single aspect of it.
So as an operator, you're in full control.
As an end user, you're interacting and having that LLM type of experience
that people are accustomed to with ChatGPT
and other type of
chatbots that they're comfortable with. So you're getting the best of both worlds.
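As a rough mental model of the operator-side controls Amit lists (environment variables, outputs, permissions), here is a hedged Python sketch. Kubiya's actual backend is Terraform, and every field name below is invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class TeammateConfig:
    """Operator-defined configuration for an AI teammate.

    Hypothetical schema: the real Terraform provider will differ.
    The point is that the operator, not the model, defines the envelope."""
    name: str
    allowed_actions: set          # the only operations the teammate may perform
    env: dict = field(default_factory=dict)    # controlled environment variables
    channels: list = field(default_factory=list)  # where end users interact

    def authorize(self, action: str) -> bool:
        # The teammate can only do what the operator explicitly granted.
        return action in self.allowed_actions

bob_ai = TeammateConfig(
    name="infra-teammate",
    allowed_actions={"create_sqs_queue", "grant_jit_access"},
    env={"AWS_REGION": "us-east-1"},
    channels=["#devops-requests"],
)
print(bob_ai.authorize("create_sqs_queue"))  # True
print(bob_ai.authorize("delete_vpc"))        # False
```

The end user still gets the free-form LLM conversation; the operator gets a declarative, auditable boundary around it.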
I want to dig in a bit on the idea of talking about this as a virtual teammate,
specifically from the perspective of, I don't know about you, but I have something of a potty
mouth when I'm berating Siri when he gets something wrong, or Alexa when basically I ask for anything
and then I'm followed up with a, by the way, buy some more pants or whatever it is they're
trying to sell this week. If I talk to an actual colleague like that, HR is inviting me to a
meeting in which I'm not offered coffee. And very shortly afterwards, I'm not allowed back in the
office ever again. So there's a question of how much is this an accelerational tool for folks that are getting value from it
versus how much is this actually intended to be a full-on member, the fourth person on a three-person dev team?
Is this a, I guess, employee replacer? Is it an augment?
Where on that spectrum do you see it landing?
So just like you would say across all sorts of revolutions, industrial revolutions, where the humans weren't replaced by the smart assembly line, they became supervisors of the smart assembly line and managed to go and to reinvent their position.
And you could go down the list, from how you did things on-prem to how you do them in the cloud, and how you went through all the different revolutions. At the end of the day,
AI is a tool. It's an enabler. It's a megaphone. If your entire role in an organization is to move
a pencil from right to left, then likely AI will replace you. I'm sorry to say that. But if you
actually have the capability of supervising, of becoming essentially a supervisor of agents, then think about it this way: up until now, you've been an individual contributor. Now you actually get to supervise your teammates. So you're a DevOps manager all of a sudden. That becomes a completely different job title. And of course, you get to see everything through and have the highest and best use of your time freed up for the things that AI isn't prepared to do.
And so what you're talking about is moving commoditization a bit further up the stack.
Similarly to, it used to be you had to run a bunch of compiler commands to get a web server
to run your application. Then it was just a yum or apt install. Then in time it became,
oh, now just Docker run
and you get the whole thing prepackaged, ready to go.
And you're spending more of your time
trying to get the application to do what you want it to do
and not get the application set up in the first place.
That's exactly what we're saying.
You don't want to have to do the repetitive work
that otherwise would have been better suited for AI.
Free up your plate.
You have plenty of work to do.
You probably have a
big backlog that you haven't even gone around to because you're behind the eight ball every single
day. The moment you start, there are hair-on-fire drills. Do more with less is the persistent
rallying cry of our current industry and, honestly, our entire system. There are case studies being taught at Harvard and every single business school. It's all about Blockbuster and Netflix, right?
Don't be Blockbuster.
Don't be left behind.
If you know how to reinvent yourself and adjust,
AI is going to be the biggest enabler,
the biggest career boost you could ever have.
Otherwise, if you feel that you're perfectly fine
with stacking DVDs and dropping them in the inbox
every single Sunday when people have to return them,
you're going to be left behind
and the streaming kind of movement
will take you by storm.
That's kind of what we're saying.
Don't be left behind by AI.
Have AI be the enabler for you in your careers.
I don't necessarily disagree with the premise.
I think that it is fairly clear at this point to most
folk that there is value that can be derived from Gen AI. Whether it is this wild transformation
of society to a perfect utopia, I'm a little bit of a skeptic. But it's similar to, oh, I insist
on doing long division the old way because I'm not a fan of these newfangled things called
calculators. Yeah, it acts as a tool that accelerates.
But understanding when to apply it,
how to validate the output that comes out of it and to ensure that it's not insane
is going to be something I think
that we're stumbling through as a society.
And in many cases, the hallucination problems
aren't making a strong case for,
let's turn the air traffic control system
over to the Gen AI and hope for the best.
I think that it's a matter of nuance,
similar to before this,
developers would wind up using Stack Overflow,
the world's premier copy and paste website,
and use that to solve problems on an iterative basis.
You can amalgamate that
into various coding assistants and chatbots.
Having them actually go ahead and do the implementation
seems like the next logical step.
But as always, there's going to be some question around the margins: is this going to be something that we can actually trust? And if so, how far?
So the beauty of what we're trying to accomplish here is that we're not letting AI take over the entire end-to-end workflow and orchestrate the entire process. You can't avoid hallucinations.
By the way, that's a feature within large language models, right?
It's a statistical-based approach.
Every single answer will deviate from the other answer every single time.
What we're actually advocating for is to make it very controllable, very predictable, and that's where the Terraform code comes into play. The AI enablement is essentially the natural language interaction that you have, where it can go and abstract away the business logic of your intent and then work within that predefined, pre-gated workflow. So it's essentially combining the best of both worlds, both the known and expected structure,
along with all the things we know and love about large language models.
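The "best of both worlds" idea, free-form language in, pre-gated workflow out, can be sketched like this. The keyword matching below stands in for the LLM's intent extraction; the important property is that execution is confined to a predefined whitelist:

```python
# Map free-form requests onto predefined, pre-gated workflows.
# The workflow names and keywords are illustrative, not Kubiya's.
WORKFLOWS = {
    "provision_queue": ["sqs", "queue"],
    "grant_access": ["access", "permission", "iam"],
}

def route_intent(message: str) -> str:
    """Return a whitelisted workflow name, or escalate to a human."""
    text = message.lower()
    for workflow, keywords in WORKFLOWS.items():
        if any(k in text for k in keywords):
            return workflow
    # Anything unrecognized stays out of automation entirely.
    return "escalate_to_human"

print(route_intent("can you spin up an SQS queue for staging?"))  # provision_queue
print(route_intent("please rotate the TLS certs"))                # escalate_to_human
```

However creative the phrasing, the model can only ever select from workflows the operator already gated, which is what makes the output predictable.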
I think that that's a fascinating approach.
I mean, something that I've always done when I've been asking large language models is
I won't ask for the answer because, okay, you're going to give me an answer.
Maybe it's right, maybe it's wrong.
But regardless, you certainly sound very confident in what it is that you're saying.
What I'll ask instead is for a script to go ahead and do the thing to get the answer out of it.
Because from my perspective, that gives me two great paths. One, I can see how it's doing that
and potentially catch weird issues it's making along the way. And two, okay, that was great. I
want to iterate on that now. I'm not going back to square one or trying to find the chat that generated that and then have it go ahead and pick up where I left off.
I find that the show your work stage and breaking it down into stages means that when it starts to
go off the rails around step 17, you can go back to 16 and try again to get things moving along
again. I think that that aligns with the approach that you're taking. It's a very modular
approach and the ability to go
into insert your own tools, your own scripts,
your own code as part of
this to inject that into the process.
Make sure that you control every aspect
of your workflow. It's essentially your own words orchestrating a complex process that otherwise would have taken disparate tools and processes within an organization to accomplish. It's just done in a highly condensed time to automation, which is the beautiful part about it. We can go into use cases
if it helps. By all means, give me an example use case. Let's talk about something real rather
than the ephemeral vision of the developer of tomorrow. Let's talk about something that
someone might actually do. One of our favorite golden use cases, if you may, is one of our customers came to us to
enable a self-service infrastructure or resource provisioning platform, all within Slack, which
is obviously where they meet their users.
So the concept is a user comes in, asks for, I don't know, a new SQS queue, for example.
And this is as part of an application that they want to copy over from one of their other resources.
So the teammate first verifies the identity of the user, verifies that they have permission, maybe even creates a just-in-time policy in order to enable that user to do so, but then also backtraces what the cost of this resource would be, because there is budget enforcement that has to come into play.
So if it costs more, say, than $100 a day, that requires some additional layers of approval,
which, again, the teammate could also go and get the right approvals for that.
So the ability to go and then to both enforce budget, enforce policy, and create least
privilege automation without needing to assign a role, that's already a big win for this organization.
At the end of the day, they also care about cost. So not only are you enforcing the budget, they also have a cleanup process: after 30 days, imagine a TTL, which you could actually configure as three hours, three days, or 30 days, it would go and automatically destroy that resource and bring it back to where it was before.
So you're never over provisioning or over resourcing. And it's all done as a simple
conversation. This same process, if you would have copied that over to the way they currently do it,
would take a matter of three to five days and five different people involved in the process to provision and deprovision that resource. With us, it's less than a minute.
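A hedged sketch of the decision flow in that use case, identity check, a budget gate above $100/day, and a TTL for automatic cleanup. The function and parameter names are invented; the real integrations (Slack identity, IAM, the approval chain) are stubbed out as callables:

```python
from datetime import datetime, timedelta, timezone

DAILY_BUDGET_LIMIT = 100.0  # dollars/day before extra approval is required

def handle_provision_request(user, resource, est_cost_per_day, ttl_days,
                             has_permission, get_approval):
    """Verify permission, enforce the budget gate, and attach a TTL."""
    if not has_permission(user, resource):
        return {"status": "denied", "reason": "no permission"}
    if est_cost_per_day > DAILY_BUDGET_LIMIT and not get_approval(user, resource):
        return {"status": "denied", "reason": "budget approval required"}
    # TTL: the resource is scheduled for automatic destruction, so nothing
    # stays over-provisioned after the requester is done with it.
    destroy_at = datetime.now(timezone.utc) + timedelta(days=ttl_days)
    return {"status": "provisioned", "resource": resource,
            "destroy_at": destroy_at.isoformat()}

result = handle_provision_request(
    "alice", "sqs-staging-queue", est_cost_per_day=12.0, ttl_days=30,
    has_permission=lambda u, r: True,   # stand-in for identity verification
    get_approval=lambda u, r: False,    # stand-in for the approval chain
)
print(result["status"])  # provisioned
```

The cheap request sails through; the same request at $150/day would be denied until the approval stub returns True.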
Complicated environments lead to limited insight. This means many businesses are flying blind instead of using their observability data to make decisions. And engineering teams struggle to remediate quickly as they sift through piles of unnecessary data. That's why Chronosphere is on a mission to help you take back control with end-to-end visibility and centralized governance to choose and harness the most useful data. See why Chronosphere was named a leader in the 2024 Gartner Magic Quadrant for observability platforms at chronosphere.io. I think there's also significant value
in being able to spit these things out
that then go through a somewhat normal
production process where,
okay, great, this works in a test account.
It can go ahead and spin things up,
whatever, ideally there are guardrails
somewhere around it to prevent it
from doing the psychotic things
that make the headlines.
But okay, once that's done, great.
Then having it be vetted as
it gets promoted to higher environments and have humans weighing in on that does seem like a
reasonable control. Because objectively, it's not that big a deal when you have a Gen AI system hallucinating or being wildly inappropriate, unless you're deploying that directly to customers without any form of human review along the way.
I think if you put a chatbot on your website and make it authorized to cut deals on your behalf, and it does horrific things, I think you're unhinged. I think if there's a human review that
goes through it to validate it's on brand, that it is doing what you want it to do, well, that
seems like a much more reasoned, rational approach. Maybe I'm just old and I have perspectives on these things
that don't necessarily align with the rest of the industry,
but here we are.
And I fully agree, Corey.
So from my perspective,
and this is why I want to make sure we're on the same page,
it's not by accident that we called it a teammate.
It's not a co-pilot.
A co-pilot just watches over your shoulder,
does code completion.
By design, it's limited to the human-in-the-loop interaction you're involved in.
Here, we're talking about the concept of delegation. We're saying delegation is the new automation. If you could go ahead and just instruct one of your teammates to go and solve the entire Jira ticket queue and come back to you and report the mean time to resolve every single ticket, and trust that no longer do you need a
human to do it, but you only need a human to supervise the outcome and to make sure that it's
fully audited and compliant, then you just saved potentially dozens of hours from the human's
work week. And at the same time, being able to free that human
up to do quite a few more important things that they have on their plate. So essentially, go and
assign that entire end-to-end role that can also perform in high velocity, high accuracy, and high
predictability, and at the end of the day, be fully audited for compliance reasons. And of course,
that frees up the humans, and that frees up your team's time to go and to innovate.
I think that that's probably a very fair way
of splitting the difference.
Now, the obvious question I have
that I did allude to at the beginning
that so many companies have seen is,
is this effectively a three-line Python script
that starts with import OpenAI
and you're going to be shocked,
simply shocked when that company doesn't hold still and releases something new?
What is the moat, for lack of a better term?
Well, the complexity of the infrastructure goes beyond this discussion or my acumen as a CEO, to be fair.
We're using over a dozen different language models.
Some of them we're fine-tuning ourselves, some of them we're training ourselves, and some of them are GPT-4o, Anthropic models, and so forth.
But at the end of the day, everything is broken down into multi-agent systems. So every single operation or task may be invoking a different language model. Just to give you an idea: if you encounter an operation that requires interrogating a resource and doing a Q&A, that would invoke a different language model, and probably a different agent, than if you're asking it a question or asking it to provision something.
So you would potentially have three different paths you could go by,
depending on the context of what you give it in the question.
And that, for example, is handled by a classifier agent that knows how to classify the right agent you would be routed to, so the request can go down the correct path.
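The classifier-agent pattern Amit describes can be sketched minimally. The agent names and the keyword-based classifier are illustrative; in the real system each branch would invoke a different language model:

```python
# A minimal multi-agent router. Each agent is a stand-in for a separately
# configured model, which is what keeps their capabilities isolated.
AGENTS = {
    "qa": lambda task: f"[qa-agent] answering: {task}",
    "interrogate": lambda task: f"[interrogation-agent] inspecting: {task}",
    "provision": lambda task: f"[provisioning-agent] provisioning: {task}",
}

def classify(task: str) -> str:
    """Pick the agent a request should be routed to (illustrative rules)."""
    text = task.lower()
    if any(w in text for w in ("create", "provision", "spin up")):
        return "provision"
    if any(w in text for w in ("why", "status", "describe")):
        return "interrogate"
    return "qa"

def dispatch(task: str) -> str:
    return AGENTS[classify(task)](task)

print(dispatch("provision a new S3 bucket"))
```

Because the approver, policy, and TTL agents are separate like this, instructions given to one agent can't be brute-forced into leaking what another agent controls.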
So as an example, that's just one element of this.
We can go into different workflows and
go into how you go and seek approval. We have multiple agents. So an approver agent isn't
necessarily the policy agent and isn't necessarily the TTL agent. So each way you have kind of a
Chinese wall between these agents. So you can't brute-force your instructions and try to get the information you want out of it that otherwise
would have been under some form of access control. There's a lot of excitement around Gen AI,
and I get it. The first time I saw ChatGippity do something, it was magic. It was, oh, wow,
I'm watching the future unfold. And it's rare you get those moments where you get to see it.
Like, it reminds me of the first time
I walked to an Apple store and played with an iPhone.
And it was, oh my God, this is so much better
than the crappy BlackBerry I was using.
There are those transformative moments in time.
Now, whether it's worth the massive uprooting of everything and hurling money down the well after it, I don't know.
But from what you're doing and from what I've seen of it,
I think that you're definitely building something interesting. What that turns into and how that
winds up manifesting, I think definitely will remain to be seen, but that's the nature of
anything. So I side with everybody who hates hearing about the next Gen AI company raising their $200 million round based on a pipe dream.
Then there's a huge difference, and there's levels to this, Corey.
So there's a huge difference between people putting together a demo, people putting together
a POC from the demo, and then people going into production in an enterprise-grade environment.
And this is effectively where we've already arrived.
We have, and we can talk about that,
but I'm not sure if this is before or after our embargo,
but we have enterprises that are effectively
in production working with us
and enjoying the fruits of their teammates.
Yeah, the proof is always going to be in the pudding. You can tell beautiful stories. I mean, I love the sound of my own voice. That's why I have two podcasts.
But you can only go so far
before having actual customers pony up
and saying, yes, this is valuable.
This is something that we are investing in.
And whether you think it's hokey or not,
we're going to be spending a boatload of money on it.
I mean, Kubernetes is a great example of this.
I thought that was significantly overhyped in some circles,
but everyone's using it at this point.
I was clearly wrong.
I'm wrong a lot.
That's the best part about being me.
I know Amazon likes to say leaders are right a lot,
but no, no, no.
I like being aggressively wrong,
but then adjusting my opinion in light of new information.
What do you think folks right now are, I guess, misunderstanding the most about Gen AI's opportunity? What are they sleeping on that they perhaps shouldn't be, without descending into full-on boosterism?
Go big or go home, right, Corey? That's a motto. Gen AI is a very powerful technology, but it's not an end-all-be-all. It's not the panacea, right?
You need to have a clearly defined pain that you're going to solve. You need to have a clearly defined path and a clearly defined architecture to get there. Until that all aligns and all the stars align, everything else is vaporware. It really is. And this is where, you know, we kind of talk about separating the men from the boys. There are very few companies in production working with enterprises on Gen AI applications. We're one of those. We're not the only ones, I assure you; there are others coming up. But at the end of the day, the proof is in the pudding. We give a
guarantee on our product. We even let them opt out after three months if they don't necessarily
enjoy that experience, because we are outcome-based. If you're going to go and enjoy the fruits of our labor, you're going to pay. If not, take your money and we'll part ways. Come back to us in a year, when you think you're ready, when you think you have a better way of doing it.
Yeah, I have little interest personally in taking money from people that aren't seeing
value in return for that. I'd rather lead to a good outcome because then it turns out it's a hell of a lot
easier to sell to an existing customer than it is a new one. But if you wind up basically leaving
them feeling fleeced, you don't really have much of an option to sell a part two. Word of mouth doesn't travel very well when that happens, right? No. And what is it, like, bad news travels 10 times
faster than good news? Yeah, I've seen that all the time. Whenever I'm cynical on Twitter,
I wind up getting an awful lot of traction out of it.
But if I say this is surprisingly great, no one cares.
No one wants to hear positivity.
They want to hear the overwhelming negativity aspect.
And we're now doing our best as a society
to algorithmically boost it.
But here we are.
That's why we've been very cautious not to overhype
what we've been doing until we have proof points
and social proof for this.
I'm not going to pick on Devin.
They did an excellent job trying to be pioneers in this space.
But let's face it, they probably should have been a little bit more cautious before they release their videos and kind of the boosting about what they're doing, because at the end of the day, they fell where a lot of companies are falling.
Having a controllable, autonomous software engineer requires you to have actual control measures in place. I don't think they've done that.
Maybe they will with the $200 million
they just raised.
You know, best of luck to them.
We don't have the privilege
of raising $200 million,
but we have the privilege
of knowing exactly what we're doing
and how to go into,
and how to tame the large language models so they behave the way we want.
I really want to thank you for taking the time to speak with me about this.
If people want to learn more, where should they go?
Well, I guess you can look at my shirt, but I don't know if it's visible here. Kubiya: K-U-B-I-Y-A dot A-I.
Happy to have anybody ask questions.
We have a chatbot on our website, but you could also sign up. Because of course you do. For the wait list, that is, and we're happy to answer your questions. We have a support channel as well. So very happy to take questions, and I appreciate your time.
Of course, and we will put links to that all in the show notes. Amit Eyal Govrin, CEO at Kubiya.
I'm cloud economist Corey Quinn, and this is
Screaming in the Cloud. If you enjoyed this podcast, please leave a five-star review on
your podcast platform of choice. Whereas if you hated this podcast, please leave a five-star
review on your podcast platform of choice, along with an angry, insulting comment that I will just
assume was written by a malfunctioning chatbot.