Screaming in the Cloud - The Evolution of OpenTelemetry with Austin Parker
Episode Date: September 5, 2023

Austin Parker, Community Maintainer at OpenTelemetry, joins Corey on Screaming in the Cloud to discuss OpenTelemetry's mission in the world of observability. Austin explains how the OpenTelemetry community was able to scale the OpenTelemetry project to a commercial offering, and the way OpenTelemetry is driving innovation in the data space. Corey and Austin also discuss why Austin decided to write a book on OpenTelemetry, and the book's focus on the evergreen applications of the tool.

About Austin

Austin Parker is the OpenTelemetry Community Maintainer, as well as an event organizer, public speaker, author, and general bon vivant. They've been a part of OpenTelemetry since its inception in 2019.

Links Referenced:

OpenTelemetry: https://opentelemetry.io/
Learning OpenTelemetry early release: https://www.oreilly.com/library/view/learning-opentelemetry/9781098147174/
Page with Austin's social links: https://social.ap2.io
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
Look, I get it.
Folks are being asked to do more and more.
Most companies don't have a dedicated DBA because that person now has a full-time job
figuring out which one of AWS's multiple managed database offerings is right for every workload.
Instead, developers and
engineers are being asked to support and, heck, if time allows, optimize their databases. That's
where OtterTune comes in. Their AI is your database co-pilot for MySQL and PostgreSQL on Amazon RDS
or Aurora. It helps improve performance by up to 4x or reduce cost by 50%.
Both of those are decent options.
Go to OtterTune.com to learn more and start a free trial.
That's O-T-T-E-R-T-U-N-E dot com.
Welcome to Screaming in the Cloud.
I'm Corey Quinn.
It's been a few hundred episodes since I had Austin Parker on to talk about the things that Austin cares about.
But it's time to rectify that.
Austin is the community maintainer for OpenTelemetry, which is a CNCF project, if you're unfamiliar with it.
We're probably going to fix that in short order.
Austin, welcome back. It's been a month of Sundays.
It has been a month and a half of Sundays.
A whole pandemic and a half.
So much has happened since then.
I tried to instrument something with OpenTelemetry about a year and a half ago.
And in defense of the project, my use case is always very strange.
It felt like, a lot of things have sharp edges, but this had so many sharp edges that it might as well just pivot to being a chainsaw. Then I would have been at least a little bit more understanding of why it hurts so very much.
But I have heard from people that I trust that the experience has gotten significantly
better.
Before we get into the nitty-gritty of me lobbing passive-aggressive bug reports at you for you to fix, in a scenario in which you can't possibly refuse me, let's start at the beginning. What is OpenTelemetry?
That's a great question. Thank you for asking it.
So OpenTelemetry is an observability framework.
It is run by the CNCF, home of such wonderful, award-winning technologies as Kubernetes. And, you know, the second biggest source of YAML in the known universe.
On some level, it feels like that is right there with hydrogen
as far as unlimited resources in our universe.
It really is.
And, you know, as we all know,
there are two things that make sort of the DevOps and cloud world go around.
One of them being, as you would probably know, AWS bills, and the second being YAML.
But OpenTelemetry tries to kind of carve a path through this, right?
Because we're interested in observability.
And observability, for those that don't know or have been living
under a rock or not reading blogs, it's a lot of things. But we can generally sort of describe it
as like, this is how you understand what your system is doing. I like to describe it as it's
a way that we can model systems, especially complex distributed or decentralized software
systems that are pretty commonly found in large organizations of every shape and size,
quite often running on Kubernetes, quite often running in public or private clouds.
And the goal of observability is to help you model this system and understand what it's doing,
which is something that I think we can all agree is a pretty important part of our job as software engineers.
Where OpenTelemetry fits into this is as the framework that helps you get the telemetry
data you need from those systems, put it into a universal format, and then ship it off to
some observability backend, you know, a Prometheus or a Datadog or whatever,
in order to analyze that data and get answers to the questions you have.
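[For illustration only: a minimal sketch of that flow in Python, assuming the opentelemetry-sdk and opentelemetry-exporter-otlp packages; the service name, endpoint, and span names are made up for the example, not anything discussed in the episode.]

```python
# Minimal sketch: instrument once, then export in the universal OTLP format to
# whatever backend (or Collector) you point the exporter at.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Describe the service emitting telemetry (hypothetical service name).
provider = TracerProvider(resource=Resource.create({"service.name": "checkout-service"}))

# Ship spans over OTLP; the endpoint could be a Collector or any OTLP-capable backend.
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317"))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")

with tracer.start_as_current_span("handle-request") as span:
    span.set_attribute("http.method", "GET")  # example attribute
    # ... application work happens here ...
```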
From where I sit, the value of OTel, or OpenTelemetry... people in software engineering love abbreviations that are impenetrable from the outside, so of course we're going to lean into that. But what I've found for my own use cases is the shining value prop was that I could instrument an application with OTel, in theory, and then send whatever was emitted in terms of telemetry, be it events, be it logs, be it metrics, etc., to any number of vendors on a case-by-case basis, which meant that suddenly it was the first step
in, I guess, an observability pipeline,
which increasingly is starting to feel
like an industrial observability complex
where there's so many different companies out there.
It seems like a good approach to use to start,
I guess, racing vendors in different areas
to see which performs better.
One of the challenges I had with that
when I started down that path is it felt like every vendor who was embracing OTel did it from a perspective of
their implementation. Here's how to instrument it to send it to us because we're the best,
obviously. And you're a community maintainer. Despite working at observability vendors yourself,
you have always been one of those community-first types where you care more about the user experience
than you do about this quarter
for any particular employer that you have,
which, to be very clear, is intended as a compliment,
not a terrifying warning.
It's why you have this authentic air to you
and why you are one of those very few voices
that I trust in a space where normally
I need to approach it with significant skepticism.
How do you see the relationship
between vendors and OpenTelemetry?
I think the hard thing is that I know who signs my paychecks at the end of the day,
right?
And you always have, you know, some level of, let's say, bias, right?
Because there is a bias to look after the ones who brought you to the dance.
But I think you can be
responsible with balancing the needs of your
employer and the needs of the community. The way I've always
described this is that if you think about observability as a market, what's the total addressable market there? It's literally everyone that uses software. It's literally every software company, which means there's plenty of room for people to make their numbers and to buy and sell and trade and do all this sort of stuff.
And by taking that approach, by taking sort of the big-picture approach and saying, well, look, you know, of all these people, there are going to be some of them that are going to use our stuff, and there are some of them that are going to use our competitors' stuff.
And that's fine.
Let's figure out where we can invest in an open telemetry in a way that makes sense for everyone and not just our people.
So let's build things like documentation.
One of the things I'm most impressed with about OpenTelemetry over the past two years is that we went from being, as a project, if you searched for OpenTelemetry, you would get five or six or ten different vendor pages coming up, each trying to tell you, this is how you use it, this is how you use it.
And what we've done as a community is we've said, you know, if you go looking for documentation, you should find our website, you should find our resources.
And we've managed to get the OpenTelemetry website to basically rank above almost everything else when people are searching for help with OpenTelemetry. And that's been really good because, one, it means that now,
rather than vendors or whoever coming in saying,
well, we can do this better than you, we can be like, well, look,
just put your effort here. It's already the top result.
It's already where people are coming, and we can prove that.
And two, it means that as people come in,
they're going to be put into this process of community feedback where they can go in, they can look at the docs, and they can say, oh, well, I had a bad experience here, or how do I do this?
And we get that feedback, and then we can improve the docs for everyone else by acting on that feedback.
And the net result of this is that more people are using OpenTelemetry, which means there are more people kind of going into the tippy-tippy top of the funnel, right, that are able to become a customer of one of these myriad observability backends.
You touched on something very important here.
When I was first exploring this (you may have been looking over my shoulder as I went through this process), my reaction was: oh, this is a "CNCF project," in quotes. This is not true universally, of course, but there are cases where it clearly is, where this is an effectively vendor-captured project, not necessarily by one vendor, but by an almost-consortium of them. And that was my takeaway from
OpenTelemetry. It was conversations with you, among others, that led me to believe, no, no,
this is not in that vein. This is clearly something
that is a win. There are just a whole bunch of vendors more or less falling all over themselves,
trying to stake out thought leadership and imply ownership on some level of where these things go.
But I definitely left with a sense that this is bigger than any one vendor.
I would agree. I think to even step back further,
right, there's almost two different ways
that I think vendors or anyone
can approach open telemetry,
you know, from a market perspective.
And one is to say like,
oh, this is socializing
kind of the maintenance burden
of instrumentation,
which is a huge cost
for commercial players, right?
Like if you're Datadog or Splunk or whoever, you have these agents that you go in and they
rip telemetry out of your web servers, out of your gRPC libraries, whatever.
And it costs a lot of money to pay engineers to maintain those instrumentation agents, right? And the cynical take is, oh, look at all these big companies that are kind of, like, pushing all that labor onto the open-source community. And, you know, I'm not casting any aspersions here. I do think that there's an element of truth to it, though, because yeah, that is a huge fixed cost.
And if you look at the actual lived reality of people, and you look back at when SignalFx was still a going concern, right, and they had their APM agents open-sourced, you could go into the SignalFx repo and diff their Node Express instrumentation against the Datadog Node Express instrumentation.
And it's almost 100% the same, right?
Because it's truly a commodity.
There's nothing interesting about how you get that telemetry out.
The interesting stuff all happens after you have the telemetry
and you've sent it to some backend,
and then you can analyze it and find interesting things. So yeah, it doesn't make
sense for there to be five or six or eight different companies all competing to rebuild the
same wheels over and over and over and over when they don't have to. I think the second thing that
some people are starting to understand is that it's like, okay, let's take this a step beyond
instrumentation, right? Because the goal of OpenTelemetry really is to make sure that this
instrumentation is native so that you don't need a third-party agent. You don't need some other
process or jar or whatever that you drop in and it instruments stuff for you. The JVM should provide
this. Your web framework should provide this. Your RPC library
should provide this, right? This data should come from the code itself and be in a normalized
fashion that can then be sent to any number of vendors or backends or whatever. And that changes sort of the competitive landscape a lot, I think, for observability vendors. Because rather than kind of what you have now,
which is people competing on like,
well, how quickly can I throw this agent in
and get set up and get a dashboard going?
It really becomes more about like,
okay, how are you differentiating yourself
against every other person
that has access to the same data, right?
And you get more interesting use cases and much more interesting analysis features.
And that results in more innovation
in sort of this industry
than we've seen in a very long time.
For me, just coming from the customer side of the world,
one of the biggest problems I had with observability
in my career as an SRE type for years was you would wind up building your observability pipeline around whatever vendor you had selected. And that meant emphasizing
things they were good at and de-emphasizing things that they weren't. And sometimes it's
worked to your benefit, usually not. But then you always had this question when it got to things
that touched on APM or whatnot, or application performance monitoring, where, oh, just embed
our library into this.
Okay, great. But a year and a half ago, my exposure to this was on an application that I was running in a distributed fashion on top of AWS Lambda. So great. You can either use an extension
for this, or you can build in the library yourself. But then there's always a question of
precedence, where when you have multiple things that are looking at this from different points
of view, which one gets done first? Which one is going to see the others? Which one is going to
enmesh the others and enclose the others in its own perspective of the world?
And it just got incredibly frustrating. One of the, at least for me, bright lights of OTel was that it got away from that, where all of the vendors receiving telemetry got the same view. Yeah, they all get the same view.
They all get the same data.
And there's a pretty rich collection of tools
that we're starting to develop
to help you build those pipelines yourselves
and really own everything from the point of generation
to intermediate collection
to actually outputting it to wherever you want to go.
For example, a lot of really interesting work has come out of the OpenTelemetry Collector recently.
One of them is this feature called Connectors.
And Connectors let you take the output of certain pipelines and route them as inputs to another pipeline.
And as part of that connection, you can transform stuff.
So for example, let's say you have a bunch of spans or traces coming from your API endpoints.
And you don't necessarily want to keep all those traces in their raw form
because maybe they aren't interesting or maybe they're just too high of a volume.
So with connectors,
you can go and you can actually convert
all of those spans into metrics
and export them to a metrics database.
You could continue to save that span data if you want,
but you have options now, right?
Like you can take that span data
and put it into cold storage
or put it into like, you know,
some sort of slow blob storage thing
where it's not actively indexed,
it's slow lookups,
and then keep a metric representation of it
in your alerting pipeline,
use metadata exemplars or whatever
to kind of connect those things back.
And so when you do suddenly see, oh, well, there's some interesting P99 behavior, or we're hitting an alert, or we're violating an SLO, or whatever, then you can go
back and say, okay, well, let's go dig through the slow data. Let's look at the cold data to figure
out what actually happened. And those are features that historically you wouldn't have needed to
go to a big, important vendor and say, hey, here's a bunch of money.
Do this for me.
Now you have the option to do all that more interesting
pipeline stuff yourself and then make choices about vendors
based on who's making a tool that can help me
with the problem that I have.
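[The connectors Austin describes live in the OpenTelemetry Collector's configuration rather than in application code. As a rough illustration of the same span-to-metrics idea, here is a hedged Python sketch of a custom span processor that records span durations as a histogram; the instrument names and attributes are assumptions for the example, not the Collector connector itself.]

```python
# Illustrative sketch only: turning span data into a metric representation in-process,
# similar in spirit to what a span-to-metrics connector does inside a Collector pipeline.
from opentelemetry import metrics
from opentelemetry.sdk.trace import ReadableSpan, SpanProcessor

meter = metrics.get_meter("span-to-metrics-sketch")
span_duration_ms = meter.create_histogram(
    "span.duration", unit="ms", description="Span durations recorded as a histogram"
)


class SpanToMetricsProcessor(SpanProcessor):
    """Records every finished span's duration as a histogram data point."""

    def on_end(self, span: ReadableSpan) -> None:
        if span.start_time is None or span.end_time is None:
            return
        duration_ms = (span.end_time - span.start_time) / 1_000_000  # ns to ms
        # Keep attribute cardinality low: only the span name here.
        span_duration_ms.record(duration_ms, attributes={"span.name": span.name})


# Attach it alongside your exporter, e.g.:
#   provider.add_span_processor(SpanToMetricsProcessor())
```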
Because most of the time, I feel like how we tend to treat observability tools depends a lot on where you sit in the org. But you've certainly seen this movement towards, like, well, we don't want a tool. We want a platform. We want to go to Lowe's and we want to get the 48-in-1 kit that has a bunch of things in it. And we're going to pay for the 48-in-1 kit even if we only need, like, two things or three things out of it.
OpenTelemetry lets you kind of step back and say like,
well, what if we just got like really high quality tools
for the two or three things we need?
And then for the rest of this stuff,
we can use other cheaper options,
which is I think really attractive,
especially in today's macroeconomic conditions, let's say. One thing I'm trying to wrap my head around,
because we all find when it comes to observability in my experience, it's the parable of three blind
people trying to describe an elephant by touch. Depending on where you are in the elephant,
you have a very different perspective. What I'm trying to wrap my head around is what is the vision for open
telemetry? Is it specifically envisioned to be the agent that runs wherever the workload is,
whether it's an agent on a host or a layer in a Lambda function or a sidecar or whatnot in a
Kubernetes cluster that winds up gathering and sending data out? Or is the vision something
different? Because part of what you're saying
aligns with my perspective on it,
but other parts of it seem
that there's a misunderstanding somewhere,
and it's almost certainly on my part.
I think the long-term vision is that
you as a developer, you as an SRE,
don't even have to think about open telemetry.
That when you are using your container orchestrator
or you are using your API framework
or you're using your managed API gateway
or any kind of software that you're building something with,
that the telemetry data from that software
is emitted in open telemetry format, right?
And when you are writing your code,
you know, and you're using gRPC, let's say,
you could just natively expect that OpenTelemetry
is kind of there in the background
and it's integrated into the actual libraries themselves.
And so you can just call the OpenTelemetry API
and it's part of the standard library almost, right? You add some
additional metadata to a span and say like,
oh, this is the customer ID or this is some interesting
attribute that I want to track for later on or I'm going to create
a histogram here or a counter, whatever it is. And then all
that data is just kind of there, right?
Invisible to you unless you need it.
And then when you need it, it's there for you to kind of pick up and send off somewhere
to any number of backends or databases or whatnot that you could then use to discover problems
or better model your system.
That's the long-term vision, right?
That it's just there, everyone uses it,
it is a de facto and de jure standard.
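[A concrete sketch of the kind of call being described, using the standard OpenTelemetry Python API; the attribute and instrument names here are hypothetical.]

```python
# Hypothetical example: enriching a span and recording metrics via the OpenTelemetry API.
from opentelemetry import trace, metrics

tracer = trace.get_tracer("checkout")
meter = metrics.get_meter("checkout")

checkouts = meter.create_counter("checkouts", description="Completed checkouts")
checkout_duration_ms = meter.create_histogram("checkout.duration", unit="ms")


def handle_checkout(customer_id: str, elapsed_ms: float) -> None:
    with tracer.start_as_current_span("checkout") as span:
        # Attach business metadata you might want to slice on later.
        span.set_attribute("customer.id", customer_id)
        # Record metrics alongside the trace data.
        checkouts.add(1)
        checkout_duration_ms.record(elapsed_ms)
```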
I think in the medium term,
it does look a little bit more like OpenTelemetry
is kind of this Swiss Army knife agent that's running in sidecars in Kubernetes, or it's running on your EC2 instance.
Until we get to the point where everyone just agrees that we're going to use the OpenTelemetry protocol for the data across all of our stuff, and we just natively emit it, that's how long we're going to be in that midpoint.
But that's sort of the medium and long-term vision, I think. Does that track? It does. I'm trying to equate this
to the evolution here. Back in the Stone Age, back when I was first getting started, Nagios was the
gold standard. It was kind of the original call of duty. And it was awful. There were a bunch of problems with it, but it also worked. I'm not trying to dunk on the people who built that. We all stand on the shoulders of giants. It was an open source project that was awesome doing exactly what it did, but it was a product built for a very different time. It completely had the wheels fall off as soon as you got to things that were even slightly
ephemeral because it required this idea of the server needed to know where all of the
things it was monitoring lived as an individual host basis.
So there was this constant joy of, oh, we're going to add things to a cluster.
Its perspective was, what's a cluster?
Or you'd have these problems with a core switch going down and suddenly everything else would
explode as well.
And even setting up an on-call rotation
for who got paged when was nightmarish.
And a bunch of things have evolved since then,
which is putting it mildly.
You'd say that about fire, the invention of the wheel.
Yeah, a lot of things have evolved
since the invention of the wheel.
And here we are, tricking sand into thinking.
But we find ourselves just,
now it seems that the outcome of all of this has been, instead of one option that's the de facto
standard that's kind of terrible in its own ways, now we have an entire universe of different
products, many of which are best of breed at one very specific thing, but nothing's great at
everything. It's the multifunction printer
conundrum, where you find things that are great at one or two things at most, and then mediocre
at best at the rest. I'm excited about the possibility for OpenTelemetry to really get
to a point of best of breed for everything. But it also feels like the money folks are pushing
for consolidation. If you believe a lot of the analyst reports around this of, well, we already pay for seven different
observability vendors. How about we knock it down
to just one that does all of these things?
Because that would be terrible.
Where do you land on that?
Well, as I
alluded to this earlier,
I think
the consolidation
in the observability space in general
is very much driven by that force you just pointed out, right?
The buyers want to consolidate more and more things into single pools.
And I think there are good reasons for that.
But I also feel like a lot of those reasons are driven by fundamentally telemetry side concerns, right?
So one example of this is if you're at large business X and, let's say, you are an engineering director, and you get a report that's like, we have eight different metrics products.
And you're like, that seems like a lot.
Let's just use brand X, and brand X will very, very happily tell you, like, oh, you just install our thing everywhere and you can get rid of all these other tools, right? But there's usually a reason people wound up with eight different metrics products. One reason is that they were forced to, and then they're forced to do a bunch of integration work to get whatever the old stuff was working in the new way.
But the other reason is because they tried a bunch of different things and
they found the one tool that actually worked for them.
And what happens invariably in these sort of consolidation stories is,
you know, the new vendor comes in on a shining horse to consolidate, and instead of eight distinct metrics tools, now you have nine distinct metrics tools, because there's never any bandwidth for people to go back and retire the old ones. You know, your Nagios example, right? People still use Nagios every day.
What's the economic justification to take all those Nagios installs, if they're working, and put them into something else, right? What's the economic justification to go and take a bunch of old software that hasn't been touched for 10 years, that still runs and still does what it needs to do? Where's the incentive to go and re-instrument that with OpenTelemetry or anything else? It doesn't necessarily exist, right?
And that's a pretty, I think, fundamental decision point in everyone's observability journey, which is: what do you do about all the old stuff? Because most of the stuff is the old stuff.
And the worst part is most of the stuff
that you make money off of is the old stuff as well.
So you can't ignore it.
And if you're spending millions and millions of dollars
on the new stuff,
like there was a story that went around a while ago.
I think Coinbase spent something like, what,
$60 million on Datadog. I hope they asked for it in real money and not Bitcoin.
But yeah, something I've noticed about all the vendors, and even Coinbase themselves: very few of them actually transact in cryptocurrency. It's always cash on the barrelhead, so to speak.
Yeah, smart. But still, that's an absurd amount of money for any product or service, I would argue, right? But that's just my perspective.
I do think, though, it goes to show you that it's very easy to get into these sort of things where
you're just spending over the barrel to the newest vendor that's going to come in and solve all your
problems for you. And it often doesn't work that way because most places aren't, especially large
organizations, just aren't built in
this sort of like, oh, we can go through and we can just redo stuff, right?
We can just roll out a new agent through whatever.
We have mainframes to think about.
In many cases, you have an awful lot of business systems that most kind of cloud people don't like to think about, right? Like SAP, or Salesforce, or ServiceNow, or whatever. And those sorts of business process systems are actually responsible for quite a few things that are interesting from an observability point of view. But you don't see, I mean, hell, you don't even see OpenTelemetry going out and saying, oh, well, here's a thing to let you observe Apex applications on Salesforce.
It's kind of an undiscovered country in a lot of ways, and it's something that
I think we will have to grapple with as we go forward.
In the shorter term, there's a reason that OpenTelemetry mostly focuses on cloud-native applications, because that's a little bit easier to actually do what we're trying to do on them, and that's where the heat and light is.
But once we get done with
that, then the sky's the limit.
It still feels like OpenTelemetry
is evolving rapidly.
It's certainly not, I don't want to say
it's not feature-complete, which, again,
software's never done, but it does seem like even quarter to quarter or month to month,
its capabilities expand massively because you apparently enjoy pain. You're in the process
of writing a book that I think is in early release, early access, and that comes out next year,
2024. Why would you do such a thing? That's a great question. And if I ever figure out the
answer, I will tell you. Remember, no one wants to write a book.
They want to have written the book.
And the worst part is I have written a book, and for some reason I went back for another round.
It's like childbirth.
No one remembers exactly how horrible it was.
Yeah, my partner could probably attest to that.
Although I was in the room, and I don't think I'd want to do it either.
So I think the real reason that I decided to go
and kind of write this book,
and it's Learning Open Telemetry.
It's in early release right now
on the O'Reilly Learning Platform.
And it'll be out in print
and digital next year,
I believe we're targeting right now,
early next year.
But the goal is,
as you pointed out so
eloquently, OpenTelemetry changes a lot. And it changes month to month sometimes. So why would
someone decide, say, hey, I'm going to write the book about learning this? Well, there's a very
good reason for that. And it is that I've looked at a lot of the other books out there on OpenTelemetry,
on observability in general, and they talk a lot about, like, here's how you use the API, here's how you use the SDK, here's how you make a trace or a span or a log statement or whatever. And it's very technical; it's very kind of in the weeds.
What I was interested in is saying, okay, let's put all that stuff aside. Because, you know, I'm not saying any of that stuff is going to change. I'm not saying that how to make a span
is going to change tomorrow. It's not. But learning how to actually use something like
OpenTelemetry isn't just knowing how to create a measurement or how to create a trace. It's how do I actually use this in a production system?
To my point earlier, how do I use this to get data about these quote-unquote legacy
systems?
How do I use this to monitor a Kubernetes cluster?
What are the important parts of building these observability pipelines?
If I'm maintaining a library, how should I integrate OpenTelemetry into that library
for my users?
And so on and so on and so forth.
And the answers to those questions actually probably aren't going to change a ton over
the next four or five years, which is good because that makes it the perfect thing to
write a book about.
So the goal of Learning OpenTelemetry is to help you learn not just how to use OpenTelemetry at an API or SDK level, but how to build an observability pipeline with OpenTelemetry. It's how to roll it out to an organization. It's how to convince your boss that this is what you should use, both for new development and maybe for picking up some legacy development.
It's really meant to give you that sort of 10,000-foot view of what are the benefits of this, how does it bring value, and how can you use it to build value for an observability practice in an organization.
I think that's fair. Looking at the more, quote-unquote,
evergreen style of content as opposed to...
That's the reason, for example,
I never wind up doing tutorials
on how to use an AWS service
because one console change away
and suddenly I have to redo the entire thing.
That's a treadmill I never had much interest in getting on.
One last topic I want to get into
before we wind up wrapping the episode
because I almost feel obligated to sprinkle this all over everything, because the analysts tell me I have to: what's your take on generative AI, specifically with an eye toward observability?
Oh gosh. I've been thinking a lot about this, and, hot take alert, as a skeptic of many technological bubbles over the past five or so years, ten years, I'm actually pretty hot on AI, generative AI, large language models, things like that. And not because of the perfect, funny, DeepDream meme characters we can all make through Stable Diffusion, or whatever ChatGPT spits out at us when we ask for a joke. I think the real win here is that this, to me, is like the biggest advance in human-computer interaction since resistive touchscreens.
Actually, probably since the mouse. I would agree with that.
And I don't know if anyone has tried to get someone that is over the age of 70 to use a computer
at any time in their life,
but mapping human language
to trying to do something on an operating system,
trying to do something on a computer or on the web,
is honestly one of the most challenging things
that faces interface design,
faces OS designers, faces anyone.
And I think this also applies for dev tools in general, right?
Like, if you think about observability, you think about, like,
well, what are the actual tasks involved in observability?
It's like, well, you're making, you're asking questions.
You're saying, like, hey, for this metric named HTTP requests by code,
there's four or five dimensions,
and you say, like, okay, well, break this down for me.
You know, you have to kind of know the magic words, right?
You have to know the magic PromQL sequence
or whatever else to plug in
and to get it to graph that for you.
And you, as an operator, have to have this very, very well-developed, like, depth of knowledge in math and statistics to really kind of get a lot of...
You must be at least this smart to ride on this ride.
Yeah. And I think that, to me, is the real short-term win for generative AI, certainly around using large language models: the ability to create human-language interfaces to observability tools that...
As opposed to learning your own custom SQL dialect,
which I see a fair number of times.
Right.
And it's actually very funny, because one of my side projects for the past little bit has been this idea of, can we make a universal query language, or a universal query layer, that you could ship your dashboards or ship your alerts or whatever with? And then generative AI kind of just, you know, completely leapfrogs that, right? It just says, like, well, why would you need a query language if you can just ask the computer and it works?
Right. The most common programming language is about to become English. Which, I mean, there's an awful lot of externalities there. Which is great. I want to be clear. I'm not here to gatekeep.
Yeah. I mean, I think there's a lot of externalities there, and there's a lot, and the kind of hype-to-provable-benefit ratio is very skewed right now towards hype.
That said, one of the things that is concerning to me
as sort of an observability practitioner
is the amount of people that are just, like, whole hog throwing themselves into, like, oh, we need to integrate generative AI, right? Like, we need to put in AI chatbots and we need to have ChatGPT built into our products, and da da da da da. And now you kind of have this perfect storm of people that, because they're just using these APIs to integrate gen AI stuff with, really don't understand what it's doing
because it is very complex,
and I'll be the first to admit that I really don't understand
what a lot of it is doing on the deep foundational math side.
But if we're going to have trust in any kind of system,
we have to understand what it's doing, right?
And so the only way that we can understand what it's doing
is through observability,
which means it's incredibly important for organizations
and companies that are building products on generative AI
to don't walk, run towards something that is going to give you observability into these language models.
Yeah, "the computer said so" is strangely dissatisfying.
Yeah, you need to have a base of, you know, sort of performance golden signals, obviously, but you also need to really understand what questions are being asked.
As an example, let's say you have something that is tokenizing questions.
You really probably do want to have some sort of observability on the hot path there
that lets you kind of break down common tokens,
especially if you were using custom dialects or vectors or whatever
to modify the neural network model.
You really want to see what's the frequency of the certain tokens
that I'm getting that are hitting the vectors versus not.
Where can I improve these sort of things?
Where am I getting
unexpected results?
And maybe
even have some sort of
continuous feedback mechanism
that it could be either analyzing the tone and tenor of end-user responses, or it could have the little frowny and happy face, whatever it is.
Something that is giving that kind of constant feedback about,
hey, this is how people are actually interacting with it.
Because I think there's way too many stories right now of people just kind of saying,
oh, okay, here's some AI-powered search, and people just, like, hating it.
Because people are already very primed to distrust AI, I think.
And I can't blame anyone.
Well, we had an entire lifetime of movies telling us that it's going to kill us all.
Yeah.
And now you also have a bunch of billionaire tech owners who are basically intent on making that a reality.
But that's neither here nor there.
It isn't, but like I said, it's difficult.
It's actually one of the first times
that I've found myself very conflicted.
Yeah, I'm a booster of this stuff.
I love it, but at the same time,
you have some of the ridiculous hype around it
and the complete lack of attention
to safety and humanity aspects of it.
I like the technology,
and I think it has a lot of promise,
but I don't want to get lumped in with that set.
Exactly.
The technology is great. The fan base is maybe something a little different. But I do think that, for lack of a better term, not to be an inevitabilist or whatever, I do think that to a significant extent this is a genie you can't put back in the bottle, and it is going to have, like, wide-ranging, transformative effects on the discipline of software development, software engineering, and white-collar work in general.
Right?
Like there's a lot of,
if your job involves like putting numbers into Excel and making pretty
spreadsheets,
then Ooh,
that doesn't seem like something that's going to do too hot when I can just
have Excel do that for me.
And I think we do need to be aware of that, right?
We do need to have that sort of conversation about what are we actually comfortable doing here in terms of displacing human labor?
When we do displace human labor, are we doing it so that we can actually give people leisure time or so that we can just cram even more work down the throats of the humans that are left?
And unfortunately, I think we might know where that answer is, at least on our current path.
That's true. But you know, I'm an optimist.
I don't do well with disappointment, which this show has certainly not been. I really want to
thank you for taking the time to speak with me today. If people want to learn more,
where's the best place
for them to find you?
Well, you can find me
on most social media,
many, many social medias.
I used to be on Twitter a lot, and then, well, we all know what happened there.
The best place to figure out
what's going on
is check out my bio,
social.ap2.io.
We'll give you all the links to where I am. And yeah, it's been great talking with you. Likewise. Thank you so much for taking
the time out of your day. Austin Parker, Community Maintainer for OpenTelemetry. I'm
Cloud Economist Corey Quinn, and this is Screaming in
the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform
of choice. Whereas if you've hated this podcast, please leave a five-star review on your podcast
platform of choice, along with an angry comment pointing out that actually physicists say the
vast majority of the universe is empty space, so that we can later correct you by saying,
ah, but it's empty white space. That's right. YAML wins again.
If your AWS bill keeps rising and your blood pressure is doing the same,
then you need the Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS.
We tailor recommendations to your business, and we get to the point.
Visit duckbillgroup.com to get started.