Screaming in the Cloud - The Evolution of OpenTelemetry with Austin Parker

Episode Date: September 5, 2023

Austin Parker, Community Maintainer at OpenTelemetry, joins Corey on Screaming in the Cloud to discuss OpenTelemetry's mission in the world of observability. Austin explains how the OpenTelemetry community was able to scale the OpenTelemetry project to a commercial offering, and the way OpenTelemetry is driving innovation in the data space. Corey and Austin also discuss why Austin decided to write a book on OpenTelemetry, and the book's focus on the evergreen applications of the tool.

About Austin

Austin Parker is the OpenTelemetry Community Maintainer, as well as an event organizer, public speaker, author, and general bon vivant. They've been a part of OpenTelemetry since its inception in 2019.

Links Referenced:

OpenTelemetry: https://opentelemetry.io/
Learning OpenTelemetry early release: https://www.oreilly.com/library/view/learning-opentelemetry/9781098147174/
Page with Austin's social links: https://social.ap2.io

Transcript
Starting point is 00:00:00 Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Look, I get it. Folks are being asked to do more and more.
Starting point is 00:00:34 Most companies don't have a dedicated DBA because that person now has a full-time job figuring out which one of AWS's multiple managed database offerings is right for every workload. Instead, developers and engineers are being asked to support and, heck, if time allows, optimize their databases. That's where OtterTune comes in. Their AI is your database co-pilot for MySQL and PostgreSQL on Amazon RDS or Aurora. It helps improve performance by up to 4x or reduce cost by 50%. Both of those are decent options. Go to OtterTune.com to learn more and start a free trial.
Starting point is 00:01:10 That's O-T-T-E-R-T-U-N-E dot com. Welcome to Screaming in the Cloud. I'm Corey Quinn. It's been a few hundred episodes since I had Austin Parker on to talk about the things that Austin cares about. But it's time to rectify that. Austin is the community maintainer for OpenTelemetry, which is a CNCF project, if you're unfamiliar with it. We're probably going to fix that in short order. Austin, welcome back. It's been a month of Sundays.
Starting point is 00:01:42 It has been a month and a half of Sundays. A whole pandemic and a half. So much has happened since then. I tried to instrument something with OpenTelemetry about a year and a half ago. And in defense of the project, my use case is always very strange. A lot of things have sharp edges, but it felt like this had so many sharp edges that it just pivoted to being a chainsaw.
Starting point is 00:02:06 And I would have been at least a little bit more understanding of why it hurts so very much. But I have heard from people that I trust that the experience has gotten significantly better. Before we get into the nitty-gritty of me lobbing passive-aggressive bug reports at you for you to fix, in a scenario in which you can't possibly refuse me, let's start at the beginning. What is OpenTelemetry? That's a great question. Thank you for asking it.
Starting point is 00:02:36 So OpenTelemetry is an observability framework. It is run by the CNCF, home of such wonderful, award-winning technologies as Kubernetes. And, you know, the second biggest source of YAML in the known universe. On some level, it feels like that is right there with hydrogen as far as unlimited resources in our universe. It really is. And, you know, as we all know, there are two things that make sort of the DevOps and cloud world go around. One of them being, as you would probably know, AWS bills, and the second being YAML.
Starting point is 00:03:14 But OpenTelemetry tries to kind of carve a path through this, right? Because we're interested in observability. And observability, for those that don't know or have been living under a rock or not reading blogs, it's a lot of things. But we can generally sort of describe it as like, this is how you understand what your system is doing. I like to describe it as it's a way that we can model systems, especially complex distributed or decentralized software systems that are pretty commonly found in large organizations of every shape and size, quite often running on Kubernetes, quite often running in public or private clouds.
Starting point is 00:03:54 And the goal of observability is to help you model this system and understand what it's doing, which is something that I think we can all agree is a pretty important part of our job as software engineers. Where OpenTelemetry fits into this is as the framework that helps you get the telemetry data you need from those systems, put it into a universal format, and then ship it off to some observability backend, you know, a Prometheus or a Datadog or whatever, in order to analyze that data and get answers to the questions you have. From where I sit, the value of OTel, or OpenTelemetry, and people in software engineering love abbreviations that are impenetrable from the outside, so of course we're going to lean
Starting point is 00:04:41 into that. But what I've found for my own use cases is the shining value prop was that I could instrument an application with OTel, in theory, and then send whatever was emitted in terms of telemetry, be it events, be it logs, be it metrics, etc., to any of a whole collection of vendors on a case-by-case basis. That meant it was suddenly the first step in, I guess, an observability pipeline, which increasingly is starting to feel like an industrial observability complex, given how many different companies are out there. It seems like a good approach to use to start, I guess, racing vendors in different areas to see which performs better.
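As a rough illustration of that first step, here is a minimal sketch assuming the Python OpenTelemetry SDK and an OTLP-capable backend or Collector listening locally; the service name, span name, and endpoint are placeholders for illustration, not anything from the episode:

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Describe the service emitting telemetry.
resource = Resource.create({"service.name": "checkout-service"})

# Spans leave the process over OTLP; only the endpoint changes when you swap backends.
provider = TracerProvider(resource=resource)
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout")

# Application code stays vendor-neutral: this span is plain OpenTelemetry.
with tracer.start_as_current_span("handle-request") as span:
    span.set_attribute("http.request.method", "GET")
```

Because the data leaves the process in OTLP format, racing vendors against each other largely comes down to pointing that exporter, or a Collector sitting in front of it, at a different destination.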
Starting point is 00:05:22 One of the challenges I had with that when I started down that path is it felt like every vendor who was embracing OTEL did it from a perspective of their implementation. Here's how to instrument it to send it to us because we're the best, obviously. And you're a community maintainer. Despite working at observability vendors yourself, you have always been one of those community-first types where you care more about the user experience than you do this quarter for any particular employer that you have, which, to be very clear, is intended as a compliment,
Starting point is 00:05:53 not a terrifying warning. It's why you have this authentic air to you and why you are one of those very few voices that I trust in a space where normally I need to approach it with significant skepticism. How do you see the relationship between vendors and Open open telemetry? I think the hard thing is that I know who signs my paychecks at the end of the day,
Starting point is 00:06:14 right? And you always have, you know, some level of, let's say, bias, right? Because it is a bias to look after the ones who brought you to the dance. But I think you can be responsible with balancing the needs of your employer and the needs of the community. The way I've always described this is that if you think about observability as a market,
Starting point is 00:06:45 what's the total addressable market there? It's literally everyone that uses software. It's literally every software company, which means there's plenty of room for people to make their numbers and to buy and sell and trade and do all this sort of stuff. And by taking that approach, by taking sort of the big-picture approach and saying, well,
Starting point is 00:07:04 look, you know, of all these people, there are going to be some of them that are going to use our stuff, and there are some of them that are going to use our competitor's stuff, and that's fine. Let's figure out where we can invest in OpenTelemetry in a way that makes sense for everyone and not just our people. So let's build things like documentation. One of the things I'm most impressed with about OpenTelemetry over the past two years is where we started as a project: if you searched for OpenTelemetry, you would get five or six or ten different vendor pages coming up
Starting point is 00:07:44 trying to tell you, this is how you use it, this is how you use it. And what we've done as a community is we've said, you know, if you go looking for documentation, you should find our website, you should find our resources. And we've managed to get the OpenTelemetry website to basically rank above almost everything else when people are searching for help with OpenTelemetry. And that's been really good because, one, it means that now, rather than vendors or whoever coming in saying, well, we can do this better than you, we can be like, well, look, just put your effort here. It's already the top result. It's already where people are coming, and we can prove that. And two, it means that as people come in,
Starting point is 00:08:22 they're going to be put into this process of community feedback where they can go in, they can look at the docs, and they can say, oh, well, I had a bad experience here, or how do I do this? And we get that feedback, and then we can improve the docs for everyone else by acting on that feedback. And the net result of this is that more people are using OpenTelemetry, which means there are more people kind of going into the tippy-tippy top of the funnel, right, that are able to become a customer of one of these myriad observability backends. You touched on something very important here. When I first was exploring this, you may have been looking over my shoulder as I went through this process. My reaction was, oh, this is a "CNCF project," in quotes, and while this is not true universally, of course, there are cases where it clearly is: an effectively vendor-captured project, not necessarily by one vendor, but by an almost-consortium of them. And that was my takeaway from OpenTelemetry. It was conversations with you, among others, that led me to believe, no, no,
Starting point is 00:09:22 this is not in that vein. This is clearly something that is a win. There are just a whole bunch of vendors more or less falling all over themselves, trying to stake out thought leadership and imply ownership on some level of where these things go. But I definitely left with a sense that this is bigger than any one vendor. I would agree. I think to even step back further, right, there's almost two different ways that I think vendors or anyone can approach open telemetry,
Starting point is 00:09:51 you know, from a market perspective. And one is to say like, oh, this is socializing kind of the maintenance burden of instrumentation, which is a huge cost for commercial players, right? Like if you're Datadog or Splunk or whoever, you have these agents that you go in and they
Starting point is 00:10:13 rip telemetry out of your web servers, out of your gRPC libraries, whatever. And it costs a lot of money to pay engineers to maintain those instrumentation agents, right? And the cynical take is, oh, look at all these big companies that are kind of pushing all that labor onto the open source community. And, you know, I'm not casting any aspersions here, but I do think that there's an element of truth to it, because yeah, that is a huge fixed cost. And if you look at the actual lived reality of people, back when SignalFx was still a going concern, right, and they had their APM agents open sourced, you could go into the SignalFx repo and diff their Node Express instrumentation against the Datadog Node Express instrumentation. And it's almost 100% the same, right? Because it's truly a commodity.
Starting point is 00:11:10 There's nothing interesting about how you get that telemetry out. The interesting stuff all happens after you have the telemetry and you've sent it to some backend, and then you can analyze it and find interesting things. So yeah, it doesn't make sense for there to be five or six or eight different companies all competing to rebuild the same wheels over and over and over and over when they don't have to. I think the second thing that some people are starting to understand is that it's like, okay, let's take this a step beyond instrumentation, right? Because the goal of OpenTelemetry really is to make sure that this
Starting point is 00:11:48 instrumentation is native so that you don't need a third-party agent. You don't need some other process or JAR or whatever that you drop in and it instruments stuff for you. The JVM should provide this. Your web framework should provide this. Your RPC library should provide this, right? This data should come from the code itself and be in a normalized fashion that can then be sent to any number of vendors or backends or whatever. And that changes sort of the competitive landscape a lot, I think, for observability vendors. Because rather than kind of what you have now, which is people competing on, like, well, how quickly can I throw this agent in
Starting point is 00:12:32 and get set up and get a dashboard going? It really becomes more about, like, okay, how are you differentiating yourself against every other person that has access to the same data, right? And you get more interesting use cases and much more interesting analysis features. And that results in more innovation
Starting point is 00:12:50 in sort of this industry than we've seen in a very long time. For me, just coming from the customer side of the world, one of the biggest problems I had with observability in my career as an SRE type for years was you would wind up building your observability pipeline around whatever vendor you had selected. And that meant emphasizing things they were good at and de-emphasizing things that they weren't. And sometimes it's
Starting point is 00:13:14 worked to your benefit, usually not. But then you always had this question when it got to things that touched on APM or whatnot, or application performance monitoring, where, oh, just embed our library into this. Okay, great. But a year and a half ago, my exposure to this was on an application that I was running in a distributed fashion on top of AWS Lambda. So great. You can either use an extension for this, or you can build in the library yourself. But then there's always a question of precedence, where when you have multiple things that are looking at this from different points of view, which one gets done first? Which one is going to see the others? Which one is going to enmesh the others and enclose the others in its own perspective of the world?
Starting point is 00:13:53 And it just got incredibly frustrating. One of the, at least for me, bright lights of OTel was that it got away from that, where all of the vendors receiving telemetry got the same view. Yeah, they all get the same view. They all get the same data. And there's a pretty rich collection of tools that we're starting to develop to help you build those pipelines yourself and really own everything from the point of generation to intermediate collection
Starting point is 00:14:22 to actually outputting it to wherever you want it to go. For example, a lot of really interesting work has come out of the OpenTelemetry Collector recently. One of those things is this feature called connectors. Connectors let you take the output of certain pipelines and route them as inputs to another pipeline. And as part of that connection, you can transform stuff. So for example, let's say you have a bunch of spans or traces coming from your API endpoints, and you don't necessarily want to keep all those traces in their raw form because maybe they aren't interesting or maybe they're just too high of a volume.
Starting point is 00:15:06 So with connectors, you can go and you can actually convert all of those spans into metrics and export them to a metrics database. You could continue to save that span data if you want, but you have options now, right? Like you can take that span data and put it into cold storage
Starting point is 00:15:23 or put it into like, you know, some sort of slow blob storage thing where it's not actively indexed, it's slow lookups, and then keep a metric representation of it in your alerting pipeline, use metadata exemplars or whatever to kind of connect those things back.
Starting point is 00:15:40 And so when you do suddenly see, it's like, oh, well, there's some interesting P99 behavior, or we're hitting an alert, or we're violating an SLO, or whatever, then you can go back and say, okay, well, let's go dig through the slow data. Let's look at the cold data to figure out what actually happened. And those are features that historically you would have needed to go to a big, important vendor and say, hey, here's a bunch of money. Do this for me.
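To make the connectors idea concrete, here is a hedged sketch of an OpenTelemetry Collector configuration using the spanmetrics connector as the bridge between a traces pipeline and a metrics pipeline; the exporter names and endpoints are placeholders, and exact fields can vary by Collector version:

```yaml
receivers:
  otlp:
    protocols:
      grpc:

connectors:
  # Consumes spans from the traces pipeline and emits request-rate/duration metrics.
  spanmetrics:

exporters:
  # Placeholder destinations: raw spans to cheap storage, metrics to the alerting backend.
  otlp/coldstorage:
    endpoint: coldstorage.example.com:4317
  prometheusremotewrite:
    endpoint: http://prometheus.example.com/api/v1/write

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/coldstorage, spanmetrics]
    metrics:
      receivers: [spanmetrics]
      exporters: [prometheusremotewrite]
```

The connector is listed as an exporter in one pipeline and a receiver in the other, which is the take-the-output-of-one-pipeline-and-route-it-into-another pattern being described here.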
Starting point is 00:16:08 Now you have the option to do all that more interesting pipeline stuff yourself and then make choices about vendors based on who's making a tool that can help me with the problem that I have. Because most of the time, I feel like how we tend to treat observability tools depends a lot on where you sit in the org. But you've certainly seen this movement towards, like, well, we don't want a tool. We want a platform.
Starting point is 00:16:34 We want to go to Lowe's and we want to get the 48-in-1 kit that has a bunch of things in it. And we're going to pay for the 48-in-1 kit even if we only need, like, two things or three things out of it. OpenTelemetry lets you kind of step back and say, like, well, what if we just got really high-quality tools for the two or three things we need? And then for the rest of this stuff,
Starting point is 00:16:59 we can use other, cheaper options, which I think is really attractive, especially in today's macroeconomic conditions, let's say. One thing I'm trying to wrap my head around, because what we all find when it comes to observability, in my experience, is the parable of three blind people trying to describe an elephant by touch. Depending on where you are on the elephant, you have a very different perspective. What I'm trying to wrap my head around is, what is the vision for OpenTelemetry? Is it specifically envisioned to be the agent that runs wherever the workload is, whether it's an agent on a host or a layer in a Lambda function or a sidecar or whatnot in a
Starting point is 00:17:39 Kubernetes cluster that winds up gathering and sending data out? Or is the vision something different? Because part of what you're saying aligns with my perspective on it, but other parts of it seem that there's a misunderstanding somewhere, and it's almost certainly on my part. I think the long-term vision is that you as a developer, you as an SRE,
Starting point is 00:17:59 don't even have to think about open telemetry. That when you are using your container orchestrator or you are using your API framework or you're using your managed API gateway or any kind of software that you're building something with, that the telemetry data from that software is emitted in open telemetry format, right? And when you are writing your code,
Starting point is 00:18:30 you know, and you're using gRPC, let's say, you could just natively expect that OpenTelemetry is kind of there in the background and it's integrated into the actual libraries themselves. And so you can just call the OpenTelemetry API and it's part of the standard library almost, right? You add some additional metadata to a span and say like, oh, this is the customer ID or this is some interesting
Starting point is 00:18:55 attribute that I want to track for later on or I'm going to create a histogram here or a counter, whatever it is. And then all that data is just kind of there, right? Invisible to you unless you need it. And then when you need it, it's there for you to kind of pick up and send off somewhere to any number of backends or databases or whatnot that you could then use to discover problems or better model your system. That's the long-term vision, right?
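For a sense of what that kind of in-code enrichment looks like today, here is a small hedged sketch using the Python OpenTelemetry API; the meter, counter, and attribute names are invented for illustration:

```python
from opentelemetry import metrics, trace

tracer = trace.get_tracer("orders")
meter = metrics.get_meter("orders")

orders_processed = meter.create_counter(
    "orders.processed", unit="1", description="Orders handled by this service"
)
order_duration = meter.create_histogram(
    "orders.duration", unit="ms", description="Time spent processing an order"
)

def process_order(customer_id: str, elapsed_ms: float) -> None:
    # Enrich whatever span the instrumented framework already started for us.
    span = trace.get_current_span()
    span.set_attribute("customer.id", customer_id)

    # Record a couple of measurements; the data just sits there, invisible,
    # until an SDK and exporter are configured to ship it somewhere.
    orders_processed.add(1, {"order.type": "standard"})
    order_duration.record(elapsed_ms, {"order.type": "standard"})
```

The application only ever touches the vendor-neutral API; where the resulting spans and metrics end up is decided entirely by whatever SDK or Collector configuration is layered in around it.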
Starting point is 00:19:32 That it's just there, everyone uses it, it is a de facto and de jure standard. I think in the medium term, it does look a little bit more like OpenTelemetry is kind of this Swiss army knife agent that's running in sidecars in Kubernetes, or it's running on your EC2 instance. Until we get to the point where everyone just agrees that we're going to use the OpenTelemetry protocol for the data, and all your stuff just natively emits it, that's how long we're going to be in that midpoint. But that's sort of the medium- and long-term vision, I think. Does that track? It does. I'm trying to equate this to the evolution here. Back in the Stone Age, when I was first getting started, Nagios was the
Starting point is 00:20:21 gold standard. It was kind of the original call of duty. And it was awful. There were a bunch of problems with it, but it also worked. I'm not trying to dunk on the people who built that. We all stand on the shoulders of giants. It was an open source project that was awesome at doing exactly what it did, but it was a product built for a very different time. It completely had the wheels fall off as soon as you got to things that were even slightly ephemeral, because it required this idea that the server needed to know where all of the things it was monitoring lived, on an individual-host basis. So there was this constant joy of, oh, we're going to add things to a cluster. Its perspective was, what's a cluster? Or you'd have these problems with a core switch going down and suddenly everything else would explode as well. And even setting up an on-call rotation
Starting point is 00:21:07 for who got paged when was nightmarish. And a bunch of things have evolved since then, which is putting it mildly. You'd say that about fire, the invention of the wheel. Yeah, a lot of things have evolved since the invention of the wheel. And here we are, tricking sand into thinking. But we find ourselves just,
Starting point is 00:21:26 now it seems that the outcome of all of this has been, instead of one option that's the de facto standard that's kind of terrible in its own ways, now we have an entire universe of different products, many of which are best of breed at one very specific thing, but nothing's great at everything. It's the multifunction printer conundrum, where you find things that are great at one or two things at most, and then mediocre at best at the rest. I'm excited about the possibility for OpenTelemetry to really get to a point of best of breed for everything. But it also feels like the money folks are pushing for consolidation. If you believe a lot of the analyst reports around this of, well, we already pay for seven different
Starting point is 00:22:06 observability vendors. How about we knock it down to just one that does all of these things? Because that would be terrible. Where do you land on that? Well, as I alluded to this earlier, I think the consolidation
Starting point is 00:22:21 in the observability space in general is very much driven by that force you just pointed out, right? The buyers want to consolidate more and more things into single pools. And I think there are good reasons for that. But I also feel like a lot of those reasons are driven by fundamentally telemetry-side concerns, right? So one example of this is, say you're at large business X and you are an engineering director, and you get a report that's like, we have eight different metrics products. And you're like, that seems like a lot. Let's just use Brand X, and Brand X will very, very happily tell you, oh, you just install our thing everywhere and you can get rid of all these other tools, right? And teams end up on those tools for a couple of reasons. One reason is that they are forced to, and then they are forced to do a bunch of integration work to get whatever
Starting point is 00:23:28 the old stuff was working in the new way. But the other reason is that they tried a bunch of different things and they found the one tool that actually worked for them. And what happens invariably in these sorts of consolidation stories is, you know, the new vendor comes in on a shining horse to consolidate, and instead of eight distinct metrics tools, you wind up with nine distinct metrics tools, because there's never any bandwidth for people to go back and, you know, your Nagios example, right? People still use Nagios every day.
Starting point is 00:24:07 What's the economic justification to take all those Nagios installs, if they're working, and put them into something else, right? What's the economic justification to go and take a bunch of old software that hasn't been touched for 10 years, that still runs and still does what it needs to do? Where's the incentive to go and re-instrument that with OpenTelemetry or anything else?
Starting point is 00:24:31 It doesn't necessarily exist, right? And that's a pretty, I think, fundamental decision point in everyone's observability journey, which is: what do you do about all the old stuff? Because most of the stuff is the old stuff. And the worst part is most of the stuff that you make money off of is the old stuff as well. So you can't ignore it. And if you're spending millions and millions of dollars on the new stuff,
Starting point is 00:24:57 like there was a story that went around a while ago. I think Coinbase spent something like, what, $60 million on Datadog. I hope they asked for it in real money and not Bitcoin. But yeah, something I've noticed about all the vendors, and even Coinbase themselves: very few of them actually transact in cryptocurrency. It's always cash on the barrelhead, so to speak. Yeah, smart. But still, that's an absurd amount of money for any product or service, I would argue, right? But that's just my perspective. I do think, though, it goes to show you that it's very easy to get into these sorts of things where you're just spending over the barrel to the newest vendor that's going to come in and solve all your
Starting point is 00:25:37 problems for you. And it often doesn't work that way, because most places, especially large organizations, just aren't built in this sort of, oh, we can go through and we can just redo stuff, right? We can just roll out a new agent through whatever. We have mainframes to think about. In many cases, you have an awful lot of business systems that most kind of cloud people don't like to think about, right? Like SAP or Salesforce or ServiceNow or whatever. And those sorts of business process systems are actually
Starting point is 00:26:16 responsible for quite a few things that are interesting from an observability point of view, but you don't see, I mean, hell, you don't even see OpenTelemetry going out and saying, oh, well, here's a thing to let you observe Apex applications on Salesforce. It's kind of an undiscovered country in a lot of ways, and it's something that I think we will have to grapple with as we go forward. In the shorter term, there's a reason that OpenTelemetry mostly focuses on cloud-native applications, because that's a little bit easier to actually do what we're trying
Starting point is 00:26:47 to do on them, and that's where the heat and light is. But once we get done with that, then the sky's the limit. It still feels like OpenTelemetry is evolving rapidly. It's certainly not, I don't want to say it's not feature-complete, which, again,
Starting point is 00:27:03 software's never done, but it does seem like even quarter to quarter or month to month, its capabilities expand massively. Because you apparently enjoy pain, you're in the process of writing a book that I think is in early release, early access, that comes out next year, in 2024. Why would you do such a thing? That's a great question. And if I ever figure out the answer, I will tell you. Remember, no one wants to write a book. They want to have written the book. And the worst part is I have written a book, and for some reason I went back for another round. It's like childbirth.
Starting point is 00:27:35 No one remembers exactly how horrible it was. Yeah, my partner could probably attest to that. Although I was in the room, and I don't think I'd want to do it either. So I think the real reason that I decided to go and kind of write this book, and it's Learning Open Telemetry. It's in early release right now on the O'Reilly Learning Platform.
Starting point is 00:27:56 And it'll be out in print and digital next year, I believe we're targeting right now, early next year. But the goal is, as you pointed out so eloquently, OpenTelemetry changes a lot. And it changes month to month sometimes. So why would someone decide, say, hey, I'm going to write the book about learning this? Well, there's a very
Starting point is 00:28:17 good reason for that. And it is that I've looked at a lot of the other books out there on OpenTelemetry, on observability in general, and they talk a lot about, like, here's how you use the API, here's how you use the SDK, here's how you make a trace or a span or a log statement or whatever. And it's very technical, it's very kind of in the weeds. What I was interested in is saying, okay, let's put all that stuff aside. Because, and I'm not saying any of that stuff is going to change. I'm not saying that how to make a span is going to change tomorrow. It's not. But learning how to actually use something like OpenTelemetry isn't just knowing how to create a measurement or how to create a trace. It's how do I actually use this in a production system?
Starting point is 00:29:08 To my point earlier, how do I use this to get data about these quote-unquote legacy systems? How do I use this to monitor a Kubernetes cluster? What's the important parts of building these observability pipelines? If I'm maintaining a library, how should I integrate OpenTelemetry into that library for my users? And so on and so on and so forth. And the answers to those questions actually probably aren't going to change a ton over
Starting point is 00:29:39 the next four or five years, which is good because that makes it the perfect thing to write a book about. So the goal of learning OpenTelemetry is to help you learn not just how to use OpenTelemetry at an API or SDK level, but it's how to build an observability pipeline with OpenTelemetry. It's how to roll it out to an organization. It's how to convince your boss that this is what you should use both for new and maybe picking up some legacy development. It's really meant to give you that sort of 10,000-foot view of what are the benefits of this, how does it bring value, and how can you use it to build value for an observability practice in an organization. I think that's fair. Looking at the more, quote-unquote, evergreen style of content as opposed to... That's the reason, for example,
Starting point is 00:30:29 I never wind up doing tutorials on how to use an AWS service because one console change away and suddenly I have to redo the entire thing. That's a treadmill I never had much interest in getting on. One last topic I want to get into before we wind up wrapping the episode because I almost feel
Starting point is 00:30:46 obligated to sprinkle this all over everything because the analysts tell me I have to. What's your take on generative AI, specifically with an eye toward observability? Oh gosh, I've been thinking a lot about this. And, hot take alert: as a skeptic of many technological bubbles over the past five or so years, ten years, I'm actually pretty hot on AI, generative AI, large language models, things like that. Not because of the perfect, funny DeepDream meme characters we all make, or whatever, through Stable Diffusion, or whatever ChatGPT spits out at us when we ask for a joke. I think the real win here is that this, to me, is like the biggest advance in human-computer interaction since resistive touchscreens. Actually, probably since the mouse. I would agree with that. And I don't know if anyone has tried to get someone that is over the age of 70 to use a computer at any time in their life,
Starting point is 00:31:53 but mapping human language to trying to do something on an operating system, trying to do something on a computer or on the web, is honestly one of the most challenging things that faces interface design, faces OS designers, faces anyone. And I think this also applies for dev tools in general, right? Like, if you think about observability, you think about, like,
Starting point is 00:32:14 well, what are the actual tasks involved in observability? It's like, well, you're making, you're asking questions. You're saying, like, hey, for this metric named HTTP requests by code, there's four or five dimensions, and you say, like, okay, well, break this down for me. You know, you have to kind of know the magic words, right? You have to know the magic promQL sequence or whatever else to plug in
Starting point is 00:32:33 and to get it to graph that for you. And you, as an operator, have to be, have this very, very well-developed, like, depth of knowledge and math and statistics to really kind of get a lot of... You must be at least this smart to ride on this ride. Yeah. And I think that, like, that to me is the real, the short-term win for certainly generative AI around using, like, large language models is the ability to create
Starting point is 00:32:59 human language interfaces to observability tools that... As opposed to learning your own custom SQL dialect, which I see a fair number of times. Right. And it's actually very funny because there was a while for the... One of my side projects for the past little bit has been this idea of, can we make a universal query language or a universal query layer
Starting point is 00:33:22 and then it's like generative AI kind of just, you know, completely leapfrogs that, right? It just says, like, well, why would you need a query language if you can just ask the computer and it works? Right, the most common programming language is about to become English. Which, I mean, there's an awful lot of externalities there. Which is great, I want to be clear. I'm not here to gatekeep. Yeah. I mean, I think there's a lot of
Starting point is 00:33:48 externalities there, and there's a lot, and the kind of hype-to-provable-benefit ratio is very skewed right now towards hype. That said, one of the things that is concerning to me as sort of an observability practitioner is the amount of people that are just, like, whole-hog throwing themselves into, oh, we need to integrate generative AI, right? Like, we need to put in AI chatbots and we need to have ChatGPT built into our products, and da-da-da-da-da. And now you kind of have this perfect storm
Starting point is 00:34:17 of people that, because they're just using these APIs to integrate gen AI stuff with, really don't understand what it's doing, because it is very complex, and I'll be the first to admit that I really don't understand what a lot of it is doing on the deep foundational math side. But if we're going to have trust in any kind of system, we have to understand what it's doing, right? And so the only way that we can understand what it's doing is through observability,
Starting point is 00:34:51 which means it's incredibly important for organizations and companies that are building products on generative AI to, don't walk, run towards something that is going to give you observability into these language models. Yeah, "the computer said so" is strangely dissatisfying. Yeah, you need to have the base, you know, sort of performance golden signals, obviously, but you also need to really understand what questions are being asked. As an example, let's say you have something that is tokenizing questions. You really probably do want to have some sort of observability on the hot path there
Starting point is 00:35:29 that lets you kind of break down common tokens, especially if you were using custom dialects or vectors or whatever to modify the neural network model. You really want to see what's the frequency of the certain tokens that I'm getting that are hitting the vectors versus not. Where can I improve these sort of things? Where am I getting unexpected results?
Starting point is 00:35:54 And maybe even have some sort of continuous feedback mechanism. It could be either analyzing the tone and tenor of end-user responses, or it could have the little frowny and happy face, whatever it is. Something that is giving that kind of constant feedback about,
Starting point is 00:36:13 hey, this is how people are actually interacting with it. Because I think there's way too many stories right now of people just kind of saying, oh, okay, here's some AI-powered search, and people just, like, hating it. Because people are already very primed to distrust AI, I think. And I can't blame anyone. Well, we had an entire lifetime of movies telling us that it's going to kill us all. Yeah. And now you have a bunch of also billionaire tech owners
Starting point is 00:36:38 who are basically intent on making that reality. But that's neither here nor there. It isn't, but like I said, it's difficult. It's actually one of the first times that I've found myself very conflicted. Yeah, I'm a booster of this stuff. I love it, but at the same time, you have some of the ridiculous hype around it
Starting point is 00:36:55 and the complete lack of attention to safety and humanity aspects of it. I like the technology, and I think it has a lot of promise, but I don't want to get lumped in with that set. Exactly. The technology is great. The fan base is maybe something a little different. But I do think, for lack of a better term, not to be an inevitabilist or whatever, but I do think that
Starting point is 00:37:15 there is a significant amount of, like, this is a genie you can't put back in the bottle, and it is going to have wide-ranging, transformative effects on the discipline of software development, software engineering, and white-collar work in general. Right? Like, there's a lot of, if your job involves putting numbers into Excel and making pretty spreadsheets, then, ooh,
Starting point is 00:37:38 that doesn't seem like something that's going to do too hot when I can just have Excel do that for me. And I think we do need to be aware of that, right? We do need to have that sort of conversation about what are we actually comfortable doing here in terms of displacing human labor? When we do displace human labor, are we doing it so that we can actually give people leisure time or so that we can just cram even more work down the throats of the humans that are left? And unfortunately, I think we might know where that answer is, at least on our current path. That's true. But you know, I'm an optimist. I don't do well with disappointment, which this show has certainly not been. I really want to
Starting point is 00:38:22 thank you for taking the time to speak with me today. If people want to learn more, where's the best place for them to find you? Well, you can find me on most social media, many, many social medias. I used to be on Twitter a lot, and then, well, we all know what happened there.
Starting point is 00:38:39 The best place to figure out what's going on is check out my bio, social.ap2.io. We'll give you all the links to where I am. And yeah, it's been great talking with you. Likewise. Thank you so much for taking the time out of your day. Austin Parker, Community Maintainer for OpenTelemetry. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform
Starting point is 00:39:10 of choice. Whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment pointing out that actually physicists say the vast majority of the universe is empty space, so that we can later correct you by saying, ah, but it's empty white space. That's right. YAML wins again. If your AWS bill keeps rising and your blood pressure is doing the same, then you need the Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point.
Starting point is 00:39:56 Visit duckbillgroup.com to get started.
