PurePerformance - Semiotics - A Future of Observability we are yet to see with William Louth

Episode Date: January 5, 2026

How many people have you met that implemented distributed tracing in the early 2000s? Make it one more after you have tuned into our latest podcast with William Louth. William, who can't seem to escape the observability space even though he keeps trying, has a track record in the space. He is an innovator and tool builder and is currently reimagining intelligent systems by shifting the focus from data collection to meaning-making. In our conversation we learn about situational awareness and how systems should use symbols to show their current state while also taking into account everything they are aware of happening in their ecosystem. This podcast episode has been long overdue and opens a fascinating new world beyond metrics, logs and traces!

Links discussed:
William's LinkedIn: https://www.linkedin.com/in/william-david-louth/
Humainary Research: https://humainary.io/research/
Humainary GitHub: https://github.com/humainary-io
Serventis Signs: https://raw.githubusercontent.com/humainary-io/substrates-api-java/refs/heads/main/ext/serventis/SIGNS.md

Transcript
Starting point is 00:00:00 It's time for Pure Performance. Get your stopwatches ready. It's time for Pure Performance with Andy Grabner and Brian Wilson. Hello everybody and welcome to another episode of Pure Performance. My name is Brian Wilson. And as always, I have with me my wonderful and lovely, beautiful and talented and smiling co-host, Andy Grabner today. How are you doing, Andy? It's amazing.
Starting point is 00:00:38 I'm good, but all of these attributes, they keep getting longer and longer. And I'm not sure if you are just doing this because you want to play nice, because Christmas is coming and you're giving me an early Christmas present. I don't know. Well, yes, that's exactly what it is. But it could be the fact that I just had a double espresso before we recorded. So one of the two, not quite sure. Yeah, yeah. Or just to make everyone think this is a new video, so it's not a recording.
Starting point is 00:01:02 It's not pre-recorded parts. And see, now we actually show that we didn't brief our guest well enough, because he started talking even though we haven't yet given him a chance to actually welcome him on the stage. But William, this is perfect. He's very eager to get started because he's got a lot of great stuff to talk about. Exactly. And that's also how I remember him. And I want to just quickly say one or two
Starting point is 00:01:28 words. I remember years ago when I started in the observability space, and for me, tracing was everything that existed in my life, right? Because at Dynatrace 20 years ago, this is what we did. And then I met William at a conference, really nice conversations, but at some point he said: you know, these traces, you're doing this all wrong with observability. I've done this before. You need to look at signals and you need to look at other things. And I was like, wow, he's challenging me, he's kind of destroying my world. And now, years later,
Starting point is 00:02:04 we keep meeting each other off and on. Sometimes years pass, and then I see another posting from him. And William, I'm not sure if you remember, but sometimes when I see a posting out of the blue and you're commenting on one of my posts, then I sometimes say: it's good to know that you're still around,
Starting point is 00:02:22 which is really good. But now, without further ado: William, thank you so much for being on the show. Thank you, actually. Thank you for responding to some of those posts that we do, challenging us. And the last one that you posted, I think it was about a month or so ago, I then said: now it's really time to get you on the podcast. So now I want to give it over to you. Can you quickly explain to the audience: who is William Louth? Who are you? What drives you? What's your background? And then I want to dive into the whole reason why you have a different opinion on the state of observability. Yeah. Okay, so William Louth: first of all, I'm Irish. I live in Holland, and I've worked a lot for American companies in the early stage.
Starting point is 00:03:10 But I think looking at my 30 years of engineering, which is really, I've done every job there is, like from always architectural, probably site reliability, architect, observability, profiling tools, I think up and down the stack but probably the best way to look at my career is like those three phases through it. The first phase was helping in telecommunications. So I came from a telecommunications background, building distributed systems for telecommunications,
Starting point is 00:03:42 a lot of corporate technology. I own it, worked for Borland, did a lot of observer technology. While working at Borland in the OR&D, kind of where they would send me out because I was probably the most, well, they just felt William was the most representative, person they can send from warranty for the app servers.
Starting point is 00:03:57 I would do a lot of due diligence, architecture reviews, performance, on-site, investigations. And there I had to use tooling. And that's actually got me into it. So when I joined Borden, there was two parts of the company. There's a tooling company like J-Builder and the other tools there. And then there was the runtime systems, the application servers, COBRA, technology, middleware.
Starting point is 00:04:20 And I was in that group. And I was always consulting there. So I then decided, well, after a while of pitching various products to Borland, and the last one I pitched, well, actually, it was two. There was App Simulator, which was probably a forerunner for what eventually became like Kubernetes and the cloud and AWS. It was a very service warranted because during my time at AppsAW, I felt our scaling issues was we couldn't change the environment quick enough.
Starting point is 00:04:53 And I was involved in the EC-Perf, which is one of these benchmarks that we have to prove which was the fastest application server, which one had the greatest throughput. And I had problems trying to find out what was the ideal setting and how to scale out the machines to different configurations. And that was when I came about an App Simulator. But during that time, I also came up with a product. I'd been reading a book on complex event processing called Power of Events. and I decided, hey, I want to build something else in the middleware space. I worked on our other protocol of Visi Cash, which was a distributed cache before coherence came along.
Starting point is 00:05:32 And I said, okay, let's call it Jeep Java Event Enterprise Event Processing Platform. I pitched it to Borland. They said, no, it's another one of your 10-year projects. Well, another one of, I wouldn't say it would take 10 years, but it was going to be 10 years before someone would actually want it. And that was probably the highlight. This is probably a recurring theme about my career. So I left Portland and I started a company.
Starting point is 00:05:57 And that was my second phase. And that was where I'm really an engineer who fell into being an entrepreneur, you know, just trying to build tools. First product was JDB Insight. That won awards for being the first kind of database transaction analysis tool. I went down to build JX Insight, which is a distributed tracing tool that came out in 2003. And then by 2008, I started to worry about whether it was all working with tracing, and that's when I started moving forward into different products.
Starting point is 00:06:30 I won't get into that, but I'll come back to that in a minute about tracing. And then the last 10 years of my career is kind of like tried to step away, but Andy keeps bringing me in because every time he posts something, it's like the observability in me keeps coming up. I need to talk to Andy again and tell him, because it didn't work the last time. time. So those 10 years, I've really kind of tried to take my experience in observability and build other systems. Because I think at the heart of it, I'm a designer, but a cybernetics kind of person. So I look for other solutions, well, other problems, but are similar to observability.
Starting point is 00:07:07 So I built the control tower. Well, I designed the control terror for post-en-L, which is the biggest logistic, you know, it's a national logistic company, personal logistic company. in Holland, and that was during COVID time, which was actually a very challenging time because they went from one million parcels to two million. So I kind of, my career in the last 10 years, has been jumping into other types of areas, but taking that technology or taking those concepts over, like largely in digital twins, which I think is an extension onto observability. I mean, that's the ultimate of what we're always trying to do. We probably don't do always the simulation part because we see that as very industrial.
Starting point is 00:07:47 But I think at the end, observability, controllability, and operability are kind of the packaging that goes into what a digital twin is. So that's where I have been. That's the kind of those three periods of my career. I think there's some interesting parallels now because I started my career in load testing with Segway software, where we built Sil Performer.
Starting point is 00:08:14 we got acquired by Boerland so I was part of Boiland for a little while and that's why it's all funny because today when I did my research I looked at your LinkedIn profile and then LinkedIn reminded me hey you both worked at Boland but you I think William
Starting point is 00:08:28 you were a little bit before me because when did you leave? I left I think in 2001 I think it was 2001 so yeah maybe it was 2003 or did I join board in 2001 I don't look back that far on my LinkedIn
Starting point is 00:08:43 and my rest of my rest of I had to cut it because it's too long because the older you get in the industry. If you send them a 30-page resume, no one wants to read it. And nobody wants to know what you did when you were 16. Yeah. Yeah. And then even ask for your degree. And you're like, yeah, I have a degree, but it's 30 years old.
Starting point is 00:09:01 Will that make it, is that relevant? Yeah, yeah, yeah. The interesting parallels are because I remember when we did load testing and then Boland came in and then burned, right, our founder, he also pitched the idea to Poland. about, you know, we are breaking systems, but we need something to get insights into the system. And then Boland also said, no, we don't need this. And then he found it down a trace.
Starting point is 00:09:24 So it seems you and Burnt have kind of parallel tracks because you said you've always been an entrepreneur and you wanted to build new tools to help people. It's really interesting. So if it wasn't for Borland, if it wasn't for Borland, both of yours and Burns' path wouldn't have taken where they did. Yeah, yeah. Well, probably that's why it's very common.
Starting point is 00:09:44 I mean, because we started both with the path thing. I mean, we had a little tug-of-war at the beginning because I'd actually brought path technology into JDB Insight. And when dietary came out were path, I kept saying, hey, I had paths before and we would have a lot of fights about just that. I mean, of course, there were different ways of building it. And nowadays, path, what does that mean? It's just a word.
Starting point is 00:10:06 But, yeah, I think there was, I mean, we're all, engineers, at least product engineers like myself, were trying to solve a problem. And a lot of the problems, and I think my problem for my own career was I was trying to solve the problem for William. And I would probably say I'm still trying to solve the problem for William because it's always the William that lives in the future
Starting point is 00:10:28 and not so much the William that most people are like, just engineers today. There are kind of like, I just deal with what, I need to get stuff done with what I have in front of me. And I'm trying to, like, I'm trying to design a future because I don't want to do this any longer. You know, I don't, because I want to get to the next stage. So I think that's why I created the tools when I worked for Borland, that work that work that I had to do at each of the customer sites, I just didn't want to keep repeating it. I wanted to automate it.
Starting point is 00:10:59 And the goal is always to automate yourself, to embed yourself into the product. So that was my vision is I'm, because Broaden would always send me out to customers. And I was like, if I put myself in a product, then they can just sell the product. But, you know, that's actually, it sounds easy in paper, like I'll embed all my intelligence of what I do into a product. But that's not simply a rule. You can't just do rule based in the system. You realize later that your humanity is far bigger than just the intelligence of a few rule system or decision tree. you think he can boil down a product that will be quite effective for everyone.
Starting point is 00:11:40 And it's really, the context is always the issue. And today, if you think about it, context is everything. But that was the part I kind of realized early in the product is that while it was good to do tracing, and by 2008, I think I got a bit worried about it because the issue was, it was just that we were, and this was before everything went microservice. But the systems were quite complex, distributed tracing had turned into tracing as well. So dynamic trace was also not a distributed tracing. You were also profiling, hopping from one node to another.
Starting point is 00:12:16 And the traces were detailed. There was a cold trace, but some of those called traces would be spans or, you know, hopping over. And every time I looked at those cold trees and expanded them, I just felt like this had to change. I really wanted the answers to come out. Because I, you know, when people would ask me to look at the system, I always had this kind of sense of how to figure out what it was wrong. And I was quite quick at doing that, which was why it was very effective and valued. And then when I would, you know, when you see the tools and how people try to interact with them,
Starting point is 00:12:54 I realized that that's not the way I work. So I don't click around like this, you know, drilling down a cold tree. and I'm scanning the large problem space and sometimes I drill down a little bit but I know when not to drill down but what I've seen even when I worked for an instant and we would give them oh and I didn't mention that I worked for Instan it too
Starting point is 00:13:17 but which was sole type yeah but I would also see salespeople demoing and I would always tell them don't start clicking further deeper into the product because you'll get lost even in the demo, which is very important for fluid flow. And that was the problem I had with tracing, is that it wasn't bringing the answer up to the surface.
Starting point is 00:13:43 It had the data, but I didn't want data. I wanted insight. I wanted. And even then, that was the question I kept saying. So every time I would get to insight or to a rule, I would embed in a product, I realized that was not where I was trying to. It was a proxy for something else that I want to. And eventually, that's when I came about is that I wanted a signal.
Starting point is 00:14:05 You know, of course, we always talk about signs and symptoms and all, but I kept being fascinated. What am I trying to do with profiling? Well, I should say profiling is very different than observability, or at least to me, observability is always about inference of status, not inference of how good something is performing in terms of performance profiling, but inference in terms of, is this system stable, predictable, is it going in, is its trajectory the way I wanted to be? And can what I see give me that answer? And that was where I came about this. What is that question did? What is the thing I'm trying to ask from the tool? And I think my career has been always about that, getting up, trying to
Starting point is 00:14:51 get up that hierarchy or triangle or steps to a higher level. And eventually I realize It was kind of I'd been reading about signals for animal signaling and then social intelligence or social cognitive models. And then I realized everyone talks about situation, status, subjects in a very small manner. And in fact, that's the most effective way of what we're trying to do. So observability is really that. And that's where I changed my tune. Of course, before when I met Andy, when we were talking about distributed tracing, I was just getting worried about how complicated the traces were coming
Starting point is 00:15:35 and how complicated the environments were. And I felt that wasn't going to work with the cognitive load. And I think I have a kind of low pain threshold for data or for too much data. And that's also what drives you. There's always to be an inventor, you have to feel a pain and a passion. There has to be a pain and you want a remedy and that turns into a passion looking for a solution. And that's what came about. Yeah.
Starting point is 00:16:06 So I got, I wanted to ask you about, you know, the whole situational intelligence and semiotics. But before that, there's one thing you said earlier, and I just want to have a quick clarification. Because you talked about the digital twin. So that observability is kind of like a digital twin. So what's your thought that the, looking at logs, metrics and tracing? and maybe some events that we are basically building a digital twin of the actual system
Starting point is 00:16:33 and this is what you refer to as the digital twin? So that observability is, okay. Yeah, yeah. So we always have to have some observability. I think the contention I have in the open telemetry is that we've not kind of changed what we mean by observability. I mean, we know the process of observability, which is collect information of some sort,
Starting point is 00:16:57 and then use that to create a model and from that model to project, you know, to make an assessment. So become aware of something, make assessments of it, and then anticipation. So I think it's always these three A's. And that awareness has been the open telemetry world. And awareness is a very, it's important. You first have to have perception. That's humans, perception, cognition, and so that's, you know, comprehension and projection. and this is actually foundation for situational awareness.
Starting point is 00:17:29 But the problem I found with Open telemetry was, is that we really stuck to the yesterday year technologies of the current way we data collected, and our systems had been evolving, but yet we're still talking about logs. We distributed tracing moved a bit with the microservices, but we have different types of interactions now, especially with AI agents,
Starting point is 00:17:54 and I don't think traceability is the right word for that. And in fact, one of the early workshops, we had a distributed tracing, and I think it was invited to it, before everything became open telemetry, I said to everyone that we had to take the tracing API and turn it into a workflow API. And now, when I look at everything, it should have been the workflow API, because when agents came along, workflows is very important API for systems now. So metrics, I mean, metrics will always be good. I mean, there's always some countings.
Starting point is 00:18:25 But again, they're always a proxy or there are always something that you have to interpret. And that was the issue I had is that, first of all, open telemetry had picked on three, yeah, kind of data technologies in terms of how to do something, had less context around it. And then we tried to hobble them all together into something. And it felt like open telemetry was focused more on ingestion rather than intelligence gathering. It had no filter, maybe sampling. but that was really to address data problems, not say, tell me what's important in your system, tell me what the science are.
Starting point is 00:19:03 And so it's easy to instrument something and add a counter, but it's very hard to think about how will that be interpreted and what's actually useful. Like, say, let's take for a Q size. There's a Q in every system. And you have a Q and you say your Q size is 90. Someone has an interpret 90. What does that mean?
Starting point is 00:19:22 What is the Q limit? and that's the problem I always had and I think I have an a version to numbers in that way is it's not qualitative. It's a quantity. And the qualitative, it has to be an interpretation. And we think as APM products
Starting point is 00:19:40 or observability products or data analytic products that were going to fix this later. But we don't fix it later. If it's not fixed at the source, it becomes data, and it stays data, and it stays dormant in stores.
Starting point is 00:19:52 And the engineers who are tasked with trying to understand that later really don't have that means to turn those numbers into qualitative science. So my view was that that will never change. It was observability or open telemetry had made it easy to instrument, but it actually narrowed down what the type of instruments were. It didn't allow for new experimentation like bringing in science, bringing in a semiotic way approaching it and making that flow through the pipeline, as opposed to what I would say is tracing, which is very profile-oriented,
Starting point is 00:20:32 even though we have a profile-oriented, but, you know, it's very profile-oriented in terms of execution and timing, and not telling you status of something, or at least the quality of a service. You have to interpret it with a number. So that was the thing I had a lot. We opened to language we didn't do. And then, of course, it really agreed. it grew itself from one silo tracing and then it became logging and then it became this and there was no unification underneath except when we got to a collector it kind of looked like everything was coming something into one runtime and where you could probably process
Starting point is 00:21:08 a little bit but I felt that we should have put in a proper century runtime an infobus or some sort where we could do a lot of pre-processing in there before it shipped anywhere and allow intelligence to be in the applications in the run times and then move out the signals for you know capture information where it makes sense reason about it at that what I call near time as opposed to real time which is never really real time and near time is this I'm near to the time when it happened not not just near in terms of temporal but spatial so that was where I that was a challenge I had with the open telemetry area any kind of new instrument or new silo,
Starting point is 00:21:55 because it's very silo-oriented in the way it's approaches things, is a new specification, it's a new set of APIs, it's a new, nearly a complete implementation, and then it's a new backend for everyone. But I want to, yeah, let me also give you, first of all, thank you so much for the way you see it.
Starting point is 00:22:14 And I think Open Telemetry, they really just wanted to solve the data capturing, right? They never said we are going to reinvent observability or whatever. Decer said there's many vendors out there and why do we have five different vendors that have an agent that is proprietary, why can't we just standardize on how we are capturing the known signals of observability? And I think the open telemetry collector was actually, not sure when he came in, but I think it came in rather early for some of these things, right, extracting insights from traces
Starting point is 00:22:46 because in the open telemetry collector you have a processor and you can extract information. you can also dare enforce some of your rules, right? So because I, you can say, you know, I don't need to capture every trace because this is too much information. I want to get, I want to convert spans into some metrics and I may want to even convert some of them to signals to an alert in case I see that there's too many messages sent into a queue. And against, you know, open telemetry, and I agree this is a misconception by many. Open telemetry is only focusing on how we are capturing the signals and how we're transporting them to the back end, but not what is done with the data. So what I would like just to validate from you, you said this was a missed opportunity instead of just standardizing on something we've been doing for the last 10, 20 years, meaning capturing logs, metrics and traces, why not think ahead and rethink on what is it really? that we need to do in the app to not just get a log and metric and a span or a trace,
Starting point is 00:23:54 but to understand better. And this is now the conversation I want to have around situational intelligence. But my question is, does a developer that develops an application? As a developer, no, do they have to know what is the right queue length? What is the right CPU time that a method executes? What are some of these examples? What do you envision? What does modern future observability situationally awareness look like?
Starting point is 00:24:25 Yeah. So the way I go about solving this is to try to say, okay, the default is not to take quantity numbers and transmit them, is send basically a token because we use language. So we have languages and we have words that refer to things. They don't refer to a quantity. So when we say someone's got a temperature or high temperature, it's high. We don't know what that is or what we have, but we can communicate high temperature to someone.
Starting point is 00:24:58 So when we say something has got an overflow, did we need to know that length, that overflow, that array or that queue or some kind of resource? So my goal was to kind of eliminate as much as possible the numbers, or at least make them. secondary and to bring in a qualitative answer to that signal. So we don't tell people, we don't send numbers to each other. We send signals, gestures. And this is where my analysis was. I was trying to imagine an ecosystem of all these microservices, but I always imagined them like agents.
Starting point is 00:25:35 And I was like, each of them is judging each other. This was also one of the things I had a problem in observability, is that the rules were very at the actual service. self, never judging from which service was talking to which. You know, your sensitivity, one service might have a higher sensitivity to latency or to failure than another service talking to. So if A and B and C are talking, it's a relational, you know, and this might have been my time when I went in the whole Buddhism and everything's connected and all.
Starting point is 00:26:07 But I realize the reality is it's, it's a social, it's like a social network. A signal from someone like a friend is going to be in terms. but it different from a signal like an enemy. And of course, we don't have like enemies and friends in software, but we do have promises and we have expectations. And those expectations are slightly different from each of the angles. And we have tolerances. We have services that can do retry or clients,
Starting point is 00:26:37 which are services generally themselves. They might have tolerance for errors. Well, other systems might not have that. They might not have put in some kind of retry mechanism or, you know, the hysterics or other kind of technologies that they use for that or adaptive control. So the thing I was trying to do is to get more to this language of communicating as opposed to the numbers, which is later no one's going to do something. And what's interesting is on the open telemetry, and you said it, that the collector was there quite early, and there was a capability to do something. But you're doing it with a trace and a metric. and the person that's in the collector writing something,
Starting point is 00:27:16 it's not the person that wrote the metric, the counter. It doesn't even know to understand it, and it might require a few other numbers at that time to turn it into something meaningful and send it along. So the belief that people will later apply intelligence or analytics to it and get it right is not valid. It doesn't work in practice. Can I ask you a question on this?
Starting point is 00:27:41 Isn't the whole idea on why we have it like this? the separation of concerns that somebody knows how to capture information. Like I can take to bring your body example, like body temperature, right? Or let's take a car example. I can have a speedometer that tells me am I going 80, 100 or 200 kilometers. Depending on in which country I'm driving, 200 might be too fast. In Germany, on some auto bonds, 200 is okay. But if I'm the engineer that creates that engine, why do I need to know that 200
Starting point is 00:28:13 in Germany is okay, but in Austria or not. Isn't that a separation of concern where I say, here's what I've built, here are the signals, but depending on where and how you operate it, you can then define your rules that alert you in case we're going too fast. Yeah, okay, so what there the problem is comes down
Starting point is 00:28:31 to context, and my view is that we're not very good at rebuilding the context. So that works in a remote system. Why you would do that, I'm not sure, but if you were to remotely send a signal system, it needs not only to know your speed, it needs to know your geolocation, and then later on apply some kind of rules and rebuild the context. That's not always how our systems are working, unless we send a lot of payload on it, a lot of context in the payload. But that is
Starting point is 00:29:02 valid there. But you see what you're doing is you're trying to recreate the context. So to interpret that number, you have to create the context. And I always found that wasn't like that's what a typical digital twin would be. We're not very good at creating that digital, at that context. And no matter how much we do, we tend to do it by structural, but structural is not the same as context there. There's many aspects of context. And so I just feel we can never get that enough context.
Starting point is 00:29:30 And it's far better if I just say, like, so what I'm driving a car now, it actually gives me a beep. It gives me a signal there and then on the road that I'm driving. So I don't have to think about what the speed is. Of course, I kind of got a gas and sometimes the lights or there's a new sign, a new runway, and they change it. But I get a beep. I get an immediate signal.
Starting point is 00:29:50 And that's because it snows where I am at that moment. And it says, okay, you're, you know, of course, it's downloaded some kind of information to tell it, but it's giving me a signal. And that feedback is straight away there. So it's already translating into that signal. And I was just trying to say, that's what we're always looking for,
Starting point is 00:30:08 is that signal that says you've gone over. I don't care about how far it is that the, number it is, I care about I've actually broke something, or I broke a limit, I've exceeded some threshold, and I want to react there and then. So part of the signals was to short circuit that, get it happening where the context was, the near time, and get the engineer who knows that, who's built that system to be the one that puts that in. So if you're writing a Q component and you've already got a capacity limit, and you know that the capacity is a, exceeded or and the algorithm itself, you don't want to say the number, you want to say,
Starting point is 00:30:48 I've got a capacity challenge. And that saves them; that's them using their intelligence in their code. Now, the person that's using that library might understand queuing, but he might not understand how quickly you can refill or, you know, how quickly you can expand. And he's never going to be able to keep up to date with your implementation. So the thing there is to signal that I'm doing this. You know, I'm doing something that's not normal. You know, I'm deviating. That's an important signal. You can indicate signals of outcome and operations.
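As a rough sketch of that idea in code: the queue's author, who knows the algorithm and the capacity limit, converts the number into a named sign at the source, so consumers never see raw depth. All names here (`Sign`, `BoundedQueue`, `orders-queue`) are invented for illustration and are not the actual Serventis vocabulary.

```python
from enum import Enum

class Sign(Enum):
    OK = "ok"
    CAPACITY_EXCEEDED = "capacity-exceeded"

class BoundedQueue:
    """A queue whose author emits a named sign instead of raw depth numbers."""
    def __init__(self, capacity, emit):
        self.capacity = capacity
        self.items = []
        self.emit = emit  # callback receiving (subject, sign)

    def put(self, item):
        if len(self.items) >= self.capacity:
            # The component's own intelligence decides what the numbers mean.
            self.emit("orders-queue", Sign.CAPACITY_EXCEEDED)
            return False
        self.items.append(item)
        self.emit("orders-queue", Sign.OK)
        return True

signs = []
q = BoundedQueue(2, lambda subject, sign: signs.append((subject, sign)))
q.put("a"); q.put("b"); q.put("c")
# the last emitted sign is CAPACITY_EXCEEDED; no number ever left the component
```

The consumer of `signs` needs no knowledge of the queue's implementation or thresholds; the contract is just the word.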
Starting point is 00:31:24 We already do that even with code. And I'm just saying there's different signals that are very important leading up to something. Symptoms that something is about to happen. And they are never going to be regurgitated or reconstructed far away, far away in both data and also developer and context and time. And that was just being practical. In fact, while people might think I'm being far out by going semiotics,
Starting point is 00:31:50 it's the most practical way of solving that problem. And by solving it there, we now turn observability into kind of like a large language model, because it doesn't deal with numbers. You know, we already know the large language model is not really good with numbers. But if you turn everything into a token, it's a sign system. It's basically tokenized patterns of words. You might not know what they mean, but they have meaning from the person that generated it. It was a sign that had meaning.
Starting point is 00:32:23 And over time we learn signage, even though the first time we see a sign, we don't understand it. But someone tells us later, or we use that sign to interpret: when that person does this, that indicates the problem. Because later on you see that when he did that sign, you see an action that followed that sign, and your mind immediately associates that. If you have a dog, you know how quickly they pick up on signs and what follows them. Even though they don't understand words, you say something that sounds like food
Starting point is 00:32:55 or do you want something, or, you know, and you go to the kitchen, they're already looking at where to go, and they make that association quite quickly. And that's kind of what I was trying to do with the signage system, is to capture that in just as small as a word that means so much more than that. And it's a bit, if you think about emojis, they're like signage in themselves, and we can literally read them quite compressed, you know, it's a compressed way, but there's emotion to it. So that's another thing.
Starting point is 00:33:27 If you come back down to emotion, when someone says angry, it's very hard to know what angry is in every person. But that's a universal concept, even though what's happening inside the body is a quite complex process. It's described just with a simple token. That was my view: can we get this streaming of tokens? And maybe this is where I was coming back to complex event processing. If we can turn it into that, I can start looking for patterns, and then within that, cyclic patterns, because that's what we're always looking for,
Starting point is 00:34:05 is looking for these cyclic patterns, and then later on saying that this one was a bad one. Sometimes you don't know it at the time when you see it, but later when you do a diagnostic, you're not looking for numbers. You're not saying when it goes over this number, it goes up here and then it goes down there. You're just saying, if I see the sequence of these words together, I know what comes next, which is like reading a book, which is like looking at a movie. And that was where I was trying to go with the semiotics. And okay, so then we get to situational
Starting point is 00:34:25 awareness before we run out of time. So, yeah. So let me just ask one more quick clarifying question. If I understand this correctly, in the world that you envision for software engineers, instead of thinking about which metric, which trace, which log to emit, they emit a token that
Starting point is 00:34:57 expresses like how I am. It could say, thumbs up, I'm good, send me more. It could say, I don't know right now, I feel like something is wrong, so don't send me anything else. Now, here's the trivial, here's the million-dollar question. How would I, as an engineer, come up with a thumbs up, a so-lala, or a thumbs down? Because I still need to make this decision in my own code based on, am I crossing a certain threshold, yes or no, right? I see too many messages coming in.
Starting point is 00:35:25 That means I still use numbers and then compare them to a threshold and then just convert all of that into a single sign language. Is this it? Yeah, well, okay, in some of the thresholds, yes. But we actually are dealing, even though sometimes we think we're dealing with numbers, we're dealing with codes. Take HTTP.
Starting point is 00:35:46 There are codes. They're numbers, but we know what they mean: good, bad, ugly. And we've actually got behaviors that tell you, when it says, I'm dropping, I'm shedding load. Or here's an okay response. So there's codes everywhere. And what I was trying to do is create a universal code set
Starting point is 00:36:03 that didn't depend on the technology. So HTTP gives you a vast set; it's a bigger sign set than good/bad. You know, you either executed or you failed; that's the basic level of things worked, things didn't work. But HTTP does more than that.
Starting point is 00:36:20 It tells you, hey, I'm redirecting you. It tells you, I'm dropping your load, I'm dropping load because I'm dropping a packet or, you know, a request. So it is actually trying to communicate with you. So the way I was imagining it at that time was all of these services trying to communicate with each other, not just binary, yes, no, but more context. And that was the signs there. And so let's come back then to how. And I just want to say, I think this is a really good
Starting point is 00:36:51 explanation, right? With the HTTP status codes. Because I know when I call a service and a 200 comes back, I know everything is good. If it's a 404, I know it's not found. A 401, I know I'm not authenticated. If it's a 3xx, I know I'm getting redirected. There's other things as well. So I think that's, yeah. We are already in some respects, obviously, right, we're already doing this. But I think what you're suggesting is using this universally, so that every service, every app can just express their status, not with a number or numbers that I need to then centrally compare, but so that everybody has enough awareness to say, this is how I
Starting point is 00:37:31 feel. This is what I can do right now. Yes, yeah. Because if you look at computation, there are universal primitives. I mean, I found 16 in service-to-service interactions. So you make a call, you know, you respond. There's a failure or success. You schedule something. So you accept it, but you don't do it yet; you accept the work. And you suspend the work, you resume the work. So I came up with like 16 primitives for any agent to communicate with another agent to do work. And people have actually done this with language.
Starting point is 00:38:05 They've actually looked at this in a thing called speech act theory, which is about understanding language in terms of a number of primitives. And Mark Burgess also did this with Promise Theory, where he talks about promises and all. So I was looking for that. And the reason to do that is they translate into this kind of language of services or language of actors or agents, depending. You can have multiple mini languages, but to hide the underlying infrastructure
Starting point is 00:38:33 of whether it's a JDBC error or whether it's an HTTP one, because they all have their different error codes or different ways of expressing something. But also, you could just write your own component and you could say, yeah, good, bad, ugly, if that was your universal set, if you wanted it to be just as primitive as good, bad.
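A minimal sketch of what such a universal set might look like, with a translation that hides a technology-specific code set (HTTP status codes) behind it. The primitive names are illustrative only; William mentions sixteen, and these are not the actual Serventis signs.

```python
from enum import Enum

class Primitive(Enum):
    """Illustrative interaction signs, independent of any one protocol."""
    CALL = "call"
    SUCCEED = "succeed"
    FAIL = "fail"
    SCHEDULE = "schedule"
    ACCEPT = "accept"
    SUSPEND = "suspend"
    RESUME = "resume"
    REDIRECT = "redirect"
    SHED = "shed"  # dropping load

def sign_for_http(status: int) -> Primitive:
    """Translate a technology-specific code into a universal sign."""
    if 200 <= status < 300:
        return Primitive.SUCCEED
    if 300 <= status < 400:
        return Primitive.REDIRECT
    if status in (429, 503):
        return Primitive.SHED  # the server is telling us it is shedding load
    return Primitive.FAIL

print(sign_for_http(204).value)  # succeed
print(sign_for_http(307).value)  # redirect
print(sign_for_http(429).value)  # shed
```

A JDBC adapter would map its own error codes onto the same enum, so consumers only ever learn one sign set.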
Starting point is 00:38:51 So this comes down to semiotics, or to situational awareness. Because think about what we're doing there. We're just saying there's a subject in the world. There's a thing we know. And subjects are generally nested inside each other, like an organizational system. And like microservices are in containers, subjects are in there,
Starting point is 00:39:09 services even can be decomposed into entry points or whatever method they're calling. And I call them subjects. There's something that's identifiable, normally, because it's not transient; even though it's transient in terms of the process that's created, it's not transient in terms of my reference. I know there's a microservice called work orders or a scheduler or something. I don't know where it exists, but I know it.
Starting point is 00:39:33 And that's a subject. So to me, there's a concept of: there are things out there that I'm aware of. And while they might have reincarnations, I see that continuity exists. And those subjects are trying to tell me something. They're doing something. Or there's change happening to them.
Starting point is 00:39:55 And that change, I need to understand. And to me, that means they have to talk in terms of a sign language. Now, they can use a universal primitive sign language of yes, no, or they can use a more expressive sign language, or even invent their own sign language, but a minimal set of signs. And they express that sign.
Starting point is 00:40:15 And how they do that is they emit, basically: I'm a subject, here's who I am, here's my sign, this word. You, of course, have to understand what that sign is, but there is a word, and whatever that behavior is underneath, that's the contract that will always stay there. They're telling you something, even though how they calculate it, how it happens or how it's derived, is hidden behind just that word.
Starting point is 00:40:41 And that's the contract. So we have contracts for APIs. I'm saying we have contracts for observability. And as I said, we already have that with HTTP codes. So everyone's invented some kind of code to indicate something. So we already have that. So that's the first stage. Think of the world like this: it has
Starting point is 00:40:59 subjects, they have a sign, and they emit it as a signal. And the signal is basically: hey, it's me, I've got a sign to tell you, and here it is. Now, someone can subscribe to that. So now we're coming down to: there are things in the world that are seeing other things doing things, which is like social intelligence. If I'm an ape and there's another ape, and if I see a predator and I start screaming, then they all get the signs. So I'm broadcasting signs. My body and my actions emit something, which is a great way of communicating something that I'm seeing. So now we get to that sometimes a sign is not just about me; I am also a source of a sign, and sometimes the source is myself, and sometimes that source is someone else.
Starting point is 00:41:47 So if we think about social networks: in social networks, we are judging each other. If someone comes to you, trust is a very important thing in social networks. But trust comes about, first of all, by vigilance, surveillance of the person that you've delegated to, but also by people telling you about that person, whether they're trustworthy. So we are picking up signals from everywhere. And when I looked at microservices systems, I saw this collection of agents communicating with each other, whichever way they do it, messaging, event-based or RPC or file-based or anything. They're all trying to coordinate work together, but they're also judging each other.
Starting point is 00:42:25 And they have the ability to judge. They have the ability to tell themselves what they're doing and also what they see. So sometimes you have a signal that says, I'm a source, but the subject of the signal is someone else. And I want to tell you what I think about it, which is a sign. So what you get then is, at one level you get signals that are very factual, very close to what we have with metrics. They're just telling you about the source itself, like, I'm having a good day or a bad day. But then they come to another level: judgment assessments.
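That distinction, a signal carrying both a source (who emits it) and a subject (who it is about), could be sketched like this. The field and service names are hypothetical, purely to illustrate the two levels.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Signal:
    source: str   # who emitted the signal
    subject: str  # who or what the signal is about
    sign: str     # the token being communicated

# Factual level: a service reporting on itself (source == subject).
self_report = Signal(source="orders", subject="orders", sign="degraded")

# Judgment level: a service assessing a peer it interacts with.
judgment = Signal(source="checkout", subject="payments", sign="unresponsive")

print(self_report.source == self_report.subject)  # True
print(judgment.source == judgment.subject)        # False
```

When source and subject differ, the signal is an opinion about someone else, which is exactly the social-intelligence case described above.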
Starting point is 00:43:01 And this is where I realized that we take signals from various people and we make assessments and we judge them and we emit them. So we do this on LinkedIn. We might say that person is not nice. That person is nice. Elon Musk is down or up today, whatever they're doing. But we're judging their action. So someone does something immediate, we judge them, we emit a sign
Starting point is 00:43:26 about that person. So what you see then is the low-level stimulus gets interpreted by subjects, their own behavior. They tell people what they've done, but they tell it in a language that's understandable, universal. That goes to other people
Starting point is 00:43:40 that are other services, or judge subjects, and they're judging other subjects. And then they make a judgment; they might judge even themselves, but they're also judging others, and they emit it. But what they emit is also a signal.
Starting point is 00:43:54 And what you get then is this higher level of crunching numbers down, or crunching low-level stimulus into very informative information. Because while this might be an explosion, since everyone's talking about each other, we pool that collective intelligence: what I'm getting from my environment, what that subject told me about that subject, and what I'm seeing when I interact with that subject. I turn that into kind of like a scorecard. It's just a quick way. It's a simple way of doing it.
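A toy version of that scorecard idea: pooling every sign emitted about one subject, by peers and by the subject itself, into a single verdict by simple majority vote. The names and the voting rule are made up; as noted, you could use various different models.

```python
from collections import Counter

def assess(signals, subject):
    """Pool every sign emitted about `subject` and vote on a verdict."""
    votes = Counter(sign for source, subj, sign in signals if subj == subject)
    if not votes:
        return "unknown"
    return votes.most_common(1)[0][0]  # majority sign wins

signals = [
    ("checkout", "payments", "degraded"),   # a peer's judgment
    ("inventory", "payments", "ok"),        # another peer disagrees
    ("payments", "payments", "degraded"),   # the subject's self-assessment
]
print(assess(signals, "payments"))  # degraded
print(assess(signals, "search"))    # unknown
```

Different observers can legitimately disagree; the vote is just one cheap way of collapsing their opinions into something actionable.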
Starting point is 00:44:26 Of course, you could do various different models. And you turn that in, and then I make a judgment. And that also changes my behavior if I'm a service, but I can also emit that information, and it goes up to something else, which could be looking at the whole collective: is everyone satisfied with each other? So I'm trying to build up these kinds of layers of collective intelligence at very high levels, where we're thinking close to society:
Starting point is 00:44:52 is everyone happy? How are people voting? And I'm trying to do that in a distributed environment and building this kind of language. So I see basic signs, I see status. It's always awareness, awareness signs. Tell me something. Tell me about operation outcome or some kind of threshold breach. Second level, make an assessment.
Starting point is 00:45:14 You can assess yourself, which is: I'm a queue and I'm not really good, because I'm always queued up, you know. And that's where you're consuming your own signs; your own library is emitting signs, you're consuming them, and you see yourself five minutes queued up, you know, at the threshold. Then you say, I'm overloaded. And you might even see you're divergent. So you're able to judge yourself, but others have the same sign set and they can make a judgment, because they might say, well, I would judge you differently. And the reason I came about that is because I realized it's very hard to say definitively one way or another something is good or bad. It depends on your
Starting point is 00:45:57 perspective. So when I was trying to think about what I was doing with Humainary and Substrates, I said I'm going to allow multiple observers observing other subjects. So an observer is a subject too. But they can also observe others, and everyone can have different opinions, and then we get this kind of voting system over all the schemes there. And then we've gone from awareness to assessment, and then we come to anticipation. But before anticipation, we have to think: what are we trying to do? We're trying to think of projection.
Starting point is 00:46:30 Something is going to happen next, or something is happening but I don't really see it, or I'm thinking, yeah, it's coming. And what we do there is situation, because a situation is: there are a number of subjects who are talking about their status, and this has been going on for a bit longer. And I'm looking at this pattern at a status level, at a subject level, at a collective level, and I'm seeing something unfold. And that's situation awareness, where we're now saying it's unfolding and it's probably going to get worse or it's going to get better. And this is where you get into diverging, converging, and also judging the situation: whether it's critical, it's unusual, it's something new that came along, I'm investigating, you know, I'm watching it.
Starting point is 00:47:18 It needs a bit more attention. Or, two, I need to escalate. And that's situation. So situation is status over a period of time. Status is signals or signs over a period of time. And that's the three levels of the concepts that I've been trying to build. And I think then, and I know this is probably crazy, we could just have eventually one dashboard, which I know everyone doesn't want to hear, like the one dashboard to rule everything. But it really only needs a few concepts in it: not traces, not metrics, not different types of things. It just needs to say there are subjects in the world. They're hierarchical, because subjects are always embedded in each other.
Starting point is 00:47:59 Subjects have signs, but I'm not interested in the signs; I'm interested in the status. The status is judged by different subjects. Then there's a situation evolving. And that's the primitive language we would only need. We, all observability engineers, would only need to understand five concepts, I think: subject, sign, signal, status, and situation. And that would be the end of dashboards. Well, William, I think this was an episode where we had the guest with the highest percentage of speaking time, because obviously you have
Starting point is 00:48:39 a lot of stuff, and I gotta say, I need to process all of this. Look at your Humainary.io website, folks. We'll definitely make sure, if you're interested in this, that you check out the website. The papers, there's a lot of great stuff there. You also have
Starting point is 00:48:55 your Substrates on GitHub for everybody to try out. It's open source. Yeah. I still think, you know, as you said in the beginning, sometimes you are a little too far ahead into the future, and maybe this is also what this is now.
Starting point is 00:49:10 I think the reason why OpenTelemetry is so popular is because developers, over 20, 30, 40 years, know what a log is. They know what a metric is, and many start to understand what a trace is. So that's why the hurdle is very low to get this in. But I
Starting point is 00:49:27 really think, I mean, for me at least, it gave a lot of food for thought. But I need to process this. Yeah, yeah. Okay, but before you go, Andreas: the next time you're talking to humans, think about, am I tracing him, or am I looking at the signs, am I judging him? Am I seeing a situation?
Starting point is 00:49:46 You will find, though, it's actually far more universal. Every person in the world would understand situational awareness, but they wouldn't understand distributed tracing or OpenTelemetry. Yeah, but as an SRE, a site reliability engineer, when I'm responsible for a service, I also probably look at response time, queues; I ask, what are your critical signals? And then I learn what the normal behavior is, and I see if there's abnormal behavior.
Starting point is 00:50:11 And I basically judge based on what I learned from the past. I obviously factor in the connections that one service has to another. And that's, I think, also what we, including us, but other observability vendors too, have been trying to do based on the signals we have available, which is logs, metrics, traces.
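The layering William describes earlier (signs over a window become a status; statuses over time become a situation that diverges or converges) could be sketched roughly like this. The window size, thresholds, and status names are all invented for illustration.

```python
from collections import deque

class SubjectStatus:
    """Signs over a window become a status; statuses over time, a situation."""
    def __init__(self, window=5):
        self.signs = deque(maxlen=window)

    def observe(self, sign):
        self.signs.append(sign)

    def status(self):
        if not self.signs:
            return "unknown"
        bad = sum(1 for s in self.signs if s != "ok")
        if bad == 0:
            return "stable"
        return "critical" if bad == len(self.signs) else "degraded"

def situation(statuses):
    """Judge whether successive statuses are diverging or converging."""
    rank = {"unknown": 0, "stable": 0, "degraded": 1, "critical": 2}
    if len(statuses) < 2:
        return "watching"
    return "diverging" if rank[statuses[-1]] > rank[statuses[0]] else "converging"

s, history = SubjectStatus(), []
for sign in ["ok", "ok", "fail", "fail", "fail"]:
    s.observe(sign)
    history.append(s.status())
print(history[-1], situation(history))  # degraded diverging
```

Note that the engineer watching this never sees the raw signs, only the status trend, which is the situational-awareness view.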
Starting point is 00:50:27 William, I'm sorry that I have to cut you off, but as I mentioned earlier, I need to rush and help somebody out here. Brian, mind-blowing. Brian is just sitting there, like, what the... I think you're muted, Brian, because I can't hear you. Somehow, you're muted. It even blew away his microphone here.
Starting point is 00:50:50 Here you go. Connected. I was going to say there were so many times that I had stuff I wanted to hop in with, but even Andy would hop in before me. So I was like, you know what? Trying to process all the information. Yeah, no, I think there's definitely a lot to process there, William. I do second Andy with the idea that I think
Starting point is 00:51:06 this is what the observability tools have been working towards, or something like this. I do think, though, that there is something beyond maybe what we're all focusing on, that might be that step beyond that, which is what I want to try to process and understand. Like, if we're trying to get here with this, which sounds very in line with where you're going, obviously you're seeing some gap from the goal to the next phase after that. And I think that's where I need to process, to look into, like, understand
Starting point is 00:51:43 what that part is, right? Because again, real quick, with the idea of Elon Musk, right? Because you brought him up. You know, is Elon Musk acting normal today? That all comes from the context of, you know, are you an Elon Musk fan or an Elon Musk detractor? Maybe it's normal crazy, right? But it's still me interpreting that. So that's still that higher-level platform. And to me, I think I kind of got stuck on the idea of making those decisions
Starting point is 00:52:15 when you were talking about OTel: making the decision at the gathering level, as opposed to shipping it off, taking everything in, and making it there. So yeah, a lot to process, definitely. Definitely interesting thoughts. And Andy, I know you got to run, so I will not babble anymore. Tons of interesting thoughts. I think... Thanks for having me on. I'm glad we did this. I don't even think it's an us or you.
Starting point is 00:52:31 I think this is more of a collaboration, like, let's see where this goes. But I think no matter what, it's definitely opened a whole new set of doors, of, like, okay, what are we missing and what might be alternatives for the future on that. So yeah, great stuff. And William, all the best. Keep us posted. I'm sure I'll see you somewhere on social media. I hope I also get to meet you in real life again at some point. Keep us posted, and, you know, we're happy to have you back, maybe in a couple of months from now, and see how far you've moved into the future. Yeah, enjoy your meetup. Thank you. Okay, cheers. Thank you. Bye.
Starting point is 00:53:06 Thank you.
