PurePerformance - The Research Behind the AI and Observability Innovation with Otmar Ertl and Martin Flechl (EP235_DTResearchLab)

Episode Date: May 26, 2025

Scientific research is the foundation of many innovative solutions in any field. Did you know that Dynatrace runs its own Research Lab on the campus of the Johannes Kepler University (JKU) in Linz, Austria - just 2 kilometers away from our global engineering headquarters? What started in 2020 has grown to 20 full-time researchers and many more students who do research on topics such as GenAI, Agentic AI, Log Analytics, Processing of Large Data Sets, Sampling Strategies, Cloud Native Security, and Memory and Storage Optimizations.

Tune in and hear from Otmar and Martin how they are researching the N+2 generation of Observability and AI, how they are contributing to open source projects such as OpenTelemetry, and what their predictions are for when AI finally takes control of us humans!

To learn more about their work, check out these links:

Martin's LinkedIn: https://www.linkedin.com/in/mflechl/
Otmar's LinkedIn: https://www.linkedin.com/in/otmar-ertl/
Dynatrace Research Lab: https://careers.dynatrace.com/locations/linz/#__researchLab

Transcript
Starting point is 00:00:00 It's time for Pure Performance! Get your stopwatches ready! It's time for Pure Performance with Andy Grabner and Brian Wilson. Hello everybody and welcome to another episode of Pure Performance. My name is Brian Wilson and as always my co-host Andy Grabner is making fun of me during this intro. He's got a lot of lanyards hanging behind him, which reminds me of when I was young, all the quote-unquote cool kids would keep on their wristbands from when they went to different clubs or went to see different bands to try to prove how cool they are.
Starting point is 00:00:49 So now everybody, if you're seeing that picture and they're in there, you can see how cool Andy thinks he is. That was my attempt at recovering from my malaise this morning, Andy. Yeah, I'm not sure if conference badges are something to be, well, they're kind of cool, some of them, you know, but something to be proud of, especially to people like all of our guests, right?
Starting point is 00:01:11 That's true. Yeah. But you don't only meet great people at conferences or at running events. You also, no, you don't. Where else do you meet them? Oh my gosh. Where? Maybe at the university? Oh, like a research lab.
Starting point is 00:01:23 Yeah, at a university. I don't know how we can make that happen though. Otmar Ertl, I have your LinkedIn profile open. It says Senior Principal Software Mathematician and Head of Real-Time Analytics Research at Dynatrace. That's a mouthful, hard for me to pronounce. Can you quickly introduce yourself, what you have done in the past and what you do right now in the research lab? Yeah, actually, it's very interesting because I have a master in physics and not in mathematics. This title was given to me at some point by Dynatrace, but it's a long story. Anyway,
Starting point is 00:02:34 so as said, I have a background, a master in physics, did a PhD in technical science at the University of Technology of Vienna and was working long in the field of computational physics so there's already the intersection between mathematics, physics and computer science and after that I had two years in automotive industry developing simulation code and then I switched to computer wear which was then Dynatrace again. And I'm with Dynatrace since 2012, so it's already a while. And worked for more than eight years in development, building root cause analysis and anomaly detection from more or less scratch for the second generation.
Starting point is 00:03:26 And in 2020, yeah, I switched to the research lab. So this is where I'm now. Yeah. Cool. That's a great story in history. And yeah, maybe update your LinkedIn profile and, you know, Master in Physics sounds pretty good and impressive. Martin, to you, if I look at your profile, Jenny Eilid and tech evangelist, Dynatrace, head of generative AI research, ex-Microsoft, ex-CERN,
Starting point is 00:03:56 machine learning researcher, data scientist, and then a brief DOS PhD. Yeah, I think I'll let it to you to explain what this all means and where you come from. Yeah, I think I pretty much used up all the character limit on this LinkedIn description. Sorry for that. Like Otmar, actually, my background is physics. I have a PhD in particle physics, actually. Unlike Otmar, I'm still relatively new at the company at Dynatrace. I've only been around for one and a half years or so. I spent quite a lot of my time actually in academic research in particle physics together
Starting point is 00:04:37 with CERN, as you've already mentioned. What I've been doing there is analyzing data with machine learning. So pretty much what I'm also doing now just very different kind of data and also very different kind of interpretations of outcomes perhaps. But yeah, a lot of the day-to-day work and the tools and so on actually pretty similar. Yeah, after a few of these typical academic positions, stations. I switched to industry a couple of years ago in 2019. I was at Microsoft for a couple of years,
Starting point is 00:05:10 working on speech recognition and language modeling. And now since one and a half years, I joined Dynatrace and started building up essentially this department in the research lab on genoid AI research. I was going to say, I think it's safe to say that we're smarter than Andy. I don't even think if we add our two IQs together in them individually. Martin, I'm not sure if you knew, but two months ago,
Starting point is 00:05:50 we had Ricardo Rocha from CERN on our podcast, because I saw him present at KubeCon a while ago, and then I asked him, do you want to jump on the podcast the large Hedron colliders actually collecting and how they're processing the data. And that was really interesting and what they're doing with Kubernetes and how they're using Kubernetes to really distribute the load. So there was a really interesting and engaging discussion with Ricardo. I'm not sure if you are familiar with him. I don't know him, but I know of course what's going on at CERN and with data processing and I mean a lot of stuff.
Starting point is 00:06:22 We did something like cloud computing long before it was cool actually. So yeah, it's always at the forefront. It was very, very exciting times in every respect, not just the physics aspect, but also socially, technologically and so on. Yeah. Hey, so the idea of the podcast and the topic came actually up when Otmar and I, we had a chat because I think Otmar, I saw some of your presentation, one of the presentations that you did on a different topic. But then we started talking, how can we make some of the stuff that you are doing and working on more public?
Starting point is 00:06:58 And then you said, you know what, we're working at the research lab and for a company of the size and the geographical region where we're in in Europe, it's actually very unusual that there is a research lab at a university. And then I said, you know what, maybe that should be the topic that we talk about, the research lab and the work that has been doing there, kind of the intersection between research and then taking what's been researched there and bringing it to the industry. I'm not sure if you want to kick it off or Martin, but I would like to know a little bit about the history of the research lab. I think you mentioned you started in 2020, but can you give us a little bit of a background, the history and where, how it grew
Starting point is 00:07:38 and where it's going? Yeah, I can say some words. Actually, I was not involved when the original idea was born. This was maybe in 2018 or 2019, but I was asked in 2019 if I'm interested to join the lab from the very beginning. So this was very exciting. So the basic idea was to have a reasonable size and so we could also look more into the future of what is relevant for the product, maybe in the next generation, after the next generation. So, this is what we called the M plus 2 approach at the very beginning. So if N is the current product generation, then m plus one is what development is working on and m plus two would be then the topics what we are looking into and which might hopefully
Starting point is 00:08:35 are relevant in the future product generations. So that means really that it's a nice way of putting it. And you said a reasonable size. We talk about our employer, we talk about Dynatrace. Dynatrace is a reasonable enough size to say we're dedicating a certain amount of people or money resources, whatever you want to call it, to research on what's next after next. Exactly. Yeah. Exactly. Cool. Um, I think also when we were sitting down together yesterday at the, at the lunch table and we also had Andreas Hametner, uh, sitting there, can one of
Starting point is 00:09:18 you just quickly say what his role is? Because I just saw him in a video that is promoting the research lab. Can you, I know he's not here today, but if any one of you would just quickly highlight what he's doing. Yeah, he is the lead of the research labs, so the organizational head, both of the physical entity. I mean, we have this co-working space or a part of the co-working space at the JKU, right? But also of the unit of
Starting point is 00:09:46 the department within Dynatrace. And yeah, he's been involved since the beginning. So essentially he created the whole of, at least played a major part in creating the whole mission statement and deciding what topics to work on and initial recruiting and yeah, is still of course heavily involved in all this. And so you mentioned 2020, maybe the initial idea came up 2018. Can you give us some idea? Are there five people, ten people? What are we talking about? How big is this research lab right now?
Starting point is 00:10:33 Currently I think we're around 20 people distributed mainly in Linz and Vienna but we also have some full remote employees. Yeah so it's about that size so we also have some students is the question how you count them, but yeah. So it's roughly 20. I think it's fair to fully count them, they're doing great work. It's part of our mission also, collaborating with academic research, right, and in that respect then we are supervising thesis, mostly bachelor and master thesis, also some PhD thesis. That's also part of the game at the research lab.
Starting point is 00:11:05 You know, it's crazy. I've been here since 2011, obviously since the time I was not even aware we had a research lab. Of course, I'm over here in the States and in the sales side of things, so we don't get word of that. But it really amazes me. Are you too familiar with Bell Labs? That is the place that used to be in the United States
Starting point is 00:11:29 We've seen head shake somewhat. Yeah, so for any listeners who aren't you know, I mean it's it's I'm sure it's nothing like Bell Labs and all but in a way it is right Bell Labs for people who are familiar Bell Telephone had laboratory In New Jersey, I think that's actually the building where they shoot severance is for the former Bell Labs. But they would just bring in researchers and scientists and let them loose with their imagination. Now, obviously it sounds like you guys have a mission. It's a little bit more focused.
Starting point is 00:12:02 But to me, the idea of having a research lab is just mind-blowing, which I don't know, it's just really, really cool. And as Andy said, or one of you said maybe, the idea that a company of our size having a research lab is also just astounding too. So now I'm just excited about today's conversation, but I just wanted to get that in there because it sounds very reminiscent of When a company would take time to actually do a lot of research I mean it even goes back to you know our approach with grail right? Let's let's let's build it ourselves We need something that needs the capacity
Starting point is 00:12:39 We have smart people. Let's build it. So really really cool And that actually brings brings me to the next question If we have smart people, let's build it. So really, really cool. I know that both of you are working on certain topics, but if you could quickly give a glance on what are the key topics that are researched in that lab? Just a quick overview, because I'm sure with 20 people and some additional students, there's a couple of topics. Yes, there are four topics. So we have four research teams at the moment, and they cover four different topics, of course.
Starting point is 00:13:14 They are overlapping partially. So one is real-time analytics. This is the team which I'm leading. Then we have distributed data systems. Then we have generative analysis, that is what Martin is leading, and the fourth team is cloud native security. All extremely relevant topics, obviously. Let's jump into some of the details because it was really great that we had some time
Starting point is 00:13:46 to sit together yesterday over lunch. Maybe let's start with Martin, on your end, can you fill us in a little bit on what is the N plus two thing that you're currently working on? Even obviously, you know, when you're working in academia, so I guess a lot of the stuff that you're working on will somehow also be publicized. But obviously, you know, tell us what you can tell or what you want us to tell, but it would really be interesting just to know
Starting point is 00:14:12 what you're researching on, your team is researching on, and what might be something that we will be able to see in the future. Okay. Yeah, I mean, first, I mean, my team is still relatively young. When I joined one and a half years ago, it was me plus one additional person, Selena. And over time, the team filled and we are now seven people in the team plus almost the same number of students.
Starting point is 00:14:35 So it has grown quite a lot. But yeah, we are still kind of trying to find exactly our path within the greater context of things. What we are currently mostly involved in, so you all of course aware here of Davis Cobra a lot, which is the Dynatrace generative AI offering. So in a nutshell what it is, it's a natural language interface for customers to produce stuff and that stuff can be a simple answer to a question. That's quite trivial.
Starting point is 00:15:07 We know this. Chatbots, they are popping up everywhere. More interesting use case in our case is query generation. So generating DQL, Dynatrace Query Language queries, using natural language input. This is already in the product and generally available to customers. In the future, more and more things will be added. This is all really future. It's no promised product roadmap-wise, but just some ideas. In the end, you could generate almost anything starting from a natural language prompt, dashboards,
Starting point is 00:15:45 workflows, whatever you would like. Complex actions, deep research-like things like Google Anthropic and others offer to solve really complex tasks over a longer period of time. All that is in our minds already, but not much more than that perhaps at this point. What we focus on at the moment are mostly the topic of query generation. So when we started out, it was essentially around the time when I also joined, the decision was to go to market as quickly as possible.
Starting point is 00:16:18 And so what we are leveraging here still now in the product is models from OpenAI, so GPT-4 models, together with some heavy engineering around it to generate these queries. And what we are working on is training our own model, our own Dynatrace model for query generation. That's already at a very advanced stage and delivering great results also in comparison to our baseline. And at the moment the work is on integrating that into the product, also in comparison to our baseline. At the moment, the work is on integrating that into the product, but still at the same time, of course, continuously improving it. This is one of the topics.
Starting point is 00:16:56 Then another big topic for the future is agent AI. That covers potentially, of course, a lot of use cases, but the general idea is that you have more complex questions that require several steps of reasoning plus some kind of tool usage where tool can be anything. That can be just generating a query, executing it, and consuming the output. It can be one of our Davis analyzers. It can also be something external. It can be Google search.
Starting point is 00:17:23 It can be a calculator. It can be a code interpreter, code execut extreme. It can be Google search, it can be a calculator, it can be a code interpreter, code executer, it can be anything. And to be able to solve complex tasks that require several of these tools potentially, several reasoning steps to answer more complex user questions in general for more specific use cases. This is what our main focus is at the moment. A third smaller topic is in log analytics, how to group and then also explain, analyze, interpret logs or clusters of log messages. I got a question now and hopefully
Starting point is 00:18:00 I'm not opening up a can of worms here, but it seems that obviously with LLMs and everything you just talked about, we can solve a very interesting problem, at least in our industry and I'm sure in other industries as well, which is not being bound to a vendor, to a specific query language, to a specific way of creating dashboards, if you can just use natural language to get the answers that you need. So, a classical example is always why I use an open standard because I don't want to be locked into a vendor in their language, into their dashboards. And it seems for me that as these models evolve, we achieve this and the natural standard,
Starting point is 00:18:45 the universal standard is the human language. At least that's if I hear this correctly. Yeah, I think that's definitely a possibility. Of course, I mean, not all languages are equivalent. You might be able to do something with one language that you cannot do with another. And then there are still, even if it's transparent to the user, there are differences of what you can achieve. But in the end, yeah, for all of them, the input would be
Starting point is 00:19:14 natural language, English, or any other language actually. I mean, all our tools work equally well with all of the major languages. It can even serve to translate. I mean, we've, for example, seen if you translate, I don't know, from DQL to SQL or to some other query language that may work in some cases, in some cases not so well, depending on how well the LLM knows this language already from their training. I mean, SQL, of course, it knows a lot, DQL less so, and same for other more proprietary languages.
Starting point is 00:19:46 But what may work quite well is first translating a query into a natural language and then producing from that a query in another language, so to have this step in between, for example. So absolutely, that's a possibility. The question is what will be behind what the user sees? Will there be these languages or will at some point maybe LLMs just create their own language The question is what will be behind what the user sees? Will there be these languages or will it sample and maybe LLMs just create their own language to irritate? So, yeah, far in the future, certainly, but lots of possibilities. Yeah, I was talking with some colleagues at our sales kickoff the other week.
Starting point is 00:20:20 And I just recently learned about this idea of the AI agents versus these other week. And I just recently learned about this idea of, you know, the AI agents, you know, versus these other things. And it's still all a mystery to me. So I'm not going to talk like they know anything, but it really became apparent that the future state is, you wouldn't even necessarily know how, learn how to use your tool, right? You wouldn't have to learn the ins and outs of anything. You would just give it a prompt, say, I want to see this, do this for me.
Starting point is 00:20:47 And at some point in the future, all the AI, the LLMs, everything working behind it's just going to be able to produce the output for you, which I think is just absolutely fantastic. There's been a lot of hype around AI doing this and that, and we get pictures missing pinkies and all this. But when you have these really deep tools, even if you're thinking on a cloud platform, I want to deploy an app on this and I want it to look like this, and it can do it all behind the scenes for you. Now, yeah, there's a lot of time and space between today and that. But the exciting thing that I think is that we also see
Starting point is 00:21:26 today and that, but the exciting thing that I think is that we also see a lot of these AI capabilities and features just exponentially changing even without warning. So, I guess where I would turn this into a question is in more of the research mind of it, not obviously product mind of it, in the research mind of it. Do you see a time soon when you'll be using other components of AI to help create what you're looking to create on your side, right? The behind the scenes stuff, right? Training models, right? Can you use AI to train an LLM, right? Can you use other things to do all these pieces? And how soon is it going to snowball into it can just start almost self-generating?
Starting point is 00:22:07 Yeah, that's almost a philosophical question. Honestly, I mean, two things. A, I'm pretty sure we will get there at some point. B, I believe not in the foreseeable future. It will take quite some time to get there, especially for more generic. I mean, the whole idea is also what's behind superintelligence and AGI, ASI, right, in a more generic sense.
Starting point is 00:22:35 AI just making itself smarter and smarter and overtaking us quickly and then enslaving us and I don't know what else. I don't know if it's going to happen. It's certainly a possibility, but I don't think it will happen any time soon. Well, I don't even mean the enslaving, but the idea of, when I was thinking about this the other day, allowing AI to start writing code, right? But then you would still think, okay, well, a human has to still verify the code, right? But then at one point, can you have another AI verifying the code? Right. And then if the code
Starting point is 00:23:08 is wrong, having yet another AI, like write a fix for it. And then going through iterations of testing, collecting the telemetry, say from a dinosaur or something, and looking at everything and being like, yes, this is working. It's going to make a judgment to say whether or not it's working as the input said it was going to work. You know, so I'm not talking about, you know, Terminator, which I know is from from Austria anyway, right? So, but, you know, using those cycles of how much do you do you replace or when and how can you replace the human factor in it to accomplish the things that right now we think of? Well, you still need the human there. And of, well, you still need the human there. And in most cases, you still need the human there.
Starting point is 00:23:47 But at what place, what time can that be replaced? And it sounds like it's still a ways off, right? Yeah, different people have different bets. If you believe some of the people in the Silicon Valley, it's more like a couple of months. I don't think so. I think it's more, I mean, definitely years, but more likely decades, tens of years, we'll see. I mean, at the moment, yes, you need humans for verification, but also, I think it's still the case, even if you
Starting point is 00:24:19 don't need to know all the details on how to use a tool, it still makes you a better programmer, architect, whatever, if you know how the details on how the user do it. It still makes you a better programmer, architect, whatever, if you know how the underlying tools work, because only then can you really also prompt and even formulate a problem in an appropriate way to get the optimal solution. Quite definitely, at some point, we will get to the situation that you have described, that AI will be doing all these steps itself. But I'm not so 100% sure about will it really be able to create something genuinely new. What it will be very good is recreating what already exists, creating similar things, interrelating
Starting point is 00:25:00 between things that already exist that can be code but could also be art, right? I mean, you see now nice images and music pieces and so on and they look like Creativity they look good. Definitely they produce nice music pieces. They also have some New elements in them. It's not like they're just copying things But then the question is, okay, they can perhaps produce a new nice pop song that is in fact different to what we already have. But I don't know, if you feed them all their human music until the 19th century, some classic music, some, I don't know, native black American music and so on,
Starting point is 00:25:44 would they be able to invent jazz, for example? Would they be able to invent blues and all this rock music? Definitely not today. We will be able to do that in the future. So really this not just combining existing things, but creating something genuinely new. I think the jury is still out on that. We don't know that yet. And frankly,
Starting point is 00:26:07 I think this kind of creating something genuinely new, this is relatively rare even among humans, right? But still the big progress always comes from this kind of steps. And I, my guess would be, we will still be needing humans for that for quite a long time actually. And what's great to see is obviously all the work that you're doing in the research lab. We will hopefully then eventually benefit all from this, what you then bring into the products. Martin, I would have another question on the Gentii, but I want to park it for now because I also want to now switch over to Otmar and give him a little
Starting point is 00:26:45 bit of a chance to also talk about the stuff Otmar that you are doing in your team. So can you, similar to Martin, just quickly fill us in on the primary focus topic of your group? Yeah. So, yeah, our team name is Real-Time Analytics, which means that we're focusing on processing huge amounts of data in real time. We're really talking about data volume, which is really big.
Starting point is 00:27:12 We're talking about better bytes per day, which gives us a lot of challenges. We really have to be creative still. Let's wait for the AI maybe, but still we have to think ourselves how to solve these problems. And yeah, we are focusing on data structures, algorithms to handle these data volumes, but also we're looking into new technologies, stream processing frameworks that can handle that and everything what is also associated with that fault tolerance, scaling and so on. So this is our main focus and it's very exciting. And I think you've also done contributions back to the scientific world, to the industry.
Starting point is 00:28:06 Any examples that you can give here? Yes. Actually, we said we're also looking for data structures to summarize data. One example is if you want to count distinct things, distinct users on your website, for example. If you want to know the exact answer, you have to remember all the unique IDs of those users. But there are structures like HyperLogLog that can do that only with a fraction of the
Starting point is 00:28:36 memory. So let's say a few kilobytes is efficient to get an approximate answer of a few percent. And these algorithms are very exciting and we were using them and we saw that they can be further improved and that's why we developed even better data structures, UltralogLog, ExcelLogLog, which were finally presented at scientific conferences. This is one example, but we also have some publications or testing different stream processing frameworks, defining a benchmark for stream processing and things like that. And so these then your contributions, they make it to scientific presentations, papers,
Starting point is 00:29:20 and they also then find it into software libraries, into open source libraries, or into how does this work? What's the process then to make so that this makes it into a final product or library? Actually, we have also an open source library. It's around hashing algorithms in general for Java, because we've seen that in Java there are only a few hashing libraries and some are not really up to date and do not have implemented the latest hashing algorithms, the fastest ones. And so we decided to come up with a hashing library ourselves. It's hash4j.
Starting point is 00:30:00 It's on GitHub. It's in the meantime widely used in our product, but also in IntelliJ and also Apache Pinoy is using it. These are open source projects we are aware of. And we also have some smaller projects as well. This is one thing how we distribute or maybe make our algorithms more popular. Yeah, really exciting. And Brian, if you think about it, we often talk about performance in a completely different sense on a different scale like
Starting point is 00:30:42 optimizing page load time from two seconds to one second. This is in a completely different sense, millions and trillions of time. And it's a completely different level now of performance engineering that we talk about. Yeah, we're working at the physical level and, Atmar, you're working at, you know, the, I was going to say nano, but I didn't want to say nano. What's the branch of study with wormholes and all that? The quantum, quantum level of performance, right? Obviously it's not quantum mechanics, but it's like that level because going back, I
Starting point is 00:31:32 remember even working with someone on a high volume trading app where like one millisecond of overhead was too much, right? You're probably looking at that and even smaller, right? I imagine. No, we are not. I mean, we do not have to care about latencies like, you know, these trading companies. I think they really have to build their servers right close to the trade exchange, right? A stock exchange, sorry, and in order to have the delay as small as possible. But this is, I think, a different level. So we are not at that level. We can afford latencies that are much higher. But of course, our main driver are the costs. And of course, the less CPU you use, the less memory you use, the cheaper
Starting point is 00:32:25 you can provide your service. Yeah, there's a lot of that going. In fact, I was watching a video the other day where it ties that concept into AI. Well, obviously, cost is a big thing in LLMs and all that kind of stuff. But this was more about how to thwart learning from other people's music, how to thwart AI at learning from other people's music. Because a lot of times they're just going out and grabbing people's stuff without permission, so they're building in this inaudible noise into the WAV files, which the AI chokes on and comes out with all these crazy things.
Starting point is 00:33:02 But it takes about 45 minutes of heavy GPU processing to process about 15 seconds of music. So that's when you're talking about the cost, the time. So yeah, it's very different than milliseconds, but it's about efficiency that really comes down to. And again, if I go back to the Grail just for a moment, because that's something I can more grasp my head on. And for anybody not familiar, Grail is our for a moment because that's something I can more grasp my head on and for anybody not familiar Grail is our really crazy fantastic backend. I'll leave it at that because I'll do worse if I try to explain it. That was all about scale efficiency and all that. So this stuff obviously takes a lot, a lot of research to do.
Starting point is 00:33:39 You can't just go from we're going to use the typical backend and suddenly add 20,000 data sources to it because that'll collapse. And I'm sure you can build something that'll handle it, but at what cost, right? So it's that fine balance, which is again, I am in such awe that we have this research lab because then you can do all these experiments and figure out what's going to go on. It's really awesome. Otmar, I know you also had another topic. I want to also park it for now, but I think it's around sampling.
Starting point is 00:34:13 I want to use this for later. But the segue now is perfect because, Brian, you talked about costs coming back to agentic AI. And one of the questions, Martin, that I had for you, and it also came up in the conversation we had yesterday over lunch. If you're building these agentic systems and really you are just prompting and then the system goes off and does 50 things with different tools, how do we know that what's been done is A, really efficient?
Starting point is 00:34:44 How much does it cost? Who is involved? Is there any, like, how can we control this or how can we understand what's actually really happening? And especially around the cost factor again, is there any way to make sure this doesn't explode in our face at the end of the month or whenever the billing cycle comes with all these tools that are connected. Yeah, I mean, usually the best practices around building agentic systems, they recommend to
Starting point is 00:35:13 start with something relatively constrained, not giving the LLM too much autonomy, because that, I mean, it might be fun to do it, you know, privately to try it, see what happens autonomy. perhaps even more important than like the last per mill or percentage of quality of the result, right? So the idea is typically to start with some very constrained, almost workflow-like things where there's very little freedom, which covers a certain number of use cases and also requires a maximum of a certain number of calls than to add an M or other tools. And only once you gain confidence in the whole system, you understand better how it works.
Starting point is 00:36:12 You can slowly perhaps loosen it, give it more freedom. Yeah, in a controlled way, step by step, so that as soon as you can see things going off you, there's still time to react to it, right? At the same time, there's of course kind of a bad going on in general, I think in the gen AI and LLM industry, the idea that things will keep getting cheaper and cheaper anyway.
Starting point is 00:36:37 So whatever is too expensive today, there is some point in the future where it will not be too expensive anymore. If that's true or not, we will see definitely in the future where it will not be too expensive anymore. If that's true or not, we will see. Definitely in the last couple of years, costs have come down with respect to LLMs at a tremendous rate. Moore's Law is nothing compared to that. It's much faster than that. I don't remember now the numbers from the top of my head,
Starting point is 00:37:00 but if you compare from essentially the first real usable LLM for industrial commercial applications with JetGBT and GPD3, GPD3.5, it's only been around for about two and a half years, right? But if we see how much costs have come down in these two and a half years, it's already been orders of magnitude. If the trend continues like that, I don't know how long it can continue like that, but if it continues like that for a while, things will get so drastically cheaper that a lot of the things that we can imagine today will be doable in the near future and almost anything will be possible in some given time in the future. So many people have this philosophy not caring too much about this cost aspect at the moment. Just build it, maybe wait with releasing it or
Starting point is 00:37:51 constraining it a bit. But at some point of time it's more important, it's powerful enough, the cost issue that will solve itself sooner or later anyways. It reminds me a little bit, so I learned software engineering in the 90s. I went to Lohattel in Leontien, so that's for the non-Austrian folks. It's the higher technical high school, I guess, where I learned programming. And in those five years I was there, we started with Assembler. It's kind of very low level. started with Assembler. But I remember in the Matura, that's the graduation in the last class. We had the choice between either building apps in MFC or the second one was Oracle Forms,
Starting point is 00:38:37 which was almost like a 3-4 GL language where everything was abstracted. You could just point and click and move things around and then you got a very nice. language where everything was abstracted. And I guess what you were just explaining is kind of similar. We're betting that the underlying technology will get cheaper and cheaper, and therefore we're building abstraction layers so we're making it easier for the end user to do things that they would not be able to do in the amount of time that now is possible with AI and the GenTik approaches. Exactly. And at the same time of getting cheaper, also getting better, of course, right? So not only a lot cheaper, but also a lot better and less likely to just derail into something that you don't want.
Starting point is 00:39:35 Yeah. Otmar, I want to go back to you for the topic I teased earlier. Open telemetry is a big topic. We've been discussing this, Brian, in different episodes. The new standard for the topic I teased earlier. We've been discussing this, Brian, in different episodes. The new standard that has evolved over the last couple of years in order to capture logs, metrics, traces. Now we also get profiling information in there. They're working on reuse and monitoring in other areas. You have told me that one of the topics that you have been, I mean you've been part of the SIG for quite a while, correct? Yes, since I think, or at least for two years I guess, we're actually looking into how we
Starting point is 00:40:23 can define the standards such that we collect the sampling information such that we can make use of the data or make use of the sample data. Because it's not so easy if you just do the NAEV and your sample and just collect the spans and then you do not know at which probability the span was selected and without this probability you cannot extrapolate. That's the problem and currently if you consume open telemetry spans they do not have this information on it. This is one of the main reasons why we have this SIG. How to define that? It's one piece of information, but you can define it in various ways, more efficient
Starting point is 00:41:18 ways and less efficient ways. It's basically the probability. But you can also, for example, store it as a multiplication factor. This is the multiplication factor which you use for extrapolation and things like that. And some are more efficient than others, some representations. And for me, I'd just like to recap what I learned, what kind of was an eye opener for me a little bit some representations. If I say how many requests did we have, I still get a very accurate number, even though we may only capture 10, 20, 50% of the traffic because we made certain sampling decisions. If I'm doing open telemetry, you're actually working with the SIG,
Starting point is 00:42:17 the Special Interest Group, to also get this level of information captured on the trace. So if somebody says, we're getting 100 transactions in per minute, but we're sampling on the rate of 10%. So that means we only get 10 in and the backend system only sees 10, but they don't. The backend system has right now no idea, does 10 mean we get 100%? Do we get 50%?
Starting point is 00:42:40 Do we only get 10%? And how can I extrapolate this, right? And I think this is, it's really cool that you in this particular instance are contributing to the open standard to also solve this problem because right now this problem is not solved. Yes, I mean, we're already quite far and hope that we get it into the spec finally. But of course, it's a lot of discussions, interesting discussions and a lot of pros and cons for every design decision. And so it's a very lengthy process. But for me, it was because in our industry, and I'm sure this is true for other industries as well, all the vendors, we were very proud of what we've built. There's a lot of money that went into building our IP, and so why share and why contribute
Starting point is 00:43:33 and why collaborate now with other people in the industry? But this is a great sign where we say this is a problem that needs to be solved because otherwise we cannot give accurate, or at least with a high probability accurate data if we make our dashboards and analytics in the backend based on sample data and if we don't know the sampling rate. Exactly. Yeah. Yeah.
Starting point is 00:43:56 I also think, Andy, that ties into the whole community of IT, right? Where everyone shares a bunch of stuff, right? We're all proud of the types of agents and everything we've made because they didn't exist and they had to be created. And of course, for those of us who are passionate about performance and all this kind of stuff, our wish and goal is that everybody's taking performance seriously. Everyone's looking at how to run these systems. So, if it gets to the point where collecting the data is easy for everybody, that gives all of us, vendors and everything else, the opportunity to really focus on with our customers
Starting point is 00:44:37 how to consume that data, how to provide answers, how to lead it to better business and all that, which is what we ultimately do, right? Because collecting data is just data, even if you use our agent, it's how do you present it? How do you solve problems with it? How do you achieve business goals with it? And I know I'm sounding a little salesy here, but that's the main driver. And the more people that we get who can easily deploy an instrument without that cost barrier of the agent at least, right?
Starting point is 00:45:07 The more, like, okay, not to sound stupid, but the better the world would be because everything will be performing better and you'll have less of those headache problems of, oh my gosh, I went to this site and it was so slow today and it was just awful and I couldn't click it and it wasn't working, right? It gives everybody a better chance to do it. So I think it's very much in line with the way at least the people in the IT community work, whether or not the corporations, that's a different side, right? But obviously we're seeing that because there's a lot of big companies that are sharing to these hotel things. Anyway, I'm doing another rant. So I know we have
Starting point is 00:45:40 only a few minutes left. Let me let you get back. Yeah. And contributing to the data quality, right? In this case, it's critical, critical data that we all capture, but Otma and team, you're contributing to making sure that the data we're capturing based on these standards have the right data elements so that we have better quality in the end and can make better analytics. Brian, as you said, we're getting towards the end of the episode. I know there were some other topics that you mentioned earlier. You mentioned security is a
Starting point is 00:46:13 big topic and just a shout out also to Stefan Achleitner. We had a podcast with him already, I think it was last year on cloud native security. So folks, if you're listening to this and you want to also learn a little bit more, what is the research team year on cloud native security. So folks, if you're listening to this and you want to also learn a little bit more, what is the research team doing around cloud native security? There's a podcast where you can listen to one of our other colleagues. And what did I miss? I think there was like, which other topic did I miss? The data systems.
Starting point is 00:46:41 Yeah, this is exactly the data systems. Yeah. Distributed data systems. Oh yeah, this, exactly. Distributed data systems, yeah. So definitely something for another episode to dive deeper into this final round, starting with Martin. Martin, anything we have missed to mention that you want to tell us in the last minute? Yeah, so if anyone is interested in working with us from academic research students, they're very welcome. Of course, we are always doing a lot of interesting stuff, I think.
Starting point is 00:47:12 If students, I mean, most of what we've discussed today is very close actually to our product or Davis Cobolot. If students, we always have the freedom to explore a bit more, I don't know, risky research, let's put it that way. And yeah, we are open for almost any topic that we can somehow meaningfully connect at least to what we are doing. Yeah, we like to work with other people. And the students will be counted as people, right? Absolutely. And we will work twice as hard as we do actually.
Starting point is 00:47:45 They should count as two people. And we will make sure to link all the resources so that people know how to get a hold of you on the podcast description. Otmar, anything from you? Any final thoughts? Yes. I mean, we also welcome all the people who are interested in processing huge amounts of data and maybe also something interesting because Martin said we can build something
Starting point is 00:48:11 maybe and wait until it's fast and cheap enough to run. But this actually does not work for us because in future also we have bigger data volumes, so the problem scales actually with maybe the improvements of resources or CPU power and so on. So it will be always exciting, I guess, of course, only until maybe the AI takes over. But we have a few years, I guess. So it's kind of like the Windows operating system. The bigger it gets, the more memory it gets. And then the cheaper the memory gets, you need more memory to run Windows. Somebody knows it's the money, right?
Starting point is 00:48:57 Brian, what do you think? Well, I was thinking this is awesome. And I'm glad we didn't get to all the topics today because I definitely want to have you guys back. And I also want to see, you know, between Andy and, you know, Utmar and Martin, if and when any new developments come up that you are allowed to share, any new exciting research stuff that you can publicly talk about or anything. I think this stuff is fascinating.
Starting point is 00:49:21 I'd love to have you guys on whenever you have something you want to share because this is the really exciting stuff, right? In computing, it's always been what's on the horizon that gets people excited and want to continue. Obviously, you're working on your now stuff, but to know that these really cool developments are coming up is just fantastic. And just really to all the students that are helping out there, if any of them are listening, thank you so much, because this is how we, you know, keep going so that we can finally get the, you know, Skynet to come alive.
Starting point is 00:49:55 I'm kidding. But yeah, that's it really, really, really. I think it's awesome that you guys came on today. This is a big revelation to me. Andy, what have you got to say? Last thing, thanks Martin for everything. I think it's awesome that you guys came on today. for doing a video on some of the things you mentioned earlier, the distinct value count once this becomes available in DQL. So we'll definitely see you and yeah that's it. I know it's the end of the day here in Austria and I don't want to keep you from your evening beer or whatever you're going to do now on a
Starting point is 00:50:39 Thursday evening but thanks and see you soon. Thank you all. It was a lot of fun. Thank you all. Thank you guys. It was a lot of fun. Thank you. Bye.
