PurePerformance - State of AI Observability with OpenLLMetry: The Best is Yet to Come with Nir Gazit
Episode Date: September 1, 2025
Most AI projects still fail, are too costly, or don't provide the value they hoped to gain. The root cause is nothing new: it's non-optimized models or code that runs the logic behind your AI apps. The solution is also not new: tuning the system based on insights from observability!
To learn more about the state of AI Observability, we invited back Nir Gazit, CEO and Co-Founder of Traceloop, the company behind OpenLLMetry, the open source observability standard that is seeing exponential adoption growth!
Tune in and learn how OpenLLMetry became such a successful open source project, which problems it solves, and what we can learn from other AI project implementations that successfully launched their AI apps and agents.
Links we discussed:
Nir's LinkedIn: https://www.linkedin.com/in/nirga/
OpenLLMetry: https://github.com/traceloop/openllmetry
Traceloop Hub LLM Gateway: https://www.traceloop.com/docs/hub
Transcript
It's time for Pure Performance.
Get your stopwatches ready.
It's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello, everybody, and welcome to another episode of Pure Performance.
My name is Brian Wilson.
And with me, as always, I have my co-host, Andy Grabner.
Andy, it's good to be back.
I almost didn't make today's recording.
Yeah, thank you so much.
Yeah, I know.
I heard.
Why did you go back to COVID?
I thought we were past that stage of our lives.
It's the hip thing to do now.
Now it's like ironic COVID, which is probably very upsetting to joke about for anybody who lost anybody.
So I don't mean to make light of it.
Sorry, everybody.
But, yeah, no, it's just been a whirlwind of events, you know.
My daughter's home because we can't have any caretaker, so my wife and I are juggling work and taking care of her, and yeah, it's been a lot of fun.
But what's more fun, Andy,
is that we had our guest on previously, and, you know, in the nerd realm I think we all live in, there's a special number, number 42, and the last time this guest was on was 42 episodes ago.
So it's awesome to have a repeat at that level.
So I'm very excited for the episode.
But I'll let you take over, because you can probably figure out a transition from this to our topic and our guest for today.
Well, if I remember, it had something to do with a science fiction novel.
And sometimes AI and artificial intelligence, for many, sounds like and is like science fiction, right?
It's like something that is awesome that you want, but we cannot explain it.
And so without further ado, we are talking about AI, the topic that is on everybody's mind.
And just by putting AI in the title of our podcast, I assume this is going to have a lot of hits, a lot of people that are listening in, but they are probably wondering who our guest is.
And therefore, I want to pass it over to Nir.
Thank you so much for being here.
Maybe you want to introduce yourself for those that didn't listen to the episode 42 episodes before this one,
so they can go back to, what was it, 299, right?
199.
So they can go to episode 199 and listen to it again.
Yeah, so I'm...
By the way, 42 is from The Hitchhiker's Guide to the Galaxy.
It's like the answer to everything.
Yeah, and it was created by... it wasn't DeepMind?
No, no, no.
It was Deep Thought.
Deep Thought was the computer on the ship, but the Earth was a supercomputer to figure out the meaning of the universe.
So it ties into the AI very well, Andy.
So you are here.
So who are you?
So I'm Nir, I'm the CEO of Traceloop, and we're building an LLM / GenAI engineering platform for evals, monitoring, and everything you need to get your app up and running in production.
I remember last time when we talked, and I just have the episode in front of me, it was called OpenLLMetry, and we had, I think, a tongue-breaker, because we tried many times to figure out how to correctly pronounce it.
I think it was just too...
You still messed it up.
I know. Open-L-L-Metry.
OpenLLMetry.
So for those that have not followed what you're doing and what OpenLLMetry is all about, can you quickly teach me how to correctly pronounce it and just explain what problems it solves?
So it's OpenLLMetry.
And it all started in the summer of 2023, when we were playing around with OpenAI.
I think GPT-3 had come out a couple of months before, and GPT-4 was right around the corner.
And we were looking for ways to kind of understand better what's going on with the LLM, like we were running these complex kinds of workflows and agents, and it was kind of difficult to understand what's going on.
And so we were looking for a way to just see, you know, observe what's coming in and out of the LLM.
And we knew about OpenTelemetry, which is a huge open source project from the CNCF.
And we thought, okay, it would be cool to just take OpenTelemetry and extend it to support LLMs.
And it was like the hot new thing.
And so we started building OpenLLMetry, which is basically a set of extensions to OpenTelemetry to support GenAI.
Back then it was LLMs, now it's GenAI, because we have vision models and we have image generation models and audio and voice and whatnot.
Yeah, so that's that.
I think I remember when the last time we spoke was, but, you know, since then the project has grown.
We've built more and more kinds of adapters to many different LLMs, and it's now kind of slowly becoming also part of the bigger OpenTelemetry standard and project.
I was just going to say quickly, in terms of adoption, not trying to plug our product here in any way, but as we're learning and coming up to speed on the sales engineering side on our capabilities of monitoring LLMs and generally AI, it was told to me that's like, oh yeah, it's because all these products now, chat and all the others, have this stuff baked in.
So a lot of it's already pre-instrumented for us, and we just have to collect the data, because there's always the big question, well, how are you going to monitor those systems?
And what I think is really fascinating is this was always the promise with OpenTelemetry in the first place, that vendors were going to bake in telemetry code to their systems so that you wouldn't have to go do it manually, and it really seems to be being embraced by the companies that are using OpenLLMetry.
Yes. So awesome to see it, like, in real life, just taking off.
Yeah, we've been focusing, we've been working closely with a lot of other vendors as well.
Like, we're working closely with Vercel. I think one of the coolest things is that if you use Vercel, for example, for just using AI in TypeScript, then you have OpenTelemetry baked in and it's using our standards, so then, you know, it will just work.
You don't need to do anything, and we didn't need to do anything.
So it's pretty cool.
Yeah.
One quick technical question.
I mean, you partially just answered it, but if I am building an app that is interacting with an LLM, am I implementing it myself? Am I using OpenLLMetry to then instrument my calls to the LLM?
Or is it, as Brian alluded, that chances are very high that whatever LLM I'm using is already pre-instrumented and I don't need to do anything else?
What's the current state?
Yeah, you don't need to do anything else.
It's instrumenting all those kinds of LLM SDKs automatically, and for the last two years our battle has been just keeping up with technology, because, you know, OpenAI, they come up with the Responses API, okay, we need to support that.
Then function calling, structured outputs, vision models, and just keeping up with those ever-changing APIs so that people who use the open source just get this kind of magic: you install the SDK, you don't do anything, and, well, you get a full trace.
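To make that concrete: a minimal sketch in Python of the "install the SDK and you get a full trace" flow, using the OpenLLMetry/Traceloop SDK. The app name and model are illustrative, and it assumes the usual API keys are configured in the environment.

```python
# Minimal sketch: initialize the OpenLLMetry SDK once, then make a normal
# OpenAI call. The SDK auto-instruments supported LLM client libraries, so
# the call below shows up as a trace without any further code changes.
from openai import OpenAI
from traceloop.sdk import Traceloop

Traceloop.init(app_name="flight-assistant")  # app name is illustrative

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Automatically traced: model, prompt, completion, and token usage end up
# as span attributes following the GenAI semantic conventions.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(response.choices[0].message.content)
```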
So basically you walked the hard path that many of the APM vendors walked years ago, when we all implemented agents.
And every time a new Java version or runtime came out, or new framework versions, we had to make sure that our agents are instrumenting the latest and greatest, and that we don't break things, and that we capture the right things.
And you basically did the same thing for all of these libraries and SDKs that developers are now using to interact with LLMs.
And I guess because you have achieved such great success and adoption, now more and more people are just taking the work off your shoulders and they just instrument their stuff directly.
I mean, that's also perfect.
Yeah. I mean, Andy,
if you go back again to, you know, as I said
before, the promise of open telemetry way back
was this, right?
And obviously, everything
moves so much faster in the AI
world, including this adoption, which is
just, you know, really, really cool to see
because I still see, with regular stuff, some people use OpenTelemetry, some people don't, but most of the vendors haven't gotten on board with baking things in so it's easy for people.
But it's like, I don't know.
To me, it's just fascinating.
Like, it's a beautiful thing, seeing that.
It's like, bam, it's on.
And, you know, even if you use a vendor like us, if we could get it in there, whatever technical challenges there may or may not be, we would probably be picking up the basic stuff, right, in terms of the instrumentation.
Entry point, entry point.
There may be a couple of key things because that's, you know, all you really know of.
But the idea here is that these vendors know the important deep bits of code that you need
to be exposed to.
And they pre-instrument it.
So you're getting a custom built, and I'm saying this, I guess, more for our listeners, because I guess we all know this stuff, right?
But you're getting a very specific set of instrumentation as deemed important by the vendor who knows the code.
Right.
So then all you have to do is ingest that into whatever your choice is, hopefully us, but whatever.
And you have that deep set of data, which is fantastic.
And it's all because of you and your team and what you've been working on.
Can I brag?
Can I brag about the stats of the...?
I don't know if we talked about the open source stats last time, but it's been growing pretty quickly since then.
So can I brag?
Yeah, of course.
Talk about it.
Yeah, yeah.
So I was just pulling, like, the latest numbers from GitHub.
So as of today, we have, well, we've just crossed three million downloads a month.
We were at like one million just, I don't know, a month ago,
and we have more than 83 contributors and 6.2K stars.
And I think even our competitors in the AI monitoring space, even they are using OpenLLMetry to instrument.
Like, they tell their customers, hey, you should use OpenLLMetry to instrument your applications.
This is like my personal biggest achievement, you know.
It's like, okay, this is something real, like people actually look at this and say, hey, this is good, I should use it.
I know these are my competitors, but I should use it, they've done a good job there.
Now, why do you think that is?
Why do you think you are so successful?
I think the truth is, writing those instrumentations is a lot of, like, dirty work.
And keeping up, making sure that you instrument everything and it works well with every kind of new version of any vendor LLM SDK, is a lot of really, really boring, hard work that we've done in the past two years.
And people don't want to do it.
Like, if someone has already done it and it works, and, you know, you put in tens of different LLMs and you get out a standard set of spans and semantic conventions and metrics, people like it.
You know, we save you a lot of, like, annoying, boring coding work.
And also one last technical question.
Did you then basically contribute back to all of those SDKs? That means, did you open up pull requests to all of these SDKs on their Git repositories and basically change and instrument their code correctly?
Or did you build some type of auto-instrumentation that instruments the SDK, and you provide a tool?
What's the approach?
It's auto-instrumentation.
We try to work with some of the vendors, but sometimes it's difficult.
Some of the code of the SDKs is auto-generated, so it's kind of more complex to instrument it within those SDKs.
So right now, yeah, all of these instrumentations are auto-instrumented from the outside, and this is what we manage in our repository.
But we have some vendors that decided that this is how they want to instrument their SDK.
So, for example, IBM is maintaining their own kind of watsonx instrumentation within OpenLLMetry, and also many vector databases are maintaining their own instrumentations in our repo.
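As a rough illustration of what "auto-instrumented from the outside" means in general (not the actual OpenLLMetry code): a wrapper patches a vendor SDK method so every call emits an OpenTelemetry span. The attribute names below are borrowed from the GenAI semantic conventions, but treat the details as a sketch.

```python
# Illustrative only: wrap an OpenAI client's chat completion method so each
# call produces a span, without touching the vendor's own source code.
from opentelemetry import trace

tracer = trace.get_tracer("demo.llm.instrumentation")

def instrument_chat_completions(openai_client):
    original_create = openai_client.chat.completions.create

    def traced_create(*args, **kwargs):
        with tracer.start_as_current_span("openai.chat.completions.create") as span:
            span.set_attribute("gen_ai.request.model", kwargs.get("model", "unknown"))
            response = original_create(*args, **kwargs)
            usage = getattr(response, "usage", None)
            if usage is not None:
                span.set_attribute("gen_ai.usage.input_tokens", usage.prompt_tokens)
                span.set_attribute("gen_ai.usage.output_tokens", usage.completion_tokens)
            return response

    openai_client.chat.completions.create = traced_create
    return openai_client
```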
Now, the next one is a tough but important question.
What is the data that I'm actually getting if I use OpenLLMetry? What's the data, and what problem does it solve?
So the data is, I want to say, everything that you need: prompts, completions, token usage, function calling. If you're using a vision model, you get the image. If you're using a vector database, you can see, you know, the query, you can see the response, you can see the scores.
We basically instrument everything that we can in the request and the response, and then we also define the standard way, like the names of the semantic conventions, that we should use for those.
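For a rough idea of what such a span can carry (the names below loosely follow the OpenTelemetry GenAI semantic conventions; the exact set, and whether content is captured at all, varies by instrumentation, version, and privacy settings):

```python
# Illustrative span attributes for a single chat completion; values are made up.
example_span_attributes = {
    "gen_ai.system": "openai",
    "gen_ai.request.model": "gpt-4o-mini",
    "gen_ai.usage.input_tokens": 42,
    "gen_ai.usage.output_tokens": 128,
    # Prompt/completion content capture is typically opt-in for privacy reasons.
    "gen_ai.prompt.0.role": "user",
    "gen_ai.prompt.0.content": "Find me a flight from Vienna to New York.",
    "gen_ai.completion.0.role": "assistant",
    "gen_ai.completion.0.content": "Sure - when would you like to depart?",
}
```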
What you're getting is visibility.
Let's say, I think in 2024 people were talking about agents, like AI agents; in 2025 people are actually building AI agents and pushing them to production.
And so when you push those to production, you need a way to understand what's going on.
You have an agent, it's, like, working for two minutes doing something, answering a question.
You have no idea which tools are running, how much time each tool call takes.
What's going on?
If there's a failure, what failed, how can I re-run it and figure out what went wrong?
So all of this information is super important for people.
And it's especially important if you're using any frameworks like LangGraph or Mastra or CrewAI or many, many others.
They kind of obfuscate a lot of the data, or a lot of the information.
They encapsulate the prompts that they're sending to the LLM and the exact structure of what's going on internally during a run.
And as an AI engineer, as a data scientist, it's super important to see this information, like, during development.
this information, like during development.
For me, Brian, this again reminds me,
and I think we talked about this in one of the last episodes.
It reminds me of the old Java applications
using Hypernade to access a database.
And actually then with instrumentation,
seeing the actual database queries that get executed
by this magic black box slash framework
that is converting my request as a developer
to the actual database statements, right?
And I think it's in the end
we identified so many inefficiencies
because it was a generic framework
that was create generically
but was never optimized for the particular use cases.
And I think you're explaining something very similar.
We're building very complex systems now
and there's a lot of magic and black box things happening
and so automated instrumentation
into those layers provides you the visibility
so that you can then,
A, understand, why does it take long?
Why is it so costly?
What is happening?
And I think the fourth one, important one, is how can we optimize?
Yes, exactly.
I think the way this AI development lifecycle looks is that, okay, you begin, you know, you just build your first POC version of, let's say, your agent, like your chatbot or something.
And then, when you want to kind of start rolling it out to users, you need to figure out, okay, first, how can I make sure it actually works?
And then B, how do I make changes to it? How do I improve it over time?
And to improve it, you need to understand where it is not working, like where is it failing?
And then, when you figure out, okay, it's failing for this input, so this is what I'm going to have to fix.
Now we need to make sure that it is still working, right?
You had, like, a complete big piece of functionality built in, and then you need to kind of run some tests, which are called evals in the GenAI space, before you push it to production.
So, as I mentioned, okay, you have monitoring to monitor your application in production and to find those use cases where your application fails.
And you take those use cases, you rerun them in your development environment.
You fix the bugs.
And then, before you push it again, you run the evals to make sure that, well, A, you fixed the bug and, B, that you didn't introduce any new bugs.
It really, I think, should remind you of kind of traditional development, right?
You have testing and monitoring.
But there are some nuances around it, because the whole concept of testing an LLM gets more complex: you have arbitrary text coming in and arbitrary text coming out, and so it's much more complex to figure out, okay, what's a right answer? How do I test for a right answer?
And also, if I run the same test five times with the same prompt, I may get five different responses, right?
Yeah, exactly.
Yeah, it's, like, super non-stable.
So it's not like a deterministic test where you call a function that adds two numbers: you send it, you know, four and three and you get seven, right?
It will always be seven.
So how do we validate the accuracy of the data that comes back?
There's a lot of common techniques that people use today.
I think the most common one would be using another LLM, which is called LLM-as-a-judge.
So you take the answer, you take some context and all the trace, all the inputs and outputs that you sent to the LLM, and then you run another LLM to grade your response.
And there's a lot of, I think, techniques around what you should do and what you shouldn't do when you're using another LLM to grade your main LLM.
And it can be tricky, because you can think that you've got a good judge and your evals are working well, but then sometimes, you know, you may push your application to production given that your judge said, yeah, everything is good, go ahead.
You push it to production and you realize, oh, wait, I didn't test for something.
Or my judge told me that everything was fine, but it actually isn't; the judge just wasn't working well.
So there's a lot of moving parts that you need to build and stabilize.
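A minimal LLM-as-a-judge sketch in Python, assuming an OpenAI-compatible client; the judge model, the rubric, and the 1-5 scale are illustrative choices, not a recommendation:

```python
from openai import OpenAI

client = OpenAI()

def judge_answer(question: str, answer: str, context: str) -> int:
    """Ask a second model to grade an answer from 1 (bad) to 5 (good)."""
    rubric = (
        "You are grading a support assistant. Given the question, the retrieved "
        "context and the assistant's answer, return ONLY an integer from 1 to 5, "
        "where 5 means fully correct, grounded in the context, and helpful."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # judge model: pick whatever you trust as a grader
        temperature=0,        # keep the judge as deterministic as possible
        messages=[
            {"role": "system", "content": rubric},
            {"role": "user", "content": f"Question: {question}\nContext: {context}\nAnswer: {answer}"},
        ],
    )
    return int(response.choices[0].message.content.strip())
```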
I mean, for me, it feels like, obviously, we're mirroring the real world.
An example of what I'm saying is, right, if I post something on LinkedIn and I make a statement, then in the bubble that I'm living in, I probably get, because I live in a very specific bubble, a lot of positive responses, because the people that I connect to have similar backgrounds.
And once, all of a sudden, I go out of my bubble, then I realize that actually everything that I have learned, and everything where my judges told me that everything is fine, might not be fine outside of my context.
Yeah.
And so that means if we're building systems and we're testing them with other LLMs and other agents, and they're all trained on the same limited, scoped information, or bubble information, then we're just mimicking exactly what's happening in the world, right, with humans.
Yeah, exactly.
And, you know, sometimes when we talk to our customers, they even ask us... like, we tell them, okay, here's the platform, you can build your evals, you can use them, you can train them, but then they get stuck, because they're like, okay, but how do I even know what's a correct answer?
Like, sometimes even the question of what's a good answer versus a bad one is a really complex one to answer in the world of LLMs, right?
It's like, imagine you have a support chatbot.
I don't know, you go to united.com; I don't know if they have something like this, but let's say they have an AI assistant that helps you book flights or something.
Then you go there and you ask it to book you a flight, like, I want to fly from, I don't know, Vienna to New York.
Okay, great.
And then one agent asks you, okay, when do you want to get back to Vienna?
And another agent, instead of asking you this question, just gives you, hey, here's your flight, here's a button to book it.
Which one is the right answer? Which one is the wrong one?
Like, are both just possible right answers?
Like, how do you even know which one is good and which one is bad?
If you got the first one in one version of your app and the other one in another version, is your app getting better or worse?
Like, what, you know, it's kind of...
Yeah, but the analogy is interesting, because it can happen that you call whatever support hotline today, and tomorrow you call again and you get a different agent on the line, a real human, and that real human might just be in training or a complete expert, right, and follows a different path here.
Yeah, exactly.
On the United example, sorry, Brian, just on the United example: I don't know if it's United or some other U.S. airline, but I remember they have a new ad campaign where they say, we actually have real people behind the phone line and you're not being treated by an AI.
I have something contrary: I would love to get treated by AI.
I think, like, AI is easier to work with, because if you have a specific task, sometimes it can just, you know, help you more easily than some agent that will be slower because it, like, works on multiple conversations.
I think it depends on the AI they're using, if it's just the chatbot that we've been used to for years and years and years, which are pretty awful, right?
It's funny, I actually had an issue with a package from Amazon yesterday; it came empty.
And so I finally got on with the chat thing to try to get it resolved.
And I was trying to tell it, like, I got the package, but there was nothing inside the package, and there was a hole in the envelope, you know.
And it was like, all right, don't worry about sending it back, we understand that you didn't order that, so we'll refund you.
I'm like, okay, well, that's not the real resolution, but you're saying you'll refund me and I don't have to send it back.
So it's the outcome, but not really, because, like, the words don't match up to what it was.
I actually still have to see if that went through properly.
But this also reminds me of another thing I saw recently, especially with LLMs, similar to what you're talking about, right?
If you're saying, like, let's say you're dealing with a human, right? Humans often are going to have a script and stuff they do and don't do, right?
And the more they know, the wider their breadth of knowledge and experience within that thing, the more they know how off-script they can go and what things they can pull in.
And you might get a better result, right?
But a newer person is going to stick to the script and follow that script.
And, you know, just like with my Amazon experience, it's, yeah, I might get the outcome I want, but it's going to be like, okay, that wasn't really true, but sure, right?
In the end, with LLMs, I was seeing how, especially with search engines, right, if they're training on data on the internet, and then all the data on the internet is created by AI, and that data becomes homogenous, meaning all the same, then the LLM continues to learn on that same data and the variation dies, right?
So when you talk about these philosophical problems with AI, this is one of those ones as well that links into, I think, some of this, where it's like, all right, how do we keep the data set rich, where it's not, like, AI feeding AI the same stuff that it's been refining and refining until there's just one path and no other alternative component to it, right?
And this is, I guess, not really for here, but it was just a really interesting tie-in to a lot of this kind of stuff.
It reminds me of a paper I read, I think a while ago, about the fact that they ran some tests around training, like how LLMs are trained.
So today, you know, when GPT, let's say 3, was trained, most of, 99% of, the internet was human-generated content.
But now, I don't know what the percentage is, but it's much less, I would say less than 90; I might argue that it slowly becomes like 50 or 60% human-generated, and the rest is AI-generated.
You know, imagine LinkedIn posts, blog posts, everything today; everyone is just using AI.
And so they ran this test of, like, okay, we begin with an LLM that was trained only on human-generated data, and then we use the LLM to generate AI data, and then we train the LLM only on the AI-generated, like the synthetic AI-generated, data.
Then we repeat this process, and after around, I think, five iterations, they actually saw a huge degradation in performance when, you know, the AI was only trained on AI data, AI synthetic data.
Right, right.
One other thing I do want to bring up, though, because this is going back to the beginning of this conversation: you were talking about AI agents, right?
And I had another thought, and I don't know if it ties in directly to the same concept of an AI agent, but when we talk about the possibilities of AI and all this, Andy, this I think is especially interesting for us, or actually even for OpenLLMetry.
And I'll say, if you're having trouble with OpenLLMetry, think of LL Cool J, one of the original hip-hop artists. LL Cool J, OpenLLMetry. So there you go.
But I was just imagining an AI-based agent where, when a slowdown occurs, let's say, in your code, the agent can automatically tune up instrumentation in the area of the slowdown to get more depth and more specific instrumentation, and then turn it off when it's not needed, right?
So, you know, it's like adding manual instrumentation, but doing it in the spot that was identified, doing it automatically so that it's not always on, and then turning it off.
And that to me is just a fascinating concept.
I don't know if anything's anywhere near that kind of level of thing, if it's even come up as a thought to start exploring.
I can imagine it would be very difficult and risky in the beginning, but, you know, just the possibilities of all this stuff are quite insane now.
I would argue that you don't need an AI and an agent for everything.
This use case, for me, more screams like: you build in the telemetry already, and then you turn it on selectively, like a log level from info to debug, when you need it, right? Or like the live debugging capabilities that we have; you would then just turn it on, because the instrumentation already has to be there, but you may not capture it actively because you don't need it always.
So I think it's a great use case, but I think what's important is that we also understand where we really need the power of an AI, with the complexity and the cost of an AI, and where we can solve a certain problem maybe in a much more straightforward, simple automation way.
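A minimal sketch of that idea, with verbose capture gated behind a flag you flip like a log level; the environment variable and attribute names here are hypothetical, not an existing switch in any particular SDK:

```python
import os
from opentelemetry import trace

tracer = trace.get_tracer("demo.llm.app")
# Hypothetical switch: flip it like raising a log level from info to debug.
CAPTURE_CONTENT = os.getenv("LLM_TRACE_CONTENT", "false").lower() == "true"

def traced_llm_call(call_fn, prompt: str) -> str:
    with tracer.start_as_current_span("llm.call") as span:
        response_text = call_fn(prompt)
        # Always-on, cheap telemetry.
        span.set_attribute("llm.prompt.length", len(prompt))
        span.set_attribute("llm.response.length", len(response_text))
        # Debug-level detail, captured only when explicitly switched on.
        if CAPTURE_CONTENT:
            span.set_attribute("llm.prompt.content", prompt)
            span.set_attribute("llm.response.content", response_text)
        return response_text
```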
You just pooped on my idea, but that's okay.
No, I just gave you a different perspective.
But you know, I live in my bubble.
Yours makes more sense.
Hey, Nir, I have another question for you, because you brought this up in the preparation of this call.
There's some new exciting news, some stuff that you have open sourced, something about a hub that you are proud of and that you should talk about.
Yeah, we've been working on a new exciting open source project.
We call it the Traceloop Hub.
It's an LLM gateway.
The idea is that, when we were talking to a lot of companies: okay, you want to instrument your code, right? You want to see what's going on with your LLMs, but imagine you're working at a huge company.
So it's not just a single service that's using an LLM; you have many, many different services, and you want to instrument them all.
So one way is, okay, to go to each one and install your SDK.
But another option would be to take our hub, our LLM gateway, and deploy it once in your system and just route your entire LLM traffic through that gateway.
And so you're getting the benefit of great traceability.
You will see everything that's going in and out of the LLM, which is great for audits and whatever you need to do in order to actually see what you are using your LLM for.
And because it's an LLM gateway, you can also use it for load balancing, switching between models, kind of getting a unified API for all of the LLM models you're using.
But, you know, it's a single point of failure.
So, as an engineer, I was like, I don't know, I'm not sure if I want to do it.
So we decided to build it in Rust, so it will be, you know, super low latency, super reliable, and really small footprint: no garbage collector, nothing.
Rust is a great language, and building stuff in Rust gets you super reliable services, and this is what we've done with the Traceloop Hub.
And we've been working on extending it for the last couple of months, and hopefully we'll also release it to the public.
It's already available, but we haven't, like, announced it, you know.
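A minimal sketch of what routing through such a gateway can look like from an application's point of view, assuming the hub exposes an OpenAI-compatible endpoint; the base URL and path are hypothetical, so check the Traceloop Hub docs for the real configuration:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://llm-gateway.internal:3000/api/v1",  # hypothetical hub address
    api_key="unused-if-the-gateway-holds-provider-credentials",  # placeholder
)

# The gateway can trace, load-balance, or re-route this request per its config.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from behind the gateway!"}],
)
print(response.choices[0].message.content)
```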
Now I've also got to ask a critical question, a similar response to what I had to Brian's idea.
But I remember at the last KubeCon there were several other vendors that basically talked about AI gateways, LLM gateways.
Isn't it essentially the same idea? I mean, aren't there already products out there?
And why not partner with them and instrument their products instead of coming up with your own?
So we've done that.
We haven't invented the concept of an AI gateway, of course, and we've partnered with a lot of other AI gateways, and they are also supporting OpenTelemetry.
The problem, and I don't want to mention anyone by name, but from our experimentation, our testing, they were not reliable and slow.
And so when we were talking to our customers and we recommended those solutions, for some of them it was like: it didn't work well for us.
And so we wanted to build something that we can feel comfortable promoting to our customers, where we have the guarantee that we know this is working, this is reliable, this is not affecting your system's latency.
And nobody has done anything like that in Rust, and we figured, okay, this is the way to go.
Yeah, it also makes... I mean, to end on a positive note here, right? I guess if I look at the other vendors that I know that have built something like this, they came from somewhere else; they came from an API gateway background, and obviously they optimized all of their routing and all that stuff based on different use cases, and then they added the LLM use case on top.
You, on the other hand, have a lot of experience with exactly that type of traffic, and so you could build a custom, purpose-built solution to solve exactly that problem in a much more efficient way.
And there are things you can put in an LLM gateway which you cannot put in, like, a normal API gateway.
For example, guardrails.
You want to block certain responses from the LLM, or not send certain things out to the LLM.
It's really LLM-specific.
Yeah.
And then I do have one more technical question, though, on this one.
That means, in an organization, if I would use your hub, I would obviously only route that type of traffic to you, and I would route all the regular traffic still to my existing API gateway.
So it becomes a component that sits, I guess, either behind or to the side of your regular API gateway.
Yeah, exactly.
Yeah, it's completely open source.
We love building stuff that is Apache 2, so it's Apache 2 and it's open source.
My dream has always been to build open source projects, and so I'm so lucky that I get to build, you know, OpenLLMetry and now another one.
Yeah.
And obviously your success shows that the whole open source approach works, and especially, as you said in the beginning, you did the tough, dirty, quote-unquote boring work that nobody wanted to do, but everybody now benefits.
This led to the fact that you became the de facto standard, because everybody was using you, because it just worked out of the box, until you reached a point where people actively came to you and implemented instrumentation based on your framework.
And everything is open source and, yeah.
Exactly.
Now, how do you make money?
Because in the end, you need to pay your bills.
Should I?
Oh, okay.
I don't know.
A little bit.
It's interesting because, I mean, this podcast, you know, is all about thought leadership, and we don't want to use it to promote any commercial products too much.
But on the other side, I think it's interesting, right, because we live in a world that tries to figure out how both worlds, open source and commercial, can coexist in a way that makes sense, right?
So what's your angle?
So, yeah, of course we have our paid platform, and we have a lot of customers using the platform and paying us money for using it.
And the idea is that, what we see is, you know, you use OpenLLMetry for what I like calling visibility.
So you're just seeing what's going on in your system.
But once you hit production, you start reaching this scale where it doesn't make sense, where seeing is not enough.
Because, you know, in your development environment, you can just go to this trace and just view the prompts and completions.
But if you have millions of users, hopefully you have millions of users, then you get these billions of traces, and how do you make sense of that?
How do you know which ones you should look into, and which one is interesting, which one contains an interesting conversation of a user that you should debug or something?
And when we started the open source, and we started building also the platform, which just visualizes the stuff we had in the open source, which is like the V1, we saw that a lot of the users we had back then, when they hit production, were just manually, you know, clicking on traces, because they were really curious to see what users are doing with their new shiny LLM product, but they didn't know which one.
They had, like, millions of traces: click here, oh, interesting, click this.
And I'm like, okay, maybe I can build something that will point you in the right direction, like another layer of insights that will help you understand and figure out, from the mess: okay, these are the traces that are interesting, these ones failed miserably, this one had errors, you know, you should look into these and this.
And this is kind of how the, you know, big paid platform came to life, and this is the first thing that we offered: real-time insights and monitoring for your LLM applications, pointing you to the right traces.
And now we've also rolled out kind of the complementary parts, the evals, like the offline evaluation feature.
So that means you are doing pattern detection; you're detecting certain areas, hotspots, and, as you said, nobody can look into millions of conversations; you want to understand if certain things are changing based on patterns, maybe there's a new topic that you never thought about that people ask about, or there's a certain area where there are more failures.
Yeah, exactly.
So it sounds like, overall, people jump into this, you know, world of AI, doing things, they collect a bunch of data, and then they say, okay, now what?
And you're the "now what": like, what do we do with this, how do we analyze this, how do we understand it, how do we make sense of it?
I think that's important, right, because all this stuff is hard enough for people to keep on top of, right?
And, you know, we've seen in our own models, too, that sometimes offering these services to help analyze or to help do this stuff is very important for people, because they're like, I have a million things to do; can you use your expertise and platform to do that?
Yeah, it makes a lot of sense.
We even saw this way back, you know, going back to the idea of open source.
Way back when Linux came out, right, and this company Red Hat launched, everyone was like, well, if it's a free operating system, how is there a company around it?
It was all more about the support, the setup, the maintenance, and everything else around it.
So I think we see these models a lot with open source projects.
Right.
Right.
Yeah.
I got to ask you one tough question, though, on this.
Uh-oh.
Because in the end, you are becoming an observability backend.
Yeah, no, sorry, I assume I know the answer you'll give me, but I still want to ask it.
So what you're explaining to me is that you are becoming an observability backend, because you're analyzing all these traces and logs and metrics that are coming in.
Obviously, you have a lot of expertise in analyzing exactly this type of metadata that is part of your semantic convention.
But if I am an enterprise and I collect my traces, and I already have my existing observability platform, do I make a decision to send it both to you and to them?
Or do I route the specific traces from the LLM apps just to your system, but then potentially lose some of the other capabilities my observability platform gives me?
It feels like there's a lot of overlap.
That's a great question.
And I don't want people to use two different observability platforms.
So I think you should see all your complete traces in a single place, like your APM or whatever observability platform you're already using for your cloud environment.
And what I see that we do is that we connect to the same stream, and we augment it, and we give you more information based on our expertise in understanding, you know, LLMs and the responses and everything.
And then we can, you know, route the insights back to your main observability platform.
So you can have, like, a single pane of glass and a single dashboard with all the information: latency for databases, but also quality for agents.
And then, you know, the other part is Traceloop as a development platform.
So once you want to make fixes and make changes and improve your application, then we use the same stream of data to generate those test sets and datasets you can use for running evals and improving your application, which is kind of unrelated to pure observability.
Yeah, yeah.
But I like your initial response, where you basically say you provide the expertise to augment the data with your findings, so that, if you decide to do so, you could still visualize this in a Grafana dashboard or in a Dynatrace, Datadog, or New Relic dashboard, right?
But basically, you've enriched it with insights.
Yeah, that's cool.
Yeah, and we try to make our kind of metrics and everything standard.
So you can export our metrics in, like, you know, PromQL or OpenTelemetry, and then you can just connect it today to Grafana and Dynatrace, whatever you want.
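For instance, a plain-OpenTelemetry sketch of pointing the span stream at whatever OTLP-compatible backend you already run; endpoint and headers are placeholders, and how exactly the Traceloop/OpenLLMetry SDK wires its own exporter may differ:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(
    endpoint="https://otel-collector.example.com:4318/v1/traces",  # placeholder
    headers={"Authorization": "Bearer <token>"},                   # placeholder
)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
# Any instrumentation that uses the global tracer provider now exports
# its spans to the configured backend (Grafana, Dynatrace, ...).
```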
Yeah, cool.
We're nearing the end of our recording slot,
but I have one final question for you.
It shouldn't be too tough,
but it should be hopefully very helpful
for all of our listeners.
If I am an engineer and I'm currently starting
or I'm responsible for an AI project,
what are the two, three things
that people need to watch out for
Or, to ask the question differently,
what are the top problems that you see?
What are the top mistakes
that you see in most of the data that you've analyzed?
So, the top mistakes, interesting.
I want to look at it from a positive perspective, like what you should do.
Maybe then I can also go and think about what you shouldn't do.
So, what you should do is: A, you need to think about your eval and monitoring solution early.
Some people are kind of like, let's just put something out there and we'll see.
And I think it's the wrong way to look at it, because once you hit production, if you don't have those kinds of tools in place, you have no idea, you know, why your users are not using your shiny GenAI feature.
What's going on? Is it working or not? How do I make changes to the model? Like, you know, OpenAI deprecates your model, you need to upgrade, what do you do?
And so thinking about your eval story, thinking about, you know, how you measure the quality of your application, is super important.
I would almost try to think about it like TDD, you know, test-driven design: begin with figuring out, okay, how do I test it, what's the goal, what's the outcome that I expect, and build a dataset with examples of the inputs you're expecting, and then work with that, walk alongside that, you know, as you build your application.
And B, which is closely related to that, you need to keep it up to date.
We've been talking to so many teams where, you know, they built their dataset and they never change it; it's like their golden dataset: this is the set of examples that we know people might be using with our app, and that's it.
But you need to keep updating it, and you need to constantly keep getting more and more fresh new examples from production and kind of refresh the dataset that you're using for your evals.
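A minimal sketch of that golden-dataset idea, reusing the hypothetical judge_answer from the earlier sketch; the examples, threshold, and function names are illustrative:

```python
golden_dataset = [
    {"input": "Book me a flight from Vienna to New York next Friday.",
     "expectation": "Asks for or confirms the return date before booking."},
    {"input": "My package arrived empty, what can you do?",
     "expectation": "Offers a refund or replacement without blaming the user."},
]

def run_evals(app_fn, judge_fn, threshold: int = 4) -> bool:
    """Return True only if every golden example scores at or above the threshold."""
    all_passed = True
    for example in golden_dataset:
        answer = app_fn(example["input"])
        score = judge_fn(example["input"], answer, example["expectation"])
        print(f"{example['input'][:40]!r}... -> score {score}")
        if score < threshold:
            all_passed = False
    return all_passed

# Typical use: gate a deployment on the result, and keep appending fresh
# production examples to golden_dataset over time.
# if not run_evals(my_agent, judge_answer):
#     raise SystemExit("Evals failed - do not ship this change.")
```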
Cool, awesome. I'm taking notes, because I also want to make sure that these things make it into our show notes, including obviously the links to Traceloop; I found the link to the Traceloop Hub.
Any other links? I mean, obviously we will link to your LinkedIn. If there are any additional links, any good getting-started tutorials, any good videos, just let us know and we'll add them to the show notes as well.
We hope that our listeners will then want to know more
and therefore we want to give them stuff to follow up.
Cool.
I assume this topic will not go away anytime soon.
It's probably going to grow more and more.
Brian, can we do the math: 42 episodes from now, what's the episode number when we need to ask him back for the next update?
I should be able to do it in my head,
but I can't, so
I will do it.
Looks like 283.
283.
So Nir, mark your calendar.
42 episodes is probably about eight months, ten months or so out.
We may have you back for another update, because this is an exciting and ever-changing topic.
And I think it's just, as we see all these numbers going through the roof in terms of people that want to, that are implementing projects, looking at organizations, companies like yours, that are getting great funding, congratulations by the way on the latest funding round, it's really cool.
And yeah, we are here to spread the word and hopefully give people something new to think about, and new insights, so that they can become more successful with their AI projects.
Maybe we can make predictions.
Eight months from now is really, like, June 2026.
So, you know, the question is: how many million downloads will you have?
You mean how many downloads? Billions. Billions.
But, like, I don't know, GPT-5, will it be out? GPT-6?
Like, what will be... what's the... AGI, will we reach AGI by June 2026?
Yeah.
What's AGI?
I don't know, artificial... like, this is what, you know, Sam Altman has been talking about all the time.
It's like, okay, we want to reach AGI, which is artificial general intelligence, like the AI that can do everything, potentially, yeah, potentially for all of us, I don't know.
That, or, you know, will there be a new model in play?
You know, we keep talking about GPT and all that.
We've seen from history all the time, like, you know, a promising one in the beginning suddenly gets left in the dust by something else, right?
Or, and hopefully not, I don't want to sound pessimistic, something dramatic happens and for whatever reason we have to completely stop everything.
Back to paper and pencil?
Back to paper and pencil, yeah.
Interesting.
I think it's a no for AGI by 2026, by the way.
Yeah.
But yeah.
Cool.
Now, thank you so much.
It's really exciting, because most of the conversations that we have these days are somehow always at least touching this topic, and it's great to know that people like you out there are willing to share insights and build great tools. Yeah, thank you so much.
Yeah, I think it's also great, too. From way back, I always had this banner I would carry but not do much about, that, like, antivirus should be free for all computer users, right? Like, the idea of making the experience easier and less stressful. And that's exactly what you're doing.
As you even said, you did all the dirty work that everybody else didn't want to do, you and your team, right?
And as a result, there's all this efficiency coming out of it, there's all this telemetry coming out of it, which is making the adoption better, more effective, more useful.
And it's just helping push this along in one part of the trajectory, but a very, very important one.
And if it wasn't for people like you and the project and the contributors contributing... see, I can't say contributing; we'll have to say OpenLLMetry at the end.
But if it wasn't for this, right, and you making it open source, right? It's the beauty we go back to, the beautiful side of the tech industry: let's share knowledge, let's put it out there, let's make it all better for everybody, right?
So I really, really appreciate what you're doing there, and I'm sure everybody else does.
So Andy, before we go, can you say it?
Actually, I was just writing it down as an opener.
Have I finally figured out how to correctly pronounce OpenLLMetry?
Hey, there you go.
This is good, yeah, yeah.
Will LL Cool J still be alive at the next episode?
That's another question.
All right, really appreciate the time, Nir.
Always a pleasure.
Great being here.
Hope to talk to you soon.
Thank you, everyone, for listening.
Bye-bye. Thank you. Bye.