PurePerformance - State of AI Observability with OpenLLMetry: The Best is Yet to Come with Nir Gazit
Episode Date: September 1, 2025
Most AI projects still fail, are too costly, or don't provide the value they hoped to gain. The root cause is nothing new: it's non-optimized models or code that runs the logic behind your AI apps. The solution is also not new: tuning the system based on insights from observability!
To learn more about the state of AI Observability, we invited back Nir Gazit, CEO and Co-Founder of Traceloop, the company behind OpenLLMetry, the open source observability standard that is seeing exponential adoption growth!
Tune in and learn how OpenLLMetry became such a successful open source project, which problems it solves, and what we can learn from other AI project implementations that successfully launched their AI apps and agents.
Links we discussed:
Nir's LinkedIn: https://www.linkedin.com/in/nirga/
OpenLLMetry: https://github.com/traceloop/openllmetry
Traceloop Hub LLM Gateway: https://www.traceloop.com/docs/hub
Transcript
It's time for Pure Performance.
Get your stopwatches ready.
It's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello, everybody, and welcome to another episode of Pure Performance.
My name is Brian Wilson.
And with me, as always, I have my co-host, Andy Grabner.
Andy, it's good to be back.
I almost didn't make today's recording.
Yeah, thank you so much.
Yeah, I know.
I heard.
Why did you go back to COVID?
I thought we were past that stage of our lives.
It's the hip thing to do now.
Now it's like ironic COVID, which is probably very upsetting to joke about for anybody who lost anybody.
So I don't mean to make light of it.
Sorry, everybody.
But, yeah, no, it's just been a whirlwind of events, you know.
My daughter's home because we can't have any caretaker, so my wife and I are juggling work and taking care of her, and yeah, it's been a lot of fun.
But what's more fun, Andy,
is that we had our guest on previously, and, you know, in the nerd realm I think we all live in, there's a special number, number 42, and the last time this guest was on was 42 episodes ago.
So it's awesome to have a repeat at that level.
So I'm very excited for the episode.
But I'll let you take over, because you can probably figure out a transition from this to our topic and our guest for today.
Well, if I remember, it had something to do with a science fiction novel.
And sometimes AI and artificial intelligence, for many, sounds like and is like science fiction, right?
It's like something that is awesome that you want, but we cannot explain it.
And so without further ado, we are talking about AI, the topic that is on everybody's mind.
And just by putting AI in the title of our podcast, I assume this is going to have a lot of hits, a lot of people that are listening in, but they are probably wondering who our guest is.
And therefore, I want to pass it over to Nir.
Thank you so much for being here.
Maybe you want to introduce yourself for those that didn't listen to the episode 42 episodes before this one,
so they can go back to, what was it, 299, right?
199.
So they can go to episode 199 and listen to it again.
Yeah, so I'm...
By the way, 42 is from The Hitchhiker's Guide to the Galaxy.
It's like the answer to everything.
Yeah, and it was created by... it wasn't DeepMind?
No, no, no.
It was Deep Thought.
Deep Thought was the computer on the ship, but the Earth was a supercomputer to figure out the meaning of the universe.
So it ties into the AI very well, Andy.
So you are here.
So who are you?
So I'm Nir, I'm the CEO of Traceloop, and we're building an LLM / GenAI engineering platform for evals, monitoring, and everything you need to get your app up and running in production.
I remember last time when we talked, and I just have the episode in front of me, it was called OpenLLMetry, and we had, I think, a tongue-breaker, because we tried many times to figure out how to correctly pronounce it.
I think it was just too...
You still messed it up.
I know. Open-L-L-Metry.
OpenLLMetry.
So for those that have not followed what you're doing and what OpenLLMetry is all about, can you quickly teach me how to correctly pronounce it and just explain what problems it solves?
So it's OpenLLMetry.
And it all started in the summer of 2023, when we were playing around with OpenAI.
I think GPT-3 had come out a couple of months before, and GPT-4 was right around the corner.
And we were looking for ways to kind of understand better what's going on with the LLM, like we were running these complex kinds of workflows and agents, and it was kind of difficult to understand what's going on.
And so we were looking for a way to just see, you know, observe what's coming in and out of the LLM.
And we knew about OpenTelemetry, which is a huge open source project from the CNCF.
And we thought, okay, it would be cool to just take OpenTelemetry and extend it to support LLMs.
And it was like the hot new thing.
And so we started building OpenLLMetry, which is basically a set of extensions to OpenTelemetry to support GenAI.
Back then it was LLMs, now it's GenAI, because we have vision models and we have image generation models and audio and voice and whatnot.
Yeah, so that's that.
I think I remember when the last time we spoke was, but, you know, since then the project has grown.
We've built more and more kinds of adapters to many different LLMs, and it's now kind of slowly becoming also part of the bigger OpenTelemetry standard and project.
I was just going to say quickly, in terms of adoption, not trying to plug our product here in any way, but as we're learning and coming up to speed on the sales engineering side on our capabilities of monitoring LLMs and generally AI, it was told to me that's like, oh yeah, it's because all these products now, chat and all the others, have this stuff baked in.
So a lot of it's already pre-instrumented for us, and we just have to collect the data, because there's always the big question, well, how are you going to monitor those systems?
And what I think is really fascinating is this was always the promise with OpenTelemetry in the first place, that vendors were going to bake in telemetry code to their systems so that you wouldn't have to go do it manually, and it really seems to be being embraced by the companies that are using OpenLLMetry.
Yes. So awesome to see it, like, in real life, just taking off.
Yeah, we've been focusing, we've been working closely with a lot of other vendors as well.
Like, we're working closely with Vercel. I think one of the coolest things is that if you use Vercel, for example, for just using AI in TypeScript, then you have OpenTelemetry baked in and it's using our standards, so then, you know, it will just work.
You don't need to do anything, and we didn't need to do anything.
So it's pretty cool.
Yeah.
One quick technical question.
I mean, you partially just answered it, but if I am building an app that is interacting with an LLM, am I implementing it myself? Am I using OpenLLMetry to then instrument my calls to the LLM?
Or is it, as Brian alluded, that chances are very high that whatever LLM I'm using is already pre-instrumented and I don't need to do anything else?
What's the current state?
Yeah, you don't need to do anything else.
It's instrumenting all those kinds of LLM SDKs automatically, and for the last two years our battle has been just keeping up with technology, because, you know, OpenAI, they come up with the Responses API, okay, we need to support that.
Then function calling, structured outputs, vision models, and just keeping up with those ever-changing APIs so that people who use the open source just get this kind of magic: you install the SDK, you don't do anything, and, well, you get a full trace.
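To make that concrete: a minimal sketch in Python of the "install the SDK and you get a full trace" flow, using the OpenLLMetry/Traceloop SDK. The app name and model are illustrative, and it assumes the usual API keys are configured in the environment.

```python
# Minimal sketch: initialize the OpenLLMetry SDK once, then make a normal
# OpenAI call. The SDK auto-instruments supported LLM client libraries, so
# the call below shows up as a trace without any further code changes.
from openai import OpenAI
from traceloop.sdk import Traceloop

Traceloop.init(app_name="flight-assistant")  # app name is illustrative

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Automatically traced: model, prompt, completion, and token usage end up
# as span attributes following the GenAI semantic conventions.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(response.choices[0].message.content)
```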
So basically you walked the hard path that many of the APM vendors walked years ago, when we all implemented agents.
And every time a new Java version or runtime came out, or new framework versions, we had to make sure that our agents are instrumenting the latest and greatest, and that we don't break things, and that we capture the right things.
And you basically did the same thing for all of these libraries and SDKs that developers are now using to interact with LLMs.
And I guess because you have achieved such great success and adoption, now more and more people are just taking the work off your shoulders and they just instrument their stuff directly.
I mean, that's also perfect.
Yeah. I mean, Andy,
if you go back again to, you know, as I said
before, the promise of open telemetry way back
was this, right?
And obviously, everything
moves so much faster in the AI
world, including this adoption, which is
just, you know, really, really cool to see
because I still see, with regular stuff, some people use OpenTelemetry, some people don't, but most of the vendors haven't gotten on board with baking things in so it's easy for people.
But it's like, I don't know.
To me, it's just fascinating.
Like, it's a beautiful thing, seeing that.
It's like, bam, it's on.
And, you know, even if you use a vendor like us, if we could get it in there, whatever technical challenges there may or may not be, we would probably be picking up the basic stuff, right, in terms of the instrumentation.
Entry point, entry point.
There may be a couple of key things because that's, you know, all you really know of.
But the idea here is that these vendors know the important deep bits of code that you need
to be exposed to.
And they pre-instrument it.
So you're getting a custom built, and I'm saying this, I guess, more for our listeners, because I guess we all know this stuff, right?
But you're getting a very specific set of instrumentation as deemed important by the vendor who knows the code.
Right.
So then all you have to do is ingest that into whatever your choice is, hopefully us, but whatever.
And you have that deep set of data, which is fantastic.
And it's all because of you and your team and what you've been working on.
Can I brag?
Can I brag about the stats of the...?
I don't know if we talked about the open source stats last time, but it's been growing pretty quickly since then.
So can I brag?
Yeah, of course.
Talk about it.
Yeah, yeah.
So I was just pulling, like, the latest numbers from GitHub.
So as of today, we have, well, we've just crossed three million downloads a month.
We were at like one million just, I don't know, a month ago,
and we have more than 83 contributors and 6.2K stars.
And I think even our competitors in the AI monitoring space, even they are using OpenLLMetry to instrument.
Like, they tell their customers, hey, you should use OpenLLMetry to instrument your applications.
This is like my personal biggest achievement, you know.
It's like, okay, this is something real, like people actually look at this and say, hey, this is good, I should use it.
I know these are my competitors, but I should use it, they've done a good job there.
Now, why do you think that is?
Why do you think you are so successful?
I think the truth is, writing those instrumentations is a lot of, like, dirty work.
And keeping up, making sure that you instrument everything and it works well with every kind of new version of any vendor LLM SDK, is a lot of really, really boring, hard work that we've done in the past two years.
And people don't want to do it.
Like, if someone has already done it and it works, and, you know, you put in tens of different LLMs and you get out a standard set of spans and semantic conventions and metrics, people like it.
You know, we save you a lot of, like, annoying, boring coding work.
And also one last technical question.
Did you then basically contribute back to all of those SDKs? That means, did you open up pull requests to all of these SDKs on their Git repositories and basically change and instrument their code correctly?
Or did you build some type of auto-instrumentation that instruments the SDK, and you provide a tool?
What's the approach?
It's auto-instrumentation.
We try to work with some of the vendors, but sometimes it's difficult.
Some of the code of the SDKs is auto-generated, so it's kind of more complex to instrument it within those SDKs.
So right now, yeah, all of these instrumentations are auto-instrumented from the outside, and this is what we manage in our repository.
But we have some vendors that decided that this is how they want to instrument their SDK.
So, for example, IBM is maintaining their own kind of watsonx instrumentation within OpenLLMetry, and also many vector databases are maintaining their own instrumentations in our repo.
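As a rough illustration of what "auto-instrumented from the outside" means in general (not the actual OpenLLMetry code): a wrapper patches a vendor SDK method so every call emits an OpenTelemetry span. The attribute names below are borrowed from the GenAI semantic conventions, but treat the details as a sketch.

```python
# Illustrative only: wrap an OpenAI client's chat completion method so each
# call produces a span, without touching the vendor's own source code.
from opentelemetry import trace

tracer = trace.get_tracer("demo.llm.instrumentation")

def instrument_chat_completions(openai_client):
    original_create = openai_client.chat.completions.create

    def traced_create(*args, **kwargs):
        with tracer.start_as_current_span("openai.chat.completions.create") as span:
            span.set_attribute("gen_ai.request.model", kwargs.get("model", "unknown"))
            response = original_create(*args, **kwargs)
            usage = getattr(response, "usage", None)
            if usage is not None:
                span.set_attribute("gen_ai.usage.input_tokens", usage.prompt_tokens)
                span.set_attribute("gen_ai.usage.output_tokens", usage.completion_tokens)
            return response

    openai_client.chat.completions.create = traced_create
    return openai_client
```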
Now, the next one is a tough but important question.
What is the data that I'm actually getting if I use OpenLLMetry? What's the data, and what problem does it solve?
So the data is, I want to say, everything that you need: prompts, completions, token usage, function calling. If you're using a vision model, you get the image. If you're using a vector database, you can see, you know, the query, you can see the response, you can see the scores.
We basically instrument everything that we can in the request and the response, and then we also define the standard way, like the names of the semantic conventions, that we should use for those.
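For a rough idea of what such a span can carry (the names below loosely follow the OpenTelemetry GenAI semantic conventions; the exact set, and whether content is captured at all, varies by instrumentation, version, and privacy settings):

```python
# Illustrative span attributes for a single chat completion; values are made up.
example_span_attributes = {
    "gen_ai.system": "openai",
    "gen_ai.request.model": "gpt-4o-mini",
    "gen_ai.usage.input_tokens": 42,
    "gen_ai.usage.output_tokens": 128,
    # Prompt/completion content capture is typically opt-in for privacy reasons.
    "gen_ai.prompt.0.role": "user",
    "gen_ai.prompt.0.content": "Find me a flight from Vienna to New York.",
    "gen_ai.completion.0.role": "assistant",
    "gen_ai.completion.0.content": "Sure - when would you like to depart?",
}
```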
What you're getting is visibility.
Let's say, I think in 2024 people were talking about agents, like AI agents; in 2025 people are actually building AI agents and pushing them to production.
And so when you push those to production, you need a way to understand what's going on.
You have an agent, it's, like, working for two minutes doing something, answering a question.
You have no idea which tools are running, how much time each tool call takes.
What's going on?
If there's a failure, what failed, how can I re-run it and figure out what went wrong?
So all of this information is super important for people.
And it's especially important if you're using any frameworks like LangGraph or Mastra or CrewAI or many, many others.
They kind of obfuscate a lot of the data, or a lot of the information.
They encapsulate the prompts that they're sending to the LLM and the exact structure of what's going on internally during a run.
And as an AI engineer, as a data scientist, it's super important to see this information, like, during development.
this information, like during development.
For me, Brian, this again reminds me,
and I think we talked about this in one of the last episodes.
It reminds me of the old Java applications
using Hypernade to access a database.
And actually then with instrumentation,
seeing the actual database queries that get executed
by this magic black box slash framework
that is converting my request as a developer
to the actual database statements, right?
And I think it's in the end
we identified so many inefficiencies
because it was a generic framework
that was create generically
but was never optimized for the particular use cases.
And I think you're explaining something very similar.
We're building very complex systems now
and there's a lot of magic and black box things happening
and so automated instrumentation
into those layers provides you the visibility
so that you can then,
A, understand, why does it take long?
Why is it so costly?
What is happening?
And I think the fourth one, important one, is how can we optimize?
Yes, exactly.
I think the way this AI development lifecycle looks is that, okay, you begin, you know, you just build your first POC version of, let's say, your agent, like your chatbot or something.
And then, when you want to kind of start rolling it out to users, you need to figure out, okay, first, how can I make sure it actually works?
And then B, how do I make changes to it? How do I improve it over time?
And to improve it, you need to understand where it is not working, like where is it failing?
And then, when you figure out, okay, it's failing for this input, so this is what I'm going to have to fix.
Now we need to make sure that it is still working, right?
You had, like, a complete big piece of functionality built in, and then you need to kind of run some tests, which are called evals in the GenAI space, before you push it to production.
So, as I mentioned, okay, you have monitoring to monitor your application in production and to find those use cases where your application fails.
And you take those use cases, you rerun them in your development environment.
You fix the bugs.
And then, before you push it again, you run the evals to make sure that, well, A, you fixed the bug and, B, that you didn't introduce any new bugs.
It really, I think, should remind you of kind of traditional development, right?
You have testing and monitoring.
But there are some nuances around it, because the whole concept of testing an LLM gets more complex: you have arbitrary text coming in and arbitrary text coming out, and so it's much more complex to figure out, okay, what's a right answer? How do I test for a right answer?
And also, if I run the same test five times with the same prompt, I may get five different responses, right?
Yeah, exactly.
Yeah, it's, like, super non-stable.
So it's not like a deterministic test where you call a function that adds two numbers: you send it, you know, four and three and you get seven, right?
It will always be seven.
So how do we validate the accuracy of the data that comes back?
There's a lot of common techniques that people use today.
I think the most common one would be using another LLM, which is called LLM-as-a-judge.
So you take the answer, you take some context and all the trace, all the inputs and outputs that you sent to the LLM, and then you run another LLM to grade your response.
And there's a lot of, I think, techniques around what you should do and what you shouldn't do when you're using another LLM to grade your main LLM.
And it can be tricky, because you can think that you've got a good judge and your evals are working well, but then sometimes, you know, you may push your application to production given that your judge said, yeah, everything is good, go ahead.
You push it to production and you realize, oh, wait, I didn't test for something.
Or my judge told me that everything was fine, but it actually isn't; the judge just wasn't working well.
So there's a lot of moving parts that you need to build and stabilize.
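A minimal LLM-as-a-judge sketch in Python, assuming an OpenAI-compatible client; the judge model, the rubric, and the 1-5 scale are illustrative choices, not a recommendation:

```python
from openai import OpenAI

client = OpenAI()

def judge_answer(question: str, answer: str, context: str) -> int:
    """Ask a second model to grade an answer from 1 (bad) to 5 (good)."""
    rubric = (
        "You are grading a support assistant. Given the question, the retrieved "
        "context and the assistant's answer, return ONLY an integer from 1 to 5, "
        "where 5 means fully correct, grounded in the context, and helpful."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # judge model: pick whatever you trust as a grader
        temperature=0,        # keep the judge as deterministic as possible
        messages=[
            {"role": "system", "content": rubric},
            {"role": "user", "content": f"Question: {question}\nContext: {context}\nAnswer: {answer}"},
        ],
    )
    return int(response.choices[0].message.content.strip())
```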
I mean, for me, it feels like, obviously, we're mirroring the real world.
An example of what I'm saying is, right, if I post something on LinkedIn and I make a statement, then in the bubble that I'm living in, I probably get, because I live in a very specific bubble, a lot of positive responses, because the people that I connect to have similar backgrounds.
And once, all of a sudden, I go out of my bubble, then I realize that actually everything that I have learned, and everything where my judges told me that everything is fine, might not be fine outside of my context.
Yeah.
And so that means if we're building systems and we're testing them with other LLMs and other agents, and they're all trained on the same limited, scoped information, or bubble information, then we're just mimicking exactly what's happening in the world, right, with humans.
Yeah, exactly.
And, you know, sometimes when we talk to our customers, they even ask us... like, we tell them, okay, here's the platform, you can build your evals, you can use them, you can train them, but then they get stuck, because they're like, okay, but how do I even know what's a correct answer?
Like, sometimes even the question of what's a good answer versus a bad one is a really complex one to answer in the world of LLMs, right?
It's like, imagine you have a support chatbot.
I don't know, you go to united.com; I don't know if they have something like this, but let's say they have an AI assistant that helps you book flights or something.
Then you go there and you ask it to book you a flight, like, I want to fly from, I don't know, Vienna to New York.
Okay, great.
And then one agent asks you, okay, when do you want to get back to Vienna?
And another agent, instead of asking you this question, just gives you, hey, here's your flight, here's a button to book it.
Which one is the right answer? Which one is the wrong one?
Like, are both just possible right answers?
Like, how do you even know which one is good and which one is bad?
If you got the first one in one version of your app and the other one in another version, is your app getting better or worse?
Like, what, you know, it's kind of...
Yeah, but the analogy is interesting, because it can happen that you call whatever support hotline today, and tomorrow you call again and you get a different agent on the line, a real human, and that real human might just be in training or a complete expert, right, and follows a different path here.
Yeah, exactly.
On the United example, sorry, Brian, just on the United example: I don't know if it's United or some other U.S. airline, but I remember they have a new ad campaign where they say, we actually have real people behind the phone line and you're not being treated by an AI.
I have something contrary: I would love to get treated by AI.
I think, like, AI is easier to work with, because if you have a specific task, sometimes it can just, you know, help you more easily than some agent that will be slower because it, like, works on multiple conversations.
I think it depends on the AI they're using, if it's just the chatbot that we've been used to for years and years and years, which are pretty awful, right?
It's funny, I actually had an issue with a package from Amazon yesterday; it came empty.
And so I finally got on with the chat thing to try to get it resolved.
And I was trying to tell it, like, I got the package, but there was nothing inside the package, and there was a hole in the envelope, you know.
And it was like, all right, don't worry about sending it back, we understand that you didn't order that, so we'll refund you.
I'm like, okay, well, that's not the real resolution, but you're saying you'll refund me and I don't have to send it back.
So it's the outcome, but not really, because, like, the words don't match up to what it was.
I actually still have to see if that went through properly.
But this also reminds me of another thing I saw recently, especially with LLMs, similar to what you're talking about, right?
If you're saying, like, let's say you're dealing with a human, right? Humans often are going to have a script and stuff they do and don't do, right?
And the more they know, the wider their breadth of knowledge and experience within that thing, the more they know how off-script they can go and what things they can pull in.
And you might get a better result, right?
But a newer person is going to stick to the script and follow that script.
And, you know, just like with my Amazon experience, it's, yeah, I might get the outcome I want, but it's going to be like, okay, that wasn't really true, but sure, right?
In the end, with LLMs, I was seeing how, especially with search engines, right, if they're training on data on the internet, and then all the data on the internet is created by AI, and that data becomes homogenous, meaning all the same, then the LLM continues to learn on that same data and the variation dies, right?
So when you talk about these philosophical problems with AI, this is one of those ones as well that links into, I think, some of this, where it's like, all right, how do we keep the data set rich, where it's not, like, AI feeding AI the same stuff that it's been refining and refining until there's just one path and no other alternative component to it, right?
And this is, I guess, not really for here, but it was just a really interesting tie-in to a lot of this kind of stuff.
It reminds me of a paper I read, I think a while ago, about the fact that they ran some tests around training, like how LLMs are trained.
So today, you know, when GPT, let's say 3, was trained, most of, 99% of, the internet was human-generated content.
But now, I don't know what the percentage is, but it's much less, I would say less than 90; I might argue that it slowly becomes like 50 or 60% human-generated, and the rest is AI-generated.
You know, imagine LinkedIn posts, blog posts, everything today; everyone is just using AI.
And so they ran this test of, like, okay, we begin with an LLM that was trained only on human-generated data, and then we use the LLM to generate AI data, and then we train the LLM only on the AI-generated, like the synthetic AI-generated, data.
Then we repeat this process, and after around, I think, five iterations, they actually saw a huge degradation in performance when, you know, the AI was only trained on AI data, AI synthetic data.
Right, right.
One other thing I do want to bring up, though, because this is going back to the beginning of this conversation: you were talking about AI agents, right?
And I had another thought, and I don't know if it ties in directly to the same concept of an AI agent, but when we talk about the possibilities of AI and all this, Andy, this I think is especially interesting for us, or actually even for OpenLLMetry.
And I'll say, if you're having trouble with OpenLLMetry, think of LL Cool J, one of the original hip-hop artists. LL Cool J, OpenLLMetry. So there you go.
But I was just imagining an AI-based agent where, when a slowdown occurs, let's say, in your code, the agent can automatically tune up instrumentation in the area of the slowdown to get more depth and more specific instrumentation, and then turn it off when it's not needed, right?
So, you know, it's like adding manual instrumentation, but doing it in the spot that was identified, doing it automatically so that it's not always on, and then turning it off.
And that to me is just a fascinating concept.
I don't know if anything's anywhere near that kind of level of thing, if it's even come up as a thought to start exploring.
I can imagine it would be very difficult and risky in the beginning, but, you know, just the possibilities of all this stuff are quite insane now.
I would argue that you don't need an AI and an agent for everything.
This use case, for me, more screams like: you build in the telemetry already, and then you turn it on selectively, like a log level from info to debug, when you need it, right? Or like the live debugging capabilities that we have; you would then just turn it on, because the instrumentation already has to be there, but you may not capture it actively because you don't need it always.
So I think it's a great use case, but I think what's important is that we also understand where we really need the power of an AI, with the complexity and the cost of an AI, and where we can solve a certain problem maybe in a much more straightforward, simple automation way.
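A minimal sketch of that idea, with verbose capture gated behind a flag you flip like a log level; the environment variable and attribute names here are hypothetical, not an existing switch in any particular SDK:

```python
import os
from opentelemetry import trace

tracer = trace.get_tracer("demo.llm.app")
# Hypothetical switch: flip it like raising a log level from info to debug.
CAPTURE_CONTENT = os.getenv("LLM_TRACE_CONTENT", "false").lower() == "true"

def traced_llm_call(call_fn, prompt: str) -> str:
    with tracer.start_as_current_span("llm.call") as span:
        response_text = call_fn(prompt)
        # Always-on, cheap telemetry.
        span.set_attribute("llm.prompt.length", len(prompt))
        span.set_attribute("llm.response.length", len(response_text))
        # Debug-level detail, captured only when explicitly switched on.
        if CAPTURE_CONTENT:
            span.set_attribute("llm.prompt.content", prompt)
            span.set_attribute("llm.response.content", response_text)
        return response_text
```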
You just pooped on my idea, but that's okay.
No, I just gave you a different perspective.
But you know, I live in my bubble.
Yours makes more sense.
Hey, Nir, I have another question for you, because you brought this up in the preparation of this call.
There's some new exciting news, some stuff that you have open sourced, something about a hub that you are proud of and that you should talk about.
Yeah, we've been working on a new exciting open source project.
We call it the Traceloop Hub.
It's an LLM gateway.
The idea is that, when we were talking to a lot of companies: okay, you want to instrument your code, right? You want to see what's going on with your LLMs, but imagine you're working at a huge company.
So it's not just a single service that's using an LLM; you have many, many different services, and you want to instrument them all.
So one way is, okay, to go to each one and install your SDK.
But another option would be to take our hub, our LLM gateway, and deploy it once in your system and just route your entire LLM traffic through that gateway.
And so you're getting the benefit of great traceability.
You will see everything that's going in and out of the LLM, which is great for audits and whatever you need to do in order to actually see what you are using your LLM for.
And because it's an LLM gateway, you can also use it for load balancing, switching between models, kind of getting a unified API for all of the LLM models you're using.
But, you know, it's a single point of failure.
So, as an engineer, I was like, I don't know, I'm not sure if I want to do it.
So we decided to build it in Rust, so it will be, you know, super low latency, super reliable, and really small footprint: no garbage collector, nothing.
Rust is a great language, and building stuff in Rust gets you super reliable services, and this is what we've done with the Traceloop Hub.
And we've been working on extending it for the last couple of months, and hopefully we'll also release it to the public.
It's already available, but we haven't, like, announced it, you know.
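A minimal sketch of what routing through such a gateway can look like from an application's point of view, assuming the hub exposes an OpenAI-compatible endpoint; the base URL and path are hypothetical, so check the Traceloop Hub docs for the real configuration:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://llm-gateway.internal:3000/api/v1",  # hypothetical hub address
    api_key="unused-if-the-gateway-holds-provider-credentials",  # placeholder
)

# The gateway can trace, load-balance, or re-route this request per its config.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from behind the gateway!"}],
)
print(response.choices[0].message.content)
```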
Now I've also got to ask a critical question, a similar response to what I had to Brian's idea.
But I remember at the last KubeCon there were several other vendors that basically talked about AI gateways, LLM gateways.
Isn't it essentially the same idea? I mean, aren't there already products out there?
And why not partner with them and instrument their products instead of coming up with your own?
So we've done that.
We haven't invented the concept of an AI gateway, of course, and we've partnered with a lot of other AI gateways, and they are also supporting OpenTelemetry.
The problem, and I don't want to mention anyone by name, but from our experimentation, our testing, they were not reliable and slow.
And so when we were talking to our customers and we recommended those solutions, for some of them it was like: it didn't work well for us.
And so we wanted to build something that we can feel comfortable promoting to our customers, where we have the guarantee that we know this is working, this is reliable, this is not affecting your system's latency.
And nobody has done anything like that in Rust, and we figured, okay, this is the way to go.
Yeah, it also makes... I mean, to end on a positive note here, right? I guess if I look at the other vendors that I know that have built something like this, they came from somewhere else; they came from an API gateway background, and obviously they optimized all of their routing and all that stuff based on different use cases, and then they added the LLM use case on top.
You, on the other hand, have a lot of experience with exactly that type of traffic, and so you could build a custom, purpose-built solution to solve exactly that problem in a much more efficient way.
And there are things you can put in an LLM gateway which you cannot put in, like, a normal API gateway.
For example, guardrails.
You want to block certain responses from the LLM, or not send certain things out to the LLM.
It's really LLM-specific.
Yeah.
And then I do have one more technical question, though, on this one.
That means, in an organization, if I would use your hub, I would obviously only route that type of traffic to you, and I would route all the regular traffic still to my existing API gateway.
So it becomes a component that sits, I guess, either behind or to the side of your regular API gateway.
Yeah, exactly.
Yeah, it's completely open source.
We love building stuff that is Apache 2, so it's Apache 2 and it's open source.
My dream has always been to build open source projects, and so I'm so lucky that I get to build, you know, OpenLLMetry and now another one.
Yeah.
And obviously your success shows that the whole open source approach works, and especially, as you said in the beginning, you did the tough, dirty, quote-unquote boring work that nobody wanted to do, but everybody now benefits.
This led to the fact that you became the de facto standard, because everybody was using you, because it just worked out of the box, until you reached a point where people actively came to you and implemented instrumentation based on your framework.
And everything is open source and, yeah.
Exactly.
Now, how do you make money?
Because in the end, you need to pay your bills.
Should I?
Oh, okay.
I don't know.
A little bit.
It's interesting because, I mean, this podcast, you know, is all about thought leadership, and we don't want to use it to promote any commercial products too much.
But on the other side, I think it's interesting, right, because we live in a world that tries to figure out how both worlds, open source and commercial, can coexist in a way that makes sense, right?
So what's your angle?
So, yeah, of course we have our paid platform, and we have a lot of customers using the platform and paying us money for using it.
And the idea is that, what we see is, you know, you use OpenLLMetry for what I like calling visibility.
So you're just seeing what's going on in your system.
But once you hit production, you start reaching this scale where it doesn't make sense, where seeing is not enough.
Because, you know, in your development environment, you can just go to this trace and just view the prompts and completions.
But if you have millions of users, hopefully you have millions of users, then you get these billions of traces, and how do you make sense of that?
How do you know which ones you should look into, and which one is interesting, which one contains an interesting conversation of a user that you should debug or something?
And when we started the open source, and we started building also the platform, which just visualizes the stuff we had in the open source, which is like the V1, we saw that a lot of the users we had back then, when they hit production, were just manually, you know, clicking on traces, because they were really curious to see what users are doing with their new shiny LLM product, but they didn't know which one.
They had, like, millions of traces: click here, oh, interesting, click this.
And I'm like, okay, maybe I can build something that will point you in the right direction, like another layer of insights that will help you understand and figure out, from the mess: okay, these are the traces that are interesting, these ones failed miserably, this one had errors, you know, you should look into these and this.
And this is kind of how the, you know, big paid platform came to life, and this is the first thing that we offered: real-time insights and monitoring for your LLM applications, pointing you to the right traces.
And now we've also rolled out kind of the complementary parts, the evals, like the offline evaluation feature.
So that means you are doing pattern detection; you're detecting certain areas, hotspots, and, as you said, nobody can look into millions of conversations; you want to understand if certain things are changing based on patterns, maybe there's a new topic that you never thought about that people ask about, or there's a certain area where there are more failures.
Yeah, exactly.
So it sounds like, overall, people jump into this, you know, world of AI, doing things, they collect a bunch of data, and then they say, okay, now what?
And you're the "now what": like, what do we do with this, how do we analyze this, how do we understand it, how do we make sense of it?
I think that's important, right, because all this stuff is hard enough for people to keep on top of, right?
And, you know, we've seen in our own models, too, that sometimes offering these services to help analyze or to help do this stuff is very important for people, because they're like, I have a million things to do; can you use your expertise and platform to do that?
Yeah, it makes a lot of sense.
We even saw this way back, you know, going back to the idea of open source.
Way back when Linux came out, right, and this company Red Hat launched, everyone was like, well, if it's a free operating system, how is there a company around it?
It was all more about the support, the setup, the maintenance, and everything else around it.
So I think we see these models a lot with open source projects.
Right.
Right.
Yeah.
I got to ask you one tough question, though, on this.
Uh-oh.
Because in the end, you are becoming an observability backend.
Yeah, no, sorry, I assume I know the answer you'll give me, but I still want to ask it.
So what you're explaining to me is that you are becoming an observability backend, because you're analyzing all these traces and logs and metrics that are coming in.
Obviously, you have a lot of expertise in analyzing exactly this type of metadata that is part of your semantic convention.
But if I am an enterprise and I collect my traces, and I already have my existing observability platform, do I make a decision to send it both to you and to them?
Or do I route the specific traces from the LLM apps just to your system, but then potentially lose some of the other capabilities my observability platform gives me?
It feels like there's a lot of overlap.
That's a great question.
And I don't want people to use two different observability platforms.
So I think you should see all your complete traces in a single place, like your APM or whatever observability platform you're already using for your cloud environment.
And what I see that we do is that we connect to the same stream, and we augment it, and we give you more information based on our expertise in understanding, you know, LLMs and the responses and everything.
And then we can, you know, route the insights back to your main observability platform.
So you can have, like, a single pane of glass and a single dashboard with all the information: latency for databases, but also quality for agents.
And then, you know, the other part is Traceloop as a development platform.
So once you want to make fixes and make changes and improve your application, then we use the same stream of data to generate those test sets and datasets you can use for running evals and improving your application, which is kind of unrelated to pure observability.
Yeah, yeah.
But I like your initial response, where you basically say you provide the expertise to augment the data with your findings, so that, if you decide to do so, you could still visualize this in a Grafana dashboard or in a Dynatrace, Datadog, or New Relic dashboard, right?
But basically, you've enriched it with insights.
Yeah, that's cool.
Yeah, and we try to make our kind of metrics and everything standard.
So you can export our metrics in, like, you know, PromQL or OpenTelemetry, and then you can just connect it today to Grafana and Dynatrace, whatever you want.
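For instance, a plain-OpenTelemetry sketch of pointing the span stream at whatever OTLP-compatible backend you already run; endpoint and headers are placeholders, and how exactly the Traceloop/OpenLLMetry SDK wires its own exporter may differ:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(
    endpoint="https://otel-collector.example.com:4318/v1/traces",  # placeholder
    headers={"Authorization": "Bearer <token>"},                   # placeholder
)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
# Any instrumentation that uses the global tracer provider now exports
# its spans to the configured backend (Grafana, Dynatrace, ...).
```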
Yeah, cool.
We're nearing the end of our recording slot,
but I have one final question for you.
It shouldn't be too tough,
but it should be hopefully very helpful
for all of our listeners.
If I am an engineer and I'm currently starting
or I'm responsible for an AI project,
what are the two, three things
that people need to watch out for
Or, to ask the question differently,
what are the top problems that you see?
What are the top mistakes
that you see in most of the data that you've analyzed?
So, the top mistakes, interesting.
I want to look at it from a positive perspective, like what you should do.
Maybe then I can also go and think about what you shouldn't do.
So, what you should do is: A, you need to think about your eval and monitoring solution early.
Some people are kind of like, let's just put something out there and we'll see.
And I think it's the wrong way to look at it, because once you hit production, if you don't have those kinds of tools in place, you have no idea, you know, why your users are not using your shiny GenAI feature.
What's going on? Is it working or not? How do I make changes to the model? Like, you know, OpenAI deprecates your model, you need to upgrade, what do you do?
And so thinking about your eval story, thinking about, you know, how you measure the quality of your application, is super important.
I would almost try to think about it like TDD, you know, test-driven design: begin with figuring out, okay, how do I test it, what's the goal, what's the outcome that I expect, and build a dataset with examples of the inputs you're expecting, and then work with that, walk alongside that, you know, as you build your application.
And B, which is closely related to that, you need to keep it up to date.
We've been talking to so many teams where, you know, they built their dataset and they never change it; it's like their golden dataset: this is the set of examples that we know people might be using with our app, and that's it.
But you need to keep updating it, and you need to constantly keep getting more and more fresh new examples from production and kind of refresh the dataset that you're using for your evals.
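A minimal sketch of that golden-dataset idea, reusing the hypothetical judge_answer from the earlier sketch; the examples, threshold, and function names are illustrative:

```python
golden_dataset = [
    {"input": "Book me a flight from Vienna to New York next Friday.",
     "expectation": "Asks for or confirms the return date before booking."},
    {"input": "My package arrived empty, what can you do?",
     "expectation": "Offers a refund or replacement without blaming the user."},
]

def run_evals(app_fn, judge_fn, threshold: int = 4) -> bool:
    """Return True only if every golden example scores at or above the threshold."""
    all_passed = True
    for example in golden_dataset:
        answer = app_fn(example["input"])
        score = judge_fn(example["input"], answer, example["expectation"])
        print(f"{example['input'][:40]!r}... -> score {score}")
        if score < threshold:
            all_passed = False
    return all_passed

# Typical use: gate a deployment on the result, and keep appending fresh
# production examples to golden_dataset over time.
# if not run_evals(my_agent, judge_answer):
#     raise SystemExit("Evals failed - do not ship this change.")
```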
Cool, awesome. I'm taking notes, because I also want to make sure that these things make it into our show notes, including obviously the links to Traceloop; I found the link to the Traceloop Hub.
Any other links? I mean, obviously we will link to your LinkedIn. If there are any additional links, any good getting-started tutorials, any good videos, just let us know and we'll add them to the show notes as well.
We hope that our listeners will then want to know more
and therefore we want to give them stuff to follow up.
Cool.
I assume this topic will not go away anytime soon.
It's probably going to grow more and more.
Brian, can we do the math: 42 episodes from now, what's the episode number when we need to ask him back for the next update?
I should be able to do it in my head,
but I can't, so
I will do it.
Looks like 283.
283.
So Nir, mark your calendar.
42 episodes is probably about eight months, ten months or so out.
We may have you back for another update, because this is an exciting and ever-changing topic.
And I think it's just, as we see all these numbers going through the roof in terms of people that want to, that are implementing projects, looking at organizations, companies like yours, that are getting great funding, congratulations by the way on the latest funding round, it's really cool.
And yeah, we are here to spread the word and hopefully give people something new to think about, and new insights, so that they can become more successful with their AI projects.
Maybe we can make predictions.
Eight months from now is really, like, June 2026.
So, you know, the question is: how many million downloads will you have?
You mean how many downloads? Billions. Billions.
But, like, I don't know, GPT-5, will it be out? GPT-6?
Like, what will be... what's the... AGI, will we reach AGI by June 2026?
Yeah.
What's AGI?
I don't know, artificial... like, this is what, you know, Sam Altman has been talking about all the time.
It's like, okay, we want to reach AGI, which is artificial general intelligence, like the AI that can do everything, potentially, yeah, potentially for all of us, I don't know.
That, or, you know, will there be a new model in play?
You know, we keep talking about GPT and all that.
We've seen from history all the time, like, you know, a promising one in the beginning suddenly gets left in the dust by something else, right?
Or, and hopefully not, I don't want to sound pessimistic, something dramatic happens and for whatever reason we have to completely stop everything.
Back to paper and pencil?
Back to paper and pencil, yeah.
Interesting.
I think it's a no for AGI by 2026, by the way.
Yeah.
But yeah.
Cool.
Now, thank you so much.
It's really exciting, because most of the conversations that we have these days are somehow always at least touching this topic, and it's great to know that people like you out there are willing to share insights and build great tools. Yeah, thank you so much.
Yeah, I think it's also great, too. From way back, I always had this banner I would carry but not do much about, that, like, antivirus should be free for all computer users, right? Like, the idea of making the experience easier and less stressful. And that's exactly what you're doing.
As you even said, you did all the dirty work that everybody else didn't want to do, you and your team, right?
And as a result, there's all this efficiency coming out of it, there's all this telemetry coming out of it, which is making the adoption better, more effective, more useful.
And it's just helping push this along in one part of the trajectory, but a very, very important one.
And if it wasn't for people like you and the project and the contributors contributing... see, I can't say contributing; we'll have to say OpenLLMetry at the end.
But if it wasn't for this, right, and you making it open source, right? It's the beauty we go back to, the beautiful side of the tech industry: let's share knowledge, let's put it out there, let's make it all better for everybody, right?
So I really, really appreciate what you're doing there, and I'm sure everybody else does.
So Andy, before we go, can you say it?
Actually, I was just writing it down as an opener.
Have I finally figured out how to correctly pronounce OpenLLMetry?
Hey, there you go.
This is good, yeah, yeah.
Will LL Cool J still be alive at the next episode?
That's another question.
All right, really appreciate the time, Nir.
Always a pleasure.
Great being here.
Hope to talk to you soon.
Thank you, everyone, for listening.
Bye-bye. Thank you. Bye.