Tech Brew Ride Home - (Profile) Traceloop
Episode Date: June 9, 2024
Get observability for your LLM application at TraceLoop.com. Learn more about your ad choices. Visit megaphone.fm/adchoices...
Transcript
On April 4th, 2023, around 2 in the morning, a man was found stabbed multiple times on a sidewalk in downtown San Francisco.
Hey, who did this to you?
What happened next turned the story into a political firestorm.
Reports have identified the victim as Bob Lee, the founder of Cash App.
From Bloomberg Podcasts, this is Foundering, the Killing of Bob Lee, beginning April 16.
Welcome to another weekend bonus episode of the TechMeme Ride Home podcast.
I'm Brian McCullough, as always.
This is a portfolio profile episode.
Haven't done one of these in a while.
This was actually one of the first AI investments that I ever made, almost exactly a year ago now, but before the AI fund was stood up.
We're talking to Traceloop and Nir Gazit, the founder of Traceloop.
Nir, thanks for coming on the show.
Thank you.
Great to be here.
Let's start off with just sort of the elevator pitch of what Traceloop does and why I jumped at it when I got introduced to you.
So Traceloop is a platform for monitoring LLM applications in production.
So basically detecting hallucinations, monitoring token usage, and everything
that you want to know about how your LLM is performing in production.
So it's observability for LLMs.
It's like Datadog for LLMs in a way.
But people are familiar with finding bugs and getting notifications of things that aren't working.
What is different about doing that in a deployed LLM environment?
Are you able to not only see
oh, this isn't working or there was a crash or a bug here, but also this output is bad because
the, I don't know, the input was wrong or the weights are wrong. Are you able to get that level of
granularity in a deployed LLM? Yes, and I think the main challenge here is that it's really hard,
if you get like an arbitrary generated output from an LLM, it's really hard to tell what's a good
output versus a bad output. So what people are usually doing is that they're using this method called
LLM as a judge. They're basically taking another LLM and using it to score the results you've gotten
from the first LLM call. But you can't really do it in production, right? Because the costs would be too high.
For every LLM call you're making, you're basically going to make another LLM call just
to validate the first call. So what we've built is, we basically
built a set of metrics ourselves that is able to accurately score results without using
LLMs. And we've seen it correlates well with human feedback and LLM-as-a-judge metrics.
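To make the idea of scoring output without a second LLM call concrete, here is a minimal Python sketch. The episode does not describe which metrics Traceloop actually uses, so this is not their method: the word-overlap heuristic, the function names `tokens` and `grounding_score`, and the sample strings are all illustrative assumptions.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its distinct alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def grounding_score(answer: str, context: str) -> float:
    """Fraction of the answer's distinct words that also appear in the context.

    Hypothetical heuristic: a low score suggests the answer contains
    material that the supplied context does not support.
    """
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & tokens(context)) / len(answer_tokens)


# Made-up example data, for illustration only.
context = "The refund window is 30 days from the delivery date."
supported = "The refund window is 30 days."
unsupported = "Refunds are available for 90 days and include free shipping."

print(grounding_score(supported, context))
print(grounding_score(unsupported, context))
```

A check like this costs no tokens, so it can run on every production response. The tradeoff is that it is far cruder than a judge model, which is why a real metric would need to be validated against human feedback, as described above.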
And you all are using OpenTelemetry as well, right?
Right. So I think we basically pioneered the usage of OpenTelemetry to monitor LLM applications in production.
So again, this is real-time stuff.
So I have an LLM out in the world.
Users are using it, and I can see, it's almost like,
that's got to be a huge help in terms of refining your model,
because it's giving, it's doing the dogfooding of people
actually getting the hallucinations or doing inputs that
make the model behave poorly or something like that or badly.
So is this basically applicable to anybody that's working on an LLM right now?
Yes.
And I think the nice thing about using OpenTelemetry, and like our open source is called
OpenLLMetry, so you can understand the...
Did you come up with that term, OpenLLMetry?
No, actually, we saw it on Twitter, on X.
And some guy, like we were mid-development,
we were developing the OpenLLMetry package.
We didn't have a name yet.
And then someone wrote like,
waiting for OpenLLMetry.
And we were like, okay, this is a good name.
So I wrote to him and I asked him,
hey, can we use this name?
And he said, sure.
So now if you go to the repo,
the README, you would see
we linked to that tweet from back in September,
I think last September.
So it's as simple, if someone wanted to test this out right now,
it's as simple as just putting a snippet of code in and boom?
Yes.
So basically, it's just one line of code.
You install the SDK, and we instrument,
we basically connect to all the SDKs of all the LLM
providers, vector databases, frameworks
like LangChain, LlamaIndex, and others.
So it's one line and you get metrics and traces
just out of the box, just like that.
And because it's using OpenTelemetry,
you can connect it to any observability platform that you want.
You can connect to Traceloop, and then you
get another layer of the quality measure
that we've talked about before.
But you can also connect it to Datadog or New Relic or Dynatrace
and you get traces and metrics.
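As a rough sketch of what that one line looks like in Python: this assumes the `traceloop-sdk` package is installed and uses its `Traceloop.init()` entry point. The wrapper function `enable_tracing` and the app name are hypothetical, added only so the snippet loads cleanly when the package is absent.

```python
def enable_tracing(app_name: str) -> None:
    """Turn on OpenLLMetry instrumentation for the current process.

    `enable_tracing` is a hypothetical helper for this sketch. The import
    lives inside the function so the module still loads without the
    package installed (assumed install: `pip install traceloop-sdk`).
    """
    from traceloop.sdk import Traceloop

    # The "one line": once initialized, calls made through supported LLM
    # SDKs and frameworks are reported as OpenTelemetry traces and metrics.
    Traceloop.init(app_name=app_name)
```

Calling `enable_tracing("my-llm-app")` once at startup would be the whole integration. Where the data is sent (Traceloop or another OpenTelemetry-compatible backend) is a matter of exporter configuration, which is not covered in the episode and is left out here.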
So I haven't mentioned this yet.
But if anyone wants to check it out while we're talking,
it's Traceloop.com.
Also, I'll have links to that in
the show notes, of course. Tell me how this idea came about in the sense that the current
sort of generation of AI is only a couple years old. But you had been working with machine
learning and chatbots for a while. Is that right?
Yeah. So we actually was working at Google with
LLMs before they were that large.
We were building, we're trying to build a chatbot with Bert, actually.
It was back in 2019, I think.
And it was really difficult.
It was, like, the quality was miserable.
It just didn't work that well.
And so fast forward, you know, to last year and, you know, GPT-3 came out and everything.
Everyone was starting to use it,
and we thought it's a good tool that we can use to build.
We were actually building something else back then.
We were building a platform for generating tests automatically.
And we thought, you know, unit testing, front-end testing,
back-end testing, end-to-end tests.
And we thought, hey, like, GPT can be a really good tool that we can use
to generate tests.
So we built this kind of crazy agent architecture where we had GPT trying to understand your system
and using tools to start writing the code and then testing it and making sure that the test passes
and it's not flaky. And it was pretty cool, but there were two problems.
One, it wasn't working that well.
Like the outputs, the tests that we were able to generate were not that impressive.
They were pretty naive or not working that well, too many, you know, too flaky,
too many bugs or I don't know.
And then the second thing is we had no way of monitoring that agent's execution.
At some point, I remember like working on this, generating just one test for this one customer,
and I let the agent run for a while.
And suddenly I noticed that just this single test generation cost me like $50.
Right, because like you described, you were using an LLM to monitor the LLM,
so you're calling out twice, essentially.
No, I was using the LLM.
The agent was based on an LLM, so it could make an arbitrary number of calls to understand the next steps.
Right.
But the point is that that can run up a cost pretty quickly.
Yeah.
Yeah.
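The runaway-cost problem described here, an agent making an unbounded number of LLM calls, can be sketched as a per-run budget guard in Python. This is an illustration only, not Traceloop's implementation: the class names `CostTracker` and `BudgetExceeded` and the per-token prices are made-up assumptions.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when an agent run has spent more than its allowed budget."""


@dataclass
class CostTracker:
    budget_usd: float
    usd_per_1k_input: float   # assumed price, not a real provider rate
    usd_per_1k_output: float  # assumed price, not a real provider rate
    spent_usd: float = 0.0
    calls: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one LLM call's cost to the running total and enforce the budget."""
        cost = (input_tokens / 1000) * self.usd_per_1k_input
        cost += (output_tokens / 1000) * self.usd_per_1k_output
        self.spent_usd += cost
        self.calls += 1
        if self.spent_usd > self.budget_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} over {self.calls} calls, "
                f"budget was ${self.budget_usd:.2f}"
            )
        return cost
```

An agent loop would call `record()` with the token counts reported for each LLM response, so a run that starts looping stops at the budget instead of quietly reaching something like the $50 mentioned above.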
So we were looking for a way to monitor it.
So we basically built something like Traceloop, like the first version of Traceloop, internally,
just so we can understand what's happening.
And then, you know, the quality issues came in later because we wanted a way to make, you know,
make the agent work better and better with time.
And then at some point, we got to a point where we were like spending 60% of our time
building the monitoring tool.
And then 40% just building the agent that was actually our product.
And then at some point, we realized that maybe the monitoring platform is the thing we should be building.
So you mentioned that you had been working on LLMs at Google.
You're still a small team of like, what, four still?
Yeah, we're four people, me and my co-founder and then one engineer and one data scientist.
I'm going to ask you, don't let me forget to ask you if you're hiring at the end of this.
But the, so when you decided, okay, this is again that moment when Chris and I came off the bench and things like that,
like, this is an exciting thing.
We've got a tool that people are going to need.
What was the decision-making process in terms of, all right, let's go ahead and do a startup.
Had either you or your co-founder done a startup before?
No, this is our first try.
So me and my co-founder, we've known each other for a while.
And we always wanted, we were even roommates at some point, and we always wanted to start a startup.
And it was always like the bad timing for one of us.
Either, you know, I was at Google.
I was happy.
Like, why would I quit like a good job at Google to start a startup?
And then, yeah, Gal, my co-founder, was doing something and he couldn't, he
couldn't really do a startup.
And then we ended up both being at Fiverr.
Gal was there before me.
He was a group manager.
And he convinced me to join.
So I joined like three years ago as a chief architect.
I actually left Google to join him.
And then we were in the same company.
So it was the right timing for both of us to decide to leave.
And OK, now we either do it now or we'll never do it.
So we quit and decided to just go for it.
So it was one of those things that
you had, this was something you wanted to do.
This was, and then finally the planets just aligned.
And then you saw that the AI space was taking off, so you're like, yeah.
That's always an interesting question because people have different answers.
Sometimes, like you said, why leave a good job at Google?
And then it's like this fraught decision.
And then other people, it's like, well, this was always in my blood.
It was just a matter of when and what.
That sounds like you were on that end.
Yeah.
You also did Y Combinator Winter '23, I believe.
I always like to ask people about that.
Two parts on this one.
The second part being, tell me about your experience
with Y Combinator.
But also, did getting into Y Combinator also add impetus
to like, all right, let's do this.
This is for real?
No, so we actually quit our jobs before even
applying to YC. We were starting to work with like no funding, nothing.
We just, I wouldn't say garage, we just sat at another friend's startup offices.
And then we applied to YC and were like, we probably won't get in because, you know, the chances are slim.
But let's do it. Like it was the last day to apply.
So we just, we applied and we said, okay, probably won't get in, but, you know,
it's a nice shot, you know.
And I remember we were so sure we won't get in that it was just before the holidays.
So me and my co-founder were about to like, we just, you know, we quit our jobs at Fiverr.
So we said, okay, let's, you know, go for a vacation.
It should probably be our last real vacation before we have a startup.
So I was about to fly to Colombia and Gal flew to Vietnam.
And then there were some circumstances and I couldn't fly at the end.
So I stayed home.
And, you know, and Gal was in Vietnam.
And then two weeks after, I get an email.
It was like, hey, so we want to invite you for an interview at YC.
And here's a link.
Just choose the right time that works for you tomorrow.
It was a Wednesday.
I won't forget it.
It was a Wednesday.
And it said like, yeah, Thursday.
Just choose.
Do you want to interview at 8 p.m.,
11 p.m., or 2 a.m.? I was like, okay, let's do 8 p.m. I think it's a good time. And I asked
Gal, can you join? He was like in the middle of nowhere in Vietnam. And he joined. And there
we were, like, within 24 hours, both of us like ready for this YC, you know, 10 minute interview,
infamous interview. And it went terrible. It went, it was so fast. It was just 10 minutes.
They bombarded us with questions and then it was done and we were like, okay, I think
we haven't, I'm pretty sure we haven't gotten in. And we were like, okay, fine, we knew that this would
happen. And then the next day we got a phone call from Aaron, one of the partners. He was like, yeah,
you're in.
Why did you think it had gone poorly?
Because they ask you a lot of questions.
They don't let you, sometimes they don't even let you finish the answer. And it's expected. Like
if you search online for what the interview process looks like,
they tell you that it will be like that.
But during the process, you feel like I'm not
answering the right question.
So that's why they ask you more questions.
So you always feel like you're on the defense.
And you can't really explain well about your startup.
Well, tell me about the actual experience of going
through YC and Demo Day and the real day and the
resources they provide, like sounds like it was a positive experience one way.
And I've never heard anyone say anything bad about it.
But just tell me your takeaway from the experience.
Yeah, so YC for us, YC was really, really good.
It was the first or second batch post COVID.
So we got a chance to actually, we flew to San Francisco and we lived there during
the batch.
So first, you know, you get to meet all of these super talented founders.
And we've also kept in touch with many of them till today.
So it's a great network to be part of, to begin with.
And then it really helps you to focus on the right things.
And I think especially for first-time founders, nobody tells you how to start a startup.
This is the first time I'm doing something where I have no idea what will be the next step.
When I used to work at Google, I knew how
to be a software engineer or how to be a tech lead.
I knew what the day-to-day looks like and what is expected of me.
And then now as a CEO, I have no idea.
I have no idea how to do sales, how to do marketing, how to fundraise.
And so YC is a great school for understanding
how to become a founder.
What is it, I mean, again, you're a first-time founder.
So I don't know.
The question I wanted to ask is, what do you think the difference is about doing an AI startup in this moment?
And like you said, you don't have the previous experience.
But to the degree that you can extrapolate, everything's moving so fast.
Every other day, someone else is leapfrogging with a new model and new techniques.
And being at the cutting edge of a frontier technology, does that
make it harder to do a startup, I would think, yes.
Or also, is that also possibly the thing that you get the most juice out of is being on the cutting edge?
I think first, it's exciting.
I love the day to day and I love the fact that we are, as you said, like, at the cutting edge of technology.
The main thing that we see is that everything is moving so fast.
And when we started, we built like instrumentations for OpenAI,
and it was fine.
But then, you know, lots of other models came out.
And you really feel the market and how it's changing like every day.
You know, at the beginning, everyone was mainly using OpenAI.
But then Google came out with the Gemini models, model family.
So now everyone is using Gemini.
So we need to support that.
And then OpenAI came out with the vision, with GPT vision,
GPT-4 Vision, so we need to support that. And then Anthropic came out with Claude 3 and you need to
support that. And then, you know, every day there's a new model, there's a new something, framework,
vector database, and we need to support them all because people are using everything, people are trying
everything, and they expect everything to just work for that.
And I mean, like, Anthropic didn't even
exist when you got started. So, right, I mean, tomorrow there could be somebody that, you know, announces
themselves to the world. Do you ever worry about a lot of the startups in this space,
you know, sort of the cliche is, well, GPT-5 could come out tomorrow and obviate a bunch of chatbot
type startups or stuff. You're not in that space. You're more of a picks and shovels sort of
company here. But is that a concern that's unique to working in AI where it's like, well,
maybe what we do, the next generation, which could come out next week, could
obviate us and we either have to go back to square one or pivot hard or something?
I don't think so because, you know, I keep hearing people saying about a lot of like AI,
Gen AI startups that they are like a thin layer on top of OpenAI.
This is like something that people say pretty often.
But the thing is, like, I think you could also say the same things about the first web, you know, web applications.
Like it's just a thin layer on top of Linux.
I don't know.
Right, right, right.
So I would say, like, sometimes, like, the real, you know, the real, I would say,
a really interesting thing about these various AI startups is not
just the technology, but it's the product around it.
Even around the testing, test generation that we were building,
I think one of the main challenges was how to,
how the product should look for a developer that is
using it. Because for example, the generation takes time and you
need to somehow communicate with the user and allow them to edit and make changes or
instruct the model.
And so, for example, I would say Copilot,
like GitHub Copilot, is a really good,
it's not just a good...
It's basically a thin layer on top of OpenAI.
But it's really successful because it uses the right prompt.
And it has a really nice and well-thought-out integration
into VS Code, which makes it extremely popular.
But at the end of the day, it's just a prompt.
Just real quick, because I did mention this
or talk about this on the show recently.
Based on what you're seeing, do people still feel that OpenAI is the cutting edge?
I know that every other day, oh, we're equivalent to GPT-4 and this one and that one.
But based on popularity and just the anecdotal, is OpenAI still the winner,
at least in most people's minds?
I think, so the data that we see around the open source and the platform as well,
like the vast majority are using OpenAI.
Some of them are also using other models,
but the vast majority are using OpenAI and sometimes nothing else.
And I think it's an interesting question
to tell whether it actually still is the best model,
like GPT-4o, or does Anthropic
or Gemini or others have comparable
models in terms of performance.
So I'm not sure if there's like clean, you know,
yes, right or wrong answer here.
And I would say that sometimes it's just,
it's a matter of like a trend.
Because I often hear people saying, yeah,
we use OpenAI because everyone is using it.
Like we chose OpenAI because this is what everyone is using.
I don't hear a lot of people actually evaluating
four different models and then choosing OpenAI
because it was the best model for their...
Right. Or it hasn't devolved into that...
Oh, I'm a Mac-only person or I only develop for Android versus...
Like, it hasn't calcified into those warring camps yet.
Okay, interesting.
No, maybe there's like one camp of the open source models.
Right.
Yeah, like Mistral and Llama and like those folks that use these.
The philosophical divide of, I want...
I want to throw my lot in with open source versus a proprietary platform.
Exactly.
So GitHub stars, 1400 stars, over 1,000 organizations.
Actually, what can you tell me about the uptake you all are seeing right now?
For the open source.
Right.
For right, what you're working on.
So for the open source, I think it's been
hugely successful. We were lucky because I think when we started, we chose OpenTelemetry
because we were super familiar with this technology, but it was like, it was a hard choice at the
beginning because it complicated our SDK by a lot. It would have been much easier to just
build an SDK for collecting the data that is like super proprietary and that's it, and focus
on the platform. But building an open source, OpenTelemetry-based SDK
has proven to be successful because, like, the other observability
platforms seemed to really like it from the beginning. So we had, like,
I think it was the second month, we had folks from like New Relic and Datadog
coming and like contributing, like trying the open source and then adding
docs explaining how to integrate it with their system or writing
blog posts. You know,
New Relic wrote a blog post about building
with OpenLLMetry
like one month after we
released the open source. So they really
helped the open source
succeed at the beginning.
And yeah, so now
we have a nice
more than 30 contributors
who are actively contributing to
the open source. And this really helps us to
keep up with the
industry.
And you're seeing that
OpenLLMetry is, like, engineers at Amazon, IBM, Microsoft are all using it.
So it is getting uptake even at the big players.
Yeah, exactly.
We see like Amazon, people from Amazon and Nvidia and like folks from lots of big,
you know, prominent AI companies using or promoting the open source within, you know,
their respective communities.
Well, to that end and to kind of wrap things up, if people are interested, number
one, traceloop.com.
Number two, obviously check it out on GitHub and be another contributor to help Nir and the four of them
keep this thing going.
But also, are you hiring?
Are you raising?
If people are interested in what we've been talking about, what should they know, how
should they get involved?
So first, yes, we are hiring.
It's been crazy maintaining customers for Traceloop and the open source with just the four of us so far, but we're definitely hiring right now.
Also, yeah, contributions.
If people want to try it out, we have a community Slack.
Everything, if you could just go to the open source.
You can go to the open source even through our website.
So we have an active community Slack so you can chat with me if you have any questions.
I'm helping, you know, I'm helping folks to use the open source, even if they're not using the platform.
I'm helping people connect it to Datadog or connecting it to other observability platforms.
And it's fun and interesting to see this adoption.
And if you, you know, if you're looking to detect hallucinations in production without using LLMs,
then I think Traceloop can be a really good solution for
you, and I haven't seen, I think we're really unique in that domain so far.
I haven't seen other platforms doing something similar so far.
So if that's you, if you're working on this stuff, reach out to Nir, reach out to me,
I'll put you in touch with them and the team.
Nir, thanks for coming on the show and telling us all about Traceloop.
Thank you.
It's great being here.
