The Data Stack Show - 217: Bridging Data Models with Business Intuition with Zenlytic’s Founders Ryan Janssen and Paul Blankley
Episode Date: November 27, 2024

Highlights from this week's conversation include:
Ryan and Paul's Background and Journey (1:05)
Excitement about AI and Data Intersection (2:50)
Evolution of Language Models (5:05)
Current Challenges in Model Training (6:51)
Founding Zenlytic (9:12)
Integrating Vibes into Decision-Making (12:58)
Precision vs. Context in Data (15:03)
Understanding Multimodal Inputs (17:47)
The Challenge of Watching User Behavior (19:26)
Empathy in Data Analysis (21:32)
AI in Analytics (23:18)
The Complexity of Data Models (25:33)
Self-Serve Analytics Definition (28:15)
Evolution of Self-Serve Analytics (32:09)
Distillation of Data for End Users (36:44)
Challenges in Data Interpretation (39:22)
Building a Semantic Model (44:18)
Using AI for Comprehensive Analysis (46:51)
Future of AI in Analytics (51:31)
Naming the AI Agent (52:53)
Final Thoughts and Takeaways (54:21)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Hi, I'm Eric Dodds.
And I'm John Wessel.
Welcome to the Data Stack Show.
The Data Stack Show is a podcast where we talk about the technical, business, and human
challenges involved in data work.
Join our casual conversations with innovators and data professionals to learn about new
data technologies and how data teams are run at top companies. Welcome back to the show. We are here with Ryan Janssen
and Paul Blankley from Zenlytic. Gentlemen, welcome to the show.
Thanks for having us on. Super excited to chat today.
All right. Well, give us just a brief background. You have different backgrounds,
but they actually converged at one point. So Paul, why don't you start and then tell us where your path crossed
with Ryan's?
Yeah, so I'm a nerd's nerd. I was math and CS undergrad, math and CS grad. And Ryan and I met
actually doing a technical master's degree at Harvard, studying language models. And this was
right around the year that Attention Is All You Need came out and transformers were sort
of first becoming a thing. So we got to see a lot of the really early versions, back when they
were language models, before they became large language models. And after that we started
consulting, did consulting for a few years, and then started Zenlytic during the pandemic.
Right, Ryan, you're half the story? Yeah, well, my background is, I was a software engineer
at the very start of my career in my native Canada.
But then after that, I've spent coming up on 15 years now in sort of the last mile of data analytics.
And, you know, first I was a VC, you know, slash Excel monkey.
I went to school, became a data scientist.
So I worked in data science for a bit.
And, you know, Paul and I, that's where we met.
In fact, we started data science consultancy together
and then we founded Zenlytic together
and all of those have been different parts of the same problem
which is either I'm a non-technical end user
or I'm kind of a semi-technical analyst
or I'm a very technical data scientist
all trying to sort of solve problems with data.
So guys, before the show we talked about data versus vibes.
And, you know, founders or CEOs running companies
on sometimes a combination of both
and sometimes, you know, a little bit more slanted toward vibes.
So I'm excited to dig in on that.
What are you guys excited about?
I'm excited for that one
because I think that hits on a really important point
that I'm excited to sort of expound on.
And other than that, I'm excited to dig into just, you know,
what is possible, what is not possible with language models,
how, you know, how can we kind of fit language models
in how we as humans sort of think about and operate in the world.
And talk a little bit more about how that
and how language models work actually affects what we do at Zenlytic,
where we are very AI-native,
like AI-native first sort of business intelligence brand.
Awesome. What about you, Ryan?
Yeah, excited for all those.
Really excited to chat about intersection of AI and BI
or AI and data in general,
which is like, how do we get AI agents
to answer problems in data?
And it's a really hard problem, frankly,
because you've got this huge surface area
of potential data types and configurations on one side. You've got this huge surface area of questions
people want to ask on the other side. There's a little pinch point in the middle. So fascinating
field to work in. And LLMs, there's just new stuff every day. So lots of stuff to talk about there.
All right. Well, hopefully we can get to all of that. So let's dig in.
Let's do it.
We have so many questions that we want to get to,
but I'd like to start actually with a little bit of history where your paths crossed.
So it was at Harvard,
and you were studying language models
in the context of machine learning.
And Paul, in the intro,
you said that you were studying language models before the additional L got added to the acronym.
So can you just talk about what you were studying? How did you think about it? Did you perceive it as
a tectonic shift? And Ryan, you were working prior to that experience. So yeah, I would just love to
hear from both of you about what you were studying and what that was like, and then how it informed,
you know, founding Zenlytic. Yeah, totally, I'll dig into that a bit. One of the things to
kind of remember from that time is that transformers in general were this sort of
huge shift, because before transformers you had these things called recurrent neural networks,
which had these problems with memory and being able to generate anything. You tried to solve that
with this other architecture called LSTMs, and all of these were sort of more complicated but
like less effective versions of transformers. The innovation in transformers was just realizing that
the attention mechanism was kind of all you needed to actually do a really good job of predicting sequences.
And so BERT, which is kind of the sort of initial, the initial transformer, if you will,
there's a bunch of others in its class, but that was sort of the initial, like, groundbreaking
one.
It was just dramatically better than anything else that people had seen before.
And again, this is not like it's generating, you know, speech that sounds like a
human, it's not going to pass the Turing test. But at all the things that it was evaluated on,
it was pretty dramatically better than everything else. So we definitely did not know it was going
to get here. But you could see with transformers, it was like the unlocking of a new architecture.
And whenever there's a new architecture that does, you know, unreasonably well at something
compared to previous generations, you're on this trajectory where it's going to just get better.
And you can pretty reliably, like a Moore's Law of sorts, just kind of be like, hey, it's going
to get this much better every year. And that was true until ChatGPT kind of broke that and went
pretty exponential.
Yeah. Yeah. When Paul and I actually were studying this, I don't know if there was the second L,
I think there were just models at that point, but I wouldn't even call them language models.
We were early, early on in the days. There were these tools where you could, like, prove
that, like, you know, king minus man plus woman equals queen, and these very
basic tools for performing, you know, algebra on individual words.
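To make the word arithmetic Ryan describes concrete, here is a minimal Python sketch. The three-dimensional vectors are invented purely for illustration; real embeddings like word2vec or GloVe are learned from text and have hundreds of dimensions, but the analogy mechanics are the same.

```python
import numpy as np

# Toy word vectors, invented purely for illustration; real embeddings
# (word2vec, GloVe) are learned from text and have 100-300+ dimensions.
vectors = {
    "king":  np.array([0.80, 0.65, 0.15]),
    "queen": np.array([0.78, 0.68, 0.82]),
    "man":   np.array([0.75, 0.10, 0.12]),
    "woman": np.array([0.72, 0.11, 0.80]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "king - man + woman" should land closest to "queen".
target = vectors["king"] - vectors["man"] + vectors["woman"]
best = max(vectors, key=lambda w: cosine(target, vectors[w]))
print(best)  # queen (with these toy vectors)
```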
And it wasn't really... I think, you know, BERT was like a huge step change, and then I think
watching the GPT-2, GPT-3 progression is when things became really apparent that there was plenty of
room at the top for these models, right? And everyone kind of had a fundamental understanding
that these things are predicting the next token or next word. There was a big question there, because, like, you know,
the early models were very word salad, right? They almost made sense. And then, like, GPT-3,
then you get a paragraph, but there wasn't alignment across paragraphs.
And you could see it was becoming more and more coherent.
And, you know, for me, the watershed moment was right about that, like, GPT-3 level, when I was like,
okay, these things are actually being able to sort of demonstrate early
sort of what looks like understanding to us.
And, you know, that was kind of a turning point, because that's when we could start thinking about this in terms of scaling laws. Which is really cool, because we actually have
a really predictable trajectory for this stuff, because we had an understanding of what you could
put into it, right? And it's like, so can we put in more compute? And it's like, yes. And you 10x
the compute and it gets, you know, 50% better or whatever. It's like, can we put in more data? And
it's like, yes, you 10x the data and it gets better.
So we not only had a good idea of what the inputs were to improve the model performance,
and I'm saying we collectively,
like we as a research community.
We could also even predict the trajectory from there
by seeing what would happen when you scaled up.
And the question then is like,
where does that take us today?
Which is kind of an interesting question, right?
So a bunch of those scaling opportunities
have been kind of tapped out, right?
So if you think about, you know, 10x-ing the data,
we can't, because these LLMs are basically using all the data,
you know, like they're basically trained on the entire internet.
Yeah, and if you want to 10x the compute, you 10x the cost of that.
Like, you know, they're saying the next class of models
is a billion dollars to train a model,
and then you 10x that, it becomes $10 billion.
You know, there's not many 10x's left in that dimension, basically.
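The kind of extrapolation Ryan and Paul are describing is usually written as a power law, where loss falls off as a fractional power of compute or data. A toy sketch with invented constants (not a published fit) shows why each additional 10x buys a steady, predictable improvement while the price of the next step grows tenfold:

```python
# Toy scaling-law sketch. The functional form L(C) = a * C**(-alpha) is the
# shape used in scaling-law papers; the constants here are invented for
# illustration, not fit to any real model.
a, alpha = 10.0, 0.05

def loss(compute):
    return a * compute ** (-alpha)

for compute in [1e21, 1e22, 1e23, 1e24]:  # each step is a 10x in compute
    print(f"compute={compute:.0e}  predicted loss={loss(compute):.3f}")

# Each 10x of compute multiplies the predicted loss by 10**(-alpha) ~= 0.89,
# so the improvement per step is steady, but the cost of each step grows 10x.
```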
I think that's why some people are saying, oh, things are rounding off now,
and there's a couple things that have to happen next.
One thing that might happen is a new architecture,
these architectures that are more efficient and they can learn faster,
which is what transformers ultimately were.
The other thing that could happen, actually, the one area where we still have room to scale 10x, is inference time. And inference time is how long the model runs for. And now you might
have heard of, you know, OpenAI's o1, for instance, which is a reasoning model that uses longer inference.
And instead of answering in 100 milliseconds, it takes a few seconds to think and then gets back
to you. Yep. You might have heard of Devin, the software development engineer agent. And, you know,
its performance is a step change.
And they got that by letting it run for, like, they could run it for 24 hours at a time
on a program. So there's still plenty of room at the top in terms of inference time. And that,
I think, is why we're seeing the emergence of AI agents now. Because a fundamental part
of an agent is actually expandable inference. So I think that's the
next step in the scaling law. After that,
once the big axes are tapped out,
we might need to find some sort
of architectural change, which is a hard problem,
but that's the next big unlock, I think.
Yep.
Connect the dots between all of the things you just... You're in the middle of this learning.
You're in the middle of this learning.
You guys did some consulting,
but then you founded a data company
with AI at the center of it.
I'm interested why,
when I say data company,
let's just say analytics or business intelligence company
with AI at the center.
Why did you go there when you could have gone so many different places in terms of building
a company around AI?
I think a lot of it was that Ryan and I both really liked data.
It's just something that we feel comfortable with.
And when we did consulting, we got to see and experience a lot of the problems that
we solve today firsthand.
And I think that kind of firsthand exposure to the problem gives you insight into it in a way that, if you don't have firsthand exposure to the actual problem,
you're not going to be able to empathize with the users of your product very much.
So that's something that's really important to us, is that we've got to actually be able to empathize with the users.
And that led to the biggest problem that we saw in our consulting, which was this last mile. We would do so
much work to, like, set up Snowflake, set up BigQuery, make sure, you know, everything's clean and
like relatively easy to use, and then it would just, you know, lay untouched in Tableau or Power BI.
Yeah, it was like, okay, well, this is a problem, man. You know, it's like 10% of any organization
can actually use the data that they produce.
And that's like this massive bottleneck on the rest of the org
that just wants to know basic things about, you know,
which campaign should I be investing more money in,
and other things like that.
So, Ryan, did you want to add on to that?
No, totally agree. It's just that, you know, data's cool.
Well, finally, data's cool.
I think we all think data's cool.
But I think there's this kind of feeling in the data community that's like,
do we add value?
Benn Stancil talks about this in a blog post
where it's like, at the accounting convention,
the accountants don't get together and say,
does accounting add value?
And I just feel, as a community,
we've always kind of had a little bit of anxiety
about being on the sidelines a bit.
And I think that the root of that problem is that it's hard for people to use data
and that LLMs are kind of an unlock for making it easier
to use and access data.
So those two things together might be what takes us off the sidelines
and into really well-adopted, well-used tooling.
And that's what gets me excited.
So let's dig into, John, a question that you brought up that you were
excited to talk about. And it's this concept of vibes are stronger than data. And you wanted to,
so I love that. It sounds like a t-shirt. It sounds like a meme. I'm sure it already is.
But dig into that. What did you mean by that when you asked that question or brought up that topic? Yeah. So we were talking before the show about how there are certain companies,
a lot of times very founder-led companies, where the founder somehow just gets locked in on the
product, maybe from talking to people, from vibes, and can really grow and scale companies to surprising sizes with this, like, vibe,
gut-reaction type of, you know, ability. And I would even say in some of those, you know, some of those
situations, if you tried to move that company, like, no, cancel the vibes, let's just make
all the decisions on the data now, you'd probably do
some damage, at least, you know, especially if the company's at a certain scale. And then the vibes
do run out. Like, there are certain companies where it's vibe-driven, and then you hit a point, and it's like,
all right, the vibes are running out, for whatever reason, scale or whatever, and then it's like,
do we need to, like, kind of weigh more heavily toward data? But yeah.
So right. You know, both Ryan and Paul,
you've done some consulting,
obviously worked with a lot of companies with data.
What are your thoughts on that?
How did Ryan model vibes into his VC spreadsheet?
Yeah, right. Right.
Is there a weighted model for vibes?
Yeah, that's right.
You know, the funny thing is, actually, that is a big part of VC.
Totally. I don't know if they actually say the vibes are through the roof.
They don't. Like, I don't know if that's, like, explicitly modeled, but it definitely plays a role.
I mean, it was tongue in cheek, but for sure.
Yeah.
Yeah.
Well, yeah, it's interesting.
Like, the thing we said before the show is that data is strong, but vibes are stronger.
That's right.
You know, I think that's probably true, right?
And, like, the mental model that I use to think about the world is that human beings are,
you know, feeling machines that happen to think.
You know, so we feel first and think later.
And I think that everyone likes to pretend that we're all very rational and predictable
and everything.
But, you know, in reality, it's the feeling brain that's running the show.
So, like, that's why vibes become important.
And that's very difficult to,
you know, it's not only that it's not data-driven,
but, like, it's very difficult
to model in data, to capture that.
So, like, I think the right approach
needs both, really, you know?
And it's like, there's times
when you have to really be thoughtful
and, like, actually use data,
understand something
at a high level of precision.
And there's times when
the broad strokes are important
and it's more driven by how people think, or gut feeling, or whatever.
And I guess the hard part is knowing which to employ when.
Well, one of the things that I would throw in,
where I almost would disagree with Ryan's last point
about precision a little bit,
because the thing is, data strives to explain what's going on.
It's like, you sell things,
or you have these transactions,
and you can see, like, specific events
that happen.
And it's precise in the sense
that you can very accurately
calculate revenue,
but it's imprecise in the sense
that you lose a lot of, like,
the feel of something,
or the sort of surrounding context
that you just can't meaningfully
capture in data.
So, and I think a good example of this, and I want to borrow Benn Stancil's analogy here,
it's like, if you run a failing bar, are you going to go and, like, look at data, all your bar tabs?
Or are you going to watch videos of all of the people who went to your bar and weren't happy, to
hear from them? Like, you're going to get so much extra information from the actual video,
way more than you are from a transcript. I think another good example on this vibe thing is, like,
Dylan Field, CEO of Figma.
Figma is a massive company.
He'll get in there and, like, read customer support tickets that come in,
because it's just a really not-lossy representation.
Yeah, the sample size isn't big, but he comes in,
he gets this, like, really fat pipe to, like,
what are the customers actually asking about?
And yeah, it's not a big sample size, but it's also not, like, cleaned up.
It's not like, oh, we trimmed out the outliers that don't ask that often.
And in that sense, it is a lot more precise, because you just see everything.
You see all the dimensions, like the video, people's facial expressions, and
that's where you get this gut feel.
So I think when people say, like, you know, use your gut versus data, your gut is kind
of the distillation of every single data point you've ever seen in your life. Multimodal, too.
Yeah, multimodal, exactly. Because, like, data isn't just stuff that sits on a spreadsheet. Like, data
is all forms of, you know, comprehension that we have, that we take in. Yeah. I do think there's
actually a certain whole class of person who says,
oh, I don't need data to make decisions. I just trust my gut. And, you know, you hear that a
lot. Every time I hear that, I think, well, what do you think you're putting into the gut to,
like, feed that gut, you know? And, like, I think that is the right chain: get some good
data, put it into your intuition, let your intuition figure it out, and then make a decision
and action on that. Yeah. Yeah. Yeah. I mean, I think that's why
the, I mean, this is slightly tangential, but
why you want to bring in someone at a certain stage of a company
who has a lot of experience solving similar problems
because they put a lot of data into the gut, right? And so their
intuition, they probably have really good intuition
even if it's a different context.
Data gut health.
Is that a supplement you guys
sell?
Kombucha derivative.
Hey, Coalesce 2025.
That's right.
That's the next bit.
Go to the Zenlytic booth and get your data kombucha.
Incredibly sick Zenlytic swag.
Yeah, that's great.
And then I have a question, I have a question for Paul
about your lossy representation.
Oh, this is my podcast. No, I'm sorry, I'm taking over. It is your podcast.
So the lossy representation, the things you mentioned are mostly
textual, right? So it's like, the thing with the CEO
is the CEO is reading these things individually.
And is it possible that lossy representation has been because we haven't been able to understand text,
but now, very recently, we have pretty good tools for processing and structuring raw text? And,
you know, if we increase the fidelity of that representation, does that mean that the CEO
of Figma should actually be looking at structured data from those text representations?
Yeah, no, I think that's absolutely right.
And I think the better we get at understanding
all these sort of multimodal forms of input,
the better, the higher fidelity actual signal
you would get from those things.
Because, like, a good example actually
is something we did as a company, right?
We would watch videos of customers
use the product when we first launched. We'd see where they get stuck, you know, you see their mouse move
when they get a little frustrated and they're, like, not sure exactly what to do next or
what to click on. Really high fidelity, but it's something that just doesn't scale; you
can't watch all the sessions. So you've got to then do, you know, event tracking. You go
and you track events and all this: how often are people logging in, how often are they viewing dashboards, how often are they doing this activity, when are
they asking questions. And you have this representation that's a lot lossier now. Like,
you don't see the little frustrated thing right before they click on the thing, but it lets you
view things at a higher scale. And I think what Ryan's alluding to, which I think is absolutely
right, is the better and higher fidelity we can process these inputs, and effectively aggregate
these inputs in a way that we weren't able to aggregate them before, you'll be able to get just
much higher fidelity signal on what people are actually doing, like answering the actual question
you're trying to answer. Right? It's like, if you had time to watch every single video of
every single person using, like, our product or using something else,
you'd have a great intuition for what's going
on there, but no one has that time.
It's just, no one lives like that.
And it's like, how do you
aggregate that? Actually, an
example of that is, have you all ever done
a thing where you search for something, you want to look for something, and the answer
is in a YouTube video,
and you're like, I don't want to watch this whole YouTube video.
That's great, but I just want a quick answer.
I find myself increasingly,
when that happens, I search in Perplexity for it.
Perplexity has indexed that entire video.
So, you know, put the same string
in Perplexity, even put the link to the video in,
and Perplexity will get the answer
out of the video without you having to go through
and, like, find it inside
45 minutes of video, which I think is cool.
So it's like compressing that
much more rich format
into the answer that you need.
Let's talk a little bit about this mental model of the map
not being the territory.
Which I think is a fascinating subject, because, and Paul, you had a couple of very elegant ways
of describing this that I'm going to butcher,
just like I butchered the vibe statement.
But you talked about how it's a distorted distillation.
Data is a distorted distillation of reality, right?
And we just covered a couple of these things, right?
Like, what are you losing when you go from watching
all of these videos of users to just looking at essentially a log of behaviors, right? Like,
you lose something there. You know, one thing that's interesting is even just the act of watching
a user and perceiving that they might be frustrated develops a certain level of empathy
that I think is almost impossible to get by looking
at a log of behaviors. But can you speak to that in terms of, as you've thought about Zenlytic,
right, you're managing the loss of that in some sense, right? You're trying to create a controlled
loss of reality, right? So, like, you're building a map that speaks to the territory,
but is not actually the territory, right? Because, you know, that's really difficult. How do you
think about managing that process for users who are trying to, you know, trying to use data in
a useful way? Yeah, I think the map is not the territory is a great way to think about this.
Because, again, it's like, data is going to give you good insight on, like, high-level things, in sort of an aggregated way. But you lose a lot of this intuition of, just like, you see someone get frustrated, you see this problem.
So how I think about balancing that is that at some point, you just can't watch all the videos, you can't read all the tickets. Like, the volume just gets too high to be
able to handle. But on the two axes, you need to, of course, have the high level, like how many
times are people logging in. Like, that stuff is important. But you also need to dive back into
just, like, the raw feed, if you will. Like, you know, talk to the customer on a video call. This is why
one of the sort of perennial pieces of startup advice is, like, talk to your customers. The intuition behind
that isn't just that they are going to tell you what to build, because
usually they won't.
They will have great feedback.
And you'll get all these sort of nonverbal things.
And how does the product make them feel?
All these other things that are really important that you just don't get if you're looking
at logs.
Can AI help close that gap, I guess? To ask a direct question about Zenlytic,
is that part of the hypothesis, where you can actually draw some of the, like, territorial
characteristics out with AI, which is really difficult to do, let's say, with a traditional
BI tool?
I think it can help for sure. I don't think
it fully replaces it, no matter how good it is.
Because remember, it's not just that you
have this data at the end.
It's part of, like, the training
of your, you know, think about your brain as, like, a neural network, right? It's, like, the training of your own weights on how
you think about something. So I don't think it can ever, like, fully replace that. I just
don't know if it's actually fundamentally possible. But it definitely helps. And one of the big ways
in which it helps is that you're able to ask things at a higher level than you would before.
Whereas before you would have to say,
I want the number of logins weekly for this customer, or whatever.
And now you can just sort of say,
hey, how is this customer doing?
Is there anything I should be concerned about?
And maybe it chooses, it being an AI agent,
it chooses logins, dashboard views, chat questions, apps,
a lot of these other sort of interaction metrics that it has, and it can give you this more holistic view
and maybe think of things that you wouldn't have thought of.
And a lot of that is sort of, give me a hypothesis,
and let me look at a bunch of different things,
and go and look at all of them,
and then kind of summarize them for me.
That gives you the ability to cover a lot more territory
a lot faster.
And that's, I think, one of the big advantages that we can actually provide as a product.
You know, Eric, one common manifestation that we see of that is when we give someone a sort of a
higher precision view of data, which is our goal, right? It's like, you know, help everyone see the
data more easily. And we often get into a situation where it works really well, but it works so well that it exposes
a lot of underlying issues in the data.
They kind of see it for the
first time, and they're like, oh, we've got all these
data quality issues.
And there's a bit of an existential crisis
around that. It's net good.
If those were floating around
for years and you didn't know it, it's like, was that
data valuable? It's great to reveal
that, but that's definitely sort of symptomatic
of like this map versus territory.
Yeah, yeah.
I'm going to take the argument against Paul,
where sometimes it's good
that the map is not the territory.
And, you know, so, like, we've been talking
a lot about models, right?
The map is a model.
We have mental models.
We have LLM models.
We have data models.
And it's cool.
Like our world is all models.
And those models, you can
scale them up or down. There's a
famous story about a map of the UK.
The simplest model is an oval or whatever
and it gives a rough shape. And then you can make
a higher precision one that shows the shoreline
and higher precision, higher precision. And if you wanted
to make the model
completely perfect reality, you could.
But the model becomes reality.
The map becomes the same size as the territory.
Right.
So there's a trade-off between complexity and expediency here.
And sometimes it's okay to have a simple representation.
I think a lot about the concept of tolerance in data.
And when you're a mechanical engineer
and someone asks you to build an aircraft blade or whatever,
you give them a spec, but you say,
I want this to be a 16-inch blade.
But you don't just say 16-inch blade.
You say, okay, it has to be 16 inches plus or minus an eighth of an inch.
And you give an acceptable range where it would be wrong, right?
Yep.
And we don't have a version of that in data, really, right?
But it's funny because it is really important.
There's certain times if you're running a high-precision experiment
where the difference between false positive and true negative
is a tenth of a percent or something,
you need very high-quality, high-precision data.
If you're running an e-commerce store
and one SKU has double the returns of everything else,
it doesn't matter if it's 1.5x or 2.5x or 3x,
you know they're getting a lot more returns.
So investing more time and adding precision to that model,
making that map
better is not really worth it because you're already getting, you know, the information you
need to make a decision. Yeah. Yeah. It makes me think about this year, my son and his geography
class, they started out with, they call it blob mapping. Right. And so it's fascinating. He
actually can draw a pretty representative map of the entire world using circles and ovals.
And it's like, this is pretty, you know, he has like a good understanding of the layout of the
world, you know, on a rectangular map, but it is literally just, you know, ovals, which is
interesting. Also, one thing, Ryan, just to empathize with you, we have an identity resolution,
it's basically an identity stitching product, at RudderStack, right? So it takes all these disparate tables
in your warehouse
and creates nodes and edges.
And it's super powerful, super useful,
but it's also a great way
to discover big problems in your data, right?
Because you just have thousands of things
collapsing on one node
and it's actually inevitable, right?
It's not, that's not a problem. You know, that's
not an identity stitching problem. It's actually an underlying data problem, which is fascinating.
So, like, same exact, you know, same exact thing. Yeah, absolutely. Okay. Moving to this next topic,
I want to talk about the Zenlytic product. We've been dancing around this. I have a very good grasp
of, I think the shared worldview and then even some of the differences,
you know, which is fun to hear from both of you. But I'm actually going to start,
I want to dig in on the topic of self-serve analytics, because this is a big Zenlytic thing.
But I'm going to start actually by asking a question to you, John. So leading data at a
company where you had all sorts of different data consumers, right?
So, you know, you ran marketing for a while, you oversaw all the data, and you had these
different stakeholders from the sales team to customer success to marketing to executives.
So what is your definition of self-serve analytics?
Man, loaded question.
Totally loaded.
So loaded. Like, say 10 years ago, if you told me,
hey, self-serve analytics is going to be, like, a controversial position, I would be like, no
way, like, everybody wants self-serve analytics. Yeah. But it's really not. Like, there's been,
and I mean, you guys are laughing, like, Ryan and Paul, but there's been quite a backlash against it. So from my perspective, I thought of it probably in two or three categories,
like when we're doing, like, evaluation of, like, what tech do we want to use, how do we want to
enable people. Like, the first category, I'll call it, like, the full-feature category. Like,
if somebody asks for something, we can guarantee we can build it, because we have
every single pie chart,
gauge, big number,
you name it.
We can build it.
Option value, in terms of the,
say, products or
interface that you can deliver to
your customers, in turn.
Yeah, because if you've been
an analyst for any length of time,
you will have that one customer
like, let's just
pick on sales. I always pick on marketing.
A sales executive that's like, I gotta
have a dashboard and I need gauges
and the gauges need to look like this and I need
these colors here.
You can get people that are very precise about what they want
down to the color and the type.
So there's a whole class of tools that are built so that you can, like, finely craft very detailed things like that.
Yeah. And then there's another class of tools that are more, I would say, more built toward
optimizing analyst workflow. Yep. Which, I guess, spoiler alert, that's the route we went. It was like, okay, maybe it's
a little counterintuitive, but we believed that our best way to do self-serve was actually to
empower a couple of analysts to be able to move really quickly in the tool, produce things that
were useful for people. And then the funny part about that is, although we intentionally went with a tool that
was selling to and enabling analysts, we ended up with a lot of citizen analysts. A sales manager
is like, I want to learn this tool. The analyst part, actually writing a little bit of SQL and
tweaking things, they did it. And we had a customer service leader do the same thing.
And one or two other people that were a manager or leader in a department,
they got very, I mean, very light SQL, but nothing crazy typically.
One or two of them went pretty deep, but, like, very light,
like, technical things, and using the analyst workflow.
It was one of the most counterintuitive things, where, and this was before AI was an option,
but you would think, if you give somebody the most polished,
like, hey, look, you just have to click and drag,
that's what they would want.
But it ended up being stronger to basically have a few,
basically one, maybe two key people in a certain area,
and really enable them to be fast and awesome.
They're the go-to for all of sales, all of marketing,
all of customer service.
And keep that analyst-centered workflow.
That's what worked best for us.
Super interesting.
Okay, Paul and Ryan, now the question is on to you.
It's a great question.
And first of all, I would say self-service in analytics
is like a spectrum.
It's something where the goalposts
have constantly been moving
as products have evolved over the years.
So if you go way back,
people would look at Business Objects
and say, like,
that was self-service.
Like you just made the cubes
and everyone made the cubes
and then you could mess around
on the cube until you hit
the limits of the cube.
And that was pretty self-service.
Yeah.
And it's like,
then people were like,
oh, well, actually, that's not really
self-service. Tableau is way more
interactive dashboards, and you can
just upload your own CSV, and you can
really make the visual whatever you
want. You want to gauge, and you want to make it blue?
Go for it. Tableau is
really powerful on the visualization side.
And then people were like, wait a minute, Tableau is still too
hard for most people to use. It's just something
the analysts are using to make the perfect dashboards for the exec. And then it's like, okay,
well, Looker is a lot easier. You don't have to figure out the visualization thing. You just click
on the data you want to see, it'll show you the data, it might even show you a visualization.
And it's like, there you go, it's a lot easier now. But I think where we come in, kind of
following that, is saying, actually, look, it's still too hard. Like, just look at how many people are actually using it. And the reason for that is that you've got to find the
right Explore, you've got to know which data to use in the Explore, you've got to remember, do we
trend revenue by, like, the processed date, the created date, the shipped date? Like, I don't know, I
don't remember. Yeah, actually, I'm just gonna ask John. And that's how that process goes, right? Yeah.
And it's like, I think sort of the thesis
of Zenlytic is that actually the best interface for data is talking to the analyst. Like, you ask
for what you want, and the analyst says, like, hey, yeah, you've asked that question a bunch, it
looks like this typically. Or, yeah, we don't actually track that, that's not in the
warehouse, we gotta, you know, we gotta start tracking that. And it's like, that is sort of
the feature that we're building towards, because we're really building a coworker. We're not trying to
make someone faster using a BI product. We're trying to make something where the analysts can
basically give the system the right context, so it can actually just go and do that same work that
you were describing, John, and everyone can have an analyst paired with them. Because the problem with
having the analysts do the work is you have a finite amount
of them, even if they're sort of citizen analysts
who are embedded in the teams.
And the amount of questions that people have is
really gated by the number of people that can answer them.
So what we're trying to do is build a system
where the data people can come in and say,
hey, this is how we do things. This is what you need about
our environment. This is why these
things are calculated in this weird way, and this is what you need
to know about us. And then Zoe's able
to go and actually answer
those things. Ryan?
Yeah, well, I agree with Paul. As
Paul's co-founder, I agree with Paul.
Well, you've disagreed a couple times.
That's why I wanted to throw it over to you.
No, that's not what we're doing
at all. That's not what we're building.
Group therapy for founders right here.
That's right.
That's what we do.
No, a couple things to add.
I mean, first thing,
you know, Paul's spot on
about like the goalposts
have moved.
I'd say it might even be
more sinister than that,
in that it's, you know,
a question of who benefits.
And I think that a lot of people
have benefited from
keeping the definition
of self-serve
as murky as possible.
You know, and it's like, because that's something that sells. It's also something that's very hard
to do from a product perspective. And, you know, if you look at a platform 20 years ago, they had no
hope of actually delivering a self-serve experience, but they still wanted to, you know,
call themselves self-serve. And that's been the case ever since, right? Like, people have always kind of
played fast and loose with that definition, because it benefits the people making the definitions.
I think one thing that makes it really crisp for me is, and I'll borrow Eric's PM hat for a second
here, I think about the personas using the last mile of analytics. And I think one of the big
misconceptions is that there's two of them. People always talk about sort of technical,
non-technical. I think there's actually sort of three big buckets. You know, I call them the 1%, the 10%, the 89%.
The 1% are the SQL monkeys, right?
Those are the people that are, you know, really technical, the people who are building your
semantic layers and who are administering your data warehouses and writing your DBT
transformations.
That's the sort of 1%.
The 10% are the analysts.
And this includes the sort of citizen analysts that you were talking about, John.
It's quite often they're Excel powerhouses, sort of stretching
into sort of, like, enthusiasts, and they'll
sort of dabble in the BI tool a little bit, but
they don't spend a lot of time
writing SQL or Python or some of
the really more flexible
scripting languages, basically. And then the
rest is the 89%. And that's the group that's
the end users. That doesn't mean that
they're not data-driven. It means that they're busy focusing on the vibes, you know,
like, vibes are a big part of their jobs too. And, like, so it's like, you know, when you're a,
when you're a marketing manager, it means that you're too busy being a marketing
manager to have time to do analytics. And it's funny, actually, I was just talking about this on LinkedIn,
like, I'm the 89%. Like, even though I'm very good at Python, I'm very good at SQL, I'm a huge data nerd,
but I'm too busy doing, you know, CEO stuff
to go and, you know, write a bunch of queries
against our own data warehouse, for instance.
So it's like, it's more a question of time
and what you can focus on.
But I think, you know, historically,
when I think about what sort of, you know, BI has done,
it's been the 1% making dashboards for the 10%,
and then the 89% are left out in
the cold.
And I think that's what they call self-serve, really.
The 10%, yeah, they dabble a bit in exploration stuff, but not very much.
They'll flirt with it a bit, but they don't really get into full-on deep, they're not
writing notebooks to do analysis or anything like that.
And then the 89% are usually missing out on the data, and it's all vibes. It's all vibes, you know, at the top.
So I think that,
I think the opportunity here
is that that can shift, you know?
And I think that if,
I think that if we can multiply,
you know, the more technical folks
so they can be available
and they can multiply themselves,
they can do what I like to call
analytics at scale, right?
And like move from sort of that like point to point defense
where it's like, you know, one question, one answer
through to being able to build tools
that the entire team can use and answer those questions.
And then the analyst job shifts over to
analyzing the data from the team.
It's understanding the sort of questions the team is asking
and like how they're using the data
and what they need to have.
And then you add that in a scalable way
so that, you know, not just the person asking that question
can receive what they need,
but the entire company will get those metrics added
or will get whatever they need.
So I think that's what we're going to see happening
over the next few years.
It's interesting to apply the mental model
of the map is not the territory
and the distortion that happens because of the distillation
across the spectrum that you just talked about. And so let's go back to like watching videos of
users. Okay. So you're watching videos of users that is, you know, sort of actually in itself
is a distillation of reality, right? Because you're interpreting certain things, right?
But let's just call that sort of raw data
at least as far as we can consume it.
Then you go to event logs, right?
Then those event logs need to be summarized in some way, right?
And so you can call that a semantic layer,
or you can call that a model or whatever.
So there's distillation happening there. That's performed by the 1%. They're delivering some asset to the 10%. So there's distillation there. And then of course,
that sort of filters out to the 89%. And so you go from, you know, the, like,
raw data to the logs, you know, there's distillation; logs to the 1% building
an asset; 1% to 10%; 10%
to 89%.
It's an insane
amount of...
Why is it hard
to use data at a company? And it's like,
well, I mean,
the distance,
it's a distance problem, right?
I think you just described the vibes,
because basically it
could start out as data in a log somewhere, and by the time it gets out to the 89%, it is just a vibe.
It's like an echo, it's like an echo of whatever the reality was. The vibe-processing machine.
Yeah. I think it's true. And actually, interestingly enough, I think that's probably the weakest part in
that whole chain right now. So I think that chain is hard but achievable, for starters.
And part of that is actually just the right systems that allow drillable data.
Paul was talking about setting up cubes and stuff like that.
That's a hard block in that entire chain, right?
Because you can't have a higher resolution than that cube.
Part of that is good lineage.
It's like, hey, where did this video come from?
Or where did this data point come from?
I think on a human basis, I just think actually the hardest people to put in that chain are
probably the 10%. And that's finding really great folks who can translate the technical stuff
into the vibes, business-outcome stuff. And being able to be a translator for that is actually a really hard job.
It's a bit like a unicorn job you have to do,
and that's why finding folks like that
is actually probably the hardest part.
It's like, they're actually pretty rare,
and that's also one of the reasons
why, you know, we're always lamenting,
like, are we adding value,
basically, because we don't
have enough people like that.
Yeah.
Yeah, interesting.
Can I ask about,
okay, so I want to talk
more about the product experience and I'm going to
just frame a question. I'm going to frame like a, I guess an analytics type question, right?
And I have a hypothesis on why AI could be really helpful here, but yeah, I'll just frame this
question. Then I'd love for you to explain, okay, how would I use Zenlytic or how, you know, what
would the product experience of Zenlytic be like? So I want to go back to something that you mentioned, Paul, that's related to event
logging. So, and you mentioned a particular user behavior, like a login event. And so traditionally
login events, you know, you would associate that with maybe that's part of your definition for an
active user or, you know, there are semantics there, right? It could be an active user. It could
actually contribute to, you know, a churn score. I mean, there are a number of things there, right?
But one thing that's really tricky about logins is it varies so much by product. And so I'll give
you a specific example from RudderStack. It's really not a great
indicator of, you know, whether or not things are going well, because a lot of times if the data is
flowing, you don't need to log into the product, right? And so you can have this really crazy
inverse relationship where, you know, like that could be a sign that everyone's super happy,
right? I mean, that makes our job harder in terms of understanding
the user and maybe
there are other indicators.
But that's a tricky problem.
If you have a product where event logs are actually
less straightforward maybe than say a consumer product
where there's a daily login event that's an indicator
of some sort of outcome or stickiness or loyalty.
In the absence of that, how do you...
And the reason I bring that example up is that's highly contextual, highly contextual,
right?
Like, it's the nature of the product.
It's the problem that the product solves for the user.
There are different personas.
And so even that metric could and probably is very
different depending on the type of user and the platform, which means that the semantic definition
is different for different users. But context is something that AI is awesome at, right?
So there's my really rough sort of problem of trying to understand maybe the health of an account where my event logs and the semantics related to them
are actually pretty complicated and highly contextual.
So there we go.
I think it might be fun to frame this
even more specifically,
because Zoe's the Zenlytic agent.
It's like, Eric just asked the question of
hey, how are my accounts doing?
Or how is my account specifically
this one doing? It would be really interesting
to learn from you guys what types
of things can the AI say from that?
And then what kind of context would it need
to do a good job?
And you can go as technical as you want.
And for the sake of argument to set
the table, let's just say we have really good event logging,
which we do from RudderStack.
So we have like a lot of event data.
And then we also have sort of,
let's say your traditional like sort of,
you know, we're ETLing in Salesforce
and, you know, customer success,
and all of that, right?
And so we have all those tables in the warehouse.
Yeah, no, it's perfect.
I think it's helpful to start with a little bit
about how Zenlytic works and how we sort of think about the world. Yeah, the way we think about it is that the data tool
should be trying to build these building blocks that can be used to answer a ton of different questions.
And it should try to add as much context as possible to those building blocks. So what that looks like
is, it's like, hey, this is the logic. This is how we calculate logins. This is how we calculate active users.
Always very complicated to actually do,
but it's like, that's why the data team needs to define it.
We don't think you should be putting that definition off to the business people,
because you're going to get a ton of different definitions.
None of them are going to agree.
It's going to be a disaster.
So that's why, philosophically,
we're like, data team should be defining
what does it mean to be an active user?
How do we calculate gross margin?
Like all of these kinds of-
Or business definitions, yeah.
Yeah.
And part of that is not just defining like the SQL
of like how do I aggregate up something into active users?
It's also like, what does this mean?
Like how is it calculated?
Like why would it be used in a certain way, you know?
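To make the "building block" idea concrete, here is a hypothetical sketch of what one of those definitions might carry: the SQL aggregation plus the plain-English context Paul describes. This is illustrative only, written in Python, and is not Zenlytic's actual configuration format.

```python
from dataclasses import dataclass

@dataclass
class Measure:
    """A hypothetical semantic-layer building block: SQL plus plain-English context."""
    name: str
    sql: str          # how the metric is aggregated over a warehouse table
    description: str  # what it means and when to use it, for humans and the agent

active_users = Measure(
    name="active_users",
    sql="COUNT(DISTINCT CASE WHEN event = 'login' THEN user_id END)",
    description=(
        "Users with at least one login in the period. Defined by the data team; "
        "note that for pipeline-style products, low logins do not imply low usage."
    ),
)
```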
So in addition to, let's say we've got our logins metric, like, you know, how often are people logging in?
We've also got like product usage.
We've got, you know, some meta-level context on, like, what is, if we're going with RudderStack as an example, what is RudderStack?
What do they do?
Like, what is the company?
And you've got these contexts at these different layers.
The most important one being like, okay, this is what product usage looks like.
Like this is the amount of like gigabytes of events that have been logged by whatever
customer we're talking about here.
So when you go in and you ask Zoe, hey, like, you know, can you tell me about my customer
health for, like, XYZ customer,
she's going to go in and she's going to search in this semantic model for, like, XYZ customer
and then any other terms that
she thinks could be relevant. So she'd look for, like, usage, health, in case we have a health score,
logins, activity. She'd just search for a bunch of these different terms. And then we'd probably
have a ton of stuff come back. So it'd be like, okay, I see, like, you know, gigabytes used,
I see logins, I see, you know, number of events streamed. I see, you know, like, session duration on the site.
I see like whatever other stuff we're tracking there.
And then since she's able to, like, run more than one query too,
she could say, hey, let's look at, like, logins and session duration.
Those are over here.
Let's look at, like, quantity of usage,
like events and, like, gigabytes and everything that's streamed, over here.
And then she might be able to say, okay, well,
it looks like this customer, you know, has kind of not that many logins.
Like, they got, like, two logins in the last week,
but they also had, like, 80 gigabytes,
you know, of, you know,
actual information transferred.
So they're, you know,
pretty heavy users of the product,
regardless of them logging in.
And she's going to be able to actually go
and reason about that and say,
okay, let's look at this more holistic picture, because she can do more than one thing. Like, it's not helping you run one query, it's able to
actually go pull a few different things and then, you know, deliver the summary. And it's saying, like,
there's not a lot of logins, but there's a lot of usage. That's going to kind of be the summary, and
you're going to be able to say, okay, well, are they healthy or not? What else do I need to know if they're healthy or not?
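A rough sketch of the flow Paul just walked through: search the semantic model for relevant fields, run several queries rather than committing to one metric up front, then summarize. Everything here (the toy semantic model, the stub warehouse, the helper names) is invented for illustration and is not Zenlytic's API.

```python
# Hypothetical sketch of the multi-query flow described above. The semantic
# model, query layer, and summary are stubbed with toy data.
SEMANTIC_MODEL = {
    "logins": {"description": "weekly logins per customer"},
    "gigabytes_streamed": {"description": "event volume delivered per week"},
    "session_duration": {"description": "average session length in minutes"},
}

TOY_WAREHOUSE = {  # invented numbers for one customer
    ("acme", "logins"): 2,
    ("acme", "gigabytes_streamed"): 80,
    ("acme", "session_duration"): 14,
}

def search_semantic_model(terms):
    """Return every field whose name or description mentions a search term."""
    return [
        name for name, meta in SEMANTIC_MODEL.items()
        if any(t in name or t in meta["description"] for t in terms)
    ]

def run_query(field, customer):
    return TOY_WAREHOUSE.get((customer, field))

def customer_health_report(customer):
    # 1. Search the model for anything that might bear on "health".
    fields = search_semantic_model(["login", "usage", "session", "stream"])
    # 2. Run several queries instead of committing to one metric up front.
    evidence = {f: run_query(f, customer) for f in fields}
    # 3. Summarize holistically (an LLM call in the real system; a string here).
    return f"{customer}: {evidence} -> few logins but heavy event volume, so likely healthy."

print(customer_health_report("acme"))
```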
Yeah, it was either...
This is a perfect segue.
Ryan, it was either you or Paul
that posted, I think it was a week or two ago,
about one of the use cases, one of your customers.
It was this unlock
for them of like, oh,
I can run 10 scenarios at once.
What would take me...
I got to do it one at a time as a human.
I can say, hey, customer health.
What's customer health?
And keep it really broad.
See 10 or 12 different things.
Be like, no, yes, yes, yes.
And then continue to drill in.
Whereas as a human, you're just going to like,
customer health, whatever comes to mind.
Oh, I need to look at logins.
And you go down that road.
And then you're like, oh, well, logins isn't good, and you go to the next thing.
So I feel like that was a cool thought, even for me, because as an analyst, of course, I would
treat it that same way, as, like, well, which one do I think is best? I'll look at that first, then I'll go to
the next one. But you're not limited that way. Yeah, and especially time-wise, I think it matters,
because if someone asks you this fairly broad question, I always get this, like, sinking feeling in my stomach,
because I'm like,
oh, where do I even start?
Yeah, yeah.
There's so many things
I could look at.
Like, do I look at all of them,
or do I look at some of them,
and if I only look at some of them,
which ones?
But then, like,
you ask a system like Zoe,
and it's like,
I could look at all of these things,
and you're like,
yeah, you go and do that.
Like, I can go get a cup of coffee
or something while you work through it.
Yeah, yeah.
Because I think that's where
the context,
that's a much more articulate way to explain
what I meant by context because even if we think about
something like product health, it varies significantly
on how you slice the business.
We have a free account that's trying the product
versus a large enterprise
that's paying us a lot of money.
Product health is really different.
Adoption happens at different rates.
And this is, I think, where the map is not the territory
causes huge problems.
We have a product health score.
And it's like, great.
It actually is a bunch of different product health scores
because you can't distill all users or customers
down into a single composite.
Like it all comes back to map design.
So in that case, that's the equivalent of the map
showing the UK 400 miles north
of where it really is. And if you go to sail to the UK,
you're going to miss it, basically, because the map is not right.
So it's like, the question becomes
what are the right primitives?
What are the right Mad Libs that you can give?
Whether it's an AI analyst or human analyst,
when you set those primitives, data teams have a tremendous amount of power
to shape how an organization thinks.
And it's like, if you start putting the wrong metrics in there
that don't let you account for that context,
then an AI analyst will probably use them incorrectly,
and people will probably use them incorrectly too.
But if you set those properly,
I guess in that case,
our goal is to bubble up
all the most relevant information.
There's still always going to be a synthesis step
at the top for a human.
It's like, yeah, Zoe can summarize things
and talk a little bit about it,
but we fully expect that the human's
going to review all the data
and make a decision based on that.
And our objective is really to make sure that they have a really fat pipe
to that data they need to make a decision.
A couple of specific questions, and John, you may have a couple too,
because I know we're close on time.
But in terms of the product experience, can I bring my own primitives?
So let's say I have, as an analyst, I've generated,
I have some sort of definition of active users,
you know, that's represented as a model or a table or whatever. How does that work? Because
there's obviously a semantic layer here. Does Zenlytic provide that? Can I plug my own pieces
into that? Just from a product experience at a company, let's just use RudderStack, right?
We have models that are running, we have reports or whatever. So if I'm onboarding into Zenlytic, what do I need to bring? What do I need to develop?
How does the semantic layer work? Yeah, great question. Our interface is always
a table in the SQL warehouse. It's like, as long as you can define some SQL to aggregate up active
users on some table in your warehouse, then that's all you need to kind of get going.
We basically sit on top of those tables. And the kind of things that we expect you to define on top
of those are, like, any additional sort of English context you need, and sort of the aggregation,
like the measure, like how do you calculate gross margin, how do you calculate active users.
With those building blocks, then we can just kind of wheel and deal and combine those
however she needs to answer the kind of incoming questions.
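As a hedged illustration of "sitting on top of a table in the SQL warehouse", here is one way a measure definition like the ones above could be compiled into a query against an existing table. The table and column names are invented; the point is only that the analyst supplies the aggregation SQL and the tool wraps it.

```python
# Hypothetical sketch: compile a measure definition into SQL against an
# existing warehouse table. Table and column names are invented for illustration.
def compile_measure(table: str, measure_sql: str, group_by: str) -> str:
    return (
        f"SELECT {group_by}, {measure_sql} AS value\n"
        f"FROM {table}\n"
        f"GROUP BY {group_by}"
    )

print(compile_measure(
    table="analytics.events",
    measure_sql="COUNT(DISTINCT CASE WHEN event = 'login' THEN user_id END)",
    group_by="date_trunc('week', event_timestamp)",
))
```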
Love it. I have one more question before we end, but did you have any other questions? Oh, I gotta give you the last word. Yeah, I appreciate that. I've used the product.
So I guess my, actually, one question, this is kind of future-looking: when do you think
AI agents in general, and I think this is a general question, will be better
at knowing what they don't know,
and be able to better integrate into
project management
and things like that? Because I think that
to me would be a really interesting
component to this.
Yeah,
definitely. I think part of that
is there's two components.
One is the underlying models,
as they get smarter, will be, like, less falsely confident. The other one is, it's kind of like,
the kind of fine-tuning you do on them does actually shape this kind of behavior. Like, this
is the right kind of behavior to shape with fine-tuning. Whereas, like, some behavior, you want
it to be just sort of, like, how you tell it to behave, if it's, like, in line with how it's been trained so far. But
there's other things where you want it to
not be confident.
You know, you want it to have a little more
granularity there. I would say, if you want a really
concrete answer, I think we're
not going to get all the way over there, but we're going to see a step
change in LLMs being able to understand
what they don't know when the
general release of reasoning models comes out,
which no one knows
for sure. OpenAI is the furthest ahead with this. No one knows for sure, but the rumors are that
will be this year. Wow. Love it. All right. One last question for you, Paul, which is maybe the
hardest question. And that is, how did you come up with the name Zoe for your AI? I mean, I feel like that's the hardest thing
for any AI company is to name their agent, right?
And then defend the name against the other agent.
So I think we've got a good,
I think we've got a good case here, actually.
So again, like I studied, you know,
BERT, ELMo, Big Bird,
like all the initial transformer models
were actually named after Sesame Street characters,
believe it or not.
I did not make that connection.
Wow.
So Zoe is the only Z-named Sesame Street character.
And then Zenlytic, obviously.
You know, we wanted something that's sort of, like,
close to us alliteration-wise.
So Zoe was, like, the sort of obvious choice for us,
because we wanted to sort of pay homage
to the original, like, transformer models
and be sort of consistent with the Z branding with Zenlytic. Wow. Zoe was always
the obvious choice. Wow, that is awesome. I did not put that together. Yeah. And you have,
like, you got the Z, right? So it's yours. Yeah, exactly. We got the Z, which is not always
a good thing. Sometimes you want to be at the top of the list, not the bottom of the list. Yeah, that's true.
That's true.
Awesome.
Well, Paul and Ryan, thank you so much for joining us.
I learned so much.
And yeah, it was fun talking about mental models and everything.
And we'll check out the product.
It sounds awesome.
Loved it.
Thank you guys so much for having us.
It's absolutely a blast.
The Data Stack Show is brought to you by RudderStack,
the warehouse-native customer data platform.
RudderStack is purpose-built to help data teams
turn customer data into competitive advantage.
Learn more at rudderstack.com.