The Data Stack Show - 217: Bridging Data Models with Business Intuition with Zenlytic’s Founders Ryan Janssen and Paul Blankley

Episode Date: November 27, 2024

Highlights from this week’s conversation include:

- Ryan and Paul’s Background and Journey (1:05)
- Excitement about AI and Data Intersection (2:50)
- Evolution of Language Models (5:05)
- Current Challenges in Model Training (6:51)
- Founding Zenlytic (9:12)
- Integrating Vibes into Decision-Making (12:58)
- Precision vs. Context in Data (15:03)
- Understanding Multimodal Inputs (17:47)
- The Challenge of Watching User Behavior (19:26)
- Empathy in Data Analysis (21:32)
- AI in Analytics (23:18)
- The Complexity of Data Models (25:33)
- Self-Serve Analytics Definition (28:15)
- Evolution of Self-Serve Analytics (32:09)
- Distillation of Data for End Users (36:44)
- Challenges in Data Interpretation (39:22)
- Building a Semantic Model (44:18)
- Using AI for Comprehensive Analysis (46:51)
- Future of AI in Analytics (51:31)
- Naming the AI Agent (52:53)
- Final Thoughts and Takeaways (54:21)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.

Transcript
Discussion (0)
Starting point is 00:00:00 Hi, I'm Eric Dodds. And I'm John Wessel. Welcome to the Data Stack Show. The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work. Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. Welcome back to the show. We are here with Ryan Janssen and Paul Blankley from Zenlytic. Gentlemen, welcome to the show.
Starting point is 00:00:36 Thanks for having us on. Super excited to chat today. All right. Well, give us just a brief background. You have different backgrounds, but they actually converged at one point. So Paul, why don't you start and then tell us where your path crossed with Ryan's? Yeah, so I'm a nerd's nerd. I was math and CS undergrad, math and CS grad. And Ryan and I met actually doing a technical master's degree at Harvard, studying language models. And this was right around the year that Attention is All You Need came out and transformers were, like, sort of first becoming a thing. So we got to see a lot of the really early versions back when they
Starting point is 00:01:09 were language models, before they became large language models. And after that we started consulting, did consulting for a few years, and then started Zenlytic during the pandemic. Right, Ryan? You're the other half of the story. Yeah, well, my background is, uh, I was a software engineer at the very start of my career in my native Canada. But then after that, I've spent coming up on 15 years now in sort of the last mile of data analytics. And, you know, first I was a VC, you know, slash Excel monkey. I went to school, became a data scientist. So I worked in data science for a bit.
Starting point is 00:01:42 And, you know, Paul and I, that's where we met. In fact, we started a data science consultancy together, and then we founded Zenlytic together. And all of those have been different parts of the same problem, which is: either I'm a non-technical end user, or I'm kind of a semi-technical analyst, or I'm a very technical data scientist, all trying to sort of solve problems with data.
Starting point is 00:02:01 So guys, before the show we talked about data versus vibes, and, you know, founders or CEOs running companies on sometimes a combination of both, and sometimes, you know, a little bit more slanted toward vibes. So I'm excited to dig in on that. What are you guys excited about? I'm excited for that one, because I think that hits on a really important point
Starting point is 00:02:21 that I'm excited to sort of expound on. And other than that, I'm excited to dig into just, you know, what is possible, what is not possible with language models, how, you know, how can we kind of fit language models into how we as humans sort of think about and operate in the world. And talk a little bit more about how the way language models work actually affects what we do at Zenlytic, where we are very AI-native,
Starting point is 00:02:45 like an AI-native-first sort of business intelligence brand. Awesome. What about you, Ryan? Yeah, excited for all those. Really excited to chat about the intersection of AI and BI, or AI and data in general, which is like, how do we get AI agents to answer problems in data? And it's a really hard problem, frankly,
Starting point is 00:03:02 because you've got this huge surface area of potential data types and configurations on one side. You've got this huge surface area of questions people want to ask on the other side. There's a little pinch point in the middle. So fascinating field to work in. And LLMs, there's just new stuff every day. So lots of stuff to talk about there. All right. Well, hopefully we can get to all of that. So let's dig in. Let's do it. We have so many questions that we want to get to, but I'd like to start actually with a little bit of history where your paths crossed.
Starting point is 00:03:37 So it was at Harvard, and you were studying language models in the context of machine learning. And Paul, in the intro, you said that you were studying language models before the additional L got added to the acronym. So can you just talk about what were you studying? How did you think about it? Did you perceive it as a tectonic shift? And Ryan, you were a VC prior to that experience. So yeah, I would just love to hear from both of you about what you were studying and what that was like, and then how it informed,
Starting point is 00:04:10 you know, founding Zenlytic. Yeah, totally, I'll dig into that a bit. One of the things to kind of remember from that time is that transformers in general were this sort of huge shift, because before transformers you had these things called recurrent neural networks, which had these problems with memory and with being able to generate anything. You tried to solve that with this other architecture called LSTMs, and all of these were sort of more complicated but less effective versions of transformers. The innovation in transformers was just realizing that the attention mechanism was kind of all you needed to actually do a really good job of predicting sequences.
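To make the attention idea concrete, here is a minimal sketch of scaled dot-product attention in plain NumPy. It is illustrative only, using a tiny random "sequence"; real transformers add learned query/key/value projections, multiple heads, masking, and positional information.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Each output position is a weighted mix of all values in V,
    weighted by how well its query matches every key."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # similarity of each query to each key
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V                       # blend the values

# Toy sequence of 4 tokens with 8-dimensional representations.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = attention(x, x, x)  # self-attention: the sequence attends to itself
print(out.shape)          # (4, 8): one contextualized vector per token
```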
Starting point is 00:04:51 And so BERT, which is kind of the initial transformer, if you will (there's a bunch of others in its class, but that was sort of the initial, like, groundbreaking one), was just dramatically better than anything else that people had seen before. And again, this is not like it's generating, you know, speech that sounds like a human; it's not going to pass the Turing test. But at all the things that it was evaluated on, it was pretty dramatically better than everything else. So we definitely did not know it was going to get here. But you can see with transformers, it was like the unlocking of a new architecture. And whenever there's a new architecture that does, you know, unreasonably well at something
Starting point is 00:05:23 compared to previous generations, you're on this trajectory where it's going to just get better. And you can pretty reliably, like a Moore's Law of sorts, just kind of be like, hey, it's going to get this much better every year. And that was true until ChatGPT kind of broke
Starting point is 00:05:40 that and went pretty exponential. Yeah. Yeah. When Paul and I actually were studying this, I don't know if there was the second L. I think they were just models at that point, but I wouldn't even
Starting point is 00:05:50 call it a language model. We were early, early on in the days. There were these tools where you could, like, prove that, you know, king minus man plus
Starting point is 00:05:59 woman equals queen, these very basic tools for performing, you know, algebra on individual words.
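The word-vector algebra Ryan describes can be shown with a toy example. The 3-dimensional vectors below are hand-picked so the arithmetic works out; real systems like word2vec learned vectors with hundreds of dimensions from large corpora.

```python
import numpy as np

vocab = {
    "king":  np.array([1.0, 0.0, 1.0]),  # royal + male
    "queen": np.array([1.0, 1.0, 1.0]),  # royal + female
    "man":   np.array([0.0, 0.0, 1.0]),  # male
    "woman": np.array([0.0, 1.0, 1.0]),  # female
}

def nearest(v):
    """Return the vocabulary word whose vector has the highest cosine similarity to v."""
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(vocab, key=lambda w: cos(vocab[w], v))

print(nearest(vocab["king"] - vocab["man"] + vocab["woman"]))  # -> queen
```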
Starting point is 00:06:19 And it wasn't really... I think, you know, BERT was a huge step change, and then I think watching the GPT-2, GPT-3 progression is when things became really apparent, that there was plenty of room at the top for these models. And everyone kind of had a fundamental understanding that these things are predicting the next token or next word. There's a big question there, because, you know, the early models are very word salad, right? They almost made sense. And then like GPT-3, then you get a paragraph, but there wasn't alignment across paragraphs. And you could see it was becoming more and more coherent. And, you know, for me,
Starting point is 00:06:35 the watershed moment was right about that, like, GPT-3 level, when I was like, okay, these things are actually being able to sort of demonstrate early what looks like understanding to us. And, you know, that was kind of a turning point, because that's when we could start thinking about this in terms of scaling laws. Which is really cool. We actually have a really predictable trajectory for this stuff, because we had an understanding of what you could put into it. Right. And it's like, so can we put in more compute? And it's like, yes. And you 10x
Starting point is 00:06:59 the compute and it gets, you know, 50% better or whatever. It's like, can we put in more data? And it's like, yes, you 10x the data and it gets better. So we not only had a good idea of what the inputs were to improve the model performance, and I'm saying we collectively, like, we as a research community. We could also even predict the trajectory from them by seeing what would happen when you scaled up. And the question then is like,
Starting point is 00:07:20 where does that take us today? Which is kind of an interesting question, right? So a bunch of those scaling opportunities have been kind of tapped out, right? So if you think about, you know, 10x-ing the data, we can't, because these LLMs are basically using all the data, you know, like they're training on the entire internet. Yeah, and if you want to 10x the compute, you 10x the cost of that,
Starting point is 00:07:39 like, you know, they're saying the next class of models is a billion dollars to train a model, and then you 10x that, it becomes $10 billion. You know, there's not many 10x that, it becomes 10 billion. There's not many 10x's left in that dimension, basically. I think that's why some people are saying, oh, things are rounding off now, and there's a couple things that have to happen next. One thing that might happen is a new architecture,
Starting point is 00:07:58 these architectures that are more efficient and they can learn faster, which is what transformers ultimately were. The other thing that could happen, actually, the one area where we still have a 10x left to scale, is inference time. And inference time is: how long does the model run for? And now you might have heard of, like, you know, o1, for instance, which is a reasoning model that uses longer inference. And instead of answering in 100 milliseconds, it takes a few seconds to think and then gets back to you. Yep. You might have heard of Devin, the software development engineer agent. And, you know, its performance was a step change. And they got that by letting it run longer; it could run for 24 hours at a time
Starting point is 00:08:30 on a program. So there's still plenty of room at the top in terms of inference time. And that, I think, is why we're seeing the emergence of AI agents now. Because a fundamental part of an agent is actually expandable inference. So I think that's the next step in the scaling law. After that, all of the big axes might be tapped out, and we might need to find some sort of architectural change, which is a hard problem, but that's the next big unlock, I think.
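The scaling-law intuition in this exchange can be sketched as a power law: loss falls by a constant fraction for every 10x of compute, which is what made the trajectory predictable, and also why progress slows once the affordable 10x's are used up. The constants below are invented for illustration; they are not the published coefficients from any scaling-law paper.

```python
def loss(compute, a=10.0, alpha=0.05):
    # Power-law form: L(C) = a * C^(-alpha)
    return a * compute ** -alpha

for c in [1e21, 1e22, 1e23, 1e24]:  # each step is 10x more compute
    print(f"compute {c:.0e} -> loss {loss(c):.3f}")
# Every 10x multiplies the loss by the same factor (10**-alpha, about 0.89 here),
# so improvements are predictable, but each one costs 10x more than the last.
```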
Starting point is 00:08:53 Yep. Connect the dots between all of the things you just... You're in the middle of this learning. You guys did some consulting, but then you founded a data company with AI at the center of it.
Starting point is 00:09:13 I'm interested why, when I say data company, let's just say analytics or business intelligence company with AI at the center. Why did you go there when you could have gone so many different places in terms of building a company around AI? I think a lot of it was that Ryan and I both really liked data. It's just something that we feel comfortable with.
Starting point is 00:09:40 And when we did consulting, we got to see and experience a lot of the problems that we solve today, firsthand. And I think that kind of firsthand exposure to the problem gives you insight into it in a way that, if you don't have firsthand exposure to the actual problem, you're not going to be able to empathize with the users of your product very much. So that's something that's really important to us, is that we've got to actually be able to empathize with the users. And that led to the biggest problem that we saw in our consulting, which was this last mile. We would do so much work to, like, set up Snowflake, set up BigQuery, make sure, you know, everything's clean and relatively easy to use, and then it would just, you know, lay untouched in Tableau or Power BI.
Starting point is 00:10:20 Yeah, it was like, okay, well, this is a problem, then. You know, it's like 10% of any organization can actually use the data that they produce. And that's like this massive bottleneck on the rest of the org that just wants to know basic things about,
Starting point is 00:10:33 you know, which campaign should I be investing more money in, you know, and other things like that. So, Ryan, did you want to
Starting point is 00:10:41 add on to that? No, totally agree. It's just that, you know, data's cool. Well, finally, data's cool. I think we all think data's cool.
Starting point is 00:10:46 But I think there's this kind of, the feeling in the data community is like, do we add value? Benn Stancil talks about this in a blog post, where it's like, at the accounting convention, the accountants don't get together and say, does accounting add value? And I just feel as a community,
Starting point is 00:11:01 we've always kind of had a little bit of anxiety about being on the sidelines a bit. And I think that the root of that problem is that it's hard for people to use data, and that LLMs are kind of an unlock for making it easier to use and access data. So those two things together might be what takes us off the sidelines and into really well-adopted, well-used tooling. And that's what gets me excited.
Starting point is 00:11:21 So let's dig into, John, a question that you brought up that you were excited to talk about. And it's this concept of vibes are stronger than data. And you wanted to, so I love that. It sounds like a t-shirt. It sounds like a meme. I'm sure it already is. But dig into that. What did you mean by that when you asked that question or brought up that topic? Yeah. So we were talking before the show about that there are certain companies, a lot of times very founder-led companies, where the founder somehow just gets locked in on the product, maybe from talking to people, from vibes, and can really grow and scale companies to surprising sizes with this, like, vibe, gut-reaction type of ability. And I would even say, in some of those situations, if you try to move that company to, like, no, cancel the vibes, let's just make
Starting point is 00:12:22 all the decisions on the data now, you'd probably do some damage, at least, you know, especially if the company's at a certain scale. And then the vibes do run out. Like, there are certain companies where it's vibe-driven, and then you hit a point, and it's like, all right, the vibes are running out, for whatever reason, scale or whatever, and then it's like, do we need to, like, kind of weigh more heavily toward data? But yeah, so, right, you know, both Ryan and Paul, you've done some consulting,
Starting point is 00:12:48 obviously worked with a lot of companies with data. What are your thoughts on that? How did Ryan model vibes into his VC spreadsheet? Yeah, right, right.
Starting point is 00:12:57 Is there a weighted model for vibes? Yeah, that's right. You know, the funny thing is, actually, that is a big part of VC. It's like,
Starting point is 00:13:05 you know, I don't know if they actually say the vibes are through the roof. I don't know if that's, like, explicitly modeled, but it definitely plays a role. I mean, it was tongue in cheek, but for sure. Yeah. Yeah. Well, yeah, it's interesting. Like, the thing we said before the show is that data is strong, but vibes are stronger.
Starting point is 00:13:22 That's right. You know, I think that's probably true, right? And like the mental model that I use to think about the world is that human beings are, you know, feeling machines that happen to think. You know, so we feel first and think later. And I think that everyone likes to pretend that we're all very rational and predictable and everything. But, you know, in reality, it's the feeling brain that's running the show.
Starting point is 00:13:44 So like, that's why vibes become important. And that's very difficult to, you know, it's not only not data-driven, but it's very difficult to model in data, to capture that. So like, I think the right approach
Starting point is 00:13:55 needs both, really, you know? And it's like, there's times when you have to really be thoughtful and like actually use data, understand something at a high level of precision. And there's times when the broad strokes are important
Starting point is 00:14:05 and it's more driven by how people think or gut feeling or whatever. And I guess the hard part is knowing which to employ when. Well, one of the things that I would throw in, where I almost would disagree with Ryan's last point about precision a little bit, because the thing is, data strives to explain what's going on. It's like, you sell things or you have these transactions,
Starting point is 00:14:27 and you can see like specific events that happen. And it's precise in the sense that you can very accurately calculate revenue, but it's imprecise in the sense that you lose a lot of like the feel of something
Starting point is 00:14:38 or the sort of surrounding context that you just can't meaningfully capture in data. And I think a good example of this, I want to borrow Benn Stancil's analogy: it's like, if you run a failing bar, are you going to go and, like, look at data, all your bar tabs? Or are you going to watch videos of all of the people who went to your bar and weren't happy, to hear from them why? You're going to get so much extra information from the actual video, way more than you are from a transcript. I think another good example on this vibe thing is, like,
Starting point is 00:15:06 Dylan Field, CEO of Figma. Figma is a massive company. He'll get in there and, like, read customer support tickets that come in, because it's just a really non-lossy representation. Yeah, the sample size isn't big, but he comes in, he gets this, like, really fat pipe to what the customers are actually asking about. And yeah, it's not a big sample size, but it's also not, like, cleaned up.
Starting point is 00:15:27 It's not like, oh, we trimmed out the outliers that don't ask that often. And in that sense, it is a lot more precise, because you just see everything. You see all the dimensions, like the video, people's facial expressions, and that's where you get this gut feel. So I think when people say, like, you know, use your gut versus data, your gut is kind of the distillation of every single data point you've ever seen in your life. Multimodal, too. Yeah, multimodal, exactly. Because, like, data isn't just stuff that sits in a spreadsheet. Like, data is all forms of, you know, comprehension that we take in. Yeah. It's, I, I do think there's
Starting point is 00:16:02 actually a whole class of person who says, oh, I don't need data to make decisions, I just trust my gut. You hear that a lot. And every time I hear that, I think, well, what do you think you're putting into the gut to, like, feed that gut, you know? And I think that is the right chain: get some good data, put it into your intuition, let your intuition figure it out, and then make a decision and action on that. Yeah. Yeah. Yeah. I mean, I think that's why, I mean, this is slightly tangential, but why you want to bring in someone at a certain stage of a company
Starting point is 00:16:31 who has a lot of experience solving similar problems because they put a lot of data into the gut, right? And so their intuition, they probably have really good intuition even if it's a different context. Data gut health. Is that a supplement you guys sell? Kombucha derivative.
Starting point is 00:16:53 Hey, Coalesce 2025. That's right. That's the next bit. Go to the Zenlytic booth and get your data kombucha. Incredibly sick from the Zenlytic swag. Yeah, that's great.
Starting point is 00:17:05 Um, and then I have a question for Paul about your lossy representation. Oh, this is my podcast, no, I'm sorry, I'm taking over. It is your podcast.
Starting point is 00:17:13 So, the lossy representation. The things you mentioned are mostly textual, right? So it's like, the thing with the CEO
Starting point is 00:17:21 is the CEO is reading these things individually. And is it possible that lossy representation has been because we haven't been able to understand text? But now, very recently, we have pretty good tools for processing and structuring raw text. And, you know, if we increase the fidelity of that representation, does that mean that the CEO of Figma should actually be looking at structured data from those text representations?
Starting point is 00:17:46 Yeah, no, I think that's absolutely right. And I think the better we get at understanding all these sort of multimodal forms of input, like the better, the higher fidelity actual signal you would get from those things. Because it's like, a good example actually that we did as a company, right? It's like we would watch videos of customers
Starting point is 00:18:04 use the product when we first launched. We'd see where they get stuck, you know, you see their mouse move when they get a little frustrated and they're, like, not sure exactly what to do next or what to put in. Really high fidelity, but it's something where you just have to be there; you can't watch all the sessions. And so you've got to then do, you know, event tracking. You go and you track events and all this: how often are people logging in, how often are they viewing dashboards, how often are they doing this activity, when are they asking questions. And you have this representation that's a lot lossier now. Like, you don't see the little frustrated thing right before they click on the thing, but it lets you view things at a higher scale. And I think what Ryan's alluding to, which I think is absolutely
Starting point is 00:18:40 right, is the better and higher fidelity we can process these inputs, and effectively aggregate these inputs in a way that we weren't able to aggregate them before, you'll be able to get just much higher fidelity signal on what people are actually doing, like answering the actual question you're trying to answer. Right? It's like, if you had time to watch every single video of every single person using, like, our product or using something else, you'd have a great intuition for what's going on there. But no one has that time. It's just, no one lives
Starting point is 00:19:12 that long. And it's like, how do you aggregate that? Actually, an example of that is, y'all ever do the thing where you search for something, you want to look for something, and the answer is in a YouTube video, and you're like, I don't want to watch this whole YouTube video? That's great, but I just want a quick answer. I find myself increasingly,
Starting point is 00:19:27 when that happens, I search in Perplexity for it. Perplexity has indexed that entire video. So, you know, put the same string into Perplexity, even put in the link to the video, and Perplexity will get the answer out of the video without you having to go through and find it inside 45 minutes of video, which I think is cool.
Starting point is 00:19:43 So it's like compressing that much more rich format into the answer that you need. Let's talk a little bit about this mental model of the map not being the territory. Which I think is a fascinating subject, because data is a, and Paul, you had a couple of very elegant ways of describing this that I'm going to butcher
Starting point is 00:20:08 just like I butchered the vibe statement. But you talked about how it's a distorted distillation. Data is a distorted distillation of reality, right? And we just covered a couple of these things, right? Like, what are you losing when you go from watching all of these videos of users to just looking at essentially a log of behaviors, right? Like, you lose something there. You know, one thing that's interesting is even just the act of watching
Starting point is 00:20:37 a user and perceiving that they might be frustrated develops a certain level of empathy that I think is almost impossible to get by looking at a log of behaviors. But can you speak to that in terms of, as you've thought about Zenlytic, right, you're managing the loss of that in some sense, right? You're trying to create a controlled loss of reality, right? So like, you're building a map that speaks to the territory but is not actually the territory, right? Because, you know, that's really difficult. How do you think about managing that process for users who are trying to, you know, trying to use data in a useful way? Yeah, I think the map is not the territory is a great way to think about this.
Starting point is 00:21:23 Because, again, it's like, data is going to give you good insight on, like, high-level things, sort of the aggregate. But you lose a lot of this intuition of, just like, you see someone get frustrated, you see this problem. So how I think about balancing that is that, at some point, you just can't watch all the videos, you can't read all the tickets. The volume just gets too high to be able to handle that. But on the two axes, you need to, of course, have the high level, like how many times are people logging in, like that stuff is important. But you also need to dive back into the, just like, raw feed, if you will. Like, you know, talk to the customer on a video call. This is why one of the sort of perennial pieces of, like, startup advice is: talk to your customers. The intuition behind that isn't just that they are going to tell you what to build, because usually they won't. They will have great feedback.
Starting point is 00:22:09 And you'll get all these sort of nonverbal things. And how does the product make them feel? All these other things that are really important that you just don't get if you're looking at logs. So that's why that advice works, too. Because it forces you to get back into the actual reality of how are people experiencing this, and deal with all the feedback, and
Starting point is 00:22:30 do something to make that experience better. Can AI help close that gap, I guess, to ask a direct question about Zenlytic? Is that part of the hypothesis, where you can actually draw some of the, like, territorial characteristics
Starting point is 00:22:45 out with AI that it's really difficult to do, let's say, with a traditional BI tool? I think it can help with it, for sure. I don't think it fully replaces it, no matter how good it is. Because remember, it's not just that you have this data at the end.
Starting point is 00:23:02 It's part of, like, the training of your, you know, think about your brain as like a neural network, right? It's part of, like, the training of your own weights on how you think about something. So I don't think it can ever, like, fully replace that. I just don't know if it's actually fundamentally possible. But it definitely helps. And one of the big ways in which it helps is that you're able to ask things at a higher level than you would before. Whereas before, you would have to say, I want the number of logins weekly for this customer or whatever.
Starting point is 00:23:30 And now you can just sort of say, hey, how is this customer doing? Is there anything I should be concerned about? And maybe it chooses, it being an AI agent, it chooses logins, dashboard views, chat questions, apps, a lot of these other sort of interaction metrics that it has. And it can give you this more holistic view, and maybe think of things that you wouldn't have thought of. And a lot of that sort of, give me a hypothesis
Starting point is 00:23:56 and let me look at a bunch of different things and go and look at all of them and then kind of summarize them for me. That gives you the ability to cover a lot more territory a lot faster. And that's, I think, one of the big advantages that we can actually provide as a product. You know, Eric, one common manifestation that we see of that is when we give someone a sort of a higher precision view of data, which is our goal, right? It's like, you know, help everyone see the
Starting point is 00:24:19 data more easily. And we often get into a situation where it works really well, but it works so well that it exposes a lot of underlying issues in the data. They kind of see it for the first time, and they're like, oh, we've got all these data quality issues. And there's a bit of an existential crisis around that. It's net good. If those were floating around
Starting point is 00:24:40 for years and you didn't know it, it's like, was that data valuable? It's great to reveal that, but that's definitely sort of symptomatic of this map-versus-territory thing. Yeah, yeah. I'm going to take the argument against Paul, where sometimes it's good that the map is not the territory.
Starting point is 00:24:56 And, you know, so like, we've been talking a lot about models, right? The map is a model. We have mental models. We have LLM models. We have data models. And it's cool. Like, our world is all models.
Starting point is 00:25:04 And those models, you can scale them up or down. There's a famous story about a map of the UK. The simplest model is an oval or whatever and it gives a rough shape. And then you can make a higher precision one that shows the shoreline and higher precision, higher precision. And if you wanted to make the model
Starting point is 00:25:19 completely, perfectly match reality, you could. But then the model becomes reality. The map becomes the same size as the territory. Right. So there's a trade-off between complexity and expediency here. And sometimes it's okay to have a simple representation. I think a lot about the concept of tolerance in data. And when you're a mechanical engineer
Starting point is 00:25:39 and someone asks you to build an aircraft blade or whatever, you give them a spec, but you say, I want this to be a 16-inch blade. But you don't just say 16-inch blade. You say, okay, it has to be 16 inches plus or minus an eighth of an inch. And you give an acceptable range where it would be wrong, right? Yep. And we don't have a version of that in data, really, right?
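As a sketch of what Ryan's missing "tolerance for data" could look like in practice: declare an acceptable band for a number the way a mechanical spec does, and only treat values outside it as wrong. The class and the example values below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Tolerance:
    target: float
    plus_minus: float  # acceptable deviation, in the measure's own units

    def check(self, observed: float) -> bool:
        return abs(observed - self.target) <= self.plus_minus

# The aircraft-blade spec: 16 inches, plus or minus an eighth of an inch.
blade = Tolerance(target=16.0, plus_minus=1 / 8)
print(blade.check(16.1))  # True: within spec
print(blade.check(16.3))  # False: out of spec

# The same idea applied to a metric: warehouse revenue may drift from the
# billing system by up to 0.5% before anyone gets paged.
revenue = Tolerance(target=1_000_000, plus_minus=5_000)
print(revenue.check(1_003_200))  # True: close enough to act on
```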
Starting point is 00:25:57 But it's funny, because it is really important. There are certain times, if you're running a high-precision experiment where the difference between false positive and true negative is a tenth of a percent or something, you need very high-quality, high-precision data. If you're running an e-commerce store and one SKU has double the returns of everything else, it doesn't matter if it's 1.5x or 2.5x or 3x,
Starting point is 00:26:18 you know they're getting a lot more returns. So investing more time in adding precision to that model, making that map better, is not really worth it, because you're already getting, you know, the information you need to make a decision. Yeah. Yeah. It makes me think about, this year, my son in his geography class, they started out with what they call blob mapping. Right. And so it's fascinating. He actually can draw a pretty representative map of the entire world using circles and ovals. And it's like, this is pretty, you know, he has, like, a good understanding of the layout of the
Starting point is 00:26:49 world, you know, on a rectangular map, but it is literally just, you know, ovals, which is interesting. Also, one thing, Ryan, just to empathize with you: we have an identity resolution, it's basically an identity stitching product, at RudderStack, right? So it takes all these disparate tables in your warehouse and creates nodes and edges. And it's super powerful, super useful, but it's also a great way to discover big problems in your data, right?
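For readers unfamiliar with the idea: identity stitching can be sketched as a graph problem, where identifiers are nodes, observed links are edges, and connected components are merged with union-find. This is a generic illustration of the concept Eric describes, not RudderStack's actual implementation, and it shows how one shared identifier can quietly collapse many profiles onto a single node.

```python
from collections import Counter

parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:              # walk up to the component's root
        parent[x] = parent[parent[x]]  # path compression keeps trees shallow
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)          # merge the two components

edges = [
    ("alice@work.com", "device-1"),
    ("alice@home.com", "device-1"),          # same person, two emails: a good merge
    ("bob@work.com", "test@example.com"),    # a shared test email...
    ("carol@work.com", "test@example.com"),  # ...silently merges Bob and Carol too
]
for a, b in edges:
    union(a, b)

sizes = Counter(find(node) for node in parent)
print(sizes.most_common())  # suspiciously large components flag an underlying data problem
```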
Starting point is 00:27:17 Because you just have thousands of things collapsing on one node. And it's actually inevitable, right? That's not a problem, you know, that's not an identity stitching problem. It's actually an underlying data problem, which is fascinating. So, like, same exact thing. Yeah, absolutely. Okay. I'm going to go to this next, I want to talk about the Zenlytic product. We've been dancing around this. I have a very good grasp of, I think, the shared worldview, and then even some of the differences,
Starting point is 00:27:45 you know, which is fun to hear from both of you. But I'm actually going to start, I want to dig in on the topic of self-serve analytics, because this is a big Zenlytic thing. But I'm going to start actually by asking a question to you, John. So leading data at a company where you had all sorts of different data consumers, right? So, you know, you ran marketing for a while, you oversaw all the data, and you had these different stakeholders from the sales team to customer success to marketing to executives. So what is your definition of self-serve analytics? Man, loaded question.
Starting point is 00:28:23 Totally loaded. So loaded. Like, say, 10 years ago, if you told me, hey, self-serve analytics is going to be, like, a controversial position, I would be like, no way, everybody wants self-serve analytics. Yeah, but it's really not. Like, there's been a, and you, I mean, you guys are laughing, like, Ryan and Paul, but there's been quite a backlash against it. So from my perspective, I thought of it probably in two or three categories, like when we're doing evaluation of what tech do we want to use, how do we want to enable people. The first category was, okay, I'll call it the full-feature category: if somebody asks for something, we can guaranteed build it, because we have
Starting point is 00:29:05 every single pie chart, gauge, big number, you name it. We can build it. Optionality in terms of the, say, products or interfaces that you can deliver to your customers, in turn.
Starting point is 00:29:21 Yeah, because if you've been an analyst for any length of time, you will have that one customer like, let's just pick on sales. I always pick on marketing. A sales executive that's like, I gotta have a dashboard and I need gauges and the gauges need to look like this and I need
Starting point is 00:29:36 these colors here. You can get people that are very precise about what they want, down to the color and the type. So there's a whole class of tools that are built so that you can, like, fine-craft very detailed things like that. Yeah. And then there's another class of tools that are more, I would say, built toward optimizing analyst workflow. Yep. Which, I guess, spoiler alert, that's the route we went. It was like, okay, maybe it's a little counterintuitive, but we believed that our best way to do self-serve was actually to empower a couple of analysts to be able to move really quickly in the tool, produce things that
Starting point is 00:30:19 were useful for people. And then the funny part about that is, although we intentionally went with a tool that was selling to and enabling analysts, we ended up with a lot of citizen analysts. A sales manager is like, I want to learn this tool. The analyst part, actually writing a little bit of SQL and tweaking things, they did it. And we had a customer service leader do the same thing. And one or two other people that were a manager or leader in a department. They got very, I mean, very light SQL, nothing crazy typically. One or two of them went pretty deep, but, like, very light technical things, and using the analyst workflow.
Starting point is 00:30:58 It was one of the most counterintuitive things. And this was before AI was an option. But you would think that if you give somebody the most polished, like, hey, look, you just have to click and drag, that's what they would want. But it ended up being stronger to basically have a few, basically one, maybe two key people in a certain area
Starting point is 00:31:22 to really enable them to be fast and awesome. They're the go-to for all of sales, all of marketing, all of customer service. And keep that analyst-centered workflow. That's best for us. Super interesting. Okay, Paul and Ryan, now the question is on to you. It's a great question.
Starting point is 00:31:41 And first of all, I would say self-service in analytics is like a spectrum. It's something where the goalposts have constantly been moving as products have evolved over the years. So if you go way back, people would look at Business Objects and say, like,
Starting point is 00:31:53 that was self-service. Like you just made the cubes and everyone made the cubes and then you could mess around on the cube until you hit the limits of the cube. And that was pretty self-service. Yeah.
Starting point is 00:32:02 And it's like, then people were like, oh, well, actually, that's not really self-service. Tableau is way more interactive: dashboards, and you can just upload your own CSV, and you can really make the visual whatever you want. You want a gauge, and you want to make it blue?
Starting point is 00:32:16 Go for it. Tableau is really powerful on the visualization side. And then people were like, wait a minute, Tableau is still too hard for most people to use. It's just something the analysts are using to make the perfect dashboards for the execs. And then it's like, okay, well, Looker is a lot easier. You don't have to figure out the visualization thing, you just click on the data you want to see, it'll show you the data, it might even show you a visualization. And it's like, there you go, it's a lot easier now. But I think where we come in, kind of
Starting point is 00:32:40 following that, is saying, actually, look, it's still too hard. Like, just look at how many people are actually using it. And the reason for that is that you've got to find the right Explore, you've got to know which data to use in the Explore, you've got to remember: do we trend revenue by, like, the processed date, the created date, the ship date? Like, I don't know, I don't remember. Yeah, actually, I'm just going to ask John. And that's how that process goes, right? Yeah. And it's like, I think sort of the thesis of Zenlytic is that actually the best interface for data is talking to the analyst. Like, you ask for what you want, and the analyst says, like, hey, yeah, I get asked that question a bunch, it looks like this typically. Or, yeah, we don't actually track that. Yeah. That's not in the
Starting point is 00:33:19 warehouse. We gotta, you know, we gotta start tracking that. And it's like, that is sort of the future that we're building towards, because we're really building a coworker. We're not trying to make someone faster using a BI product. We're trying to make something where the analysts can basically give the system the right context, so it can actually just go and do that same work that you were describing, John, and everyone can have an analyst with them. Because the problem with having the analysts do the work is you have a finite amount of them, even if they're sort of citizen analysts who are embedded in the teams.
Starting point is 00:33:49 And the amount of questions that people have is really gated by the number of people that can answer them. So what we're trying to do is build a system where the data people can come in and say, hey, this is how we do things, this is what you need to know about our environment, this is why these things are calculated in this weird way, and this is what you need to know about us. And then Zoe's able
Starting point is 00:34:06 to go and actually answer those things. Ryan? Yeah, well, I agree with Paul. As Paul's co-founder, I agree with Paul. Well, you've disagreed a couple times. That's why I wanted to throw it over to you. No, that's not what we're doing at all. That's not what we're building.
Starting point is 00:34:22 That's not what we're building. Oh, yeah. Group therapy for founders right here. That's what we're doing. That's right. That's what we do. No, a couple things to add. I mean, first thing,
Starting point is 00:34:30 you know, Paul's spot on about like the goalposts have moved. I'd say it might even be more sinister than that and that it's, you know, who benefits. And I think that a lot of people
Starting point is 00:34:39 have benefited from keeping the definition of self-serve as murky as possible. You know, and it's like, because that's something that sells. It's also something that's very hard to do from a product perspective. And, you know, if you look at a platform 20 years ago, they had no hope of actually delivering a self-serve experience, but they still wanted to, you know, call themselves self-serve. And that's been the case ever since, right? Like, people have always gotten
Starting point is 00:34:59 to play fast and loose with that definition, because it benefits the people making the definitions. I think one thing that makes it really crisp for me is, and I'll borrow Eric's PM hat for a second here, I think about the personas using the last mile of analytics. And I think one of the big misconceptions is that there's two of them. People always talk about sort of technical, non-technical. I think there's actually sort of three big buckets. You know, I call them the 1%, the 10%, the 89%. The 1% are the SQL monkeys, right? Those are the people that are, you know, really technical, the people who are building your semantic layers and who are administering your data warehouses and writing your dbt
Starting point is 00:35:36 transformations. That's the 1%. The 10% are the analysts. And this includes the sort of citizen analysts that you were talking about, John. It's quite often they're Excel powerhouses, sort of stretching into sort of, like, enthusiasts, and they'll sort of dabble in the BI tool a little bit, but
Starting point is 00:35:51 they don't spend a lot of time writing SQL or Python or some of the really more flexible scripting languages, basically. And then the rest is the 89%. And that's the group that's the end users. That doesn't mean that they're not data-driven. It means that they're busy focusing on the vibes, you know, like, vibes are a big part of their jobs too. And like, so it's like, you know, when you're a,
Starting point is 00:36:11 when you're a marketing manager, it means that you're too busy being a marketing manager to have time to do analytics. And it's funny, actually, I was just talking about this on LinkedIn: I'm the 89%. Like, even though I'm very good at Python, I'm very good at SQL, I'm a huge data nerd, I'm too busy doing, you know, CEO stuff to go and, you know, write a bunch of queries against our own data warehouse, for instance. So it's like, it's more a question of time and what you can focus on.
Starting point is 00:36:35 But I think, you know, historically, when I think about what sort of, you know, BI has done, it's been the 1% making dashboards for the 10%, and then the 89% are left out in the cold. And I think that's what they call self-serve, really. The 10%, yeah, they dabble a bit in exploration stuff, but not very much. They'll flirt with it a bit, but they don't really get into full-on deep, they're not
Starting point is 00:36:58 writing notebooks to do analysis or anything like that. And then the 89% are usually missing out on the data, and it's all vibes. It's all vibes, you know, at the top. So I think the opportunity here is that that can shift, you know? And I think that if we can multiply, you know, the more technical folks
Starting point is 00:37:16 so they can be available and they can multiply themselves, they can do what I like to call analytics at scale, right? And, like, move from sort of that point-to-point defense, where it's like, you know, one question, one answer, through to being able to build tools that the entire team can use to answer those questions.
Starting point is 00:37:33 And then the analyst job shifts over to analyzing the data from the team. It's understanding the sort of questions the team is asking, and how they're using the data, and what they need to have. And then you add that in a scalable way, so that not just the person asking that question can receive what they need,
Starting point is 00:37:48 but the entire company will get those metrics added or will get whatever they need. So I think that's what we're going to see happening over the next few years. It's interesting to apply the mental model of the map is not the territory and the distortion that happens because of the distillation across the spectrum that you just talked about. And so let's go back to like watching videos of
Starting point is 00:38:15 users. Okay. So you're watching videos of users, that is, you know, sort of actually in itself a distillation of reality, right? Because you're interpreting certain things, right? But let's just call that sort of raw data, at least as far as we can consume it. Then you go to event logs, right? Then those event logs need to be summarized in some way, right? And so you can call that a semantic layer, you can call that, you know,
Starting point is 00:38:50 a model or whatever. So there's distillation happening there. That's performed by the 1%. They're delivering some asset to the 10%. So there's distillation there. And then, of course, that sort of filters out to the 89%. And so when you add it up, you know, you go from the raw data to the logs, you know, there's distillation; logs to the 1% building an asset; 1% to 10%; 10% to 89%. It's an insane amount of... Why is it hard to...
Starting point is 00:39:15 To use data at a company? And it's like, well, I mean, the distance, it's a distance problem, right? I think you just described the vibes, because basically it could start out as data in a log somewhere, and by the time it gets out to the 89%, it is just a vibe. It's like an echo. It's like an echo of whatever the reality was. The vibes-processing machine.
Starting point is 00:39:36 Yeah, I think it's true. And actually, interestingly enough, think about what's probably the weakest part in that whole chain right now. So, I think that chain is hard but achievable, for starters. And part of that is actually just the right systems that allow drillable data. Paul was talking about setting up cubes and stuff like that. That's a hard block in that entire chain, right? Because you can't have a higher resolution than that cube. Part of that is good lineage. It's like, hey, where did this video come from?
Starting point is 00:40:02 Or where did this data point come from? I think, on a human basis, I just think actually the hardest people to put in that chain are probably the 10%. And that means finding really great folks who can translate the technical stuff into the vibes, business-outcome stuff. And being able to be a translator for that is actually a really hard job. It's a bit of a unicorn job, and that's why
Starting point is 00:40:28 finding folks like that is actually probably the hardest part. It's like they're actually pretty rare and that's also one of the reasons why, you know,
Starting point is 00:40:35 we're always lamenting, like, are we adding value, basically, because we don't have enough people like that. Yeah.
Starting point is 00:40:40 Can I ask about, okay, so I want to talk more about the product experience and I'm going to just frame a question. I'm going to frame like a, I guess an analytics type question, right? And I have a hypothesis on why AI could be really helpful here, but yeah, I'll just frame this question. Then I'd love for you to explain, okay, how would I use Zenlytic or how, you know, what would the product experience of Zenlytic be like? So I want to go back to something that you mentioned, Paul, that's related to event logging. So, and you mentioned a particular user behavior, like a login event. And so traditionally
Starting point is 00:41:19 login events, you know, you would associate that with, maybe that's part of your definition for an active user, or, you know, there are semantics there, right? It could be an active user. It could actually contribute to, you know, a churn score. I mean, there are a number of things there, right? But one thing that's really tricky about logins is it varies so much by product. And so I'll give you a specific example from RudderStack. It's really not a great indicator of, you know, whether or not things are going well, because a lot of times, if the data is flowing, you don't need to log into the product, right? And so you can have this really crazy inverse relationship where, you know, like, that could be a sign that everyone's super happy,
Starting point is 00:42:03 right? I mean, that makes our job harder in terms of understanding the user and maybe there are other indicators. But that's a tricky problem. If you have a product where event logs are actually less straightforward maybe than say a consumer product where there's a daily login event that's an indicator of some sort of outcome or stickiness or loyalty.
Starting point is 00:42:29 In the absence of that, how do you... And the reason I bring that example up is that's highly contextual, highly contextual, right? Like, it's the nature of the product. It's the problem that the product solves for the user. There are different personas. And so even that metric could and probably is very different depending on the type of user and the platform, which means that the semantic definition
Starting point is 00:42:50 is different for different users. But context is something that AI is awesome at, right? So there's my really rough sort of problem of trying to understand maybe the health of an account, where my event logs and the semantics related to them are actually pretty complicated and highly contextual. So there we go. I think it might be fun to frame this even more specifically, because Zoe's the Zenlytic agent. It's like, Eric just asked the question of
Starting point is 00:43:25 hey, how are my accounts doing? Or how is my account specifically this one doing? It would be really interesting to learn from you guys what types of things can the AI say from that? And then what kind of context would it need to do a good job? And you can go as technical as you want.
Starting point is 00:43:41 And for the sake of argument, to set the table, let's just say we have really good event logging, which we do from RudderStack. So we have a lot of event data. And then we also have sort of, let's say, your traditional, you know, we're ETLing in Salesforce and, you know, customer success,
Starting point is 00:43:56 and all of that, right? And so we have all those tables in the warehouse. Yeah, no, it's perfect. I think it's helpful to start with a little bit about kind of how Zenlytic works and how we sort of think about the world. Yeah, the way we think about it is that the data tool should be trying to build these building blocks that can be used to answer a ton of different questions. And it should try to add as much context as possible to those building blocks. So what that looks like is, it's like, hey, this is the logic: this is how we calculate logins, this is how we calculate active users. Always very complicated to actually do, but it's like, that's why the data team needs to define it. We don't think you should be putting that definition off to the business people,
Starting point is 00:44:32 because you're going to get a ton of different definitions. Nothing's going to agree. It's going to be a disaster. So that's why, philosophically, we're like, the data team should be defining: what does it mean to be an active user? How do we calculate gross margin? Like, all of these kinds of...
Starting point is 00:44:45 Or business definitions, yeah. Yeah. And part of that is not just defining like the SQL of like how do I aggregate up something into active users? It's also like, what does this mean? Like how is it calculated? Like why would it be used in a certain way, you know? So in addition to, let's say we've got our logins metric, like, you know, how often are people logging in?
Starting point is 00:45:08 We've also got, like, product usage. We've got, you know, some meta-level context on, like, what is, if we're going with RudderStack as an example, what is RudderStack? What do they do? Like, what is the company? And you've got these contexts at these different layers. The most important one being, like, okay, this is what product usage looks like. Like, this is the amount of gigabytes of events that have been logged by whatever customer we're talking about here.
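A sketch of what one of these context-rich building blocks might look like. The field names and the SQL are illustrative stand-ins, not Zenlytic's actual semantic-layer format; the point is that the definition carries human context alongside the calculation.

```python
weekly_active_users = {
    "name": "weekly_active_users",
    "synonyms": ["WAU", "active users", "weekly actives"],
    "sql": "COUNT(DISTINCT user_id)",
    "table": "analytics.login_events",
    "filter": "event_name = 'login' AND event_at > CURRENT_DATE - 7",
    # The context that makes the number usable, not just computable:
    "description": (
        "Users with at least one login in the trailing 7 days. Caution: for "
        "pipeline-style products, low logins can coexist with heavy usage, so "
        "pair this with event volume before reading it as churn risk."
    ),
}
```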
Starting point is 00:45:33 So when you go in and you ask Zoe, hey, like, you know, can you tell me about my customer health for XYZ customer? She's going to go in and she's going to search in the semantic model for, like, XYZ customer, and then any other terms that she thinks could be relevant. So she'd look for, like, usage, health (in case we have a health score), logins, activity. She just searches for a bunch of these different terms. And then we probably have a ton of stuff come back. So it'd be like, okay, I see, you know, gigabytes used, I see logins, I see, you know, number of events streamed, I see, you know, session duration on the site.
Starting point is 00:46:06 I see whatever other stuff we're tracking there. And then, since she's able to run more than one query, too, she could say, hey, let's look at logins and session duration, those are over here. Let's look at quantity of usage, like events and gigabytes and everything that's streamed, over here. And then she might be able to say, okay, well, it looks like this customer, you know, has kind of not that many logins.
Starting point is 00:46:26 Like they got like two logins in the last week, but they also had like 80 gigabytes, you know, of, you know, actual information transferred. So they're, you know, pretty heavy users of the product, regardless of them logging in. And she's going to be able to actually go
Starting point is 00:46:41 and reason that out and say, okay, let's look at this more holistic picture, because she can do more than one thing. It's not just helping you run one query; she's able to actually go pull a few different things and then gather the summary. And the summary is saying, like, there's not a lot of logins, but there's a lot of usage. And you're going to be able to say, okay, well, are they healthy or not? What else do I need to know about whether they're healthy or not? Yeah, it was either... This is a perfect segue. Ryan, it was either you or Paul
Starting point is 00:47:11 that posted, I think it was a week or two ago, about one of the use cases, one of your customers. It was this unlock for them of like, oh, I can run 10 scenarios at once. What would take me... I got to do it one at a time as a human. I can say, hey, customer health.
Starting point is 00:47:28 What's customer health? And keep it really broad. See 10 or 12 different things. Be like, no, yes, yes, yes. And then continue to drill in. Whereas as a human, you're just going to like, customer health, whatever comes to mind. Oh, I need to look at logins.
Starting point is 00:47:43 And you go down that road. And then you're like, oh, well, logins isn't good, and you go to the next thing. So I feel like that was a cool thought even for me, because as an analyst, of course I would treat it that same way: which one do I think is best? I'll look at that first, then I'll go to the next one. But you're not limited that way. Yeah, and especially time-wise I think it matters, because if someone asks you this fairly broad question, I always get this, like, sinking feeling in my stomach because I'm like, oh, where do I even start?
Starting point is 00:48:07 Yeah, yeah. There's so many things I could look at. Like, do I look at all of them, or do I look at some of them? And if only some of them, which ones? But then, like,
Starting point is 00:48:14 you ask a system like Zoe and it's like, I could look at all of these things, and you're like, yeah, you go and do that. Like, I can go get a cup of coffee or something while you run. Yeah, yeah.
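A rough sketch of the flow just described: search the semantic model for anything relevant, run several queries in parallel rather than one at a time, then reason over the combined results. Every name and number below is a toy stand-in, not Zenlytic's actual API.

```python
from typing import Dict, List

# Toy stand-ins for the semantic model and the warehouse; everything here
# is illustrative, not Zenlytic's actual implementation.
SEMANTIC_MODEL = {  # field name -> which group of metrics it belongs to
    "logins": "engagement",
    "session_duration_min": "engagement",
    "events_streamed": "volume",
    "gigabytes_used": "volume",
}
QUERY_RESULTS = {  # pretend last-week numbers for one customer
    "logins": 2,
    "session_duration_min": 34,
    "events_streamed": 1_200_000,
    "gigabytes_used": 80,
}

def search_semantic_model(terms: List[str]) -> List[str]:
    """Find fields whose names match any of the searched terms."""
    return [f for f in SEMANTIC_MODEL if any(t in f for t in terms)]

def run_query(fields: List[str]) -> Dict[str, float]:
    """Stand-in for one warehouse query over a group of related fields."""
    return {f: QUERY_RESULTS[f] for f in fields}

def assess_customer_health() -> str:
    # 1. Search for anything that could be relevant to "customer health".
    fields = search_semantic_model(["login", "session", "events", "gigabytes"])
    # 2. Run more than one query: engagement in one, raw volume in another.
    engagement = run_query([f for f in fields if SEMANTIC_MODEL[f] == "engagement"])
    volume = run_query([f for f in fields if SEMANTIC_MODEL[f] == "volume"])
    # 3. Reason over both result sets rather than a single number: few
    # logins but heavy data volume still reads as an active customer.
    if engagement["logins"] < 5 and volume["gigabytes_used"] > 50:
        return "Few logins, but heavy product usage -- likely still healthy."
    return "Needs a closer look."

print(assess_customer_health())
```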
Starting point is 00:48:22 Because I think that's where the context comes in. That's a much more articulate way to explain what I meant by context, because even if we think about something like product health, it varies significantly depending on how you slice the business. We have a free account that's trying the product versus a large enterprise
Starting point is 00:48:42 that's paying us a lot of money. Product health is really different. Adoption happens at different rates. And this is, I think, where "the map is not the territory" causes huge problems. We have a product health score, and it's like, great. It actually is a bunch of different product health scores
Starting point is 00:49:02 because you can't distill all users or customers down into a single composite. It all comes back to map design. So in that case, that's the equivalent of the map showing the UK 400 miles north of where it really is. And if you sail for the UK, you're going to miss it, basically, because the map is not right. So the question becomes
Starting point is 00:49:21 what are the right primitives? What are the right Mad Libs that you can give? Whether it's an AI analyst or a human analyst, when you set those primitives, data teams have a tremendous amount of power to shape how an organization thinks. And if you start putting the wrong metrics in there, metrics that don't let you account for that context, then an AI analyst will probably misuse them,
Starting point is 00:49:42 and people will probably use them incorrectly too. But if you set those properly... I guess in that case, our goal is to bubble up all the most relevant information. There's still always going to be a synthesis step at the top for a human. It's like, yeah, Zoe can summarize things
Starting point is 00:49:56 and talk a little bit about it, but we fully expect that the human's going to review all the data and make a decision based on that. And our objective is really to make sure that they have a really fat pipe to that data they need to make a decision. A couple of specific questions, and John, you may have a couple too, because I know we're close on time.
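One way to picture the "bunch of different product health scores" point before moving to the product questions: if the primitives are defined per segment, the same raw signals read differently for a free trial than for an enterprise. A toy illustration, with entirely made-up thresholds:

```python
# Illustrative sketch of segment-aware health primitives: the same raw
# signals mean different things for a free account trying the product
# versus a paying enterprise. Thresholds are invented for the example.
SEGMENT_THRESHOLDS = {
    # segment: (min logins/week, min gigabytes/week) to count as healthy
    "free_trial": (3, 1),    # trials should be actively exploring
    "enterprise": (1, 50),   # enterprises may log in rarely but stream heavily
}

def product_health(segment: str, logins: int, gigabytes: float) -> str:
    min_logins, min_gb = SEGMENT_THRESHOLDS[segment]
    healthy = logins >= min_logins or gigabytes >= min_gb
    return "healthy" if healthy else "at risk"

# The same usage profile reads differently by segment:
print(product_health("free_trial", logins=2, gigabytes=0.2))   # at risk
print(product_health("enterprise", logins=2, gigabytes=80.0))  # healthy
```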
Starting point is 00:50:15 But in terms of the product experience, can I bring my own primitives? So let's say, as an analyst, I have some sort of definition of active users, you know, that's represented as a model or a table or whatever. How does that work? Because there's obviously a semantic layer here. Does Zenlytic provide that? Can I plug my own pieces into that? Just from a product experience at a company, let's just use RudderStack, right? We have models that are running, we have reports or whatever. So if I'm onboarding into Zenlytic, what do I need to bring? What do I need to develop? How does the semantic layer work? Yeah, great question. Our interface is always
Starting point is 00:50:53 tables in the SQL warehouse. As long as you can define some SQL to aggregate up active users on some table in your warehouse, that's all you need to get going. We basically sit on top of those tables. And the kinds of things that we expect you to define on top of those are any additional English context you need, and the aggregation, the measure: how do you calculate gross margin, how do you calculate active users. With those building blocks, we can mix and match them however Zoe needs to answer the incoming questions.
Starting point is 00:51:33 so i i i guess my actually one question this is kind of future looking when do you think ai agents in general you can i think this is a general question, will be better at knowing what they don't know and be able to better integrate into project management and things like that? Because I think that to me would be a really interesting component to this.
Starting point is 00:51:59 Yeah, definitely. I think part of that is there's two components. One is the underlying models as they get smarter will be like less falsely confident the other one is it's kind of like the kind of fine-tuning you do on them does actually shape this kind of behavior like this is the right kind of behavior to shape with fine-tuning whereas like some behavior you want it to be just sort of like how you tell it to behave if it's like in line with how it's been trained so far. But
Starting point is 00:52:25 there's other things where you want it to not be confident. You know, you want it to have a little more granularity there. I would say if you want a really concrete answer, I think we're not going to get all the way over there, but we're going to see a step change at LLM's being able to understand what they don't know when the
Starting point is 00:52:41 general release of reasoning models comes out, which no one knows for sure. OpenAI is the furthest ahead with this. No one knows for sure, but the rumors are that will be this year. Wow. Love it. All right. One last question for you, Paul, which is maybe the hardest question. And that is, how did you come up with the name Zoe for your AI? I mean, I feel like that's the hardest thing for any AI company is to name their agent, right? And then defend the name against the other agent. And then defend the name against the other agent.
Starting point is 00:53:13 So I think we've got a good, I think we've got a good case here, actually. So again, like I studied, you know, Burks, Elmo, Big Bird, like all the initial transformer models were actually named after Sesame Street characters, believe it or not. I did not make that connection.
Starting point is 00:53:28 Wow. So Zoe is the only Z-named Sesame Street character. And then Liddick, obviously. You know, we wanted something that's sort of, like, close to us enumeration-wise. So Zoe was, like, the sort of obvious choice for us because we wanted to sort of pay homage to the original, like, Transformer models
Starting point is 00:53:44 and be sort of consistent with the Z branding with selenitic wow zoe was always the obvious choice wow that is awesome i did not put that together and i did not yeah and you have like you got the z right so it's yours yeah exactly we got we got the z which is not always a good thing sometimes you want to be at the top of the list, not the bottom of the list. Yeah, that's true. That's true. Awesome. Well, Paul and Ryan, thank you so much for joining us. I learned so much.
Starting point is 00:54:13 And yeah, it was fun talking about mental models and everything. And we'll check out the product. It sounds awesome. Loved it. Thank you guys so much for having us. It's absolutely a blast. The Data Stack Show is brought to you by Rudderstack, the warehouse-native customer data platform.
Starting point is 00:54:27 Rudderstack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at ruddersack.com.
