The Data Stack Show - 130: From Business Intelligence to Product Analytics and Beyond with Vijay Ganesan of NetSpring.io

Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by Rudderstack, the CDP for developers. You can learn more at rudderstack.com. Welcome to the Data Stack Show. Today, we are going to talk with Vijay, who is one of the founders of ThoughtSpot, which is a hugely influential company in the world of BI.

Starting point is 00:00:36 Kind of came of age along with Looker in many ways, somewhat of a different audience, Kostas, but obviously, he knows what he's talking about when it comes to analytics. And amazingly, he started another analytics company, which is fascinating. We've talked to a couple of people now who came from a world of sort of decade defining analytics and have started subsequent analytics companies. And so part of what I want to ask is why? And what's the motivation behind that? Obviously, there are still major challenges to be solved, major opportunities to be taken advantage of.

Starting point is 00:01:18 And yeah, that's just fascinating to me. So that's what I'm going to ask about. Yeah. And I want to talk with him about product analytics more specifically because the new company is about product analytics, right? Yeah. And it is like an interesting breed of analytics. There are products out there, right?

Starting point is 00:01:38 I'm like 150% sure that he has some very good reasons to start a product analytics company today, right? So there are probably some things that have changed that make today a good time to do that. So it will be super interesting to hear from him what the reasons are for that, what the differences are from, let's say, the previous wave of product analytics tools, and what the opportunity is.

Starting point is 00:02:10 Because the opportunity also corresponds, let's say, to a need. So let's see what made him start a company in product analytics. Yeah, I agree. And we need to figure out how NetSpring works, of course, under the hood because it sits on the warehouse. So let's dig in and find out. Vijay, welcome to the show. We are so excited to chat with you. Thank you for having me.

Starting point is 00:02:33 I'm excited too. All right. Well, give us your background. Yeah. Vijay Ganesan. I'm co-founder and CEO of NetSpring. We are a stage startup in the product analytics space. My background before this, my co-founders and I did a company called ThoughtSpot, which is now a leader in the business

Starting point is 00:02:52 intelligence space. Prior to that, I spent some years at Oracle, where I started working on business intelligence and analytics systems. So my DNA is building enterprise-class data analytics products. Very cool. Can you, there are so many things to dive into. ThoughtSpot has been such an influential company in the world of BI. Can you give us just the brief story of the founding? Were you at Oracle when, you know,

Starting point is 00:03:25 the idea came about to found it? Yeah, I was at Oracle. I was part of, you know, Oracle, which is part of that, you know, the first generation of what people call big BI, right? These are very large, very complex, centralized BI environments where, you know, you bring all your data into a central

Starting point is 00:03:45 repository and then, you know, you have these armies of people building very complex analytics, a very large centralized team that is building analytics for businesses. And so that was sort of like the first wave of BI, you know, and then there was the second generation of systems that are what are called departmental BI, where people said, you know, these centralized, complex, large systems are too painful. You know, I'm going to buy a desktop license of Tableau and, you know, somebody writes some SQL, pulls some data out of Teradata. And I got it on my desktop and I got reports going and I don't care about the central team. And there was a lot of value, actually. You know, we used to play that down, but actually it was a huge value that these

Starting point is 00:04:27 companies brought, the Tableaus and the Qliks with the departmental solutions, simple and easy, very business user-friendly. That's almost Salesforce-esque, right? Like sort of the no software, like, you know, the user can access this, etc. Well, it was still,

Starting point is 00:04:43 you know, when they started, it was all like Windows desktop installs and so on. So it wasn't SaaS yet, but the thing was, it was easy. It was very easy for a departmental person who didn't have to depend on ETL teams and BI teams. Yeah, end users. Right? They just, all they needed is, hey, give me,

Starting point is 00:05:01 just dump some data out of, just the data I need out of your central system and I can then do my own thing, right? So, and that was huge value. And so, you know, the next generation of BI was brought by these folks, right? So that was the second wave, where you solve the problem of sort of these very complex, centralized, very sophisticated systems, very highly scalable, very performant, highly sophisticated. You can do some incredibly complex analytics, but it takes months. And then you have this very easy to use, very quick to get started, simple, visually very appealing, easy to use.

Starting point is 00:05:34 That's the second generation, right? And so when ThoughtSpot, when we started, we said, look, why can't you have the best of both worlds, right? Why can't you have an enterprise-class system that is centrally managed, but also make it very easy for business folks to get to the data and build analytics with it? And then coupled with this idea that, hey, we use search for everything in life,

Starting point is 00:05:54 why not for data, right? And that's when we hit upon this idea that what if every business user simply has a search bar like Google and they ask the question of their centralized, governed data and they get a report and they're done. They don't have to go install anything on their desktop or build a shadow departmental IT team. And now you've got the best of both worlds. You've got the enterprise

Starting point is 00:06:14 class scale performance and you've got your self-service for business folks. And so that's really the third generation of BI that we are ushering in. Yeah, fascinating. Yeah, I spoke too early. That's really the Salesforce, you know, when it became SaaS and then truly the end user could access it. Now, when we were talking before the show, you made a statement that I thought was so interesting. You know, back in, I guess it was 2012 or around that time, when you founded ThoughtSpot,

Starting point is 00:06:48 which is a really interesting time, by the way, because you have the data warehouse emerging. There are a lot of things happening in that time that were nascent. But you described BI as mature. BI was mature when we founded ThoughtSpot. And the more I thought about that, I thought, you know, maybe some people would maybe not disagree with that, but be surprised by that. Probably people who, you know, maybe weren't doing analytics on a large scale back then. But can you describe that a little more? Like what does mature BI look like? Yeah. So there's, you know, when I say BI was mature when we started, it was mature in the sense that the kinds of analytics that you could do in these systems were pretty mature. In other words, any kind of analytics anybody wanted, you could do in these traditional systems.

Starting point is 00:07:38 But it was just that it was very painful. It took weeks and months to do that, right? So what you can do through a ThoughtSpot or a Looker, these types of tools, the next generation tools, it's not something you could never do analytically, right? You could go and use Business Objects and, you know, if you had five people, experts, and gave them two months,

Starting point is 00:08:01 they will build it for you, right? So that's what I mean. Analytically, it was mature, but the delivery mechanisms were primitive. It was too cumbersome. It was not effective in the sense that by the time you got this report done to your business folks, it's already too late, right?

Starting point is 00:08:18 Because business is moving too fast. That's what I meant by there was maturity on that one, but not on the usability and the democratization and the effectiveness for business. Yeah. And I think it went down market too, right? With the ThoughtSpots and the Lookers of the world, all of a sudden you didn't have to be an Oracle customer in order to actually deliver insights. That's right. Absolutely. Super interesting. Okay.

Starting point is 00:08:48 Well, let's talk about product analytics because NetSpring is a product analytics company. And can you just give me the one-minute explanation of what NetSpring is? Yeah. So NetSpring is next-generation product analytics, right? So we are warehouse-native product analytics. We're the first warehouse-native product analytics company that brings the analytical power of business intelligence

Starting point is 00:09:14 to the world of product analytics, right? So you can think of us in a nutshell for data folks, the way we try to describe it that really hits home is think of us as Amplitude plus Looker in a package working directly off Snowflake. That's the imagery that best describes us. Yeah, that's so great. You know, one of the reasons I love that is because in my past and trying to build these sort of stacks,

Starting point is 00:09:40 it's like I've taken Looker and, like, tried to turn it into Amplitude, and, you know, it's fully capable of doing that as a tool. It's actually interesting to mention, you know, you can sort of build whatever you want. It's not like anything's off limits. But it's also like, oh man, it takes, you know, weeks to build a cohort report in Looker that is basically out of the box in Amplitude. And then at the same time, I've tried to take Amplitude and do some crazy stuff with it. I would say the SaaS systems actually tend to be much more inflexible when you're trying to do more complex querying. So I've definitely felt that tension, which is interesting. And then ultimately, I think everyone probably ends up in the data warehouse just because that's where you end up being able to perform the types of queries with flexibility that you want.

Starting point is 00:10:30 Is that sort of the dynamic that NetSpring is responding to? Yeah, you said it. So this is the world people are in, right? So either you are in the world of product analytics vendors, what we sort of call first-generation product analytics, with Amplitude, Mixpanel, great products, by the way. Either you're in that world where they're purpose-built for product analytics. You want to do retention, you want to do cohort analysis, they're built for that.

Starting point is 00:10:57 They give you easy-to-use UI, very nice UIs. You can quickly whip out a funnel and a cohort analysis, all that stuff very easily, right? So for that first level of analytics, they're actually great tools, right? But when you have the next question, right, now that's where the problem comes. You don't have the power that you have in a BI tool to write arbitrarily complex queries and do the kinds of analytics.

Starting point is 00:11:20 And so then you end up in this other world of BI, but then these tools are not really built for time series and event-oriented processing and so on. So we were talking earlier, Kostas, as you were asking, what is the difference in the nature of this data? So the way I describe it is, say, if you think of businesses, there is reporting on outcomes. For example, I want to report on how many orders did I take today on my website, right? That's reporting on the outcome, right? But when an order gets placed, there's a whole bunch of interactions that users go through before that final order gets committed. And you have a record in your data warehouse that says, XYZ purchased this much amount, right? You log in, you know, you do a search and you're adding something to a

Starting point is 00:12:04 shopping cart, you know, there's lots of interactions, there's a lot of events that get captured that lead to that final state, right? The reporting in BI is about reporting on those final states. Product analytics is about understanding patterns of behavior, the set of events that lead to that final state; you're doing analysis on that. So one half of analytics is studying patterns of behavior that lead to outcomes, and the second half is really reporting on something that happened.

Starting point is 00:12:37 So to use that analogy, how many widgets did I sell in North America last quarter? That's a BI reporting tool. But to understand which cohorts of users are buying more from me and why, that's a product analytics question, right? That involves product instrumentation data. That involves event streams. That's the fundamental difference in the nature of the data.
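The outcome-versus-behavior distinction described above can be illustrated with a small sketch. The event names, users, and tuple layout below are made up for illustration and are not from the episode; the point is only that an outcome report counts final states, while a behavioral analysis reconstructs the ordered events that led to them.

```python
from collections import defaultdict

# Hypothetical event log: (user, timestamp, event_type). All values are illustrative.
events = [
    ("ann", 1, "login"), ("ann", 2, "search"), ("ann", 3, "add_to_cart"), ("ann", 4, "order"),
    ("bob", 1, "login"), ("bob", 2, "order"),
    ("cy", 1, "login"), ("cy", 2, "search"),
]

# Outcome reporting (BI-style): count only the final states.
order_count = sum(1 for _, _, kind in events if kind == "order")

# Behavioral analysis (product-analytics-style): rebuild each user's ordered
# path, then keep the events that preceded the order.
paths = defaultdict(list)
for user, ts, kind in sorted(events, key=lambda e: (e[0], e[1])):
    paths[user].append(kind)

paths_to_order = {
    user: path[: path.index("order")]
    for user, path in paths.items()
    if "order" in path
}

print(order_count)
print(paths_to_order)
```

The first number answers "how many orders did I take"; the second structure is the raw material for questions like which sequences of actions tend to end in a purchase.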

Starting point is 00:13:01 The second thing is the representation of the data is very different, right? So, you know, these first-generation product analytics tools, they are purpose-built for representing that event data in a certain fashion that's amenable

Starting point is 00:13:16 for those specialized queries, like a cohort query or a funnel query and so on, right? Now, those are very difficult to express in a relational model or in a star schema-type model that is typical for BI reporting tools, right?

Starting point is 00:13:28 And that's where the tension comes, right? And what we have done at NetSpring is we've really brought those two worlds together, right? We call it the relational event streams technology,

Starting point is 00:13:38 which is, our model is fundamentally relational, but we've layered these event-oriented concepts on top of a relational model. So we can work natively off data warehouses and still get the specialized processing that you have in these event-oriented systems. And that's really the key technology breakthrough that enables sort of the best of both worlds,

Starting point is 00:13:58 right? One other aspect to this is historically that data never came to the data warehouse, right? If you think about, you know, like product instrumentation streams, IoT, you know, data spewing from your mobile phone, you know, those types of event-oriented data historically never landed in the warehouse. The warehouse was sort of a small subset of mission-critical business data. But that has shifted, right? You know, which is a fundamental shift in thinking with cloud data warehouses.

Starting point is 00:14:29 People are putting pretty much anything and everything into the warehouse now, right? And today's cloud warehouses are amenable. That's a huge shift that's happening, right? And then if now, if the data is in there, by the way, all these tools today require you to ship the data out to their SaaS service.

Starting point is 00:14:47 Data is actually going out of your systems into some black hole somewhere. And that's becoming a big problem these days. Increasingly, with GDPR, privacy, security, nobody wants data copies going off into some black hole somewhere. People want control over their data and they want it in the warehouse. Yeah, 100%. I have a question because you mentioned something very interesting here. The data warehouse,

Starting point is 00:15:15 the OLAP model in general, and like Snowflake, the way that data has been traditionally, let's say, structured and modeled to drive BI is not good for working with product analytics, or rather with event data. Can you tell us a little bit more about the technology that you built at NetSpring to bridge these two things together, right? Like going from the very, let's say, tailor-made representation of data that

Starting point is 00:15:47 something like Mixpanel has, somewhere in between that and what's, let's say, like a tabular representation that Snowflake has. Like how does this work? Because that sounds like something super, super interesting. Yeah, that's the crux of the underlying technology differentiation. So basically, the existing products, they are typically tailored as single-table type models. So basically, there's one event table, essentially, where everything is stored. And they have a very fixed data model. And there's a notion of a user, there's a notion of a session, there is a notion of an event, and that's it, right?

Starting point is 00:16:26 It's pretty much those are the concepts that you have in these data models, right? And they're very good for, you know, like traditional sort of shopping cart

Starting point is 00:16:34 type applications, which is where these products originated. So in our world, what we said is we're not going to go with that single table model, right?

Starting point is 00:16:43 We're going to go with this generic model of any business entity represented as a table in your system. You could have a user table, you could have a document, you could have a ticket. You can study journeys of anything, not just users. But then some of these tables, if you imagine a table in Snowflake, some of these tables through some annotations can become event streams, right? So you could have a table in Snowflake that you could annotate in NetSpring to say, you know, this represents an event stream. And if you think about it, an event stream really, you know, you have a timestamp column, you have an actor that is performing the event, and you have some kind of

Starting point is 00:17:19 an event type, right? You know, there's a click event or an add-to-cart event, right? And those are really the decorations that you need. So we started with that approach of saying, we take a generic relational data model and we layer in these annotations on certain data models that can behave like event streams in the system, right? And

Starting point is 00:17:37 then the second thing is the joinability, right? How do you join an event stream with a traditional static table? So we've got that ability to model those relationships and so on. The second thing, a layer above, is really if you think about the fundamental difference at the crux of it, all these event-oriented systems treat time as a first-class entity. In Snowflake, time is just another dimension, like an account or a... So time being a first-class entity

Starting point is 00:18:09 is very core to these types of systems, and that's one of the differences, right? So in some ways, you're bringing some of these specialized concepts of time series databases to the world of data warehouse-type systems. So that is one. And then the third layer is really an innate understanding of concepts like a flow and a funnel and a cohort and things like that.
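The idea of annotating an ordinary warehouse table as an event stream (a timestamp column, an actor column, an event type column) could be sketched roughly as below. This is an illustrative assumption, not NetSpring's actual API or metadata format; the class name `EventStreamAnnotation`, its methods, and the table and column names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventStreamAnnotation:
    """Hypothetical metadata marking a relational table as an event stream."""
    table: str
    timestamp_column: str   # when the event happened
    actor_column: str       # who or what performed the event
    event_type_column: str  # e.g. "click" or "add_to_cart"

    def missing_columns(self, table_columns):
        """Return the annotated columns that the table does not actually have."""
        required = [self.timestamp_column, self.actor_column, self.event_type_column]
        return [col for col in required if col not in table_columns]

    def ordered_events_sql(self):
        """Plain SQL that reads the table as a per-actor, time-ordered stream."""
        return (
            f"SELECT {self.actor_column}, {self.event_type_column}, {self.timestamp_column} "
            f"FROM {self.table} "
            f"ORDER BY {self.actor_column}, {self.timestamp_column}"
        )


# Illustrative table and column names.
clicks = EventStreamAnnotation("web_events", "event_time", "user_id", "event_name")
print(clicks.missing_columns(["event_time", "user_id", "event_name", "url"]))
print(clicks.ordered_events_sql())
```

Because the annotation is only metadata, the table stays a normal relational table that can still be joined to static tables such as accounts or documents.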

Starting point is 00:18:33 That is a first-class understanding of these entities that you don't have in traditional analytical tools that go off a data warehouse. So those are the three layers that enable us to do this. Then the other secret sauce is really around the abstractions. See, at the end of the day, if you want to describe a cohort or you want to describe a funnel, SQL is not expressive enough for those kinds of things. It's not suited for it. So we have a language called NetScript, which lends itself to very succinct and elegant expressibility of these types of queries.

Starting point is 00:19:09 But then this under the covers, it compiles down to SQL. Yeah. And so that's how we sort of get the most out. And so what happens then is you describe, you know, your typical product analytics type analysis in a language that is very natural, right? You define stages and drop offs and churn and things like that. The way you describe it is very succinct and then it compiles down to SQL that is optimized for different data warehouses. That optimization of the SQL is also something that takes advantage of this

Starting point is 00:19:43 understanding of the nature of these queries, time as a first-class entity, the way the data is partitioned, and so on. So a lot of this data is, you know, there's a sequence in the data that you can take advantage of. And the user logs in, and there is a sequence. And that's not going to change. That's not going to get updated and so on. Taking advantage of that in the query generation is also part of the IP. You mentioned time-series database at some

Starting point is 00:20:08 point, and that's something that I would like, I wanted from the beginning to ask you, so, okay, going from, let's say, the completely tailor-made solutions for representing the data to more generic, let's say, database system. Why a data warehouse and not a time series database? Because at the end, events are a time series, right? Like with the main difference that you have like more dimensions than just the time, one more dimension, right? Great question. When we started, this is exactly the question we asked ourselves.

Starting point is 00:20:42 What is the kind of underlying system that we would need? Time series databases are good for doing operational monitoring. If you look at systems like Datadog, SignalFx, and APM-type systems, they're great for that. Essentially, they're good for visualizing and rendering fast-changing time series data, right? If you look at these monitoring tools, essentially a lot of it is simply, I want to see a temporal view of some metric, and I want to be able to compute this very fast incrementally

Starting point is 00:21:16 and be able to ingest at extremely high rates and so on, right? So, you know, they are really purpose-built for those kinds of visualizations of time series data. But if you look at product analytics, yes, it is event-oriented, time-oriented data. But the kinds of analytics you do are very sophisticated, right? You're not simply looking at, okay, what is my temperature at this point in time and how is it trending and so on, right? You're studying, you know,

Starting point is 00:21:46 these sequences and these very complex, sophisticated behavioral patterns, right? Which require a lot of massaging of the time

Starting point is 00:21:54 series data in a very similar fashion to the kind of things that you do in a BI-style system or a data warehouse, right?

Starting point is 00:22:01 So the compute patterns were closer, the analytical patterns were closer to what you do in a data warehouse and a BI type system than what you do in a time series database and a monitoring type system, and that's why

Starting point is 00:22:13 we chose that. Can I jump in and ask a question? Because this is, sorry, this is so interesting. So I think one of the reasons, okay, so if we take a typical SaaS product analytics tool, it will use a time series database. And there are a number of advantages to that. But also controlling the underlying data model allows them to create safeguards for their users, right? So that, you know, you can reliably produce a funnel report or a cohort report. How do you manage those safeguards when you are warehouse native? Because the data can change underneath the tool, right? I mean, the data can change underneath NetSpring,

Starting point is 00:23:01 right? And a different table, you know, a table may change that represents some sort of important, you know, metric in a funnel report. And so there's certainly the advantages of like modeling from a time series standpoint, but also having a single base model also creates safeguards

Starting point is 00:23:18 for the user. So how do you manage that? And how do you think about that from like a data modeling perspective? Because you don't have as much control necessarily. Yeah, first of all, if you have a purpose-built system with a purpose-built data model that is not even exposed to the user, you don't even see the underlying data model. It's just there is this out-of-the-box concept of a user and so on. Everything has to fit into

Starting point is 00:23:43 that data model and so on. So, you know, clearly those are easier from a management point of view, right? I mean, there's nothing you can screw up, right? You can't join the table incorrectly to some other table or something like that, right? But then it's got a lot of deficiencies, right? You know, there is, you know, there's this data shipping off

Starting point is 00:23:59 to these other systems. These are very constrained. Like, you know, we were talking to some customer who was studying document journeys and they were trying to twist this document entity into a user entity. I was like, you know, there's all these

Starting point is 00:24:11 artificial things. I want, you know, very difficult, like, you know, hierarchies. I want to study behavioral patterns by account

Starting point is 00:24:20 or, you know, product hierarchies, product categories. I mean, all this stuff is very difficult to do with these rigid data models. Yes, they give you some ease because they are purpose-built. You can't really mess up the data model,

Starting point is 00:24:34 but then there's too much inefficiency in those models. Now, in some ways, you're talking about a classic problem of I've got this very general purpose, very sophisticated tool that can model anything, which comes with advantages, but then you can potentially shoot yourself in the foot because you pulled the wrong table out of the warehouse, you're joining the wrong way. Whereas in this purpose-built world, you have no choice. They give you out of the box some canned stuff and that's all you do. So there's definitely the trade-offs. But what we have done is we've said, look, I think the value of going against the warehouse and being able to address anything that's available in the warehouse is huge business benefits. I mean, the kinds of analytics you can do is phenomenally

Starting point is 00:25:14 more sophisticated and business impacting. It's not just the siloed product analytics, it's not just product metrics, right? So the advantages far outweigh some of these challenges, like, hey, you know, what if somebody goes and just pulls the wrong thing, does the wrong thing? But the way we tackle that problem is two things. One, what you see, the view into the warehouse that you get through NetSpring can be tightly controlled by the data engineering team. They can decide, you know, there is this notion of data sets, logical entities that you create, that they are the only ones that you can expose to your business folks, right? You can also control that this group gets access to this, but not this other thing. So you're not exposing the entire world to folks. You're exposing what they need.

Starting point is 00:25:57 And there is some notion of an application that has a, you can then say for this application, for this group, for these set of users, this is what is relevant and they can expose that. That has huge advantages because the central team that is responsible for the warehouse and the data model, they have control. Because end of the day, from a governance security point of view, they're accountable. So it gives them that control. But then you can self-service for the business folks. On the business side, what we're saying is we have these templates. So the same kinds of templates that you have

Starting point is 00:26:27 in these product analytics tools, we have that. So you launch this template to create a cohort. It's a wizard, point and click type interface where you're not writing SQL, you're not doing join,

Starting point is 00:26:38 you're just like filling the blanks and boom, you get a report. So that's another way we have these guardrails for at least the non-sophisticated users, the basic users, so they don't get tripped up by having to work with a data warehouse. So I think it's possible to be warehouse native,

Starting point is 00:26:59 to get the power of what the warehouse offers. But with these controls in place, I think it's possible to provide the best of both worlds. Yeah, I mean, that's kind of the dream, right? Like you wouldn't export, you know, SaaS product analytics data into your warehouse if there weren't an issue with, you know, trying to query the data and ask questions.

Starting point is 00:27:20 So yeah, super interesting. Sorry, Kostas. I had to ask just because I've tried to build product analytics on the warehouse and it's not easy. Yeah, no worries. Absolutely. Doing product analytics in BI tools, you know, and you're absolutely right. What people end up doing is they export the data out into a warehouse. They're writing, you know, Looker and more, writing SQL basically, right? And it's extremely painful. One, we solved that problem, but there's also the other problem we solve, which

Starting point is 00:27:59 is it's not just, it's this interoperability and it's the seamlessness of this analytics, right? So you want to be able to jump back and forth between these two worlds, right? You start off with, you're studying a cohort of users that exhibited a certain behavior, right? You want to take that drop off and you fork off into this more BI style analytics that brings in account information, support information. But then when you're done with that, you want to bring that back into this funnel analysis and further continue your analysis. It's that seamlessness of the analytics that goes between these two worlds.

Starting point is 00:28:32 And that's always been a problem because you exported it out of Amplitude. You ran your Looker report. Two weeks later, the business guy got a report. Okay, what do I do with it? How do I upload this back into my product analytics tool and continue my analysis, right? And so that adds to the problem that we have today. Instead, if you have one tool against the warehouse, you've got everything in one place. You can go back and forth between these two flavors of analytics all in context. I want to ask you something about SQL. And you mentioned that SQL is, let's say, not exactly like the best syntax out

Starting point is 00:29:09 there to ask these questions of the data warehouse. It doesn't mean that it cannot be done, but it's like hard for a user like to work with this syntax. You also mentioned that you have like introduced like a new language that's called NetScript. Can you elaborate a little bit more? And also give us like an example or two about like what makes it so hard when we are talking about product analytics, like to use a language like SQL to do it. Yeah. So, I mean, you know, SQL obviously is, you know, it's a great language.

Starting point is 00:29:41 You know, the lingua franca of data, right? I mean, like you said, it's SQL at the end of the day, but the expressibility of things above SQL, that's really what we're bringing to the table, one layer above SQL. So the crux of the difficulty in expressing product analytics queries in SQL is really around the nature of this type of analysis that you're doing, right? If you think about

Starting point is 00:30:05 if you have an event table and you're studying patterns in it, right? That requires a lot of, you know, and I'll give you a simple example, it's a bit simplistic perhaps, sort of self-referential type things, right?

Starting point is 00:30:22 You know, you first have to get all the users who did this particular event, right? That's another table. And on that table, then you want to be able to do the next level of things.

Starting point is 00:30:31 But the product analytics queries in the SQL world are often like, you take your table, you write a snippet of SQL to get a subset of the data. Then you take that and you write another SQL

Starting point is 00:30:43 that takes another subset of the data. So it's sort of layered above and you're chaining all these things together, right? And that makes it very difficult, right? If you look at the kinds of SQL that you generate for these funnels and paths and so on, you will see these layers and layers of SQL because the results of a particular stage of your analysis are a function of all the previous stages, which is not the case in BI type queries. You're just reporting on that final stage. So that's the thing.
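
To make the layering concrete, here is a minimal sketch of a three-step funnel written as chained SQL, run through Python's built-in sqlite3 module. The table name `events`, its columns, and the event names are assumptions made up for illustration; this is not the SQL that NetSpring or any other tool actually generates. Each step is defined in terms of the step before it, which is the dependency being described.

```python
import sqlite3

# Hypothetical event rows: (user_id, event, ts). All names are illustrative.
SAMPLE_EVENTS = [
    ("u1", "signup", 1), ("u1", "create_project", 2), ("u1", "invite_teammate", 3),
    ("u2", "signup", 1), ("u2", "create_project", 5),
    ("u3", "signup", 2),
    ("u4", "create_project", 1), ("u4", "signup", 2),  # out of order: stops at step 1
]

# Every step reads from the previous step, so the query grows one layer per stage.
FUNNEL_SQL = """
WITH step1 AS (
    SELECT user_id, MIN(ts) AS t1
    FROM events
    WHERE event = 'signup'
    GROUP BY user_id
),
step2 AS (
    SELECT e.user_id, MIN(e.ts) AS t2
    FROM events e JOIN step1 s ON s.user_id = e.user_id
    WHERE e.event = 'create_project' AND e.ts > s.t1
    GROUP BY e.user_id
),
step3 AS (
    SELECT e.user_id, MIN(e.ts) AS t3
    FROM events e JOIN step2 s ON s.user_id = e.user_id
    WHERE e.event = 'invite_teammate' AND e.ts > s.t2
    GROUP BY e.user_id
)
SELECT (SELECT COUNT(*) FROM step1),
       (SELECT COUNT(*) FROM step2),
       (SELECT COUNT(*) FROM step3)
"""


def funnel_counts(rows):
    """Return (step1, step2, step3) user counts for the three-step funnel."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, event TEXT, ts INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = conn.execute(FUNNEL_SQL).fetchone()
    conn.close()
    return result


print(funnel_counts(SAMPLE_EVENTS))  # (4, 2, 1)
```

Even this toy funnel needs one layer per step; adding time windows, paths, or segment breakdowns adds further layers on top.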

Starting point is 00:31:15 There are these computations that depend on previous computations that depend on these previous computations and so on. There's this chaining of computations that are very difficult. And before you know it, you have 10 pages of SQL, right? Yeah. So that is the expressibility aspect of it. The second is, and this is true in general,

Starting point is 00:31:32 not just for product analytics queries, is the composability and reusability of SQL. So I write a big chunk of SQL and I give it to you and you want to change it, you know. I'm filtering for West region, you're going to filter for East region.

Starting point is 00:31:46 And I want to, then I'm looking at it by product and you want to then break it down by sales rep or whatever, right? The composability and reusability of SQL is very difficult because you have to go and do surgery within that SQL, right? You know, there's some where clause and you have to insert your thing and so on, right? What if there was a higher level way of doing it, right? And I gave you a chunk of SQL and you said, you took that and you said, hey, I want to extend this and say, now I want to break it down by this other dimension, right? So that is something that this new language brings, right?

Starting point is 00:32:20 Where you can extend it, right? It's composable, right? And it's sort of like Lego blocks that you can build on top of each other. And the system knows how to then do surgery on the underlying SQL to produce that final SQL. Yes. So expressibility, composability, reusability, those are the things where SQL falls short in this world of product analytics.
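
The "Lego blocks" idea can be sketched in a few lines of Python. This is not NetScript, whose syntax is not shown in this conversation; the `Query` class, its methods, and the `sales` table are hypothetical names invented for illustration. The point is only that a shared base query is extended with `where` and `by`, and the final SQL is generated, rather than someone editing the SQL text by hand.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Query:
    """A tiny, hypothetical composable query description."""
    table: str
    measure: str
    filters: tuple = ()
    dimensions: tuple = ()

    def where(self, condition):
        # Return a new query with one more filter; the original is untouched.
        return replace(self, filters=self.filters + (condition,))

    def by(self, dimension):
        # Return a new query broken down by one more dimension.
        return replace(self, dimensions=self.dimensions + (dimension,))

    def to_sql(self):
        # Illustration only: conditions are pasted in as text, so this is
        # not safe for untrusted input.
        sql = f"SELECT {', '.join(self.dimensions + (self.measure,))} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        if self.dimensions:
            sql += " GROUP BY " + ", ".join(self.dimensions)
        return sql


# One shared building block, extended two different ways.
base = Query(table="sales", measure="SUM(amount) AS revenue")
west_by_product = base.where("region = 'West'").by("product")
east_by_rep = base.where("region = 'East'").by("sales_rep")

print(west_by_product.to_sql())
print(east_by_rep.to_sql())
```

Because each call returns a new object, the person who receives `base` can extend it without touching, or even reading, the SQL that is eventually produced.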

Starting point is 00:32:41 Yeah, it makes a lot of sense. Going back to something else that you mentioned, like at the beginning of our conversation, that people started like using these product analytics tools. And it was great, as you said, like it's so easy, like through a visual interface to go and create like cohorts and all that stuff, but then they reached like a point where like they wanted to do something more and it wasn't expressive enough to do that. So the reason I'm bringing this back is because I want to ask you why the user interface, this graphical language is not enough

Starting point is 00:33:16 and we need SQL or we need NetScript or whatever else out there to complement what someone can do for product analytics on the user interface. Yeah. So the way we like to describe this is what you can do today in traditional tools is answering the first question. And if you think about the primary value a lot of these tools brought to the table, it is really: what are people doing in my product?

Starting point is 00:33:44 That's the first question every product manager wants to know. What are people doing? I release a new feature. How many people are using it? So at answering that first level of questions, it's actually quite good. And we have replicated the same kind of easy-to-use templates for that first level of question.

Starting point is 00:34:03 Where it falls short is the follow-up question, right? You know, okay, you told me that this is my conversion rate, right? I mean, but why is it that this conversion rate dropped, you know, between 9 a.m. and 4 p.m. yesterday? What happened? And then, you know, why are, you know, within that, what are the patterns, right? Are there certain patterns, right?

Starting point is 00:34:26 Are certain types of customers converting? So it's the next level of question, right? And the next level of question, it is a free form ad hoc interface that you need for expressing that, right? You can't, you know, you can't build templates for every possible next question, right? You can build templates for that first level of questions.

Starting point is 00:34:45 The next level of questions is very ad hoc. People see some things, oh, maybe this has something to do with this campaign that we ran last week. That must have been the thing. I want to bring in some campaign information. So answering the next level of question is where a lot of these tools fall short. And they fall short for two reasons. One is you don't have an interface where you can do this ad hoc exploratory analysis, forking off from your templated analysis, right?

Starting point is 00:35:12 So imagine you're in Amplitude and you're forking off to do a Looker-type query. So that is non-existent in these tools. The second thing is oftentimes the next question involves context that is not in the product instrumentation stream. You know, this is data from Salesforce, this is support systems, you know, this is other systems that have nothing to do with these product analytics data sources, right? So that's

Starting point is 00:35:36 the second thing, right? You need richer context that's non-existent in these tools, and that is in the warehouse. So that's where the second level of the answering-the-next-question problem comes in. To incorporate that business context, you need to have modeling capability that can reflect your Salesforce schema, your Zendesk schema, and your other schemas, right? And be able to mix it with the product streams. Yeah, 100%.
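
As a rough sketch of what mixing business context with the product stream looks like once both live in one place, the example below joins a product event table with a CRM-style accounts table to answer a follow-up question: does conversion differ by plan? The tables `product_events` and `crm_accounts`, their columns, and the data are all hypothetical; a real Salesforce or Zendesk schema would look different. It uses Python's built-in sqlite3 module as a stand-in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_events (user_id TEXT, event TEXT);  -- instrumentation stream
CREATE TABLE crm_accounts (user_id TEXT, plan TEXT);     -- business context
INSERT INTO product_events VALUES
  ('u1', 'trial_started'), ('u1', 'converted'),
  ('u2', 'trial_started'),
  ('u3', 'trial_started'), ('u3', 'converted'),
  ('u4', 'trial_started');
INSERT INTO crm_accounts VALUES
  ('u1', 'enterprise'), ('u2', 'self_serve'),
  ('u3', 'enterprise'), ('u4', 'self_serve');
""")

# Follow-up question: break the conversion step down by an attribute that
# exists only in the CRM data, not in the event stream.
CONVERSION_BY_PLAN = """
SELECT a.plan,
       COUNT(DISTINCT CASE WHEN e.event = 'trial_started' THEN e.user_id END) AS started,
       COUNT(DISTINCT CASE WHEN e.event = 'converted' THEN e.user_id END) AS converted
FROM product_events e
JOIN crm_accounts a ON a.user_id = e.user_id
GROUP BY a.plan
ORDER BY a.plan
"""


def conversion_by_plan(connection):
    """Return (plan, users_started, users_converted) rows."""
    return connection.execute(CONVERSION_BY_PLAN).fetchall()


for row in conversion_by_plan(conn):
    print(row)
```

The join is only possible because both tables sit in the same database, which is the argument for doing this analysis on the warehouse.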

Starting point is 00:36:02 No, it makes total sense. Okay. And one last question from me and then I'll give the microphone back to Eric. One of the things that there's a lot of conversation about is pricing of data related infrastructure, right? There have been a lot of conversations about consumption-based pricing, the innovation that happened there with Snowflake. But there's something, let's say there's some kind of convenience with this previous generation of tools.

Starting point is 00:36:37 I knew that if I went and used Mixpanel, regardless of how, let's say, complicated the pricing model they have might be, at the end, when I use the product, I know exactly or almost exactly what I'm going to be charged for, right? When we started like putting layers on top of like other infrastructure, like we have Snowflake and then we put like NetSpring on top of it, right? We start having, let's say, we start like using and getting priced and charged like for different things and communicating this pricing to the customer, the user of NetSpring in this case,

Starting point is 00:37:16 it's probably not the easiest thing to do, right? And I ask you that like as a founder, as someone who's building a business now and not the product itself. How do you deal with that? Yeah. Great question. You know, so the pricing model does get a little more involved in this composable CDP world we're talking about, right?

Starting point is 00:37:39 Where previously I could go to a product analytics vendor. I get instrumentation. I get product analytics. I get a compute engine. I get storage, I get everything, right? All in one package, right? And I have to deal with one vendor. Now, with NetSpring and this new world we're living in, I have to deal with Snowflake, I have to deal with Rudderstack, I have to deal with NetSpring, I have to deal with three vendors. At the end of the day, I get best of breed in all of these. I get the best instrumentation, I get a flexible data model. I get all the business benefits. I get next generation product analytics. But I have three contracts; I have to work with three vendors to put together a solution. So in some ways, it's sort of like your classic, do you go with best of breed or do you go with a single vendor that can give you everything?

Starting point is 00:38:29 There's some pros and cons to that. But there is another dimension to this around pricing, which actually is one of the big reasons why people are attracted to NetSpring. So if you look at the way the pricing is done for tools today that are very event-oriented. They're priced based on events. So many events, clicks per month, so many events per month. So these things become prohibitively expensive at scale. When you're talking about, if you're talking about, take Zoom, for example, we're on Zoom. Think of the number of events Zoom generates in a single day.

Starting point is 00:39:09 It's hundreds of billions of events a day. So now this is, of course, extreme scale, but at large scale, people cannot afford to be paying by thousands of events, millions of events, right? It just gets prohibitively expensive. But the reason this is an even bigger problem is that most of the data you will never do any analysis on, right? In a lot of these tools, I'm paying by event, but 60, 70% of the data nobody is using, and I'm still paying. Whereas in this new world, you know, you can put a lot of data into your Snowflake. Storage cost is relatively cheap, right? You can dump petabytes of data, but you only pay for the data that you query. So if 70%

Starting point is 00:39:52 of your data nobody is touching, you're only paying for S3 cost, which is much, much smaller than what you're paying these other vendors now. Every event, whether you use it or not, you're paying a lot of money. Whereas in this new world, there is a lot of pricing advantage. Yes, it's complicated in terms of having to deal with multiple vendors,

Starting point is 00:40:12 but at the end of the day, our belief is you could pay an order of magnitude less than you would pay one of these prepackaged vendors. That's great. I'm happy that you shared this because I think many people out there are confused. And of course, it's very easy to end up in situations where you get inflated bills at the end. And the complexity is much higher when you have to deal with many vendors. But it is important to hear that. If you're a small company, just getting started,

Starting point is 00:40:45 you know, you've got very low volume. You don't have data teams, IT teams, you don't have a warehouse. I mean, you should go with a prepackaged solution. That's the right thing, you know, for you. Although these days, spinning up a warehouse is very simple. I mean, you know, getting Rudderstack working is like really simple. We got it working in like a day, right? I mean, so these things are not as difficult, even for small startups.

Starting point is 00:41:09 We're seeing people, you know, like warehouses typically appeared in companies much later down the line. They're appearing now so early, and it's just so easy to spin it up and so on. Yeah. 100%. I think like my opinion is that what has happened is that like the technology has matured like that fast that it's literally so easy to go and spin up like all these tools.

Starting point is 00:41:30 But it's like what is missing is probably the maturity from the industry to use effectively these tools. There's a lot of education, I think, that like needs to happen. And that's where like in many cases, you know, like people are like getting burned at the end because they're like, okay, yeah, sure. Like let's get like Snowflake. It's very easy, like to set it up and start using. They have like pretty much an idea what they're going to do with it.

Starting point is 00:41:56 And going through like fast iterations and making mistakes. Yeah. Like these things cost. And when you don't get like value at the end, it's like a bit of a bitter taste at the end. Right. Right. But I definitely think that it's, it has a lot to do with education at the end and like how people know what to do actually, and what questions like to seek answers for.

Starting point is 00:42:20 Anyway, so Eric, all yours. I really monopolized this conversation. No, it's great. I've learned so much. I'm so interested in the term product analytics after this conversation because when we think about product analytics, on one hand, as you described it, it's really more event-based data. It's understanding interactions with a customer that lead to certain outcomes over time.

Starting point is 00:42:50 But you get into the world of combining, to your point, other data sets, right? So you can bring in Salesforce, you can bring in ad platform performance data. And of course, as a marketer, one thing that I think about that I've actually heard lots of data teams, you know, discuss is like something that's particularly challenging is attribution, right? It's really difficult to build a good attribution model that reflects what's actually happening in your business, right? And you sort of have two extremes, which you talked about before. Either you sort of use the Google Analytics or the Amplitude, like here's your default model or set of options, or you work with your analyst to

Starting point is 00:43:36 build it. And anyone who's tried to do that, which if you haven't, fair warning, it's pretty brutal to build a multi-touch attribution model using brute-force SQL on the warehouse. But what's interesting is with this in-between of sort of having a lot of that sort of, let's say, outsourced for you a la NetSpring, and you have access to the data, that actually becomes pretty interesting. But then what does that mean for the term product analytics? Because now you're getting into a world where you can do a lot of interesting things. And, you know, that's kind of, I mean, maybe people would classify that under product analytics, but you're talking to like a pretty wide variety of users at that point. Yeah, that's a great point, Eric. You know, we use the term product analytics because it's a

Starting point is 00:44:21 familiar term that everybody understands; at least you're sort of zeroing in on a category. But yeah, you know, it's much broader, right? Like the marketing thing you described, we have a couple of customers where it's basically top of the funnel, right? It's even before people get to using your product, right? Product analytics is typically after people have started using your product. This is top of the funnel. What campaigns are working?

Starting point is 00:44:52 What's driving acquisition, conversion, activation, and things like that, right? This is even before people start engaging with the product. So in some sense, conceptually, it is the same type of analysis. It happens to be,

Starting point is 00:45:03 the funnel happens to be the top versus the bottom. But yeah, so, you know, so if you think of it as a category, you know, product analytics does not do justice. Or I think it's a bit narrow. I mean, you know, people have toyed with different terms,

Starting point is 00:45:20 you know, are they commanders of counter-digital experience platforms? It's actually quite a confusing world. There's terms like digital experience, customer experience, there's product analytics and behavioral analytics. There's lots of terms.

Starting point is 00:45:35 This industry still hasn't converged on an uber term that truly reflects the things that we're talking about. But yeah, product analytics is sort of the most widely understood. We call ourselves product and behavioral analytics. And the distinction we make, it's not so much product usage.

Starting point is 00:45:54 It could be like marketing that drives to product adoption and so on. But the distinction we make is product analytics is really around measuring outcomes. Behavior analytics is understanding patterns of behavior that lead to that outcome, right? So that's really how we distinguish it.

Starting point is 00:46:10 But yeah, I think there is a category, a term that needs to be invented here that truly reflects the end-to-end journey, everything from the acquisition of the customer all the way to engagement and upsell and retention. Yeah, that's tough. I mean, the way to engagement and to upsell and retention. Yeah. Yeah. That's tough. I mean, the way that the practitioners describe it is like, Ooh, I like a funnel report, but

Starting point is 00:46:31 then I want to like slice it or like do a pivot on a piece of data that I get from, you know, a completely different system. Right. I mean, that's the practical reality is like, I have a valuable report, but if I can pivot or slice on like, you know on a piece of data that's a hierarchy above that expresses the container of the customer journey that I'm interested in, or the subset of customers. Because at that point, that really is truly the blend, in my opinion, of BI and product analytics, because now you're looking at what it costs you to acquire a customer

Starting point is 00:47:06 in the context of the customer journey, right? And that's those two things fully coming together. Exactly. And really, if you think about it, the crux of the problem is that data is all sitting in different systems, right? Marketing systems are completely different from your product analytics systems, different from your BI systems. And so this is where I think everything coming into the warehouse is really the core of it. At the end of the day, you have to bring the data into one place.

Starting point is 00:47:34 You cannot have it living in 50 different SaaS services and expect to do analytics across it. You have to bring it into the data warehouse. You have to have curation of the data. It has to be modeled correctly. It has to be clean.

Starting point is 00:47:49 And then that's when... So the warehouse centricity is an enabler for products like NetSpring. Yeah. Yeah. Love it. Well, it's super exciting. Okay. Time for one more question here.

Starting point is 00:48:01 So we talked about sort of the three waves of BI. We talked about the first wave of product analytics, right? Which, you know, again, like these are great tools. And I think probably a lot of companies, you said at ThoughtSpot, you used Mixpanel, right? As an analytics company. And so I think many companies will end up adopting sort of a multi-pronged analytics approach, you know, even on a team level. So NetSpring is the second wave, right? Where we're sort of building this on top of a central data store that has access to all this data, right? And so we're going to close the gap between product analytics and BI.

Starting point is 00:48:37 But you, at least from our conversation before the show, you think about analytics and sort of waves of decades. So what's the third wave in product analytics? I'm so interested. I've actually been waiting the whole show to ask this question. Yeah, no. So just to be clear, the second wave is, like you described, it's much richer product analytics

Starting point is 00:49:01 that goes beyond traditional product analytics, siloed product analytics. It's more enterprise-wide. It's for larger. It's not just for your product managers. It's for your marketeers, your customer success. It's a much richer analytics cross-cutting across any team that has anything to do with product and customer, right? So it's a much broader thing.

Starting point is 00:49:19 So the warehouse centricity is more mechanical. It's an enabler. It's aligned with the modern data stack and so on. So the third wave is really around AI and system-generated insight. So today, even with a product like NetSpring, you can build a very fancy cohort. You can do some pretty sophisticated analysis, slice and dice, and so on. But let's say, so if I have a hypothesis, I can go test the hypothesis out really well. So I have a hypothesis that people in the education sector tend to use whiteboards more in Zoom meetings than others. I can go test out that hypothesis.

Starting point is 00:50:02 All the data is available, the modeling and everything, you know, everything is easy to use, point and click and boom. In five minutes, I've got the answer, right? I can test out the hypothesis. What if I didn't know? I don't know. I didn't know about that hypothesis.

Starting point is 00:50:14 What if the system could tell me, hey, you know, listen, you know, you're a whiteboards PM, right? For Zoom. Hey, what if I magically told you that you should be looking at folks in the education sector?

Starting point is 00:50:26 That's really your profitable customer base for this particular aspect of your product, right? So there's sort of the system-generated, machine-learning-driven insights, which has been talked about for a long time, right? But it's never, you know, worked really well. I think that's the next wave. That's the third generation. And there's two things there, I think,

Starting point is 00:50:48 which make it possible, which is a very tough problem. We've tried it at ThoughtSpot. We have had some success, but many people have tried it and it's an extremely difficult problem to crack. But I think there's two things that are happening. If you look at the data warehouse ecosystem,

Starting point is 00:51:04 if you look at BigQuery, right? BigQuery ML is pretty sophisticated. It's integrated with the warehouse, right? You can use it natively in the warehouse, right? And it's pretty sophisticated ML stuff, right? So you have that available for you to use that you can take advantage of. And these are fairly sophisticated algorithms.

Starting point is 00:51:18 And the second thing is, you know, you see like what, you know, like you see ChatGPT, right? I mean, the kinds of, you know, advancements that have happened in that world of AI/ML. And so the third wave is really, you know, those things getting to a level of sophistication, maturity, where they can actually do

Starting point is 00:51:35 more system-generated insights. That's the third wave. Yeah. Yeah, I agree. Man, I've used so many of those sort of AI type, you know, analytics insights features over the years. And you just end up going back to SQL, let's be honest. Yeah. But I agree.

Starting point is 00:51:58 That's actually interesting. Like you talked about the separation of like instrumentation and ingestion from, you know, sort of the actual like analytics layer and sort of the decoupling of these certain things. And it's fascinating to think about, you know, AI generated insights is infrastructure on top of analytics, right? But if you actually break the ML infrastructure out from both the data and the analytics, like from an infrastructure perspective, like it starts to get interesting because then, you know, BigQuery ML does all

Starting point is 00:52:32 the heavy lifting and you need to feed it, you know, data and context, and then you have the visualization. Absolutely. Yeah, absolutely right. And we're very big believers in it. Philosophically, we think that's what kind of pressure should be moving towards, right? I use a Rudderstack or a Snowplow or a Segment. These are best of breed, right?

Starting point is 00:52:50 They're purpose-built for instrumentation. That's what they do, right? And they really do it really well, right, with schema management. And it's amazing. You can put any tool on top of it, right? So you've got best of breed product analytics. And then one of the things we're toying with

Starting point is 00:53:03 is really writing output of the analysis back into the data warehouse. Let's say you've tested your hypothesis, you've constructed a sophisticated cohort of users that you want to do something about, right? You can write it back into the data warehouse simply as a logical database view. It doesn't even have to be a physical table, right? It can be a view into that data that's already in the warehouse. And then some other tool that is doing machine learning or doing data activation can simply address that view and do some more sophisticated things. So it's sort of like your warehouse. You can plug in all these very specialized,

Starting point is 00:53:39 best of breed systems on top of it, and you can get some phenomenal value, right? Yep. Wonderful. We are at the buzzer, as I like to say. Maybe we went a little long, but Brooks isn't here, so we'll have to ask for forgiveness. Vijay, this has been absolutely wonderful. I have learned so much and NetSpring seems like a super exciting company. So best of luck as you continue to build it. Thank you guys. Thanks for having us. Appreciate it. Enjoyed speaking. Kostas, I think probably the most helpful thing for me was thinking about these tool sets in a way that was just really helpful. And I said towards the beginning that it was surprising that he described the world of BI, business intelligence, as mature when they started

Starting point is 00:54:32 ThoughtSpot in 2012. Because he referenced things like Business Objects. And if you've ever worked at a company or with a company who is using Business Objects, it just feels so antiquated compared to, you know, sort of a modern product analytics tool. But, you know, I love the perspective of history, right? And that what he said was true. Like, there was no analysis that was impossible once, you know, sort of that level of BI came to fruition, you know, with, you know, sort of Oracle's BI solution. You could do whatever you want. It was just a time and cost and difficulty question.

Starting point is 00:55:12 So thinking through those phases of BI was helpful. And then, yeah, the same for product analytics. So we'll see how much AI takes over the world, whatever features are built in the future. Yeah, a hundred percent. I think we are, we are going to see another, at least another wave of like BI analytics, like analytics, BI tools. That's okay. They are not like primarily driven by AI.

Starting point is 00:55:42 Before that happens, there's still like, I think, and we see that we have like companies here that we recorded episodes with them, but they're still like, okay, providing like new ways to visualize and interact with the data. I mean, I think it's time to see like more innovation in this space. And there are like things that have started like changing. Like we see like what is happening with Tableau, for example, right? So we'll see. I'm very, I'm very curious to see what other companies will appear in the next

Starting point is 00:56:13 couple of months around the more traditional side of data analytics with SSBI and also I think we will start probably seeing like more, you know, like specialized analytical tools, like for product analytics, for example, right? But a new iteration of these that's going to be leveraging, let's say the new data infrastructure out there with the cloud data warehouses, the data lakes, the lake houses, blah, blah, blah, all that stuff. Right?

Starting point is 00:56:42 So we'll see. I think that we're going to see more tools like this. That's why it was very interesting to talk to Vijay today because I think it's a glimpse of what we'll see in the future. I agree. All right.

Starting point is 00:57:00 Well, thank you for joining us. Subscribe if you haven't, tell a friend, and we'll catch you on the next one. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by Rudderstack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.
