The Data Stack Show - 65: Operationalizing Data from the Warehouse With Aayush Jain of Cliff.ai

Episode Date: December 8, 2021

Highlights from this week’s conversation include:
- Aayush’s career background (4:13)
- How his biological sciences academic training impacts his work (8:04)
- How do we allow dashboards to get messy? (9:35)
- Building cultural or technical solutions to effective dashboards (15:19)
- Using data dashboards to make material business improvements (23:19)
- What is business observability? (32:23)
- Building a platform for operations teams (43:15)
- How important community is to the Cliff.ai business proposition (41:03)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by Rudderstack, the CDP for developers. You can learn more at rudderstack.com. We have a really exciting episode coming up. And what's most exciting is we're going to live stream it.
Starting point is 00:00:31 The topic is the modern data stack. And we're going to talk about what that means. It's December 15th and you'll want to register for the live stream. Now, Costas, it's really exciting because we have some amazing leaders from some amazing companies. So tell us who's going to be there. Yeah, amazing leaders and also an amazing topic. I think we have mentioned the modern data stack
Starting point is 00:00:49 so many times on this show. I think it's time to get all the different vendors who have contributed to creating this new category of products, have them define the modern data stack, and discuss what makes it so special. So we are going to have people from Databricks, dbt and Fivetran, and from companies that are implementing state-of-the-art
Starting point is 00:01:12 technologies around their data stack, like Hinge. And we are also going to have VCs and see what their opinion is about the modern data stack. So in a sense, VCs are going to be there too. And yeah, it's going to be super exciting and super interesting. So we invite everyone to our first live stream. Yeah, we're super excited. The date is December 15th. It's going to be at 4 PM Eastern time and you can register at rudderstack.com slash live. So that's just rudderstack.com slash live. And we'll send you a link to watch the live stream. We can't wait to see you there. Welcome back to the Data Stack Show. Today, we're going to talk with Ayush Jain, and he is one of the founders of a company called
Starting point is 00:02:00 Cliff.ai. He's actually sort of a serial entrepreneur, has done a couple of startups, and this one's focused on what he calls a business observability platform. So I think it'll be a really interesting conversation. I don't know if we've come across that term yet on the show, have we, Kostas? No, I don't think so. I think it's going to be a quite interesting conversation, especially talking about how to operationalize data and how the data warehouse, again, is a very important component on how we extract value of data from the data.
Starting point is 00:02:35 So, yeah. And also, he also leads a community. So I think it's also going to be interesting to hear from him what's the role of the community and how it helps and all that stuff. Absolutely. And I think my question, and I'm going to go way back here. So I use, you know, in looking at our guests' backgrounds,
Starting point is 00:02:56 I'm always interested when someone's building something in the data space, but they don't come from a technical background. And Ayush actually comes from a scientific background. And I always find that connection really interesting. And I haven't asked that question in a long time. So I'm way overdue. So my question is going to be how his academic background has influenced his work in data, which I know you're going to enjoy because you haven't gotten that in a while.
Starting point is 00:03:21 I think you're going to force me to be philosophical again and claim that science is data, but... Science is data, well said. We can just wrap the show up. Perfect. Yeah, let's go and chat with him. Let's do it. Ayush, welcome to the Data Stack Show. It's great to have you here.
Starting point is 00:03:42 Thank you very much for having me. All right, well, tell us about yourself. What's your background and what are you working on in your day job? Great. So just to give you a little bit of background about myself, I'm one of the co-founders of Cliff.ai. And what Cliff.ai does is basically it's a business observability platform. We help companies track and monitor their metrics without needing a dashboard. And in terms of my background, I come from a very different background: I have my master's in biological sciences. We're a team of three co-founders, and we started our first business together as soon as we got out of college. So we
Starting point is 00:04:23 like we used to live next to each other in dorm rooms for good five years of college and just after that we thought guys we got to do something together but the doing doing a job was kind of uncool for us back then but so we kind of took a plunge into starting our own business without even without even without even knowing how to do a business how to run a a business. So we started an online pharmacy in India. So this is way back in like 2016, where the internet, especially the online commerce was booming in India. And the question that we asked ourselves is, in India, at that point, you could get literally anything delivered to your doorstep by just ordering something on an app. The only thing that you would have to still go to a you know or to a
Starting point is 00:05:05 shop, a physical store, to buy was medicines. We thought, why don't we build an application that would allow people to just put in their prescription and get their medicines delivered to their doorstep? So that's the first business that we started together. We ran it for around eight months, a big company came in and said, we want to buy the whole business, and we said, you know, fine, let's do that. So we sold that business off. And then after that, we kind of took that cash in and said, you know, what do we do next? For a short period of time, we started exploring a few ideas that we could work on. And then eventually we, you know, started our B2B SaaS business that's called GreenDeck. And what GreenDeck was, basically, was a price optimization engine for online retailers.
Starting point is 00:05:53 We would help online fashion companies optimize their pricing using AI. And one of the core components of the GreenDeck platform was dealing with huge amount of data. So the way we would help our customers make their pricing better is by providing them a huge amount of data in terms of how their products are priced with respect to their competitors. And that was our first experience with dealing with data, the huge amount of data. So we would crawl hundreds and thousands of websites on a daily basis and collect the products and pricing information and help our customers make decisions. And that's where we had our first kind of encounter with huge amount of data. And then eventually sometime last year, around March last year, we kind of pivoted from GreenDeck to Clip.ai. And this pivot was basically from the primary, the thing that made us kind of excited about what we are doing here
Starting point is 00:06:52 in Clip.ai is we were dealing with a huge amount of dashboards, a huge amount of dashboards to be able to monitor and monitor and track various processes that were happening on GreenDeck. And it all started with one simple anomaly detection script. We kind of wrote a script internally saying, hey, we have tons of metrics that we want to monitor as a part of the GreenDeck platform. And the core problem was there were so many dashboards and we had a small team that we can't even look into all of those dashboards. So we kind of created an anomaly detection script that would just monitor those
Starting point is 00:07:30 metrics and send a Slack notification whenever there is any anomaly in that metric. And that worked magic for us. And that's when we kind of got excited, this whole problem about dashboards and how various businesses deal with dashboards. And eventually we pivoted into Clip.ai. Very cool. Well, I want to talk about dashboards, but first I'd love to know, we love talking with guests who just have different diverse backgrounds. So biological sciences, I'd love to know what lessons have you taken from that sort of academic background into being an entrepreneur and working with data? I think one of the core important things that I learned coming from a science background was this whole, in terms of my understanding of the things I think what I've
Starting point is 00:08:26 learned the most is asking questions fundamentally I think so a lot of times when when when we when we look at things from a business perspective we often kind of forget or often kind of stop asking the fundamental questions and science works on fundaments. I think that's what my biggest thing that I kind of learned from the science background is asking fundamental questions and starting to think from the first principles. I think that that is what is the biggest takeaway from from for me from science background. Yeah, I love it. I love it. That's um, sure. I mean, in
Starting point is 00:09:04 science, you sort of start out with trying to remove sort of influence from whatever you're trying to study. And it's hard to do that in business. So that's a really good lesson. And actually, for anyone working with data, in fact, like that's a great, that's just a great reminder in general. So love hearing that. Well, let's so dashboards, you had all these dashboards, small team, you're trying to understand them. What I'd love to know, because you're building a company that helps people sort of avoid this or solve the problem of messy dashboards. But I'm interested in your perspective on how dashboards even get messy. And Costas would
Starting point is 00:09:43 love to know as well from your perspective, because you've built companies and dealt with dashboards as well. But you would think, I mean, businesses vary a lot, sort of basic business models and metrics and stuff. There's a lot of commonality across businesses, right? I mean, you have website traffic, you have some level of conversion, some of it's qualified, and then someone pays you some money. But everyone, I think, listening has experienced messy dashboards. So how do they get messy?
Starting point is 00:10:18 So before I answer that question in terms of why and how dashboard gets messy, I think so, let me take a step back and kind of paint a picture in terms of how we got here and why we got here. So basically, if you look at what's happening in the data space in the past few years, is an undeniable rise of the modern data stack. And people have different definitions around what constitutes as a modern data stack. And it's still very kind of, some people would argue that it's more of a marketing jargon than something that is more of a marketing jargon, then something that is actually substantial. But nonetheless, the undeniable trend is that the data warehouse has been becoming a heart of the modern data space, more modern data stack itself. And there has been an incredible amount of progress that
Starting point is 00:10:59 has been made in terms of allowing businesses to bring their mission-critical data into a data warehouse. You have a rise of ETL tools, you have a rise of data quality tools, data quality monitoring tools. And what that means is that businesses now have way more data than they had a few years back, because the barrier to putting data into a data warehouse has significantly reduced. And what that means as a business is they have a huge amount of data. And up until a couple of years back, the only mode of consumption of this data had been dashboards. There has been tremendous progress made in bringing the data into the warehouse, but the consumption of that data has typically been, and even still today in 90% of cases is, consumption of that data through a dashboard. Now, the thing that has changed in
Starting point is 00:11:57 past few years is that dashboards, the fundamental limitation of a dashboard is that a human consumes a dashboard. So basically, the underlying assumption of a dashboard is that there would be a human being that would be visually looking at a dashboard and making decisions. Now, this thing has changed fundamentally in the few years. The way it has changed is that before I answer the question why dashboard gets messy, the thing that has been happening is that dashboards, since they fundamentally rely on a human being to consume them, the thing that has happened with the growth of this data, the scale at which people can consume this information, that data, consume that data that has drastically reduced. Now, what that means is basically, as a business, as you mentioned, you have tons of KPIs, but can be visitors and then followed by users and then customers, so on, so forth. And right now, what
Starting point is 00:12:58 happens is most of the times you people put their matrix into a dashboard expecting a human being to monitor them on a continuous basis and because of that what happens is there are two things that happens either the since the mode of consumption of data for people is dashboards whenever people have questions someone would want to know why something broke or why something changed in the business the inherent bias is to go to the data team and say, hey, I want a dashboard. And what that leads to be is sometimes answering these data question leads to a creation of a dashboard. And these dashboards are, in a lot of cases, what happens is these dashboards kind of used for a couple of times to get a couple of answers, and then no one
Starting point is 00:13:42 even looks at that. Now, what's happening because of that is this creation of this whole dashboard rot where every organization have hundreds and thousands of dashboards that are probably only a fraction of them are actually used on a day-on-day basis. So that is one thing that is what in my experience and in my learning with working with various data teams is that the data and engineering team has kind of an ownership of maintaining so many dashboards that are rarely used. So that is one of the reasons what I feel is that we got into this messy situation where we have hundreds of dashboards which are rarely used. And even the ones that are used, they still rely on human beings to be able to take those decisions. So that is what my understanding is in terms of how do we got into this messiness of dashboards. Yeah.
Starting point is 00:14:38 So I guess, yeah, I wanted to ask you, I mean, you described like the problem pretty well. And I think that anyone who has to work with data, not even with data, but I mean, you described the problem pretty well, and I think that anyone who has to work with data, not even with data, but I mean people that are working in any organization inside the company, at some point they had to face these issues. But did you think that this is like an organizational
Starting point is 00:14:58 problem or a technical problem? Or an organizational problem that has a technical solution. So what's your perspective on that? I think, yeah, so I think so. I don't think so it's more of a technical problem. I think just more of an organizational behavioral problem where the way we deal with data has
Starting point is 00:15:21 been through dashboards and reports. So the natural bias whenever anyone has any question is to ask and request a dashboard i think this is more of an organizational issue it's more of a behavioral issue and more if even if i have to cover any technical aspect of it i think so the up until you know i'll get to that point also later down the line but i think so the only way for people to get answers right now is dashboards and reports. And I think so until and unless we have things,
Starting point is 00:15:52 technology that kind of solves that part of the thing where people can ask questions without requiring dashboards, I think that this is something that's going to stay. Okay. So how we can do that? How we can ask questions without the dashboards? So I think so now, now this is a very interesting trend
Starting point is 00:16:10 that we have been noticing in the past few months, and that has been growing very rapidly, this whole idea of operationalizing the data in the data warehouse. Now, this is something
Starting point is 00:16:24 that is a very interesting concept that broadly covers a few categories of the modern data stack. One of the most obvious things, where we have very recently seen very good success in terms of adoption from the broader tech community and from a business perspective as well, is this rise of reverse ETL. So far, what we have seen in the industry is, you know, people bringing data into a data warehouse and then writing ad hoc scripts and reports, or, you know, some people would do Airflow DAGs and everything, to put this data back into those source systems.
Starting point is 00:17:03 Now, with this recent rise of reverse ETL, there are a few companies that are doing an amazing job when it comes to reverse ETL, companies like Hightouch and Census. What's now happening is that the insights that need to be generated from the data in the data warehouse are no longer just sitting in the warehouse; they are being fed back into the source systems. So now imagine, you know, a salesperson using HubSpot or Salesforce as a CRM. Instead of them going to a data team and saying, hey, I want this list of users who have done these things so that, you know, I can target them and so on and so forth.
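As a rough illustration of the reverse ETL pattern being described here, pulling a segment of users out of the warehouse and pushing it into the CRM where the sales team already works, a minimal sketch might look like the following. The table, column names, CRM endpoint, and connection details are hypothetical stand-ins, not the actual API of Hightouch, Census, or any particular CRM.

```python
# Hypothetical sketch of a hand-rolled reverse ETL job: read a user segment
# from the warehouse and push it into a CRM. All names here are stand-ins.
import psycopg2   # assumes a Postgres-compatible warehouse such as Redshift
import requests

SEGMENT_SQL = """
    SELECT email, plan, last_seen_at
    FROM analytics.users
    WHERE plan = 'trial' AND last_seen_at > current_date - 14
"""

def sync_segment_to_crm(warehouse_dsn: str, crm_url: str, crm_token: str) -> None:
    # Pull the segment out of the warehouse.
    with psycopg2.connect(warehouse_dsn) as conn, conn.cursor() as cur:
        cur.execute(SEGMENT_SQL)
        rows = cur.fetchall()

    # Feed each contact back into the operational system (the CRM).
    for email, plan, last_seen_at in rows:
        requests.post(
            f"{crm_url}/contacts",  # placeholder endpoint
            headers={"Authorization": f"Bearer {crm_token}"},
            json={"email": email, "plan": plan, "last_seen_at": str(last_seen_at)},
            timeout=10,
        )
```

Dedicated reverse ETL tools replace this kind of script with managed connectors, scheduling, and retries, but the data flow is the same: warehouse query in, operational tool out.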
Starting point is 00:17:45 Now what they're doing is they have this information back in their operational systems, which is Salesforce or HubSpot. This is kind of a fundamental shift in how you think about data. The data warehouse is no longer a place where data goes and people query it for reporting later on. You are actually activating the data that is being put into the data warehouse and making business decisions on top of that. That's around the reverse ETL space. Another interesting trend that has been happening is people using the data in the data warehouse to build downstream applications from the data warehouse. One of the companies that I really admire in this space is a company called Continual AI.
Starting point is 00:18:39 They have been kind of, you know, helping people build machine learning models directly on the top of the data warehouse. And this solves a very important problem in end-to-end delivery of machine learning models, where previously what used to happen is people would kind of have bad jobs where they would train a model and then kind of use it into production. And then they have to build that pipelines where they have to continuously fetch in the new data from source systems and then retrain their machine learning models and kind of have that end to an automated pipeline. Continual solve this problem by kind of directly placing themselves on the top of the data warehouse and then kind of allowing people
Starting point is 00:19:20 to build their machine learning models. That's a very smart thing that I've seen recently in terms of operationalizing the data warehouse. And then, in terms of operationalizing the data warehouse, one of the other things, and this is something that we are trying to tackle with Cliff.ai, is how do we help companies actually generate insights from the data that is being captured in the data warehouse and actually make it actionable.
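The Continual-style pattern described a moment ago, training a model directly against tables that already live in the warehouse and re-training it as fresh data lands, can be sketched roughly as below. This is only a toy illustration of the idea, not Continual's actual product; the table, the columns, and the assumption that a scheduler re-runs the job are all hypothetical.

```python
# Toy sketch of "train models directly on the warehouse": pull a training set
# with SQL, fit a model, and persist it for downstream scoring jobs. A cron
# job or orchestrator is assumed to re-run this as new data arrives.
import pandas as pd
import psycopg2
import joblib
from sklearn.linear_model import LogisticRegression

TRAINING_SQL = """
    SELECT sessions_last_30d, support_tickets_last_30d, seats, churned
    FROM analytics.customer_features
"""

def retrain_churn_model(warehouse_dsn: str, model_path: str) -> None:
    with psycopg2.connect(warehouse_dsn) as conn:
        df = pd.read_sql(TRAINING_SQL, conn)

    features = df[["sessions_last_30d", "support_tickets_last_30d", "seats"]]
    label = df["churned"]

    model = LogisticRegression(max_iter=1000).fit(features, label)
    joblib.dump(model, model_path)  # downstream jobs load this to score customers
```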
Starting point is 00:19:50 That, generating insights from the data warehouse and making them actionable, is what we are doing here at Cliff.ai. One of the things that we have heard from our customers is that, you know, most of the time they have so many KPIs and metrics that they want to keep track of, and they use dashboards to do that. And one of the fundamental problems that I just talked about when it comes to dashboards is,
Starting point is 00:20:14 dashboards are now being abused as a monitoring tool, while they were not meant for that purpose. And this is something that we want to solve using Cliff.ai. We position Cliff as a business observability platform. It's a platform that sits on top of the data warehouse, monitors every single metric that is in that data warehouse, and lets the relevant person in the team know whenever anything changes in those KPIs and metrics. And this, in a way, is activating the data warehouse and
Starting point is 00:20:47 allowing people to actually take business decisions and actually take actions on top of their data. So this is a rapidly growing field, in terms of how various businesses activate the data in the data warehouse, and I think we are just touching the tip of the iceberg as of now. There are a lot of innovative solutions emerging in the whole modern data stack very recently that are tackling this part of operationalizing the data in the data warehouse. One of the categories that I forgot to mention there is this whole category of product-led growth CRMs, PLG CRMs, which are basically CRMs built for product-led growth companies
Starting point is 00:21:33 that are tightly coupled with the data warehouse. So there is so much interesting activity happening around what happens to the data once it has reached the data warehouse. That's very interesting. I think you touched on many, many important topics. Quick question. You mentioned a very interesting term, which is the business observability platform that you're trying to build. And you're also trying to move away from the traditional methodology of consuming dashboards for that. So how does it work? I mean, you say there are KPIs, these KPIs get tracked, and the right teams get notified when something changes. Describe to us in a little bit more detail the journey that your customer has on the product: how these things are defined, how they are consumed, how they are notified and how
Starting point is 00:22:29 they can react also, right? Because reaction is also important. That's why we need data, so we can act. Okay. So, you know, one of the key things that I'm personally very fascinated about is the whole concept of SRE in the engineering domain, right? If you look at what happened in the engineering domain in the past decade, basically, in engineering you have tons of applications and services. And at
Starting point is 00:22:58 one point, it got really difficult for business to kind of for any company to kind of stay on the top of their infrastructure without having an observability platform and you know companies like Datadog and New Relic did a incredible job in terms of providing the you know those SRE teams the right tools and the right platform so that they can stay on the top of their systems. Now, typically, if you look at the business side of the things, before the advent of cloud data warehouses, a lot of business processes were not that data-driven. But now with the whole evolution of the modern data stack, each business process is also data-driven.
Starting point is 00:23:42 And now it has become really difficult for business teams to keep track of their metrics without having a system that can actually help them the way Datadog has helped the SRE teams. So that is the whole concept that we have when it comes to business observability: can Cliff become that system that allows business teams to stay on top of their business processes? And the way the platform works is basically, we sit on top of the data warehouse. So Cliff.ai is basically, you know, a SaaS offering, where we plug in directly into a data warehouse. And we integrate with various data warehouses like Snowflake, Redshift, BigQuery, and so on and so forth.
Starting point is 00:24:28 And the way integration works is basically, so let me just kind of walk you through the entire process. You plug in Cliff AI on the top of your data warehouse. We monitor every single metric that is there into a data warehouse in a completely automated manner. And there is no, you know, there is, there are no rules that needs to be defined. There is no thresholds, or there is no such kind of setting that needs to be defined. Every piece of data gets monitored automatically. So we typically deal with metrics. So we don't deal with like the raw data. We deal with the KPIs and metrics that are there into the data warehouse. And then once, whenever any
Starting point is 00:25:06 significant change happens in those metrics, for example, if you see a sudden spike in your visitors or a sudden dip in your conversion rate, we send a notification via email or Slack or a Teams integration to the relevant team member within the organization. And there is one interesting thing that we have learned from the engineering domain. One of the biggest challenges that we have also faced is how do we identify the right person within an organization that needs to be notified about some important KPI change. So that's where we have kind of drawn our inspiration
Starting point is 00:25:40 from an engineering domain where the way engineering domain have this whole concept of incident response systems, you know, you have these tools like pager duty, where if anything goes wrong, a pager duty incident is created, and you have an escalation process in terms of who gets to be notified about what, that's something that we try to bring into the business domain. So for example, imagine this, you know, you have certain, you know, drop in your conversion rate, you define that, okay, you know, the first L1 escalation about that changes about that, that drop in that conversion goes to XYZ person in the team, if they don't respond within a particular interval of time, the escalation goes to, you know, L2 managers, and so on, so forth. So defining that whole process of escalation
Starting point is 00:26:26 in terms of who gets to be notified about what and at what time, that's also a very important part of this whole observability platform that we are building here at Cliff AI. One of the key things, one of the most important, you know, member of this whole, you know,
Starting point is 00:26:41 journey that we try to bring in with the business observability is actually the data engineering teams itself. And what we do is basically data warehouse is a huge, huge kind of space where there are a lot of data that might be usable directly and that might not be usable. And the core team that is responsible for ensuring that the right data, the right insights get delivered to the business are actually the data engineering team. So what we also have is as a part of our onboarding process for Cliff is we also integrate
Starting point is 00:27:16 directly with dbt. So imagine that a business has already defined the key metrics and KPIs that they want to track as part of their dbt models. What we do is we directly pull in all of those definitions from the dbt project and have them monitored in an automated manner, so that they don't have to redefine the KPIs that they would want to track in Cliff.ai. That is one thing that our customers find incredibly useful when it comes to having this end-to-end process
Starting point is 00:27:50 and having a system that kind of fits in really well with the ecosystem of products that they're already using. You have said that a new user doesn't have to mess with definitions and rules and all that stuff. So let's say I'm a company that I don't have my KPIs defined as DBT models. What do I do then? How are these KPIs defined and
Starting point is 00:28:15 consequently tracked? Yes. So there are two things here. So let's assume a scenario where a company does not have already have their core KPIs defined, you know, in right now. So what what would happen in that case is, whenever we connect to a data warehouse, what we also do is we have an inbuilt SQL editor as a part of the platform where people can just define their queries right within the CliffWire platform and those queries get executed into the data warehouse and those metrics are collected and monitored into Cliff platform. What we have also seen and this is something that is kind of a new evolution that is happening in the whole modern data stack is there is now a rise of an intermittent or kind of intermediate matrix layer where what people are doing is, you know, this is kind of, you know, the looker has done it really well. The looker has this functionality of which is what they call as look ML, where they have kind of made it really easy for businesses to define and manage their KPIs in a very declarative manner. And what we are seeing recently in the industry right now is the rise of an independent matrix layer, which is kind of right now defining the matrix and kind of maintaining those metrics as a function
Starting point is 00:29:37 used to lie with the BI tools, right? You know, you would typically define your matrix and dimensions in a BI tool. Now with this, you know, rise of matrix layer, there is a new paradigm that has been happening is you have a dedicated space and you have a dedicated platform to be able to define and govern all the KPIs and matrix into one singular place. So we have a very limited and a very kind of a basic version of that matrix layer within the Cliff platform itself, where people can define and manage their matrix in one single place. Oh, wow. And is this like, do you have some kind of like declarative language that is used for this definition? Or is it just SQL right now? No, as of now, it's just SQL. So what we believe is that, you know, introducing a new language wouldn't kind of solve any purpose for us.
Starting point is 00:30:29 And I think SQL is the kind of language, you know, it's kind of the most commonly used language for the data engineers. So our goal is to fit in the existing workflow as smoothly as we can. So that's why it's a plain, simple SQL editor that can be used to define queries. Yeah, makes sense. So are the dashboards dead? No, so the non-marketing answer to that is no, dashboards are not dead.
Starting point is 00:30:58 I think so. Dashboards serve their own purpose. Dashboards are good for, the dashboards are still a good tool for reporting. But I think so what Dashboards are good for, you know, the dashboard are still a good tool for reporting. But I think so what dashboards are dead for is the, are dead for kind of, you know, the use cases where you need to have a continuous monitoring of something. So dashboards are dead for all those use cases where you need to have a human being constantly monitoring them for any kind of insights. I think that is something dashboards are definitely dead for.
Starting point is 00:31:34 Yeah. I don't know, Eric. I have a feeling that the highest level of escalation of a tool like this is probably going to be the board of directors. I don't know if this is a very good idea. What do you think? It's a good question. You know, one question I have, Ayush, and this may be, this is sort of jumping back to specifics, but when you think about continuous monitoring and becoming, basically being served a a notification that's really talking about anomalies, right? Things that are important enough to need someone to look at it because there's been some sort of change. And when you think about, and I'm going to oversimplify this, but let's, you sort of have two classes of anomaly, right? So one where a metric is changing significantly because of some sort of business activity. So marketing runs a campaign, you get on the first page of Hacker News, you know, your, or, you know,
Starting point is 00:32:46 so that's sort of positive or, you know, negative stuff, right? Like AWS servers go down, you know, so user activity, you know, falls off or whatever. Yeah. So the first class is like something's happening with, with the business that sort of a fundamental lift or decline. The other one is changes in data, right? A definition changes, the name of an event changes, those sorts of things. How do you think about classifying those? And you mentioned this a little bit before saying, how do you get the notification to the right person in the organization? But it's interesting because numbers can go up or down either because the business itself is changing or because the business really is, there isn't an anomaly in the business. There's actually just a change in the data that creates an anomaly in the metric.
Starting point is 00:33:38 Yeah, no, that's a great question, Eric. And I think so, you know, the way you have classified them into two kind of problems where those anomalies are actual business anomalies or versus those are those data anomalies right and i think so this both of those pieces are very critical component to have and build an into an observability and i think so we have recently seen a rise of a lot of data quality monitoring tools that are kind of tackling this, you know, this whole problem of the data quality. And, you know, you know, you know, and they're kind of, you know, doing an amazing job in doing that, right? You know, in terms of ensuring that whatever data that gets business anomalies rather than the data quality issues per se.
Starting point is 00:34:32 And I think so when we talk about, you know, the business observability platform, you know, monitoring matrix for anomalies is just one part of it right because when you monitor something you know if you're if it is all about anomaly detection it would it's better to call it a monitoring platform rather than a an observability platform and i think so what makes something uh observability tool rather than a monitoring tool is not only telling something that something went wrong, but also to assist in terms of identifying why it went wrong. So the second part is equally important in terms of why something went wrong. And I think so, you know, if you look at from a business context, right, so if you see that there is a sudden spike in, you know, let's say the number of visitors, the very first question that any business team would ask is why? Why is this happening?
Starting point is 00:35:25 Is it because of, you know, is it because of some campaigns that the marketing team is doing or, you know, is it because of certain other parameters? So that's where we come. So the second part of the observability platform is answering these questions. Why? And within the Clif platform, what we have also built is a very smart root cause analysis tool. What it does is basically if you have, imagine, let's say you have a metric and you have within the cliff platform what we have also built is a very smart root cause analysis tool what it does is basically if you have let's imagine let's say you have a metric and you have a certain set
Starting point is 00:35:50 of dimensions associated with that matrix so what cliff platform does is basically it does an automated root cause analysis to be able to identify what were the key segments that contributed to that spike statistically you know what are those statistically significant factors that contributed to that spike? So in that way, what we're trying to do is we are trying to do and try to complete an end loop around observability, where you not only know what went wrong, but you also get an idea of why it went wrong. And obviously, you know, there is an underlying hypothesis is the business have the right dimensions or right dimensions associated with that particular matrix.
Starting point is 00:36:32 Do they have the right dimensions that would assist them into helping that root cause analysis as a part of the data warehouse? So that's a key assumption that we have. Yeah, super interesting. And this may be a funny question, but I'm interested. I'm just thinking about all the listeners in our audience who have worked on the underlying data layer that drives metrics.
Starting point is 00:36:53 I mean, we're all familiar with that. How many, and let's just define metric as sort of a single number that represents some part of the business. How many metrics are your customers tracking? I mean, is it 10? Is it 100? 1,000? Because I think we all sort of,
Starting point is 00:37:11 when you work inside of a business, sometimes it's like, wow, are we tracking a lot of stuff or are we not tracking a lot of stuff? So can you provide some perspective on that since you see it every day? Yeah, so that's actually a great question. So I think so, I would just know, have a clear demarcation here
Starting point is 00:37:28 in terms of what a KPI and a what a metric would mean. So basically, what might happen is business might have a limited set of KPIs that they would want to monitor. But the number of metrics that can arrive that can occur because of the combination of dimensions can grow exponentially. So let me give you an example. So for example, the number of visitors coming to the website, so that's just one KPI. But this particular KPI can have hundreds and thousands of metrics of what that would be number of visitors coming from Google, number of visitors coming from Facebook, and each of those metrics can have a significant impact on the business. And what what business try to do is they would not want to have just a monitoring on the top level KPI, like how many visitors are coming on the website,
Starting point is 00:38:16 but also how many visitors are coming from, let's say the social medias or from let's say organic search and so on so forth. So the number of metrics can grow exponentially depending on the size of business. Typically, one of our biggest customers is a telco company. And in that telco business, I think at this point, they're monitoring, I think so, a broad number for that would be they're monitoring roughly around 200,000 KPIs, not KPIs, the matrix in a near real-time manner. So it can grow as, yes. Wow. That's incredible. And as kind of a follow
Starting point is 00:38:54 on to that. So it's really interesting to think about the, one of our previous or recent guests use the term data value chain, right? So we think about, you know, you have collection of the data, you are sending it to places, whether that's to different tools in your stack, you know, you unify it in the warehouse. Ideally, there's a metrics layer. Ideally, I agree with you. I think the metrics layer is, is the way that things are moving in the future. I mean, it's super cool what you can do with tools like dbt and sort of all of that. Then you are building dashboards for certain things. Then you're sending and operationalizing that data and, you know, ideally observing, you know, sort of doing business observability. Where, as those things change, where does the data engineer sort of fit into the data value chain in the context of having
Starting point is 00:39:48 sort of data quality type automation, you know, with tools say like BigEye or Monte Carlo, you know, with business observability tools like yours, what are you seeing? Are these tools and sort of the changing data value chain around this modern stack, is that sort of repositioning teams and data engineers in terms of where they fit? Actually, that's a very interesting question. And I think so. I was recently reading an article and what the author kind of gave a very apt summary of what's happening in the data space is previously, you know, there used to be in separate roles within the organization called data platforms, where, you know, you would have a combination of, you know, engineers,
Starting point is 00:40:37 and the data guys who are kind of owning and building the data platforms within the company. And with this whole rise of the modern data stack and with the size of whole tools and that are kind of emerging in various aspects of the modern data stack, what's happening is that the role of a data engineering becomes more prominent in a sense that with this, with all of these tools,
Starting point is 00:41:02 Monty Carlo data, Cliff AI, Big Eye data, the data platform aspect has been taken by the third-party vendors, whereas the core delivering the data value chain or the tying of this data value chain still remains with the data engineers. So previously, initially the data engineers would be the one who would be writing the ETL pipelines
Starting point is 00:41:24 to pull in the data from one place and putting into a data warehouse. But now their role has become way more significant in the entire value chain, right from the generation of the data to the consumption of the data. And this is, you know, I think so with the advent of DBT, I think so roles like data, you know, analytics engineer, you know, these were the roles that were not even heard of, you know, a couple of years back. But this has become like a mainstream titles in terms of the, you know, titles like analytics engineer. So data engineering, analytics engineering is something that is just growing day on day. Ayush, I have a question. You use the term business observability platform, right? And okay, the most common observability, let's say observability started from the need to
Starting point is 00:42:10 like to observe our infrastructure. Then recently we are also have started talking about like data observability, where we have like other tools that are trying like to mimic what an observability platform does, but like for data infrastructure specifically. Now you are taking this on a level higher, which is the business observability platform. Now, in each one of these cases, usually we have a very specific role that is related to operation that is interesting that is using the user of these tools. We have the SREs that they are using DataBelt, for example. Who is using the business observability?
Starting point is 00:42:46 Like who is the equivalent of an SRE, but for a business, let's say? Yes, so that's a good question. So I think so for us, the end consumer of the business observability platforms are typically the operations team. So now this operations team is also a very broad term. So it can span across, you know, revenue ops, marketing ops, or, you know, sales ops, or even in a lot of cases, actually the operations, you know, the physical operations of a company. So for us, that typically
Starting point is 00:43:15 the audience are the teams that works with the operations team. You know, in one case, we have a revenue operations team that is, you monitoring various kps related to finance using cliff.ai so typically for us the audience are the operations teams the teams that are most impacted whenever any numbers changes in their matrix so for for us the audience are the operations teams but the enabler of the platforms are the engineering teams. You know, the finance operations teams, we don't expect a finance operations teams to kind of connect to their data warehouse and actually write those queries
Starting point is 00:43:55 and get those numbers that they want. So typically for us, the enablers are the analytics and the engineering teams. The consumers are the operations team. Great. One last question from me, and then I'll give the stage back to Eric.
Starting point is 00:44:14 I know that you are also actively creating a community. So I'd like to ask you, and this is something that we have seen happening a lot lately, especially with products that really have to do with data. I think everyone saw the success of DBT and the DBT community. Of course, communities are not something new.
Starting point is 00:44:36 Open source communities exist for like since forever. Based on your experience, how important is a community around the data products and how they relate together or is it just like a marketing tool at the end like how do you see the community what's the position of the community as part of like your the business value that your company at the end right like delivers yeah yeah so i i think so that that this is something that I have a slightly different opinion, probably, you know, would be very contradictory with what opinion that other people would have is, you know, I think so if we start a community with intention of getting a business value in return, I think so we defeat the whole purpose of the community itself. And, you know, we started this modern
Starting point is 00:45:25 data stack.xyz community with the whole purpose of, you know, finding a place where people interested in the modern data stack can come together and create a resource that can help anyone to learn about the modern data stack. And in terms of the value that is there in the community, I think so for us, the modern data stack community that we have built is more about just interacting with the like-minded individuals about what's happening in the modern data stack. Because the modern data stack is kind of changing every single day. There isn't something new coming up. There's something, the categories there, the entire categories are getting created
Starting point is 00:46:06 in the modern data stack very, very, you know, very quickly. And that's one of the reasons, that's the only reason we wanted to create this community to just bring in like-minded people together. In terms of speaking about the value of the community, I think so the biggest value of a community is having an audience or having a kind of, a kind of a set of people connected with you where you can share these ideas. Because one of the key things that is happening in the whole modern data stack is the emergence of the new ideas.
Starting point is 00:46:39 You know, you know, people haven't heard of reverse ETL, you know, up until a few years back. People haven't heard of PLG CRNs. People haven't heard of reverse ETL you know up until a few years back people haven't heard of PLG CRNs people haven't heard of business observability and having a community the biggest value of having a community is having a connection with the like-minded individuals with whom you can share this whole idea you know no matter how much crazy it sounds you you share that idea with those set of people and you kind of get a feedback on those ideas. And from another value perspective is, we found our first set of customers, first set of customers from this community itself.
Starting point is 00:47:16 We would share the ideas, share the things that we are working on at S-Cliff.ai and we would get the feedback from that community. So I think that is the value that you get from the community. And we have been very conscious that we always wanted to create an open community. We don't want this to be a community by cliff.ai. You know, even if you look at the modern data stack.xyz website, you know, we have a very, very small footer at the bottom,
Starting point is 00:47:44 which says, you know, run by the team at the Cliff AI. And one of my close friends, you know, who has been a pioneer in building an A-B testing platform, you know, this is one thing that I learned from them is, you know, when they were building a community, they would rarely talk about the product, their, you know, their offerings, you know, they would just talk about the product, their, you know, their, their offerings, you know, they would just talk about A-B testing, and they've kind of became kind of a go to, go to place for anyone to learn about, you know, A-B testing, and eventually that helped the business itself. So from from my perspective, the, you know, the community is kind of a long term game. And the goal here is to just bring the like-minded
Starting point is 00:48:25 individual together, not from any specific business goal perspective. Really great perspective there, Ayush. I agree. I think building a community that provides true value around subject matter without commercializing it really takes commitment over a long period of time. You know, it's not necessarily something, it can't, it helps create context for the business problem, you know, the company that you're building, but really appreciate that perspective. We're close to time here, but I have one more question for you. So you get to see anomalies in data all the time from customers who are using Cliff, cliff.ai. What are some of the most interesting anomalies that you've seen that just sort of surprised both your customers and you?
Starting point is 00:49:12 Okay. So I think, so I, I remember this case where, you know, and it's actually, it's not, not very surprising, but it was very kind of impactful was when, you know, there's a customer who was, who were monitoring their marketing ad spend with respect to.ai. And one day they saw a sudden spike in their ad budgets. You know, they had set up some, you know, in the Google ads, they have set up their daily limits, but it was that, that limit was set to be kind of decently high. And what happened was, you know, there was one footballer who kind of mentioned a term that is that is what one of the target keywords and what they saw is they they had a sudden spike in their ad spend. And that was
Starting point is 00:50:00 coming from a completely irrelevant audience. they saw a spike and they kind of turned on their bidding on that particular keyword for the specific duration of time. So that was very interesting. Yeah. But I mean, really useful to like be able to catch that pretty quickly. Wow. That is so funny. Amazing how, you know, someone with a huge following just mentioning something can impact,
Starting point is 00:50:24 you know, a company's ad budget. And, you know, and they were a B2B SaaS company that has nothing to do with, you know, what that footballer mentioned. Yeah, that's hilarious. Do you remember what team the footballer played on? You know, I can very vaguely remember who was that footballer, but I think so it was someone from Chelsea. Ah, okay, Chelsea. Hilarious. Well, Aish, this has been a really fun show. Thanks for joining us and best of luck with Cliff.ia.
Starting point is 00:50:59 Thank you. Thank you very much, Eric. Thank you for having me. Really interesting conversation. One thing that I really appreciated, and it's one of those things where you kind of know it in the back of your head, but since you're experiencing it every day, you don't really think about it, but it is pretty wild the number of ad hoc analyses that are created and require a lot of work and then are basically thrown away. I mean, Google Sheets for every company must be a massive graveyard of ad hoc analyses, you know, and in some ways you have, you know, saved queries on the warehouse that are similar and thinking about the paradigm shift to sort of monitoring or observability in the context of your stable metrics and KPIs is really interesting. So yeah, it made me think
Starting point is 00:51:55 about how many Google Sheets I have in my drive that, you know, are old stale ad hoc analyses that were really useful for 15 minutes and then I've never looked at it again. Yeah. Yeah. I think everyone can relate to that. And I think anyone that comes from a BI background, they can definitely feel the pain of what it means to maintain all the different dashboards that the company has. I think it's much more evident early on in the life of a company because there's no
Starting point is 00:52:25 clear ownership of the BI process. Everyone is like a BI analyst in a way. And I think it's a problem that BI tools have struggled to figure out the solution for forever. And I don't think that they have managed to do it efficiently, to be honest. Mainly because it's not just a technological
Starting point is 00:52:41 problem, it's also an organizational problem. But ad hoc analysis is very important, that's for sure. We always need to ask ad hoc questions. So, of course, we have to do that also with our data. Now, how we manage all this garbage that's created from there, that's something that we discussed today with Ayush. I think there are two things that I found really interesting in our conversation. One is the fact that this space of operationalizing data becomes richer and richer.
Starting point is 00:53:15 If I remember, our audience probably is already familiar with reverse ETL, but that's only one manifestation of how to operationalize your data. We had another guest who, the CEO of Airbyte, mentioning that machine learning models are also operationalization of data, which is a very valid point. And I used today mentioned continual AI, for example, which is exactly that. And of course, like all the like Tekton and all the different feature stores at the end, that's exactly what they are doing. And now we have business observability,
Starting point is 00:53:52 which is another way to operationalize like the data. That's very interesting. And I'm really looking forward to see what else will come up. And the other thing is that how big of an impact the SRE discipline and the operations, come up. And the other thing is that how big of an impact the SRE discipline and the operations, the engineering operations discipline has outside
Starting point is 00:54:12 of just like managing the infrastructure, the IT infrastructure of the company. We see that like getting repeated in data and now we see it also like in business, where we have like these themes of like rev ops marketing ops sales ops and all these different like roles that they arise and
Starting point is 00:54:33 of course like all that stuff like is built on the availability and accessibility of data today so yeah this was like a very very interesting. And I'm very curious to see how this business observability category is going to evolve. I agree. Well, thanks again for joining us on the Data Stack Show. And we will catch you on the next episode. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback.
Starting point is 00:55:06 You can email me, ericdodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by Rudderstack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.
