The Data Stack Show - 65: Operationalizing Data from the Warehouse With Aayush Jain of Cliff.ai
Episode Date: December 8, 2021

Highlights from this week’s conversation include:
Aayush’s career background (4:13)
How his biological sciences academic training impacts his work (8:04)
How do we allow dashboards to get messy? (9:3...5)
Building cultural or technical solutions to effective dashboards (15:19)
Using data dashboards to make material business improvements (23:19)
What is business observability? (32:23)
Building a platform for operations teams (43:15)
How important community is to the Cliff.ai business proposition (41:03)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are
run at top companies.
The Data Stack Show is brought to you by RudderStack, the CDP for developers.
You can learn more at rudderstack.com.
We have a really exciting episode coming up.
And what's most exciting is we're going to live stream it.
The topic is the modern data stack.
And we're going to talk about what that means.
It's December 15th and you'll want to register for the live stream.
Now, Kostas, it's really exciting because we have some amazing leaders from some amazing
companies.
So tell us who's going to be there.
Yeah, amazing leaders and also an amazing topic.
I think we have mentioned the modern data stack
so many times on this show.
I think it's time to get together all the different vendors
who have contributed to creating
this new category of products,
have them define the modern data stack,
and discuss what makes it so special.
So we are going to have
people from Databricks, dbt, and Fivetran, and from companies that are implementing state-of-the-art
technologies around their data stack, like Hinge. And we are also going to have VCs there, to see what
their own opinion is about the modern data stack.
And yeah, it's going to be super exciting and super interesting. So we invite everyone to our first
live stream.
Yeah, we're super excited. The date is December 15th. It's going to be at
4 PM Eastern time and you can register at rudderstack.com slash live. So that's just rudderstack.com slash live. And we'll send you
a link to watch the live stream. We can't wait to see you there.
Welcome back to the Data Stack
Show. Today, we're going to talk with Aayush Jain, and he is one of the founders of a company called
Cliff.ai. He's actually sort of a serial entrepreneur, has done a couple of startups,
and this one's focused on what he calls a business observability platform. So I think it'll be a
really interesting conversation. I don't know if we've come across that term yet on the show,
have we, Kostas?
No, I don't think so. I think it's going to be quite an interesting conversation,
especially talking about how to operationalize data
and how the data warehouse, again,
is a very important component
in how we extract value from the data.
And he also leads a community.
So I think it's also going to be interesting to hear
from him what's the role of the community
and how it helps and all that stuff.
Absolutely.
And I think my question, and I'm going to go way back here.
So, you know, in looking at our guests' backgrounds,
I'm always interested when someone's building something in the data space,
but they don't come from a technical background.
And Aayush actually comes from a scientific background.
And I always find that connection really interesting.
And I haven't asked that question in a long time.
So I'm way overdue.
So my question is going to be how his academic background has influenced his work in data,
which I know you're going to enjoy because you haven't gotten that in a while.
I think you're going to force me to be philosophical again and claim that science is data, but...
Science is data, well said.
We can just wrap the show up.
Perfect.
Yeah, let's go and chat with him.
Let's do it.
Aayush, welcome to the Data Stack Show.
It's great to have you here.
Thank you very much for having me.
All right, well, tell us about yourself. What's your background and what are you working on in
your day job?
Great. So just to give you a little bit of background about myself,
I'm one of the co-founders of Cliff.ai. And Cliff.ai is basically a business
observability platform. We help companies track and monitor their metrics without requiring a dashboard.
In terms of my background, I come from a rather different background. I have my master's
in biological sciences. We're a team of three co-founders, and we
started our first business together as soon as we got out of college. We
used to live next to each other in dorm rooms for a good five years of college, and just after that we thought, guys, we've got to do
something together. Doing a job was kind of uncool for us back then, so we
took the plunge into starting our own business without even knowing
how to run a business. So we started an online pharmacy
in India. This was way back in 2016, when the internet, especially online commerce, was
booming in India. At that point, you could
get literally anything delivered to your doorstep just by ordering it on an app. The only
thing that you would still have to go to a physical store
to buy was medicine. We thought, why don't we build
an application that would allow people to just put in their prescription and get their medicines
delivered to their doorstep? So that's the first business that we started together. We ran it for
around eight months. Then a big company came in and said, we want to buy the whole business. And we said, fine, let's do that. So we sold that business off.
After that, we took that cash and said, what do we do next? And
for a short period of time, we explored a few ideas that we could work on.
Eventually we started our B2B SaaS business, which was called GreenDeck.
GreenDeck was a price optimization engine for online retailers.
We would help online fashion companies optimize their pricing using AI.
One of the core components of the GreenDeck platform was dealing with a huge amount of data. The way we helped our customers improve their pricing was by providing them with a huge amount of data
on how their products were priced with respect to their competitors.
And that was our first experience of dealing with a huge amount of data.
So we would crawl hundreds and thousands of websites on a daily basis, collect the product and pricing information, and help our
customers make decisions. That's where we had our first encounter with huge amounts of
data. Then, around March last year, we pivoted from GreenDeck to Cliff.ai. The thing that made us excited about what we are doing here
at Cliff.ai is that we were dealing with a huge number of dashboards to
monitor and track various processes that were happening on GreenDeck.
And it all started with one simple anomaly detection script.
We wrote a script internally saying, hey, we have tons of metrics that we want
to monitor as a part of the GreenDeck platform.
The core problem was that there were so many dashboards, and we had such a small team, that we
couldn't even look at all of those
dashboards. So we created an anomaly detection script that would just monitor those
metrics and send a Slack notification whenever there was an anomaly in a metric. And that
worked like magic for us. That's when we got excited about this whole problem of dashboards
and how various businesses deal with dashboards. And eventually we pivoted into Cliff.ai.
Very cool. Well, I want to talk about dashboards, but first I'd love to
know, because we love talking with guests who have diverse backgrounds: you studied biological
sciences, so what lessons have you taken from that
academic background into being an entrepreneur and working with data?
I think one of the most important things that I learned coming from a science background,
in terms of how I understand things, is
asking questions, fundamentally. A lot of times, when we
look at things from a business perspective, we forget to ask, or we stop
asking, the fundamental questions. And science works on fundamentals. So the biggest thing that I
learned from the science background is asking fundamental
questions and starting to think from first principles.
That is the biggest takeaway for
me from a science background.
Yeah, I love it. I love it. For sure. I mean, in
science, you sort of start out by
trying to remove outside influence from whatever you're trying to study. And it's hard to do that
in business. So that's a really good lesson. And actually, for anyone working with data,
that's just a great reminder in general. So love hearing that.
Well, let's talk dashboards. You had all these dashboards and a
small team, and you were trying to understand them. You're building
a company that helps people avoid or solve the problem of messy dashboards, but
I'm interested in your perspective on how dashboards even get messy. And Kostas, I'd
love to know your perspective as well, because you've built companies and dealt with dashboards too.
But you would think, I mean, businesses vary a lot,
sort of basic business models and metrics and stuff.
There's a lot of commonality across businesses, right?
I mean, you have website traffic, you have some level of conversion,
some of it's qualified, and then someone pays you some money.
But everyone, I think,
listening has experienced messy dashboards. So how do they get messy?
So before I answer that question of why and how dashboards get messy,
let me take a step back and paint a picture of how we got here and why we got here.
If you look at what has been happening in the data space in the past few years,
there is an undeniable rise of the modern data stack.
People have different definitions of what constitutes a modern data stack. Some people would argue that it's more of a marketing
jargon than something that is actually substantial. But nonetheless,
the undeniable trend is that the data warehouse has become the heart of the
modern data stack itself. And there has been an incredible amount of progress
in allowing businesses to bring their mission-critical data into a data
warehouse. You have a rise of ETL tools, you have a rise of data quality
monitoring tools. What that means is that businesses now have far more data than
they had a few years back, because the barrier to putting data into a
data warehouse has been significantly reduced. So as a business, you
have a huge amount of data. And up until a couple of years back, the only mode of consumption of
this data had been dashboards. There has been tremendous progress in bringing data into the warehouse, but the consumption of that data, even today in 90% of
cases, happens through a dashboard. Now, the fundamental limitation of a dashboard is that a human consumes it.
The underlying assumption of a dashboard is that there is a human being visually looking at it and making decisions.
And this is what has changed fundamentally in the past few years.
Since dashboards fundamentally rely on a
human being to consume them, with the growth of this data,
the share of that data that people can actually consume has drastically reduced. What
that means is, as a business, as you mentioned, you have tons of KPIs, which can be
visitors, followed by users and then customers, and so on and so forth. And right now,
most of the time, people put their metrics into a dashboard expecting a human being to monitor them on a
continuous basis. Because of that, a couple of things happen.
Since the mode of consumption of data for people is dashboards, whenever people have questions,
whenever someone wants to know why something broke or why something changed in the business, the
inherent bias is to go to the data team and say,
hey, I want a dashboard. So sometimes answering these data questions
leads to the creation of a dashboard. And in a lot of cases,
these dashboards are used a couple of times to get a couple of answers, and then no one
even looks at them. What's happening
because of that is this whole dashboard rot, where every organization has
hundreds and thousands of dashboards, and probably only a fraction of them are actually
used on a day-to-day basis. In my experience of working with various data teams, the data and engineering teams end up owning the maintenance of so many dashboards that are rarely used.
So that is one of the reasons I feel we got into this messy situation where we have hundreds of dashboards which are rarely used.
And even the ones that are used still rely on human beings to make those decisions.
So that is my understanding of how we got into this messiness of dashboards.
Yeah.
So I guess, yeah, I wanted to ask you. I mean, you described the problem pretty well,
and I think that anyone who has to work with data,
not even with data, I mean
people that are working in
any organization inside
a company, at some point has had to
face these issues. But do
you think that this is an organizational
problem or
a technical problem?
Or an organizational problem that
has a technical solution?
So what's your perspective on that?
Yeah, so
I don't think it's really a technical problem.
I think it's more of an organizational, behavioral problem, where the way we deal with data has
been through dashboards and reports.
So the natural bias whenever anyone has
a question is to request a dashboard. I think this is more of an organizational issue,
more of a behavioral issue. If I have to cover a technical aspect of it,
and I'll get to that point later down the line,
the only way for people to get answers right now
is dashboards and reports.
And until and unless we have
technology that solves that part,
where people can ask questions without requiring dashboards,
I think this is something that's going to stay.
Okay.
So how can we do that?
How can we ask questions without dashboards?
So now, this is a very interesting trend
that we have been noticing
in the past few months,
and it has been growing very rapidly:
this whole idea
of operationalizing the data
in the data warehouse.
This is a very interesting
concept that broadly covers a few categories of the modern data stack. One of
the most obvious, which has recently seen very good adoption
from the broader tech community and from a business perspective as well, is the rise of reverse ETL.
So far, what we have seen in
the industry is people bringing data into a data warehouse and then writing
ad hoc scripts, or, you know, some people would
build Airflow DAGs, to put this data back into the source systems.
Now, with the recent rise of reverse ETL, there are a few companies doing an
amazing job, companies like Hightouch and Census.
What's happening now is that the data is no longer just sitting in a data
warehouse. The insights generated from
the data in the data warehouse are now being fed back into the source systems. So imagine
a sales guy using HubSpot or Salesforce as a CRM. Instead of going
to the data team and saying, hey, I want a list of users who have
done these things so that I can target them, and so on and so forth,
they now have this information back in their operational system,
which is Salesforce or HubSpot.
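
The reverse ETL flow described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `sqlite3` stands in for the data warehouse, and `InMemoryCRM`, `sync_segment_to_crm`, the `product_events` table, and the field names are all hypothetical. It is not how Hightouch, Census, Salesforce, or HubSpot actually expose this.

```python
import sqlite3


class InMemoryCRM:
    """Hypothetical stand-in for an operational system such as a CRM."""

    def __init__(self):
        self.contacts = {}

    def upsert_contact(self, email, fields):
        # Create the contact if it is missing, then merge in the new fields.
        self.contacts.setdefault(email, {}).update(fields)


def sync_segment_to_crm(warehouse, crm, min_events=3):
    """Compute a user segment in the warehouse and push it to the CRM."""
    rows = warehouse.execute(
        """
        SELECT email, COUNT(*) AS event_count
        FROM product_events
        GROUP BY email
        HAVING COUNT(*) >= ?
        ORDER BY email
        """,
        (min_events,),
    ).fetchall()
    for email, event_count in rows:
        crm.upsert_contact(
            email, {"event_count": event_count, "is_active_user": True}
        )
    return len(rows)


# Assumed toy data: one active user and one user below the cutoff.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE product_events (email TEXT, event TEXT)")
warehouse.executemany(
    "INSERT INTO product_events VALUES (?, ?)",
    [("a@example.com", "login")] * 4 + [("b@example.com", "login")],
)

crm = InMemoryCRM()
synced = sync_segment_to_crm(warehouse, crm)
print(synced, crm.contacts)
```

The point of the sketch is the direction of travel: the segment is defined once, in SQL against the warehouse, and the sales team sees the result as fields on a contact rather than asking the data team for a list.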
Now, this is a fundamental shift in how you think about data.
The data warehouse is no longer just a place where data goes and people query it for reporting
later on. You are actually activating the data that is put into the data warehouse and making
business decisions on top of it. That's the reverse ETL space.
Another interesting trend is people using the data in the data warehouse to build downstream applications on top of it.
One of the companies that I really admire in this space is a company called Continual AI.
They have been helping people build machine learning models directly on top
of the data warehouse.
This solves a very important problem in the end-to-end delivery of machine learning models.
Previously, people would have batch jobs where they would
train a model and then use it in production.
Then they had to build pipelines to continuously fetch new data from source systems, retrain their machine learning models, and
keep that end-to-end pipeline automated. Continual solves this problem by
placing itself directly on top of the data warehouse and allowing people
to build their machine learning models there. That's a very smart thing that I've seen recently in terms of operationalizing the data warehouse.
And then, still in terms of operationalizing the data warehouse,
the other thing,
and this is what we are trying to
tackle with Cliff.ai, is how do we help companies
actually generate insights
from the data that is being captured
in a data warehouse and make it actionable?
That's what we are doing here at Cliff.ai.
One of the things that we have heard
from our customers is that
most of the time they have so many KPIs and metrics
that they want to keep track of,
and they use dashboards to do that.
And one of the fundamental problems that I just talked about when it comes to dashboards is that
dashboards are now being abused as a monitoring tool, while they were not meant for that purpose.
This is something that we want to solve with Cliff.ai.
We position Cliff as a business observability platform.
It's a platform that sits on top of the data warehouse, monitors every single
metric in that data warehouse, and lets the relevant person in the team know whenever
anything changes in those KPIs and metrics.
This, in a way, is activating the data warehouse and
allowing people to make business decisions and take actions on top of
their data. This is a rapidly growing field in terms
of how businesses activate the data in the data warehouse. And I think
we are just touching the tip of the iceberg as of now. There are a lot of innovative
solutions emerging in the modern data stack that are
tackling this part of operationalizing the data in the data warehouse. One of the categories
that I forgot to mention is this whole category of product-led growth, you know, PLG CRMs:
a CRM built for product-led growth companies
that is tightly coupled with the data warehouse. So there is so
much interesting activity happening around what happens to the data once it has reached the data warehouse.
That's very interesting. I think you touched on many important topics. Quick question.
You mentioned a very interesting term, which is the business observability platform that you're
trying to build. And you're also trying to move away from the traditional methodology of
consuming dashboards for that. So how does it work? I mean, you say there are KPIs, these KPIs are getting
tracked, and the right teams are getting notified when something changes. Describe to us in a
little more detail the journey that your customer has in the product: how these things are defined, how they are consumed, how people are notified, and how
they can react also, right?
Because reaction is also important.
That's why we need data so we can act.
Okay.
So one of the things that I'm personally very fascinated by
is the whole concept of SRE in the engineering domain, right?
If you look at what happened in the engineering domain in the past decade,
in engineering you have tons of applications and services. And at
one point, it got really difficult for any company to stay on top of its infrastructure
without having an observability platform. Companies like Datadog and New Relic did an
incredible job of providing those SRE teams with the right tools and the right
platform so that they could stay on top of their systems. Now, if you look at the business side of things,
before the advent of cloud data warehouses,
a lot of business processes were not that data-driven.
But now, with the whole evolution of the modern data stack,
each business process is also data-driven.
And it has become really difficult
for business teams to keep track of their metrics without having a system that can
help them the way Datadog has helped SRE teams. So that is the whole concept
that we have when it comes to business observability: can Cliff become the
system that allows business teams to stay on top of their business processes? And the way the
platform works is basically that we sit on top of the data warehouse. Cliff.ai is
a SaaS offering where we plug directly into a data warehouse, and we integrate
with various data warehouses like Snowflake, Redshift, BigQuery, and so on and so forth.
Let me walk you through the entire process.
You plug Cliff.ai in on top of your data warehouse.
We monitor every single metric that is in the data warehouse in a completely automated manner.
There are no rules that need to be defined. There are no thresholds
or settings of that kind that need to be defined. Every piece of data gets monitored
automatically. We typically deal with metrics. We don't deal with the raw data. We deal
with the KPIs and metrics that are in the data warehouse. And then, whenever any
significant change happens in those metrics, for example, if you see a sudden spike in your
visitors or a sudden dip in your conversion rates, we send a notification
via email, Slack, or a Teams integration to the relevant team member within the organization.
And there is one interesting thing that we have learned from the engineering domain. One of the biggest challenges
that we have faced is how to identify the right person
within an organization who needs to be notified
about an important KPI change.
That's where we have drawn our inspiration
from the engineering domain, which has this whole concept of
incident response systems. You have tools like PagerDuty, where if anything
goes wrong, a PagerDuty incident is created, and you have an escalation process for who
gets notified about what. That's something that we try to bring into the business domain.
So for example, imagine you have a drop in your conversion rate. You define that the first, L1, escalation
about that drop in conversion goes to XYZ person in the team. If they don't respond
within a particular interval of time, the escalation goes to the L2 managers,
and so on and so forth. So defining that whole process of escalation,
in terms of who gets notified about what
and at what time,
is also a very important part
of this whole observability platform
that we are building here at Cliff.ai.
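
The L1/L2 escalation idea can be sketched as a small data structure plus one function. This is a hedged illustration only: `EscalationLevel`, `contacts_notified`, the contact names, and the wait times are all hypothetical, and this is neither PagerDuty's API nor Cliff.ai's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EscalationLevel:
    name: str
    contact: str
    wait_minutes: int  # how long to wait for a response before escalating


def contacts_notified(policy, minutes_elapsed, acknowledged_at: Optional[int] = None):
    """Return the contacts alerted so far for an incident.

    Escalation stops progressing once someone acknowledges the alert.
    """
    if acknowledged_at is not None:
        minutes_elapsed = min(minutes_elapsed, acknowledged_at)
    notified = []
    notify_at = 0  # minute at which the current level gets alerted
    for level in policy:
        if minutes_elapsed < notify_at:
            break
        notified.append(level.contact)
        notify_at += level.wait_minutes
    return notified


# Assumed policy for a drop in conversion rate.
conversion_drop_policy = [
    EscalationLevel("L1", "growth-analyst", wait_minutes=30),
    EscalationLevel("L2", "growth-manager", wait_minutes=60),
    EscalationLevel("L3", "head-of-growth", wait_minutes=0),
]

print(contacts_notified(conversion_drop_policy, 0))
print(contacts_notified(conversion_drop_policy, 45))
print(contacts_notified(conversion_drop_policy, 45, acknowledged_at=10))
```

Keeping the policy as plain data, separate from the detection logic, is what lets each metric have its own answer to "who gets notified about what, and when".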
One of the most important
members of this whole
journey that we try to bring in
with business observability
is actually the data
engineering team itself. The data warehouse is a huge
space, where there is a lot of data that might be directly usable and a lot that might not be.
And the core team that is responsible for ensuring that the right data
and the right insights get delivered to the business is the data engineering team.
So as a part of our onboarding process for Cliff, we also integrate
directly with dbt. Imagine a business that has already defined the key metrics and
KPIs that they want to track as a part of their dbt models.
We directly pull in all of those definitions from the dbt projects and
have them monitored in an automated manner, so that they don't have to redefine in Cliff.ai the
KPIs that they want to track.
That is one thing that our customers
find incredibly useful when it
comes to having this end-to-end process
and having a system that
fits in really well with the
ecosystem of products that they're already using.
You have said that a new user
doesn't have to mess with definitions
and rules and all that stuff.
So let's say I'm a company that
doesn't have its KPIs defined as dbt models. What do I do then? How are these KPIs defined and
consequently tracked?
Yes. So there are two things here. Let's assume a scenario where a company does not already have its core KPIs defined. In that case, whenever we connect to a data warehouse, we also have a built-in SQL editor as a part of the platform, where people can define their queries right within the Cliff.ai platform. Those queries get executed in the data warehouse, and those
metrics are collected and monitored in the Cliff platform. What we have also seen, and this is
a new evolution that is happening in the whole modern data stack, is
the rise of an intermediate metrics layer. Looker has done this really well.
Looker has this functionality called LookML, where they have made it really easy for businesses to define and manage their KPIs in a very declarative manner. And what we are seeing in the industry right now
is the rise of an independent metrics layer.
Defining the metrics
and maintaining those metrics, as a function,
used to lie with the BI tools, right?
You would typically define your metrics
and dimensions in a BI tool.
Now, with this rise of the metrics layer, there is a new paradigm: you have a dedicated platform to define and govern all the KPIs and metrics in one single place. We have a very limited and very basic version of that metrics layer
within the Cliff platform itself, where people can define and manage their metrics in one single place.
Oh, wow. And do you have some kind of declarative language that is used for
this definition? Or is it just SQL right now?
No, as of now, it's just SQL. What we believe
is that introducing a new language wouldn't serve any purpose for us.
And I think SQL is the most commonly used language for data engineers.
Our goal is to fit into the existing workflow as smoothly as we can.
So that's why it's a plain, simple SQL editor
that can be used to define queries.
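
As a rough picture of what "defining a metric as a plain SQL query" could look like, here is a small Python sketch. It is an assumption-laden illustration, not Cliff.ai's editor or storage format: `sqlite3` stands in for the warehouse, and the `METRICS` registry, the `collect_metrics` helper, and the `signups` table are all hypothetical.

```python
import sqlite3

# Hypothetical metric registry: each metric is just a named SQL query that
# returns (day, value) rows.
METRICS = {
    "daily_signups": """
        SELECT day, COUNT(*) FROM signups GROUP BY day ORDER BY day
    """,
    "daily_conversion_rate": """
        SELECT day, AVG(converted) FROM signups GROUP BY day ORDER BY day
    """,
}


def collect_metrics(conn, metrics):
    """Run every metric query and return one time series per metric name."""
    return {name: conn.execute(sql).fetchall() for name, sql in metrics.items()}


# Assumed toy data standing in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (day TEXT, converted INTEGER)")
conn.executemany(
    "INSERT INTO signups VALUES (?, ?)",
    [
        ("2021-12-01", 1),
        ("2021-12-01", 0),
        ("2021-12-02", 1),
        ("2021-12-02", 1),
    ],
)

series = collect_metrics(conn, METRICS)
print(series)
```

Each resulting series is the kind of input a monitoring job would then scan for spikes and dips, which is why a plain SQL definition is enough to get a metric tracked.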
Yeah, makes sense.
So are the dashboards dead?
No, so the non-marketing answer to that is
no, dashboards are not dead.
Dashboards serve their own purpose.
Dashboards are still a good tool for
reporting. But what dashboards are dead for is
the use cases where you need continuous monitoring of something. Dashboards are dead
for all those use cases where you need a human being constantly watching them for insights.
I think that is something dashboards are definitely dead for.
Yeah. I don't know, Eric. I have a feeling that the highest level of escalation of a tool like this is probably going to be the board of directors.
I don't know if this is a very good idea. What do you think?
It's a good question. You know, one question I have, Aayush, and
this is sort of jumping back to specifics: when you think about continuous monitoring and being served a notification, that's really talking about
anomalies, right? Things that are important enough to need someone to look at them because
there's been some sort of change. And I'm going to oversimplify this, but you sort of have two classes of anomaly,
right? One is where a metric is changing significantly because of some sort of business
activity. So marketing runs a campaign, or you get on the first page of Hacker News,
so that's sort of the positive side, or, you know, negative stuff, right? Like AWS servers go down,
so user activity falls off or whatever. So the first class is that something's
happening with the business that's sort of a fundamental lift or decline. The other one is changes in data, right? A definition changes,
the name of an event changes, those sorts of things. How do you think about classifying those?
You mentioned this a little bit before, saying, how do you get the notification to the right
person in the organization? But it's interesting because numbers can go up or down either because the
business itself is changing or because, really, there isn't an anomaly
in the business. There's actually just a change in the data that creates an anomaly in the metric.
Yeah, that's a great question, Eric. The way you have classified
them into two kinds of
problems, actual business anomalies versus data anomalies,
is right, and both of those pieces are critical components to have and build
into observability. We have recently seen the rise of a lot of data quality
monitoring tools that are tackling this whole problem of data quality,
and they're doing an amazing job of it,
in terms of ensuring the quality of the data. Our focus is on the business anomalies rather than the data quality issues per se.
And when we talk about a business observability platform, monitoring metrics for anomalies is just one part of it. If
it is all about anomaly detection, it's better to call it a monitoring platform rather
than an observability platform. What makes something an observability tool rather
than a monitoring tool is not only telling you that something went wrong, but also assisting in identifying
why it went wrong. That second part is equally important.
If you look at it from a business context: if you see that there
is a sudden spike in, let's say, the number of visitors, the very first question that
any business team would ask is why. Why is this happening?
Is it because of some campaigns that the marketing team
is running, or is it because of certain other parameters?
That's where we come in.
The second part of the observability platform is answering that question:
why?
Within the Cliff platform, we have also built a very smart root cause analysis
tool. Let's say you have a metric and a certain set
of dimensions associated with that metric. The Cliff platform runs an
automated root cause analysis to identify the key segments that contributed
to that spike: what are the statistically significant factors behind it?
In that way, we are trying to close the loop around observability,
where you not only know what went wrong, but you also get an idea of why it went wrong.
Obviously, there is an underlying hypothesis here:
does the business have the right dimensions
associated with that particular metric,
available as part of the data warehouse,
to assist that root cause analysis?
That's a key assumption that we have.
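The idea of ranking segments by how much they contributed to a spike can be sketched as follows. This is a deliberately simple stand-in: it ranks values of one dimension by their share of the total change between two periods, and it does not perform the statistical significance testing mentioned above. The function name, the row layout, and the `source` dimension are hypothetical.

```python
from collections import defaultdict


def segment_contributions(baseline_rows, current_rows, dimension):
    """Rank values of one dimension by how much of the total change they explain.

    Each row is a dict with the dimension name and a numeric "value".
    Returns (segment, delta, share_of_total_change) tuples, largest change first.
    """
    def totals(rows):
        out = defaultdict(float)
        for row in rows:
            out[row[dimension]] += row["value"]
        return out

    before, after = totals(baseline_rows), totals(current_rows)
    total_change = sum(after.values()) - sum(before.values())

    result = []
    for segment in set(before) | set(after):
        delta = after.get(segment, 0.0) - before.get(segment, 0.0)
        share = delta / total_change if total_change else 0.0
        result.append((segment, delta, share))
    result.sort(key=lambda item: abs(item[1]), reverse=True)
    return result


if __name__ == "__main__":
    yesterday = [
        {"source": "google", "value": 100},
        {"source": "facebook", "value": 50},
        {"source": "direct", "value": 30},
    ]
    today = [
        {"source": "google", "value": 105},
        {"source": "facebook", "value": 250},
        {"source": "direct", "value": 25},
    ]
    for segment, delta, share in segment_contributions(yesterday, today, "source"):
        print(f"{segment}: change={delta:+.0f}, share of total change={share:.1%}")
```

As noted in the conversation, this only works if the warehouse actually carries the dimensions that matter; a segment that is not recorded cannot be surfaced as a cause.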
Yeah, super interesting.
And this may be a funny question, but I'm interested.
I'm just thinking about all the listeners in our audience who have worked on the underlying
data layer that drives metrics.
I mean, we're all familiar with that.
How many, and let's just define metric as sort of a single number that represents some
part of the business.
How many metrics are your customers tracking?
I mean, is it 10?
Is it 100?
1,000?
Because I think we all sort of,
when you work inside of a business,
sometimes it's like,
wow, are we tracking a lot of stuff
or are we not tracking a lot of stuff?
So can you provide some perspective on that
since you see it every day?
Yeah, that's actually a great question.
I would just draw a clear demarcation here
between what a KPI and what a metric would mean. A business
might have a limited set of KPIs that they want to monitor, but the number of metrics that
can arise from the combination of dimensions can grow exponentially. Let me
give you an example. The number of visitors coming to the website is just
one KPI. But this particular KPI can have hundreds or thousands of metrics:
number of visitors coming from Google, number of visitors coming from Facebook, and so on, and each of those metrics can have a significant
impact on the business. What businesses try to do is not just
monitor the top-level KPI, like how many visitors are coming to the website,
but also how many visitors are coming from, let's say, social media or organic
search, and so on. So the number of metrics can grow exponentially
depending on the size of the business.
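The growth from one KPI to many metrics is easy to work through with a small calculation. The sketch below counts the top-level KPI plus one metric for every value combination of every subset of dimensions. The dimension names and cardinalities are made up for illustration and do not come from any customer mentioned in the episode.

```python
from itertools import combinations
from math import prod


def count_metrics(dimension_cardinalities):
    """Count the metrics one KPI fans out into.

    Includes the top-level KPI (no dimensions) plus one metric per value
    combination for every subset of the given dimensions.
    """
    sizes = list(dimension_cardinalities.values())
    total = 0
    for k in range(len(sizes) + 1):
        for subset in combinations(sizes, k):
            # prod(()) is 1, which accounts for the top-level KPI itself.
            total += prod(subset)
    return total


if __name__ == "__main__":
    # Hypothetical dimensions for a "website visitors" KPI.
    dimensions = {"source": 5, "country": 50, "device": 3}
    print(count_metrics(dimensions))
```

With these assumed cardinalities, a single KPI already fans out into 1,224 metrics, which equals (5 + 1) × (50 + 1) × (3 + 1). Every added dimension multiplies the count, which is the growth described above.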
One of our biggest customers is a telco company.
In that telco business, at this point,
a broad number would be that
they're monitoring roughly 200,000, not KPIs, but metrics, in a
near real-time manner. So yes, it can grow very large.
Wow. That's incredible. And as kind of a follow-on
to that, one of our recent guests
used the term data value chain, right? So you have collection of the data, you are sending it to places, whether that's to different tools in
your stack, and you unify it in the warehouse. Ideally, there's a metrics layer.
I agree with you, I think the metrics layer is the way that things are moving in
the future. I mean, it's super cool what you can do with tools like dbt and all of that.
Then you are building dashboards for certain things. Then you're sending and operationalizing that data and, ideally,
doing business observability.
As those things change, where does the data engineer fit into the data value chain? In the context of having
data quality automation, with tools like Bigeye or Monte Carlo,
and business observability tools like yours, what are you seeing?
Are these tools, and the changing data value chain around this modern stack,
repositioning teams and data engineers in terms of where they fit?
Actually, that's a very interesting question. I was recently reading an article
where the author gave a very apt summary of what's happening in the data
space. Previously, there used to be separate roles within the organization called
data platforms, where you would have a combination of engineers
and data people who owned and built the data platforms within the company.
With this whole rise of the modern data stack,
and with the rise of all the tools
that are emerging
in various aspects of the modern data stack,
the role of data engineering
is becoming more prominent, in the sense that
with all of these tools,
Monte Carlo, Cliff.ai, Bigeye,
the data platform aspect has been taken over
by third-party vendors,
whereas the core of delivering the data value chain,
the tying together of this data value chain,
still remains with the data engineers.
Previously, the data engineers
would be the ones writing the ETL pipelines
to pull in the data from one place and put it into a data warehouse. But now their role has
become way more significant in the entire value chain, right from the generation of the data to
the consumption of the data. And with the advent of dbt,
roles like analytics engineer, which were not even heard of a couple of years back,
have become mainstream titles.
So data engineering and analytics engineering are something that is just growing day by day.
Aayush, I have a question.
You use the term business observability platform, right? Observability started from the need
to observe our infrastructure.
Then recently we have also started talking about data observability, where we have
other tools that are trying to mimic what an observability platform does,
but for data infrastructure specifically.
Now you are taking this a level higher, which is the business observability platform.
In each one of these cases, we usually have a very specific role related to operations
that is the user of these tools. We have the SREs who are using
Datadog, for example. Who is using the business observability platform?
Who is the equivalent of an SRE, but for a business, let's say?
Yes, that's a good question.
For us, the end consumers of the business observability platform
are typically the operations team.
Now, operations team is also a very broad term.
It can span across
revenue ops, marketing ops, or sales ops, or even, in a lot of cases,
the actual physical operations of a company. So for us,
the audience is typically the operations teams. In one case, we have
a revenue operations team that is monitoring various KPIs related to finance using Cliff.ai.
These are the teams that are most impacted whenever any numbers change in their metrics. So for us,
the audience is the operations teams, but the enablers of the platform are the engineering teams.
Take the finance operations teams:
we don't expect a finance operations team
to connect to their data warehouse,
write those queries,
and get the numbers that they want.
So typically for us,
the enablers are the analytics
and the engineering teams.
The consumers are the operations teams.
Great.
One last question from me,
and then I'll give the stage back to Eric.
I know that you are also actively
creating a community.
So I'd like to ask you,
and this is something that we have seen
happening a lot lately,
especially with products that really have to do with data.
I think everyone saw the success of dbt and the dbt community.
Of course, communities are not something new.
Open source communities have existed since forever.
Based on your experience, how important is a community around data products, and how do they relate to each other?
Or is it just a marketing tool in the end? How do you see the community? What's the
position of the community as part of the business value that your company
delivers in the end?
Yeah, this is something on which I have a slightly different opinion,
which would probably be very contradictory to the opinion that other people would have.
If we start a community with the intention of getting business value in
return, I think we defeat the whole purpose of the community itself. We started this
moderndatastack.xyz community with the whole purpose of creating a place where people
interested in the modern data stack can come together and create a resource that can help
anyone learn about the modern data stack. In terms of the value that is there in the
community, for us, the
modern data stack community that we have built is more about just interacting with like-minded
individuals about what's happening in the modern data stack. Because the modern data stack is
changing every single day. There is something new coming up.
Entire categories are getting created
in the modern data stack very quickly.
And that's the only reason we wanted to create this community:
to just bring like-minded people together.
Speaking about the value of the community,
I think the biggest value of a community
is having an audience, a set of people connected with you with whom you can share these ideas.
Because one of the key things that is happening in the whole modern data stack is the emergence of new ideas.
People hadn't heard of reverse ETL up until a few years back.
People hadn't heard of PLG CRMs. People hadn't heard of business observability. The biggest value of having a
community is having a connection with like-minded individuals with whom you can share
an idea. No matter how crazy it sounds, you share that idea with that set
of people and you get feedback on it.
Another value is that
we found our first set of customers
from this community itself.
We would share the ideas
and the things that we were working on at Cliff.ai,
and we would get feedback from that community.
So I think that is the value that you get from the community.
And we have been very conscious that we always wanted to create an open community.
We don't want this to be a community by Cliff.ai.
Even if you look at the moderndatastack.xyz website,
we have a very, very small footer at the bottom,
which says run by the team at Cliff.ai. One of my close friends, who has been a pioneer
in building an A/B testing platform, taught me this:
when they were building a community, they would rarely talk about the product
or their offerings.
They would just talk about A/B testing, and they became a go-to
place for anyone to learn about A/B testing, and eventually that helped the
business itself. So from my perspective, the community is a long-term
game. The goal here is just to bring like-minded
individuals together, not to serve any specific business goal.
Really great perspective there, Aayush. I agree. I think building a community that provides true
value around subject matter without commercializing it really takes commitment over a long period of time.
It helps create context for the business
problem and the company that you're building, and I really appreciate that perspective.
We're close to time here, but I have one more question for you. You get to see anomalies
in data all the time from customers who are using Cliff.ai. What are some of the most interesting
anomalies that you've seen that surprised both your customers and you?
Okay. I remember this case, and it's actually
not very surprising, but it was very impactful, where a customer was
monitoring their marketing ad spend with Cliff.ai.
One day they saw a sudden spike in their ad spend.
In Google Ads, they had set up their
daily limits, but that limit was set decently high. What happened
was that a footballer mentioned a term that was one of
their target keywords, and they saw a sudden spike in their ad spend that was
coming from a completely irrelevant audience. They saw the spike and
turned off their bidding on that particular keyword for a specific duration of time.
So that was very interesting.
Yeah.
But I mean, really useful to like be able to catch that pretty quickly.
Wow.
That is so funny.
Amazing how someone with a huge following just mentioning something can impact
a company's ad budget.
And they were a B2B SaaS company that has nothing to do
with what that footballer mentioned.
Yeah, that's hilarious. Do you remember what
team the footballer played on?
I can only vaguely remember who that footballer was,
but I think it was someone from Chelsea.
Ah, okay, Chelsea.
Hilarious.
Well, Aayush, this has been a really fun show.
Thanks for joining us and best of luck with Cliff.ai.
Thank you.
Thank you very much, Eric.
Thank you for having me.
Really interesting conversation. One thing that I really appreciated, and it's one of those things
where you kind of know it in the back of your head, but since you're experiencing it every day,
you don't really think about it: it is pretty wild how many ad hoc analyses are
created, require a lot of work, and then are basically thrown away. I mean, Google Sheets for every company must be a massive graveyard of ad hoc analyses, and in some ways you have saved queries on the warehouse that are similar. Thinking about the paradigm shift to monitoring or observability in
the context of your stable metrics and KPIs is really interesting. So yeah, it made me think
about how many Google Sheets I have in my drive that are old, stale ad hoc analyses
that were really useful for 15 minutes and that I've never looked at again.
Yeah.
Yeah.
I think everyone can relate to that.
And I think anyone that comes from a BI background, they can definitely feel the pain of what
it means to maintain all the different dashboards that the company has.
I think it's much more evident early on in the life of a company because there's no
clear ownership of
the BI process. Everyone is like a
BI analyst in a way. And I think it's a
problem that BI tools have struggled
to figure out the solution for
forever. And I don't think that they have managed
to do it efficiently, to be honest. Mainly
because it's not just a technological
problem, it's also an
organizational problem.
But ad hoc analysis is very important, that's for sure.
We always need to ask ad hoc questions.
So, of course, we have to do that also with our data.
Now, how we manage all this garbage that's created from there,
that's something that we discussed today with Aayush. I think there are two things that I found really interesting in our conversation.
One is the fact that this space of operationalizing data is becoming richer and richer.
Our audience is probably already familiar with reverse ETL, but that's only
one manifestation of how to operationalize your data. We had another guest, the CEO of Airbyte,
mentioning that machine learning models are also an operationalization of data,
which is a very valid point.
And Aayush today mentioned Continual AI, for example, which is exactly that.
And of course, Tecton and all the different feature stores, in the end,
that's exactly what they are doing.
And now we have business observability,
which is another way to operationalize the data.
That's very interesting,
and I'm really looking forward to seeing what else will come up.
The other thing is how big of an impact
the SRE discipline and
the engineering operations discipline
have outside
of just managing
the IT infrastructure of the company.
We see that
getting repeated in data,
and now we see it also
in business, where we have these teams
of rev ops, marketing ops, sales ops, and all these different roles that arise. And
of course, all of that is built on the availability and accessibility of data today.
So yeah, this was a very, very interesting conversation, and I'm very curious to see how this business observability category is going to evolve.
I agree.
Well, thanks again for joining us on the Data Stack Show.
And we will catch you on the next episode.
We hope you enjoyed this episode of the Data Stack Show.
Be sure to subscribe on your favorite podcast app to get notified about new episodes every week.
We'd also love your feedback.
You can email me, Eric Dodds, at eric at datastackshow.com.
That's E-R-I-C at datastackshow.com.
The show is brought to you by Rudderstack, the CDP for developers.
Learn how to build a CDP on your data warehouse at rudderstack.com.