The Data Stack Show - 146: What Is a Customer Data Platform? Featuring Soumyadeb Mitra of RudderStack
Episode Date: July 12, 2023
Highlights from this week's conversation include:
- Soumyadeb's background and journey in data (5:49)
- Defining customer data (8:10)
- The complexity of customer data collection (10:04)
- What is a CDP and how it is properly deployed (17:12)
- Bridging the gap of data collection and useful analytics for marketing (21:46)
- How RudderStack translates data and the new profile feature (25:30)
- The foundations of data in building a 360 degree customer profile (30:30)
- Solutions for the intersection between engineering and business users (34:35)
- How AI and other future technologies will impact data (41:14)
- Final thoughts and takeaways (46:30)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by RudderStack.
They've been helping us put on the show for years and they just launched an awesome new product called Profiles.
It makes it easy to build an identity graph and complete customer profiles right in your warehouse or data lake.
You should go check it out at rudderstack.com today.
Welcome back to the Data Stack Show. Kostas, we have a pretty special guest today,
someone who you and I have both worked for, and I still do: Soumyadeb, the founder
and CEO of RudderStack, the company that helps us put the show on. That's really great.
So it's going to be really fun to talk with him about all sorts of things.
One thing I'm excited to chat about, which I've discussed with Soumyadeb a bunch
over the years but which will be nice to just have a casual conversation about, is
the history of tooling around customer data. It's been very marketer-centric,
and it seems to be shifting back towards the data team,
you know, because data is fundamentally a technical problem.
So I think it'll be interesting to get his perspective on that, because he's done a lot of work over the years on data teams, ML teams, etc.,
with a focus on customer data.
So I think that'll be a great topic to cover.
But what do you think?
Yeah, I think, outside of the very interesting
personal things that we can chat about, because we've worked with
Soumyadeb from a very early stage of the company.
Right.
So there are many interesting stories to talk about there. But outside of that,
I think he's probably
the most appropriate person to help us understand
a little bit better what this whole thing about customer data is.
Why do we have customer data?
Why do we need to differentiate it
from the rest of the data that we have? Why do we need an almost completely different category
for managing this data?
Why do we have CDPs?
What is a CDP, right?
All these terms have been used,
and maybe also abused a little bit, in the market
in the past couple of years,
and we're still trying to figure out exactly what all these things are.
I think we have the right person to give us some very
concrete definitions. And because we are primarily addressing
data practitioners: what does this mean for a data engineer
or a data analyst,
all these people who get requests to work with this data?
And yeah, Soumyadeb is the person who can give us the whole spectrum, from the data engineering
side of things to how the marketer is actually working with this data and why.
So we're definitely going to talk a lot about that stuff.
Yeah, that'd be great.
I don't know if we've had a sort of customer data platform type conversation on the show yet.
We've maybe mentioned it, but I think we haven't done like a deep dive,
especially as it relates to data infrastructure.
So that'll be great.
All right, well, let's dig in.
All right. So hello, everyone, and welcome to another episode of the Data Stack Show. And
this is going to be a very special one. First of all, because, as you have probably noticed,
I'm the one doing the introduction to the episode, which means it seems I'm going to be hosting this one alone.
But we also have an extremely special guest today,
who, first of all, I can call a friend,
but who is also a person I spent a little more than two years
working with at RudderStack, seeing the amazing journey
of the company from almost zero to the company it is today.
So we have Soumyadeb, the CEO of RudderStack, who, by the way, and that's probably something not many people know, had the
very first idea of doing a podcast.
He had this idea when we were working together at RudderStack,
and I took it and started working on it.
And the rest is history.
But this history has also been supported a lot by him,
because, as you know, the Data Stack Show
is a very independent show,
but we have the support of RudderStack to keep the show as it is today.
So welcome, and thank you so much, Soumyadeb.
How are you?
I'm very well.
Firstly, Kostas, it's really great to be doing this show with you.
Thanks a lot for the very kind words about me and RudderStack.
I have been a follower of the show.
Being a sponsor is the easy part.
You have built an amazing show,
with those in-depth conversations.
Again, kudos to you for setting this up for success.
But yeah, I'm very glad to finally
have the opportunity to talk with you on this show.
So thanks.
Yeah, yeah.
It actually took us a while to make it happen, right?
We've been doing the show for about three years now,
and this is the first time you're on it.
So that's super exciting for me.
But before we start, let's do what we usually do with all of our guests.
Give us a brief background of yourself:
who you are, what you have done, and what led you to build RudderStack.
Yeah, that's a great question.
So maybe I can go in reverse chronological order.
I've been doing RudderStack for the last four years.
I'm the founder and CEO.
We started in 2019.
Before RudderStack, I spent a year at a company called 8x8
as part of their data team,
leading some of their machine learning and
customer data initiatives
and building use cases on top of customer data.
Some of the experiences and challenges I ran into at that company
prompted us to start RudderStack.
Before that, I was the co-founder of a company called Mariana IQ.
We were building almost a next-generation, AI-driven marketing automation system.
Those were the early days of deep learning,
and we thought maybe this was an opportunity to transform the way marketing is done.
We were definitely early, both in terms of the tech
and, more importantly, in terms of our customers' data completeness.
Even though we worked with really large brands,
none of them had good customer data.
And if you don't have data about your own customers,
there's not much ML you can do.
So I learned a lot of lessons at that company.
We ended up selling to 8x8,
and I tried to do a similar thing inside of 8x8.
And again, there were very similar problems
around collecting data and unifying data.
We have built something at RudderStack
that will hopefully solve this someday.
And prior to that,
I worked at a company called Data Domain.
I don't know how many of you have heard of it,
but it was run by Frank Slootman,
who is now a big shot.
That was his first company as CEO.
I learned a lot at that company.
That was my first startup experience,
my first real job experience,
and probably the only one.
That's a quick overview.
And I have a PhD in data,
so I've worked in the data space
pretty much all my life.
Yeah, that's amazing.
Okay, I have a question from my side.
I think I know the answer,
but I think it's very important to hear your definition
and share it with our audience, because we talk a lot about it,
but I'm not 100% sure that people
share the same semantics or deeply understand what
it is.
So the question is simple.
What is customer data, why isn't it just like any other data, and why do we need to treat it differently?
Yeah, that's a great question.
In fact, even on that, I don't think there is a consistent definition.
Everyone has a different view of that.
But in the simplest form, one way to think about it is,
if you are a B2C company, let's say you are a company
who is selling stuff to consumers.
Those consumers interact with your brand, right?
And they do that over many different channels.
They're probably coming to your website
and doing things on the website.
They're probably going to your mobile app
and taking actions in the app.
Or they may be calling your call center.
They may be going to your store.
They may be making purchases and so on, right?
Now, each of these interactions with your brand
produces some data. A transaction produces
transaction data. Similarly, somebody coming to your
website and clicking on some products and browsing
your catalog, that produces data. What products they looked at,
what products they clicked, what they added to the cart. Of course, different data
has different value.
Your website click data may not be as valuable as transaction data.
But in a loose way, all of this data can be called customer data.
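To make the definition concrete, here is a minimal Python sketch, assuming a single hypothetical `CustomerEvent` record shape (the field names and channel values are illustrative, not any vendor's schema). It shows web clicks, an app open, and a point-of-sale purchase all landing as the same kind of customer data, with only the properties differing.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class CustomerEvent:
    """One interaction between a customer and the brand, on any channel."""
    channel: str      # e.g. "web", "mobile", "pos" (illustrative values)
    event: str        # what happened, e.g. "Product Viewed"
    user_key: str     # whatever ID this channel knows: cookie, device ID, email
    properties: Dict[str, Any] = field(default_factory=dict)


# Interactions from three different channels, all in the same shape.
events = [
    CustomerEvent("web", "Product Viewed", "cookie-123", {"sku": "SKU-1"}),
    CustomerEvent("web", "Product Added", "cookie-123", {"sku": "SKU-1"}),
    CustomerEvent("mobile", "App Opened", "device-9"),
    CustomerEvent("pos", "Order Completed", "jane@example.com", {"revenue": 42.0}),
]


def events_by_channel(evts):
    """Count how many interactions came from each channel."""
    return Counter(e.channel for e in evts)


print(events_by_channel(events))
```

Note that each channel knows the customer by a different key; joining those keys is the identity problem discussed later in the conversation.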
All right, that makes total sense.
So we have these, let's call it, behavioral breadcrumbs, right?
Of the user, of our customer out there.
And we want to collect them.
Let's start with that
because I think that's one of the first big
challenges that we have to go through:
what it means to collect this data,
because you mentioned
there are many different channels, right?
And from what I understand,
we are talking about channels that can be as diverse as physical
channels. Someone enters your shop and makes a transaction through a POS machine
over there. At the same time, they forget something, they leave, and they make an online transaction to buy from you again.
It's really diverse.
So, two things that I want from you on that.
Give us a little bit more color on how complex this collection process is, and the history behind it.
I know that for the past 10 years,
starting with Segment,
there's been a lot of innovation in the industry,
but many things have also changed
from back then to today.
So I'd love for you to
share your experience on these two fronts.
Yeah, great question.
So let me start with the complexity
and then I can comment about the history.
And you pointed it out already.
The complexity comes from literally three things.
One is the diversity of sources.
You have, as you mentioned,
you have your Android app
and you have your website
and the point-of-sale device,
your backend systems
and transactional systems and so on.
So you need to collect data
from all of these places.
So you have to have SDKs
that your developers can embed
and all that stuff.
So nothing rocket science, but
it does require a lot of engineering effort
to build this out. That's number
one. The variety
of data sources. The second
is around
I would say the volume of
data. If you are a reasonably sized
consumer
company, you're talking about
anywhere from millions
to billions of events per
day. And we're
working with some customers who, at peak, are
sending a million
events per second. So you
have to set up the backend infrastructure
to handle this volume
and so on. So again,
this is not rocket science,
but these are engineering problems.
You have to set up a team to do that.
And this is not core to any...
I mean, this data is extremely important,
but setting up this infrastructure
is grunt work that most companies
don't want to do.
The third,
and I think the most important, problem is consistency. And I think that
is what is often overlooked. What I mean by that is: what is the goal of collecting all
this data, right? You want to collect all this data and, presumably, send it to downstream users of the data.
You want to send it to, let's say, a product analytics tool like Amplitude or Mixpanel,
or to a marketing tool like Braze or Salesforce or some other marketing cloud.
Now, each one of them expects this data in a slightly different format.
They have their own APIs,
they have their
own standardization. So if you have to build this infrastructure from the ground up, you have to
handle that. You have to make sure whatever schema and structure you're using for your behavioral
events can be sent to all these downstream destinations. You have to manage those
translations. Alternatively,
you can embed their SDKs
but even the SDKs
expect a standardized event
format. So you have to manage
that yourself. And each one is slightly
different. The same goes with
user identities.
Identity management is hard.
People come to the website
anonymously and they browse things anonymously. But you can still track that activity by setting
a cookie wherever it's possible, or using a device ID on mobile. And then, once they log in,
you may have an email or like an address or like a phone number. And you have to like stitch all these identities and you have to manage those identities.
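A minimal sketch of the simplest stitching case described here, assuming each event carries an `anonymous_id` (cookie or device ID) and, after login, a `user_id` such as an email. The `Event` type and `backfill_user_ids` helper are hypothetical names for illustration: once any event links the two IDs, earlier anonymous activity is attributed to the known user.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    anonymous_id: str          # cookie or device ID assigned before login
    user_id: Optional[str]     # known ID (e.g. email) once the user logs in
    name: str


def backfill_user_ids(events: List[Event]) -> List[Event]:
    """Attribute anonymous events to a known user once any event links the two IDs."""
    # Learn anonymous_id -> user_id links from events that carry both.
    links: Dict[str, str] = {}
    for e in events:
        if e.user_id is not None:
            links.setdefault(e.anonymous_id, e.user_id)
    # Return new events where pre-login activity also carries the known user ID.
    return [
        Event(e.anonymous_id, e.user_id or links.get(e.anonymous_id), e.name)
        for e in events
    ]


events = [
    Event("cookie-1", None, "Product Viewed"),
    Event("cookie-1", None, "Product Added"),
    Event("cookie-1", "jane@example.com", "Logged In"),
    Event("cookie-2", None, "Page Viewed"),  # never logs in, stays anonymous
]
for e in backfill_user_ids(events):
    print(e.anonymous_id, e.user_id, e.name)
```

This only handles a single anonymous-to-known hop; chains of identifiers across devices need a full identity graph.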
So this standardization of like,
what should I be calling the events?
What events should I be collecting?
What properties should I be collecting
with those events so that I can send them
to downstream destinations?
Like this standardization is a lot of work
that a lot of vendors have to like do from scratch
again and again.
And those are the three main pillars: the variety of sources,
the volume of data, and standardization. Standardization is needed primarily because
there are all these downstream users of this data. Now, if you look at this space
in the early 2000s,
the number of channels through which somebody would interact with a brand
was fairly low.
You go to a store, you buy things.
This was pre-mobile, the early days of the web.
So this was not
really a problem. I think the explosion happened after the iPhone, or maybe
slightly earlier, when the web became an important channel.
And then over time, mobile became an important channel. So your sources exploded,
and so did the volume, right? That's one end.
On the other side, the number of destinations also exploded.
People had a specific tool for running email campaigns
and another tool for running push campaigns,
so you had to get this data to all these different destinations.
That complexity suddenly exploded in, I would say,
the early 2010s.
And that's where the space, the technical problems
of customer data platforms, came into being.
Segment was almost the early leader in the space.
They built this multiplexer, right?
I mean, you collect from all these places,
send it to Segment, and they can fan it out
to all the destinations.
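The multiplexer idea can be sketched in a few lines of Python: one canonical event comes in and is translated into each enabled destination's format. Both destination formats and all function names here are hypothetical stand-ins; real tools such as Amplitude or Braze each define their own APIs, which a production pipeline would have to follow.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


# Hypothetical destination formats, for illustration only.
def to_analytics_tool(e: Event) -> Event:
    return {"eventName": e["event"], "uid": e["userId"], "props": e["properties"]}


def to_marketing_tool(e: Event) -> Event:
    # This imaginary tool wants snake_case action names.
    return {
        "contact_id": e["userId"],
        "action": e["event"].lower().replace(" ", "_"),
        "attributes": e["properties"],
    }


DESTINATIONS: Dict[str, Callable[[Event], Event]] = {
    "analytics": to_analytics_tool,
    "marketing": to_marketing_tool,
}


def fan_out(event: Event, enabled: List[str]) -> Dict[str, Event]:
    """Translate one canonical event into each enabled destination's format."""
    return {name: DESTINATIONS[name](event) for name in enabled}


# The same canonical event could come from a web SDK, a mobile SDK, or a server.
event = {"event": "Order Completed", "userId": "u-1", "properties": {"revenue": 42.0}}
for dest, payload in fan_out(event, ["analytics", "marketing"]).items():
    print(dest, payload)
```

Keeping one canonical format at the center means adding a destination is one new translator, rather than new instrumentation in every app.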
So Segment was the early mover,
but then other companies came, like mParticle, Helium, and so on. They all
handled some version of this problem. RudderStack is in the same space,
hopefully the last company in this space. All right, that was very insightful. So
my next question is a little bit similar to how we opened the conversation.
I asked you at the beginning what the definition of customer data is.
But I think now that we've talked a little bit about the problem, and also we mentioned
a couple of vendors in this space, there is another concept that has many different definitions.
And that's the concept of the CDP, right?
The customer data platform.
And I'd like to hear your take on that.
Like what is a CDP, right?
And is RudderStack a CDP?
Yeah.
CDP is probably the most wrongly used term because everybody and anyone is a CDP now.
Anyone who touches the customer data, they're calling themselves a CDP.
So let me take a stab at defining CDP the way we see it. At a fundamental level, the problem is
what I described earlier. You have all these sources generating a
ton of data, and you have to get that data to all the downstream destinations. A tool that can support that is a customer data platform. But then those initial data
multiplexers evolved, realizing: okay, we have all the data, everything is flowing through us. Why should we just multiplex the data?
We can provide more value-added services, right?
We can stitch all of these different identities
and create what is called a customer 360,
like a golden customer record.
And then we can let our customers come in
and run marketing campaigns on top of that.
They can come in and create audiences.
And an audience is, let's say, a list of people who have come to the checkout page but did
not purchase.
That's a cart drop-off audience that you want to run a marketing campaign against.
So all these initial data pipelines, companies, they realized,
okay, now we can provide these value-added services.
We can create this customer 360.
We can provide this audience tool.
We can provide the activation tool. Activation means taking that audience
and sending it to something like Facebook
so that we can show them ads.
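A minimal sketch of the cart drop-off audience and its activation, assuming events are simple dicts with hypothetical `user` and `event` keys and hypothetical event names. Normalizing and SHA-256 hashing emails before sharing them is a common pattern for ad-platform audience uploads, but the exact format each platform expects is not shown; `activation_payload` is an illustrative helper only.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def cart_dropoff_audience(events: Iterable[Dict[str, str]]) -> Set[str]:
    """Users who started checkout but never completed an order."""
    events = list(events)
    checked_out = {e["user"] for e in events if e["event"] == "Checkout Started"}
    purchased = {e["user"] for e in events if e["event"] == "Order Completed"}
    return checked_out - purchased


def hash_email(email: str) -> str:
    """Normalize (trim, lowercase) and SHA-256 hash an email address."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def activation_payload(audience: Set[str], emails: Dict[str, str]) -> List[str]:
    """Hypothetical export: hashed emails for audience members with a known email."""
    return sorted(hash_email(emails[u]) for u in audience if u in emails)


events = [
    {"user": "u1", "event": "Checkout Started"},
    {"user": "u1", "event": "Order Completed"},
    {"user": "u2", "event": "Checkout Started"},
    {"user": "u3", "event": "Product Viewed"},
]
audience = cart_dropoff_audience(events)
print(audience, activation_payload(audience, {"u2": "Jane@Example.com"}))
```

The set difference ignores event order, so a user who purchased and later returned to checkout would be excluded; a real audience definition would likely compare timestamps.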
So that was one evolution of CDP,
data multiplexers providing
more customer 360 audience capabilities.
At the other end of the spectrum,
there were traditional
marketing automation companies
which had that golden customer record, right?
Think of CRMs like Salesforce
or marketing tools like Pardot and Marketo.
They all had customer records.
They had the emails.
They had phone numbers and all this stuff.
And then those were traditionally used for running email campaigns and other sales and marketing campaigns. Now, because they had the customer record,
they figured out that,
why don't I layer on this behavioral data
and I can provide more insights
to personalize these campaigns.
So they also evolved into collecting this first-party data,
augmenting their capabilities to provide this customer 360 and segmentation capabilities
and so on. So they also call themselves a CDP. So now you
have this space with a mishmash of data pipeline companies
providing some audience capabilities, and traditional marketing
companies providing data pipeline and data collection capabilities.
And this entire space is now called CDP.
And in that sense, RudderStack is also a CDP.
We help our customers collect data.
We help our customers unify, create that customer 360, and activate it to all the downstream
tools.
Where we differentiate is that all of this happens on top of the data warehouse,
like the customer's Snowflake. We don't store any data.
I mean, I can go on and on,
but that might be a topic for a separate conversation,
but that's how we position ourselves in this market.
Yeah, yeah, that makes total sense.
And, okay, let's get a little bit deeper
into the data warehouse part.
So what is the added value to the organization
by delivering their own data,
their own customer data into the data warehouse
and then start building on top of that
the different layers that the organization needs
to reach the point of having this customer 360 view
or the customer golden record?
And how do we bridge the gap from that data product
to having, let's say, the marketeer use this information to
go and do marketing, right? Because I would assume that if I'm a marketeer, the last
thing that I want to do is like play around with databases. Like what I want to do is probably
focus on my marketing tools and being able like to run my campaigns, generate revenue, and all that stuff, right?
So how do we bridge that gap there?
Yeah, that's a great question.
In fact, that is the reason
a lot of the initial customer data platforms
came into existence.
You had these traditional marketing tools, right? I mean,
you'd use Salesforce or something else. The marketeer would use that and they would complain
about, okay, we don't have data. We need web data, we need mobile data to personalize their
experience. I don't have that data. They would go to IT and say that,
okay, can you set up these data pipelines
to collect data from the website,
from the mobile app,
and send me the data into my downstream tool?
IT would say, oh, this is the 10th project on my list.
Plus, I don't even have the capability
to write these SDKs and create these data pipelines and manage them.
So that led to this whole space, right?
Marketers thought, okay, we need some of these tools,
and I don't want to wait for IT.
So they would go and buy from vendors like Segment and so on.
The early adopters, the early players in the space, would say, oh,
here's the SDK. Your
engineering team just embeds this SDK,
and that's it: your marketing team will be
off your back, they will not come and bother you anymore.
All the data will magically
start flowing into their tools
and so on. So, happy ending.
I think that
matured the market quite a bit,
right? I mean, by no means
was it a failure.
Segment got acquired by Twilio,
and there was a huge customer base.
I think where it started failing
was around completeness of data, right?
IT was always a laggard in 2010.
IT could not set up the infrastructure
to collect data, store data, process data.
That changed by the late 2010s,
the last quarter of
the last decade, right,
when people started buying data warehouses
and investing in data warehouse technologies,
and they started
centralizing a lot of that data.
So that is what is triggering the new wave of CDPs.
But the traditional CDPs try to address the exact problem you're talking about.
That makes sense.
Yep.
Makes total sense.
All right. So we have, let's say like in these past decades,
there's like a lot of data related infrastructure that came into the market
that has changed like a lot of the dynamics around what can be built on top of the data that the company has.
So we have tools like to collect the data, put them into the data warehouses.
Data warehouses are quite easy to manage because they
are on the cloud. We have tools for doing modeling and manage modeling with like dbt and the likes.
But still, even if we solve the problem completely of delivering the data into the data warehouse, there still is, let's say, this process of taking this
raw, noisy soup of events about the user,
like all these breadcrumbs that we put into a basket, in a way,
and we need to transform it into something that can be digested
by an analyst or even a marketeer.
So how can we do that?
How can we do that with RudderStack?
Yeah.
So before I answer how you can do that with RudderStack, let me briefly explain how you would do it without RudderStack,
and what the pain points are.
I think that will help explain the value of
what RudderStack does.
You're 100% right. That is
the problem. You get all these data streams.
You use a tool like
RudderStack, Segment, or your homegrown thing,
and you collect all this behavioral
data into your data warehouse.
You have 20 different tables
for 20 different events.
Then you bring in your ETL data, again through RudderStack, Fivetran, or whatever other ETL tool you name.
And you end up with another 20 or 30 different tables.
Now, what you are trying to get out of all of this is a clean customer view.
Think of it as one row per customer, with a bunch of attributes computed
for the customer. That's all
a consumer of the data,
whether it's an analyst or a marketeer,
cares about. And when I say attributes, these are
things like total revenue for
the customer, how many times they have come to
the checkout page, what recent products
they have looked at. You can think of all
these features, like funnel features:
have they come to the checkout page but dropped off? These are
all the features that you're computing
for the user, which your downstream
users of the data care about.
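The "one row per customer" target can be sketched with Python's built-in `sqlite3` standing in for a cloud data warehouse. The `orders` and `pages` tables, their columns, and the two attributes are assumptions chosen for illustration; in practice these would be the dozens of event and ETL tables described above.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; tables are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (user_id TEXT, revenue REAL);
CREATE TABLE pages  (user_id TEXT, page TEXT);
INSERT INTO orders VALUES ('u1', 30.0), ('u1', 12.5), ('u2', 8.0);
INSERT INTO pages  VALUES ('u1', 'checkout'), ('u2', 'checkout'),
                          ('u2', 'checkout'), ('u3', 'home');
""")

# One row per customer, one column per computed attribute.
CUSTOMER_360 = """
WITH users AS (
    SELECT user_id FROM orders UNION SELECT user_id FROM pages
)
SELECT u.user_id,
       COALESCE((SELECT SUM(revenue) FROM orders o
                 WHERE o.user_id = u.user_id), 0) AS total_revenue,
       (SELECT COUNT(*) FROM pages p
        WHERE p.user_id = u.user_id AND p.page = 'checkout') AS checkout_visits
FROM users u
ORDER BY u.user_id
"""
rows = conn.execute(CUSTOMER_360).fetchall()
for row in rows:
    print(row)
```

Even this toy version assumes every table already shares one clean `user_id`, which is precisely what identity stitching has to provide first.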
So you have all this raw data
on your left, and you want to get this
clean customer view on top of your data
warehouse. Now,
how do you do that? Traditionally, you would
go and hire a team of data analysts
who would come in and
write the SQL. Traditionally, this was hundreds and hundreds of lines of SQL. dbt was a big force in this space. Instead of having to poorly manage the SQL,
now you could apply software engineering best practices
with dbt:
you could organize the models into projects
and check them into GitHub.
So dbt brought a lot of sanity to the space,
but you still had to go and write those transformations:
figuring out how to stitch identities,
which is a hard problem to do in SQL,
and how to create features
or funnels, which, again, are
hard to do in SQL.
So you have to still go and write these
with a team of data analysts.
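Funnels are awkward in SQL because they depend on the order of rows per user, which usually means window functions or self-joins. The Python sketch below shows the underlying logic under simple assumptions: each user's events are hypothetical `(timestamp, event_name)` pairs, and a step only counts if it happens after the previous step.

```python
from typing import Dict, List, Tuple

UserEvents = List[Tuple[int, str]]  # (timestamp, event_name) for one user


def furthest_step(events: UserEvents, steps: List[str]) -> int:
    """Number of funnel steps a user completed, in order."""
    reached = 0
    for _, name in sorted(events):  # process events chronologically
        if reached < len(steps) and name == steps[reached]:
            reached += 1
    return reached


def funnel_counts(users: Dict[str, UserEvents], steps: List[str]) -> List[int]:
    """How many users reached each step of the funnel."""
    depths = [furthest_step(evts, steps) for evts in users.values()]
    return [sum(1 for d in depths if d > i) for i in range(len(steps))]


steps = ["Product Viewed", "Product Added", "Checkout Started", "Order Completed"]
users = {
    "u1": [(1, "Product Viewed"), (2, "Product Added"),
           (3, "Checkout Started"), (4, "Order Completed")],
    "u2": [(1, "Product Viewed"), (5, "Checkout Started")],  # skipped "Product Added"
    "u3": [(2, "Product Added"), (3, "Product Viewed")],     # added before viewing
}
print(funnel_counts(users, steps))
```

The ordering rule is the subtle part: `u3` added a product before viewing it, so only the first step counts.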
But the biggest problem here was that
it's not just one time that you hire a team
and they come up with a clean
customer 360, right?
Every time your marketer wants a new
feature, let's say they
want total revenue in the last
seven days, or
something else in the last 15 days,
you have to go back to the data team.
They will go and update the models,
then they have to push to production, and that will take
anywhere from one
sprint to a couple of sprints to even get
rolling. So that is the state of the art: you have to hire a team and go through this
slow, painful process. That's what we are trying to solve with RudderStack. RudderStack's
vision was to enable this end-to-end workflow on top of a data warehouse, but in a way that
does not require this painful process. We are launching this product called Profiles,
which simplifies
that. Number one, you can
define all these things in a very high-level
language. You don't have to write complex SQL.
We also
have a UI around it, so even a non-technical
person can come in and define the features.
Everything goes into
a config which can be
checked into Git.
So you still have software engineering best practices,
but it also exposes this process to non-technical or non-SQL experts.
That's what Profiles helps our customers with.
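To illustrate why a higher-level definition helps, here is a hypothetical sketch in which each feature is declared as data and compiled into SQL, so adding "revenue in the last 15 days" is one more declaration rather than a hand-edited model. This is not the Profiles syntax; the `Feature` class, `compile_features` function, and table layout are assumptions, and SQLite again stands in for the warehouse.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feature:
    """Hypothetical declarative feature: a named aggregate over one event table."""
    name: str
    table: str
    aggregate: str                      # e.g. "SUM(revenue)" or "COUNT(*)"
    window_days: Optional[int] = None   # None means all history


def compile_features(features: List[Feature], as_of: str, entity: str = "user_id") -> str:
    """Compile feature declarations into one SQL query, one row per entity."""
    columns = []
    for f in features:
        where = f"t.{entity} = u.{entity}"
        if f.window_days is not None:
            # Keep only rows in the N-day window ending on as_of (ISO date strings).
            where += (f" AND t.ts > DATE('{as_of}', '-{f.window_days} days')"
                      f" AND t.ts <= '{as_of}'")
        columns.append(f"(SELECT {f.aggregate} FROM {f.table} t WHERE {where}) AS {f.name}")
    return f"SELECT u.{entity}, {', '.join(columns)} FROM users u ORDER BY u.{entity}"


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (user_id TEXT);
CREATE TABLE orders (user_id TEXT, ts TEXT, revenue REAL);
INSERT INTO users VALUES ('u1'), ('u2');
INSERT INTO orders VALUES ('u1', '2023-07-10', 30.0), ('u1', '2023-06-01', 12.5),
                          ('u2', '2023-07-11', 8.0);
""")
features = [
    Feature("total_revenue", "orders", "SUM(revenue)"),
    Feature("revenue_7d", "orders", "SUM(revenue)", window_days=7),
]
rows = conn.execute(compile_features(features, as_of="2023-07-12")).fetchall()
print(rows)
```

Because the SQL is assembled from strings, a real system would validate feature definitions before compiling them rather than trusting arbitrary input.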
Okay, that's awesome. I have a couple of questions
here. So, first of all,
let's talk a little bit about
the
foundations of building a
profile, right? You mentioned
a couple of things. You mentioned about identity
stitching
and a couple others, but
is there a minimum set
of operations that
there's no way you can avoid when you
have to go and build this customer 360 table, right? This table where you have one
row per customer, and then a number of columns, each one representing something,
it doesn't matter what. If we want to define the minimum set of problems and operations that
somebody needs to handle there, what would you say are the fundamentals?
Yeah. So there are literally three things that need to happen to get to this customer 360 from your
dirty data, right? I mean, on the left you have all these events
and ETL sources, and you want this clean, transformed data on the right.
The first step is identity stitching, or ID resolution. As I was mentioning earlier,
you have all these different identities for the user, right?
Somebody comes to your website, you assign them a cookie ID,
and then they provide their email, so you have their email.
Similarly, the same person comes to the mobile app,
they have a device ID, and then they provide their email.
And now you can stitch all of these identities into the same user, right?
So when you're computing a feature like the total number of times somebody has come to a particular product page,
you have to combine the mobile activity and the web activity based on these identities.
So that's step zero that needs to happen: stitching all these identities.
And it's actually a hard problem, because it's not just one level of IDs, right?
You can have multiple levels.
You have an email which joins with a phone number,
and that phone number joins with an address.
And some of these links could be non-deterministic;
address is a good example.
So you have to create this ID graph
and stitch all of them into like one ID.
That's step one.
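The multi-level ID graph can be sketched with a union-find (disjoint-set) structure: every pair of identifiers observed together is linked, and transitive links such as cookie, email, phone, and address collapse into one cluster per person. The `IdentityGraph` class and the `type:value` ID strings are illustrative assumptions, not how any particular product implements identity resolution.

```python
from typing import Dict, List, Set


class IdentityGraph:
    """Union-find over identifiers: IDs linked directly or transitively share a root."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def link(self, a: str, b: str) -> None:
        """Record that two identifiers were observed for the same person."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def clusters(self) -> List[Set[str]]:
        groups: Dict[str, Set[str]] = {}
        for x in list(self.parent):
            groups.setdefault(self.find(x), set()).add(x)
        return list(groups.values())


g = IdentityGraph()
g.link("cookie:abc", "email:jane@example.com")        # web login
g.link("device:d1", "email:jane@example.com")         # mobile login
g.link("email:jane@example.com", "phone:555-0100")    # profile form
g.link("phone:555-0100", "address:1 Main St")         # shipping record
g.link("cookie:xyz", "email:bob@example.com")
print(g.clusters())
```

Because union-find merges transitively, a single weak link, such as an address shared by roommates, can join two different people; that is one reason non-deterministic identifiers need extra rules before they are allowed into the graph.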
The step two that needs to happen is like,
you have to now define these features.
Total number of times somebody has come to the login page.
Total number of times somebody has viewed a product.
Total revenue.
These are all interesting features.
But then every business may be caring about features that are important to them.
So you cannot have a static set of features.
You want this flexibility where anyone can come in
and define those features.
And what I mean by anyone is like,
not necessarily just a data engineer or data analyst, right?
You want a marketing person who is using that feature
to be also like come in and define features
wherever it makes sense, right?
So you need this like additional layer
where multiple people can define and contribute
features. So that's the second step that needs to happen. The third step is what is
called some version of time travel. It's not enough to just compute the features
as of today. There are use cases like, let's say,
training a machine learning model. You're trying to train a churn model, and to do
that you need to compute the features that go into the model as of the point of
churn, not today. If a user churned six days ago, you want their features at
that point. So you need what you can call
a lightweight feature store, where you're not just computing and keeping track of today's
features, but you can go back to any point in time and compute a feature as of
that time. And this is a hard problem. So you need these three things to happen to create a usable customer 360.
Okay, that was super, super interesting.
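A minimal sketch of point-in-time ("time travel") feature computation, assuming orders are hypothetical `(user_id, date, amount)` tuples and `features_as_of` is an illustrative helper. Filtering to events on or before the `as_of` date is what keeps a churn model's training features from leaking information from after the churn date; a real feature store would do this at warehouse scale.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple, Union

Order = Tuple[str, date, float]  # (user_id, order_date, amount)


def features_as_of(orders: List[Order], user: str,
                   as_of: date) -> Dict[str, Optional[Union[int, float]]]:
    """Compute a user's features using only orders on or before `as_of`."""
    past = [(d, amt) for u, d, amt in orders if u == user and d <= as_of]
    return {
        "order_count": len(past),
        "total_revenue": sum(amt for _, amt in past),
        "days_since_last_order": (as_of - max(d for d, _ in past)).days if past else None,
    }


orders = [
    ("u1", date(2023, 6, 1), 20.0),
    ("u1", date(2023, 6, 20), 10.0),
    ("u1", date(2023, 7, 10), 99.0),  # after the churn date below: must not leak
]
# Features as they looked on the (hypothetical) churn date, not today.
print(features_as_of(orders, "u1", as_of=date(2023, 7, 6)))
```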
And you mentioned something that I find very fascinating as a problem.
And I'd love to hear how RudderStack is dealing with it.
And by the way, it's one of the reasons that I'm personally attracted to this problem: getting different people to work together in harmony, right?
So you put it very well.
You mentioned we have the data engineer on one side, but we also want to allow the domain experts, and the domain expert here is obviously the marketeer,
to be able to define and express what they need, right?
So, how do you do that? Like, how do you deliver like a product experience that can resonate both with the engineering persona and the marketing persona?
Yeah, that's a great question. And I mean, by no means can I claim
that we have solved this problem, right?
This is almost like, I don't know,
anyone who solves that will get
whatever the equivalent of the Nobel Prize is for data engineers.
But Profiles is an attempt to do that, right?
And the way I think about this is,
as you rightly pointed out,
there are all these different personas that need to come together to work on top of this customer 360, right?
I mean, if we take that example, there are like data engineers
who are responsible for producing the data and cleanly modeling the data.
And then there are marketing people who are using that data
and they might want to define their own set of features,
like their own funnels.
And you want them to come together.
Now, I think, yes, your product experience
has to bring them together.
But there are also like boundaries.
There are things the marketing person does not want to do, cannot do. A good
example is identity stitching.
I mean, often
as a marketeer,
you know the data sources. You know
this is my website,
this is my mobile app.
You don't really know
the nitty-gritty details of how
IDs are generated on these apps and how
they are stitched together and all that stuff.
So that is a problem that is best
left to the engineering team.
Similarly, there are
things like
what
to call an event.
Should it be called
product_purchase? It's a simpler problem,
but still, somebody has to take care of it.
So that, again, can be left to the engineering
team. So there are
things that engineering team
has to contribute, like the ID
stitching rules and how does it happen
to create this, like, some
version of initial customer 360.
And
then you want your marketeer
to come in and build on top
of that, right? What does a marketing persona care about?
They want to create funnels.
They want to say, give me all the people who have done X but not Y.
That funnel step should not require going back to an engineer every time.
You want that user experience to define funnels.
Now, the funnels are defined on events which are defined by engineering.
They define the properties
and they make sure that the events are clean.
But the ability to create funnels
should be exposed to the marketing person.
And there are other, simpler features too, right?
I mean, total number of times
somebody has done a page view, right?
I mean, in the last X days,
that is a feature that, again,
should be exposed to a marketing
person. They shouldn't have to go back to engineering.
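The two marketer-facing examples here, "done X but not Y" and "page views in the last X days," can be sketched over the same stitched event stream. This sketch ignores event order, so any occurrence of Y excludes the user, even if it came before X. A stricter funnel would require Y to come after X. Event names such as `added_to_cart` and `product_purchase` are hypothetical stand-ins for events engineering has instrumented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    user_id: str
    name: str
    ts: datetime


def did_x_but_not_y(events: list[Event], x: str, y: str) -> set[str]:
    """Users who performed event `x` at least once and never performed `y`."""
    did_x = {e.user_id for e in events if e.name == x}
    did_y = {e.user_id for e in events if e.name == y}
    return did_x - did_y


def count_in_last_days(
    events: list[Event], user_id: str, name: str, days: int, now: datetime
) -> int:
    """How many times `user_id` performed `name` in the `days` days up to `now`."""
    cutoff = now - timedelta(days=days)
    return sum(
        1
        for e in events
        if e.user_id == user_id and e.name == name and cutoff <= e.ts <= now
    )
```

Both functions take the event names and window as plain parameters. A UI could therefore expose them to a non-engineer as dropdowns and a number field, without anyone writing new code.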
So the Profiles product kind of enables
this use case, where a data engineer
can come in and define these
ID-stitching rules and these complex
features in a config,
commit it to a repo, push it to RudderStack,
and then on the RudderStack UI, a non-technical person can come in
and build on top.
They can build funnels on top of these features defined by the data persona.
And they can define other simple features.
And everything goes back to the same config,
which is linked to the core config that was built by the data team.
So that's what we have done.
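The division of labor described here can be sketched as two layers with a validation check at the boundary. Engineering owns the ID-stitching rules and event definitions, and business users layer funnels on top. Everything in this sketch (class names, fields, the validation rule) is hypothetical and is not RudderStack's actual Profiles configuration format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CoreConfig:
    """Layer owned by the data team and version-controlled in a repo (hypothetical schema)."""
    id_stitching_keys: tuple[str, ...]   # identifiers used to build the ID graph
    event_catalog: frozenset[str]        # clean, engineering-defined event names


@dataclass(frozen=True)
class FunnelFeature:
    """A business-user feature: users who did `did` but never `did_not`."""
    name: str
    did: str
    did_not: str


@dataclass
class ProfileProject:
    core: CoreConfig
    funnels: dict[str, FunnelFeature] = field(default_factory=dict)

    def add_funnel(self, funnel: FunnelFeature) -> None:
        # The boundary: business users can freely combine events that already
        # exist, but an event that isn't instrumented yet needs engineering first.
        missing = {funnel.did, funnel.did_not} - self.core.event_catalog
        if missing:
            raise ValueError(f"events not instrumented yet: {sorted(missing)}")
        self.funnels[funnel.name] = funnel
```

The check in `add_funnel` also reflects the open problem raised next in the conversation. If a marketer needs an event that doesn't exist yet, someone still has to go back and instrument it.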
There are problems we have not solved, though: things which
don't have that clean boundary.
A good example is like, what event should I be tracking?
Right.
A marketing person may be interested in a new feature.
Maybe they are saying that, okay, I'm interested in
how many times somebody has come to a specific
page. But then
the event for that may not be even
present today, right? So somebody has to
go back and instrument the event, which
again, the marketing person cannot do, right?
Like somebody has to now go and
instrument the event and then
make sure that the right properties are captured.
That's a complex workflow that, again, I don't think we have solved, but hopefully at some point we'll get to it.
Yeah, 100%. That's a great, pragmatic way to describe the kind of problem that has to be solved here and how hard it is. And I do find it very refreshing to hear someone talk about
boundaries, because many times, especially with products that
start from a more engineering mindset, let's say, they are very absolute in terms of how things should work, right? They try to impose
a way of doing things, which of course might work for people who are
like-minded, like engineers, your peers. But you can't really go out there and
ask a marketeer to change the way they think, right? There is a good
reason that they think the way they do, and that's because that's what helps them deliver the maximum in whatever they have to do, right? So having these boundaries, and
using these boundaries to develop a well-defined user experience in a product, I think is key.
So I'm very curious to see how this has been implemented as part of this new product in RudderStack.
But we are close to the end here,
and before we close,
I'd like to ask you about
a term that you used. You mentioned
the term feature store,
which is obviously very related to machine learning.
But we are also living in very interesting times.
There's AI out there.
There are some very new ways of interacting with a machine through interfaces like ChatGPT and all that stuff.
And of course, all these things, all these new technologies,
they are data-related technologies.
They are based on that.
If we didn't have the data, we wouldn't have the models.
Based on your experience, and I'm talking here about your whole experience, starting from
everything that you have done in your career so far, how do you see the future?
What do you see next, and how do you see these new technologies and paradigms affecting
customer data and the space you are in?
Yeah, that's a great question.
And that's something like we talk about quite often in our company.
So if we take a step back, right? I mean, we always wanted to have this holy grail of like one-to-one personalization, right?
Anything that you hear from a brand
should be perfectly tuned for you, right?
I mean, based on what your interests are
and what your desires are and so on, right?
Now, there were two problems
to make that happen, right?
Number one was like
to have all the data about you.
I mean, unless I know you, how can I even personalize? So having all the data was the first step.
The second step was, even if you had all the data, how do you personalize?
If I have a million users and I know everything about them,
their likes and their dislikes,
while interacting with the brand,
outside of the brand,
and so on.
Given all of that,
if you ask a human to come in
and draft the perfect message,
they can do that.
But how do you make a machine do that?
So neither the data problem
nor the ML problem was solved.
That's what we tried to do in my previous company, and we kind of struggled on
both fronts. Now, what ChatGPT has hopefully done is solve the second problem. Somehow
magically, if you can feed in all the data, like I tell it that okay, these are all
the products like Kostas
has looked at in the past. This is
where he lives. This is what his
interests are. Craft the perfect
marketing message. I mean, you'll have
to do some prompt engineering, but I think
ChatGPT can give you a good
enough answer, right? Like an answer
that is personalized to you. You could not
do that earlier. That's why you had to do
this broad segment-based marketing, where you create
segments: for all people in San Francisco,
I'll do something, and for all people in
New York, I'll do something else.
Those days will be gone in like five years.
Everything will be personalized
and all the generative AI
techniques will make that happen.
Now, we are
not doing generative AI ourselves, but we will be using it.
And a lot of other brands will be using it. But the data problem still has to be solved.
You still have to get everything that you know about a customer to feed into these generative
AI techniques. Historically, that was not a big problem, because you couldn't do much with the data anyway. But this problem will explode over the next 10 years, and hopefully
we will have a role to play in solving that data problem, if that makes sense.
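The generative-AI workflow described here depends on the data side being solved first. You feed a customer's known history into a model and ask for a personalized message. The sketch below only assembles a prompt from a unified profile. It makes no model call, since the choice of model and client library is outside what the episode covers. The profile fields and prompt wording are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerProfile:
    """A unified customer profile (hypothetical fields) produced by the data layer."""
    first_name: str
    city: str
    interests: list[str] = field(default_factory=list)
    recently_viewed: list[str] = field(default_factory=list)


def build_prompt(profile: CustomerProfile, max_items: int = 5) -> str:
    """Turn one customer's profile into a prompt for a text-generation model."""
    # Cap the product list so long histories don't blow up the prompt size.
    viewed = ", ".join(profile.recently_viewed[:max_items]) or "nothing yet"
    interests = ", ".join(profile.interests) or "unknown"
    return (
        "Write a short, friendly marketing email for one customer.\n"
        f"Name: {profile.first_name}\n"
        f"City: {profile.city}\n"
        f"Interests: {interests}\n"
        f"Products recently viewed: {viewed}\n"
        "Recommend at most one product and do not invent discounts."
    )
```

Compared with segment-based marketing, where every customer in San Francisco gets the same message, each profile here produces its own prompt. The quality of the result depends on how complete the profile is.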
Yeah, absolutely. All right. That was an awesome conversation. I hope we are going to
repeat it much sooner than another three years from now.
So I'm looking forward to having you back on the show, Soumyadeb.
But before we go, where can our listeners learn more about both RudderStack, of course, and the new product?
The best way to do that is go to our website.
We will be launching it on our website.
You can request a demo there.
We will also do a Show HN post on Hacker News.
So yeah, that's kind of one channel.
The other is like hit me up on LinkedIn.
My first name is clearly unique, Soumyadeb.
So there aren't too many
Soumyadebs in the world, and it should be easy to find me on LinkedIn. So hit me up,
and I'd love to get feedback from anyone who is interested.
All right. Thank you so much, Soumyadeb.
Thanks, Kostas, for having me. I really enjoyed chatting with you.
All right, Kostas.
What were your big takeaways from this?
And the reason I'm so interested is,
I mean, A, you worked with Soumyadeb at RudderStack.
You've built tooling that had a pretty heavy emphasis
on customer data and getting it into the warehouse.
So what were your takeaways?
What do you think about his thoughts on CDP, the landscape, et cetera?
Yeah.
There are plenty of very interesting insights in the conversation that we had with Soumyadeb.
First of all, it was very interesting to go through the history of this category, right?
How things started more than 10 years ago and how they are still evolving. Every time
you have a cycle in the market, it feels like the
problem has been solved, but actually it's just the beginning of another iteration of getting closer
to the solution, right?
So it was very interesting to hear about what started
the first iteration of these platforms, right?
With Segment, and even before that, and where we are today.
How we work with this data today, and how much we still
have to build out there, right?
That's one thing.
The other thing that I found extremely interesting, and I think
it's one of the most interesting challenges in data-oriented products like these, is that you never have only one persona involved, right?
And I think CDPs, or let's say customer data infrastructure, are probably one of the most extreme examples of this. Because you have all
the data infrastructure that you need. You even have the application developers, right?
But at the end, you have the marketeer. And the marketeer is the one who is actually going to turn
all this work that has happened before into actual value, right? And the marketeer is a very different persona compared to the rest. So
we had a very interesting conversation about the difficulty of building
products that can satisfy all these different personas, and of course,
as part of that, we also had the opportunity to see
what RudderStack is doing today,
the new products and new solutions that RudderStack brings
to solve all these problems.
So very interesting conversation.
Soumyadeb doesn't talk that often,
or as often as he should, in my opinion,
because he's really good at helping us understand these complex concepts.
So I would suggest everyone to tune in and listen to the conversation. And there is also like a very interesting fact
shared about the origin of the show.
So I'm not going to say more about that,
but people should just listen.
Sneaky.
All right.
Well, tune in for some insider information
and a complete breakdown of customer data platforms,
customer data, the whole nine yards. Subscribe if you haven't, tell a friend, and we will catch you on the next one.
We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your
favorite podcast app to get notified about new episodes every week. We'd also love your feedback.
You can email me, Eric Dodds, at eric@datastackshow.com. That's E-R-I-C
at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers.
Learn how to build a CDP on your data warehouse at rudderstack.com. Thank you.