The Data Stack Show - 146: What Is a Customer Data Platform? Featuring Soumyadeb Mitra of Rudderstack

Episode Date: July 12, 2023

Highlights from this week’s conversation include:

Soumyadeb’s background and journey in data (5:49)
Defining customer data (8:10)
The complexity of customer data collection (10:04)
What is a CDP and how it is properly deployed (17:12)
Bridging the gap of data collection and useful analytics for marketing (21:46)
How RudderStack translates data and the new profile feature (25:30)
The foundations of data in building a 360 degree customer profile (30:30)
Solutions for the intersection between engineering and business users (34:35)
How AI and other future technologies will impact data (41:14)
Final thoughts and takeaways (46:30)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack. They've been helping us put on the show for years and they just launched an awesome new product called Profiles. It makes it easy to build an identity graph and complete customer profiles right in your warehouse or data lake. You should go check it out at rudderstack.com today. Welcome back to the Data Stack Show. Kostas, we have a pretty special guest today,
Starting point is 00:00:40 someone who you and I have both worked for, and I still do work for: Soumyadeb, who is the founder and CEO of RudderStack, which actually helps us put the show on, which is really great. So it's going to be really fun to talk with him about all sorts of things. One thing that I'm excited to chat about, that I've chatted with Soumyadeb a bunch about over the years, but it'll be nice just to have a casual conversation about it, is the history of tooling around customer data. It's been very marketer-centric. And the way it seems to be shifting back towards the data team,
Starting point is 00:01:23 you know, because data is sort of a fundamentally technical problem. So I think it'll be interesting to get his perspective on that because he's done a lot of work over the years on data teams, ML teams, etc., with a focus on customer data. So I think that'll be a great topic to cover. But what do you think? Yeah, I think, I mean, outside of the very interesting personal things that we can chat about, because we've worked together with Soumya and we've worked with him from a very early stage of the company.
Starting point is 00:01:56 Right. So there are many interesting stories to talk about there, but outside of this, I think he's probably the most appropriate person to help us understand a little bit better what this whole thing about customer data is. Why do we have customer data? Why do we need to differentiate it from
Starting point is 00:02:24 the rest of the data that we have, right? Why do we need almost a completely different category for managing this data? Why do we have CDPs? What is a CDP, right? All these terms have been used, and maybe also abused a little bit, in the market in the past couple of years. And we're still trying to figure out exactly what all these things are.
Starting point is 00:02:49 I think we have the right person to give us some very concrete definitions and, because we are addressing primarily data practitioners, to explain what this means for a data engineer or a data analyst, all these people who get requests to work with this data. And yeah, Soumya is the person who can give us the whole spectrum, from the data engineering side of things to how the marketer is actually working with this data and why. So we're definitely going to talk a lot about that stuff.
Starting point is 00:03:31 Yeah, that'd be great. I don't know if we've had a sort of customer data platform type conversation on the show yet. We've maybe mentioned it, but I think we haven't done like a deep dive, especially as it relates to data infrastructure. So that'll be great. All right, well, let's dig in. All right. So hello, everyone, and welcome to another episode of the Data Stack Show. And this is going to be a very special one. First of all, because as you have probably noticed,
Starting point is 00:03:56 I'm the one doing the introduction to the episode, which means that I'm going to be alone, from what it seems like, as the host. But we also have an extremely special guest today, who, first of all, I can call a friend, but also it's a person with whom I've spent maybe a little bit more than two years working at RudderStack and seeing the amazing journey of the company, starting from almost zero to becoming the company it is today. So we have Soumya, the CEO of RudderStack. And by the way, that's probably something that not many people know, but the very first idea around having a podcast actually came from him.
Starting point is 00:04:55 And he had this idea when we were working together at RudderStack. And I took it and started working on it. And then the rest is obviously history. But this history has also been supported a lot by him. Because as you know, the Data Stack Show is a very independent show, but we have the support of RudderStack to keep the show as it is today. So welcome and thank you so much, Soumyadeb.
Starting point is 00:05:30 How are you? I'm very well. Firstly, Kostas, really great to be doing this show with you. Thanks a lot for the very kind words about me and RudderStack. I mean, I have been a follower of the show. I mean, being a sponsor is the easy thing. You have built an amazing show. Like having those in-depth conversations.
Starting point is 00:05:52 Again, kudos to you for setting this up for success. But yeah, I'm very glad to finally have the opportunity to be talking to you on this show. So thanks. Yeah, yeah. It actually took us a while to make it happen, right? It's probably more than, well, it's been like three years now that we've been doing the show. And it's the first time that we are here.
Starting point is 00:06:14 So that's super exciting for me. But before we start, let's do what we usually do with all of our guests. I'll ask you to give us a brief background of yourself. Who are you, what have you done, and what led you into building RudderStack? Yeah, that's a great question. So I can maybe go in reverse chronology. I've been doing RudderStack for the last four years. I'm the founder and CEO.
Starting point is 00:06:51 We started in 2019. Before RudderStack, I spent a year in a company called 8x8 as a part of their data team, leading some of their machine learning and customer data initiatives, and building use cases on top of customer data. And some of the experiences and challenges I ran into at that company prompted us to start RudderStack.
Starting point is 00:07:15 Now, before that, I was the co-founder of a company called MarianaIQ. We were building almost like the next generation AI-driven marketing automation system. Those were the early days of deep learning and so on. And we thought maybe this is an opportunity to transform the way marketing is done. We were definitely early, both in terms of the tech, but more importantly, in our customers' data completeness. So even though we worked with really large brands, none of them had good customer data.
Starting point is 00:07:50 And if you don't have data about your own customers, there's not much ML you can do. So I learned a lot of lessons in that company, ended up selling to 8x8, and tried to do a similar thing inside of 8x8. And then again, there were very similar problems around collecting data, unifying data. And that's what hopefully we will...
Starting point is 00:08:08 We have kind of built something in RudderStack and we'll hopefully solve this someday. And yeah, prior to that, I worked in a company called Data Domain. I don't know how many of you have heard of it, but this is Frank Slootman, who is now a big shot. That was his first company as a CEO.
Starting point is 00:08:24 So I learned a lot from that company. That was my first startup experience, my first real job experience, and probably the only one. That's like a quick overview. And I have like a PhD in data. So I've kind of like worked in the data space pretty much all my life.
Starting point is 00:08:40 Yeah, that's amazing. And okay, I have a question from my side. I think I know what the answer is, but I think it's very important to hear your definition and share it with our audience, because we talk a lot about it, but I'm not 100% sure that people share the same semantics or deeply understand what it is.
Starting point is 00:09:07 So the question is simple. What is customer data, and why is it not just any other data, so that we need to treat it differently? Yeah, that's a great question. In fact, even on that, I don't think there is a consistent definition. Everyone has a different view of that. But in the simplest form, one way to think about it is, if you are a B2C company, let's say you are a company that is selling stuff to consumers.
Starting point is 00:09:38 Those consumers interact with your brand, right? And they do that over many different channels. They're probably coming to your website and doing things on the website. They're probably going to your mobile app and taking actions in the app. Or they may be calling your call center. They may be going to your store.
Starting point is 00:09:58 They may be making purchases and so on, right? Now, each of these interactions with your brand produces some data. A transaction produces transaction data. Similarly, somebody coming to your website and clicking on some products and browsing your catalog, that produces data. What products they looked at, what products they clicked, what they added to the cart. Of course, different data has different value.
Starting point is 00:10:25 Your website click data may not be as valuable as transaction data. But in a loose way, all of this data can be called customer data. All right, that makes total sense. So we have these, let's call it, behavioral breadcrumbs, right? Of the user, of our customer out there. And we want to collect them. Let's start with that because I think that's like one of the first big,
Starting point is 00:10:52 let's say, challenges that we have to go through. What does it mean to collect this data? Because you mentioned there are many different channels, right? And from what I understand, we are talking about channels that can even be as diverse as physical channels. Like someone enters your shop and makes a transaction through a POS machine over there. And at the same time, they forget something, they go out and they make an online transaction to buy from your firm again.
Starting point is 00:11:29 It's really diverse. So let's talk a little bit, actually there are two things that I want from you on that. Give us a little bit more color on how complex this process of collecting is, and the history behind it. I know that for the past 10 years, starting with Segment, there's been a lot of innovation in the industry, but many things have also changed from back then to today.
Starting point is 00:12:04 So I'd love to hear from you, share with us your experience on these two fronts. Yeah, great question. So let me start with the complexity and then I can comment about the history. And you pointed it out already. The complexity comes from literally three things. One is the diversity of sources.
Starting point is 00:12:28 You have, as you mentioned, you have your Android app and you have your website and the point-of-sale device, your backend systems and transactional systems and so on. So you need to collect data from all of these places.
Starting point is 00:12:41 So you have to have SDKs that your developers can embed and all that stuff. So it's not rocket science, but it does require a lot of engineering effort to build this out. That's number one. The variety of data sources. The second
Starting point is 00:12:56 is around, I would say, the volume of data. If you are a reasonably sized consumer company, you're talking about anywhere from millions to billions of events per day. And we're
Starting point is 00:13:12 working with some customers who are at peak sending a million events per second. So you have to set up the backend infrastructure to handle this volume and so on. So again, this is not rocket science, but these are engineering problems.
Starting point is 00:13:28 You have to set up a team to do that. And this is not core to any... I mean, this data is extremely important, but setting up this infrastructure is grunt work that most companies don't want to do. The third, and I think the most important problem is consistency. And I think that
Starting point is 00:13:49 is what is often overlooked. What I mean by that is, what is the goal of collecting all this data, right? You want to collect all this data and you want to send it to downstream users of the data. You want to send it to, let's say, a product analytics tool like Amplitude or Mixpanel. Or you want to send it to a marketing tool like Braze or Salesforce or some other marketing cloud. Now, each one of them expects this data in a slightly different format. They have their own APIs, they have their own standardization. So if you have to build this infrastructure from the ground up, you have to
Starting point is 00:14:32 handle that. You have to make sure whatever schema and structure you're using for your behavioral events, that can be sent to all these downstream destinations. You have to manage those translations. Alternatively, you can embed their SDKs but even the SDKs expect a standardized event format. So you have to manage that yourself. And each one is slightly
Starting point is 00:14:58 different. The same goes with user identities. Identity management is hard. People come to the website anonymously and they browse things anonymously. But you can still track that activity by setting a cookie wherever it's possible, or, let's say, by using a device ID on a mobile device. And then once they log in, you may have an email or an address or a phone number. And you have to stitch all these identities and you have to manage those identities. So this standardization of like,
Starting point is 00:15:30 what should I be calling the events? What events should I be collecting? What properties should I be collecting with those events so that I can send them to downstream destinations? This standardization is a lot of work that a lot of vendors have to do from scratch again and again.
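To make the translation problem described here concrete, a minimal Python sketch follows. It assumes one canonical event shape and two made-up destination formats; the field names, the destination names, and the `fan_out` helper are all hypothetical and do not reflect any real vendor's API.

```python
from typing import Any, Callable

# Canonical event in a made-up internal schema (all field names are assumptions).
Event = dict[str, Any]


def to_analytics_format(event: Event) -> dict[str, Any]:
    """Hypothetical analytics destination: flat payload with its own key names."""
    return {
        "user_id": event["user_id"],
        "event_type": event["name"],
        "event_properties": event["properties"],
        "time": event["timestamp"],
    }


def to_marketing_format(event: Event) -> dict[str, Any]:
    """Hypothetical marketing destination: nested payload, snake_case event names."""
    return {
        "external_id": event["user_id"],
        "events": [
            {
                "name": event["name"].lower().replace(" ", "_"),
                "time": event["timestamp"],
                "properties": event["properties"],
            }
        ],
    }


# One translator per downstream destination (destination names are illustrative).
TRANSLATORS: dict[str, Callable[[Event], dict[str, Any]]] = {
    "analytics": to_analytics_format,
    "marketing": to_marketing_format,
}


def fan_out(event: Event) -> dict[str, dict[str, Any]]:
    """Translate one canonical event into every destination's expected shape."""
    return {dest: translate(event) for dest, translate in TRANSLATORS.items()}


event = {
    "user_id": "u-1",
    "name": "Product Viewed",
    "timestamp": "2023-07-12T10:00:00Z",
    "properties": {"sku": "A-100"},
}
payloads = fan_out(event)
```

Collecting each event once in a single schema and translating at the edge is the design choice the speaker is pointing at: adding a destination means adding one translator rather than re-instrumenting every source.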
Starting point is 00:15:46 And those are kind of the three main pillars: the variety of sources, the volume of data, and standardization. That last one comes primarily because there are all these downstream users of this data. Now, if you look at this space, if you look at the early 2000s, the number of channels through which somebody would interact with a brand was fairly low. You go to a store, you buy things. This was pre-mobile, early days of web.
Starting point is 00:16:24 So this was not really a problem. I think the explosion happened after the iPhone, or maybe slightly earlier, when the web became an important channel. And then over time, mobile became an important channel. So your sources exploded, and so did the volume, right? That's one end. And on the other side, the number of destinations also exploded. People had a specific tool for running email campaigns and another tool for running push campaigns.
Starting point is 00:16:54 So you have to get this data to all these different destinations. That complexity suddenly exploded in the 2000s, or I would say the early 2010s. And that's where the space, and the technical problems of customer data platforms, came into being. Segment was almost the early leader in the space. They built this multiplexer, right? I mean, you collect from all these places,
Starting point is 00:17:19 send it to Segment, and they can fan it out to all the destinations. So Segment was the early mover, but then other companies came, like mParticle, like Tealium, and so on. They all kind of handled some version of this problem, and so does RudderStack. We are in the same space, hopefully the last company in this space. All right, that was very insightful. So my next question is a little bit similar to how we opened the conversation. I asked you at the beginning what the definition of customer data is.
Starting point is 00:17:54 But I think now that we've talked a little bit about the problem, and also we mentioned a couple of vendors in this space, there is another concept that has many different definitions. And that's the concept of the CDP, right? The customer data platform. And I'd like to hear your take on that. Like what is a CDP, right? And is RudderStack a CDP? Yeah.
Starting point is 00:18:22 CDP is probably the most wrongly used term because everybody and anyone is a CDP now. Anyone who touches customer data is calling themselves a CDP. So, yeah, let me take a stab at defining CDP the way we want to. At a fundamental level, the problem is what I described earlier. You have all these sources, you are generating a ton of data, and you have to get that data to all the downstream destinations. So a tool which can support that is a customer data platform. But then the space evolved, with these initial data multiplexers realizing that, okay, we have all the data, everything is flowing through us. Why do we have to just multiplex the data? We can provide more value-added services, right? We can stitch all of these different identities
Starting point is 00:19:33 and create what is called a customer 360, like a golden customer record. And then we can let our customers come in and run marketing campaigns on top of that. They can come in and create audiences. And an audience is, let's say, a list of people who have come to the checkout page but did not purchase. That's a cart drop-off audience that you want to run a marketing campaign against.
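The cart drop-off audience mentioned here can be sketched in a few lines of Python. This is only an illustration: the event names `checkout_viewed` and `order_completed` and the event shape are assumptions, not taken from any particular tool.

```python
def cart_drop_off_audience(events: list[dict]) -> set[str]:
    """Users who reached the checkout page but never completed an order."""
    reached_checkout = {e["user_id"] for e in events if e["name"] == "checkout_viewed"}
    purchased = {e["user_id"] for e in events if e["name"] == "order_completed"}
    return reached_checkout - purchased


# Illustrative data: three users, one of whom completes the purchase.
sample_events = [
    {"user_id": "alice", "name": "checkout_viewed"},
    {"user_id": "alice", "name": "order_completed"},
    {"user_id": "bob", "name": "checkout_viewed"},
    {"user_id": "carol", "name": "product_viewed"},
]
audience = cart_drop_off_audience(sample_events)
```

In this framing an audience is just a set of user identifiers defined by a predicate over their events; activation is then a matter of sending that set to a downstream tool.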
Starting point is 00:19:58 So all these initial data pipeline companies realized, okay, now we can provide these value-added services. We can create this customer 360. We can provide this audience tool. We can provide the activation tool. Activation means taking that audience and sending it to something like Facebook so that we can show them ads. So that was one evolution of CDP,
Starting point is 00:20:24 data multiplexers providing more customer 360 audience capabilities. At the other end of the spectrum, there were traditional marketing automation companies which had that golden customer record, right? Like think of sales, think of CRMs like Salesforce
Starting point is 00:20:43 or marketing tools like Pardot and Marketo. They all had customer records. They had the emails. They had phone numbers and all this stuff. And then those were traditionally used for running email campaigns and other sales and marketing campaigns. Now, because they had the customer record, they figured out that, why don't I layer on this behavioral data and I can provide more insights
Starting point is 00:21:14 to personalize these campaigns. So they also evolved into collecting this first-party data, augmenting their capabilities to provide this customer 360 and segmentation capabilities and so on. So they also call themselves a CDP. So now you have this space with a mishmash of these data pipeline companies providing some capabilities on audiences or the traditional audience companies providing data pipeline capabilities and data collection capabilities. And this entire space is now a CDP.
Starting point is 00:21:47 And in that sense, RudderStack is also a CDP. We help our customers collect data. We help our customers unify it, create that customer 360, and activate it to all the downstream tools. Where we differentiate is that all of this happens on top of the data warehouse, right? We don't store any data. This happens on top of the customer's Snowflake. I mean, I can go on and on,
Starting point is 00:22:07 but that might be a topic for a separate conversation, but that's how we position ourselves in this market. Yeah, yeah, that makes total sense. And, okay, let's get a little bit deeper into the data warehouse part. So what is the added value to the organization by delivering their own data, their own customer data into the data warehouse
Starting point is 00:22:35 and then start building on top of that the different layers that the organization needs to reach the point of having this customer 360 view, or the customer golden record? And how do we also bridge the gap in going from that data product to actually having, not the analyst, but let's say the marketeer use this information to go and do marketing, right? Because I would assume that if I'm a marketeer, like the last
Starting point is 00:23:12 thing that I want to do is like play around with databases. Like what I want to do is probably focus on my marketing tools and being able like to run my campaigns, generate revenue, and all that stuff, right? So how do we bridge that gap there? Yeah, that's a great question. In fact, that is the reason a lot of the initial customer data platforms came into existence. You had these traditional marketing tools, right? I mean,
Starting point is 00:23:47 you'll use Salesforce or something else. The marketeer would use that and they would complain about, okay, we don't have data. We need web data, we need mobile data to personalize their experience. I don't have that data. They would go to IT and say that, okay, can you set up these data pipelines to collect data from the website, from the mobile app, and send me the data into my downstream tool? IT would say, oh, this is the 10th project in my list.
Starting point is 00:24:19 Plus, I don't even have the capability to write these SDKs and create these data pipelines and manage the pipelines. So that led to this whole space, right? And then marketers thought, okay, we need some of these tools, and I don't want to wait for IT. So they'd go and buy from these vendors, like Segment and so on. The early players in the space would say, oh, here's the SDK, your
Starting point is 00:24:45 engineering team, just embed this SDK and that's all; your marketing team will be off your back, they will not come and bother you anymore. All the data will magically start flowing into their tools and so on. So like, happy ending. I think that matured the market quite a bit,
Starting point is 00:25:02 right? I mean, by no means was it a failure. I mean, Segment got acquired by Twilio. There was a huge customer base. I think where it started failing was around completeness of data, right? I mean, IT was always a laggard in 2010. IT could not set up this infrastructure
Starting point is 00:25:25 to collect data, store data, process data. That changed by the late 2010s, the later part of the last decade, right? People started buying data warehouses and investing in data warehouse technologies, and they started centralizing a lot of that data.
Starting point is 00:25:47 So that is what is triggering the new wave of CDPs. But the traditional CDPs try to address the exact problem you're talking about. That makes sense. Yep. Makes total sense. All right. So we have, let's say like in these past decades, there's like a lot of data related infrastructure that came into the market that has changed like a lot of the dynamics around what can be built on top of the data that the company has.
Starting point is 00:26:17 So we have tools like to collect the data, put them into the data warehouses. Data warehouses are quite easy to manage because they are on the cloud. We have tools for doing modeling and manage modeling with like dbt and the likes. But still, even if we solve the problem completely of delivering the data into the data warehouse, there still is, let's say, this process of taking this raw, noisy soup of events about the user, like all these breadcrumbs that we put into a basket, in a way, and we need to transform it into something that can be digested by an analyst or even a marketeer.
Starting point is 00:27:08 So how can we do that? How can we do that with RudderStack? Yeah. So before I answer how you can do that with RudderStack, let me briefly explain how you can do that without RudderStack, and what the pain points are. I think that will help in understanding the value of what RudderStack does. You're 100% right. That is
Starting point is 00:27:30 the problem. You get all these data streams. You use a tool like RudderStack, Segment, or your homegrown thing, and you have collected all this behavioral data into your data warehouse. You have 20 different tables of 20 different events. Then you bring in your ETL data, again, through RudderStack, Fivetran, whatever, you name it, some other ETL tool.
Starting point is 00:27:51 And you end up with another 20, 30 different tables. Now, what you are trying to get out of all of this is a clean customer view. Think of it as one row per customer with a bunch of attributes computed for the customer. That's all a consumer of the data, whether it's an analyst or a marketeer, cares about. And when I say attributes, these are things like total revenue for
Starting point is 00:28:15 the customer, how many times they have come to the checkout page, what recent products they have looked at. You can think of all these features, like funnel features. Have they come to the checkout page but dropped off? These are all the features that you're computing for the user, which your downstream users of the data care about.
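As a rough illustration of the "one row per customer" view described above, here is a small Python sketch that folds raw events into per-customer attributes. The event names, the `revenue` field, and the chosen attributes are assumptions made for the example; in practice this logic would usually live in SQL on the warehouse.

```python
from collections import defaultdict


def build_customer_view(events: list[dict]) -> dict[str, dict]:
    """Aggregate raw events into one row of attributes per customer."""
    rows: dict[str, dict] = defaultdict(
        lambda: {"total_revenue": 0.0, "checkout_visits": 0, "purchases": 0}
    )
    for e in events:
        row = rows[e["user_id"]]
        if e["name"] == "checkout_viewed":
            row["checkout_visits"] += 1
        elif e["name"] == "order_completed":
            row["purchases"] += 1
            row["total_revenue"] += e["revenue"]
    # Funnel-style feature: reached checkout but never purchased.
    for row in rows.values():
        row["dropped_at_checkout"] = row["checkout_visits"] > 0 and row["purchases"] == 0
    return dict(rows)


raw_events = [
    {"user_id": "alice", "name": "checkout_viewed"},
    {"user_id": "alice", "name": "order_completed", "revenue": 40.0},
    {"user_id": "alice", "name": "order_completed", "revenue": 10.0},
    {"user_id": "bob", "name": "checkout_viewed"},
    {"user_id": "bob", "name": "checkout_viewed"},
]
customer_view = build_customer_view(raw_events)
```

Each new attribute a marketer asks for means another branch in a function like this (or another SQL model), which is the maintenance burden the conversation turns to next.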
Starting point is 00:28:32 So you have all this raw data on your left, and you want to get this clean customer view on top of your data warehouse. Now, how do you do that? Traditionally, you will go and hire a team of data analysts who will come in and write the SQL. Traditionally, this was hundreds and hundreds of lines of SQL. dbt was a big force in this space. Instead of you having to poorly manage the SQL,
Starting point is 00:29:06 now you can apply software engineering best practices with dbt: you can organize the models into projects and then check them into GitHub. So dbt brought a lot of sanity to the space, but you still had to go and write those transformations. You have to figure out stitching identities,
Starting point is 00:29:23 which is a hard problem to do on top of SQL. You have to figure out how to create features or funnels, which is, again, funnel is hard to do in SQL. So you have to still go and write these with a team of data analysts. But the biggest problem here was like, it's not just one time that you hire a team
Starting point is 00:29:40 and that they come up with a clean customer 360, right? Every time your marketer wants a new feature, let's say they want total revenue in the last seven days. They want something else in the last 15 days. You have to go back to the data team.
Starting point is 00:29:56 They will go and update the models. Then they have to push to production. And it will take probably anywhere from one sprint to a couple of sprints to even get this rolled out. So that is the state of the art, where you have to hire a team and go through this slow, painful process. Now, that's what we are trying to solve with RudderStack. RudderStack's vision was to enable this end-to-end workflow on top of a data warehouse, but with something which does not require this painful process. We are launching this product called Profiles
Starting point is 00:30:26 which simplifies that. Number one, you can define all these things in a very high level language. You don't have to write complex SQL. We also have a UI around it. So even a non-tech person can come in and define the features. Everything goes into
Starting point is 00:30:41 a config which can be checked into Git. So you still have software engineering best practices, but then it exposes this process to non-tech or non-SQL experts. That's what Profiles helps our customers with. Okay, that's awesome. I have a couple of questions here. So, first of all, let's talk a little bit about
Starting point is 00:31:09 the foundations of building a profile, right? You mentioned a couple of things. You mentioned about identity stitching and a couple others, but is there a minimum set of operations that
Starting point is 00:31:24 there's no way that you can avoid when you have to go and build this customer 360 table, right? This table where you have one row per user, per customer, and then a number of columns, each one representing something, it doesn't matter what. If we want to define the minimum set of problems and operations that somebody needs to do there, what would you say are the fundamentals? Yeah. So there are literally three things that need to happen to get to this customer 360 from your dirty data, right? I mean, on the left you have all these events and ETL sources, and you want this clean, transformed data on the right, right?
Starting point is 00:32:16 the first step is the step on identity stitching or id resolution Like, as I was mentioning earlier, you have all these different identities about the user, right? Somebody comes to your website, you assign it a cookie ID, and then they provide their email, so you have your email. Similarly, the same person comes to the mobile app, they have a device ID, and then they provide their email. And then now you can stitch all of these people into the same user, right? So when you're computing a feature like total number of times somebody has come to your particular product page,
Starting point is 00:32:48 you have to combine this mobile activity and the web activity based on these identities, right? So that's like step zero that needs to happen, like stitching all these identities. And it's actually a hard problem because it's not just like one-level IDs, right? You can have like multiple levels, right? You have like an email which joins with a phone number, that phone number joins with an address. And some of this could be like non-deterministic, like address is a good example,
Starting point is 00:33:12 it's not like deterministic. So you have to create this ID graph and stitch all of them into like one ID. That's step one. The step two that needs to happen is like, you have to now define these features. Total number of times somebody has come to the login page. Total number of times somebody has viewed a product.
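As an illustration of the ID graph idea described above, here is a minimal sketch of identity stitching with a union-find structure. It is not how RudderStack implements it; the identifier format (`cookie:...`, `email:...`, `device:...`) and the function name `stitch_identities` are assumptions made for this example. Each pair says "these two identifiers were seen together", and connected identifiers collapse into one user.

```python
from collections import defaultdict


def stitch_identities(edges):
    """Group identifiers into users.

    edges: iterable of (id_a, id_b) pairs that were observed together,
    e.g. a cookie ID and the email entered in the same session.
    Returns a dict mapping a canonical ID to the set of stitched IDs.
    """
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller root so output is deterministic.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    for a, b in edges:
        union(a, b)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return dict(groups)


# Hypothetical observations: web cookie + email, mobile device + same email,
# and an unrelated second visitor.
edges = [
    ("cookie:abc", "email:a@x.com"),
    ("device:123", "email:a@x.com"),
    ("cookie:zzz", "email:b@y.com"),
]
groups = stitch_identities(edges)
```

The multi-level case from the conversation (email joins phone, phone joins address) falls out naturally, because union-find follows chains of any length. Fuzzy identifiers such as addresses would need extra rules before they are allowed to create an edge.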
Starting point is 00:33:32 Total revenue. These are all interesting features. But then every business may care about the features that are important to them. So you cannot have a static set of features. You want this flexibility where anyone can come in and define those features. And what I mean by anyone is like, not necessarily just a data engineer or data analyst, right?
Starting point is 00:33:53 You want a marketing person who is using that feature to also be able to come in and define features wherever it makes sense, right? So you need this like additional layer where multiple people can define and contribute features. So that's the second step that needs to happen. The third step is actually what is called some version of time travel. It's not enough to just compute the features at today's time. There are use cases like, let's say,
Starting point is 00:34:27 training a machine learning model. You're trying to train a churn model and to do that you need to compute the features which go into the model at the point of churn, not today. So a user churned six days ago, so you want their features at that point. So you need this, you can call it some kind of a lightweight feature store, where you're not just computing and keeping track of today's features today, but you should be able to go back in time at any point and compute that feature at that time. And this is a hard problem. So you need these three things to happen to create a usable customer 360. Okay, that was super, super interesting. And you mentioned something that I find very fascinating as a problem.
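To make the "time travel" step concrete, here is a minimal sketch of point-in-time feature computation. The event shape (`user_id`, `name`, `ts`, optional `revenue`), the event names, and the function `features_as_of` are assumptions for this example, not part of any real product. The key idea is that only events at or before the chosen timestamp are allowed to contribute, so a churn model can be trained on what was known at the point of churn.

```python
from datetime import datetime


def features_as_of(events, user_id, as_of):
    """Compute simple per-user features using only events at or before `as_of`."""
    seen = [e for e in events if e["user_id"] == user_id and e["ts"] <= as_of]
    return {
        "product_views": sum(1 for e in seen if e["name"] == "product_viewed"),
        "total_revenue": sum(e.get("revenue", 0.0) for e in seen),
    }


# Hypothetical event log for one stitched user.
events = [
    {"user_id": "u1", "name": "product_viewed", "ts": datetime(2023, 6, 1)},
    {"user_id": "u1", "name": "order_completed", "ts": datetime(2023, 6, 3), "revenue": 40.0},
    {"user_id": "u1", "name": "product_viewed", "ts": datetime(2023, 7, 1)},
]

# Features as they stood at the end of June versus at a later date.
at_end_of_june = features_as_of(events, "u1", datetime(2023, 6, 30))
at_later_date = features_as_of(events, "u1", datetime(2023, 7, 12))
```

Recomputing from raw events is the simplest way to get this behavior. A lightweight feature store would more likely snapshot features on a schedule, trading storage for query speed.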
Starting point is 00:35:14 And I'd love to hear how RudderStack is dealing with it. And by the way, it's one of the reasons that I'm really personally attracted to work together in harmony, right? So you put it very well. You mentioned we have the data engineer on one side, but we want also to allow the domain experts, and the domain expert obviously is like the marketeer here, to be able to define and express what they need, right? So, how do you do that? Like, how do you deliver
Starting point is 00:36:14 like a product experience that can resonate both with the engineering persona and the marketing persona? Yeah, that's a great question. And I mean, by no means can I claim that we have solved this problem, right?
Starting point is 00:36:31 This is almost like, I don't know, anyone who solves that will get like, whatever is the equivalent of Nobel Prize for data engineers. But like, Profiles is an attempt to do that, right? And the way I think about this is, as you rightly pointed out, there are all these different personas that need to come together to work on top of this customer 360, right? I mean, if we take that example, there are like data engineers
Starting point is 00:36:59 who are responsible for producing the data and cleanly modeling the data. And then there are marketing people who are using that data and they might want to define their own set of features, like their own funnels. And you want them to come together. Now, I think, yes, your product experience has to bring them together. But there are also like boundaries.
Starting point is 00:37:21 There are things the marketing person does not want to do, cannot do. A good example is identity stitching. I mean, often as a marketeer, you know the data sources. You know this is my website, this is my mobile app. You don't really know
Starting point is 00:37:40 the nitty-gritty details of how IDs are generated on these apps and how they are stitched together and all that stuff. So that is a problem that is best left to the engineering team. Similarly, there are things like what
Starting point is 00:37:54 to call an event. Should it be called product underscore purchase? It's a simpler problem, but still, somebody has to take care of it. So that, again, can be left to the engineering team. So there are things that the engineering team has to contribute, like the ID
Starting point is 00:38:12 stitching rules and how does it happen to create this, like, some version of initial customer 360. And then you want your marketeer to come in and build on top of that, right? What does a marketing persona care about? They want to create funnels.
Starting point is 00:38:29 They want to say, give me all the people who have done X but not Y. That funnel step should not require going back to an engineer every time. You want that user experience to define funnels. Now, the funnels are defined on events which are defined by engineering. They define the properties and they make sure that the events are clean. But the ability to create funnels
Starting point is 00:38:53 should be exposed to the marketing person. And there are other simpler events also, right? I mean, total number of times somebody has done a page view, right? I mean, in the last X days, that is a feature that, again, should be exposed to a marketing person. It shouldn't have to go back to engineering.
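The "done X but not Y" funnel mentioned above reduces to a set difference over clean events. Here is a minimal sketch, assuming events already carry a stitched `user_id` and a `name` agreed on by engineering; the event names and the helper `did_x_not_y` are hypothetical.

```python
def did_x_not_y(events, x, y):
    """Return the set of users who have at least one `x` event and no `y` event."""
    did_x = {e["user_id"] for e in events if e["name"] == x}
    did_y = {e["user_id"] for e in events if e["name"] == y}
    return did_x - did_y


# Hypothetical events: u1 viewed and purchased, u2 only viewed, u3 only purchased.
funnel_events = [
    {"user_id": "u1", "name": "product_viewed"},
    {"user_id": "u1", "name": "product_purchase"},
    {"user_id": "u2", "name": "product_viewed"},
    {"user_id": "u3", "name": "product_purchase"},
]

viewed_not_purchased = did_x_not_y(funnel_events, "product_viewed", "product_purchase")
```

The split in responsibilities shows up directly here: the marketer only chooses `x` and `y`, while the event names and the `user_id` stitching behind them stay with the data team.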
Starting point is 00:39:08 So now the Profiles product kind of enables this use case where a data engineer can come in and define these ID-stitching rules and these complex features into a config,
Starting point is 00:39:24 commit to a repo, push it to RudderStack, and then on the RudderStack UI, a non-tech person can come in and build on top. They can build funnels on top of these features defined by the data persona. And they can define other simple features. And everything goes back to the same config, the kind of link to the core config that was built by the data team. So that's what we have done.
Starting point is 00:39:48 There are problems we have not solved, though. There are things which don't have that clean boundary. A good example is like, what event should I be tracking? Right. A marketing person may be interested in a new feature. Maybe they are saying that, okay, I'm interested in how many times somebody has come to a specific page. But then
Starting point is 00:40:09 the event for that may not be even present today, right? So somebody has to go back and go and instrument the event, which again, the marketing person cannot do, right? Like somebody has to now go and instrument the event and then make sure that the right properties are captured. That's a complex workflow that, again, I don't think we have solved but hopefully at some
Starting point is 00:40:28 point we'll get to it. Yeah, 100%. That's a great way to describe, in a pragmatic way, the kind of problem that has to be solved here and how hard it is. And I do find it very refreshing to hear from someone about boundaries, because many times, especially with products that start from a more engineering mindset, let's say, they are usually very absolute in terms of, this is how things should work, right? They try to impose, let's say, a way of doing things, which of course might work for people that are like-minded, like engineers, your peers, but you can't really go out there and ask someone who's a marketeer to change the way they think, right? Like there is a good
Starting point is 00:41:21 reason that they think the way they do. And that's because that's what helps them deliver the maximum in whatever they have to do, right? So having these boundaries and using these boundaries to develop a well-defined user experience on a product, I think, is key. So I'm very curious to see how this has been implemented as part of this new product in RudderStack. But we are close to the end here, and before we close, I'd like to ask you something about something that relates to
Starting point is 00:41:57 a term that you use sometimes. You mentioned the term feature store, which is obviously very related to machine learning. But we are also living in very interesting times. There's AI out there. There are some very new ways of interacting with a machine through interfaces like ChatGPT and all that stuff. And of course, all these things, all these new technologies,
Starting point is 00:42:30 they are data-related technologies. They are based on that. If we didn't have the data, we wouldn't have the models. Based on your experience, and I'm talking about here your like your whole experience right like starting from like everything that you have done like in in your career so far uh how do you see the future what do you see next and how do you see like these new technologies and paradigms affecting customer data and the space you are in? Yeah, that's a great question.
Starting point is 00:43:11 And that's something like we talk about quite often in our company. So if we take a step back, right? I mean, we always wanted to have this holy grail of like one-to-one personalization, right? Anything that you hear from a brand should be perfectly tuned for you, right? I mean, based on what your interests are and what your desires are and so on, right? Now, there were two problems to make that happen, right?
Starting point is 00:43:41 Number one was like to have all the data about you. I mean, unless I know you, how can I even personalize? So having all the data was the first step. The second step was, even if you had all the data, how do you personalize? If I have a million users and I know everything about them, their likes and their dislikes, while interacting with the brand, outside of the brand,
Starting point is 00:44:11 and what else? Even if I know something, if you ask a human to come in and draft the perfect message, they can do that. But how do you make a machine do that? So neither the data problem was solved, nor the ML problem was solved.
Starting point is 00:44:27 That's what we tried to do in my previous company. And then we kind of struggled on both the fronts. Now, what ChatGPT has done is hopefully solve the second problem. Somehow magically, if you can feed in all the data, like I tell it that okay, these are all the products like Kostas has looked at in the past. This is where he lives. This is what his interests are. Craft the perfect marketing message. I mean, you'll have
Starting point is 00:44:56 to do some prompt engineering, but I think ChatGPT can give you a good enough answer, right? Like an answer that is personalized to you. You could not do that earlier. That's why you had to do this broad segment-based marketing and you had to create segments: for all people in San Francisco, I'll do something. And for all people in
Starting point is 00:45:11 New York, I'll do something else. Those days will be gone in like five years. Everything will be personalized and all the generative AI techniques will make that happen. Now, I mean, we are not doing generative AI, but we will be using that. And a lot of other brands will be using that. But the data problem still has to be solved.
Starting point is 00:45:34 You still have to get everything that you know about a customer to call into these generative AI techniques. And hopefully that was not a big problem because you couldn't do much with the data anyway. But this problem will explode over the next 10 years and hopefully we will have a role to play in that data problem, if that makes sense. Yeah, absolutely. All right. That was an awesome conversation that we had. I hope we are going to repeat it much earlier than after another three years. So I'm looking forward to having you back on the show, Soumyadeb. But before we go, where can our listeners learn more about both RudderStack, of course, and also the new product? The best way to do that is go to our website.
Starting point is 00:46:28 We will be launching it on our website. Request a demo. We will also do a Hacker News show. So yeah, that's kind of one channel. The other is like hit me up on LinkedIn. My first name is clearly unique, Soumyadeb. So there aren't too many Soumyadebs in the world. So it should be easy to find me on LinkedIn. So hit me up
Starting point is 00:46:51 and I'd love to get feedback from anyone who is interested. All right. Thank you so much, Soumyadeb. Thanks, Kostas, for having me. I really enjoyed chatting with you. All right, Kostas. What were your big takeaways from this? And the reason I'm so interested is, I mean, A, you worked with Soumyadeb at RudderStack. You've built tooling that had a pretty heavy emphasis on customer data and getting it into the warehouse.
Starting point is 00:47:27 So what were your takeaways? What do you think about his thoughts on CDP, the landscape, et cetera? Yeah. There are plenty of very interesting insights in the conversation that we had with Soumyadeb. First of all, it was very interesting to go through the history of this category, right? Like how things started more than 10 years ago and how they are still evolving, and how, every time that you have a cycle in the market, it feels like the problem has been solved, but actually it's just the beginning of another iteration of getting closer
Starting point is 00:48:09 to the solution, right? So it was very interesting to hear all these things about what started the first iteration of these platforms, right? With Segment and even before that, and where we are today. Like, how do we work with this data today and how much we still have to build out there, right? That's one thing. The other thing that I found extremely interesting, and I think
Starting point is 00:48:40 it's like one of the most interesting challenges in this type of products that are like very data oriented, is that you never have only one persona involved, right? And I think CDPs or like, let's say, customer data related infrastructure is like probably one of the most exaggerated of these. Because you have all the data infrastructure that you need. You even have the application developers, right? But at the end, you have the marketeer. And the marketeer is who is actually going to turn all this work that has happened before into actual value, right? And the marketeer is like a very different persona compared to the rest. So we had a very interesting conversation about the difficulty of building products that can, you know, satisfy all these different personas. And of course, as part of that, we also had the opportunity to see
Starting point is 00:49:47 what RudderStack is doing today, new products, new solutions that RudderStack brings to solve all these problems. So very interesting conversation. Soumyadeb does not talk that often, or as often as he should, in my opinion, because he's really good at helping us understand these complex concepts. So I would suggest everyone to tune in and listen to the conversation. And there is also like a very interesting fact
Starting point is 00:50:26 shared about the origin of the show. So I'm not going to say more about that, but people should just listen. Sneaky. All right. Well, tune in for some insider information and a complete breakdown of customer data platform, customer data,
Starting point is 00:50:50 the whole nine yards. Subscribe if you haven't, tell a friend, and we will catch you on the next one. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com. Thank you.
