Drill to Detail - Drill to Detail Ep.94 'Rudderstack and the Warehouse-First Customer Data Platform' with Special Guest Eric Dodds

Starting point is 00:00:00 So hello and welcome to another episode of Drill to Detail and I'm your host Mark Rittman. So I'm very pleased to be joined today by Eric Dodds, Head of Demand Generation at Rudderstack. So Eric, thanks for coming on the show, and it's great to meet you. It's great to meet you too. And Mark, I actually have a, we didn't talk about this when we were prepping for the show, but I found you way back in the day from a blog post that you wrote

Starting point is 00:00:41 about more accurate cost per conversion and how you did that with a warehouse-based approach and different event stream pipelines, ETL pipelines. Anyway, so I just need to say thank you because I think that actually helped launch my interest in this whole world. So I have a lot to thank you for, actually. Wow, that's very auspicious. Yes. So thank you very much for that. Yeah. So maybe

Starting point is 00:01:07 just tell the audience who you are and what you do at Rudderstack. Absolutely. So a brief background on me: I have worked in and around marketing for most of my career with a heavy emphasis on data. I like to tell people that my two favorite subjects in university were statistics and consumer behavior. So it seems kind of inevitable that I would end up working in sort of data-driven marketing and then now eventually customer data tooling. I love SaaS. I've always loved the technical side of it and have worked at a lot of different companies really building out tech stacks to try to drive that sort of stuff. And that's actually, I was doing a lot of that work on a consulting basis

Starting point is 00:01:50 when I found Rudderstack. I was looking for a solution for one of my clients and found Rudderstack, used the tool, loved it. And then they asked me to come join the team and have done a number of things there, but I've always had my hand on the marketing side, and we've grown to be, you know, a larger organization now and I head our demand generation efforts at Rudderstack. Okay, okay, so maybe for anybody

Starting point is 00:02:16 who's not heard of Rudderstack, just kind of set out the scene really: at a very high level, what is Rudderstack? Absolutely. So Rudderstack is a set of customer data pipelines that make it really easy to move customer data anywhere in your stack. And our core foundation is behavioral event streaming. So there are a lot of tools that collect analytics and a lot of point solutions that do that sort of thing. And what we do is really provide a solution that allows you to have a defined schema for tracking behavioral data on your website or app, and then send that to your entire stack and especially your warehouse. So warehouse is a really big deal for us. We don't actually store any data. And our core use case that most

Starting point is 00:03:06 customers start with is what we call event streaming. So collecting all the behavioral data in the form of a defined spec and set of events that makes it really easy to have defined taxonomies and all that sort of stuff. And then solving integration problems in terms of getting that data across your stack to analytics tools, creating leads and marketing tools, all that sort of stuff. And then we have a couple of other additional pipelines for moving structured data into the warehouse, a la ETL. And actually, one of the most exciting things is our reverse ETL pipeline, which allows you to move data out of the warehouse. So our customers start with EventStream and then layer on the other pipelines as their stack, you know, sort of grows in complexity. Okay. Okay. So, so what's the,

Starting point is 00:03:53 I suppose, who are the founders of Rudderstack and what was the, I suppose, the origin story? What was the thing that you're trying to do or to solve that hadn't been done before? Great question. So Soumyadeb Mitra is the founder and CEO of Rudderstack. And he's brilliant. He has a doctorate in databases and worked at a number of companies and is a serial entrepreneur. So he started several companies. And before he started Rudderstack, his company was bought by 8x8, which is a big telecom company. And his startup then had focused

Starting point is 00:04:33 on data, but sort of in the performance advertising type context. And when the company was acquired, he went in and his task was to work with a group of data engineers and data scientists, uh, who were on his team to really, uh, solve some of the issues that had been plaguing them for a long time, which are actually, and I'm sure a lot of your listeners, these are very familiar terms, right? We need sort of a complete view of the customer, um, in a centralized place. We need, you know need to be able to enrich data and then syndicate that value out to other tools and really solve some of those problems. Because at a lot of companies, all those things sound great in theory, but when the rubber meets the road, it's actually

Starting point is 00:05:16 pretty hard to do this technically. And so when Soumyadeb was reviewing vendors in terms of, you know, how do I actually pull this off? You know, like it probably makes more sense to look at potentially buying something as opposed to building it. And he really looked at all the options out there. And, you know, the terms are kind of, you know, ambiguous, like CDP is a very gray term in the market. Right. You know, sort of started out with marketing and now can kind of mean infrastructure as well. But he looked at all sorts of CDP, data infrastructure vendors,

Starting point is 00:05:52 and then ended up kind of doing a combination build and buy. And ultimately it never gave him what he wanted. And I think one of the big things that was a struggle as he looked at all the options there was that in telecom, you deal with a lot of highly sensitive data and the regulations for them, especially being like a large publicly traded company. A lot of the solutions, like they stored the data, they created additional silos, which was challenging from like a technological and infrastructure standpoint, integrating with other teams. And so, and then speed was another issue, right?

Starting point is 00:06:29 There were a number of things that they really needed to do from a real time standpoint that a lot of the vendors didn't provide capability around. And so he stepped back and said, it's amazing to me that, you know, this is back in, you know, let's say 2018. How is there not a solution that solves some of these really basic problems, right? Because really, it's sort of getting the infrastructure right so that your team can start doing the work of, you know, building the customer 360 and all that. And so he started Rudderstack. And it really struck me, actually, when I found it. I was a user of the product before I joined the team.

Starting point is 00:07:10 But the product doesn't store any data. And that was a really big deal for Soumyadeb when he started the company because he didn't want to create another data silo. But that was almost a requirement from all the existing vendors. So he built RudderStack to do really robust collection and processing and delivery of data, but not actually store it. So there weren't any security concerns related to the product, you know, storing customer data, which is what we focus on, customer data. And, you know, he was like, I'm already paying my warehouse and data lake provider to store a copy of my data. Why would I pay someone else to store a copy and then also deal with the security concerns? It doesn't make any sense. And then a number of things around speed,

Starting point is 00:07:57 warehouse load times, and sort of democratizing some sort of enterprise level speed and real-time features of the pipelines that were just like, you know, really only accessible to enterprise. And so he architected Rudderstack from the ground up to do those things. And that's how we got here today. Okay. So, I mean, that's obviously a great story and it's good to hear that. But I suppose the thing, the elephant in the room here, or the thing that most people, I suppose, get to hear,

Starting point is 00:08:30 the reason they get to hear about Rudderstack is because it's, you know, it certainly was positioned or is thought of as a kind of open source replacement or kind of clone or whatever of Segment. So, you know, there was prior art in this place. Segment were doing this sort of thing you're talking about and still do. And I suppose the most characteristic thing of Rudderstack, at least initially when people encounter it, is its compatibility with Segment. So tell us about that really. What led to that as a kind of approach and, I suppose, really, you know, what's the value in that and why did you take that approach, really? That's a great question

Starting point is 00:09:03 and I'm glad you brought it up. If you hadn't, I would have, because it's certainly the elephant in the room. And, you know, I try to have an objective perspective here, which is, you know, a little bit challenging because I work for Rudderstack, but I was a Segment user for a long time. And I still tell people every day that it's a really great product. I mean, it's really enjoyable to use, and it does a great job, you know, for certain things and for certain use cases. When Soumyadeb was reviewing vendors, and I've talked with him about this a lot,

Starting point is 00:09:38 and actually I talked with him about this a lot before I even joined the company, because when I found Rudderstack, we ran it, you know, both open source and in the cloud format. And we talked a lot about, you know, he wanted a lot of feedback from me, being a heavy Segment user. And one of the big things that I faced limitations with in being a Segment user and sort of even doing implementations with Segment for my clients at that point as a, you know, as a consultant was that Segment really broke a lot of ground in terms of building tooling that made it really easy to build a data layer

Starting point is 00:10:23 that was tool agnostic in your stack, right? That was the big paradigm shift. And, you know, I know you, Mark, have been in the industry. And so we can kind of, you know, chat about this and reminisce about, you know, man, when the data layer came out that sort of disaggregated the direct integration from the tooling, you know, it was sort of a huge paradigm shift. And I think all of us who were working in the industry at that time, you know, realized that what Segment had built as a solution made so much sense and was a huge step forward. The challenge that I think Soumyadeb faced and that I even faced with a lot of the clients that I was working with on a consulting basis was that it was, from a paradigm perspective, a monumental step forward in the way that companies were sort of building their data stack. But it was built

Starting point is 00:11:20 actually a while ago, right? So closer to a decade ago now than not, right? And if you think about the rise of modern data tooling and the way that companies are trying to solve data problems, you know, the modern stack now is centered around the data warehouse and the data lake as the centerpieces. You know, privacy and data control have become, you know, even more acute issues than ever before, especially as we think about, you know, some of the recent news around Google Analytics, you know, there are a number of, you know, sort of buzzword news headlines that I think reveal that trend. But Segment's a great tool, but it really wasn't built for the modern data stack. And in fact, it was built before the modern data stack really became a staple architecture that was proven out by leading companies in the industry. And so when Soumyadeb

Starting point is 00:12:28 built Rudderstack, he really focused it specifically on the new modern architecture that emphasizes data lake data warehouse at the center, extreme flexibility in terms of where the tool can sit in your stack and how deep it can go in the stack, right? And so that goes, you know, one thing we talk about a lot is, you know, sending data to, you know, marketing and sales tools is certainly a core use case for a lot of companies. But in the modern data stack, you really want to send data to your own data infrastructure tools, even if that's internal, right? And that's a level lower in the stack. But as companies are bringing more and more of these components sort of under the control of data teams, they need more robust integrations that are very

Starting point is 00:13:20 technical in terms of the user that they're built for, as opposed to simply features that serve like a marketing or sales use case. And so I would say that's really the big differentiator at a high level. Happy to get into details, but Rudderstack is built for the modern data stack where companies are building a value system inside their company around the data team owning data flows, even if the end user or consumer of sort of those data products, if you will, are teams like marketing. And then also a stack that is centered around a data warehouse data lake architecture with a high emphasis on flexibility from a technical perspective.

Starting point is 00:14:02 And those are the areas where RudderStack really shines, right? And so you can do some things with the platform that you just can't do with Segment. And so companies that are sort of adopting this modern architecture or going through, say, digital transformation or any of the buzzwords, are using RudderStack to help make that process a lot easier. Okay, okay.

Starting point is 00:14:24 So another aspect of the product that caught my attention at the time, and still does now, is the open source nature of it. Now, I know obviously elements of Segment, you know, for example, are open source and you've got Snowplow as well, which is kind of open source core, but where does open source come into your model of how you run the business and what value does it bring for Rudderstack and for your customers? This is such a good topic. And I think Soumyadeb's probably a lot better positioned to do this because he wrote a lot of the code that was initially open sourced on GitHub. That was a fundamental value for Soumyadeb in transparency. One thing that was a real challenge for him as he

Starting point is 00:15:06 navigated this entire space, and frankly for me too, but I think him to a much higher degree because he was trying to build this out at a large telecom company, is transparency. It's less of a problem in the world of, say, like MarTech or marketing technology, where you're sort of loading your lead records into like a number of different tools to do a know, or the way that data is being handled, especially when the team that's responsible or accountable for that inside the organization, you know, it really is accountable for the way that that stuff is happening. And, you know, the way that they need to report on how data is getting from point A to point B, especially if it's sensitive, sensitive customer data. Um, you know, those, uh, are, those you use is really important.

Starting point is 00:16:28 And so Soumyadeb said from the outset, you know, our core infrastructure and the way that we do things will be open source because, you know, we're not building a tool that helps marketers as sort of a SaaS product. We're not building a tool that helps marketers serve better ad campaigns. That is an end use case in terms of maybe a data product that the data team delivers to a marketing team where Rudderstack is sort of a component in the pipeline. But really, we're talking about a data team building data products. And for that particular user, when they're making decisions about the tooling that they're going to use, transparency is a really big deal. And if they can see through the code on GitHub,

Starting point is 00:17:20 the decisions that we've made about how, you know, data is collected, processed, and then also distributed across the stack, there's just a much higher comfort level, you know, for our user, because they're not looking into a black box and wondering how we're doing something. It's very transparent. And then of course, you know, with the open source community, which is sort of a whole other topic, you know, we've had many companies build their own integrations or even look at the way that we've built an integration and, you know, make recommendations on that. In fact, when I found Rudderstack,

Starting point is 00:18:06 we opened a pull request and modified an integration that we were working on for a client because we had a lot of context for this use case. And I knew that there are a lot of companies using this particular SaaS tool that were trying to do something in a certain way. And so that just adds a very high level of credibility and also transparency. And so that's sort of a core value for us as a company. And I would say, this is sort of maybe beyond the technical choices around open source, that's really a big part of our company culture as well, which all stems from Soumyadeb. It's just a very transparent, open culture as well. So I would say it's kind of in the DNA of what we do. And it started with, you know,

Starting point is 00:18:52 sort of the first lines of code that he wrote and open sourced on GitHub. Okay, okay. But then I suppose like other open source core companies, you've now released Rudderstack Cloud. So, you know, where you host the infrastructure yourself and you sell it as a service. So maybe just talk us through that, what it is and the value proposition and why Rudderstack Cloud came about, really.

Starting point is 00:19:14 Sure. I mean, this is pretty fun, and I don't want to get too technical, but Rudderstack's a Kubernetes native software. So the open source product, I mean, it really is very cool for the nerds, but you can run Rudderstack on Kubernetes and sort of cloud agnostic, right? So it's a very,

Starting point is 00:19:40 and in that way it's a very modern piece of software. You know, it scales horizontally. And it's very neat. It has a lot of neat technical features, you know, tool at scale, you don't only need a data team, but you actually need a, you know, software engineering and DevOps team, because you're running a high-scale Kubernetes software application, right? And so there are concerns around deployment, there are concerns around scale. There are a number of things there that if you have the resources, it can make a lot of sense to do that. And, you know, there are very large companies running, you know, sort of the, the

Starting point is 00:20:54 open core Rudderstack product at high scale. Right. But they have pretty large Kubernetes teams that are already running like, you know, a lot of that software at scale. And so the cloud product really offers sort of a minimal infrastructure management solution that allows our customers to use the product and get all the benefits without having to worry about Black Friday's coming up and do we need to provision more nodes, right? Like we take care of all of that. You know, and for our enterprise customers, we have, you know, very deep relationships with them where we plan around peak times and, you know, all that sort of stuff. And so from that perspective, I think it's really kind of

Starting point is 00:21:46 a time to value type question, where the cloud product really allows a data team to scaffold out the architecture, not deal with the integrations work, and then scale to, you know, billions and billions and billions of events without having to involve their own DevOps team. And so that's really where the cloud product, and most of our customers run on the cloud product. It's a really robust, but yeah, so that's really the thing. It makes sense for some companies, but for most companies, or I'd say many companies,

Starting point is 00:22:27 they, and really, I guess if I had to summarize it, not to be too long winded, their time is better spent, um, building data products and not, you know, sort of managing like a DevOps workflow where, you know, they're running software on Kubernetes. Yeah. Okay. So, so one of the ways, one of the reasons I got to know about you again as well is our customers asking us about Rudderstack, telling us about your pricing model, really.

Starting point is 00:22:54 And I suppose the context of this really was that, like a lot of businesses in the last few years, the demand and I suppose visitors on and traffic on the websites has been going up, particularly B2C sites. And with the pricing model of other vendors, that can get quite expensive when you're pricing things on a per-user basis, monthly tracked users and so on.

Starting point is 00:23:16 But with Rudderstack's cloud service, you price things based on events. And the way it works out, it looks to be quite cheap. Now, that's not to say cheap means low value, but certainly the way you do things is, I suppose, the way... Well, tell us about it. How does the price model work and why is it quite economical for high-volume websites? Yeah, that's a great question.

Starting point is 00:23:39 No SaaS company likes the word cheap, right? That's kind of like the death knell for the SaaS company. But you're right. It's much more economic. And I think the challenge, there are a lot of companies who charge based on MTUs. And I've certainly had my rants where I really complain about that pricing model. It's not a bad pricing model. I would say that it's really just, um, it's inflexible. And I think it limits the number of business models that it can serve really well at scale. Um, MTUs works, uh, fine when you're not talking about, you know, scaling to, um, you know, sort of the, you know, uh, tens of millions, hundreds of millions, and then billions or even trillions of events. And especially with, you know, there are a lot of new industries also that, um, you know, e-commerce obviously through COVID really exploded, right? You know, I think it grew in two years, you know, more than it grew in the

Starting point is 00:24:52 past, you know, however many years before that, right? It just hit, you know, sort of the hockey stick type growth. And, you know, Web3 and sort of NFTs and crypto and blockchain, again, just, you know, unbelievable sort of, you know, volume in terms of the number of people getting involved in that. And then, of course, the challenge there is monetization rates, right? Like, you know, e-commerce generally you're only monetizing a couple percentages of your total traffic. And so the MTU model really breaks down at scale when you are trying to build a complete customer profile or a complete customer journey, even with your anonymous users and anonymous traffic on an MTU basis because you're paying for just a huge amount of data that you're not monetizing. And in a sort of a perverse way, that actually can be some of the most valuable data, right? How do we understand the user journeys as they result in the users that do end up paying us money.

Starting point is 00:26:05 And really you need all of the data in order to answer those kinds of questions. And so the way that we think about it is not charging on an MTU basis; the real value with data, especially customer data, is having all of it, right? If you have all of it, you have the ability to answer what in other sort of environments are pretty difficult questions. And so when Soumyadeb was building the product, he wanted to build for that use case because he had that challenge with the pricing models of the vendors that he was looking at. And I mentioned earlier that

Starting point is 00:26:48 we don't store any data. And so, you know, on the one hand, you could say, okay, well, like, it's a cheaper option. But that's actually, you know, that's an oversimplification. Really, our COGS are just lower, right? Because we're basically saying, we're not going to replicate the cost of what you are already paying to your data warehouse, data lake vendors. We're not going to replicate that cost and upcharge you for it as part of our business model. We don't store data. You're already paying them to store your data. And so our COGS are literally just lower. And so the way that we think about it is it's not a cheaper product. It's actually just that the total cost of ownership has a smaller footprint on your stack. And the value that we try to push on is we have some specific features that really make collecting and using and activating all of your data very valuable. Right. So it's actually, you know, it's not a cheaper conversation.

Starting point is 00:27:55 It's more about we provide a lot of value in terms of helping you collect and then use all of your customer data. So that's the way that we think about it. So my experience has been that it's often the Segment compatibility and the price that has got people's interest. But for us particularly, and we're now a Rudderstack partner as well, the thing that got my interest really beyond that was what you refer to as the warehouse-first approach to everything. And particularly the idea of using a warehouse as the basis of your customer data platform, as opposed to it being maybe an API or being something that is not so accessible or not so rich in terms of the models you can build and so on. So maybe just kind of paint the picture first about what Rudderstack and people mean by

Starting point is 00:28:42 warehouse-first and what does that mean in the context of CDPs and so on? Absolutely. I think we can start with the basics. I think, you know, when we talk about CDPs, and we'll just say, you know, for the sake of argument, CDPs kind of mean, you know, marketing tools, data infrastructure tools, etc., you know, 'cause there's probably a whole podcast episode. You may already have one that tries to untangle that issue. You know, really for the last 10 years, most of the tooling has been built understandably just because, you know, sort of pre modern data stack or modern architectures

Starting point is 00:29:19 was really built to do some interesting things that ran on the software vendor's own infrastructure, right? So they have their own warehouse data lake and their own models, and they're doing interesting things with the data. And so the first part is actually taking a lot of that, you know, black box may be, you know, a spicy term to use there, but sort of taking what those vendors were doing, let's say, from an identity resolution standpoint, and then actually just exposing that on the warehouse. And this actually goes back to what we discussed with the open source value system that is so core to, I think, Rudderstack's DNA as a company. So if we think about something like identity resolution,

Starting point is 00:30:13 Rudderstack actually takes the deterministic identity graph that's built through all the various methods of data collection, SDKs, et cetera, and actually pushes the identity graph onto the customer's data warehouse. And so actually the most powerful concept here is that you own the table that represents all the nodes and edges of, you know, a user's identity as it relates to all of the touch points that they have, you know, that you've collected via Rudderstack. And, you know, for some companies, for smaller companies, that may not be a big deal,

Starting point is 00:30:56 you know, and they just sort of use the out-of-the-box, you know, sort of identity resolution that we syndicate to, you know, say a marketing tool, you know, to sort of combine, you know, a new email with an existing user or whatever. But there are a lot of companies, especially when you're running at scale where cross device becomes a challenge. When you think about situations in retail, especially in modern retail, where there are digital devices and physical spaces that multiple people interact with, you begin to, you know, run into some pretty challenging questions around how to reconcile identities, which of course affects all of your downstream use cases from product analytics to, you know, so how are people using the product, right? Well, 30 people in this retail environment use this particular

Starting point is 00:31:43 interface in the last hour, right? Offers that are sent out that people access on different devices, et cetera. There are a variety of things that are going to scale. Financial services and retail actually are interestingly similar in that way, right? Like we interact with digital finance across a variety of different devices and use cases, right? And there may be multiple users in the account, et cetera. And so there are a lot of SaaS tools

Starting point is 00:32:10 that try to solve that with sort of algorithms that they run behind the scenes. And what we've seen is that in the modern architecture, companies increasingly want the base data set and then they want to be able to build their own sort of customizations on top of that, instead of being beholden to decisions that are being made for them, you know, sort of inside of a black box. And so Rudderstack and the warehouse-first approach says, great, like,

Starting point is 00:32:37 A, we're not going to store your data, charge you for that data, because you're already doing that. And then B, we're actually going to push the most valuable table to your warehouse in terms of identity resolution so that you have it, you own it, and then you can work internally with your data team, data scientist to modify that as it fits your own business model, because that's really where identity resolution becomes a challenge. Every business is different. Every sort of user journey is different. So that's sort of one side. That's a little bit on the sort of deeper end of what it looks like to have a warehouse first approach. I think the other side is just technology that really services

Starting point is 00:33:19 companies that use their warehouse for a lot of things that SaaS was used for before. So one example I love to give is, you know, we have a customer, a very large sort of e-commerce travel customer. And, you know, they used some SaaS tools for analytics, but they were incomplete because it was just a sort of point integration. And then they built out analytics infrastructure. But the load times were really slow because they were doing daily jobs. And so in e-commerce, sort of speed is the name of the game, A/B testing, et cetera. And with RudderStack, they actually load all of the behavioral data into their warehouse every 15 minutes. And so for them, in an e-commerce context, that's faster than they can get statistically

Starting point is 00:34:12 significant results for A/B testing. And so it is essentially real-time e-commerce analytics on their warehouse. And that's a technology and architectural decision that RudderStack enables that a lot of other tools don't that, you know, for companies that are running at scale that have near real time use cases on the warehouse are really, you know, looks when a customer is using RudderStack to accomplish, you know, sort of like maybe an identity resolution use case where that data that was previously siloed lives in their warehouse, or literally just having pipelines that service the warehouse or help you make use of it in a way that a lot of other tools don't. You mentioned Google Analytics a little while ago, and one of the new things that's come through with GA4, for example, is, I suppose, their privacy-first approach to data collection and handling customer data and, for example, um, I

Starting point is 00:35:18 don't know identity resolution so i'm wondering but that seems to be the way approach they're taking which is almost to give you less of that data than you previously had. And then maybe to use machine learning or to use black box techniques to do that. I mean, is there an argument that what Rudderstack are doing is maybe going contrary to privacy kind of changes that are happening? Or is this mixing up sort of concepts? I mean, what's your thoughts on whether it's possible perhaps to build like a complete customer view and whether that's going in line with the way that sentiment is going at the moment? That's a really good question, especially around Google Analytics. And maybe what I'll do is start by talking about our values

Starting point is 00:36:03 and I think what we are enabling for our customers. So really what we're talking about is first party data, right? It's easy to, you know, there's security concerns, you know, there's, you know, we can talk about a lot of, you know, there are sort of a number of topics that are symptomatic of the core issue of first-party data. And really, first-party data as it's collected and processed and stored or managed or, you know, what have you, by a third-party vendor, right? Google Analytics. And Rudderstack's approach, as I mentioned before, is to really give you full ownership of that. We actually don't want to be part of the security conversation in your first party data. And we're a conduit to that. We are not the end destination for that. And so if you are collecting first

Starting point is 00:37:18 party data and using it on your own infrastructure, a lot of those topics that are sort of symptomatic or, you know, sort of, you know, spicy headlines about, you know, privacy and third party vendors, a lot of those really don't apply to us. Because what we're talking about is you, you know, making the most use of your own first party data on your own infrastructure, right? I mean, it's kind of funny, because in that flow, we are just the conduit, really, it's all about you. I think also, I mean, there's people, people, people, that whole topic is very interesting and emotive, isn't it? Because, you know, the logical, I suppose, the logical, the logical end of all of this is that only the big mega vendors will have access to any data

Starting point is 00:38:06 that they can then use to, you know, the likes of Google, the likes of Facebook, you know, they will have all this first party data. And retailers and, you know, anybody who's selling things on the internet, it's getting harder and harder for them to understand their own customers. So it is a bit, I think sometimes it is a kind of an argument that gets mixed up in two things here. You've got a third party, you know, I suppose third part, you've got the collection of data by third parties, you've got a collection of data by legitimate, you know, legitimate uses by retailers and so on. And, you know, I think in a way, Rudderstack is what they're doing is, you know, it's allowing those, it's allowing, I suppose, retailers to still function really in a world where actually the way the direction is going towards the big vendors, really. I mean, what do you think on that? Sure. I agree with you. I mean, of course I'm biased, but I also faced this before I joined Rudderstack, right? In that, you know, I mean, e-commerce is sort of, you know, I like it because

Starting point is 00:38:59 it's, you know, it's like the sharp end of a lot of this stuff, right? You're talking about collecting as much data as possible to try and increase conversion rates. And, you know, it's, um, the issues tend to be more acute, the volume higher, et cetera, than maybe say, you know, your traditional like B2B context. And so it just amplifies these issues a lot earlier. And, um, you're right though, it is. And really, what you're talking about is almost, you know, it comes to data collection and sort of usage. You're talking about removing a middleman that is dealing with their own set of privacy concerns relative to regulation, right? And so if you remove that middleman, and I'm not,

Starting point is 00:39:47 I don't say that in a way of like, there are tons of really good SaaS applications out there, right? I mean, I look at Google Analytics every day. We actually just feed Google Analytics server side with Rudderstack data, right? It's a great interface, but they don't collect any of our data with their own SDK, right? We choose what we send to them and we have full control there. And that's really what we're seeing a lot of our customers adopt is before it was kind of a choice, and this is oversimplified, but for the sake of argument, I either need to implement this tool where I may have concerns about their storage and security or the decisions they're making around what a session means or any number of issues where it's like, well, I'd rather have the data or actually not even have the data. I'd rather be

Starting point is 00:40:42 able to look at the data than not look at the data, because that's better than not looking at the data, right? But I'm beholden to a lot of security concerns. I mean, you know, of course, with Google Analytics Classic, you know, the unspoken challenge or sort of rarely spoken about challenge was sampling, right? You're looking at sampled data, especially at scale, very big problem, right? A 10% variance from sampling could cause really big issues, you know, with decisions that you're making and how you're spending money. You know, some of those problems, I think, are being solved by more modern analytics tools. But really, what we're seeing now is that companies are saying, actually, the technology exists now where I don't have to choose either/or. I get both/and, right? I collect my first-party data, I own it on my own infrastructure, and I can choose

Starting point is 00:41:34 where to send it, which data points to send to which analytics tool, and then I have the single source of truth, the original copy of all of that, in my data warehouse or my data lake. And so we're seeing a lot of customers do interesting things. They'll run lots of analytics on their warehouse and they'll say, you know what? There's an analytics tool that would be really good for helping us create self-serve analytics or help this team answer these questions. Great. We will syndicate that set of data for those things to that team and the tool of their choice, right? But it's not a, we just send everything or we don't have access to it, right? You have full control now, which is exciting. I think Google actually, interestingly has made, you know, ironically, I'm not, I'm not

Starting point is 00:42:23 going to say that, you know, anyone copied anyone or anything like that, but GA4, the event-based paradigm. And I think importantly, which I'm not sure if a ton of people have, you know, have looked at this, but you can actually get events from GA4 directly into BigQuery. And so it's really interesting that Google is actually kind of conforming parts of their stack into an architecture that looks very similar to the modern data stack. I think what we see is customers look at that

Starting point is 00:42:59 and sort of do migrations to Google Analytics and look at that whole ecosystem is that you're, even though I think in many ways, it's just worlds beyond where GA Classic was. And like, there are so many things, you know, sort of as a longtime user of GA, it's like, man, you know, it took 10 years, but we're finally getting there. You're really making a decision about your stack. And so we, as I said before, like, our values around openness, transparency, ultimate flexibility on integration, uh, we view the Google ecosystem and some of the changes they're making there as really, really great, um, but you're also making a choice, if you build your entire stack on that, to sort of limit flexibility to what Google allows you to do. Yeah. Okay. So maybe to get onto the last topic I want to talk to you about,

Starting point is 00:43:50 it does link back to the warehouse first approach. So you mentioned earlier on about the way that Rudderstack is built is kind of, I suppose it's in line with how things work now and the modern data stack and the modularity and the developer focus and so on. And again, one of the other things that interested me about Rudderstack is, I suppose, the developer focus there, the API-based approach, the whatever, and the fact that maybe the way you build a CDP is something that appeals to maybe analytics engineers and so on. I mean, how much of a focus do you have on developers and that kind of audience with the product? And why was that? And give us some examples around the way you're building the product. Yeah, absolutely. Absolutely. We have a huge focus on developers. In fact, we build the product

Starting point is 00:44:39 for developers. That's our core user. And developer is a broad term, right? If you are a small startup company, your developers are your head of engineering, head of product, and head of data. And then at a larger company, you have engineers that have the specification in their title of data engineer. And so it can be a broad term. But we build for the technical persona, let's say, and we tend to say developer is the catch-all for that. And there are a couple things. So one of the ways that this shows up most, and I'll talk about it in terms of a feature, I think, because that's, we hear a lot about this from our users, especially who migrate from tools like Segment or our other

Starting point is 00:45:26 tools. We have a feature called Transformations. And what Transformations allows you to do is take an incoming event payload. So let's say you run a track call that is added to cart in an e-commerce example that represents a user behavior. Or let's say, you know, in the B2B SaaS example, you, you know, have a user sign up and create a new account. So you run an identify call that sort of declares that user, you know, and is going to create a user, you know, row in the user table on your warehouse and then, you know, create a new lead in your marketing tool and sales tool, et cetera.
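As a rough illustration of the two calls described here, the sketch below builds a track payload for an added-to-cart behavior and an identify payload for a new sign-up. It is written in Python for readability and is not Rudderstack's SDK: the function names, field names (`userId`, `properties`, `traits`) and sample values are assumptions chosen for the example.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def track(user_id: str, event: str, properties: Dict[str, Any]) -> Dict[str, Any]:
    """Build a hypothetical 'track' payload describing something a user did."""
    return {
        "type": "track",
        "userId": user_id,
        "event": event,
        "properties": properties,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def identify(user_id: str, traits: Dict[str, Any]) -> Dict[str, Any]:
    """Build a hypothetical 'identify' payload declaring who a user is."""
    return {
        "type": "identify",
        "userId": user_id,
        "traits": traits,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


# E-commerce example: a user adds a product to the cart (made-up values).
cart_event = track("user-123", "Product Added to Cart", {"sku": "SKU-1", "price": 19.99})

# B2B SaaS example: a user signs up and creates an account (made-up values).
signup_event = identify("user-123", {"email": "jane@example.com", "plan": "free"})
```

In this sketch the identify payload is what would eventually become a row in a users table and a lead in downstream tools, while the track payload records the behavior itself.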

Starting point is 00:46:05 Right. In both of those use cases, you can, in Rudderstack, run what's called a user transformation on that event payload. And the way that we built that feature is actually as a code editor within the product. You can actually also run these on your own private GitHub repo with version control, which is really cool. But that's another conversation for another time. But there's actually a code editor in the product. And I'll give you an example of this. And I'll give you a very simple use case,

Starting point is 00:46:48 and then I'll give you a more advanced use case. The reason that we chose to do a code editor is because developers need and have asked us for a very high level of flexibility and control when it comes to the way that they manage their constantly evolving stack and set of integrations and tools that their data pipelines connect to, right? The stack is getting more complex. It's not getting simpler, right? There are more tools, like in some, it's like, okay, we collected the data in the warehouse, but in terms of the ecosystem of tools and pipelines, it's actually becoming more complex, which is a challenge for data engineers to manage. So on the simple side, let's talk about how you would transform a payload. So you have a marketing tool, let's say, and a sales tool and a customer success tool. And inevitably, there's going to be some point at which a field is created in those three tools. And it just so happens that the field name that tool operates, right? It's not necessarily like, let's say an ops team has a very clean process.

Starting point is 00:48:09 Great, okay. All of these are named the same way in terms of the UI. But let's say the API name might be forced to be different across those tools, okay? So now I have a product added to cart event, or like a user created event that's coming through in a pipeline. And how do I handle even just those three tools? Right. And I don't know, the modern company has, what, like 100, 200 tools, depending on the size of the company. So as a data engineer, I'm responsible for getting the data to these tools and having it be accurate and timely.
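To make the field-naming problem concrete, here is a minimal Python sketch of how a transformation of the kind mentioned earlier might rename fields on a per-destination basis. The transcript describes these transformations as JavaScript written in a code editor; Python is used here only for illustration. The destination names, the API field names in `FIELD_MAPS` and the `transform_for` helper are all hypothetical.

```python
from typing import Any, Dict

# Hypothetical API field names each tool expects for the same logical fields.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "marketing_tool": {"plan": "plan_name", "company": "company_name"},
    "sales_tool": {"plan": "Plan__c", "company": "Account_Name"},
    "success_tool": {"plan": "subscriptionPlan", "company": "org"},
}


def transform_for(destination: str, event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the event with its traits renamed for one destination.

    Traits without a mapping keep their original name, and the incoming
    event is left unmodified so it can be reused for every destination.
    """
    mapping = FIELD_MAPS.get(destination, {})
    renamed = {mapping.get(key, key): value for key, value in event.get("traits", {}).items()}
    return {**event, "traits": renamed}


# A single incoming "user created" payload (made-up values).
event = {
    "type": "identify",
    "userId": "user-123",
    "traits": {"plan": "free", "company": "Acme", "email": "jane@example.com"},
}

# One payload in, one correctly named payload out per destination.
per_destination = {name: transform_for(name, event) for name in FIELD_MAPS}
```

Keeping the mapping in one table means that if a field name changes in one tool, only that tool's entry needs updating, and nothing has to change in the tools themselves.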

Starting point is 00:48:41 And so now I have this big problem where the ops people maybe did a good job of setting this up, but the tools themselves have introduced challenges or limitations that create what we would call like a data engineering problem, right? And so what transformations allows you to do is write custom JavaScript code. And we're working on also enabling this in Python,

Starting point is 00:49:01 which is going to be really neat for you to take a single payload and transform it on a per destination basis using JavaScript. It's a code editor. You're not doing a UI because there are a number of challenges with that. You're talking about API names and all this sort of stuff. Really to do that quickly for a data engineer and the developer persona, great, let's go in. We can write some quick JavaScript, write some quick Python. The problem is solved in literally minutes, right? And the ops team doesn't have to do anything

Starting point is 00:49:32 downstream. And guess what? Like, oh, whoops, someone accidentally made a change or an update to one of those field names. Okay. Not a problem. Let's just go in and update the transformation. That's on the simple side. On the advanced side, let's say you want to enrich some sort of data in flight, right? So one example would be, I want to hit a service like Clearbit to grab additional information... you know, you're dealing with, okay, well, then you have all those data fields in Salesforce, but they're not in any of the other tools. So do you do point to point integrations? I mean, it just becomes a gigantic mess, right? With Rudderstack transformations, if that use case comes up, the data engineering team can actually say, well, we have like a code editor that can hit external APIs. So the user signed up event comes in, we can hit the API using JavaScript,

Starting point is 00:50:26 pull in the relevant fields, and then syndicate that not only to Salesforce, but to any other tool that we want, right? And so now you've actually solved a pretty pervasive data engineering and data, you know, sort of consistency problem across the stack, not by daisy chaining a bunch of these direct integrations, but actually by allowing a developer or data engineer to write a little bit of code that sort of simplifies that integration challenge at the root level across the stack at a single point. And then they can iterate on that as the stack grows in complexity. So I know it's a long explanation. That's just one example of how we try to build features that allow the data engineers to actually make life easier for everyone in the data ecosystem,

Starting point is 00:51:16 even the end users. Okay. I suppose even at a kind of meta level, if you think about the fact that Rudderstack is an open source product and it's been developed kind of more recently, is it even possible to include maybe, if you think about infrastructure as code and as you're kind of doing testing, as you're doing deployment, you kind of install and lay out all the components of your infrastructure. It's possible presumably to have Rudderstack as part of that and deploy Rudderstack as part of the test pipeline, for example. Is that possible? Yes. So there are certain components of that that are possible. In fact, we're doing some

Starting point is 00:51:50 interesting things with Terraform. So you'll see a blog post about this coming out soon that'll actually allow you to define your whole configuration of Rudderstack as code in Terraform, which is really, really cool. And then of course, with those sorts of features, you can really sort of manage your stack as code, which I think a lot of our customers are moving towards. And so we do have an API first approach when we're building out features. A lot of our customers still use the UI, but increasingly we're seeing customers adopt some of those API first features and actually sort of integrate the management of their stack and Rudderstack into their existing sort of, say, CI/CD workflow. So I think that's where things are going. And our customers that are trying out those features

Starting point is 00:52:47 really, really love them. Okay, fantastic. Well, Eric, it's been fantastic having you on the show. How would people find out more about Rudderstack? How would they get a trial? How would they kick the tires and give the product a try? Sure, just go to rudderstack.com and you can click on the free trial there in the header.

Starting point is 00:53:05 There's a lot of buttons on the site. We, you can get, you can send 5 million events for free, uh, per month. Um, so, you know, you can sort of scale to a pretty large, um, scale up to a pretty large scale, uh, on the free plan. And then, um, my email is actually eric at rudderstack.com. I love talking about this stuff. If you have any questions, I would love to hop on a call and chat because obviously I can tend to be verbose about this because I have such a fun time talking about these subjects. That's fantastic. I can

Starting point is 00:53:34 actually vouch for that. I've spoken to quite a few of you on Slack and on other channels and so on. You've all been really helpful and all really enthusiastic for the product as well. So it's always good to see innovation in this space, really. So, Eric, thank you very much for coming on the show. It's been fantastic speaking to you.

Starting point is 00:53:51 Thank you very much, and hopefully we'll speak again sometime in the future. Of course. Thank you.