Drill to Detail - Drill to Detail Ep.94 'Rudderstack and the Warehouse-First Customer Data Platform' with Special Guest Eric Dodds
Episode Date: March 22, 2022
Mark Rittman is joined by Eric Dodds to talk about Rudderstack's founding story and open-source roots, Segment compatibility and event-based pricing vs. MTUs, and Rudderstack's "Warehouse-First" approach to building customer data platforms.
Rudderstack's Data Stack: Deep Dive
Introducing RudderStack Cloud: The Warehouse-First CDP for Developers
Rudderstack, Snowplow and Open-Source CDP Alternatives to Segment
How Twilio is Shaping Segment
Event-Based Analytics (and BigQuery Export) comes to Google Analytics 4; How Does It Work… and What’s the Catch?
Why (and How) Customer Data Warehouses are the New Customer Data Platform
Why Take a Warehouse-First Approach to Analytics
Transcript
So hello and welcome to another episode of Drill to Detail, and I'm your host, Mark Rittman.
So I'm very pleased to be joined today by Eric Dodds,
Head of Demand Generation at Rudderstack.
So Eric, thanks for coming on the show, and it's great to meet you.
It's great to meet you too.
And Mark, I actually have a, we didn't talk about this
when we were prepping for the show,
but I found you way back in the day from a blog post that you wrote
about more accurate cost per conversion and how you did
that with a warehouse-based approach and different event stream pipelines, ETL pipelines.
Anyway, so I just need to say thank you because I think that actually helped launch my interest
in this whole world.
So I have a lot to thank you for, actually.
Wow, that's very auspicious.
Yes.
So thank you very much for that. Yeah. So maybe just tell the audience who you are and what you do at Rudderstack.
Absolutely. So a brief background on me: I have worked in and around marketing for most of my career, with a heavy emphasis on data.
I like to tell people that my two favorite subjects in university were statistics and consumer behavior.
So it seems kind of inevitable that I would end up working in sort of data-driven marketing and then now eventually customer data tooling.
I love SaaS.
I've always loved the technical side of it and have worked at a lot of different companies really building out tech stacks to try to drive that sort of stuff.
And that's actually,
I was doing a lot of that work on a consulting basis
when I found Rudderstack.
I was looking for a solution for one of my clients
and found Rudderstack, used the tool, loved it.
And then they asked me to come join the team, and I have done a number of things there,
but I've always had my hand on the marketing side. We've grown to be, you know, a larger organization now,
and I head our demand generation efforts at Rudderstack.
Okay, okay. So maybe for anybody who's not heard of Rudderstack, just kind of set the scene really: at a very high level, what is Rudderstack?
Absolutely. So Rudderstack is a set of customer data pipelines that make it really easy to move customer data anywhere in your stack.
And our core foundation is behavioral event streaming. So there are a lot of tools that
collect analytics and a lot of point solutions that do that sort of thing.
And what we do is really provide a solution that allows you to have a defined schema for tracking
behavioral data on your website or app, and then send that to your entire stack and especially
your warehouse. So warehouse is a really big deal for us. We don't actually store any data. And our core use case that most
customers start with is what we call event streaming. So collecting all the behavioral data
in the form of a defined spec and set of events that makes it really easy to have
defined taxonomies and all that sort of stuff. And then solving integration problems in terms
of getting that data across your stack to analytics tools, creating leads in marketing tools, all that sort of stuff.
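The event streaming idea described here (one defined spec, validated once, then fanned out to the warehouse and other tools without the pipeline keeping a copy) can be sketched roughly as follows. This is a minimal illustration, not Rudderstack's actual SDK or API: the `TRACKING_PLAN` contents, the `TrackEvent` fields, and the `route` function are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical tracking plan: event name -> required property names.
TRACKING_PLAN: Dict[str, Set[str]] = {
    "Product Viewed": {"product_id"},
    "Order Completed": {"order_id", "revenue"},
}


@dataclass
class TrackEvent:
    """A single behavioral event (field names are assumptions)."""
    anonymous_id: str
    event: str
    properties: dict = field(default_factory=dict)


def validate(e: TrackEvent) -> None:
    """Reject events that are not in the plan or lack required properties."""
    if e.event not in TRACKING_PLAN:
        raise ValueError(f"unknown event: {e.event}")
    missing = TRACKING_PLAN[e.event] - set(e.properties)
    if missing:
        raise ValueError(f"{e.event} is missing {sorted(missing)}")


def route(e: TrackEvent, destinations: List[Callable[[TrackEvent], None]]) -> int:
    """Validate once, then forward the same event to every destination.

    The router holds no copy of the event after forwarding it.
    """
    validate(e)
    for send in destinations:
        send(e)
    return len(destinations)


# Two stand-in destinations: a warehouse table and an analytics tool.
warehouse_rows: list = []
analytics_calls: list = []
destinations = [
    lambda e: warehouse_rows.append((e.anonymous_id, e.event, e.properties)),
    lambda e: analytics_calls.append(e.event),
]

delivered = route(
    TrackEvent("anon-1", "Product Viewed", {"product_id": "sku-42"}),
    destinations,
)
print(delivered, warehouse_rows, analytics_calls)
```

The point of the design is that the schema check happens in one place, so every downstream tool sees the same event names and properties.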
And then we have a couple of other additional pipelines for moving structured data into the
warehouse, a la ETL. And actually, one of the most exciting things is our reverse ETL
pipeline, which allows you to move data out of the warehouse. So our customers start with Event Stream and then layer on the other pipelines
as their stack, you know, sort of grows in complexity.
Okay, okay. So I suppose, who are the founders of Rudderstack, and what was the origin story?
What was the thing that you were trying to do or to solve that hadn't been done before?
Great question. So Soumyadeb Mitra is the founder and CEO of Rudderstack.
And he's brilliant.
He has a doctorate in databases and worked at a number of companies
and is a serial entrepreneur.
So he started several companies.
And before he started Rudderstack, his company was bought by 8x8, which is a big telecom company. And his startup then had focused
on data, but sort of in the performance advertising type context. And when the company was acquired,
he went in and his task was to work with a group
of data engineers and data scientists, uh, who were on his team to really, uh, solve some of
the issues that had been plaguing them for a long time, which are actually, and I'm sure a lot of
your listeners, these are very familiar terms, right? We need sort of a complete view of the
customer, um, in a centralized place. We need, you know need to be able to enrich data and then syndicate
that value out to other tools and really solve some of those problems. Because at a lot of
companies, all those things sound great in theory, but when the rubber meets the road, it's actually
pretty hard to do this technically. And so when Soumyadeb was reviewing vendors in terms of, you know, how do I actually pull this off?
You know, like it probably makes more sense to look at potentially buying something as opposed to building it.
And he really looked at all the options out there.
And, you know, the terms are kind of, you know, ambiguous, like CDP is a very gray term in the market.
Right. You know, sort of started out with marketing
and now can kind of mean infrastructure as well.
But he looked at all sorts of CDP,
data infrastructure vendors,
and then ended up kind of doing a combination build and buy.
And ultimately it never gave him what he wanted.
And I think one of the big things that was a struggle
as he looked at all the options there was that in
telecom, you deal with a lot of highly sensitive data and the regulations for them, especially
being like a large publicly traded company. A lot of the solutions, like they stored the data,
they created additional silos, which was challenging from like a technological and
infrastructure standpoint, integrating with other teams. And so, and then speed was another issue, right?
There were a number of things that they really needed to do from a real time standpoint that
a lot of the vendors didn't provide capability around. And so he stepped back and said,
it's amazing to me that, you know, this is back in, you know, let's say 2018.
How is there not a solution that solves some of these really basic problems, right?
Because really, it's sort of getting the infrastructure right so that your team can start doing the work of, you know, building the customer 360 and all that.
And so he started Rudderstack.
And it really struck me, actually, when I found it.
I was a user of the product before I joined the team.
But the product doesn't store any data.
And that was a really big deal for Soumyadeb when he started the company because he didn't
want to create another data silo.
But that was almost a requirement from all the existing vendors. So he built RudderStack to do really robust collection and processing and delivery of data, but not actually store it. So
there weren't any security concerns related to the product, you know, storing customer data,
which is what we focus on. And, you know, he was like, I'm already paying my warehouse and data lake provider to
store a copy of my data. Why would I pay someone else to store a copy and then also deal with the
security concerns? It doesn't make any sense. And then a number of things around speed,
warehouse load times, and sort of democratizing some of the enterprise-level speed and real-time
features of the pipelines that were just, you know,
really only accessible to enterprise. And so he
architected Rudderstack from the ground up to do those things. And that's
how we got here today.
Okay. So, so, I mean, that's, that's obviously a great story and it's good to
hear that. But I suppose the, the thing, the elephant in the room here,
or the thing that most people, I suppose, get to hear,
the reason they get to hear about Rudderstack is because it's, you know,
it certainly was positioned or is thought of as a kind of open source
replacement or kind of clone or whatever of Segment.
So, you know, there was prior art in this place.
Segment were doing this sort of thing you're talking about, and still do. And I suppose the most characteristic thing of
Rudderstack, at least initially when people encounter it, is its compatibility with
Segment. So tell us about that, really. What led to that as a kind of approach, and I suppose really,
you know, what's the value in that, and why did you take that approach?
That's a great question,
and I'm glad you brought it up. If you hadn't, I would have, because it's certainly the elephant in the room. And,
you know, I try to have an objective perspective here, which is, you know, a little bit challenging
because I work for Rudderstack, but I was a Segment user for a long time. And I still tell
people every day that it's a really great product. I mean, it's really enjoyable to use,
and it does a great job, you know, for certain things
and for certain use cases.
When Soumyadeb was reviewing vendors,
and I've talked with him about this a lot,
and actually I talked with him about this a lot
before I even joined the company,
because when I found Rudderstack,
we ran it, you know, both open source and in the cloud format. And we talked a lot about, you know,
he wanted a lot of feedback from me, being a heavy Segment user. And one of the big
things that I faced limitations with in being a Segment user, and sort of even doing implementations
with Segment for my clients at that point as a, you know, as a consultant, was that Segment really
broke a lot of ground in terms of building tooling that made it really easy to build a data layer
that was tool agnostic in your stack, right?
That was the big paradigm shift. And, you know, I think for, and, you know, I know you, Mark,
have been in the industry. And so we can kind of, you know, chat about this and reminisce about,
you know, man, when the data layer came out that sort of disaggregated the direct integration from
the tooling, you know, it was sort of a huge paradigm shift. And I think all of us who are working in the industry at that time, you know, realized that what Segment had
built as a solution made so much sense and was a huge step forward. The challenge that I think
Soumyadeb faced, and that I even faced with a lot of the clients that I was working with on a consulting basis, was that it was, from a paradigm perspective, a monumental step
forward in the way that companies were sort of building their data stack. But it was built
actually a while ago, right? So closer to a decade ago now than not, right? And if you
think about the rise of modern data tooling and the way that companies are trying to solve data
problems, you know, the modern stack now is centered around the data warehouse and the data lake as the centerpieces.
You know, privacy and data control have become, you know, even more acute issues than ever before,
especially as we think about, you know, some of the recent news around Google Analytics,
you know, there are a number of, you know, sort of buzzword news headlines that I think reveal that trend. But
Segment's a great tool, but it really wasn't built for the modern data stack. And in fact,
it was built before the modern data stack really became a staple architecture that was proven out by leading companies in the industry. And so when Soumyadeb
built Rudderstack, he really focused it specifically on the new modern architecture that
emphasizes the data lake and data warehouse at the center, and extreme flexibility in terms of where the tool can sit in your stack and how deep it can go in
the stack, right? And so that goes, you know, one thing we talk about a lot is, you know,
sending data to, you know, marketing and sales tools is certainly a core use case for a lot of
companies. But in the modern data stack, you really want to send data to your own data
infrastructure tools, even if that's internal, right? And that's a level lower in the stack.
But as companies are bringing more and more of these components
sort of under the control of data teams, they need more robust integrations that are very
technical in terms of the user that they're built for, as opposed to simply
features that serve like a marketing or sales use case. And so I would say that's really the
big differentiator at a high level. Happy to get into details, but Rudderstack is built for the
modern data stack where companies are building a value system inside their company around the data
team owning data flows,
even if the end user or consumer of sort of those data products, if you will,
are teams like marketing. And then also a stack that is centered around a data warehouse data
lake architecture with a high emphasis on flexibility from a technical perspective.
And those are the areas where RudderStack really shines, right?
And so you can do some things with the platform
that you just can't do with Segment.
And so companies that are sort of adopting
this modern architecture or going through,
say digital transformation or any of the buzzwords,
are using RudderStack to help make that process a lot easier.
Okay, okay.
So another aspect of the product that caught my attention at the time, and still does now, is the
open source nature of it. Now, I know obviously elements of Segment, you know, for example,
are open source, and you've got Snowplow as well, which is kind of open source core, but
where does open source come into your model of how you run the business, and what value does it
bring for Rudderstack and for your customers?
This is such a good topic. And I think Soumyadeb's probably a lot better
positioned to do this because he wrote a lot of the code that was initially open sourced on GitHub.
Transparency was a fundamental value for Soumyadeb. One thing that was a real challenge for him as he
navigated this entire space, and frankly for me too, but I think him to a much higher degree
because he was trying to build this out at a large telecom company, is transparency.
It's less of a problem in the world of, say, like MarTech or marketing technology, where you're sort of loading your lead records into like a number of different tools to do a know, or the way that data is being handled,
especially when the team that's responsible or accountable for that inside the organization,
you know, it really is accountable for the way that that stuff is happening.
And, you know, the way that they need to report on how
data is getting from point A to point B, especially if it's sensitive, sensitive customer data. Um,
you know, those, uh, are, those you use is really important.
And so Soumyadeb said from the outset, you know, our core infrastructure and the way that we do things will be open source, because, you know, we're not building a tool that helps marketers as sort of a SaaS product.
We're not building a tool that helps marketers serve better ad campaigns.
That is an end use case in terms of maybe a data product that the data team delivers
to a marketing team where Rudderstack is sort of a component in the pipeline.
But really, we're talking about a data team building data products.
And for that particular user, when they're making decisions about the tooling that they're going to use,
transparency is a really big deal.
And if they can see through the code on GitHub,
the decisions that we've made about how, you know, data is collected, processed, and then
also distributed across the stack, there are, there's just a much higher comfort level,
you know, for our user, because they're not looking into a black box and wondering how we're
doing something.
It's very transparent. And then of course, you know, with the open source community,
which is sort of a whole other topic, you know, we've had many companies build their own
integrations or even look at the way that we've built an integration and, you know,
make recommendations on that. In fact, when I found Rudderstack,
we opened a pull request and modified an integration that we were working on for a client
because we had a lot of context for this use case. And I knew that there are a lot of companies using
this particular SaaS tool that were trying to do something in a certain way. And so that just adds a very high
level of credibility and also transparency. And so that's sort of a core value for us as a company.
And I would say, this is sort of maybe beyond the technical choices around open source,
that's really a big part of our company culture as well, which all stems from Soumyadeb. It's just a very transparent, open culture.
So I would say it's kind of in the DNA of what we do.
And it started with, you know,
sort of the first lines of code that he wrote
and open sourced on GitHub.
Okay, okay.
But then I suppose like other open source core companies,
you've now released RudderStack Cloud.
So, you know, where you host the infrastructure yourself and you sell it as a service.
So maybe just talk us through that, what it is and the value proposition
and why Rudderstack Cloud came about, really.
Sure.
I mean, this is pretty fun, and I don't want to get too technical,
but Rudderstack is Kubernetes-native software.
So the open source product,
I mean, it really is very cool for the nerds,
but you can run Rudderstack on Kubernetes
and it's sort of cloud agnostic, right?
So in that way it's a very modern piece of software.
You know, it scales horizontally, and it's very neat.
It has a lot of neat technical features. But, you know, to run the tool at scale,
you don't only need a data team, but you actually need a, you know, software engineering and DevOps team, because you're running a high-scale Kubernetes software
application, right? And so there are concerns around deployment, there are concerns around
scale. There are a number of things there that if you have the resources, it makes a lot of, it can make a lot of sense to do
that. And, you know, there are very large companies running, you know, sort of the
open core Rudderstack product at high scale. Right. But they have pretty large Kubernetes
teams that are already running like, you know, a lot of that software at scale.
And so the cloud product really offers sort of a minimal infrastructure management
solution that allows our customers to use the product and get all the benefits without having
to worry about Black Friday's coming up and do we need to provision more nodes, right? Like we take
care of all of that. You know, and for our enterprise customers, we have, you know,
very deep relationships with them where we plan around peak times and, you know, all that sort of stuff.
And so from that perspective, I think it's really kind of
a time to value type question, where the cloud product really allows a data team to
scaffold out the architecture, not deal with the integrations work, and then scale to, you know,
billions and billions and billions of events without having to involve their own DevOps team.
And so that's really where the cloud product,
and most of our customers run on the cloud product.
It's a really robust, but yeah, so that's really the thing.
It makes sense for some companies,
but for most companies, or I'd say many companies,
they, and really, I guess if I had to summarize it, not to be too long winded, their time is better spent,
um, building data products and not, you know, sort of managing like a DevOps workflow where,
you know, they're running software on Kubernetes.
Yeah.
Okay.
So, so one of the ways, one of the reasons I got to know about you again as well
is our customers asking us about Rudderstack,
telling us about your pricing model, really.
And I suppose the context of this really was that,
like a lot of businesses in the last few years,
the demand and I suppose visitors on and traffic on the websites
has been going up,
particularly B2C sites.
And with the pricing model of other vendors,
that can get quite expensive when you're pricing things
on a per-user basis, monthly tracked users and so on.
But with Rudderstack's cloud service, you price things based on events.
And the way it works out, it looks to be quite cheap.
Now, that's not to say cheap means low value,
but certainly the way you do things is, I suppose, the way...
Well, tell us about it.
How does the price model work
and why is it quite economical for high-volume websites?
Yeah, that's a great question.
No SaaS company likes the word cheap, right?
That's kind of like the death knell for the SaaS company. But you're right. It's much more economic. And I think the challenge,
there are a lot of companies who charge based on MTUs. And I've certainly had my rants where I
really complain about that pricing model. It's not
a bad pricing model. I would say that it's really just, um, it's inflexible. And I think it limits
the number of business models that it can serve really well at scale. MTUs work fine when you're not talking about, you know, scaling to tens of millions, hundreds of millions, and then billions or even trillions of events.
And especially with, you know, there are a lot of new industries also that, um, you know, e-commerce obviously through COVID
really exploded, right? You know, I think it grew in two years, you know, more than it grew in the
past, you know, however many years before that, right? It just hit, you know, sort of the hockey
stick type growth. And, you know, Web3 and sort of NFTs and crypto and blockchain, again, just, you know, unbelievable sort of, you know, volume in terms of the number of people getting involved in that.
And then, of course, the challenge there is monetization rates, right?
Like, you know, in e-commerce generally you're only monetizing a couple of percent of your total traffic.
And so the MTU model really breaks down at scale when you are trying to build a complete customer profile or a complete customer journey, even with your anonymous users and anonymous traffic on an MTU basis because you're paying for just a huge amount of data that
you're not monetizing. And in a sort of a perverse way, that actually can be some of the most
valuable data, right? How do we understand the user journeys as they result in the users that
do end up paying us money.
And really you need all of the data
in order to answer those kinds of questions.
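The arithmetic behind that point can be sketched in a few lines. Every price and traffic number below is made up purely for illustration; none of them are Rudderstack's or any other vendor's actual rates. The sketch only shows how per-user billing spreads cost over non-paying traffic, and how many events per user it takes before a per-event bill matches a per-user bill.

```python
def mtu_cost(tracked_users: int, price_per_mtu: float) -> float:
    """Monthly bill when every tracked user is billed, paying or not."""
    return tracked_users * price_per_mtu


def event_cost(tracked_users: int, events_per_user: float, price_per_event: float) -> float:
    """Monthly bill when each event is billed, regardless of who sent it."""
    return tracked_users * events_per_user * price_per_event


def cost_per_paying_user(bill: float, tracked_users: int, conversion_rate: float) -> float:
    """Spread the whole bill over only the users who end up paying."""
    return bill / (tracked_users * conversion_rate)


def break_even_events_per_user(price_per_mtu: float, price_per_event: float) -> float:
    """Events per user at which per-event and per-user bills are equal."""
    return price_per_mtu / price_per_event


# Hypothetical inputs (assumptions, not real pricing).
USERS = 1_000_000        # monthly tracked users, mostly anonymous
CONVERSION = 0.02        # "a couple of percent" of traffic is monetized
PRICE_PER_MTU = 0.01     # made-up per-user rate
PRICE_PER_EVENT = 0.0002 # made-up per-event rate

bill = mtu_cost(USERS, PRICE_PER_MTU)
print("MTU bill:", bill)
print("MTU cost per paying user:", cost_per_paying_user(bill, USERS, CONVERSION))
print("share of tracked users not monetized:", 1 - CONVERSION)
print("break-even events per user:", break_even_events_per_user(PRICE_PER_MTU, PRICE_PER_EVENT))
```

Whether event-based pricing actually comes out lower depends on how many events each user generates relative to the two rates, which is why the break-even figure is the useful number to look at for a given site.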
And so the way that we think about it is not,
charging on an MTU basis, but the real value with data,
especially customer data is having all of it, right? If you have all of it, you have the
ability to answer really, really what in other sort of environments are pretty difficult questions.
And so when Soumyadeb was building the product, he wanted to build for that use case because he had
that challenge with the pricing models of the vendors that he was looking at. And I mentioned earlier that
we don't store any data. And so, you know, on the one hand, you could say, okay, well, like,
it's a cheaper option. But that's actually, you know, that's an oversimplification. Really, our COGS are just lower, right? Because we're basically saying,
we're not going to replicate the cost of what you are already paying to your data warehouse,
data lake vendors. We're not going to replicate that cost and upcharge you for it as part of our
business model. We don't store data. You're already paying them to store your data. And so our COGS are literally just lower. And so the way that we think about it is
it's not a cheaper product. It's actually just the total cost of ownership has a smaller footprint
on your stack. And the value that we try to push on is we have some specific features that really make collecting and using and activating all of your data very valuable.
Right. So it's actually, you know, it's not a cheaper conversation.
It's more about we provide a lot of value in terms of helping you collect and then use all of your customer data.
So that's the way that we think about it.
So my experience has been that it's often the Segment compatibility and the price that
have got people's interest. But for us particularly, and we're now a Rudderstack partner as well,
the thing that got my interest really beyond that was what you refer to as the
warehouse-first approach to everything. And particularly the idea of using a warehouse
as the basis of your customer data platform, as opposed to it being maybe an API or being
something that is not so accessible or not so rich in terms of the models you can build and so on.
So maybe just kind of paint the picture first about what Rudderstack and people mean by
warehouse-first, and what does that mean in the context of CDPs and so on?
Absolutely. I think we can start with the basics. You know, when we
talk about CDPs, and we'll just say, you know, for the sake of argument, CDPs kind of mean, you know,
marketing tools, data infrastructure tools, etc., because there's probably a whole podcast episode.
You may already have one that tries to untangle that issue.
You know, really for the last 10 years,
most of the tooling has been built understandably just because, you know,
sort of pre modern data stack or modern architectures
was really built to do some interesting things that ran on the software vendor's own infrastructure,
right? So they have their own warehouse data lake and their own models, and they're doing
interesting things with the data. And so the first part is actually taking a lot of that, you know, black box may be, you know,
a spicy term to use there, but sort of taking what those vendors were doing, let's say,
from an identity resolution standpoint, and then actually just exposing that on the warehouse.
And this actually goes back to what we discussed with the open source value system that is
so core to, I think, Rudderstack's DNA as a company.
So if we think about something like identity resolution,
Rudderstack actually takes the deterministic identity graph
that's built through all the various methods
of data collection, SDKs, et cetera,
and actually pushes the identity graph onto the customer's
data warehouse. And so, actually, the most powerful concept here is that
you own the table that represents all the nodes and edges of, you know, a user's identity as it relates to
all of the touch points that they have, you know, that you've collected via Rudderstack.
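To make the "table of nodes and edges" idea concrete, here is a small sketch of deterministic identity resolution over such a table. It assumes a hypothetical edge layout (pairs of `(id_type, id_value)` identifiers observed together) and uses a plain union-find to group identifiers into one cluster per person; the real table layout and Rudderstack's own resolution logic are not described in this conversation, so treat both as assumptions.

```python
from collections import defaultdict

# Hypothetical edge rows as they might sit in a warehouse table: each row
# says two identifiers were observed together (for example on one call).
EDGES = [
    (("anonymous_id", "anon-1"), ("email", "a@example.com")),
    (("anonymous_id", "anon-2"), ("email", "a@example.com")),
    (("anonymous_id", "anon-3"), ("user_id", "u-9")),
]


def resolve(edges):
    """Group identifiers into clusters using union-find over the edges."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in edges:
        union(a, b)

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return list(clusters.values())


clusters = resolve(EDGES)
for cluster in clusters:
    print(sorted(cluster))
```

Because the edge table lives in the customer's own warehouse, a data team could swap this grouping rule for something stricter, for instance ignoring edges from shared in-store devices, without waiting on a vendor.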
And, you know, for some companies, for smaller companies, that may not be a big deal,
you know, and they just sort of use the out of box, you know, sort of ID identity resolution
that we syndicate to, you know, say a marketing tool, you know, to sort of combine, you know, a new email with an existing user or whatever. But there are a lot of companies,
especially when you're running at scale where cross device becomes a challenge. When you think
about situations in retail, especially in modern retail, where there are digital devices and
physical spaces that multiple people interact with, you begin to, you know,
run into some pretty challenging questions around how to reconcile identities, which of course
affects all of your downstream use cases from product analytics to, you know, so how are people
using the product, right? Well, 30 people in this retail environment use this particular
interface in the last hour, right?
Offers that are sent out that people access on different devices, et cetera.
There are a variety of things that are going to scale.
Financial services and retail actually are interestingly similar in that way, right?
Like we interact with digital finance across a variety of different devices and use cases,
right?
And there may be multiple users in the account, et cetera.
And so there are a lot of SaaS tools
that try to solve that with sort of algorithms
that they run behind the scenes.
And what we've seen is that in the modern architecture,
companies increasingly want the base data set
and then they want to be able to build
their own sort of customizations on top
of that, instead of being beholden to decisions that are being made for them, you know, sort of
inside of a black box. And so Rudderstack and the warehouse-first approach says, great, like,
A, we're not going to store your data and charge you for that, because you're already doing that.
And then B, we're actually going to push
the most valuable table to your warehouse in terms of identity resolution so that you have it,
you own it, and then you can work internally with your data team, data scientist to modify that as
it fits your own business model, because that's really where identity resolution becomes a
challenge. Every business is different. Every sort of user journey is different.
So that's sort of one side. That's a little bit on the sort of deeper end of what it looks like
to have a warehouse-first approach. I think the other side is just technology that really services
companies that use their warehouse for a lot of things that SaaS was used for before. So one
example I love to give is, you know, we have a customer, a very large sort of e-commerce travel
customer. And, you know, they used some SaaS tools for analytics, but they were incomplete because
it was just a sort of point integration. And then they built out
analytics infrastructure. But the load times were really slow because they were doing daily jobs.
And so in e-commerce, sort of speed is the name of the game, A/B testing, et cetera.
And with Rudderstack, they actually load all of the behavioral data into their warehouse every 15 minutes.
And so for them, in an e-commerce context, that's faster than they can get statistically
significant results for A/B testing. And so it is essentially real-time e-commerce analytics
on their warehouse. And that's a technology and architectural decision that Rudderstack enables that a lot of other tools don't that, you know, for companies that are running at scale that have near real time use cases on the warehouse are really, you know, looks when a customer is using Rudderstack to accomplish,
you know, sort of like maybe an identity resolution use case where that data that was
previously siloed lives in their warehouse, or literally just having pipelines that service
the warehouse or help you make use of it in a way that a lot of other tools don't.
You mentioned Google Analytics a little while ago, and one of the
new things that's come through with GA4, for example, is, I suppose, their privacy-first
approach to data collection and handling customer data and, for example, I
don't know, identity resolution. So I'm wondering, that seems to be the approach they're
taking which is almost to give you less of that data than you previously had.
And then maybe to use machine learning or to use black box techniques to do that.
I mean, is there an argument that what Rudderstack is doing is maybe going contrary to the privacy changes that are happening?
Or is this mixing up concepts? I mean, what are your thoughts on whether it's possible to build a complete customer view, and whether that's in line
with the way that sentiment is going at the moment?
That's a really good question, especially around Google Analytics.
And maybe what I'll do is start by talking about our values
and I think what we are enabling for our customers. So
really what we're talking about is first party data, right? It's easy to, you know,
there's security concerns, you know, there's, you know, we can talk about a lot of, you know,
there are sort of a number of topics that are symptomatic of the core issue of first-party data. And really,
first-party data as it's collected and processed and stored or managed or, you know, what have you,
by a third-party vendor, right? Google Analytics. And Rudderstack's approach, as I mentioned before,
is to really give you full ownership of that. We actually don't want to be part of the security conversation in your first party data. And we're a
conduit to that. We are not the end destination for that. And so if you are collecting first
party data and using it on your own infrastructure, a lot of those topics that are sort of symptomatic or, you know,
sort of, you know, spicy headlines about, you know, privacy and third party vendors,
a lot of those really don't apply to us. Because what we're talking about is you,
you know, making the most use of your own first party data on your own infrastructure, right? I
mean, it's kind of funny, because in that flow, we are just the conduit. Really, it's all about you.
I think also, I mean, that whole topic is very interesting and
emotive, isn't it? Because, you know, I suppose the logical end of all of
this is that only the big mega vendors will have access to any data
that they can then use to, you know, the likes of Google, the likes of Facebook, you know,
they will have all this first party data. And retailers and, you know, anybody who's selling
things on the internet, it's getting harder and harder for them to understand their own customers.
So I think sometimes it is a kind of argument that gets two things mixed up.
You've got the collection of data by third parties, and you've got the collection of data for legitimate uses by retailers and so on.
And, you know, I think in a way what Rudderstack is doing is allowing, I suppose, retailers to still function in a world where the direction is going towards the big vendors, really. I mean, what do you think on that?
Sure. I agree with you. I mean, of course I'm biased, but I also faced this before I joined
Rudderstack, right? In that, you know, I mean, e-commerce is sort of, you know, I like it because
it's like the sharp end of a lot of this stuff, right? You're talking about
collecting as much data as possible to try and increase conversion rates. And, you know,
the issues tend to be more acute, the volume higher, et cetera, than maybe, say,
your traditional B2B context. And so it just amplifies these issues a lot earlier.
And you're right, though. Really, what you're talking about, almost, when
it comes to data collection and usage, is removing a middleman
that is dealing with their own set of privacy concerns relative to regulation, right? And so
if you remove that middleman, and I'm not,
I don't say that in a way of like, there are tons of really good SaaS applications out there,
right? I mean, I look at Google Analytics every day. We actually just feed Google Analytics
server side with Rudderstack data, right? It's a great interface, but they don't collect any of
our data with their own SDK, right? We choose what we send to them and we have full control there.
And that's really what we're seeing a lot of our customers adopt is before it was kind of a choice,
and this is oversimplified, but for the sake of argument, I either need to implement this tool where I may have concerns about their storage and security or
the decisions they're making around what a session means or any number of issues where
it's like, well, I'd rather have the data or actually not even have the data. I'd rather be
able to look at the data than not look at the data, because that's better than not looking at the data, right? But I'm beholden to a lot of
security concerns. I mean, you know, of course, with Google Analytics Classic, you know, the
unspoken challenge or sort of rarely spoken about challenge was sampling, right? You're looking at
sampled data, especially at scale, very big problem, right? A 10% variance from sampling could cause
really big issues, you know, with decisions that you're making and how you're spending money.
You know, some of those problems, I think, are being solved by more modern analytics tools. But
really, what we're seeing now is that companies are saying, actually, the technology exists now where I don't have to choose either/or. I get both,
right? I collect my first-party data, I own it on my own infrastructure, and I can choose
where to send it, which data points to send to which analytics tool. And then I have the single
source of truth, the original copy of all of that, in my data warehouse or my data lake.
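The pattern described here (a full original copy in the warehouse, and only chosen data points sent to each downstream tool) can be sketched in a few lines. This is a minimal illustration only: the destination names, the field names, and the `syndicate` function are hypothetical and are not Rudderstack's API.

```python
from typing import Any, Dict, Set

# Hypothetical destinations and the top-level fields each one may receive.
ALLOWED_FIELDS: Dict[str, Set[str]] = {
    "web_analytics": {"event", "timestamp", "page"},
    "self_serve_bi": {"event", "timestamp", "user_id", "properties"},
}


def syndicate(event: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Return the payload each destination would receive for one event."""
    # The warehouse always keeps the full, unmodified copy of the event.
    deliveries: Dict[str, Dict[str, Any]] = {"warehouse": dict(event)}
    for destination, allowed in ALLOWED_FIELDS.items():
        # Every other destination only sees the fields it was granted.
        deliveries[destination] = {k: v for k, v in event.items() if k in allowed}
    return deliveries


example_event = {
    "event": "Product Added",
    "timestamp": "2022-03-22T10:00:00Z",
    "user_id": "u-123",
    "page": "/cart",
    "properties": {"sku": "ABC-1", "price": 19.99},
}
deliveries = syndicate(example_event)
```

The point of the sketch is the asymmetry: the allowlist is the only place where a decision about sharing is made, and the warehouse copy is never filtered.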
And so we're seeing a lot of customers do interesting things.
They'll run lots of analytics on their warehouse and they'll say, you know what?
There's an analytics tool that would be really good for helping us create self-serve analytics or help this team answer these questions. Great. We will syndicate
that set of data for those things to that team and the tool of their choice, right? But it's not a,
we just send everything or we don't have access to it, right? You have full control now, which is
exciting. I think Google actually, interestingly has made, you know, ironically, I'm not, I'm not
going to say that, you know, anyone copied anyone or
anything like that, but GA4, the event-based paradigm. And I think importantly, which I'm
not sure if a ton of people have, you know, have looked at this, but you can actually get events
from GA4 directly into BigQuery. And so it's really interesting that Google
is actually kind of conforming parts of their stack
into an architecture that looks very similar
to the modern data stack.
I think what we see, as customers look at that
and sort of do migrations to Google Analytics
and look at that whole ecosystem, is that, even though
I think in many ways it's just worlds beyond where GA Classic was, and there are so many
things where, you know, as a longtime user of GA, it's like, man, you know, it took 10 years,
but we're finally getting there, you're really making a decision about your stack. And so, as I said before,
with our values around openness, transparency, and ultimate flexibility on integration, we view the Google
ecosystem and some of the changes they're making there as really, really great. But you're also
making a choice, if you build your entire stack on that, to sort of limit flexibility to what Google allows you to do.
Yeah. Okay. So maybe to get onto the last topic I want to talk to you about,
it does link back to the warehouse-first approach. So you mentioned earlier on that the way
Rudderstack is built is, I suppose, in line with how things work now with the modern
data stack and the modularity and the developer focus and so on. And again, one of the other things that interested me about Rudderstack is,
I suppose, the developer focus there, the API-based approach, and the fact that maybe the
way you build a CDP is something that appeals to analytics engineers and so on. I
mean, how much of a focus do you have on developers and that sort of audience with the product? And why is that?
And give us some examples around the way you're building the product.
Yeah, absolutely. Absolutely. We have a huge focus on developers. In fact, we build the product
for developers. That's our core user. And developer is a broad term, right? If you
are a small startup company, you're developers, you're head of engineering, head of product,
and head of data. And then at a larger company, you have engineers that have the specification
in their title of data engineer. And so it can be a broad term. But we build for
the technical persona, let's say, and we tend to say developer is the catch-all for that. And
there are a couple things. So one of the ways that this shows up most, and I'll talk about it in
terms of a feature, I think, because that's, we hear a lot about this from our users, especially
who migrate from tools like Segment or other
tools. We have a feature called Transformations. And what Transformations allows you to do
is take an incoming event payload. So let's say you run a track call that is added to cart
in an e-commerce example that represents a user
behavior. Or let's say, you know, in the B2B SaaS example, you, you know, have a user sign up and
create a new account. So you run an identify call that sort of declares that user, you know,
and is going to create a user, you know, row in the user table on your warehouse and then,
you know, create a new lead in your marketing tool and
sales tool, et cetera.
Right.
In both of those use cases, you can, in Rudderstack, run what's called a user transformation on
that event payload.
And the way that we built that feature is actually as a code editor within the product.
You can actually also run these on your own private GitHub repo with version control, which is really cool.
But that's another conversation for another time.
But there's actually a code editor in the product.
And I'll give you an example of this. And I'll give you a very simple use case,
and then I'll give you a more advanced use case. The reason that we chose to do a code editor is
because developers need and have asked us for a very high level of flexibility and control when it comes to the way that they manage their constantly
evolving stack and set of integrations and tools that their data pipelines connect to, right?
The stack is getting more complex. It's not getting simpler, right? There are more tools,
like in some, it's like, okay, we collected the data in the warehouse, but in terms of the
ecosystem of tools and pipelines, it's actually becoming more complex, which is a challenge for data engineers to manage. So on the simple side, let's talk about
how you would transform a payload. So you have a marketing tool, let's say, and a sales tool
and a customer success tool. And inevitably, there's going to be some point at which a field is created in those three tools. And it just so happens that the field name differs because of the way each tool operates, right? It's not necessarily, like, let's say an ops team has a very clean process.
Great, okay.
All of these are named the same way in terms of the UI.
But let's say the API name might be forced to be different across those tools, okay?
So now I have a product added to cart event,
or like a user created event that's coming through in a pipeline.
And how do I handle even just those three tools?
Right. And, I don't know, the modern company has, what, like 100, 200 tools, depending on the size of the company.
So as a data engineer, I'm responsible for getting the data to these tools and having it be accurate and timely.
And so now I have this big problem where the ops people maybe did a good job
of setting this up,
but the tools themselves have introduced challenges
or limitations that create what we would call
like a data engineering problem, right?
And so what transformations allows you to do
is write custom JavaScript code.
And we're working on also enabling this in Python,
which is going to be really neat
for you to take a single payload and transform it on a per destination basis using JavaScript.
It's a code editor.
You're not doing a UI because there are a number of challenges with that.
You're talking about API names and all this sort of stuff.
Really to do that quickly for a data engineer and the developer persona, great, let's go
in. We can write some quick JavaScript, write some quick Python.
The problem is solved in literally minutes, right? And the ops team doesn't have to do anything
downstream. And guess what? Like, oh, whoops, someone accidentally made a change or an update
to one of those field names. Okay. Not a problem. Let's just go in and update the transformation.
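The simple case described here, one payload whose field names have to differ per downstream tool, can be sketched as follows. This is a hedged illustration, not Rudderstack's actual transformation interface: the tool names, the API field names in `FIELD_MAPS`, and the `transform_for_destination` function are all hypothetical, and the sketch is written in Python rather than the JavaScript the product offered at the time of recording.

```python
from typing import Any, Dict

# Hypothetical mapping: canonical trait name -> API field name in each tool.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "marketing_tool": {"plan_tier": "planTier"},
    "sales_tool": {"plan_tier": "Plan_Tier__c"},
    "success_tool": {"plan_tier": "plan-tier"},
}


def transform_for_destination(event: Dict[str, Any], destination: str) -> Dict[str, Any]:
    """Return a copy of the event with traits renamed for one destination."""
    mapping = FIELD_MAPS.get(destination, {})
    traits = event.get("traits", {})
    # Rename mapped traits and pass every other trait through unchanged.
    renamed = {mapping.get(name, name): value for name, value in traits.items()}
    return {**event, "traits": renamed}


identify_event = {
    "type": "identify",
    "userId": "u-123",
    "traits": {"email": "jane@example.com", "plan_tier": "pro"},
}
for_sales = transform_for_destination(identify_event, "sales_tool")
for_marketing = transform_for_destination(identify_event, "marketing_tool")
```

If someone later changes a field name in one of the tools, only the relevant entry in `FIELD_MAPS` needs updating; the source event and the other destinations stay untouched.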
That's on the simple side. On the advanced side, let's say you want to enrich some sort of data in flight, right? So one example would be, I want to hit a service like Clearbit to grab additional information. And, you know, you're dealing with, okay, well,
then you have all those data fields in Salesforce, but they're not in any of the other tools. So do
you do point-to-point integrations? I mean, it just becomes a gigantic mess, right?
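The in-flight enrichment use case raised here can be sketched as enriching the event once, at a single point, and then sending the same enriched payload to every tool. Everything in this sketch is an assumption for illustration: `fake_company_lookup` is a local stand-in for a call to an external enrichment service (no real Clearbit API is used), and the function and destination names are hypothetical.

```python
from typing import Any, Callable, Dict, List

Lookup = Callable[[str], Dict[str, Any]]


def fake_company_lookup(email: str) -> Dict[str, Any]:
    """Stand-in for an external enrichment API call (hypothetical data)."""
    domain = email.split("@")[-1]
    return {"company_domain": domain, "company_name": domain.split(".")[0].title()}


def enrich_event(event: Dict[str, Any], lookup: Lookup) -> Dict[str, Any]:
    """Merge enrichment fields into the event's traits, once, in flight."""
    traits = event.get("traits", {})
    email = traits.get("email")
    if not email:
        return event  # Nothing to look up, so pass the event through.
    # Traits already on the event take precedence over enrichment data.
    return {**event, "traits": {**lookup(email), **traits}}


def fan_out(event: Dict[str, Any], destinations: List[str]) -> Dict[str, Dict[str, Any]]:
    """Give every destination the same enriched payload."""
    return {destination: dict(event) for destination in destinations}


signup = {"type": "identify", "userId": "u-1", "traits": {"email": "jane@acme.com"}}
enriched = enrich_event(signup, fake_company_lookup)
deliveries = fan_out(enriched, ["crm", "marketing_tool", "warehouse"])
```

Because the lookup happens before the fan-out, no destination ends up with fields the others lack, which is the inconsistency that point-to-point integrations tend to create.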
With Rudderstack transformations, if that use case comes up, the data engineering team can actually
say, well, we have like a code editor that can hit external APIs. So the user signed up event
comes in, we can hit the API using JavaScript,
pull in the relevant fields, and then syndicate that not only to Salesforce, but to any other
tool that we want, right? And so now you've actually solved a pretty pervasive data engineering
and data, you know, sort of consistency problem across the stack, not by daisy-chaining a bunch of these direct
integrations, but actually by allowing a developer or data engineer to write a little bit of code
that sort of simplifies that integration challenge at the root level across the stack
at a single point. And then they can iterate on that as the stack grows
in complexity. So I know it's a long explanation. That's just one example of how we try to build
features that allow the data engineers to actually make life easier for everyone in the data ecosystem,
even the end users.
Okay. I suppose even at a kind of meta level, if you think about the fact
that Rudderstack is an open source product and it's been developed more recently, if you think about infrastructure
as code, where as you're doing testing and deployment you install
and lay out all the components of your infrastructure,
it's possible presumably to have Rudderstack as part of that and deploy Rudderstack as
part of the test pipeline, for example.
Is that possible?
Yes. So there are certain components of that that are possible. In fact, we're doing some
interesting things with Terraform. So you'll see a blog post about this coming out soon
that'll actually allow you to define your whole configuration of Rudderstack as code in Terraform, which is really, really cool.
And then of course, with those sorts of features, you can really sort of manage your stack as code,
which I think a lot of our customers are moving towards. And so we do have an API first
approach when we're building out features.
A lot of our customers still use the UI, but increasingly we're seeing customers adopt some of those API-first features and actually integrate the management of their stack and Rudderstack into their existing, say, CI/CD workflow.
So I think that's where things are going.
And our customers that are trying out those features
really, really love them.
Okay, fantastic.
Well, Eric, it's been fantastic having you on the show.
How would people find out more about Rudderstack?
How would they get a trial?
How would they kick the tires and give the product a try?
Sure, just go to rudderstack.com
and you can click on the free trial there in the header.
There's a lot of buttons on the site. You can send 5 million events for free
per month. So, you know, you can scale up to a pretty
large scale on the free plan. And then my email is actually eric at rudderstack.com.
I love talking about this stuff. If you have any questions, I would love to hop on a call
and chat because obviously I
can tend to be verbose about this because
I have such a fun time talking about these subjects.
That's fantastic. I can
actually vouch for that. I've spoken to quite a few of you
on Slack and on other
channels and so on. You've all been really helpful
and all really enthusiastic for the product as well.
So it's always good to see innovation
in this space, really.
So, Eric, thank you very much for coming on the show.
It's been fantastic speaking to you.
Thank you very much, and hopefully we'll speak again sometime in the future.
Of course.
Thank you.