The Data Stack Show - 189: Customer Data Modeling, The Data Warehouse, Reverse ETL, and Data Activation with Ryan McCrary of RudderStack
Episode Date: May 16, 2024

Highlights from this week’s conversation include:
Ryan's Background and Roles in Data (0:05)
Data Activation and Dashboard Staleness (1:27)
Profiles and Data Activation (2:54)
Customer-Facing Experience... and Product Management (3:40)
Profiles Product Overview (5:10)
Use Cases for Profiles (6:44)
Challenges with Data Projects (9:19)
Entity Management and Account Views (15:33)
Handling Entities and Duplicates (17:55)
Challenges in Entity Management (22:18)
Product Management and Data Solutions (26:08)
Reverse ETL and Data Movement (31:58)
Accessibility of Data Warehouses (36:14)
Profiles and Entity Features (37:47)
Cohorts Creation and Use Cases (41:17)
Customer Data and Targeting (43:09)
Activations and Reverse ETL (45:57)
ML and AI Use Cases (55:53)
Data Activation and ML Predictions (57:02)
Spicy Take and Future Product Features (59:47)
ETL Evolution and Cloud Tools (1:00:50)
Unbundling and Future Trends (1:02:10)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week, we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by RudderStack, the CDP for developers.
You can learn more at rudderstack.com.
Welcome back to the Data Stack Show.
We are here with Ryan McCrary, who is a product manager at RudderStack, so close to home.
And Ryan, you've been building a bunch of stuff that is intended to get business users
closer to data.
And I'm really excited to dig into that whole problem because
I think it's been a topic of late in the last year or two in the data space, or at least has
reached fever pitch with venture-backed companies. So I want to talk about that, but briefly,
give us a background. Yeah. So my original background is as a software engineer. I've
been at RudderStack for a few years now in a number of different roles. So kind of approaching this from different phases
of the customer journey. So started off in customer success engineering, working with our existing
customers. Moved from there into solutions engineering. So
building higher level solutions for prospects and customers. And now I'm on the
product team helping build out the actual solution that we needed this whole time.
Awesome. Awesome.
Yeah.
So Ryan, one of the topics I wanted to dig into was data activation.
I think I've had two or three conversations over just the last week around people complaining
a little bit maybe about dashboards and like, oh, you make a dashboard, it gets stale,
and then really desiring like, hey, what if I could have the data in the tools I already
use?
And I think that's one of the big things on data activation.
So happy to talk about that.
And then I'm excited to hear what you want to talk about.
Yeah.
So from a data activation perspective, I mean, that's kind of the impetus for what we're
trying to accomplish with profiles and specifically with like that last bit of the activation
piece.
And so we've talked about this, you know, a lot internally, but to your point, like, yes, everyone's used to like the world of
BI and, you know, here's a view into our data, but it's largely worthless if it doesn't,
A, align with what they're seeing in the downstream tools, but also just being in the
downstream tools is kind of the prerequisite to that. So, you know, a dashboard's great,
but unless you can act upon it in the tool where you live, whether you're marketing, product, you know, advertising, anything like
that, it's really largely useless. So really understanding not only how we model the data and
visualize it, but then what do we actually do with it is kind of the key there. So that's kind of
what we're, I guess, discussing today. Yeah, I'm excited. All right, well, let's dig in.
Okay, Ryan, with so much to talk about here,
you mentioned Profiles, which is a RudderStack product. We did an episode on this a while ago,
actually. So I want to get an overview of that. But first, I actually want to dig in a little bit
to your experience going through multiple customer facing roles before becoming a product manager
at a data company. And I think this is really interesting. So you mentioned that you started
at RudderStack on the customer success side, then you moved into a solutions architect role,
and then eventually moved into product. So my main question is, I mean, I'm assuming the
answer is yes,
that, you know, being customer facing has helped you as a product manager.
So if that's not true, tell us. But if it is true, what are the specific ways
in which that's influenced your role as a product manager, the way you think about building product?
Yeah. So, I mean, obviously, in both of those previous roles, working very closely with
customers, which, you know, I think is the way to understand what we're actually trying to solve for and what we're trying to build. And I mean, what you see pretty quickly is that everyone believes that what they're doing is very unique. But at the end of the day, that really aligns around a handful of use cases. And, you know, solving it over the years, we've seen a lot of tools kind of come into
vogue and, you know, dbt probably being the primary one.
And I'll, you know, go ahead and caveat that most of our customers that use Profiles also
use dbt.
So we're not thinking of the tool as a replacement for that, really more as an enhancement.
And so, you know, the strength of dbt, which is also kind of the pitfall for this particular application, is that you can do anything with it. There is no opinionation. I mean, it's largely just a better SQL interface, you know, though that's an oversimplification. And so Profiles kind of introduces a light layer of opinionation, specifically around customer data, right? So around the entities, as we call them, that
you're actually interacting with from a business perspective.
And so, you know, our approach to that kind of data modeling is really around, as I refer to those entities, largely for most of our users, that is a customer or a user or, you know, a person essentially. And so basically what we do is we do two primary things. Identity resolution, you know, so what are all of the various...
This is the Profiles product.
Yes.
So for those listeners who didn't listen to the previous episode, which, I love that we're just digging in here. Okay. So yeah, this is the Profiles product. So give us a breakdown. So RudderStack Profiles is the product. What does it do?
Yeah. Sorry. A lot of excitement there just to jump in,
profiles is the product. What does it do? Yeah. Sorry. A lot of excitement there just to jump in,
but yeah, the two kind of primary building blocks are the identity graph.
So, you know, we have this customer journey across online, offline, different, you know,
data sets.
How are we reconciling that to a single user?
And then that's really the foundation for, okay, what do we want to know about those
users and making sure that we're doing that on the solid foundation of an identity graph
that we believe to be true and trust.
And so that's really it from a level of opinionation, right? Like we think about less of a just fully
unstructured or relational data set, but really more of how do we coalesce around that individual
entity, that individual user? And I've mentioned entities a couple of times now, we can also have
accounts or businesses or households or anything like that can be related to each other as entities,
but they don't have to be. And so that level of opinionation is kind of what we built this on, and,
you know, the reason for that is to find out, you know, that single view of the customer,
whatever that entity is. Now we've got to get that, like I mentioned before, into an actionable
place, right? And so like, what does that look like in a marketing automation or a CRM or something
like that? That's really the reason for that opinionation.
And that's really kind of where the opinionation kind of ends.
Okay.
So, right.
And obviously you and I both work for RudderStack.
So, and John, you are an unbiased, you know, consultant.
So I want to ask you a question here.
I think there are actually maybe a number of questions, John, that you may have about
profiles, and then we'll get into like the activation piece and getting business users closer to the data.
But like, John, why would you use this, right?
I mean, you were a heavy dbt user in multiple different roles previously.
And I guess like another way to ask this question would be like, I've never been in a business where anyone asks for an identity graph. So to hear Ryan say like,
okay, you have tools that help make writing SQL better,
which is awesome, right?
I mean, those are, we all use them, right?
And so to add in an opinionated layer
around identity resolution is really interesting
because that's not really,
if you just go out and talk to a bunch of people,
no, there aren't a bunch of people saying like, we need an identity graph. And so John, I mean, I'm sure you have answers
for that, Ryan, but like, John, why is that? I mean, why is that a need? You know?
Yeah. It's funny. I remember first talking about profiles as a product and then some other,
you know, similar products. And I had the same question of like, why would you not just do this in dbt? Right? Yeah. Or SQL. Yeah, or whatever. Yeah. And I think the funny part is there are a select number of people, and I think you've even talked to these people, that fully did this all themselves, custom rolled. Maybe they just wrote custom software to do it, SQL, whatever. But I think that's a very small number of people.
So then the next layer down is like, okay, to your point, I'm probably not trying to solve for, like, an identity graph, but I want the result of like, hey, I want all my customer data in one place, and I want to start adding features like churn prediction, I want to add a lead score, you know, I want to add those things and have them all in the same place. So once you get to that, and once you get sold, and I think that's the other big concept, if you're kind of sold on the idea that we want to have first-party data and, like, have data in a warehouse versus, like, keep it all in a marketing tool or all in an ERP or something, I think that's the other key. But if you're there, then you back into like, oh, we need to solve identity resolution, we need to solve some of these other problems. And then when you get there, I think it is easy to get trapped in like, oh, we'll just write some SQL, it won't be that bad, right? And then you start down the path and you realize like, oh, this is harder than I thought. Yeah. This is way messier than I thought. And then like any data project, even if you did it all yourself, you have to maintain it. Yeah. And that is the kicker with any data project, where even if you did an excellent job the first time,
you don't remember what your past self did
because it's so complicated.
And a team member that maybe has to come in
and maintain it when you've moved on,
maybe you're kind of doing special projects or something.
It's just so hard.
Yeah, yeah.
I mean, I think that maybe is the story.
So, you know, I think our listeners may know,
like I used to do a bunch of marketing stuff at RudderStack, and we did a ton of work around understanding attribution, and we did all of that in the warehouse, you know, with our own first-party data, whatever. And we happen to have this guy named Benji. We haven't had him on the show, actually. It'd probably be good to get Benji on the show. And he's just, like, unbelievable with SQL. And so I just had some attribution needs, and so I just went to him and, you know, I didn't ask for an identity graph. I was like, hey, actually I need to see like first touch, and then I need to see like a couple other things or whatever, right? And then so, you know, five or six thousand lines of SQL later, you know, I have a table where every week I'm asking him to add another.
Right. Right.
You know, and he's really the only one who could do it, you know?
Oh, absolutely. Yeah.
So that's kind of like John said, like, no one, and we mentioned it before, like, no one's ever saying, like, you know, what I really want to do today is like, get someone to build me a really solid ID graph.
That's just like, gets me going.
I think every, like, everything starts from a use case.
So even, you know, John was saying, like, he didn't think about the ID graph, it was more the features, but even the features themselves are driven by a business use case.
Like attribution, like, okay, attribution is cool, but why, right?
So you're trying to like actually measure and quantify that.
And so, you know, if you think kind of top down, like you have that objective, you know,
we want to understand where we should spend more money, what's being effective, or even
like from a multi-structure perspective, like what's the next best action.
That then informs the features that we want to build.
And then you always end up
in that place of, to build that feature, we need to understand the full customer journey. And so
that's where it really does rely on the base of that being the ID graph. The second thing I'll
mention is, I've been the victim of some of Benji's work and other people where you have this thing.
And I'll be completely honest, when I first, you know, saw the MVP of Profiles,
I didn't get it either. I thought to myself, and this is a part of Profiles, you know, at the end of the day, it outputs SQL that you can read and audit. And I thought to myself, like, this is something I can just write and do.
Oh, interesting. Okay, so let's stop there for a second. So Profiles, so we're talking about the identity graph. Profiles does additional stuff. But like the actual, like what happens is Profiles generates SQL
and then that runs in your warehouse.
Yeah, it's all in your warehouse.
The data we're consuming is in your warehouse.
The SQL is being shipped to your warehouse,
and Python for some of the ML models.
And then the outputs are actually in your warehouse.
So the tables that we generate are in your warehouse as well.
So nothing leaves your warehouse.
So we're generating SQL.
Yeah. And so we see
that a lot when we first, you know, talk to folks about profiles, they're like, I can see the SQL,
can I just use this? And the answer is yes, like you can just use it and that's fine. But to John's
point, that's not the issue, right? Like to come like, yeah, Benji, like we've got a working model,
we've got attribution solved, whatever. As soon as someone comes and says, hey, we have a new data set, we need this as an input, or like this is a new part of the customer
journey, or this is a new data set that's going to inform a feature. That's when the whole thing
falls apart to shoehorn that in. And this is where you see data teams struggle. And, well,
really the business teams struggle with getting what they need from the data teams: they go
to a data team, they say, I need this simple thing added to this dashboard or to this metric.
Can you just do it? And the answer is yes, it's simple. And it is,
but the problem isn't that feature. It's by shoehorning that into a multi-thousand line model,
you risk affecting adjacent features. So customer success comes to you and says,
can you add this in? This is simple. You say yes. And as soon as you get it deployed,
now sales is yelling at you because their dashboards are broken.
And so the impetus for profiles there
is to kind of encapsulate some of that
so that you're not affecting other parts of the model
as you add to it.
Okay, so you mentioned customer success and sales.
And so I actually want to ask about this
because I'm genuinely curious.
I mean, I'm obviously fairly close to some of this stuff,
but I don't know exactly how the sausage is made.
So we get to hear all of that to bring our curiosities.
And John, please jump in here.
But like, okay, so one interesting thing that I know about RudderStack because I work here is that when I was doing the attribution stuff with Benji, we were looking at it all on a user level, right?
So like, it's first touch, we're looking at leads, we're looking at, you know,
how does someone enter the site? And then we're like, what did they go on to do? Right? And we're,
you know, you break that down by channel. And there's all this sort of, you know, stuff,
right? Do they request a demo? Do they sign up for the app? Do they do all these other things?
But the customer success team is actually much more interested in sort of a collective account
view, right? They don't necessarily care about the lead number. They want to know what, like, an account is doing in the product. How much data are they sending? Which features are they using? Etc. That is actually pretty interesting because, like, I also asked for that, because of course we're doing attribution. Yeah. Eventually I asked Benji to add columns that were representative of, I guess you could say, an account roll-up, right?
So you have a user, but then I also want to know how many other users are associated with this account.
Yeah.
That actually is where things got really wild.
And if we thought it was complicated before, that's when Benji, like, that's when it got really crazy, right?
Because.
Right.
That's when he quit.
I think you're not.
Yeah.
That's when I inherited it.
That's actually, he did change roles.
Not because of that.
I didn't.
Yeah.
But can we talk about that?
So, like, we talked about an identity graph.
But, like, if we just have that on a user level, that's fine. Like Benji sort of rolled like a V1 of that in SQL. But how do you think
about these different sort of, I mean, you said entity, like what does entity mean? But like
account user is kind of a classic version of that. Yeah. That's a really common one. You know,
another common one in a different space would be like a household, right? So like a roll up of
multiple users, but an account, we think of the same way.
And so like to kind of where we started with this,
when you think about like,
let's take RudderStack as a product, as an example,
from a sales and marketing perspective,
we care about getting an individual across the line,
however we define a conversion, right?
Like signing up for the app,
setting up a source, whatever the case may be.
From a customer success perspective,
not to say they're not worried about the individual user,
but when they're thinking about what is the overall product adoption
or health score of this account,
they're thinking of all of the many individuals within that
and how they're behaving.
Because different people are going to use different parts of the product.
Yeah, exactly.
And some might not use it at all, right?
So when you think about, again, using RudderStack as an example, you know, the front-end engineer who may be responsible for most of the instrumentation, they're never in the app. So if you're looking at it on an individual basis, you'll say like, hey, this front-end engineer is very uninvolved.
Sure, that person's upstream of the API. Like, they're just sending it to an endpoint.
Yeah, they're just sending it.
But if you look at it on an account level,
you might say,
oh, wow, well, you know, the business user is in here daily, you know, looking at the health score
of, you know, their health dashboard, understanding the, you know, the overall volumes, their
downstream destinations. They're the ones that are getting the emails that say, like, this threshold
has dropped, go in and check. And so like, in one sense, there's the aggregation of those. On the
other side, there's also excluding those. So like I work very closely
with a lot of our customers.
So I'm in some of their workspaces.
So if you were looking
on an individual level
or if you weren't calculating
the account entity correctly,
you might say like,
wow, this is a really active account.
Like they're in there
setting stuff up.
This person's in there every day.
But then you realize that's me.
That's like an internal employee
acting on the customer's behalf,
but I'm part of their account.
And so it's not only important to include the right metrics in that,
but also to actually exclude things that might, you know,
influence that incorrectly.
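As a rough illustration of the account roll-up described here, the sketch below aggregates user-level events to the account level while skipping internal users. The event fields, the `vendor.example` domain, and the function name are hypothetical and are not taken from Profiles itself.

```python
from collections import defaultdict

# Assumption: internal staff can be recognized by their email domain.
INTERNAL_DOMAINS = {"vendor.example"}


def account_activity(events, internal_domains=INTERNAL_DOMAINS):
    """Roll user-level events up to accounts, skipping internal users."""
    summary = defaultdict(lambda: {"active_users": set(), "events": 0})
    for event in events:
        domain = event["email"].rsplit("@", 1)[-1].lower()
        if domain in internal_domains:
            continue  # an employee acting on the customer's behalf
        stats = summary[event["account_id"]]
        stats["active_users"].add(event["email"])
        stats["events"] += 1
    return {
        account_id: {
            "active_users": len(stats["active_users"]),
            "events": stats["events"],
        }
        for account_id, stats in summary.items()
    }


events = [
    {"account_id": "acme", "email": "dana@acme.example", "action": "view_dashboard"},
    {"account_id": "acme", "email": "dana@acme.example", "action": "edit_destination"},
    {"account_id": "acme", "email": "support@vendor.example", "action": "create_source"},
    {"account_id": "acme", "email": "support@vendor.example", "action": "create_source"},
    {"account_id": "acme", "email": "support@vendor.example", "action": "edit_destination"},
]
result = account_activity(events)
print(result)  # {'acme': {'active_users': 1, 'events': 2}}
```

Without the exclusion, the same account would show two active users and five events, most of them generated by the internal user.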
Yeah, like dev versus prod also, right?
Like you may see a bunch of activity in a dev environment.
Yep.
That's interesting.
John, how did you, like, entities?
Like, talk about entities a little bit.
And like, did you face any of that?
So, I mean, the funny thing about entities is if I had to pick something in data that almost everybody handles poorly, it would be entities. It's just, like, the one thing where, interacting with companies I've worked for and companies I've worked with, yeah, like, getting that right is so hard for them. And I think some of this is that a lot of products are tiered around entities, where like you can do the individual user, but you have to be on enterprise to, like, do... That's part of it. Yeah. And then people, like, end up just hacking things so they don't have to, you know, upgrade. Yeah. But the second thing, well, this is probably even bigger than entities, and I have to bring this up, is duplicates. Yeah. So when you're doing ID resolution, like there's no magic, right?
There's no AI magic that like can deduplicate
your customer records yet.
Yeah.
But, and then of course, there's two different types of duplicates. One, which ID res solves, is like, it's in two different systems and we have an ID and we can like stitch them together. The other one is the one that is the tough one, where they're truly duplicates. They have different IDs. There's no, like, clear way to do that. But I'm sure people would be interested in, like, how people are addressing that problem, or how you've seen customers address that problem.
Yeah, I mean, that's a tale as old as time. I mean, when we think about stitching users together
in Profiles, it largely is a deterministic system, but the way that we stitch is based on the ID types
themselves. And so that gives us the ability to map back and find some of those outliers. So when
we think about setting up the initial ID graph, we have some scripts that will run some QA on that.
And there are some that are very easy to spot. There are others that are more difficult. So we
worked with a customer recently who, you know, we built this out. They were very pleased with it.
But when we did the kind of QA of the ID graph, we found there was a single user that was, I think, stitched to like 10,000 different identifiers across the user stack.
You might be in trouble if...
Yeah. And there's two things to think about there. One is, depending on the use case, you may not care, because you may know, hey, that's an internal user impersonating folks.
Oh, sure. Testing.
If we're stitching it together for something like marketing use cases, it doesn't matter.
That means that person might get a couple extra emails to their other emails.
And so it's not a huge deal.
When we're thinking about, you know, maybe custom offers or if we're doing more sophisticated things like, you know, there are folks that use their customer data for like password
unlocks or account unlocks.
There, it's much more important to be stitched to the right user.
So we find, you know, part of it is that could seem like a bug of like,
oh, wow, profiles is stitching all these people together.
In a way, it's a feature because it then helps point out instrumentation flaws.
And so what we realized with this particular customer is that it was their policy,
their standard operating procedure, that some of their employees would impersonate
other users to place orders on their behalf. And so that seems, you know, fine, but then you realize it
only takes one node to stitch all those together, right? Like when you now sign in as this person,
you know, anonymous IDs is a good example. Every time you clear your cookies or launch your browser,
you may get a new anonymous ID. So a user having a bunch of anonymous IDs, not a red flag,
but if two users have a bunch and now you've impersonated that one,
you're now stitched to all of those other anonymous IDs
and everything that those are stitched to.
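To make the "it only takes one node" point concrete, here is a minimal sketch of deterministic stitching using a union-find structure. This is an illustration under stated assumptions, not how Profiles is implemented: the `IdentityGraph` class, the `(id_type, value)` node format, and the sample identifiers are all hypothetical.

```python
from collections import defaultdict


class IdentityGraph:
    """Toy deterministic identity graph built on union-find."""

    def __init__(self):
        self.parent = {}

    def _find(self, node):
        self.parent.setdefault(node, node)
        while self.parent[node] != node:
            # Path halving keeps lookups fast as clusters grow.
            self.parent[node] = self.parent[self.parent[node]]
            node = self.parent[node]
        return node

    def stitch(self, a, b):
        """Record that identifiers a and b were observed together."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def clusters(self):
        """Return each connected set of identifiers (one per resolved user)."""
        groups = defaultdict(set)
        for node in list(self.parent):
            groups[self._find(node)].add(node)
        return list(groups.values())


graph = IdentityGraph()
graph.stitch(("email", "alice@example.com"), ("anonymous_id", "a1"))
graph.stitch(("email", "bob@example.com"), ("anonymous_id", "b1"))
users_before = len(graph.clusters())
print(users_before)  # 2

# One employee session impersonates both customers.
graph.stitch(("anonymous_id", "agent-session"), ("email", "alice@example.com"))
graph.stitch(("anonymous_id", "agent-session"), ("email", "bob@example.com"))
users_after = len(graph.clusters())
print(users_after)  # 1
```

The single shared `agent-session` node is enough to merge two otherwise unrelated users, along with everything each of them was already stitched to.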
And so we do have mechanisms around excluding specific nodes,
whether that's things, you know, often we'll know that,
like, let's just ignore internal email addresses.
But we can also do it programmatically, where we feed duplicates
above a certain outlier threshold into a table, and those are excluded in the future.
And so a good example is for most operational systems,
you know that a user should, I don't want to say most,
but by and large, a user should have one email address.
And so for most systems, if they have two email addresses,
that is something we want to take a look at and understand
why were those stitched together?
Because that's going to be more wide reaching.
And so we can also put thresholds around individual ID types, like, again, for anonymous IDs, we're okay with a
threshold of, you know, anything below 250, whereas for emails, we want it to be exactly one per user, or
internal IDs as well. So it helps kind of find some of those anomalies. And again, a lot of times
that's the challenge that we're helping solve: this goes back to instrumentation.
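The per-ID-type thresholds mentioned here could be checked with something like the sketch below. The limits, the cluster format, and the function name are assumptions chosen to mirror the numbers in the conversation, not an actual Profiles configuration.

```python
from collections import Counter

# Hypothetical limits: ID type -> max distinct values allowed per stitched user.
MAX_IDS_PER_USER = {"email": 1, "user_id": 1, "anonymous_id": 250}


def find_outliers(clusters, limits=MAX_IDS_PER_USER):
    """Return (cluster_index, id_type, count) for every limit that is exceeded."""
    outliers = []
    for index, cluster in enumerate(clusters):
        counts = Counter(id_type for id_type, _ in cluster)
        for id_type, count in sorted(counts.items()):
            # ID types without a configured limit are never flagged.
            if count > limits.get(id_type, float("inf")):
                outliers.append((index, id_type, count))
    return outliers


clusters = [
    {("email", "alice@example.com"), ("anonymous_id", "a1"), ("anonymous_id", "a2")},
    {("email", "bob@example.com"), ("email", "robert@example.com"), ("anonymous_id", "b1")},
]
flagged = find_outliers(clusters)
print(flagged)  # [(1, 'email', 2)]
```

Flagged clusters are the ones worth reviewing by hand or feeding into an exclusion table, since a second email on one user usually points to an instrumentation problem rather than a real person with two identities.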
And sometimes...
Or, like, data inputs in general.
Exactly, yeah. And so like, you know, for the customer I was just referring to, we actually realized they had really good server-side identification on these users, so we were really just able to basically ignore anonymous IDs.
Oh, interesting.
Because we know there's, you know, web browsing behavior. Yeah. We know that there are systems in place that we're not going to get rid of that are merging some of these together, but we're using a much more robust, you know, internal identification system. And so it was really fine to ignore those. But that showed us that, and then also allowed us to speed up the project, because that's a lot of stitching you don't have to do.
Sure. Oh, gosh. So I have a funny example of this. And I think this happens
a lot in businesses. So at a previous company, we had an order management system not connected to some of the online systems we used, and people would enter orders, right? And then we had some integrations that would flow between the systems. And it was funny, you got me thinking about it with the ID graph thing, with, like, one node tied to a bunch of different IDs. So we had this customer in there, you know, and it started popping up on analytics reports.
It was, let's say, Jane Smith. It was some person's name, and they would just have this massive number of orders. We're like, nobody's ever talked to her. We need to give her a call.
This is our best customer.
Yeah, this is our best customer. Who is this? So it was funny, and I think this is true of a lot of OMSs and even CRMs. What happened was, it grabbed the name off of the first order that came in
and it stuck that.
And it was just an integration where it was all the Amazon orders.
So everything that came in from Amazon, it grabbed it,
grabbed the first one that came in was like Jane Smith
and then it just stacked them up.
So if you did reporting off of it, it was like,
wow, who is this Jane Smith?
So I think that happens in a lot of these systems.
And if you're just browsing, like, one record at a time, like operationally, it just doesn't show up. But once you get into the data problems, like, it shows up in a big way sometimes.
Yeah.
Yeah, that's super interesting. And I think one thing I'll add to that is, you know, I mentioned that everything we're doing is SQL that's running on your warehouse, that you can see. And I think that's something that's appealing to me, because, you know, traditionally, using these black box systems, you don't ever see what's happening. And, you know, like, you're...
Yeah, I dealt with that for who knows how long until they realized like, oh shoot, this has been happening forever.
Yeah. And in a closed system, that just happens and you're unaware, whereas if you can see the SQL that's running, you can, yeah, debug, you know, what might be causing it.
Sure. Yeah. Yeah. Well, I mean, going back to, and then I want to talk about,
I mean, we haven't even gotten to the features, but that's fine. Brooks is actually not here today
for all the listeners. And so we can go long, which is...
I get invited when the producer's not here.
That's exactly right. The producer's gone. Let's get Ryan on the phone.
One of the... So actually, John, I want to return to something you said. So you said one of the things that, you know, most companies do poorly from a data perspective is entities, right? I think that is probably most of the explanation of why every Salesforce is the biggest nightmare. Right? Yeah, it is. All Salesforce customization is like trying to wrangle entities into a system that is like lead and contact, account, opportunity,
like whatever, you know?
That's why Salesforce developers exist.
Sure.
Yeah.
And they make a very good living.
I know.
Yeah.
But it is essentially, like, fairly complex entity resolution inside of a system that only supports, like, that is only designed for a simple problem. But even just that simple, like, parent company, child company, or, yes, like multiple people in one company, like, that's easy enough to, like, mess up. But once you have parent companies and they spin off and then they merge back together and they change names a hundred times, like, that's the challenging data problem, especially over time. Like, do you want to update that information forever? Or do you want to keep a record of like, in 1997, they were this, and then, you know... It's like that slowly changing dimension problem, which almost nobody does. They just, you know, retroactively...
Sure. Yeah, we talk about it a lot. But we just retroactively, like, just, you know, update it whenever they change names or, you know, get acquired.
Okay.
So let's switch gears a little bit here.
So, A, that's really interesting.
And I have a bunch more questions, actually.
Actually, okay.
One more question on this to close it out, just from a like product manager standpoint,
just because I think it's really interesting to think about how we build data products
generally, right?
We've had a ton of people on the show, but this is very interesting to me.
So as a product manager, one of the things that you spend a huge amount of your time on
is a product that generates SQL, the first output of which is an identity graph,
which is something that no one asks for inside of a company, but that is required in order to like resolve entities or
whatever. How do you think about, and that's a very, that seems like a very difficult problem,
right? Where it's like, you don't like, no one's asking for this, but it's actually what you need.
I mean, you're hurting my feelings right now. Like you work on a product whose primary output
no one wants. Thank you. I mean, I think it goes back to what we were saying before is like,
you have to solve that to solve the actual problem.
Right.
And so, and what is the actual problem, actually?
Like, I know you mentioned this, but just to say, like, because identity graph is a stepping stone.
Yeah.
I mean, the actual problem is to solve business use cases in the tools where these business stakeholders live.
You know, like, again, like we talked about Salesforce.
Like, I don't care who you are.
You're not going to get your sales team out of Salesforce. Yeah, of course not. You shouldn't.
Yeah, you're not going to get your marketing team out of Customer.io, Iterable, Braze,
whatever you use, like, that's where they are going to live. That's where they are doing their jobs. And so, you know, all of this is for nothing if we can't make it useful.
Totally. So how, so I guess, like, maybe to put a little bit sharper point on the question, the solution to that problem and what you're building lives really far upstream of Salesforce.
I mean, I guess you could argue about the distance, right? But the person in Salesforce probably should never know about the intricacies of like entity resolution or
all of that that's happening, you know, in the data warehouse, right?
Yeah.
How do you think about that just as a product manager?
And like, you have this outcome that needs to happen in a business.
And then you have this really technical process.
Well, even I guess what's interesting about profiles is like,
the identity graph is actually just a stepping stone to produce like computed user
traits, right? Yeah. And so it's even upstream of the stuff that the data team produces. Yeah.
Yeah. I mean, at a high level, you know, it's a single product, profiles, but there are really
two interfaces for it. There are the actual data definitions, you know, this ID stitching,
the building of the features,
you know, which eventually result in these output tables
that we mentioned in the warehouse.
But it also, yeah, like I said,
it's all for nothing
if you can't access it.
And so, you know, we have a UI
that essentially, you know,
so backing up, I guess,
profiles is a set of configuration files
that connect to the warehouse,
you know, build these queries,
run these queries,
and build output tables.
And then that's all done
in, like, a version-controlled environment.
So you can manage that
in whatever version control you use.
And then that actual, you know,
Git repo can be connected
within the RudderStack UI
and allow for the business users
to interact with it.
Oh, interesting.
Okay.
So the data team is doing all of this in their own dev workflow with config files. Yep. But then the actual user
interface, like the RudderStack web app, is reading the outputs? Yeah, it's connected to that Git repo,
which is what is being used to kind of source and build those tables. And then the UI sits on top of the warehouse tables as well. So, you know, I always have to preface
this when I'm doing demos or explaining the product to folks is that the UI is admittedly
slow because it's actually pulling from the warehouse. Everything that you see exists in
your warehouse. Wow. Okay. And so of course, the reason for that has to be to expose that data to someone who's not on the data team, because why wouldn't I just go like into a warehouse?
I'm literally already there and I have the config files, right?
Yeah.
Okay, so walk us through that.
Yeah, I mean, if you think of it, it's a spectrum, right?
So there are a lot of teams that operate in a lot of different ways.
In some teams, you know, you have to teach everyone how to use the BI tool or how to understand how to query this data to get what
they need. And so the way that we think of it is, you know, how can the data team live where they
want to live? Can they have a technical tool and use software development best practices,
but then give that to the non-technical stakeholder? How can they, you know,
have that in a way that they can see and
understand it and then understand like, at what grain am I sending this to the downstream tool?
Like, what do I actually want to send? And again, different teams operate different ways. Some teams
send everything. Some will, you know, slice that data according to their needs and send,
you know, subsets of that. Some will send full audiences as just lists of users,
some will send traits and then do the dynamic audiences in the downstream tool.
It really kind of caters to whatever they need.
We talk about it.
It's kind of funny.
I mean, this is a technical audience, so I can be honest here.
But we talk about it as being, you know, for the two different users, you know, there's
the technical solution for the technical user and then the UI version for the non-technical
user.
But they're really all for the technical user. The technical user wants the business teams to have that self-serve as much as,
maybe more than, the business team wants self-serve. Because they don't love
sending CSVs. Exactly. And they don't want to handle all the
tickets. So they're both for the data engineer, but that's
a necessary solution for the technical.
So it's like a presentation layer.
So like, hey, look, I can show you what this thing does.
Yeah.
Yeah.
Okay.
Okay.
So can we take a slight but related detour and quickly talk about reverse ETL?
Yeah, the producer's not here.
We can do whatever we want.
The producer's not here, so we can do whatever we want.
So this concept of reverse ETL, you know,
has cropped up in the last couple of years, but I think it's actually an old idea, right? I mean,
this has been happening. It's yeah, it's ETL actually. I mean, you've actually mentioned
this. Like you've talked with a bunch of companies who just call it ETL. Yeah. I mean, it is. I mean,
I would say I would put reverse ETL in the same bucket as the ID graph. It's not, no one's like,
give me a reverse ETL.
I mean, they are now because we've told them they want it,
but like no one's out there.
Like, you know what I really want to do today
is like get into some really cool reverse ETL.
Like it's just a, it's just a means to an end.
Like it's in the same way that the ID graph
is what we need to build, you know,
reliable data solutions.
Reverse ETL is what we need to get those data solutions
into the tools
where we actually can use them.
Okay, so spicy data take here
from both of you.
Like, how did it become...
There's obviously a ton
of buzz around it, right?
I mean, RudderStack has
a reverse ETL pipeline, right?
Yeah.
But then the other thing is
just it seemed like this...
It seemed like a quote-unquote
industry unto its own. But now
just every company is building this, right? Even, like, the marketing tools, right? So, John, I mean,
you see this every day, right? I mean, it's like, it's actually just ETL data movement, and any company
can build a pipeline to slurp it up. But how did it become, like, a thing for a couple of years?
yeah we talked about this a little bit before the show today. And my theory on it is you, I had this pulled up, but I think Snowflake IPO'd in 2020, around
2020, you know, biggest IPO in tech history, really splashy.
So that freed up a bunch of money for startups, right?
And then it seems, I think those were Eric's, like, people form startups around features of products.
And then you had all these tiny little slices
of like, we do ETL.
Well, we do reverse ETL.
We do observability.
We do transfer.
I mean, just every little slice imaginable, right?
When they all got funding.
And then in the last couple of years,
AI has kind of been the focus, right?
So the funding's a little bit drier in the data space nowadays.
And you're seeing some merging of companies and some acquisitions and some others that
are like, I don't know if they're going to make it.
But so I feel like it's just the macro environment that created it, honestly.
Like in another time, you know, would, let's say Fivetran, would Fivetran just be like
the data pipes company? They do
reverse and maybe transformations too? Like, I don't know, maybe. Which is kind of what Alteryx is,
right? Yeah, like, because that was, like, a generation before, right? Exactly. Yeah. So, I mean, who knows?
Remains to be seen. I agree. I think I would add on one layer, and Ryan, feel free to disagree
with any of this, because we love a spicy take when the producer's out. Yeah. But, I mean, the intent is good, right? Like, I think, yeah, I mean, to
your point, Ryan, you're like, who cares about an identity graph if you're not getting it into some
tool that marketing can use to send a campaign to, like, increase conversions, right? Or whatever
their use case is downstream, right? And of course, if you're just writing a Python script to do that,
you know, or you have some custom ETL job, like, that's annoying to manage over time, and it's,
you know, arguably not the best use of the data team's time. And so having that as a managed
service, like, of course, makes sense. But I do agree that, you know, like, of course, it is
probably a feature. And we see that now, right? Like, now
marketing teams can literally self-serve from their own platform, like, data that's available
in the warehouse. Yeah, I think some of that comes from, too, I mean, to John's point about the Snowflake
IPO, I mean, I think you've seen a huge acceleration, too, of just the accessibility of a data warehouse.
And so you've got teams that normally wouldn't have had access to that. Now you can just go sign up
for Snowflake for free
or BigQuery or whatever.
And so these are smaller,
you know,
maybe even younger
software teams
that a lot of times
maybe aren't even data teams.
They're just the software.
They're just the engineering team.
And so ETL is not a concept
that they're well-versed in.
And so there is a place for,
I think, reverse ETL
from that perspective.
But I think as you enter into a mature data team, you see that it becomes much more of a,
you know, just kind of table stakes. Yeah. Yeah. I think you bring up a great point,
Ryan, because historically databases were very locked up. Yeah. Like if it's a production
database, it's lock and key. You lock developers out of it. You've got,
like, a couple of ops people that have access to it. You've got privacy concerns. You've got
uptime concerns. We don't want to take down production databases. So some of it, too, is that,
like, oh, I can go click a button, sign up for this thing, and have a database. Like, this is cool.
And then I can move just the data I want, and it's not going to impact, you know, production, and I can anonymize things.
Like, some of that is, like, we kind of unlocked what used to be, like, a lot more tightly held.
I think about the early days of Data Studio in just how, well, we probably don't need to go down that path.
But it was, there was a certain element of magic to it.
Oh, it was certainly magic at the time, yeah. It's so easy to get data into BigQuery. Yes. It's so easy to just lay Data Studio
right on top of this and, you know, do, like, really cool reporting things that were so, so hard, yeah,
any other way. Now, of course, like, you know,
Ryan's "ugh" was like, yes,
there are a number of things about that.
You prefer him to call it Looker Data Studio?
Oh, gosh.
Oh, yeah.
Might be worse.
Okay, yeah.
Well, okay, that's a totally other episode.
That's a totally separate episode.
Okay, so we talked about reverse ETL a little bit.
Thank you for the spicy takes.
But let's just say I have any reverse ETL pipeline.
It doesn't matter, right?
But profiles is outputting this identity graph.
You build all these traits and profiles or, you know, what do you call them?
Features.
Features, okay.
Okay, so features, user features, entity features, I guess, if I need to be very accurate.
Yes.
And so I just have this table, or maybe a set of tables, that are like, okay, this
is my entity, and here is, like, everything I know about this entity. So do I just slap a reverse ETL
job on there and, like, I'm off to the races? And this is, of course, a leading question, because,
yeah, you as a product manager just shipped two features. One is called cohorts and one is called activations. And cohorts is actually
sort of, like, an opinion about creating subsets of this giant table that represents an entity.
Yeah. And so why don't I just use a reverse ETL job to, like, connect to this
entity table, yeah, and then send the data where I want? Yeah, I mean, you can. Ultimately, like, that
was kind of the original intent. You know, it's just like, hey, you have this, you can send all of it or
some of it, you know, wherever you want. And, you know, I think that becomes a challenge
at scale, because, you know, who knows what or how you want to send that? You know, like, that's a level
of opinionation for the business to understand. And so what we did, so... Or just, I mean, honestly, just a ton of data. Yeah.
Hundreds of columns. Yeah. I mean, if you think about sending all of that to a downstream tool,
I mean, yes, most modern tools support custom traits and things like that, but you
just ship that mess somewhere else now.
And so you have to deal with that.
And so I've mentioned entities a couple of times.
Early on in the product, we had the concept of entities since day one.
We noticed customers kind of almost hacking those.
So, like, a good example would be, you know, multiple customers we found were stitching
users as an entity together and then had a second entity
that was, like, known users or customers. So essentially, like, recomputing that identity
graph for users where they had an... Oh, interesting. Right. So, like, you want the whole ID graph when
you think about things like attribution, when, you know, you have a bunch of anonymous
users but you still understand how they got there or what their behavior is. But then when it comes
to targeting them, you know, again, to the same point of tons of columns,
that's tons of just empty records
that you don't need to send to your marketing automation
or ESP or anything like that.
And so we found them kind of hacking this together
as a users cohort and a customers cohort,
or a users entity and then a customers entity,
which was just, at the end of the day,
like, a subset of that, but it's just driving compute.
A filter.
Yeah, and so cohorts was kind of born out of that. It's, okay, you have your entity.
You know, you can now define, on those traits that exist in the entity, a cohort, which is a subset of
that entity graph. And all it's really doing is filtering that stitched master ID based on some
criteria that exist about those. So now you have an entity, and then you can have a known user
or a customer cohort within that, or really any type of cohort that you'd want. Those cohorts can
also have different features than the main set. And so that's kind of why we saw customers
beginning to break that out into a different entity, because if you think about,
you know, something simple like just calculating an aggregate of LTV on customers,
even if everything's null, to calculate that on all of your anonymous users
takes time and compute. And so you really want to actually compute those features on the cohort which they actually apply to.
Yep. That makes total sense. Yeah, that's super interesting.
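As a rough sketch of the idea just described, a cohort can be thought of as a filter over the stitched entity rows, with an expensive feature such as LTV computed only for the rows in that cohort. The function names, the `main_id` and `email` columns, and the sample data below are hypothetical illustrations, not the product's actual schema or API.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def build_cohort(entity_rows: Iterable[Row], criteria: Callable[[Row], bool]) -> List[Row]:
    """A cohort is just the subset of entity rows matching some criteria."""
    return [row for row in entity_rows if criteria(row)]


def with_ltv(cohort_rows: List[Row], order_totals: Dict[str, List[float]]) -> List[Row]:
    """Compute LTV only for rows in the cohort, not for every anonymous user."""
    return [
        {**row, "ltv": sum(order_totals.get(row["main_id"], []))}
        for row in cohort_rows
    ]


# Hypothetical stitched entity table: u2 is an anonymous user with no email.
entity_rows = [
    {"main_id": "u1", "email": "a@example.com"},
    {"main_id": "u2", "email": None},
    {"main_id": "u3", "email": "c@example.com"},
]
order_totals = {"u1": [20.0, 35.5], "u3": [10.0]}

known_users = build_cohort(entity_rows, lambda r: r["email"] is not None)
known_users_with_ltv = with_ltv(known_users, order_totals)
```

In a warehouse this would be a filtered view plus an aggregate scoped to that view; the point is only that the aggregate never touches the anonymous rows.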
Have you seen cohort creation
kind of follow team use cases?
And so I guess like the immediate thing that came to my mind was
if I have a known users cohort, like as someone who works in product
or I'm trying to understand feature adoption
or I'm trying to understand,
I'm trying to increase lifetime value of my
customers or e-com or whatever. Or do those like, do cohorts sort of fall along business lines or
what kind of patterns are you seeing there? Well, that's where it gets really interesting.
We've seen, you know, customers in different verticals and really even different structures
of internal teams that have taken those different ways. And so in some cases, yes, it's by kind of
function, you know, so the product team wants to look at this, the marketing team wants to look at this different cohort.
We've also seen cohorts acting as journey steps or funnel steps where you can have mutually
exclusive criteria for each of these, and folks can move, you know, between them, and you can
target those accordingly.
That's something where, you know, we're still deciding where we probably won't have a heavy
opinion on that because I think it really depends on the team and how they operate.
And so some teams will have just that basic, you know, user cohort and the
known users, and then they will activate or send that known users cohort, either the whole thing,
or they'll segment on those features that exist and send subsets of those with them.
That's where, you know, maybe a more resource constrained team where there's a single data
engineer that's saying, this is the clear definition of a customer. You guys go run with it and send it to the tools how you want.
And then we see teams with more robust data teams where they say, here are the five primary cohorts
that we have defined and split the customers into and then put features on. And so these are your
entry points. And that could be something like US customers, or, you know, we've worked with an
e-com customer recently that their primary ones are business
and residential and those two
teams operate very differently.
Oh, interesting. Yeah.
The sky's kind of the limit as to how those are
segmented. Interesting. All right, John.
Cohorts. Did you try to do this?
We did.
You were rolling a bunch of stuff.
It's funny and this is kind of
a sad story, but we...
How sad? Our producer leaves and we get, like, these hot takes.
No, it's a sad story of one of those companies that was funded in that, you know, 2020 range
that built an awesome product that got acquired and then basically killed. But yeah, actually,
there was a small, primarily email tool,
but they really built pretty robust customer data features into it. Like I said,
they no longer exist. But one of the things we did was feed custom entities into that. And
they did some neat things like computing, like predictive stuff inside the tool as well. But
that was something that we found was really helpful for,
you know, for targeting and for email and customer messaging inside that tool. And then getting
insights, like one of the cooler things that we did is we had this cut up like product ranking
thing where it was like an X and Y axis. And it scored it on like views and conversions.
So like what are your like high view, low conversion
or low view, high conversion all on like a X and Y axis.
So that was something that we like piped it into.
And then from a customer data standpoint,
I think the biggest problem we faced
where we were selling B2B and B2C
was how do you pinpoint the customers
you should reach out to, especially, like, businesses? Because you'd get purchases and they'd be from
some big names. They're like, wow. And, you know, that wasn't necessarily the only, like, indicator
that would be a good customer, but that was an interesting one, because you can't reach out to
everybody, but you do want to reach out, especially if a business buys something, just
like, well, what else do you buy? Like, where else are you buying? And that was probably
one of the more interesting customer problems. Yeah, we were working on that, like, at the very end
of my time, thinking through, like, all right, how can we rank them? Let's find properties to rank
them on and then give, like, a call sheet or an email sheet or something to a sales team, and then
automating that further. Yeah, so that was probably the most interesting. Man, how rare to have, like, some sort of, like, customer engagement tool that
actually handles entities well. I don't... That may be the first time. It gets killed, then it's gone.
Yeah, it's sad. That is really bad. Yeah. Uh, okay, well, actually, speaking about that, okay, so cohorts is
one of the things you recently launched, right?
But speaking about email tools,
there's this other piece of this called activations.
So what is activations?
And to put a spicy take on it,
like, that sounds just like reverse ETL?
It is. It is reverse ETL.
Reverse ETL.
Wait, that's reverse ETL. Reverse ETL. Wait.
That's just ETL.
Yeah.
So they're like, cancels out.
It cancels out.
Yeah.
I mean, so at a high level, essentially what we saw is that we're providing a way to define these entities in like a trustworthy manner for the data team to own that definition.
And then for the data team to segment that further into, you know, again, cohorts that different business units or teams or, you know, different phases of the customer journey
cared about.
And so that became the grain at which we saw people needing to actually get that into the
downstream tool.
You know, like, you've built your ID graph, you've built your features, you've subset that into, you know, usable buckets
of users, and then that's where it was like, okay, now we've got it to a place where we can actually
action on it. And so, like, again, like, beating a dead horse here, but, like, you still can't do
anything until it's in the tool where you want it. Sure. That's literally just talking
about materialized views in the warehouse.
Yeah, yeah.
And so that's literally the grain at which it was like,
okay, now you need to get this into the downstream tool.
Because RudderStack is building these,
we know exactly how the views are materialized,
where they live in the warehouse.
And so it becomes very simple then for a non-technical user to say,
you know, I'm looking at this UI,
which again is built
on top of Snowflake or warehouse data. I want either this cohort, or even a further segment
of this cohort, or even some traits or features of this cohort, in my marketing tool. And so
activations is basically, you know, a UI, you know, low-number-of-clicks way to get that there. Like,
I wish it was one click. That's, like, what I was really going for.
But honestly, because you
have to kind of map it. It's a feature, not a bug. Yeah, you have to map
it to the fields. It's hard rails. Yeah, you have to map it to the
fields in the tool that you're sending
them to. So, but the idea is
that gives you a centralized place to say,
you know, again, business
user is exploring that, saying
this is what I want to get or a subset of this and
then put it in the downstream tool. And then that's now connected in the UI, at least to that cohort.
So they can see all the places it's being sent or what sub slices of that are being sent. And so it
really, really kind of ties a bow on that notion that I mentioned of data team owning the definitions
and the config and the business stakeholders owning the interface to that.
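The mapping step mentioned above can be sketched as follows: take cohort rows, optionally narrow them to a further segment, and rename the chosen columns to the field names the destination expects. The `activate` function, the column names, and the destination field names are all hypothetical; this is an illustration of the mapping idea, not how the product implements it.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def activate(
    cohort_rows: List[Row],
    field_mapping: Dict[str, str],
    segment: Optional[Callable[[Row], bool]] = None,
) -> List[Row]:
    """Build destination payloads from cohort rows.

    field_mapping maps warehouse column name -> destination field name.
    segment, if given, narrows the cohort to a further subset.
    """
    payloads = []
    for row in cohort_rows:
        if segment is not None and not segment(row):
            continue
        payloads.append({dest: row.get(src) for src, dest in field_mapping.items()})
    return payloads
```

Only the columns named in the mapping leave the warehouse, which is one way to avoid shipping hundreds of columns to a downstream tool.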
Yep. So, to go back to the spicy take, you are actually
actively just turning reverse ETL, like, melting it into, like, it's just under the hood, and a business
user, like, goes to look at data and then they're just like, I just want it in this tool, which
actually is just... Yeah, I'm like, reverse ETL is bogus. Use my reverse ETL. Yeah. Yeah. Exactly. Awesome.
Yeah.
That's hilarious.
Like what tools are supported out of the gate
or commonly used?
So downstream,
it's any of the integrations
that RudderStack already has.
Oh, all right.
Okay.
Wow.
So there's a big library.
Yeah.
Nice.
So anything where you would send
clickstream or reverse ETL data
is automatically supported
by activations.
Okay.
I will trade your one click
for like
sending data anywhere. Okay,
I gotta ask a question, though, and John, this is for both of you, okay? And I don't care who goes first.
You guys can fight over it. But we talked about, and, I mean, I kind of know this for myself, and
maybe this is just because I was, like, a very technical marketer and I, like, actually did go
into the warehouse, so I'm probably not the right... But why is it important?
You mentioned the data team can own the definitions, right?
Couldn't I just go in and create a bunch of definitions?
Why is it important to have that dynamic?
Like you as a marketer?
Sure.
Well, because you would do it wrong.
Okay, Dan, hold on.
I do need to create a definition.
Me personally? Yes.
I mean, that's a good question. And, I mean, my answer to that is that
data is never as clean as you want it to be. And so, okay, a good example: we worked with a customer
recently where they, in their downstream tool, wanted, like, a list of... they wanted to see and do activities on recently active users.
And this was a product like RudderStack, you know, a SaaS-based product.
And so someone on the marketing team, probably someone brilliant like you, like, just literally, like, grabbed, like, you know, a count of, like, did they have a session in the last week,
and they were using that as active users. And, you know, it wasn't converting like they thought it would
or had in the past. And the data team was able to come in and say, you know, this tool is primarily
used through a browser extension. And so, like, if you're signing in, you're probably not really using the
tool well. If you're using the tool well, you're using it from a browser extension and you're
never signing in. And so they were able to come in and essentially correct that feature
by the definition of it, but everything downstream remained the same, right? So everything that
marketing had already built around saying, sure, recently active, was still fine, but the definition
was just done more reliably around the business concepts. And that's not, I mean, I love
to knock against marketing people, don't get me wrong. It's my favorite activity.
Me too.
But that was just someone doing the best with what they had.
And they didn't realize because that's what existed in that marketing tool based on just
like clickstream data.
It was what they had available.
Yeah.
But the business has access to those metrics in a different data set that's not available
in the marketing tool that can say like, oh, this was the average number of minutes they used the tool
last week and that's much better.
And so that's why I think it's important for the data team
to own those definitions.
But the marketing team, as much as I hate to
admit this, is always going to know how to use those better.
Like how do we actually target folks?
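The "recently active" story can be sketched like this: the data team swaps the definition behind a feature while the feature name that marketing depends on stays the same. All column names and functions here are hypothetical, chosen to mirror the browser extension example rather than any real schema.

```python
from typing import Any, Callable, Dict, List

User = Dict[str, Any]


def recently_active_by_sessions(user: User) -> bool:
    """First definition: any signed-in web session in the last week."""
    return user["web_sessions_last_7d"] > 0


def recently_active_by_usage(user: User) -> bool:
    """Corrected definition: any browser extension usage in the last week."""
    return user["extension_minutes_last_7d"] > 0


def build_features(
    users: List[User], recently_active: Callable[[User], bool]
) -> Dict[str, Dict[str, bool]]:
    """The data team owns which definition is plugged in here."""
    return {u["main_id"]: {"recently_active": recently_active(u)} for u in users}


def campaign_audience(features: Dict[str, Dict[str, bool]]) -> List[str]:
    """Marketing's downstream logic only ever reads `recently_active`."""
    return sorted(mid for mid, f in features.items() if f["recently_active"])
```

Changing the function passed to `build_features` changes who lands in the audience, but `campaign_audience` itself never has to be touched.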
Yeah, but it's like filters on
it's like core
business definitions that shouldn't change
because it can create situations where someone is sending a campaign and actually reporting something that's inaccurate.
Exactly, yeah.
Interesting.
Yeah.
Yeah.
Well, and I think just as a general concept, some of the best tools are tools that bridge two teams or more than one team together.
I mean, that is a lot of the value of really any SaaS tool
is like, okay, this is how marketing looks at it.
This is how data looks at it.
And providing like clarity and an interface to work that out.
Because that flip side is often true too, right?
Where the data team like does things,
they model things technically correct
and they do everything right, like technically.
And then marketing is like, yeah, but this isn't useful. Yeah. Because of, like, X, Y, and Z, like, business rules.
Yeah. Or just, like, oddities of, like, how some system's set up that can't be changed or whatever.
Yeah. So, like, you have to have that. Like, these tools, like, you know, like profiles, like, force you
to kind of get it on paper and to agree on it. And then, like, I think it can flesh out a lot of the problems with data definitions,
because a lot of times the data teams and marketing teams aren't working together
because the marketing team can have their fully enclosed black box and just like do stuff. And
they would never even know if it was wrong because they don't have anything to like compare it
against. Yep. So is it, so it sounds like both of you are advocating for this world. I mean, there is this whole self-serve analytics, data democratization,
blah, blah, blah, right? That's another episode where we can talk about how that didn't materialize
or, when it did, it had some very severe issues. But what's interesting is you could say,
okay, we're just going to send this data to your tool and then you can do whatever you want with it.
But to your point, Ryan, I think what's interesting is without context or sort of without an agreement
on what the meaning of some of those core business definitions are, which ultimately
like materialize as, you know, some sort of a column or a table or something like that's
actually how it exists physically, quote-unquote, in the business. But it
seems like there's this desire to create, how do I want to say this, get the marketer, me, closer to
those, like, physical assets in the warehouse, but in an environment that has, like, a bunch of safeguards. Yeah, exactly. Okay, interesting. And I think, to the vision of, okay, we need clear ownership for pieces of this
thing, because, like, actually, if you're at a certain scale, like, nobody can do everything,
right? Yeah. So you have clear ownership, but then you also have, like we were talking about
earlier, you have visibility between teams. Like, I'm not the owner here, but I can at least go see,
like, what is happening at a high level on this downstream thing that impacts me.
And then same on the other side of, like, I can see, like, the results of what I did. I think that's a
really positive thing. So the teams feel like they're actually doing something useful versus
completely siloed of, like, I ship it across the wall. I don't know what happens. I don't know what happens before me or after me. It's like an assembly line mentality
in a bad way versus, like, the full visibility of, like, what's going on and then, like, clear ownership
lines in the process. Super interesting. All right. So, two more questions before we end here.
I actually have no idea how long we've been recording,
which is a great feeling.
We have, yeah.
I don't know.
That's such a great feeling.
All right.
We may change the format of the show.
Yeah.
Just, you know, the two episodes, add video, you know.
Yeah.
Okay.
So two questions, one for Ryan and then one for both of you.
Actually, we'll end on a spicy take.
Great. This isn't the spicy take. Okay, great.
This is for you. So what are you building next? Okay, so you have your identity graphs, profiles, you generate an identity graph. No one asks for it. Everyone
needs it. You build traits on top of that becomes this sort of table that represents everything you
know about an entity. You create cohorts that are business
definitions. Business users go in, they look at a cohort, they can filter it. They send the data to
their tools. I mean, this sounds like a great world. What are you building next?
Yeah. A couple of things we're focusing on right now. One is already live, but we're doing a lot
of work around it, is around the ML piece of this. You know, a lot of teams that are approaching us and wanting
to use profiles are wanting to do so to build kind of that solid foundation to start thinking about
ML use cases. You know, everyone's trying to do it, right? Yeah, like, that's what you do. Do you mean AI,
Ryan? Oh, we've gone this whole time and nobody said AI. I think we're, like, an hour in. Crazy. No, you
did mention... oh, you said there's no AI tool to... Yeah, you're right, you're right.
You did mention AI. I have not said that.
That should be a game.
Although, as a product manager, you did
say it's a feature, not a bug. That's true.
I have to call that out. Yeah, I think I'm contractually
obligated to say that. That is true, yeah.
So, when we think about, you know, a lot of folks
are doing this to build that foundation to
start to, you know, leverage some of these more
advanced techniques. By the nature of profiles, you know, I mentioned a couple of times, we're outputting this table that's
got all of these features that you're defining. We also, from the start, have stored historical
snapshots of that over every run.
Oh, interesting.
A lot of folks run this...
Like every job run, and you materialize a view that is this point in time?
Yeah, so the view that you look at in the UI
is pointing to the most recent run,
but all the previous runs are in there
and you can kind of set the retention
that you want for those.
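A minimal sketch of the snapshot idea described here, assuming an in-memory dictionary in place of warehouse tables. The names `write_run`, `latest_view`, and `apply_retention` are hypothetical and are not the product's API; they only illustrate "every run is kept, the view points at the latest run, and retention prunes old runs."

```python
from datetime import date, timedelta

# Hypothetical stand-in for per-run feature snapshots.
# In practice these would be warehouse tables, one per run.
snapshots: dict[date, dict[str, dict]] = {}

def write_run(run_date, features_by_user):
    """Store the full feature table produced by one run."""
    snapshots[run_date] = features_by_user

def latest_view():
    """The 'current' view simply points at the most recent run."""
    return snapshots[max(snapshots)]

def apply_retention(today, keep_days):
    """Drop snapshots older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    for run_date in [d for d in snapshots if d < cutoff]:
        del snapshots[run_date]

write_run(date(2024, 5, 1), {"u1": {"orders": 1}})
write_run(date(2024, 5, 2), {"u1": {"orders": 2}})
write_run(date(2024, 5, 3), {"u1": {"orders": 4}})
apply_retention(today=date(2024, 5, 3), keep_days=1)
```

With a one-day retention window, the May 1 run is pruned while the latest view still reflects the May 3 run.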
That sounds fruitful for ML.
Yeah, and so, you know, that was the intention
is we can do this for, you know,
teams to have a good foundation for their ML,
but then we realized, you know,
we know exactly how we're writing this,
we can do some of that ML for them.
And so our predictions product, you know,
kind of sits on top of that and allows you to say,
well, you've got all these users and all of their, you know, feature evolution
day over day. And some of these are as simple as defining, like, which
feature would you say is a conversion, and then what are the ones you want to exclude. Like, obviously
you don't want to predict on, like, my first name and my state. But, you know, excluding those, like,
what are the things that are changing day over day?
And then, you know, given how far of an outlook you want, we can
run models on that.
We'll say, hey, this is, you know, either a lead score or churn score, but this is the
propensity to do this defined conversion action.
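One way to picture the setup described here is as a labeling step over the daily snapshots: pick a conversion feature, exclude identity-like columns, and choose an outlook window. The sketch below is an assumption about how such training rows could be assembled, not the predictions product's actual method; `build_training_rows` and all field names are hypothetical.

```python
from datetime import date, timedelta

def build_training_rows(history, conversion, exclude, outlook_days):
    """history: {user_id: {run_date: {feature: value}}}.

    Returns a list of (features, label) pairs, where label is 1 if the
    user converts within `outlook_days` after the snapshot, else 0.
    """
    rows = []
    for user_id, by_date in history.items():
        last_seen = max(by_date)
        for run_date, feats in sorted(by_date.items()):
            if feats.get(conversion):
                continue  # already converted; nothing left to predict
            horizon = run_date + timedelta(days=outlook_days)
            if horizon > last_seen:
                continue  # not enough future snapshots to label this row
            label = any(
                later.get(conversion)
                for d, later in by_date.items()
                if run_date < d <= horizon
            )
            # Drop the target itself and any excluded columns (e.g. first name).
            x = {k: v for k, v in feats.items()
                 if k != conversion and k not in exclude}
            rows.append((x, int(label)))
    return rows

history = {
    "u1": {
        date(2024, 5, 1): {"first_name": "Ann", "sessions": 2, "purchased": False},
        date(2024, 5, 2): {"first_name": "Ann", "sessions": 5, "purchased": False},
        date(2024, 5, 3): {"first_name": "Ann", "sessions": 6, "purchased": True},
    },
    "u2": {
        date(2024, 5, 1): {"first_name": "Bo", "sessions": 1, "purchased": False},
        date(2024, 5, 2): {"first_name": "Bo", "sessions": 1, "purchased": False},
    },
}
rows = build_training_rows(history, "purchased", {"first_name"}, outlook_days=1)
```

Any classifier could then be trained on `rows`; its predicted probability would play the role of the lead or churn score mentioned above.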
Okay.
Fascinating.
Do you go find the training data?
I mean, I guess it's all there, but it's trained on the customer's data.
So yeah.
So we train it on that.
It trains, you know, there's defaults,
usually trains weekly and then runs.
On the inputs to whatever you define
as like conversion or something?
Yep.
Interesting.
And so thinking about what else can we layer onto that?
So we've recently built that attribution, you know,
so you can do first touch, last touch, multi-touch.
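For readers less familiar with the three attribution models named here, a small sketch follows. It assumes an ordered list of channel touches before a conversion and uses equal (linear) credit as the multi-touch variant, which is only one of several multi-touch schemes; the function name `attribute` is hypothetical.

```python
def attribute(touches, model="linear"):
    """touches: ordered channel names leading to one conversion.

    Returns {channel: credit}, with credits summing to 1.0.
    """
    if not touches:
        return {}
    if model == "first":
        return {touches[0]: 1.0}   # all credit to the first touch
    if model == "last":
        return {touches[-1]: 1.0}  # all credit to the last touch
    if model == "linear":
        # Multi-touch, equal-credit variant: split evenly across touches.
        credit = {}
        share = 1.0 / len(touches)
        for channel in touches:
            credit[channel] = credit.get(channel, 0.0) + share
        return credit
    raise ValueError(f"unknown model: {model}")

touches = ["ad", "email", "ad", "webinar"]
first = attribute(touches, "first")
last = attribute(touches, "last")
linear = attribute(touches, "linear")
```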
And then, you know, eventually that gives you
on a per user basis, what's the next best action
for this person based on, you know, actual trained data.
And then, you know, we're building some other things
around like LTV prediction and category prediction and things like that. So
that's really exciting. And then the other is, you know, everything I've mentioned today is a
batch process. So this runs on a defined cadence, calculates these things, writes them to the
warehouse. Yeah. And so real-time features is kind of something that we're beginning to work on
currently. And, you know, that's the ability to have that daily aggregate run,
which then gives you quick access
to that historical aggregation.
And then you can compare that
to events coming through in real time
and access those either through our API,
which I haven't really mentioned yet,
or to like tack on and send to the downstream tool
so that you have that kind of real-time access.
And that's being used in beta
by some customers right now for, like, fraud detection,
you know, different things like that.
But understanding your users at that point of contact versus, you know, having to wait
for the batch process.
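The pattern described here, a batch-computed aggregate compared against an event arriving in real time, can be sketched roughly as below. The field `avg_order_amount`, the threshold, and the function `is_unusual` are all hypothetical; this illustrates the comparison only and says nothing about how the beta feature is implemented.

```python
def is_unusual(event_amount, daily_aggregate, threshold=3.0):
    """Flag an event whose amount exceeds `threshold` times the
    user's historical average from the last batch run."""
    avg = daily_aggregate["avg_order_amount"]
    if avg <= 0:
        return False  # no usable history to compare against
    return event_amount > threshold * avg

# Output of the (hypothetical) daily batch run, keyed by user.
batch = {"u1": {"avg_order_amount": 40.0}}

# An event arriving in real time.
event = {"user_id": "u1", "amount": 500.0}
flagged = is_unusual(event["amount"], batch[event["user_id"]])
```

The batch side supplies quick access to history, and the real-time side only has to do a cheap lookup and comparison at the point of contact.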
Yeah.
Exciting.
Super interesting.
Yeah.
Wow.
Okay.
Well, we'll definitely have to have you back on to talk about that.
We'll make sure the producer isn't here.
Okay.
So last spicy take, I have to end on a spicy take.
Okay.
So reverse ETL is getting turned into a feature of a bunch of other products. Ryan is at the spearhead of that, obviously. He's changing the entire industry
right now. What is another one that you see getting like a sort of, let's say like cottage,
you know, you know, sort of data explosion VC backed product that is just going to get turned into a feature
My first thought would be probably observability. Like, there's a lot in that space where we're like,
how does that not get rolled into...
Yeah, there's so many places it could get rolled into. Like, it
could get rolled into orchestration, it could get rolled into, like, the warehouse itself or the data pipeline tools.
Sure.
So that's the one that feels the most... it's useful, but,
you know, like, I don't say it doesn't do anything, because it's useful and you can have alerts and
stuff, but at the same time it's not, like, core, like, hey, this is a data pipeline
that moves data from here to here that I need. That would be my guess.
Yeah, and there are so many tools that have access to the same stuff that could just
build that.
Yeah. And then catalog, yeah, the catalog, it's maybe a similar space too.
Yeah. I mean, both of those things are kind of just like,
it's just a matter of time until Snowflake and Databricks...
Right, right.
They probably already have products
or have acquired companies who have done that.
Yeah, right.
I'm going to stick to my...
I'm going to stick in the same vein as Reverse ETL,
but I think ETL.
I think your traditional like...
Interesting.
Fivetran, Stitch, Hevo, all those players.
I mean, they built huge businesses around this,
but I think we've already started to see it some,
but I think the actual cloud tools
themselves should just start writing that data to the warehouse...
Yep.
...and kind of cut out
the need to, like, have a dedicated...
Like zero-ETL, just directly.
Exactly. They may just have,
like, data shares, basically, with all of these, like, HubSpots and, like, these big providers.
Well, I mean,
that's kind of what you see with the, I mean, going back to reverse ETL, like, you see
this with marketing platforms that are just like, we'll just plug into Snowflake.
Yeah.
Yeah.
Yeah.
Exactly.
Yeah.
I agree.
I agree.
I mean, the other thing is for, let's call it traditional ETL, like at some point, the big
cloud providers, I mean, they already have tools that can do this.
Yeah.
Right.
And so at some point they just acquire or build or something,
the connectors or this, the zero ETL thing. Yeah. That's interesting. That, and actually
that sort of goes back to like a bundling, right. Where you see, you know, a lot of, you know,
you mentioned, you know, Alteryx or some of these others, right? Yeah. Yeah. It's back to bundling.
Yeah, exactly. Which is super interesting. Which is back to like the lock-in thing,
which a lot of people are trying to get away from.
And then it'll probably unbundle again, you know, next phase.
Well, I think it'll be...
Actually, yeah, that's a whole other subject.
But we recently had Andrew Lamb on the show from Influx.
And this whole idea around object storage
and Apache Arrow actually creating some crazy unbundling
sort of on the analytics stack side of things,
which is really interesting.
So I don't know.
We'll see.
Alrighty.
I can't wait to see how long we recorded.
Yeah.
Was it like 25 minutes or?
Yeah, it was more than 25.
It was more than 25.
I think so.
All right, Ryan, thanks for coming back.
Have you been on the show?
I was saying coming back.
I don't think so.
Wow.
Yeah.
Okay.
Thank you for coming on.
Yeah, thanks for having me.
Great. All right. Well, when you get the ML stuff sorted out...
Yeah.
...let us all know. Walk over to my office.
And when you get AI figured out, too.
Yes.
There we go. Okay, here's my third mention. We needed another mention.
Yeah, three AI mentions. All right.
Thanks for joining us. Subscribe if you haven't, and we'll catch you on the next one. eric at datastackshow.com. That's E-R-I-C at datastackshow.com.
The show is brought to you by Rudderstack,
the CDP for developers.
Learn how to build a CDP on your data warehouse
at rudderstack.com. Thank you.