The Data Stack Show - 114: Solving Data Infrastructure Problems at Startups and Enterprises with Max Werner of Obsessive Analytics Consulting
Episode Date: November 23, 2022
Highlights from this week’s conversation include:
Max’s career journey (2:54)
Going from a small startup to a big enterprise (11:15)
Dynamics of a switchboard operator (17:09)
Common threads through different companies (20:53)
When data is not the answer (26:57)
The evolution of CDP (29:38)
Data sources to include in a CDP (35:16)
Working with event data (37:19)
Max’s take on other tools (41:18)
The cutting edge in data (43:09)
Building your data company in an evolving environment (49:28)
Find Max: https://www.obsessiveanalytics.com/
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by RudderStack, the CDP for developers.
You can learn more at rudderstack.com.
Welcome back to the Data Stack Show. Kostas, today we're talking with
someone we've actually known for quite some time, Max Werner. We met him way back in RudderStack
days, and I think they were an early RudderStack customer, but it's been quite some time now.
Anyway, he's done all sorts of data engineering type work, both at small startups and at
huge companies. And generally, I think he just has a very smart approach to working with data and
systems in general. And he's not afraid to share his opinions. So currently he runs a consultancy,
and, you know, our listeners have heard past shows with consultants.
I love the breadth of view that consultants get because they get to talk with a lot of different companies
and they tend to have the much more sort of objective,
let's say like realistic on the ground view
of data tooling, right?
Because they're not building it,
they're actually implementing it.
And so I'm really interested to know
what he sees on the ground.
And in particular, what's most exciting to him in terms of new technologies that are coming out.
So that's what I'm going to ask.
How about you?
Yeah, absolutely.
I think what is especially interesting with Max is that he's a person who has experienced how things get done in startups, in a huge company like Warner Bros.,
and also as a consultant to many different companies of many different sizes.
So he's the ideal person to try and find out common patterns, and also
the things that are different because of the size or the kind of company.
So I think we're going to have like a super interesting conversation with
him and we're going to learn a lot.
All right.
Well, let's dive in and talk with Max.
Max, welcome to the Data Stack Show.
This is a great treat for me because I guess we've actually
known each other for
multiple years now,
which is pretty cool.
So that's great.
I've talked data stuff
for a very long time.
So excited to have
a dear friend here on the show.
And yeah, why don't we start out
with where we always start out,
which is you giving us a background.
So you've been a practitioner and now you're doing your own thing.
So give us the story.
Yeah.
I mean, thank you very much.
It's a pleasure to be here.
Yeah.
It's been a number of years, like 2017 or 18 or something like that.
Yeah.
Yeah.
So we worked together, and back at that time I was working at a company on their customer data infrastructure, which at that point included Segment and included RudderStack.
Funny how these things happen.
Yeah.
And yeah.
I do have a quick question because I know I'm going to interrupt you, but what was your title back then? Which, and I'm interested to know, because a theme we've had on the show has actually
been like how roles are changing.
And I know what you were doing back then, which we would now call basically like probably
a mix between like data engineering and analytics engineering and maybe even marketing ops.
But I don't remember what your actual title was.
Oh, that's a good question.
So when I started at that company, I started there on a Segment implementation.
I think it was performance marketing manager or something to that effect.
I know, right?
Because, I mean, these things tend to always start out under marketing.
Yeah.
Right.
Because marketing are usually the closest people to data, right? They're
the ones in the dashboards and analytics tools to do their thing.
Then eventually, as that position grew and there were analysts and
data engineers on the team specifically,
I became the data operations manager.
Okay.
Because that made more sense.
Okay.
Sorry for the interruption.
We can return to that.
That's a nice little walk back in history.
Okay.
Yeah.
Yeah.
So yeah, I did that for them. And as that kind of became a stable status quo, I, you know, I was happy to be there, but got approached by a recruiter to do another segment implementation.
And that was about two weeks after I had just closed on a house, and I tried to tell them it was not a great time to switch jobs.
And the reply was, well, you've got your five-year mortgage locked in.
Now is the time.
So I talked with them a bit more, and that ended up being about a year and a half long, almost like a contractor position, to do a Segment implementation.
Because that was kind of at the start of COVID.
So a lot of companies weren't hiring, but contractors were okay.
And yeah, so I kind of went out there on my own, but it was, you know, like 40 hours a week.
So it wasn't really kind of doing my thing.
And that contract, what kind of company was it with?
That was Warner Brothers.
Oh, wow.
Okay.
So we're talking about a serious, enterprise-level implementation.
Exactly.
Right. So whereas before it was a, you know, like a small medium business, a B2B SaaS
company, less than a hundred employees, that's going right up for two.
Yeah.
And going over to doing that for Warner Brothers, specifically for
their video game division, you know, your Batmans and
Mortal Kombats and those kinds of things.
Cause obviously those games generate a ton of telemetry data, right?
Oh, gaming is crazy.
Yeah.
The potential that you can have there,
and also just for UX improvement or game flow
improvement, right, like which are the levels that people keep struggling with,
and whatever.
And as that contract was starting to near its end,
I freed up a little bit there.
I was either going to look for something full-time somewhere again, or
just continue doing this kind of thing as a freelancer, as a
consultant, whatever you want to call it.
And well, that's what I ended up doing.
So that was about the beginning of last year that I really went
fully in on it. And yeah, business is good.
It's growing and I have a lot to do.
I have two employees now, which is a crazy concept. Paying people
with your own money is a scary thought.
But yeah, it's all good.
And I kind of found somewhat accidentally a bit of a niche in
kind of the types of clients.
Oh, interesting.
Yeah.
Because, I mean, having done startup and B2B and, you know, enterprise where it's very much B2C, right, directly to the consumer.
Yeah.
I found really like that the underlying problems really
didn't change all that much, right?
The data collection, some of the data modeling in terms of what
to give the end users, right?
The marketing team, or, especially at companies like Warner Bros., where
there are entire divisions that are only doing, you
know, marketing or
product analytics or whatever.
Yeah, exactly.
But it's all the same underlying problems, right? People interact with a property of sorts, right?
Be that a website or an application, and there's some sort of user funnel
there. You want to get them to convert.
You want to see what happens to them after they convert so that they keep coming back.
So my client base is all over the place.
I've done, you know, used car sales with a company called Canada Drives.
That was an interesting, great project. And a couple of marketing agencies, or agencies in general, that I've worked with
that build websites or mobile applications for their clients that are,
you know, about sports and tourism, right?
Hotel booking.
It's completely all over the place, but still the underlying problems are the same.
But what stood out with all of them, and that's kind of that niche
that I'm interested in, is that they all, for the most part, didn't really have a data stack.
Canada Drives was the exception.
They had a really good data stack to start with.
For the most part, companies tend to not have customer data infrastructure in the
way that the four of us here would think about it, right?
There's some data infrastructure.
There is a SQL database somewhere that powers their application, or, you know,
oh, there's a whole bunch of spreadsheets, we export things
from one tool to the other, and spreadsheets and more spreadsheets.
Yeah.
Yeah.
And kind of moving them from that over
to something a little more sophisticated
or as you would call it,
kind of the growth stacks or starter stacks.
Yep.
Getting that data layer in.
Super cool.
Okay.
So one thing that you said
I think is a really helpful insight,
which is that you went from a small-to-medium B2B SaaS startup, and then you've worked with a number of other companies, and there's this common thread of a similar problem,
right?
Like when it comes down to data and sort of what you're doing with that data, the initial
problem set is very similar, right?
The context may be different and the particular data points may be different, but like, you
know, you have users interacting and they need to do X, Y, and Z, and you need to track that to optimize it.
I'm interested to know though, outside of that sort of similar problem set,
what were some of the biggest things you noticed as a data practitioner going from like a small
to medium sized startup to a company like Warner Brothers?
And not even necessarily, I mean, in general, sure, but, you know, not even necessarily in terms of the data stack, but like, what were the big things that you took away from
that making that jump?
Because that's a pretty big jump.
Yeah, I'd say the biggest thing that kind of stood out for me is that, I mean, the people
in a startup, you know, they do 10 different things, right?
Like when I was performance marketing manager, I had to do that implementation of Segment, but also manage the Redshift instance
and make sure that it doesn't run out of space.
Performance marketing manager managing Redshift is hilarious, by the way.
Yeah. And then you jump over to something like, you know, Warner
Brothers, and now all of a sudden you have very specialized people, right?
Where there is, you know, an entire team of people, like multiple people
whose only job is to maintain their Redshift clusters. And also, it's
clusters, right, with different kinds of nodes in them for
more compute or more storage, and they customize the, what's it called, the work queue scheduler
or whatever, to work for their particular kind of workload. And it's just, you know, the
skill set is not as broad, but it's extremely specialized.
There are multiple people whose sole job is to do email marketing for, like, one game.
And there's another team whose sole job
is to do, you know, in-app purchases, seeing what people are
buying or what users are interacting with in-game.
It's very, it's very specialized.
And of course, a lot more, you know, red tape, because once you get to a certain
size, it's just more process involved.
Totally.
Okay.
So for you as a data practitioner, the work that you were doing,
I would guess, at least by the sound of it, you were probably sending data from the game usage to multiple destinations, to enable email marketing or feed the Redshift cluster or whatever.
So did you have to use, did you have to interact with all those teams?
What was that like from your vantage point?
Because you were sort of working on a layer of the data that touched
multiple things that you mentioned.
Yeah.
Yes, it was certainly a lot more communication and
scheduling and interacting, coordinating things with these other
teams, than it would be at the startup, right?
It was even a lot less, in the sense that I wasn't designing tracking plans there.
Because for the most part, those things were there and obviously like the game
developers are working on these things.
They know what, you know, is happening where.
Uh, yeah.
Yep.
Yeah.
So there was some coordination with these
kinds of folks, and obviously the Redshift people and the VPN people, because there are
security considerations for this kind of pipeline. The main focus of
the pipelines and stuff that I built there was to
integrate the existing different systems that they had with what they wanted to do.
Right.
And Segment was a part of that, but also big tools, right?
There was Apache Airflow, with multiple sequential tasks that were building data sets and exporting
from Redshift or importing into Redshift.
And it was really almost this switchboard operator type of thing.
And okay, we're grabbing this from here and this from there, and this needs to go
over here because when you have so many people that are so specialized, the biggest problem
that you often encounter is that people don't know what is available as far as
data or even like pipelines.
Yeah.
Right.
Even at a company that size, you can blow
people's minds by saying, oh yeah, no, we could totally tell you if
this person has played this other game.
They were like, you can? That's amazing.
Right.
It was like, yeah, we can, right?
Because you are so focused on your role and
your niche there.
And so having people like me there, whose sole job basically was to see, okay,
what are your data problems and how do we solve it with what we have?
It was interesting.
Super interesting.
Okay.
I'm going to ask one more question and this is going to be kind of a lead in to
hand the mic off to Kostas.
The concept of a switchboard operator is really interesting to me for two reasons. Number one, I don't think we've ever heard that on the show. But I think it's a very helpful analogy because I think that, you know, depending on the role, like if we think about the data team, like a switchboard operator is certainly like a very appropriate analogy to describe like the function of the data team in general.
But I do think it's actually
interesting
to describe like part of the value
they can provide
as you just expressed.
Also, I think it's interesting
because switchboard operator
is,
you know, from a technical perspective
is you get into the area
of like orchestration, right?
Like,
so can you speak a little bit more to that?
Like when you think of switchboard operator, like where does the human end and where does the like orchestration begin?
Well, I mean, I'd say in an ideal world, the human part ends at where it becomes wasteful
for a human to do the thing. So that would be actually doing the data export and your own kind of manual data processing and reporting.
Where it's doing the same kinds of things based on the same rule sets, very often, over and
over again, that's where you should rely on orchestration. The human part should really
be used for, you know, what humans are good at, which is creative problem solving and kind of
making these connections. You say, okay, so we have payment data here in Stripe, things like that,
or Intercom for messaging.
All these people go into these games or this app or this website. You really
just work with all those people that want to do things, and
really just go and say, okay, what do you want to do that you
can't right now? And they say, like, oh, okay.
I don't know when my customers are renewing their plans.
And so I always have to manually check. Either I have to ask our
developers and they pull it out of the application database, or I go into
Stripe or whatever myself. And you say, okay, and which tool do you work in?
Oh, I live and breathe Salesforce.
It's like, okay.
So in other words, if I can get the renewal date up to date automatically
in your Salesforce, like account object for, for that customer, you know,
would that help you like, yes, that would be great.
So really, that's where that switchboard comes in, right?
You say, okay, I have all these different kinds of data sources
and/or destinations (tools tend to be both), and you say, yeah, we can bring that over.
It's helping them do their jobs because then they start thinking about other things they can do with it.
It's like, oh, if I have this, I can build automated workflows off of that.
Okay.
So you get that renewal date in my marketing automation tool and say, hey, you know,
you're three months out from renewal.
You know, we put them into a certain cadence.
Our prices are going up, you know, renew now and you lock it in or whatever the case may be.
And that's, I think, where the human part adds value, because, you know,
Ctrl-C, Ctrl-V in spreadsheets might be necessary
at times, but it shouldn't be people's entire job all day, every day.
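The renewal example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Account` record, the `accounts_due_for_renewal` helper, the account names, and the 90-day window standing in for "three months out" are all hypothetical, and a real setup would read renewal dates from the billing system and write them to the CRM rather than use an in-memory list.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Account:
    # Hypothetical record: in practice this would come from the billing tool.
    name: str
    renewal_date: date


def accounts_due_for_renewal(accounts, today, window_days=90):
    """Return accounts whose renewal falls within the next `window_days` days."""
    cutoff = today + timedelta(days=window_days)
    return [a for a in accounts if today <= a.renewal_date <= cutoff]


accounts = [
    Account("Acme", date(2023, 1, 15)),
    Account("Globex", date(2023, 6, 1)),
    Account("Initech", date(2022, 11, 1)),  # already renewed, so not selected
]

due = accounts_due_for_renewal(accounts, today=date(2022, 11, 23))
print([a.name for a in due])  # accounts to put into the renewal cadence
```

Once a rule like this runs on a schedule, nobody has to look the dates up by hand, which is the point made above about where orchestration should take over from the human.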
Yeah, absolutely.
By the way, Eric, I think there's at least one company related to
data with the name Switchboard.
Oh, is there really? I didn't know.
I mean, I don't know if it's still out there, but there was, like, a couple of years ago.
Oh, interesting.
So yeah.
Uh, we should start domain squatting either way.
Ah, yeah.
I think we're going to make money on them.
100%.
Probably one of our listeners is already doing that.
I mean, even if you look at the main overview of something like
RudderStack's or mParticle's dashboard.
What's the first thing you see?
You have sources, you have destinations.
Yeah.
And all kinds of squiggly lines going between them.
Yeah.
That's your switchboard, right?
Yeah.
A hundred percent.
So Max, you have seen many different environments, right?
From a startup to Warner Bros. to running your own business and helping other
businesses implement their data
strategies. What was surprisingly common for you across all these different sizes and companies?
Yeah, surprisingly common.
Okay. Especially with the clients that I've been working with since going full-time
as a consultant, I'm dealing with what I tend to call
the free-to-paid migration.
Oh, we have spreadsheets in Google and just, yeah.
The most common thing that, well, that was not surprising was people say like,
well, but you know, that costs a lot of money, right?
I need talent to, to run this.
The tools cost money.
Like, what I have right now is working and it's free.
Because for some reason, a lot of people tend to think that employees
don't draw a salary.
So the three people that you have, or four or 10, who spend a not insignificant
amount of time importing and exporting spreadsheets, you're paying those people.
Those people could be doing more value-add things, because moving
data from A to B doesn't add value.
Making decisions based on it, that can add value.
So that was really very common. It surprised me a little bit, but not as much.
What really surprised me was that a lot of companies make incredibly important decisions for everything, for
product development, to marketing spend, to even like hiring decisions based on
like, oh, we need more support people or on incredibly, not even imperfect data,
but basically data that has the same accuracy level of throwing a dart at a board.
Right.
So the underlying problem there is, to some
degree, data literacy, tech and web literacy, where they go like, oh,
what do you mean ad blockers make it so that Mixpanel or
Google Analytics or whatever isn't working? Or that privacy features mean I don't
see that this person did this thing?
And they think they made good data-driven decisions, like
you're supposed to. Then you go in with something like a CDP implementation, take some
precautions to make sure that we actually are capturing data accurately and
completely, and all of a sudden their funnels look drastically different.
And they're like, oh my God, we were going to spend a million dollars over the next six months on this area to develop it out, and nobody's using it.
Yeah.
You know, whereas before, the product decisions were based on anecdotal evidence from the support reps.
You have the vocal minority problem, right?
People complaining very loudly, saying, oh, this is the worst thing.
This is the most important thing to fix right now.
And you look objectively into data and say, yeah, okay.
3% of my revenue accounts have expressed that as a problem.
Yeah. And how, I mean, how long does it take for a company to start like
figuring out that something is wrong there? Right. Because there is something wrong. I mean, okay,
we are using data to make decisions, but if the data is wrong, I guess, for how long can you be lucky, right?
Something, at some point, will go wrong.
So how long does it take?
And what is usually the company's first reaction?
That's a good question.
It takes longer than you'd want or that you'd like to admit, because
even when they start feeling a pain point, right, and they're like, oh,
my performance marketing people are
so busy all the time, right?
They think, okay, yeah, that's just the way it is.
It usually takes even longer then to realize that, well, maybe if I need to
hire another analyst and another analyst, really just kind of cloning
people to do the same thing, there may be a better way to do that.
That is something that takes very, very long for companies to
realize. You'd think it'd be this big thing.
Oh, we did something horribly wrong.
We had this big catastrophic event.
We need to reevaluate.
No, it tends to be, at least in my experience on the SMB side of
things, which is a lot of the clients that I work with.
Just this long drawn out process where at some point they see, okay, we're doing things with
data, we're collecting data, but I, am I doing it wrong? Because like it feels a lot more painful
than it should, especially when I look at,
you know, marketing materials for analytics tools, or even if you look at a demo data set that Mixpanel gives you, right, with this detailed, beautiful
information, and it's like, oh, we don't have that.
Why not?
Those tend to be the click moments,
when they start to look at potential solutions and then compare
the demos, or the good state of things in those tools, with what
they currently have, and see that they have an issue.
Have you ever had the experience of going to
a customer who asked for your help,
and you reached the point where you had
to tell them that data is not going to help them in what they are doing? Because
we tend, especially like for us who are, you know, like we are working with data,
right? Like that's, that's our job.
That's how we, like what we are, like, we take it for granted that like data is
like something super important, but data not always is able to give you like the answer that you
are looking for.
Right.
So have you ever experienced that as part of consulting, having to make
the customer realize that no, it's not an oracle that's going to give you, you know,
the exact thing that you are looking for?
Yes and no.
I mean, on the one hand, no, because, you know, my consulting practice is built around
customer data infrastructure, right?
So telling people data is not the answer.
It's not exactly the best sales pitch for me.
But what I have seen that kind of goes into that is when people want to prove
negatives, right, which is inherently hard, so that's a thing
that data can't really help you do.
When it's, oh, we want to see, especially, the why behind it.
So it's like, oh, you know, we have a great customer funnel and we want to know
not just where, which is the easy thing,
where do they drop off, but why are people dropping off there?
And it's like, how can our event tracking tell us why they're dropping off?
Those kinds of whys are something that data really can't tell you.
You could get experts in, from UX or whatever, who look at that step and say, that's probably where you lose them.
Well, you can run an A/B test to confirm whether or not that is the case, but that's about the closest you get. Data doesn't tell you
why something is happening.
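To make the "where" half of that distinction concrete, here is a minimal Python sketch of a funnel drop-off report. The step names and counts are made up for illustration, and `funnel_dropoff` is a hypothetical helper, not something from any particular analytics tool. It shows which step loses the most users, and it cannot say why.

```python
def funnel_dropoff(step_counts):
    """Given ordered (step_name, users_reaching_step) pairs, return one
    (transition, users_lost, drop_rate) tuple per consecutive pair of steps."""
    report = []
    for (prev_name, prev_n), (name, n) in zip(step_counts, step_counts[1:]):
        lost = prev_n - n
        rate = lost / prev_n if prev_n else 0.0
        report.append((f"{prev_name} -> {name}", lost, round(rate, 3)))
    return report


# Made-up numbers, purely for illustration.
steps = [("visit", 1000), ("signup", 400), ("checkout", 100), ("purchase", 80)]
report = funnel_dropoff(steps)
worst = max(report, key=lambda row: row[2])

for row in report:
    print(row)
print("largest relative drop:", worst[0])
```

The output points at a step to investigate. Explaining the drop still takes a UX review or an A/B test, as described above.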
Yeah, yeah.
Makes total sense.
All right.
So, you've been mentioning customer data platforms.
Like the concept of CDP has been around for a while, right?
And it has gone through a few iterations and from what it seems there are more coming.
Can you like take us through like the evolution of the CDP and let's start
with that and then I have like a few more questions about like specifically
about like the CDP infrastructure.
Yeah.
I mean, if we go really far back,
let's say the early 2010s, customer data platform really wasn't a big term then.
But what you had a lot of was these third-party data platforms, right?
Where you have your tracking pixel or whatever on a website.
And then, with third-party cookies and everything, it tries to get other data about your visitors, right?
Buy the MasterCard data set, right?
And then it will tell you, oh, your customers or your visitors are in this
income range or whatever, and you try to do advertising based on that.
I think that's kind of where that really started.
So it was really third-party based, and it then moved slowly into more first-party
platforms, where usually a SaaS tool of some sort all of a sudden has
a concept of user data and user attributes. Instead
of just, like in Google Analytics, tracking events
for my conversions, now I all of a sudden work more
with users individually.
Right.
I can look at what this specific user has done and change my
website or my app accordingly.
So that's kind of where it moved to.
And I think now we're going more into a space.
I mean, there's kind of two schools of thought.
One is kind of a little more open approach, where you say a customer
data platform is more of a really good way to kind of, you know, like
ingest and export data.
We talked before, like the orchestration side of things, right?
And the more closed approach would be, you know, give us the data.
We're a silo that contains all your data,
and we can kind of do some of these computations for you, and we can build
these user lines for you inside our platform, you can do this based on your
first party data inside our tool, be it in product analytics, be it a, you know,
advertising tool, or even, you know, a CDP itself, you just store data in it.
And what would you describe as, let's say, a
reference architecture for a CDP today?
Like an ideal CDP infrastructure for a company, an SMB, let's say.
Because I get that these things probably change with the size and the complexity of the organization
where you have to implement them.
But let's consider, let's say, an SMB that's pretty agile.
They're just starting, so they don't have something already in there.
So there's a lot of freedom choosing what to use.
So tell us a little bit about that.
Yeah.
I mean, the core requirement for that kind of company, for a CDP
specifically, would be, let's say, your standard
implement-once approach, right?
Data collection is instrumented via the CDP, and it then handles your destinations, right?
It makes sure that data gets to your warehouse, if you have one, and also
handles client-side delivery of things, if we're talking in the context
of a website, so you don't have to make sure yourself that your Intercom knows that
this person has logged in and is
Joe Smith, as opposed to random anonymous user X.
I think that's the core piece of it, really.
It then really depends on the complexity required.
Do you need a dedicated data storage solution, either a data lake, a data warehouse, or a blend?
Do you need to have complex data transformation abilities inside the CDP?
Especially if you're building something from scratch, new, data can influence
the architecture of that product, of whatever asset you're building or selling. Which means that if you influence that from the data side and build with data in mind, not from a perspective of data-driven decision making, but making sure that the way things are structured makes it easier for you to get access to the data,
you don't actually need that much of that in the CDP, right?
Because if you build it right from the source, your CDP needs to do less work.
Whereas in an existing situation, at an existing company, the CDP might have to do more
heavy lifting, because you get what you get based on how the application is built.
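The implement-once idea described above, where the application makes one tracking call and the CDP fans it out to every destination, can be sketched roughly as follows. `MiniRouter` and its methods are hypothetical names for illustration only and are not the API of RudderStack, Segment, or any other product. The two in-memory destinations stand in for a warehouse and a messaging tool.

```python
from typing import Callable, Dict, Optional

Event = Dict[str, object]


class MiniRouter:
    """Toy stand-in for a CDP: one track() call, many destinations."""

    def __init__(self) -> None:
        self._destinations: Dict[str, Callable[[Event], None]] = {}

    def add_destination(self, name: str, handler: Callable[[Event], None]) -> None:
        self._destinations[name] = handler

    def track(self, user_id: str, event: str, properties: Optional[dict] = None) -> Event:
        payload: Event = {"user_id": user_id, "event": event, "properties": properties or {}}
        for handler in self._destinations.values():
            handler(dict(payload))  # each destination receives its own copy
        return payload


# Two fake destinations: a list acting as warehouse rows and a dict
# acting as the messaging tool's view of who the user is.
warehouse_rows = []
messaging_profiles = {}

router = MiniRouter()
router.add_destination("warehouse", warehouse_rows.append)
router.add_destination(
    "messaging", lambda e: messaging_profiles.update({e["user_id"]: e["event"]})
)

# The application instruments the event once.
router.track("joe-smith", "Logged In", {"plan": "pro"})
print(warehouse_rows, messaging_profiles)
```

Adding a new tool then means registering one more destination, with no change to the tracking call in the application.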
Yeah.
Yeah.
So I mean, outside of, okay, the most basic and important piece of information,
which is the user interactions themselves that you need to track in order to have a CDP.
What other data sources do you think that are really important to include in a CDP as
soon as possible?
As soon as possible, I'd say data storage.
Because you can do that as dirt cheap as you need it to, right?
Something like an Amazon S3 data lake, even for, you know, hundreds of thousands of events, like per week, it's gonna, it's gonna cost a few dollars.
So that's, that's not a big investment.
It's easy to set up.
And that way you can immediately kind of start collecting this nicely standardized-schema data, you know, early in your data collection.
I think that's kind of the first thing, because you can always grow from that.
Later, if you have, like, complex data modeling requirements,
okay,
put Redshift on top of your S3 data lake or, you know, whatever.
Same if you're on the Google cloud, right?
You know, BigQuery can support, you know, bigger things as it grows.
But I think that's kind of the first thing that you really want to get into once
kind of your core user interaction is covered. The next after that is
what I would call catalog data, which is, you know,
setting up ETL pipelines for, well, your catalogs of data, these other tools
that you use: your CRM, your billing system, your ticketing system. Usually,
you know, unless it's a very homegrown or obscure tool, there's a variety of ETL providers that can just take whatever is the state of
that tool and move it into that warehouse or data lake, right?
And then you can all of a sudden run analysis on ticket volume or
whatnot that you couldn't do before.
Makes total sense.
And you mentioned a couple of times the term complex modeling needs.
So what kind of modeling do we usually have to do over this
event data? Because event data is not the most complex type of data out there. It's not like you have a schema that's super complex.
But I have a feeling that there are a couple of things that need to happen on top of this
data that require quite some processing.
So what are the first steps in turning this raw event data into something
that is more, let's say useful for someone who wants to do like analysis?
Yeah, for sure.
I mean, I kind of tend to classify these into like three buckets and I call it
like, you know, bronze, silver and gold level data, right?
Where bronze is your raw sewage, right.
That's, you know, the data that you're pulling out of some legacy
system, or, you know, if you're just ingesting a webhook from some tool, right,
where you don't do any processing. Silver would be more kind of standardized
things.
So that's something that, you know, RudderStack and the like throw
into your data warehouse, right?
It's very standardized.
Okay.
Every property of my track call becomes a column in the table of the same name.
And that is good.
And the ETL data falls into the same category, right?
You just get a certain fixed schema and you can run analysis, but generally it's
extremely long with a lot of pasting.
The gold data is what I would call things that you can directly, you know, you can
hand that to the board, you can hand that to the executives, right?
Those are things that power the dashboards and scorecards for whatever kind of leadership. They don't care where it comes from; they just want to see:
okay,
last week I got this many people into the funnel from these different search engines.
So the modeling there is not necessarily that complex, right? It's often just doing some basic, you know, trait computation.
Has a user interacted with a certain feature that you're
tracking, you know, in the last seven days? Have they logged
in in the last seven days? You know, it's like making these booleans. Or what's
the, you know, if you take a video game, right,
what's the average, like, you know, dollar amount that the
user has spent on their purchases?
Right.
Those are not complex questions, you know, to answer, but they're not
necessarily readily available from just, you know, your silver data.
So that's, that's these kinds of queries that you build there.
The bulk of where your more complex data.
I mean, it goes way up from there if you have complex resolution problems.
But usually the win to start out with is just lines, you know?
Yes.
This user has interacted with this feature.
Then rolling up to the account level,
it's how many, what's the feature adoption rate for
feature X on this account?
Yeah.
Yeah.
Easy math or SQL.
Yeah.
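The kind of trait computation described above can be sketched in a few lines. This is a minimal illustration, not anyone's production model: the event names (`login`, `feature_x_used`), the row layout, and both function names are assumptions made up for the example, standing in for a silver-level table of tracked events.

```python
from datetime import datetime, timedelta

# Hypothetical silver-level rows: one dict per tracked event.
EVENTS = [
    {"user_id": "u1", "account_id": "a1", "event": "login", "ts": datetime(2022, 11, 20)},
    {"user_id": "u1", "account_id": "a1", "event": "feature_x_used", "ts": datetime(2022, 11, 21)},
    {"user_id": "u2", "account_id": "a1", "event": "login", "ts": datetime(2022, 11, 1)},
    {"user_id": "u3", "account_id": "a2", "event": "feature_x_used", "ts": datetime(2022, 11, 22)},
]


def user_traits(events, now, window_days=7):
    """Build per-user boolean traits from events inside the lookback window."""
    cutoff = now - timedelta(days=window_days)
    traits = {}
    for e in events:
        # Every user seen in the data gets a row, even if inactive in the window.
        t = traits.setdefault(
            e["user_id"],
            {"account_id": e["account_id"], "logged_in_recently": False, "used_feature_x_recently": False},
        )
        if e["ts"] >= cutoff:
            if e["event"] == "login":
                t["logged_in_recently"] = True
            elif e["event"] == "feature_x_used":
                t["used_feature_x_recently"] = True
    return traits


def feature_adoption_by_account(traits):
    """Roll user traits up to the account level: share of users who used feature X."""
    totals, adopters = {}, {}
    for t in traits.values():
        acc = t["account_id"]
        totals[acc] = totals.get(acc, 0) + 1
        adopters[acc] = adopters.get(acc, 0) + int(t["used_feature_x_recently"])
    return {acc: adopters[acc] / totals[acc] for acc in totals}


traits = user_traits(EVENTS, now=datetime(2022, 11, 23))
adoption = feature_adoption_by_account(traits)
```

The same logic is a pair of GROUP BY queries in SQL, which is the point made in the conversation: the modeling is simple, it just is not sitting ready-made in the silver tables.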
Do you see like the need of using something outside of SQL for that?
Like using Python, for example?
Like what's the tooling that, I don't know, you prefer to work with, right?
And we see also these days more and more tools that are a little bit more hybrid.
It's like you have notebooks, like Hex, where you can mix
Python together with SQL, so you have a lot of flexibility there.
But what's your experience with that?
And what's your preference?
Because at the end, it's also like a matter of preference.
Yeah, for sure.
I tend to try and stay away from notebooks.
Not that I dislike them.
They absolutely have their use case. For kind of my customer base,
SQL is often enough because at the end of the day, we want to have these
finished data sets that we can plug into visualizations or that we can
plug into, you know, reverse ETL to send them to other, you know, CRM and
ticketing tools. And kind of outputting standard
tables and stuff like that is something that is super, super easy in SQL.
And as I just said, the complexity is generally not immensely high.
So taking a Python notebook and going into, you know, pandas and data
frames is, is just overkill at that scale.
Yeah. Yeah.
Yeah.
Makes sense.
All right.
My last question.
So before I give it back to Eric: we used in our conversation quite a few terms that are usually associated with what we call the
modern data stack.
And this data stack, or anyway, this idea has been around for a while now, right?
So I'm sure you've seen many different products, different techniques, methodologies, best practices.
I'd like to hear from you what, based on your experience, is really working as part of the
modern data stack, let's say, what is here to stay.
And in addition to that, where are you excited that more
stuff is coming, where are you expecting to see more innovation happening?
Right.
I mean, on the modern data stack, yeah, the buzzword term, I think the worst aspect of
the modern data stack and associated marketing is, for me, modern data stack is a holistic
approach to handling your data, right?
It's not a tool.
It's not RudderStack.
Love you, but, you know, it doesn't make a modern data stack.
It is part of it, because you need to have good event data ingestion and
transmission and ETL capabilities, but it's not just a tool itself.
So when tools brand themselves as like, oh, you know, our tool is
the modern data stack, it does, you know, everything with data that you could possibly want,
I get, you know, a little doubtful there. Because the more complex the tool
gets, the more use cases it tries to cover, the more the, you know,
jack of all trades, master of none problem tends to happen, where
you have problems that your use case doesn't work because the graphical
user interface of this tool doesn't let you click this certain thing that way.
So where I want to see things going, where I see a lot of things going, is using existing, standard, open practices for data modeling: SQL, or if it's a little more heavy, Python, because there's a wide swath of people that have these skills.
They know Python, they know SQL, they know JavaScript or whatever tool of choice.
And not necessarily kind of locking people into a proprietary, you know, tool that
has its own programming language. You know, look at something
like Salesforce where you have Apex, right?
Yeah.
It's Java, you know. Hey, you don't need to call it that, but it's
Java with some extra bits. But it's still its own thing.
And it's very kind of closed off.
Whereas, you know, dbt on the other hand, yeah, you build your data
models with SQL because it's data and you can interact with it with SQL.
That's what, you know, makes sense.
All right.
I, I agree with you.
Actually, I was thinking, you know, there's like a lot of, there is part of, let's say, the data industry and technologies that are like super open.
Like anything that has to do with the database, usually like, okay, outside of like Snowflake, but most of the database systems out there are like open source, right?
There's this tradition that, okay, you can't really go to market with your database without going through open source at the beginning.
But I think there are also many other parts of the data stack
that feel extremely closed source, let's say.
Like you see so many products out there
that are not open source
and like whole categories of products
that are actually closed source.
And I'd love to see more openness there too.
I think it's also important as the developer experience becomes like more
and more of an important aspect of building a product because it's not just
users that you have that are interacting with the product, you have developers also.
So yeah, like I hear you and I totally agree with you.
Like the modern data stack needs to be also a little bit like more open, I think.
At least in terms of like the source code.
Very cool. And open standards, right?
And open standards.
Not to bash Salesforce too much.
It's a great tool, but, you know, take a very common, standard concept, one of the best practices: webhooks, which are super, super useful.
Salesforce can do webhooks.
It's very complex to set up, and by default, unless you add extra third-party
tools into your Salesforce, it's XML only, which was great, you know, 10, 15
years ago, but not great when all your other
tooling these days, you know, either requires or just works better with, you
know, JSON. Openness, interoperability between tools, right?
Because I mean, that's, that's my thing, right?
Customer data infrastructure.
That's what my company does.
Making sure it goes from, from A to B.
So the more open these tools are, especially for, like, getting data in
and out of these tools reliably and in a way that's easy to use, that is my biggest
kind of wish for the industry.
Yeah.
Makes a lot of sense.
Eric, what do you think?
Oh, well, I'm going to shift the focus for the last question because what we are discussing is the tip of the iceberg in the conversation around how data companies make money off of artificial scarcity.
And that is a whole other episode. And we are really close to the buzzer, but I have a lot of very strong feelings about that.
That's why Salesforce's API hasn't changed, right?
There are very, very lucrative reasons for that.
But that subject aside, which we definitely need to tackle on a future episode because it's such a good one.
Max, in our last few minutes, this has been such a helpful conversation.
Let's look to the future.
So you've given us like a really good picture of the current state,
you know, sort of challenges,
like ways that you're solving it.
What are the, you know,
so you mentioned open standards,
but could you get a little bit more specific
and talk about maybe some new technologies
that you've seen or even like ideas that you've seen that you think are going to sort of create not the next SQL optimization, you know, but the next sort of stepwise change in like the way that you would advise companies to build their customer data infrastructure that's like pretty different from what you're doing today.
Especially for customers.
Okay.
I mean, there's two parts.
So the first one, as far as a comparison of kind of more closed versus more open
standards, would be to take something like Zapier and compare
it to a company called Pipedream.
You know, both principally do similar things, right?
A trigger, some data changes, sending something off somewhere, right?
These kinds of things.
That is interesting.
Conceptually, like, they're both like almost identical in functionality as like a pipeline.
Yeah, but they work very differently.
In Zapier, you have extremely little kind of control.
You can't really change your data.
It's like, oh yeah, you can create a conversation in this tool when this
incoming thing from the other tool happens, and it's very, you know, one to one.
Right.
And Pipedream.
Yes.
Pipedrive,
that's a CRM.
Pipedream, you know, works similarly, except that you can just define as many steps as
you want and you can mix in, you know, kind of pre-built data transformations.
Oh, now I want to run a Node.js function.
Now I want to run a Python function after this, based on certain outcomes, you can also
like branch off.
Right.
Hmm.
So similar concept, but extremely different approach to, to
then actually working with it.
And one being a lot more open because you can just do a lot more basic things
with your own script, Python, whatever your tool of choice is.
What I would advise clients to do is build your thing, application, your site, your web
app, whatever, with data in mind.
I mean, it's easier said than done.
But basically, don't kind of mix and match everything into the same kind of database table.
Try and separate out your concerns, right?
Because there's going to be someone that has to work with the data that
isn't the application code itself, right?
Be that an ETL process that you need to get some things out of there because
you just can't collect it otherwise, or something that deals
with some, you know, data processing, you know, or data consumption.
It might make your, you know, application a little more complex because
you might have to work with it in terms of calling
an API, like my own API for my tool.
But then that's a standard thing that can be reused elsewhere,
via a CDP, for example.
Right.
So one example for that, you know, we're getting near to the buzzer and I'll make
it short, is let's say a company has a very manual process of doing, like, sales, right?
Somebody emails them and says, like,
hey, I want to place an order for this.
They go into a payment processor and create an invoice, email that off
to the customer, and at some point it's marked as paid in Stripe or whatever.
Then they go into their admin backend and, you know, put in
the shipping order or whatever it is.
Well, you know, if you built it so that creating the
shipping order or making the invoice is exposed as a kind of internal API,
well, you could use something like a CDP where Stripe can send a webhook, listen for those invoices being marked as paid,
and then you call your own company's API.
And now this person that had to manually go to, like, five different tools to make a purchase happen doesn't have to. And they can focus on, I don't know, better customer service, you know,
end-of-the-month upsells, whatever, or they can simply, you know,
make, you know, 10 times the revenue because, you know,
they don't have to manually process data so much.
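The flow in that example can be sketched as a small webhook handler. This is only an illustration under stated assumptions: the payload is a simplified version of a Stripe-style `invoice.paid` event, signature verification and retries are left out, and `create_shipping_order` is a hypothetical stand-in for the company's own internal API.

```python
import json


def handle_payment_webhook(raw_body, create_shipping_order):
    """Turn a 'paid' webhook into a call to an internal shipping-order API.

    raw_body: JSON text of the webhook request body.
    create_shipping_order: callable standing in for the company's own
        internal API (hypothetical); receives customer_id and invoice_id.
    Returns whatever the internal API returns, or None if the event is ignored.
    """
    event = json.loads(raw_body)
    # Only react to invoices being marked as paid; ignore everything else.
    if event.get("type") != "invoice.paid":
        return None
    invoice = event["data"]["object"]
    return create_shipping_order(
        customer_id=invoice["customer"],
        invoice_id=invoice["id"],
    )


def fake_create_shipping_order(customer_id, invoice_id):
    """Test double for the internal API; a real one would make an HTTP call."""
    return {"status": "created", "customer_id": customer_id, "invoice_id": invoice_id}


paid_event = json.dumps(
    {"type": "invoice.paid", "data": {"object": {"id": "in_123", "customer": "cus_456"}}}
)
other_event = json.dumps({"type": "invoice.created", "data": {"object": {"id": "in_789"}}})

result = handle_payment_webhook(paid_event, fake_create_shipping_order)
ignored = handle_payment_webhook(other_event, fake_create_shipping_order)
```

Passing the internal API in as a function keeps the handler independent of where it runs, whether that is a CDP transformation step, a workflow tool, or a small service of your own.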
Yeah, no, that's so helpful, Max.
And I think, you know, really when we think about building some sort of digital experience, you know, if I had to summarize what you're saying, it's that you're building it not only for an end user but, you know, for an additional user, which is the internal
consumer of the data that the application produces, right?
And really, there are probably multiple internal, not probably, there are definitely multiple
internal consumers of that data, right?
Be it, you know, marketing or product or customer success.
That's exactly, that's a great way of putting it, right?
Because the business models have changed, right,
over the last year or so.
What caused this when we talked before, right?
Just saying that, you know, you were going from third-party data, then it slowly
transitioned into first-party data.
Now a lot of companies just have a lot more first-party data, where consumers of
that data are not just
the customers themselves, but these internal folks. If you only care about making it work for your customers,
you have a horrible or time-consuming experience for your internal needs.
You know, that's either inefficient or you're going to have, like, a staff retention problem, or onboarding takes six months because somebody has to explain.
Oh, three pages off, like for, you should think that they like work plus.
Also you log into this tool and then you export this and you take
from a, a, B and six or whatever, you know,
AABB, you know, 42.
But the other thing I would argue, and this is probably a great point
to end on, to use your words, at the buzzer, is that that ultimately results in a bad experience for
the end user, right? Like, there may be initial gains there, but ultimately the people who
are trying to make that experience better are pretty limited because they have limited visibility.
Max, this has been such a great show. As always, tons of insights.
Congratulations on, on having employees and running your own thing,
which actually before we sign off,
if we have listeners who want to talk with you about anything, you know,
maybe some work, but also, you know, anything you talked about in terms of data flows or solving problems, where can they find you?
Well, yeah, you can find me at obsessiveanalytics.com.
That's, you know, where you can get in touch with me there.
You can, all those things exist there as well.
Obsessive analytics.
That's what I do.
Great.
Obsessiveanalytics.com.
Well, Max, thank you so much for your time.
And I will have you back on again soon.
Sounds great.
Can't wait.
Costas, I think one of my big takeaways is the switchboard analogy.
I know we talked about that on the show, and I know I said during the show that I liked it. But I just thought that was a really helpful analogy
in terms of describing a unique way that someone on a data team can provide value.
And it's that, sure, they're making connections between technologies, but the intuition of the
right connection to make, depending on
the circumstance, you know, can play a big role.
So, like all analogies, it's imperfect, and, you know, I'm sure that's been used
before, but I hadn't heard of it before.
And for me, it was just a really great description of sort of one of the like
value add parts of a data role.
Yeah. What I found, like, from my side, like super interesting is how common some
patterns and some issues are, regardless of the size of the company.
And that, I think, is very exciting for the industry.
Because it means that like there, there's probably a way like to address problems that can scale like from a startup
up to a large enterprise in a way.
So that's what I'm going to keep.
And that's something that like, obviously like personally excites me a lot as I'm
part of this industry and work in this.
So yeah, that's what I'll keep. And I'm
looking forward to having him on again, to be honest. I'm pretty sure that's the good thing with
consultants. They always have new stories to share. So let's have him back on the show again soon.
Let's do it. And yeah, thanks for listening. As always, subscribe if you haven't, and we will catch
you on the next one. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe
on your favorite podcast app to get notified about new episodes every week. We'd also love
your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That's E-R-I-C. Thank you.