The Data Stack Show - 54: The Center of the Modern Data Stack with Neil Rahilly of Mixpanel
Episode Date: September 22, 2021
Highlights from this week’s conversation include:
Neil’s programming hobby turned into a career and how he cold-contacted Mixpanel for a job (2:28)
Lessons learned from nine years at Mixpanel (5:05)
Defining product analytics (8:06)
How Mixpanel has evolved into the product it is today (10:56)
The importance of Mixpanel’s real-time analysis (19:52)
Looking at Arb, Mixpanel’s own arbitrary segmentation database (23:44)
The business impact that the rise of the cloud data warehouse had on Mixpanel (34:56)
Sub-second latencies and real-time use cases (49:05)
Career advice from Neil (1:02:02)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are
run at top companies.
The Data Stack Show is brought to you by RudderStack, the CDP for developers.
You can learn more at rudderstack.com.
We're back on the Data Stack Show. And today we're going to talk with a company who has been in the data space for a decade.
And that's Mixpanel.
And we're going to talk with Neil, who has done a variety of things there, but now works
with the product and customer
experience teams. And he has been at the company for almost a decade, which is going to be a really
interesting conversation. And that's what my burning question is about. When you think about
Silicon Valley companies and startups, there tends to be a shorter tenure than a decade. And so I think his perspective on staying there for that long
and seeing the market, the product, and the company change
is going to be fascinating to hear about.
How about you, Kostas?
Yeah, I think I used Mixpanel for the first time in 2011 or 2012,
something like that.
So I'm super excited to hear from him about the evolution
of the product this past decade. Many things happened in terms of products around data,
and yeah, I'd love to hear the stories from him about how the product changed, how the company
changed, and all that stuff. And that's one thing. The
other thing is he's a very, very experienced person when it comes to products. And I want
to see what kind of advice we can extract from him around product strategy and product execution.
Great. Well, let's dive in and talk with Neil.
Let's do it.
Welcome back to the show. Neil, we're super excited to talk with you. You
have a lot of experience at Mixpanel and a lot of experience with data, and we are just super
pumped to chat about all sorts of things with you.
Yeah, it's great to be here. Love the show.
All right. Well, give us your background. I would love to know where you came from and what you do today at Mixpanel.
So I joined Mixpanel about nine years ago. Prior to that, I was actually in law school
and had just started programming as a hobby because I was really interested in the sort of
creative change happening on the web with Web 2.0.
And it was the very beginning of mobile.
And then I made this kind of sharp turn where I decided this is something I actually really love.
And so I reached out to a few companies at that time.
You know, I was reading Hacker News all the time.
And I'd actually started to use Mixpanel for an app that I had built and really loved it. And actually, I know that Mixpanel initially just
sort of dismissed my email, but eventually was searching enough for engineers that they
were like, all right, go back to the inbound pile.
Such a good story.
And so I came out and interviewed and thankfully got the job.
And then for the first few years I was at Mixpanel, I worked as an engineer.
And at that time, I was just really learning from the founders and learning from the CTO at the time,
getting sort of a real education in software development, software engineering,
and eventually started managing the infrastructure team and then became the sort of director of
infrastructure and then eventually the VP of engineering. And then two or three years ago,
switched over to lead product and design. And then more recently, I also have the support team and the customer
success teams report to me. So we sort of run product and design, support, and customer success
as one thing that's really focused on the user experience, the customer experience at Mixpanel.
And I do that in close partnership with the VP of engineering. And yeah,
that's what I do today.
Man, incredible journey. Okay. Lots of things to discuss, but
nine years at a formidable company like Mixpanel is incredible. When you look back
at the arc of your experience there, is there anything that sticks out? Because a lot of
times people do a couple of years of tenure. And I think the perspective you have being at
the same company for nine years is just really interesting. Is there anything that sticks out?
Yeah. I mean, lots of things stick out, I think. And yeah,
part of the great thing about the experience is that I've been part of Mixpanel going from six or seven people to
hundreds of people.
And I think that along the way companies change a lot. And so
I've had to be intentional about wanting to be adaptable enough to not get frustrated by change
and embrace what the problems are and learn from those at each stage. Some things have been consistent though.
I think one is that it's just always been a company
that's attracted a really eclectic
and awesome group of people.
And so I stick around for that.
And then the other is that it attracts
a really amazing group of customers.
So we often are helping startups.
And so we're seeing so many interesting companies, many of which become huge companies, when they were just a handful of people.
And so our customers are often these really innovative, creative founders or teams doing new and interesting stuff.
And I also think that Mixpanel is lucky. It's at this
sort of intersection of a bunch of really big trends. And so I still think it has a lot
of unrealized potential left to make even more of an impact and be a bigger company
than it is today. So for those reasons, I've just kind of stuck around and some people definitely
have the reaction of like, well, you stayed at the same company for nine years, but it's been great.
And I've learned a lot, so no regrets.
Yeah. I think that's awesome. And I think
just over a nine-year period, just responding to changes in the market and changes in technology,
I think going through those changes repeatedly in the context of a startup is
just hard.
You sort of have to rebuild things that you spent a lot of time building,
a lot of times because of outside forces, right?
And so that's pretty hard.
So good on you.
That is really awesome.
Let's step back a little bit.
So Mixpanel is a product analytics tool. And one thing that we've been doing on the show is sort of revisiting the 101 definitions. We did this with the data warehouse of all things recently
on an episode. And it's really helpful for me. And I think our listeners as well,
just to get a crisp definition of what a subject is that we all kind of think we know in the back
of our minds. And product analytics, I think can mean a lot of different things to a lot of
different people. So I'd just love to know from you, could you just give us the 101? What is product analytics in five minutes?
Sure. Yeah. So product analytics, you know, the purpose of the tool is to help product development teams,
so engineers, product managers, designers, the people who are making changes to the product
every day and trying to improve the product, inform their decisions,
which are primarily prioritization and design decisions, with data.
And the primary input is user engagement data collected as an event stream.
So all the interactions that your users are having with the
product, and then it makes it really fast and easy to explore that data and to visualize the types of
metrics that are really useful for products. So growth metrics, retention metrics, conversion funnels, and we're really
trying to deliver a productivity gain to the product development team. So helping them figure
out what are the most important problems to work on, to be able to measure how effective the
solutions you're building are and actually catch when you make a mistake
and actually make things worse for users
and really understand with all the changes
that you're making to the product
and trying to improve the product,
are they having the intended effect?
Because you want to get out of this sort of mindset
of shipping and celebrating shipping features
and really start measuring what the implications are for users and the business,
right? So you make changes to the product, but is that actually making users happier and making
them come back more often and driving more revenue? And so that's what product analytics is for.
Neil, I have a question for you.
I was listening to the previous question that Eric asked you,
and you were talking about the change that you have experienced at Mixpanel.
You've been there for nine years.
Pretty sure that you have experienced a lot of change.
And usually,
when we talk about companies and change, we focus more on the people and the organization,
but products also change. And they change a lot. And I think outside of some very core people in
the product teams or the leadership, the rest of us keep forgetting about that.
So would you like to take us through the journey of the Mixpanel product in these past nine years?
How it was at the beginning and how it changed until it became the product that it is today?
So how Mixpanel started: Suhail, the founder, had been using something at Slide, which was a gaming company that was acquired by Google.
He had been an intern there and they had really great analytics. I only know this
secondhand from him, but that let them understand users in the game and how they were
interacting with the games. And I think he had that sort of moment of, hey, this would be really broadly useful,
and so started building Mixpanel. Mixpanel was started in 2009, and very early on, I
think in the first year or two when it was really just the founders and one or two
people,
it had the key reports: event segmentation, which lets you do general analysis of that event stream that's coming in,
funnels to understand conversion, and retention to understand retention. And in the early running, I think the
issue was that user engagement data tends to be one or two orders of magnitude
bigger than your application data at least, right? So if you have an application,
you're storing like your users table, let's say you've got a million
users, but then you want to start tracking every time a user does something, then you're actually
now going to be collecting millions of data points per day. And so that rapidly becomes a really
big data set. And then the thing about product analytics is that you want to collect all that
data, but then you want to look back over it and really slice and dice it kind of arbitrarily and decide, hey, I want to create a funnel that goes from sign up to payment.
Or actually now I want to make a thing that goes from sign up to did these three things and then payment.
And so the query workload is really complex and unpredictable. And so in that early
sort of time, I think the challenge was the company went through a bunch of different options.
I think initially it was on MySQL and then there was some Mongo and Redis and then tried building
it on Cassandra and kind of kept coming up against these walls from
like a performance and flexibility standpoint. And so then we turned to building our own database,
which we call ARB, which is an arbitrary name for arbitrary segmentation. And that had just
shipped like right around when I joined the company. And I think that was a real inflection point for Mixpanel because it really allowed the
company to handle much, much higher scale data volumes and then also deliver this really,
really flexible interface.
And it really had that kind of magic feel at that time in sort of 2011, 2012.
And so Mixpanel just took off. And then I think another interesting moment is that right
in there, mobile really started gaining steam. I can think of a time where we
went to OpenTable, for example, and it's like a small company at the time.
And most companies were using Google Analytics, using Adobe Analytics, Omniture.
And we'd go to pitch them on Mixpanel.
And they were sort of entrenched in those products or would ask, how are you different from Google?
Of course, we were different from Google, but it was sort of hard to explain.
But as mobile came along, we'd say, well,
what about mobile? And they'd be like, oh, well, yeah, I think there's like a couple people down the hall working on an iOS app or something, go talk to them. And then you'd ask
them, hey, do you have analytics? And they'd say no. And with Mixpanel, that event model I was
talking about, there's nothing platform specific about it.
You can track an event from a mobile app
just the same way you can track one from a website.
And that was really a way for us
to get a foothold in a lot of places
and we sort of repositioned as mobile analytics,
even though really we're completely sort of cross-platform.
We've since positioned back to just product analytics,
but that gave us this kind of like explosive growth
with the iOS app store opening up
and just all those apps coming online.
And then I think our product evolved a bit based on
the fact that we had an SDK that was in basically all apps. And because of
ARB, because of the scale we could handle, because of the interface we could build at that point,
we were just by far the most popular product analytics tool. And we got into all the sort of
high growth apps. And so from there, I was like, okay, well, we've got this SDK,
we should just add more value. And so why don't we help our customers send push notifications?
And why don't we help them send surveys? And why don't we help them run A/B tests? And so we
offered more and more of these products. It was cool too, because
you could kind of configure them in our website and then boom, they'd be on because our SDK
was already installed in your app.
And I think what happened there was that we inadvertently spread out into
a lot of different use cases, different end users.
We'd now sort of spread from just serving product teams and engineering teams trying
to make product decisions.
And now we were helping marketing teams send notifications and surveys.
And then we got into some more sort of data infrastructure type stuff.
And we got spread a bit too thin.
And we were in too many different types of products for too many different types of people, which
made it hard for the company to focus and get to all the stuff that customers wanted
across that wide a surface area. And so the next phase was to really
refocus and go back to our core, which is, as I described, product analytics and helping
engineering, product, and design teams at leading digital companies and startups.
And so that was a tough transition because, of course, we had both people who had been working on those products and customers who loved those products.
But refocusing has been huge for us because, I mean, it's that old saying, like, just do one thing really, really well.
And you see it: our NPS has, like, tripled and our customer retention has gone up by 35%.
And so I think it was a good decision, but it was hard.
And that's probably the most dramatic part of the journey, at least in the last four
or five years.
And Arb, just on the infrastructure side, we're on like version three of that.
There's kind of some interesting stories on the infrastructure side as well.
That's super interesting.
And actually, I think it's amazing and very, very useful advice for everyone who is into product.
And what I take from what you say is that product management and product leadership in general is not just about building.
It's also about making the decision to kill products or features, right?
And this is a very important part of the work that product managers are doing or product
owners, like everyone who is involved in product.
And we shouldn't forget that.
And I really appreciate that you are sharing this, Neil, with us. I remember the first time that I used Mixpanel,
I think it was in 2013 or something like that.
I was impressed by the real-time nature of the product.
I remember, for example, I was very interested to figure out
what the users of my web application were doing. But then I figured
out that not only could I see that, but I could see it in real time, like someone was doing something
and I could see an event popping up there. And I was super impressed. And especially when you are
at the beginning of a new company, a new product where you don't have that much traffic,
it's an incredible feeling to see someone interacting with what you're building, right?
Yeah.
How important is the real-time nature of the product for Mixpanel?
I think on the ingestion side,
where as events get sent, they immediately show up in the UX,
it's really important for sure in the case
that you're talking about, when you're a smaller company and you don't have the kind of data volume or user traffic, so you may as well look at what's happening with every user that signs up. There's an emotional component there. Part of
what product analytics tools do is give people who are making digital products
visibility into what's happening with what they've built. So I'd just
say it's different than a restaurant. In your restaurant, you make the food, you sit
there, you can watch people eat the food and you can see if they liked it or not. Whereas when you put an app in the app store,
without analytics, you've got no idea what's happening. And so when you turn on a product
analytics tool, it's a kind of sight to the blind moment. And it's thrilling to
watch people all over the world signing up and using it. And you think, this thing that I just built at my desk, there's someone in Russia
using it.
And so I think it's important for that.
And I also think it's important for actually the implementation side.
Like, so when you are the one who's connecting tools to Mixpanel or instrumenting your application,
your servers and sending events to Mixpanel, we can really tighten that kind of feedback
loop if you curl an event to us and then it immediately shows up and you can see if it's
the way you want it.
And then you can keep editing your code until you're tracking things correctly. And so for that workflow,
the real-time ingestion really is key. Once you're a bigger app or you're making
decisions about products, features, and that sort of thing, I think it starts to matter less, and actually being too impatient
will start to work against you. You need to let some data get collected
and see some trends over time and then look at the aggregate stats. And in that
case, you're probably not making moment to moment decisions.
Really, product feature prioritization and design decisions are not super urgent.
Prioritization is probably happening weekly, monthly, quarterly kind of thing.
And design decisions will be getting made every day, but you don't need like instantaneous
ingestion latency, like real-time.
On the query side, I think it's always super important that the queries are at an interactive speed because that's what encourages people to explore the data and makes getting answers really
quick and easy, which means people will do it because it's not painful.
Absolutely. It makes total sense.
I think it's a very important part of the experience
that someone gets from Mixpanel,
at least based on my experience and working with it.
You mentioned the database,
the custom database that you built.
You called it ARB, right?
Can you tell us a little bit more about it?
I mean, you mentioned that the existing technologies,
the existing storage engines out there and query engines out there couldn't scale.
So you decided to build your own database system in a way.
Can you tell us a little bit more about it?
Like, what is it, first of all?
Is it a database like Postgres, like MySQL?
Is it something else?
Like, what does it look like?
Yeah. So I think ARB fundamentally is
more purpose built for our kind of workload. And that lets us make a
bunch of trade-offs. The key thing is that we know that the data is coming in as a user event stream. And so when you set up an instance for a customer in ARB,
the event table is the sort of core table.
And so we can make an assumption
that there's going to be a timestamp column
and, though it's actually technically optional,
a user ID column.
And I think the first thing to sort of understand about
it is that that lets us partition the data by user ID and by time. So we can distribute the
data across a lot of different shards and those are distributed by the user ID, which in a typical application means it's pretty even distribution.
And then it also lets us make this very critical assumption that all events for any given user
are going to be in a single shard. And then the second is that with the timestamp,
within the shards, we can partition the data by time.
And that really works for our query load. Because if you think about wanting to do sort of behavioral
analytics, right, where you want to say, look at a funnel, and you want to see what percentage of
users went through step one, step two, step three, what you really need to go do at a query level is
go look at each individual user's journey and see which steps each individual user made it
through and how many users made it to each step. And that means that we can do that completely
distributed. Like you can do that independently in parallel on each shard
because you know that all the steps for a given user
are in that one shard.
And then what you pass back up to the aggregation step
is the aggregate from each shard.
Like this many users made it to step one,
this many users to step two,
which is a very small piece of data to send over the wire. And if it weren't that way, then you would have to like
scan every shard to put together each user's history. And the same is true of retention.
The same is true of a uniques query. If you're just looking at totals and uniques, that kind of thing,
the key is that by sharding on user ID, you can just kind of sum up the values from each shard and you're not double counting anybody.
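Editor's note: the sharding argument Neil makes here can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not ARB's actual implementation. The shard count, the CRC32 hash-modulo routing, the function names (`shard_for`, `funnel_on_shard`, `funnel`), and the sample events are all hypothetical and chosen only to show why per-shard funnel counts can simply be summed.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(user_id: str) -> int:
    """Route every event for a user to the same shard via a stable hash of the user ID."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_SHARDS


def build_shards(events):
    """Distribute (user_id, timestamp, event_name) tuples across shards by user ID."""
    shards = defaultdict(list)
    for user_id, ts, name in events:
        shards[shard_for(user_id)].append((user_id, ts, name))
    return shards


def funnel_on_shard(shard_events, steps):
    """Within one shard, count how many users reached each funnel step in order."""
    by_user = defaultdict(list)
    for user_id, ts, name in shard_events:
        by_user[user_id].append((ts, name))
    counts = [0] * len(steps)
    for history in by_user.values():
        reached = 0
        for _, name in sorted(history):  # walk the user's events in time order
            if reached < len(steps) and name == steps[reached]:
                reached += 1
        for i in range(reached):
            counts[i] += 1
    return counts


def funnel(events, steps):
    """Run the funnel independently per shard, then sum the small per-shard aggregates."""
    totals = [0] * len(steps)
    for shard_events in build_shards(events).values():
        for i, count in enumerate(funnel_on_shard(shard_events, steps)):
            totals[i] += count
    return totals


events = [
    ("alice", 1, "signup"), ("alice", 2, "add_card"), ("alice", 3, "payment"),
    ("bob", 1, "signup"), ("bob", 5, "payment"),
    ("carol", 2, "signup"), ("carol", 4, "add_card"),
]
print(funnel(events, ["signup", "add_card", "payment"]))  # [3, 2, 1]
```

Because each user's full history lives in exactly one shard, no user is counted twice when the per-shard lists are added together, and only one short list of integers per shard has to cross the wire.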
And then, of course, a lot of the time in our kind of analytics, you're like looking over the last 30 days, looking over the last week.
And so it's not as often people want to look back to three years or whatever. So
by being sharded by date, it means that we don't need to process all the data each time.
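Editor's note: a minimal sketch of the time-partitioning idea, assuming one partition per calendar day. The layout and the names `partition_by_day` and `count_events` are hypothetical and are not taken from Mixpanel. The point is only that a query over a recent window never touches old partitions.

```python
from collections import defaultdict
from datetime import date, timedelta


def partition_by_day(events):
    """Group (user_id, day, event_name) tuples into one partition per day."""
    partitions = defaultdict(list)
    for user_id, day, name in events:
        partitions[day].append((user_id, name))
    return partitions


def count_events(partitions, name, start, end):
    """Count matching events, reading only partitions whose day falls in [start, end]."""
    total = 0
    scanned = 0
    day = start
    while day <= end:
        if day in partitions:  # membership test does not create a new key
            scanned += 1
            total += sum(1 for _, n in partitions[day] if n == name)
        day += timedelta(days=1)
    return total, scanned


today = date(2021, 9, 22)
events = [
    ("alice", date(2018, 1, 5), "signup"),    # old data, outside the query window
    ("alice", date(2021, 9, 20), "payment"),
    ("bob", date(2021, 9, 21), "signup"),
    ("bob", date(2021, 9, 21), "payment"),
]
parts = partition_by_day(events)
total, scanned = count_events(parts, "signup", today - timedelta(days=6), today)
print(total, scanned, len(parts))  # 1 2 3
```

Here the seven-day query reads two of the three partitions and skips the 2018 one entirely.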
And then from there, the rest of the properties you send with the event,
you can send whatever you like. And that's actually a really nice
thing about ARB, I think, that it's schema on read.
Like, so there's this tension, I think, in data where you want to have governance and
you want to have schemas.
And so the dream is to have like a tracking plan and a schema that's like tightly enforced
and tightly managed.
But it's kind of at odds with, on the other hand, the delight of this user experience of just being able to like write one line of code and start tracking your user events.
And just being able to get tracking going quickly and send more data when you need it without a lot of overhead of managing the schema is actually a feature in some ways. And so the fact that it's schema-less or schema on read
allows us to be very kind of user-friendly
in how you set up tracking.
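Editor's note: a small sketch of what schema on read means in practice. Events are stored exactly as sent, and a property is only interpreted when a query asks for it. The field names (`event`, `time`, `distinct_id`) and the `segment` helper are assumptions made for this example and do not describe ARB's storage format.

```python
import json

# Raw events are kept as-is; no schema was declared before these were sent.
raw_log = [
    '{"event": "signup", "time": 1, "distinct_id": "alice", "plan": "free"}',
    '{"event": "payment", "time": 2, "distinct_id": "alice", "amount": 20}',
    '{"event": "payment", "time": 3, "distinct_id": "bob", "amount": 35, "coupon": "FALL"}',
]


def segment(log_lines, event_name, prop):
    """Count one event grouped by a property, resolving the property at read time."""
    counts = {}
    for line in log_lines:
        record = json.loads(line)
        if record.get("event") != event_name:
            continue
        key = record.get(prop, "(not set)")  # events lacking the property are tolerated
        counts[key] = counts.get(key, 0) + 1
    return counts


print(segment(raw_log, "payment", "coupon"))  # {'(not set)': 1, 'FALL': 1}
```

The `coupon` property appears on only one event and was never registered anywhere, yet it can be queried immediately. The trade-off, as Neil notes, is weaker governance than a tightly enforced tracking plan.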
And then, as I said, ARB's gone through a few iterations.
The first one was to switch the storage format
for each of those event files to be column-oriented.
And then the third was to separate compute and storage, which we did. We had this kind of
circuitous journey where we started in Rackspace in the cloud, had too many noisy neighbor type
problems back in the day, switched to SoftLayer, were on our own hardware for a while, and then four or
five years ago switched to Google Cloud for developer productivity reasons.
And when we made that switch to Google Cloud, we really re-architected ARB in a lot of ways.
So now the storage and compute are completely separate, and we've gotten rid of a lot of
the old scaling problems we used to have when we had sort of clusters running on our own hardware.
And I think for our users, what's happening as well that's really, really important is that we have thousands and thousands of cores deployed.
And then that compute is shared.
So you come to Mixpanel and you might do a query over a trillion events, right?
And we might use a thousand cores to do that query.
But you're using them for three seconds, right?
And then it's pooled across all of our users.
And so that allows us to also keep costs down and latencies low.
Nice, nice.
You've done some amazing engineering work behind the scenes for Mixpanel.
So far, I mean, we started the conversation talking about Mixpanel delivering
product analytics, and that's like the definition of the product, let's say.
We continued the conversation, and so far we have talked about
innovation that has been done on a visualization level
or like at the point where the user interacts with the application.
There is huge innovation that you've done on the storage layer.
You pretty much had to build your own database system
with, of course, like your own assumptions
there and trade-offs to make it work. And you also have an ingestion layer, right? Like you
can collect data. Usually in the data stack, we are talking about database
systems, ingestion layers, processing engines. In the end, did you have to build pretty much a whole dedicated data stack
to drive the experience of Mixpanel right now? Does this sound right, what I'm saying?
Yeah, yeah. And I mean, things have changed over the years too. So for example,
in 2011 when ARB was first conceived, maybe Redshift existed. I'm not
sure if it did. And certainly BigQuery and Snowflake didn't. And now we do run benchmarks
still, like just recently in the last six months. And for our specific workload, ARB is still 10, 20, 100 times faster, cheaper than running it on like a generic data warehouse or like a cloud data warehouse.
But basically, we try to avoid the kind of not-invented-here syndrome, right?
Like we're not trying to build for our own sake. And if things come
into view where we don't need to build it ourselves, we adopt them, which is what
we did when we moved from SoftLayer back to the cloud. But yes, the same on the ingestion
side, we have a bunch of servers kind of at the edge around the world.
So there's like low latency from client devices to Mixpanel.
And then those all independently queue data.
And then that's ingested,
we use Kafka, into our core data centers, one of which is
in the U.S. and one of which is in the EU.
And so, yeah, we've built the full kind of client event data collection and ingestion system
and our own storage layer, our own query system, and the UI.
So we've more and more been trying to leverage services in the cloud where they make sense for us.
So we've been using Spanner for things and some of the other tools that Google has.
But for the most part, the core systems are all still custom built.
Do you think if you started building Mixpanel today that this would have changed? Do you think that you would be using some of the technologies
that exist right now out there for each one of these layers that we talked about?
Yeah, probably. I mean, I think that, as I said, I think that in terms of storage and query layer,
I think that the cloud data warehouses are really great. And if you were really just a small startup, you'd probably just start there.
At the same time, for the kind of scale a lot of our customers see, they still can't do interactive queries over the types of workloads that we see.
So for now, we continue to invest in ARB on that basis. Ingestion, again, it's like what you were saying earlier:
part of product is deprecating things. And part of product is also
recognizing when things that you chose to build, you can now replace with off-the-shelf stuff,
and you can get leverage from that. So I'm always looking for that.
Again, the issue on the ingestion side is that
I don't see some off-the-shelf thing
that actually has all the features
in a sort of schemaless approach like we have.
And if we did, we'd use it.
But for now, I think the things we do have
are things that we need to build ourselves
to deliver the experience that we want to give our users.
Yeah.
You mentioned data warehouses like a couple of times.
I remember, I mean, Mixpanel has been around for quite a while.
And so I think at the beginning, there was just Redshift around and probably Redshift
wasn't that popular yet.
I mean, it wasn't yet.
We hadn't like entered the era of the cloud data warehouse yet.
When this happened, both with the explosion in usage of Redshift
and then Snowflake and BigQuery, how did this affect Mixpanel,
both from a business perspective, like what was the experience that you had,
and also from a product perspective,
if you saw things around your perception of the product,
like if something changed there
because you saw how things could work
with the data warehouse
or how customers were interacting with it.
Yeah, so I think cloud data warehouses
and then kind of like the associated rise
of the data engineer and analytics engineers
and now everything that's going on with dbt and reverse ETL is just basically this whole
kind of modern data stack movement.
I think it's just like a seismic shift in the world of data.
And it certainly affected us.
I think one thing that we saw was we would have larger customers.
It's funny, if I go back far enough before Redshift and those types of tools existed,
we had larger customers.
And if they said, hey, we have some engineers, they think we're going to take this stuff
in-house, they'd often come back a year or so later and be like, well, that was harder
than we expected.
But once cloud data warehouses were available, then, you know, it became much more viable to build like really big event ingestion systems and storage and query systems for this kind of data.
But then we saw people coming back again, and for different reasons. It was the user experience, right? For the teams that were product managers, product designers, engineers,
the experience for them of doing exploratory analysis,
setting up a funnel, understanding time to convert, understanding nuances of retention, being able to segment that by preceding user behavior.
That stuff was basically either extremely difficult and slow or impossible for them
to achieve in a BI tool or in writing SQL.
And so they become reliant on analysts to do it for them.
But even that is kind of slow.
Like the equivalent of what's a few clicks, a simple Mixpanel funnel, might actually be like 300 or 400 lines of SQL.
And even if you can do that, doing it in a sort of consistent way and doing it in a way that enables it to be self-serve for everyone and lets people really interact with it in this very sort of exploratory, iterative way,
teams really miss that.
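As a rough illustration of the kind of ordered funnel analysis described here, the sketch below counts how many users reach each step in sequence. It is not Mixpanel's implementation; the `funnel_counts` helper, the event names, and the tuple layout are all hypothetical and chosen only for the example.

```python
from collections import defaultdict


def funnel_counts(events, steps):
    """Count users who reach each funnel step, in order.

    events: iterable of (user_id, event_name, timestamp) tuples (assumed shape).
    steps: ordered list of event names that define the funnel.
    """
    by_user = defaultdict(list)
    for user_id, name, ts in events:
        by_user[user_id].append((ts, name))

    counts = [0] * len(steps)
    for user_events in by_user.values():
        next_step = 0
        # Walk each user's events in time order and advance one step at a time.
        for _, name in sorted(user_events):
            if next_step < len(steps) and name == steps[next_step]:
                counts[next_step] += 1
                next_step += 1
    return counts


# Hypothetical event stream for three users.
events = [
    ("u1", "signup", 1), ("u1", "add_to_cart", 2), ("u1", "purchase", 3),
    ("u2", "signup", 1), ("u2", "purchase", 2),
    ("u3", "add_to_cart", 1), ("u3", "signup", 2), ("u3", "add_to_cart", 5),
]
result = funnel_counts(events, ["signup", "add_to_cart", "purchase"])
print(result)
```

Even this toy version has to handle ordering per user, which hints at why the SQL equivalent grows quickly once conversion windows and segmentation are added.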
And so I think it's a case where you have this sort of workflow,
vertical kind of product development specific types of analysis and types of questions.
And even going back to the kind of experience you were talking about,
just being able to, from all of those aggregate reports, be able to drill all the way down to the granular user experience that's underlying those reports to really understand the causes of things.
That was missing.
On the other hand, the data in the cloud data warehouse was, one, and this is a big motivator, I think, for companies moving there, richer, right?
You don't just have your user event stream from your product, but you also have your
billing data and you have your support ticket data and you have your CRM data. And so you
have all this additional rich information that you can join to more deeply understand the user experience.
And it's also better governed because usually there's a data engineering team that's really focused on that and focused on producing these cleaned up, reliable, simplified tables for driving analysis or automation in the company.
And I sort of mentioned dbt earlier, but there's this push, more and more, to bring these sort of
software development kind of practices like testing and so on to data.
And so I think that for us now, what we've been really working on for a year or so is what we internally call modern data stack compatibility.
And this is trying to bring product analytics to the modern data stack.
How do we marry those two things?
How do we give product teams the UI that they love, but point that at the data that they trust, and that's rich,
and that's in the cloud data warehouse. And for us, what that's been is sort of the,
I think people talk about the sort of hub and spoke model where you put the cloud data warehouse
in the hub, and then you have all your other tools in your company as spokes, and you pull the data in from them,
and then you can push out the sort of aggregated data to make all those tools more useful. And I
think for product analytics, the approach we're taking is to really be a great spoke in that
modern data stack. So if you've collected events to Mixpanel, they can be ETL'd really easily into your cloud data warehouse.
And that's been true for forever.
And then now what we're really doing is the other direction
where you can use reverse ETL to push dimension data
like users and accounts tables into Mixpanel
to join with your event stream and also modeled events where there are events that are occurring in other systems that you model as a table in your data warehouse.
And then you can pull that into Mixpanel and have that be available in your analysis in Mixpanel. I think it's a case where, when I go back to those times where we would see customers churn to a cloud data warehouse to really invest in
centralizing and validating and cleaning and joining the company's data, and as an analysis tool,
data is our input. It's our grist for the mill. So the better and richer and more trusted the data that gets loaded into Mixpanel, the
more value we can create for our customers.
And so I'm really excited about the whole modern data stack and cloud data warehouse
change, but it's certainly like it's a big shift in our product strategy
and sort of how we think about where we sit in the stack.
Yeah. Love this subject. And I'd like
to dig a little bit deeper into it. And one thing we've talked about before is that when you think
about the value that a product analytics tool, or even
you can really apply this to any team, right? So marketing analytics or analytics around customer
success, the point solution is really beneficial for that team because to your point about what
you're doing in Mixpanel, you can really focus on helping a very specific set of users of your product accomplish a very specific set of things with your tool.
And that creates a really good experience.
And I think one thing that's exciting about the Cloud Data Warehouse and what you just talked about is that value in a point solution, especially in the analytics space,
can often be trapped in that tool, right? Where it's like, I uncovered this great insight.
And then you run into this challenge of like, okay, like, how do I take that and then
operationalize that insight across these other pieces of the stack? And so two questions for
you, Neil. First of all, the cloud data warehouse, for sure, for all the reasons that you mentioned,
I think is really exciting.
I think one of the current limitations in some ways, which will lead into the second question,
is that it's still hard to create that loop in real time, from like a technical standpoint and
like a cost standpoint, where you can create this amazing feedback loop, but it's pretty expensive
and you have to sort of piece together several pipelines in order to create that feedback
loop. So first, like when you think about
the real-time use cases where maybe routing something through the data warehouse loop
doesn't make sense, how do you view the role of a product analytics tool like Mixpanel
sort of serving those? And so one just off the cuff example would be your product team uncovers insights around churn. And so you need to
sort of enroll those people in like a win back campaign of some sort that's happening in a
completely separate tool. And maybe that's run by marketing or some other team. How do you view
that piece of the architecture?
Yeah, I think we're in this kind of transitional, or I don't know how
long it persists, sort of mode, though, where there are really two ways you can come at that. I mean,
one is that we do have integrations directly to marketing tools. And so cohorts that you
create in Mixpanel, you can hook up to those tools
and push them there and act on them. Just kind of a direct point-to-point integration between
Mixpanel and the marketing tool. And that's kind of the way it's been for a long time.
The other way is that you pull those things. So you pull the tables that are in like the events
or the cohorts that you've built in Mixpanel
and you pull those tables into your data warehouse, right?
And you use a reverse ETL tool
to push them onto that marketing tool from there.
And in the first case, it can be lower latency.
I think we can, for many of the tools,
we can push those cohorts like every 15 minutes kind of thing. And on the other hand, I think
that if they're going more places, at some point, it makes more sense. I think if you really are
going to do this kind of hub and spoke modern data stack architecture, what I'm seeing is more and more where it makes sense for things that you want
to federate everywhere for those to belong in the hub.
And you're probably going to find that more manageable and less error prone
versus having, like, if you have sort of one point-to-point connection
between two tools, it's going to work great.
But like once you've got 25 tools
and they all have different point connections,
it can start to become really messy
and hard to kind of reason through.
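To put a rough number on that, the sketch below compares the worst case for direct point-to-point connections (every pair of tools linked) with a hub model where each tool connects only to the warehouse. The function names are hypothetical, and real stacks rarely connect every pair, so treat the first figure as an upper bound.

```python
def point_to_point_links(n_tools):
    """Upper bound: one link for every distinct pair of tools."""
    return n_tools * (n_tools - 1) // 2


def hub_links(n_tools):
    """Hub and spoke: each tool has a single link to the warehouse."""
    return n_tools


n = 25
print(point_to_point_links(n), hub_links(n))  # 300 25
```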
So if I were building my own stack kind of thing
from scratch, I would probably just say, okay,
I'm going to have my data warehouse be at the center of things and I'll
pull data from all my different tools, and tables from all my different tools,
and then have it managed there with like dbt or something and then push it
back out.
Yeah. With reverse ETL. Yeah. It makes total sense.
And it's interesting.
One thing we've talked about before is kind of this daisy chain problem. And it's interesting because the daisy chain
problem in the boom of sort of marketing technology tools or whatever is like, oh, direct integration
with like HubSpot and Salesforce and like Marketo and Salesforce. And then you would kind of
daisy chain, right? Where it's like, okay, Salesforce is connected to Marketo, Marketo is
connected to whatever other thing, like an ad platform or whatever, an analytics tool. Then you
sort of create this daisy chain. And then it was like, okay, well now like we're collecting sort of
the raw data. But then when you think about analytics or tools or sort of behavior-based tools that
send the behavioral data, it's like, yeah, well, you're sort of getting the raw like behavior
and you're sending that, but then you run into another daisy chain problem, which for sort of
the data engineering side of things is pretty challenging. Okay. Second question is a follow
on to that. And I'm going to flavor this with my own
take on it a little bit. So you see this trend, especially as it relates to the cloud data
warehouse, where things are getting cheaper and faster, right? So you can sort of imagine this future world where I like to say you sort of remove, like imagine a world where you remove all the artificial scarcity from the equation around moving data.
And you sort of have just the lowest cost ability to move data wherever you want.
Who knows when we, like if we actually get to that point, but in terms of,
because you do this every day and you're thinking about these architectures,
when do you think we'll hit a point where the latencies around the sort of like a product
analytics tool, like Mixpanel, where you can complete the loop of saying, let's dump a cohort into the warehouse, run processes on it, and then get it back out of the warehouse.
When do you think that loop will become, it will make the real-time point-to-point connection
sort of unnecessary?
And I'm interested to know from your perspective, like obviously with what I work with every day, like it's a very
relevant topic for me, but you in many ways, if you're building for an architecture like that,
have to think about when those latencies are going to drop. So I just love to know,
when do you think we'll see a world where the point-to-point solution isn't necessary because
you can complete the loop as fast as you want at a reasonable cost on the warehouse?
That's a good question. I think the reality of these connections right now is
that they're not like real time in the sense that they're like event streams,
right? We're not collecting the event streams and then streaming them to engagement marketing tools
in real time. It's still a batch system where it's just
like on some schedule it's pushing, here are the latest users in this targeted cohort, to
whatever marketing tool it is. And there's no real reason that can't be just as fast to like
push that same thing to the warehouse. I think it's scale where it can get the, probably the thing
is right now, most of those connections are pretty naive. Like they just recompute the whole cohort
and push the whole cohort. And of course, past certain scale, you have some big gains by pushing
like a delta of some kind. But today it'd be fine. Like from our perspective,
if there was whatever ETL service hitting Mixpanel,
it could hit us at that same interval. It would be no different than
the way that
we kind of hit ourselves to push cohorts to downstream tools.
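The difference between recomputing and pushing a whole cohort versus pushing a delta can be sketched as follows. This is only an illustration under assumed names: `cohort_delta` and the user IDs are hypothetical, and nothing here reflects how Mixpanel's cohort sync actually works.

```python
def cohort_delta(previous, current):
    """Return the members to add and remove since the last sync."""
    prev, curr = set(previous), set(current)
    return {
        "add": sorted(curr - prev),      # joined the cohort since last sync
        "remove": sorted(prev - curr),   # left the cohort since last sync
    }


# Hypothetical cohort membership at two consecutive scheduled syncs.
last_sync = ["u1", "u2", "u3"]
this_sync = ["u2", "u3", "u4", "u5"]

delta = cohort_delta(last_sync, this_sync)
print(delta)
```

The naive approach would resend all four members of `this_sync` on every run; the delta only carries the three changes, which matters more as cohorts grow.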
Yeah. It's a, it's super interesting.
And I think one thing hearing you say that,
that I've thought about a lot recently is when you say real time and you actually drill down
into that with a company who's trying to do real time, there are absolutely mission critical things that you do want to do
in real time or near real time, right? Where a user does X and you want Y to happen immediately.
So for example, a new user completes the signup process for an app and you want to send them a congratulations
like message or push notification or something, right? Like, sure. I mean, of course you want
them to have that gratifying experience. That's actually part of the onboarding flow.
But when you think about cohorts, I mean, there aren't a lot of situations where, well, I say that, I'm sure I could conceive of many, but where...
You can use a cohort to simulate a transactional process,
but you're usually better off just using a transactional process.
Like having your signup code, just like trigger an email to SendGrid or whatever,
to send to that user.
Like a lot simpler than this kind of roundabout thing of like going into a product analytics
tool and setting up a
cohort of users who've signed up in the last five minutes and then pushing that to like an
engagement marketing tool to send an email. It's just a lot of, like you call it, sort of daisy
chaining and additional complexity. I think you can kind of forget sometimes like, oh
hey, we also do have our whole transactional system for doing transactions.
And once you remove all those use cases, then yes, by definition, you're only looking at
stuff that's like usually things like, well, you can't do in a transaction or not as easily
where it's like, well, users who did this and then this and were in that state in the
last five days and then did this other thing.
And yes, in those cases, they tend not to be so urgent.
And we've seen that, especially when we had that messaging tool, where there was a lot of, like, customer requests.
We'd get those kinds of, hey, could this happen at lower and lower and lower and lower latencies?
And then when you would dig into the problem they were trying to solve,
it was stuff like you just mentioned, like, well, we want to send an email when someone signs up.
And then it's just easier to just put that in your code in the application.
Yeah, that's interesting. I think once you get outside of the
sort of mission critical sub-second latency, whatever you want to call it,
actual real-time, real, real-time use cases. When you start to get outside of that in the
customer journey, even companies at really large scale, you're in the realm of testing, right?
And so like building cohorts and sort of conceiving of like different ways that the
customer journey can happen that are dependent on different user behaviors. And often it gets, it gets a lot more complex
because you're looking at combinations of user behaviors that don't happen back to back,
like in a linear fashion always. And in that context, you're really looking at,
okay, how do I build like a, the hard part is actually building the list of users that you
want to test something against, right? And it's more about the cohort and combining those behaviors
or traits that sort of build that cohort and then saying, okay, how do I like operationalize that
and sort of enter these people into an alternate customer journey that I can test?
Yeah. What this conversation is making me think of, too,
is I think that there's really
kind of two competing conceptions of the data stack.
I'd say one that was kind of like
called sort of the engagement stack or the growth stack
that was sort of like a CDP plus a product
analytics tool plus a bunch of engagement type tools. And that was this sort of all real-time
event collection and sort of engagement tool type stuff with the like feedback loop between
you send this message and then you see like what happened as a consequence. And then there's actually kind of a competing stack, the modern data
stack built around the cloud data warehouse. And you see that within a lot of companies,
that they literally have both stacks and they're kind of siloed from one another for the most part.
And I think for a long time at Mixpanel, that conception of how
things would go was great because it's like, great, product analytics is going to be like at the
center of a company's data stack and be the, like the brain for every user interaction that happens
in a company. And I think that that's just proving not to be the right way to do things, that product analytics tools are really, really awesome at answering questions for the product team about their users and their experience in the product, and a cloud data warehouse and the tools that are being built around them are great for being how you
centralize and govern and route data at your company. And so I think that that's where like,
even as we're talking about these use cases, they're more marketing, right? Like they're for
how do we build cohorts
to send campaigns to?
And actually, once you kind of revert back to,
hey, the cloud data warehouse is going to kind of be
the center in the modern data stack.
And if you go into a lot of these marketing automation tools,
they have incredible cohort builders
and they can take event stream data
and do targeted notifications based on that.
And you don't really need to use your product analytics tool to do that.
And there's actually some simplicity in that.
You can use your marketing tools to do the marketing, and you can use product tools to do the product
work.
And so this conversation is actually very representative of how, when I was earlier
talking about how we got spread out into a lot of different use cases,
we were spending a lot of time trying to figure out how do we move cohorts here and there so that
people can run a campaign on them. And now it's like, it's been this very liberating thing to be
like, that's not really our, that's not our, that's not what we do. We do answering product questions for the product team.
Yeah.
And there are awesome marketing automation tools out there that we partner with.
And I think one of the cool things is like the modern data stack makes this kind of,
there's this potential for this standardized way for all these tools to communicate,
which is like ETL into the warehouse and reverse
ETL back out.
And as long as you support that, then you can kind of be integrated to anything at that
clearing house.
Yeah, I think it's two comments on that.
One, totally agree.
I think the sort of liberating individual tools to be the best at what they do, because
you have the cloud data
warehouse at the center makes total sense. The follow-on question, which is definitely for
another episode because we're coming up on time is then you get to the question of data governance,
right? Because you're sort of like building cohorts in separate tools and there's crossover, like,
is it the same data or whatever? That's a whole nother discussion. Super interesting where the space is going on that front. But the second
comment is the idea that you can have sort of known models around data from different tools
in your cloud data warehouse is really interesting, right? Where you say, okay,
we have the product analytics from Mixpanel. We have marketing website analytics from whatever
it is, Google Analytics. We have engagement analytics from the marketing automation tool
and maybe like the sales outreach tool. And as sort of data models, if data models find some level of conformity or people just crank
out a bunch of dbt models to sort of figure all that out, you can almost conceive of this world
where like, okay, I'm starting a company and I just kind of already know what things need to
look like in Snowflake or BigQuery when I spin it up to sort of serve all
of these different use cases, which is really interesting. I think, just thinking
about the experiences that I've had in the past, it's like, man, that would be so nice.
It just saves so much work and so much time across teams to have that approach from the start.
Yeah, yeah, it would be.
It would be.
I think then all different tools
could make more assumptions about the data
and do more for the user automatically,
which would be great.
This is like a perennial conversation for us.
It is sort of like,
can we kind of standardize the sort of tracking plans?
And sometimes it's sort of like,
can we do that by vertical?
Like if you're an e-commerce company,
you just need to track these things
and then we can do all your reporting out of the box.
One thing you do come up against,
I've found is just like,
it's amazing how much like companies are kind of
snowflakes. They, every company is different in important ways, even within the same industry.
And, and so that standardization has been, has been hard to, to define. And in, in some ways,
like the having to think through what data is important to your business and what events are important and what properties are important and what are the right KPIs is also like a side effect of having that, that doing that planning is really more deeply understanding your business and how to measure it and what the goals are and aligning around that.
And so it's also sort of like a useful process to go through anyway. But I genuinely don't know
which way it will go, whether there'll just be kind of like a standard data operating system
for running companies, or it'll continue to be just kind of free form for each company to
define its own.
Well, it'll be fun to have a front row seat as all this unfolds.
We're right at time, Neil, but just one more question.
Thinking about our audience, you have played so many different roles at Mixpanel.
And I'm wondering if you have any career advice.
A lot of our listeners,
they're across the spectrum. Some people starting out in their careers, sort of working with data
or engineering roles that are close to data. And some people who have been doing this for a really
long time like you, but any career advice that you could give our audience before we hop off?
That's a good question. I think that specifically for data work,
I think, you know, you can only,
you really, and we see this with our customers as well,
is that there's a sort of instinct
to sort of begin with the data collection
and the sort of technology side of things.
But I think that it's like that.
I think this is an Apple thing that you have to start with the user problem
and then work your way to a product that would solve that problem.
And then the technology that makes that product possible,
you can't just go build a technology and then search for what product you
might make with it and then hope that there's someone who needs that product.
And I think it's the same with data.
You need to really understand the business that you're in and the product or
service that you're providing and the user experience and, and, and understand what are the
most meaningful questions and what are the most important things to have visibility on and,
and then work your way backward from there to the systems to provide that. And, and in that, I'd say there's, I think it's Gall's law. It's
like all complex systems that work came from simple systems that work, keep it simple and
just get something working end to end for the most important metrics and for, to answer the
most important question. And I think that that that's going to be what in the end is going to allow you to have the
most positive effect on the company that you're working at.
And that's what really drives careers forward in the end, in the long term, I think, is
knowing what the most important things are to be done and actually getting them done. And that is really good advice for everyone, no matter where you're at in your career.
Well, well said, Neil. Well said and appreciate that insight. Well, we are,
we are at time and this has been an incredible conversation, Neil. Loved learning all about
Mixpanel from a technical side and a
product side and really appreciate you taking the time to join us on the show.
Yeah, this was great. Thank you so much for having me. Really appreciate it.
My big takeaway is kind of thinking through a summary of all the things that Neil talked about. And I think he just has such a mature perspective
across a number of subjects. So number one, he reiterated a lot of things we've heard from
other really good, experienced engineers around keeping things simple. How do you make the
decision to build something yourself or buy a service? Just
a very mature perspective on that. A very mature perspective on focusing the product that you build
for a specific purpose, for a specific set of users. And also a mature perspective on adapting
to what are sort of clear sea changes in the way that people are using your tool relative to other
tools. And so I just appreciated it. It just, I think all of his experience and the fact that
he's just a really smart guy, it was just really fun to hear him talk with a lot of authority
across a variety of disciplines.
Yeah. I think the most interesting part of the conversation,
and there are two things that I'm going to
keep from this conversation about product.
One is the focus, what you also mentioned, Eric.
Focus is super, super important.
And I mean, I think Neil is one of the best authorities to talk about this.
I think the examples from Mixpanel were amazing about what focus can do or what lack of focus
can do to a company.
Yeah, like the NPS score and retention numbers were crazy.
And just as a consequence, I mean, not simple to change your product focus,
but those are numbers that tons of companies are striving for.
Yeah, yeah.
And the other thing is that product as a function is not there just to build. It's also there to destroy in a way, right? Like, everything that we do in product is all about building and executing and
introducing new features, but sometimes it's more important to decommission a feature than to build a
new one on top of that. And that's another hard-learned lesson that I think everyone who has
tried to build, especially people who have been around a product from the beginning
until it matures, I think they will say that this is super important.
The other thing that I found super fascinating
was all the discussion around building a custom processing
and storage solution for their specific use case,
which I think is super interesting.
And it's something that, if not now,
in a couple of years, we will be hearing more and more about, especially as all these platforms like Snowflake and BigQuery are trying to
move away from being a data warehouse and actually become a platform where data applications can be
built. And this is going to have some very interesting challenges on the technical level. And I think what Neil was describing is like a small glimpse of the future that we are
going to see now.
It remains to be seen how it's going to materialize, but we'll see.
Yeah, it will be really fun.
Well, that was a great conversation.
Many more great episodes lined up for you this end of the summer and going into the fall.
So thank you, Brooks, for keeping an amazing list of guests on the roster.
And we will catch you on the next show.
We hope you enjoyed this episode of the Data Stack Show.
Be sure to subscribe on your favorite podcast app to get notified about new episodes every week.
We'd also love your feedback.
You can email me, Eric Dodds, at eric at datastackshow.com.
That's E-R-I-C at datastackshow.com.
The show is brought to you by RudderStack,
the CDP for developers.
Learn how to build a CDP on your data warehouse
at rudderstack.com.