The Data Stack Show - 30: The DataStack Journey with Rachel Bradley-Haas and Alex Dovenmuehle of Big Time Data
Episode Date: March 24, 2021

On this week's episode of The Data Stack Show, Eric and Kostas are joined by the co-founders of Big Time Data, Rachel Bradley-Haas and Alex Dovenmuehle, formerly of Mattermost and, prior to that, Heroku. At Big Time Data, they work together to provide companies with the ability to derive value and insights from decentralized datasets, improve business processes through data enrichment and automation, and build a scalable foundation to enable a data-driven culture.

Highlights from this week's episode include:
- Rachel and Alex's background and their goal to make data approachable for companies everywhere (3:09)
- The data stack journey: making decisions when you're small that allow you to grow with your data and your organization (12:28)
- The problems faced when a data stack isn't nurtured early on (15:59)
- Changes in data stack technology (21:32)
- How Alex and Rachel's roles at Big Time Data differ and interact with each other (39:00)
- Client use cases (43:34)
- Comparing the stacks of seed-stage startups, mid-sized companies, and giant enterprises (48:54)

The Data Stack Show is a weekly podcast powered by RudderStack. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
The Data Stack Show is brought to you by RudderStack, the complete customer data pipeline solution.
Thanks for joining the show today.
Welcome back to the Data Stack Show.
Eric Dodds and Kostas Pardalis here.
We have an exciting guest and a surprise guest on today's show. The exciting
guest is Alex, formerly at Mattermost, who was actually our very first guest on the show,
exactly 30 episodes ago. Alex has since started a consultancy called Big Time Data,
along with Rachel. And both of them are going to join us on the show today to talk about the data stack journey, how the data stack changes over time. I'm very interested. My burning question
for them is, we see this all the time in our work and from people on the show, companies of
different sizes have different requirements around the stack that they build for customer data
infrastructure. And I want to know which tools stay the same throughout the entire journey from, you know, sort of person in a garage startup,
all the way up through enterprise level. So that's what I'm going to ask. Kostas,
what's on your mind? Actually, we're very aligned on that. That's something that I think both Alex
and Rachel are like the perfect people to chat about how from this very chaotic market of data
related technologies right now, patterns emerge and
what these patterns are and how people can use them to navigate the whole process of building
their own data stack. And there are, of course, differences between the different types of
companies, the different problems that they're trying to solve and scalability, but there are
also many commonalities. So I think we will be able to tackle this today. And of course, I'm
even more excited because Alex was the first episode of this show,
but also my very first episode for a podcast ever.
So I'm really, really happy to chat with him again.
Great.
Well, let's talk with Alex and Rachel.
All right.
We have our very first podcast guest ever from the first episode back on the show and have added another
special guest. So Alex, who was at Mattermost when we talked with him last, and Rachel,
who was also at Mattermost at the same time, have joined us to talk about the data stack journey.
Thank you so much for joining us. Oh, you're welcome. Glad to be here again.
Yes. Glad to be here for the first time.
Well, we had a great episode with you, Alex, talking about all sorts of interesting things.
I guess it was, wow, six or eight months ago now. So time flies when you're stuck in a house,
right? Yeah. Yes, indeed. Yes, indeed. Well, why don't we start out? We'd love to just get a little bit of background. So
you had different roles at Mattermost, but worked very closely together. But we'd just love a little
personal background on your history, how you ended up at Mattermost, and then what you're doing today,
which is Big Time Data, which we want to hear about as well. So Rachel, why don't you start. And Alex, of course we want our
new listeners to hear your story, but we'll let Rachel start. Yeah. Yeah. So background is in
industrial engineering at the University of Michigan, rep that to the day I die. So go blue. And really
what ended up happening: I graduated, went to Cisco, and, really honestly, I'm very lazy in the way that I never liked to do the same thing twice.
So got really into automation, data, how do you scale all that, and then wanted to go deeper on
a technical level. So ended up moving over to Heroku, which at the time was a subsidiary of
Salesforce, and did a lot of data analytics, data engineering, ended up spending a little bit more
time on the operations side. So my role grew into
really understanding how all the data in the data stack can be used to drive go-to-market motions,
automation, and scalability. And then after that, I kind of felt like I had outgrown my role there
and decided to take a risk and go to a smaller company with Alex. So we ended up going over to
Mattermost and really starting from there,
just understanding how do we start a data infrastructure from scratch, basically,
using some open source technology, some new tools we had never used before. And then also,
how do you help an organization adopt a data-driven culture and really embed that in
their day-to-day? So that's where we're at at this point. And then,
you know, once Alex is done, we'll talk a little bit about how Big Time Data came about.
Just as far as my origin story in this whole thing, I come from, you know, computer science,
full stack developer kind of background. It was really at Heroku that I got into all the data engineering things and basically modernized their data stack,
which at the time when we were there early,
and this was like six years ago, five years ago,
they were using like bash scripts to run stuff.
And like, it was just a total nightmare.
They were using Postgres as their data warehouse.
So we migrated all that over to dbt, Airflow, and Redshift.
And that's where Rachel and I first met, at Heroku.
And, you know, she was doing the analytics and operations stuff.
And it ended up being like, at one point, just basically like the two of us were tackling
all this stuff by ourselves.
And we ended up, you know, building teams there and everything like that. So then when we moved to Mattermost, like she said, they really had no infrastructure at all, so we built that all up from scratch. And then, as far as the evolution of Mattermost, it's like, we built all this stuff at Mattermost, we could see and show the value of, you know, the architecture that we were using and the technologies we were using and
how we were using it. And then what we started to notice was, you know, there's all these other
companies that have the same problems, right? They all have a bunch of data. They don't know
what to do with it or how to get the value out of it.
And so that was really what started to spark the idea of Big Time Data and us kind of going out on our own and, you know, actually spinning up a consulting company. Because like
what we really want to get to is building this for a bunch of different companies. Like
I want every company to succeed, right? I just want all of them to be able to like harness the power of their data and make their company the best it can be.
Yeah.
Just to add to that, I feel like one of the things that's really hard is people talk a
lot about data and how to build these state-of-the-art data stacks.
And, you know, for us, it feels very approachable, right?
Because we live in that every single day and just thank goodness our parents were smart and therefore we became smart as well. So it seems very intuitive,
but when we went to Mattermost, I remember thinking, oh gosh, they're going to know we're
frauds. We're really not as great as they keep saying we are sort of thing. And we got in there
and just the smallest things that we would do where we would say, oh yeah, you're just going
to do this and throw this on top of it. And it's straightforward. And they were just thinking,
oh my gosh, you're a godsend. Like, this is amazing.
I never would have done this. And you're kind of thinking, huh, that's weird. That's just something I thought everyone knew. And so as we've continued along and talked a lot about how do you scale your
operations using scripting and, you know, how do you really support self-service analytics and data governance,
we started realizing these are things that are not talked about enough, or there's almost a sense of, I'm too embarrassed to ask because it seems like everyone knows what they're doing.
And so from our perspective, it's like, we want everyone to be able to do that. We want to put
documentation out there. We want to have best practices. We want to make sure that people can
do these things because data is so important. And so that's really where my passion has come from,
from this. It's so many easy, small conversations that help people build confidence to take those
risks. And so, you know, that's one of the reasons why we're on this podcast right now is just making
sure people know all you have to do is take one step at a time towards your future goal. And it
really is approachable if you have the right people. Sure. You know, it's really interesting, Rachel,
that you mentioned people being afraid to ask questions because they think that everyone has
it figured out. And I was in consulting before joining RudderStack doing similar things, but more on sort of the MarTech side.
And it was so interesting. There's almost an imposter syndrome type dynamic in many companies
where you just have this sense that we're the only company whose Salesforce is really messed up
and who's having trouble cleaning our data and getting
insights. And the more companies that you talk to, the more you realize literally every company
has these same problems, right? It's pervasive and it's not because people aren't working hard
or they aren't smart, but technology is changing quickly. And when you have a quick growing company,
it's just, it's really hard to align both the organization and the tools and the data and
everything to make it work out correctly, especially if you don't have a playbook. So
that really resonates with me because I saw that all the time. It's just, you know,
it's almost like I'm embarrassed about the state of our situation. Yeah. And I have two things to
add to that. I think it's one of those things where
you end up finding, this is more of a psychology thing. I feel that people end up talking more
about the parts that they're comfortable with. Right. And so you all of a sudden have companies
that are doing one thing, right. But everything else is kind of crap and they're talking about
that one really great part, but you're comparing it across your entire system. And so all of a
sudden you have this perception that everyone else has everything great when in reality, it's just that one part
that they're talking about. And man, Alex knows he can get on a call and go so in depth with all
these different tools that I'm sitting there just nodding, pretending, you know, letting my imposter
syndrome get to me. But then I realized one of the reasons why Alex and I are such a great partnership
is because we don't need to know everything ourselves.
You know, we obviously have great friends over at Rudderstack.
We have great friends at DBT across the board everywhere we've been.
But that's why it's so great to have a community.
I know a lot about go-to-market motions and using data to drive that.
That's something that Alex isn't as strong about.
So it's just one of those things, you know, don't be too hard on yourself if you're not there yet and be realistic about what's really going on. And man, about the
Salesforce thing, a hundred percent, we worked at Heroku, which was part of Salesforce and we
were struggling to do it right. So I definitely, definitely get that one.
Yeah. I mean, this is, this is a little bit tongue in cheek, but I mean, it really is the reality.
But we used to joke, we used to ask people, have you ever seen a Salesforce that wasn't a mess?
Yeah, when you spin them up.
Really?
Oh, that's so good.
That is so good.
Okay, well, we have so much to talk about.
And I know Kostas has many questions.
So I'll kick off with the first question on our topic of the data stack journey. So we wanted to have both of you on the show because one,
you bring an interesting perspective of working together at multiple organizations on the data
stack, sort of from two different directions, you know, sort of the data engineering perspective and heavy technical
side from Alex's side, and then the ops sort of go-to-market alignment on your side, Rachel. And
going from Heroku, which is a huge, I mean, part of Salesforce, right? Massive company
to Mattermost, and now having consulted with a variety of organizations, you present a really interesting perspective on the best practices for building a data stack that will scale and how that needs to change over the life of an organization, right?
Because when you're just starting out and you maybe have a two-person company, your needs around the data stack are very, very different than when you get to the
size of a Heroku that's running inside of a massive enterprise like Salesforce. So I'd love to get the
perspective from both of you on what is the data stack journey? Just give us an overview of,
you know, from the perspective of a company just starting out to becoming a large enterprise,
what does the data stack journey look like? How would you define it? Yeah. So the data stack journey to me is like,
how do you build your data infrastructure in a way that can grow with your company as it's growing
and still give you all the value that you need while being efficient with costs and
like operational burden and that kind of thing. Because like you said, if you're a two-person
company, you know, having a bunch of different tools and, you know, a bunch of different
infrastructure that you're having to maintain is just going to waste your time when you should be,
you know, talking to customers or whatever. But on the other hand, it's like once you get to that Roku size,
you can really dig into optimizations that are only valuable because you're doing them over and over and over again.
And so this idea of the data stack journey is like,
how do you make those decisions upfront when you're small that allow you to grow with your
data and your organization and not shoot yourself in the foot where you're having to, you know,
spend a bunch of time doing rework or, you know, your analysts are just fighting data fires and
they can't figure out
why the data is wrong and all that kind of stuff. Yeah. And just to add to that, I think there's a
couple of different variables that come in when talking about that. You know, you're talking about,
are you willing to pay more to have more scalability because of limited bandwidth,
right? And so you're saying, oh, if I have two people and I have one tool that does it all, and maybe it's
a thousand dollars more a month versus, you know, five different tools, if you start to think, okay,
how much time is it going to take to move between them? If there's an error or something needs to be
debugged, how much longer is it going to take? Because you have to look at five different tools.
The other thing that you brought up, Alex, which I think is so important is, you know,
if you think about where you are now and where you're going to be in a year, five years, and so on, you have to think about the cost it would take to move from one to the other, right? So right now you might say, oh, it's $1,000 more a month, it's not worth it. But one year from now, if re-engineering it is going to take an entire engineer-month, is that more than $12,000? So you start to have,
like, in my mind, from the operations perspective, I start to think about the dollar amount and the
cost of an engineer's time and honestly, the morale, right? You want to keep these people
around. We all know the worst thing in the industry is losing someone when a company is so
small and they have all the knowledge that's, you know, that's a huge deal breaker.
You want to be using the tools that engineers want to be using and analysts. So you keep them
around and retain them because the loss of an engineer or an analyst is unimaginable. And I
would say, you know, close to $200,000, $500,000, depending on where you're at.
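Rachel's build-versus-buy arithmetic can be sketched as a toy back-of-the-envelope model. Every number below is hypothetical, just echoing the figures from the conversation; real tooling decisions obviously involve more than subscription fees.

```python
# Toy model of the trade-off Rachel describes: pay more per month for a
# tool that scales, or pay less now and absorb a re-engineering cost
# later. All numbers here are hypothetical.

def cumulative_cost(monthly_fee, months, one_time_migration=0):
    """Total spend over a horizon: subscription plus any migration work."""
    return monthly_fee * months + one_time_migration

# Option A: the consolidated tool costs $1,000/month more, no rework needed.
option_a = cumulative_cost(monthly_fee=1_000, months=12)

# Option B: no extra fees now, but re-engineering in a year costs roughly
# one engineer-month of fully loaded time, say $12,000.
option_b = cumulative_cost(monthly_fee=0, months=12, one_time_migration=12_000)

print(option_a, option_b)  # 12000 12000
```

On dollars alone, the two options break even within a year here, which is exactly why Rachel folds in morale and retention: the spreadsheet rarely settles it by itself.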
Absolutely. What do you think? And I'll ask one more question here and then the costs jump in.
And this may sound like a kind of an obvious question because we just see this so often,
right? With a growing company and you have good intentions and then, you know, you just don't
seem to have the time or the resources to do things the right way from the beginning.
Why do you think that happens? What are the main,
you know, maybe top two or three things that produce the downstream problems that companies
face if they aren't really careful about nurturing their data stack early on?
I mean, I think the first thing is going to be that as an organization, you're not going to have that muscle of, hey, when we implement this new feature in the product, we need to, you know, like track its usage in a decent way.
Right.
So that we can like have the insights, like, are they doing this thing right?
And, you know, so then what's going to end up happening is you're going to kind of end up like Mattermost, where it's like they had a data warehouse and they were running some queries on it.
But the data quality was kind of low.
Things were all one-offs.
And it just wasn't scalable at all.
And so then you have to really go through that whole migration process.
It's not only a technology change, it becomes a people change and an organizational change. And as a growing company, you're already dealing with so many challenges from that growth just in general, that having to deal with data growth and, you know, all that stuff just adds to it, right? So it's better to, if you can, and it doesn't even have to be crazy amounts of time that you're spending on all this stuff, right? You can just do a few things. And I think, you know, the more I've been thinking about it, it's like, can we as Big Time Data provide some tools? And this is kind of going back to the imposter syndrome thing: can we provide some guidance and tools and guides or something that can give people the confidence that, hey, I'm not totally screwing this up, even though I don't know everything about it? Like, I'm not an expert, but I know I need to do something. Right.
Yeah. And the other thing that I don't talk about,
obviously, Alex, you and I always think about these questions in different perspectives,
but I think that's great is from my perspective, I think the biggest impact is you have all these
brilliant people that need to be focusing on strategy and making sure that that business
is successful. You're going to hit that pivotal moment. And are you going to be ready to blast off? Or are you just going to be a dud? And if you have the leaders of your
organization spending their time questioning numbers, instead of focusing on strategy,
that's a big deal. You start to have a VP of marketing presenting numbers about, you know,
your pipeline, and there's a disagreement, all of a sudden you're spending the full day
trying to get ready for a board meeting, questioning how many MQLs you have instead of saying,
how are we going to present this? What are our next steps? What are we doing for the next year?
How is our product going to change as our customer evolves? Those are more important
questions than what's the definition of an MQL? Is our data from Salesforce coming in accurately? Do we have the right, you know,
triggers in our product to promote, you know, growth, all these different things, right? So
I think it's so important that you have data in the right place and the definitions and it's
trusted or else you end up spending these unaccounted for hours trying to figure those
things out. And no one tracks that anywhere. It's just something that comes as part of the job. And I think as soon as you realize that you wouldn't have to have as many
of those conversations, if you had invested a little bit upstream, you're going to regret not
having done it already. Absolutely. The board meeting scramble, that is probably a good topic.
That'd be great to collect war stories because you said that. And I think myself and
probably a lot of our listeners know exactly what you're talking about. Okay. I have one thing to
add to that. I'll just say, you know, this goes into the whole, I'll give a little shout out to
Michael Schiff. He's been a mentor and my boss and Alex's boss for a while, you know, you pay
now or you pay later is one thing he
always said. The other thing is you're training them or they're training you. And I will tell
you that at Mattermost, we've worked very closely with Emil, who's our VP of finance there. And we
have trained him how to go and get his own numbers, how to trust the data for all of the things that
he needs to present to the board. And I'll say the last board meeting, there was only one question that he had that he
reached out to me for getting for the board meeting and the ability for him to self-serve
when initially it was, you know, four to six hour calls trying to get him his numbers.
It's just been amazing.
It just kind of shows as you train them and as your data stack evolves, people are able
to trust the data and feel more confident going and getting it
themselves. Love it. That is really cool. All right, Kostas, I've been monopolizing
and I can keep going, but I'm not going to because I know you have a ton of questions.
Yeah, Eric, I think this is a common pattern lately on our shows, but it's fine. I mean,
you're asking very good questions anyway, so it's good.
For me, it's a very special episode today
because Alex was my first ever guest
in a podcast episode,
so I'm super happy and excited to have him back
and also having Rachel together
because they are both working on the data stack,
but they see it from a different perspective.
So I think it's a great opportunity
to have both perspectives at the same time. So let me start with a question about the data stack.
I mean, you've been working with data for quite a while, and you have seen the changes that have happened in the technology. So how has the data stack matured since your time at Heroku, or even earlier, if you have experience from before that?
And what are the tools that really excite you that exist today and didn't exist in the past?
Yeah. Yeah. It's crazy how much things have changed and it feels like it hasn't been that long.
And yeah, I mean, you know, going back to Heroku, like the early days, I mean, you're
talking, I already mentioned the bash scripts and stuff like that, but you know, the SQL
that we were writing, I mean, literally, and I'm not kidding, like thousand line SQL
files were not uncommon.
I don't know why you're complaining.
I really enjoyed debugging those scripts.
Yeah.
We had just amazing data quality too.
And so one of those tools
that I just preach the gospel of
everywhere I go is dbt,
which we started using at Heroku
three years ago.
Was it three or four?
Anyway, something like that.
And that was really like,
it was funny because it was actually
a data engineer on my team just sent me this link. He was like, hey, I saw this thing on Hacker News. And then we started looking at it and it was like, oh my gosh, we have to use this, what are we doing? And so then you go from, here's this thousand-line SQL file that I can't make heads or tails of (I mean, eventually I could, but every time you have to debug the thing, it takes you four hours just to remember all the nooks and crannies of the stupid thing), to, oh, I just have a 50-line dbt model and then a couple of other ones, and everything just works. It's amazing. So that's one.
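The jump Alex describes, from one thousand-line SQL file to a handful of small models that reference each other, is the core of dbt's design. Below is a minimal Python sketch of that idea, not dbt's actual implementation; the model names and SQL are invented, and real dbt resolves `ref()` to fully qualified relation names via Jinja.

```python
import re

# Each "model" is a short, named SELECT; {{ ref('...') }} declares a
# dependency on another model, just like in dbt. Names/SQL are made up.
models = {
    "stg_orders":  "select id, user_id, amount from raw_orders",
    "stg_users":   "select id, email from raw_users",
    "fct_revenue": "select u.email, sum(o.amount) as revenue "
                   "from {{ ref('stg_orders') }} o "
                   "join {{ ref('stg_users') }} u on u.id = o.user_id "
                   "group by u.email",
}

REF = re.compile(r"\{\{\s*ref\('(\w+)'\)\s*\}\}")

def deps(sql):
    """Models referenced by a given model's SQL."""
    return REF.findall(sql)

def build_order(models):
    """Topologically order models so dependencies build first."""
    order, seen = [], set()
    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for d in deps(models[name]):
            visit(d)
        order.append(name)
    for name in models:
        visit(name)
    return order

def compile_sql(name):
    """Substitute each ref() with the referenced model's name,
    standing in for the relation name dbt would compile it to."""
    return REF.sub(lambda m: m.group(1), models[name])

print(build_order(models))  # ['stg_orders', 'stg_users', 'fct_revenue']
```

The payoff is exactly what Alex describes: a change to one small model ripples through everything that `ref()`s it, instead of you hunting through a monolithic query.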
And I think the other thing
that has been really interesting is just the availability of tools that make dealing with
large amounts of data easy, like, you don't have to be a PhD person to be able to deal with big data anymore. And I think there's just been so much done there that it really helps, I mean, anybody, right? Anybody can deal with terabytes of data now, whereas before it's like, oh my gosh, I have terabytes of data, it's going to take me hours and hours to query into this stuff, and I don't know what to do. So I'll let Rachel add to that in her way.
Yeah.
I mean, one thing that you didn't call out is the biggest thing that we ended up changing
right away when you took over our data stack at Heroku, which was adding Airflow.
We used to have everything basically in one massive daily or hourly job.
And it would be like 50 in the hourly job and 120 in the daily job. They would lap themselves.
And it was utter chaos.
One thing fails, you have to kick it off by itself and have to track it to make sure it
finished.
It was terrible.
And so just getting Airflow and all of that going was a huge game changer for us.
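What Rachel describes, one task fails and you have to kick the whole monolith off by hand, is exactly what a DAG scheduler like Airflow fixes: tasks declare upstream dependencies, so a failure only requires rerunning the failed task and whatever sits downstream of it. A minimal sketch of that idea (task names are hypothetical, and no actual Airflow is involved):

```python
# Tasks declare their upstream dependencies, forming a DAG. On a failure,
# only the failed task and its downstream tasks need to rerun, rather
# than the entire daily job.
graph = {                      # task -> upstream dependencies
    "extract": [],
    "load": ["extract"],
    "transform": ["load"],
    "report": ["transform"],
}

def downstream_of(task):
    """Everything that depends, directly or transitively, on `task`."""
    out, frontier = set(), [task]
    while frontier:
        t = frontier.pop()
        for name, ups in graph.items():
            if t in ups and name not in out:
                out.add(name)
                frontier.append(name)
    return out

def rerun_set(failed):
    """On failure, rerun the failed task plus all of its downstream."""
    return {failed} | downstream_of(failed)

print(sorted(rerun_set("load")))  # ['load', 'report', 'transform']
```

With the monolithic hourly job there was effectively one node in this graph, so any failure meant rerunning (and babysitting) everything.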
And then, like Alex mentioned,
we started talking about dbt. And from my perspective, I don't think I conceptually
understood what it was doing at first. I viewed it as cool. It's a different way to organize your
code, yada, yada, yada, huge investment. This just feels like a tool that an engineer wants to use
because they're bored of their day-to-day job and they want to have a new tool to mess around with. I was so wrong. I think I was
very busy at the time and I didn't really take the time to really understand what it was. And when we
ended up moving over to Mattermost, because at that time I had a team of four and was very heads down
more on the analytics side. When we moved over to Mattermost, where we had basically no code and we were starting from scratch,
seeing how great it was to build these dependencies on top of each other and have it be so clean,
where if you just need to change one small piece of logic at the granular level,
it scales and moves throughout your entire basically data model. And so being able to see
that, I'm so glad that they still invested in it at Heroku, even though I wasn't a huge proponent of it. And I think it's one of those things where, once again, you pay now or you pay later: you're going to have technical debt, and I think dbt really helps you manage that. It really limits how complex your technical debt will get. So big fan of dbt as
well. And then, you know, I'm just a huge Looker fan girl. I can't help it. That's always been
something I've been very lucky with since we went to Heroku. Heroku had it from day one. When I did
my interview, I did stuff in Looker. I don't think I ever want
to live in a world that doesn't have Looker available for it. I just think in terms of
how they've turned a visualization tool more into a data governance tool as well that allows
self-service scalability has been a game changer in terms of making sure analysts can focus on the
important things and not become report monkeys, right? That's everyone's biggest fear being an analyst
is do they think I'm a report monkey or do I really get to drive change in the business?
And so I think Looker has enabled analysts to focus more on driving change, diving into the
data because you now do have people like VP of finance feeling comfortable going in Looker
and pulling data themselves with confidence. Yeah, that's a great point, Rachel.
I think, and I've said that in the past, that the most successful tools, in the end, don't just add value or simplify processes.
They actually promote organizational change.
And that's a very good point about Looker.
And I'm happy to hear that from you.
So from what I understand, some major changes that happened in this space are things like orchestration, which we talked about, modeling, and composability in SQL, something that was missing from the language for a long time. So my feeling is that many of the standard, let's say, DevOps or software engineering techniques that software engineers have been using for quite a long time are now entering this space, and that's a sign of maturing: okay, let's adopt methodologies, techniques, and technologies that have proved to add a lot to productivity and the way we work. What else do you think is going to be introduced? I mean, there are things like CI/CD, there are things like testing, especially, I think, testing. It's still, I mean, dbt is doing a lot of things around that, but I think it's still an immature side of the data stack. So what do you
think is going to be the next big thing, let's say, that is going to be introduced in the data
stack and is going to have a lot of impact in the everyday work of someone
who is managing and building these data stacks?
From my side, I think there's two things.
And one you touched on, which is testing,
which really is more about data quality, right?
And, you know, you see things like Great Expectations; you know, dbt has some testing stuff built in,
and they even just came out with a Great Expectations package that you can use. And I really think that's going to be, you know, like you said, bringing in actual software engineering techniques: you want to have unit tests, that kind of thing. It's been really hard to do that in your data warehouse. And so then you end up in that situation
where your VP of finance comes to you and is like,
hey, this number doesn't make sense that I'm seeing.
What's the deal here?
And then you're having to go trawl through all your data
trying to figure out what the issue is, right?
So I think that's going to be one thing
that's going to really take off.
And it should take off. Like, that should just be the way that data warehousing and data engineering is done. It's like, okay, I've developed my model, but now I need to test it and make sure that it is the way it should be. And then the second thing that I think is interesting, and I want to learn more about it and get more into it,
is getting to more real-time analytics. Not only analytics, but also doing stuff with all of the data that you have in your data warehouse that triggers something to happen in real time, whether it be marketing or something in the product, things like that. I think that could really be interesting. Like, you know, you look at Materialize, where they can basically ingest all this data from a Kafka stream, and you can write a SQL statement on top of it that, you know, updates basically in real time. It's like, what if instead of your dbt models, you know, you're having to run them incrementally every hour or whatever it happens to be, what if they just always were up to date? What if that just automatically happened? I think that would be really cool. And that's something that I'm keeping an eye on and trying to learn more about and see what value we can get out of technologies like that.
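Materialize's actual interface is SQL over streams like Kafka; as a language-neutral sketch of the underlying idea, here is a tiny "view" whose result updates incrementally with each event instead of on an hourly batch schedule. The event shape and field names are invented for illustration:

```python
# Sketch of an incrementally maintained "materialized view": instead of
# re-running a batch aggregation every hour, the result is updated as each
# event arrives, so reads are always current.

from collections import defaultdict

class RunningRevenueByPlan:
    """Rough equivalent of SELECT plan, SUM(amount) ... GROUP BY plan, kept fresh."""

    def __init__(self):
        self.totals = defaultdict(float)

    def apply(self, event):
        # Called once per event, e.g. from a Kafka consumer loop.
        self.totals[event["plan"]] += event["amount"]

    def read(self):
        # Reading is cheap: the aggregate is already up to date.
        return dict(self.totals)

view = RunningRevenueByPlan()
for ev in [{"plan": "pro", "amount": 99.0},
           {"plan": "free", "amount": 0.0},
           {"plan": "pro", "amount": 99.0}]:
    view.apply(ev)
```

The design trade-off is the one discussed here: incremental maintenance trades batch-window staleness for an always-on process.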
Because I think that's where people are going to start really looking for stuff. Yeah, a few things that come to mind for me,
I think one of the things when I think about data, and it's been great that there's a huge
growth of companies that are really focused on the data engineer, right? I feel like for a while,
it was kind of, well, they just do what we tell them to do, make the data happen. We're not really going to invest
in tools for them and whatnot. And now I think there's huge importance on it, which has been
great. So that's why you see some of these companies coming out of nowhere with a bunch
of stuff to support them and really making sure they have what they need. But with more tools
comes more issues around integrations and timing. And you start to think
about, okay, well, I'm piping in my data with a Stitch or a Fivetran, then I have to run my dbt jobs, and then I have to send that data somewhere else. And if you don't have really great scheduling or orchestration, you're all of a sudden sending stale data out because your dbt job took too long to run and it's not timed up perfectly. And so you start dealing with, like, how do you make sure that everything's kind of talking to each other, so that it is going based on dependencies and all that? And then the other thing is tool consolidation,
because I do start to worry about how much of the data stack is going to be very piecemeal. And if something goes sideways, you know,
debugging that many tools can be very difficult. And so are you going to start seeing companies have more integrations with each other and, you know, talk to each other? You think about the
Salesforce idea where you have these different packages and installations and whatnot. Are you
going to see, you know, connections between a Stitch and a dbt, or a RudderStack and a dbt? And then
are you going to see dbt have connections to another tool that then is going to write to
Salesforce, all these different things, it feels like there aren't as many really strong integrations
there yet. So while it might not be a tool itself or a product, it's how do you make sure that all
these dispersed tools are talking to each other and have really great alignment?
Because if there's any gaps in that system, the data engineer and analytics and honestly, business as a whole will suffer.
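The scheduling problem Rachel describes is really a dependency-ordering problem: run ingestion, then dbt, then the downstream syncs, rather than three independent timers that can drift. A minimal sketch with Python's standard-library topological sorter; the task names are made up, not from any real pipeline:

```python
# Express the pipeline as a DAG so each step runs only after its upstream
# dependencies finish, instead of on separate cron schedules that can drift
# and ship stale data. Real orchestrators (Airflow, Dagster, etc.) build on
# exactly this ordering idea.

from graphlib import TopologicalSorter

# Each key depends on the set of tasks listed as its value.
dag = {
    "dbt_models": {"ingest_salesforce", "ingest_product_events"},
    "reverse_etl": {"dbt_models"},
    "bi_dashboards": {"dbt_models"},
}

# A valid execution order: every task appears after all of its dependencies.
order = list(TopologicalSorter(dag).static_order())
```

`TopologicalSorter` also exposes an incremental API (`get_ready()` / `done()`) that lets independent tasks, like the two ingestion jobs here, run in parallel.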
These are some great points.
And actually, it's something that I was thinking about lately.
I mean, I totally agree with you, Rachel. I think that the way that it works right now with all the different tools and just adding more and more
tools, like for example, I think it's a very common pattern to see like companies using both
Stitch data, for example, and also Fivetran just because there are like different needs
for integration or they are trying like to control their costs and all these things. But it's good to have
many options out there. But the downside of this is that you end up having a stack that is
more fragile, right? Much more difficult to figure out where the problem is. And especially when we
are talking about tools that are cloud-based, right? It makes the whole process and trying to debug much more time consuming
and much harder, in my opinion.
But what I find more interesting,
and I would like to hear both of your opinion,
is here we are talking about data stacks
where the core of the data stack
is the data warehouse, right?
It acts as like the central repository
of all the data that we have.
And this is, of course, like a great architecture and it works really well.
That's why companies are adopting this.
But as we add more and more tools that they have to interact with it, and especially when
we are talking about real time, right?
The utilization of the data warehouse is probably going up, right? And one of the selling
points of data warehouses like Snowflake or BigQuery is that you can control your costs
because you pay as you go, right? Like you have to execute a query and then you're going to pay.
But we are reaching a point where I don't see, or I don't feel, that the data warehouse is going to be sleeping a lot, to be honest.
So at the end, we might end up in a situation
where the data warehouse is just working 24-7.
And optimizing the costs around that,
from my experience, at least, is not the easiest thing to do.
So two questions here.
I mean, first of all, I'd like to hear your opinion on that
if you agree with this.
But how do you think the data stack is going to evolve to address these things,
especially with the position of the data warehouse?
And how do you think that the data warehouses like Snowflake or BigQuery
or even Redshift can address and adapt to these new challenges?
Because, okay, traditionally, data warehouse,
it's not something that should provide responses in real time, right?
It's not something that it should be like working 24-7 naturally.
And that's how these systems were designed.
But the industry has different requirements right now.
So what do you think about this?
Yeah, so I think with Mattermost, basically, we do have an extra small virtual warehouse running basically 24-7, like you were saying.
And that's just kind of been like, well, we just kind of have to have that going, you know, all the time.
And that's just the way it is.
I think the, you know, we have spent a lot of time actually optimizing our Snowflake costs at Mattermost. And, you know, it's anything from just like warehouse optimization,
as far as like what jobs are you running against which warehouse and,
you know, how often you run them and all that kind of stuff to even
optimizing queries, right?
Because if you can take a query runtime from, you know,
10 minutes on an extra large warehouse to five minutes on an extra small,
you know, at least in Snowflake land, that's going to be quite a cost savings.
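To make that arithmetic concrete: Snowflake bills credits per hour by warehouse size, roughly doubling per size step, with an X-Small at 1 credit per hour and an X-Large at 16, per Snowflake's published rate table (check current pricing before relying on these numbers). A quick sketch of the savings in Alex's example:

```python
# Rough credit math behind the example above. Credit rates double with each
# warehouse size step; X-Small = 1 credit/hour, X-Large = 16 credits/hour.

CREDITS_PER_HOUR = {"xs": 1, "s": 2, "m": 4, "l": 8, "xl": 16}

def query_credits(size, minutes):
    """Credits consumed by a query of the given duration on the given size."""
    return CREDITS_PER_HOUR[size] * minutes / 60

before = query_credits("xl", 10)   # 10-minute query on an X-Large
after = query_credits("xs", 5)     # same query tuned to 5 minutes on an X-Small
savings_factor = before / after    # how many times cheaper the tuned query is
```

Under these rates the tuned query is 32x cheaper, which is why query optimization and warehouse right-sizing dominate Snowflake cost work (billing also has per-minute minimums and auto-suspend behavior this sketch ignores).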
So, you know, I think the, going back to kind of the Materialize idea,
I think is why I get a little bit excited about that too, because it's like,
can you use Materialize more as like your real-time data store that, you know, gives you that real-time
access. And then, you know, behind the scenes, you just are doing your regular things with
Snowflake and all that. Real quick on that, Alex, like just to go back to what you're saying,
because, you know, I think Materialize is a great option, you know, as that space continues to
evolve. But for right now, right, you start to think about and tell me if I'm completely wrong, because once again, this is why
I feel very lucky to have you. But we basically have that extra small warehouse running all the
time, which is then dumping data, obviously, like modeling this data, bringing it in. And then we
have Looker going against different, more powerful warehouses that in the moment, if someone is querying something using Looker or whatnot, you know, that we're paying more.
But that one's not up all the time because you're taking care of all of your piping data in and modeling in a different warehouse, which is up a lot of the time, but they're smaller.
And then we have a bigger warehouse that maybe is running more complex stuff, but that's only running as a user needs to access it.
Right. Yeah, yeah, exactly.
And I mean, that's where you sort of have to I think.
People who are new to Snowflake really need to understand how that pricing model works, because you can kind of rack up a lot of cost if you aren't a little bit careful with it.
The other thing I think is like, you can look into, you know, like Snowflake has their Snowpipe,
which is a lot less money as far as getting the data into the data warehouse. And then I know
BigQuery has ways for you to stream data into the warehouse as well, you know, for less cost. So,
you know, I think at the present moment, like Rachel's saying, that's sort of where we're at with all this stuff, and you just sort of have to play the game in that way. And, you know, as far as moving forward, I think we'll see more stuff like your Snowpipes and things like that, where it's like, okay, you can optimize your cost for sort of a subset of the use cases that you need it for. That's great. I have two more questions, and then I'll give the microphone back to Eric. I know he also has a lot more questions to ask. So you are both working on the data stack, but on different parts of it, right? And obviously,
you have been very successfully working on that, like all these years. Can you describe a little
bit more how your roles differ and how you interact with each other? Yeah, yeah, for sure.
So yeah, I mean, basically, the way I've kind of been seeing it is, I do whatever Rachel needs me to do, like the go-to-market things and operations and analytics things, to make it work. I'm kind of like the plumber. I also considered myself, I was building the Legos, and then she would put them together, is kind of the way I would think about it.
And that's why, like, I mean, honestly, that's kind of why we, you know, started this whole
thing is because we, like, it just, like, we just work well together and it really,
there's no gaps, right?
Like, if we were both just hardcore data engineer people, it's like, okay, yeah, there's like
some cool stuff we could do. But together, we're able to like really have a huge
impact on organizations. And I mean, you can see just based on this conversation, like,
we definitely think about things in a different way, but it ends up like, fully forming the idea.
And, you know, and the solution. I think the thing that's been really great.
And one of the reasons why I even feel comfortable going into business with Alex is we just have such
a great level of trust with each other. I think what ends up happening is I take a lot of time
to understand the needs of the business and really think through where do we need to go?
What are the things that the business doesn't even know they need from the data yet? And how do we make that happen?
And so what ends up happening is I go to Alex and I say, Oh, what about this? What about that?
Brainstorming these moonshot ideas. And Alex is absolutely brilliant in my mind. I mean,
don't, don't tell him cause you know, I don't want his ego to get too big, but I think what's
really cool is we take these different tools and anything that doesn't exist, he's able to bring a custom aspect to it.
So from my perspective, I do everything from basically what I would consider analytics
engineering all the way through process flows and Salesforce, and helping marketing define
how they want to do their pipeline and sales forecasting and all these different things, right? So we really do meet at that overlap of right where data engineering
hands it off to analytics engineering. And I've been literally in the past two weeks, I think I
have really honed in on being obsessed with the concept of analytics engineer. Because I think in
the past, either I was oblivious to it or it really is that new.
I don't think that that concept is really there. I used to call it a hybrid analyst engineer.
And I think that's those people that have the ability to map business logic to raw data and
model it in things like dbt, is where there's going to be a lot of investment. Right? We have these very strong individuals, and those are the core people that enable self-service analytics. And so from that point on is where I focus, and Alex really does everything before. But the thing that's super important that Alex does is he knows how the data needs to be ingested and kind of initially modeled for that analytics experience.
And so you end up having, if you don't have a tight interaction and relationship between data
engineering and analytics, you have people just dropping data into the data warehouse and not giving a crap about what it looks like, honestly, the quality of it, or how it's going to be used.
And then it's so inefficient for analysts to try to query it and model it. And, you know,
there goes your Snowflake cost. If all of a sudden, you know, instead of writing a few different scripts before you dump it in the warehouse, you're just dumping it in there. And then next thing you know, you're spending a thousand dollars more on Snowflake for the analyst to try to model
it and create something of it. So I think in general, that overlap there and like empathy and understanding about what we
want to do with the data has really allowed us to grow in a scalable way. Yeah, that's great.
Some great points here. So based on your experience with Big Time Data, what are some common issues that you see from your customers, and also from your prior experience, in the communication between data
engineering and data analysts?
And do you have some advice to give around that?
And if you would also like, can you tell us, as Big Time Data, how you help with that?
Because solving the technology problem is one thing,
but the technology can do nothing
if the organization is not set up right around the technology, right?
So what are your thoughts around this?
And how do you approach it as big time data?
Yeah, so the clients that we've had,
it's been interesting because most everybody is like,
hey, I know we need to have a data warehouse.
So let's just use BigQuery or Redshift or whatever.
And they'll have some data in there.
But then they're like, OK, we have a data warehouse.
Great.
But it's like, well, hold on a second there.
You're not really getting any value from this data, really.
Or you're just running one-off queries on top of it.
So it ends up becoming this thing where we come in and we're like, okay, cool.
You have some data in there.
It's like, what are you doing with it?
And they're like, well, we're trying to figure that out.
And that's what we've really been helping with is like,
hey, let's get in there.
Let's understand your data.
Let's model it.
And then let's build like a scalable analytics infrastructure
on top of it.
And then, you know, you can get into the even more fancy stuff, as far as like, you know, marketing automation and all that kind of thing. So that's
what, I mean, that's pretty much what we've seen a lot. And, you know, like you said, it's like
an organizational change as well, because one thing that we're really sensitive to is just the
trust in the data that people need to have to use the data that you're producing. Because, you know, it's like, okay, great, you have all these fancy graphs and stuff, but do people actually use that data? Are people actually trusting that data? And so that's something we're really sensitive about, is making sure that, you know, if we come into an organization,
like we're not trying to just like build something,
leave, and then like nobody really uses it. It's like, we really want it to be used long-term
and build sort of that muscle within the organization on being that data-driven.
And, you know, like Rachel was talking about earlier with Anil at Mattermost, it's like,
you know, they're trusting all this data, they're using it for board meetings, and all that kind of stuff.
Yeah, I think the one last thing I'd add to that is, I think in a lot of these companies,
you know, obviously, it makes sense when you have a limited number of individuals, you have a lot of people focused on the product and engineering, and then you have kind of this
slim, quickly moving go to market area, right? You've got a salesperson, a marketing person.
They might have other roles in the company as well, right?
Especially when you're really small.
And so they don't necessarily have the ability
to hire a data engineer.
And what ends up happening is no one's taking a step back
and saying, like, what does this data mean?
How should it be used?
You have products that's saying,
I know I should be creating a lot of data. I know that someone's going to want to use it.
I don't know what they care about. I'm just going to create a ton of data and send it into a
warehouse. And then you have people on the other side that maybe don't have the technical skills
saying, I don't know what to do with this data. I don't know what it means. And so there's this
awkward gap. And so I think what ends up happening is that gap will continue to grow. And it makes it very hard,
once again, as we talked about, to make this data accessible to the modern person or like the common
person at a company. And so if you don't add that layer that we've talked about, that the analytics
engineer has where it's saying, I can take a step back and say, this is the raw data. This is how it maps to what a customer is
doing in our product. And this is what you should care about from a business perspective. Then that
just gap continues to grow. And so I think that's really what we've seen is people trying to make
sense of the data, but really not knowing where to start. And so while it's not always the
first thing that you hire at a company, I do think it's something that you should start moving when
you hire an analytics engineer up a little bit further, you know, like get that person in there,
build that business logic sooner rather than later, or else you might suffer the consequences.
That's great. I have many more questions, especially around your experience with big time data, but I think we will need at least another
one. So, which is good. I'd love to have you back, but now I have to give the stage to Eric because
he also has questions and I think Eric, I really monopolized the conversation here.
No, it's great. And unfortunately we're coming up on time here. So I will have, I'll just throw one more question out there to wrap it up. And of course, we would
love to have you back on the show, but so many great insights and we've talked some about tools,
but just give us a breakdown and maybe we can divide into sort of three stages of companies, but just thinking about our
listeners who are probably at all different stages of companies, but give us a quick breakdown
of what is the Big Time Data stack of recommendation, for maybe sort of a seed stage or, you know, seed stage, Series A startup, to sort of a mature, like maybe mid-sized company,
you know, maybe a hundred plus employees dealing with some serious data, multiple thousands of
customers to, I'm a gigantic enterprise, you know, like Heroku that's running, you know,
maybe inside of Salesforce. What's the ideal stack and maybe we can approach
it from the standpoint of what tools are the same across all three and then what tools are different,
you know, for each stage. Yeah. So I think across all stages, of course, it's going to be dbt, exactly. You know, that's why it's such a good tool to invest in early, because you can,
like, it's going to pay dividends for years, you know, working with it. And it's also,
if you want to use dbt cloud, it's super cheap. So it's not like you're, you know,
paying through the nose for it. I think that's one thing. I think, you know, I'm not as dogmatic
about which data warehouse you pick.
I know Rachel would have a different answer.
And in some cases, like, depending on your size, if you're really small, like, I don't even know if you need a Snowflake or a BigQuery.
It's like, if your data is small enough, and by small enough, I mean, maybe a terabyte total, which is actually, you know, a pretty decent amount of data.
You know, you could just run a Postgres database, like, who cares? And then obviously, once you do get to that sort of
growth stage, and you're a little bigger, and you can pay the money, you know, go with, I would say
Snowflake would be my 1A, and then BigQuery could be my 1B, if you're, you know, if you're all up in the GCP world.
And, you know, I think then what you're going to need is,
you know, the ETL tool, just, you know, a Stitch or a Fivetran or whatever.
Stitch is pretty cheap too, so you could get away with that that way.
And then you would need a, like,
you need to get product data into your data warehouse. So, a Segment or a RudderStack, a tool like that. And then you're going to need at some point your reverse ETL, which I would say is more like a growth-stage tool. And there's so many tools out there for reverse ETL that Rachel and I are still trying to figure out which one we like the best, but there's
so many players in that space at the moment.
Yeah, I was just going to say, I feel like one of the things is like across the board,
kind of like you said, from my perspective, it would be, you know, series A seed round,
depending on your type of business, you probably don't need a snowflake
or BigQuery, probably just Postgres, Redshift, something like that. But then you start talking
about the tools that are across all of them, definitely dbt. The other one, I mean, from my
opinion, I know this is the podcast, but RudderStack. The reason I would pick RudderStack for event streaming is really because that's going to scale with you. So we talk about how much energy or effort it's going to take for you to move from one product to the next as you grow. With RudderStack, I definitely just feel like it will grow with you
as you scale price-wise. You're not going to be put in a corner as you start sending more events.
So I do feel strongly about that one as well. And then the other thing is,
I don't know how much you really need a reverse ETL when you're that small, because you basically
have one salesperson that's manually entering leads and doing that stuff, right? So at that
point, I think very early on, it's maybe not necessary. But then as soon as you start having
two to three different people, you have a third-party tool like HubSpot or Salesforce, and you're really wanting to make sure that there's enriched data based off of product usage
in there. That's when you should start really investing in it. You got people like Census,
Polytomic, I know RudderStack's coming out with their new stuff. I think overall, that's a new
space that we're going to see a lot of growth in, in terms of how do you make this data accessible
in the places where people need it most, which is sales and marketing and all of those things.
I'm trying to think about if there's other stuff that's really missing there. I guess the last
thing is data visualization. Looker's not the cheapest product. I think it really helps you
later on dealing with data governance. But when you're really small, you could probably let dbt handle a lot of that. So I'd say when you hit Series B, I would start thinking Looker. Before that, you could probably deal with, you know, Metabase, which I think is open source. They also have a cloud version. What are the other ones, Alex, that you think of from a visualization standpoint?
Yeah, I mean, I guess there's Mode.
Yeah, there's Mode.
The thing I don't like about Mode is just like
you're having to put so much SQL
into it, which again, to your point
about dbt, it's like
if you can basically make your
Mode queries really, really simple
and then have all the complexity in dbt,
then I think that allows you to scale.
Plus then, if you do switch to Looker, you're already kind of like, you already have all these dbt models that
are, you know, being used, and you can basically just, it makes your migration process a lot easier.
Yeah, and when we say data governance, I think the biggest thing that we're talking about from
a visualization standpoint is there are tools that you write one-off SQL for every
single visualization you want to create. And what ends up happening is like we mentioned technical
debt, because if a single piece of business data changes and you have to go and update your
visualizations, are you going to want to go and update a thousand visualizations to add that one
piece of logic because you decided to write custom SQL for every single thing versus with Looker, you have your own
code behind the scenes, which is called LookML, where you define all your business logic, and then
it just flows into the visualization. So it ends up being much more scalable. And that's what,
you know, we're talking about in terms of data governance, where it's so much easier to scale and you can really trust because all of the logic is owned behind the scenes in a GitHub repository.
And you make one minor change, it has to be PR approved, analytics signs off on it,
you can really trust that data as well. Yeah, that's great. Some great points. And Looker got acquired, right? And my feeling is that as we are entering, let's say, a new innovation cycle in the data space, the way
we interact with the data or the requirements that we have around the data is going to change.
I think we will start seeing new BI or visualization tools that are going to address that.
And that's something that I'm really looking forward
to see what's going to happen in this space
in the next couple of years.
And the other thing that I would like to add
based on what you said,
just to summarize and also add my feeling about that.
I mean, right now we are in a period in time
where there's like crazy hype
around anything that has to do with data.
There are literally like products coming out every day in every possible function around data from
governance, pipelines. There's also like a big part that has to do with ML and AI, which we
haven't touched and it's still quite immature, but there are even new categories like that are
formed right in there with things like feature stores, for example. So there are way too many things happening. And I
think for someone who's trying to build a new stack, it's really easy to get lost in all these
details, make the wrong choices, or be overconfident about what you can do with your data. I mean,
I was in this position, right? Like I had five customers
and I was trying to do data-driven product development,
which, okay, doesn't make sense.
So that's why I think that it's a great opportunity.
And I would advise a lot,
like all these companies,
especially at the earliest stage,
to get in contact with you at Big Time Data,
because there are many pitfalls
and a lot of advice that you can give
to navigate this space
and help them get value out of the data faster
and reduce, of course, their costs
because when it comes to data products,
mistakes cost a lot.
So Rachel and Alex,
thank you so much for being with us today.
Pretty sure we will have another show for sure.
I mean, there are many, many more things
that we have to chat about,
more business oriented things,
but also like more technical things.
I think that one hour is just not enough.
I mean, you both have like so much experience
in this space that there's so much value
that we can give to our audience.
So I'm looking forward to have another show with you
in a couple of months.
Yeah, absolutely.
We appreciate you having us on.
And yeah, I mean, like you said,
like there's so much going on in the space.
It's like exciting.
And I think like Rachel and I,
I don't think we realized it
when we started Big Time Data,
but like how much fun we're having
just like being a part of sort of this community
as it grows and just learning all that stuff.
I mean, that's why we started Big Time Data because we were having these conversations
and it just, I remember thinking, oh my gosh, the conversations I have, you know,
two or three times a week are the highlight of my week. I love talking about this stuff.
And so it was just kind of surprising how much we knew and how much fun we were having. It
was like, what are we doing? Like, let's just make this our life. So it's been very exciting. And
honestly, it's a joy to come and be able to have these conversations. You know, we have these
conversations, Kostas, off of the podcast all the time with you. So it's been very fun to just be
able to dive into this. The last thing I would add is, because as you talked about, there's so
many tools that are coming out, right? And I think one of the things that Alex and I are really going
to try to hone in on are, what are those core components that you absolutely need? And then
what are those fun little add-ons to your data stack that depending on what you're trying to do
would help you, right? And so, you know, I think that's something that we could talk about in a
future podcast. It's like, what are the core pieces and what are some different add-ons that you should
start thinking through depending on what you want to do?
And if you have someone that wants to do AI and all these different things, it's just
really fun to think about it, but there's so many tools out there.
It's really hard to know where to get started.
Yeah, yeah, absolutely.
I think that's an excellent idea, actually, for content in general, but also for another episode together: how we can compile this landscape in a way that can
be easily digested by our audience and also give them some kind of, let's say, map to navigate
this and make the right choices of the right tools depending on their needs and the market
they are in and their use cases. So I think we should absolutely do that.
And I have to say that I'm really happy to hear
that you're having fun doing all this
because I think that's the best that can happen in life, right?
Having fun while delivering a lot of value
to many people and companies.
So that's great, guys.
Well, thank you again for having us on here.
Hopefully we'll be back soon.
Absolutely.
Thank you so much.
Thank you. As always, a fascinating conversation with Alex and now Rachel. I think the big takeaway
that I had was really just reinforcement of an idea that we've heard before on the show.
And that is that the tooling is one thing, And it sounds like it's just gotten way easier to
build a scalable stack, but the people running the stack really make the difference. And it's
their commitment to shepherding the data and shepherding the tools in a way that doesn't
create future problems for the organization, which just aligns with sort of what we want to
learn about on the show, right? The people who are behind the tools. I think this was a very unique show exactly because we had the opportunity
to have two people
that have a very symbiotic relationship.
We have the data engineering
and the operations from the other side.
And I think it became extremely clear
that the success of any kind of data initiative
inside the company relies greatly
on how these people and these functions
can work together.
And of course, with Rachel and Alex, they work really, really well together. But I think it's something that whoever starts trying to build a data stack needs to have in their minds
together with the technology. Absolutely. Plus, they're pretty funny. And it's great to have
funny people on the show. All right. Well, thanks again for joining us on the Data Stack Show.
Subscribe on your favorite podcast network to get notified of new shows and we'll catch
you next time.
The Data Stack Show is brought to you by RudderStack, the complete customer data pipeline
solution.
Learn more at RudderStack.com.