The Data Stack Show - 44: Leveraging Data in a Post-Covid World with Ruben Ugarte of Practico Analytics
Episode Date: July 14, 2021
Highlights from this week's episode:
Ruben's background (2:36)
Massive shifts in data caused by COVID (4:47)
Big Tech is no longer untouchable (9:54)
Accelerations in the BI space (15:17)
A focus on people and on trust (23:43)
Numbers are filtered by the biases of the people viewing them (28:46)
AI trends and adoption (38:06)
Using qualitative data for insights, particularly at early stages (40:56)
Recommendations for taking stock of who is using the data and assessing what their skills are (50:06)
The Data Stack Show is a weekly podcast powered by RudderStack. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are
run at top companies.
The Data Stack Show is brought to you by RudderStack, the CDP for developers.
You can learn more at rudderstack.com.
Welcome back to the show. Today, we're going to talk with Ruben Ugarte, and he is a data
professional. He's worked in the data space for many years and does all sorts of projects. One
thing that I think is so interesting is that along with sort of doing the technical side of things and helping companies understand how to build their stack out, he also helps
companies and data teams learn how to make decisions with data and sort of operationalize
data across teams within an organization.
So tons to talk about there.
One thing that I think we should ask Ruben about that I'm really interested in is what he's seen as a result of the pandemic.
That's not a subject we've covered extensively on the show, but Ruben's dealt with some companies and industries that both were experiencing unbelievable growth and industries like travel and tourism that faced some really, really hard times due to the pandemic.
And there are massive data implications in both areas.
So I'm really excited to chat with Ruben about that.
How about you, Costas?
Yeah, that's going to be super interesting, I think.
One of the things that are very unique with people like him is that as he's consulting
many different companies, he has a much broader, let's say, view of what's going on in the
industry.
So that's something that I would like to discuss with him, like what kind of patterns he sees
out there in terms of what the companies are doing, what are the problems overall in adopting
data-related technologies, and possibly also what are the solutions based on his experience.
So I think this is going to be like a couple of different areas that my questions are
going to be focused on.
Great. Well, let's dive in and chat with Ruben.
Let's do it.
Ruben, welcome to the show. We are so excited to chat with you about all sorts of different
things data related, especially your new book, The Data Mirage. So thanks for taking the time
to join us on the show.
It's a pleasure to be here, Eric.
All right. Well, really enjoyed getting to know you a bit, just chatting before we hit record,
but I'd love for you to just give a background to the audience on who you are and what you do.
Yeah. My background is, of course, in data and decision-making in particular.
And I work with companies of all sizes from startups to public companies and we're typically
trying to figure out how to use data to make better decisions at its core and that may mean
that we'll need to select technology we'll need to implement it we might need to design strategy
around how to use data and of course working with the data itself to come up with insights and answers and
next steps. And this is something I've been doing now for just a little over six years,
and just seeing all those different use cases and things that come up as a company tries to
be more data-driven.
Very cool. I love this topic because there is just a lot of, it's really hard to be data driven.
It's kind of like the idea of just getting, if you think about customer 360, you're sort of
moving towards warehouse-based analytics and some of these trends in digital transformation, just doing those things is
really hard. And then you oftentimes see companies get to that point and say,
okay, we have all the data in the warehouse. And then you say, well, now what do we do?
It's sort of that you accomplish this really, really difficult task technologically. And then
you realize, okay, well, the real work actually is
now just beginning. So really excited to get your perspective on that. But one thing we were chatting
about before we hit record was that regulations for COVID are now starting to lift
in various regions. And we saw just an unbelievable sort of change in the digital landscape in so many
areas. But you working with so many different companies, tell us what are you seeing on the
ground across companies and across industries that has resulted from COVID and now that we're
coming out of COVID? What types of things are you seeing related to data and what companies are doing?
Yeah. So I think there's an even faster push
to get data adopted, to make sure it works, especially for remote teams. That is,
it's accessible by anyone, wherever they are. It doesn't require you to perhaps go to someone
physically, like a data
analyst. So lots of companies trying to say, hey, how do we build dashboards? How do we build
reports? How do we make sure data doesn't have a lag, right? That we can look at numbers somewhat
real time, perhaps last 24 hours, last day, and not we have to wait now two weeks so we can get
the performance in the last month.
So that's one thing.
The second is, of course, this massive digital transformation of businesses.
Some of them, by force, their industry might be hard hit, like tourism.
And others, they realize that there's this huge potential to go e-commerce, to do more digital products, to really take advantage of digital channels.
And companies, I think, use the pandemic, or at least the best companies use the pandemic as a great opportunity to undertake this.
And the third thing, where I think it remains to be seen how it plays out, is just how communication takes place in a more remote environment. This is now perhaps a big debate at a lot of companies.
Do you go 100% remote? Do you do two days, three days? And then from there, whatever you decide,
then it's a matter of, okay, how do we make sure we share data and information with everyone
to keep everyone in sync and perhaps not overload people with meetings just because they are remote?
So all those different things, I think there's an undertone of data and how it might play out or not play out for companies as this next period of what seems to be really high consumer demand across probably every industry, almost every industry takes place.
So I have a question about that, Ruben.
You mentioned that it's across like every industry.
Do you think that there are like specific industries that are going to be, I mean,
are already like more adopting faster these new trends around like data?
And if yes, why do you think that this is happening?
Is it because like COVID affected them differently or there's some other reasons?
That's a good question.
I mean, the ones that come to mind, of course, is the one we might think about technology
companies, e-commerce companies, for example, that weren't really affected by COVID.
They just sort of ran through it.
I think even what maybe were low-tech industries are starting to change.
We look at car dealerships for now, right?
And there's no supply of cars in countries like the US and Canada.
So how do they adjust to that approach?
Do they go to a more digital way of selling cars, of taking bookings, wait lists?
Restaurants, for example, I think are really interesting. They all had to implement
booking systems for restaurants. Here in Vancouver, there were a lot of restaurants
that didn't have reservations. You showed up and if there was a line, you waited.
That's what you did. But a lot of these businesses had to implement some kind of reservation system
to make the social distance regulations work.
And those things will probably stay.
And then once they have that, I think they can then lay over things like takeout
and other digital ways of interacting with the restaurant
without having to be physically present.
So those are the ones that come to mind.
I would say those industries that were either really hard hit by regulations
or they were hard hit by success.
There was just so much demand for their products as people were at home and they were shopping
online that they had to completely change. And the change that we're seeing today is not something that
happened last month. It's been going on for a year, a year and a half. So there's
no going back for a lot of these businesses.
And I'm sure you might be seeing similar things where you're coming from.
Yeah, absolutely. And I can't stop thinking while you're saying all that stuff about
some industries that traditionally, let's say, were lagging behind in terms of adopting technology.
Two of them that are coming in my mind is shipping. And the other one
is anything that has to do with supply chains. And I think that they really had during this
crisis with COVID to catch up and do it really fast. And we are talking about some very critical
industries, right? I mean, I think everyone hears about all the issues
that we have with supply chains right now.
And of course, like shipping is part of that.
So yeah, it's very interesting to see how the next couple of years
we'll be more, let's say, in a position to evaluate
the impact that COVID had.
And I think in most cases, it's going to be a positive,
like it's going to be an accelerator, actually.
Do you think that there's also a negative impact in some cases when it comes to the
adoption of technology and anything that has to do with data in particular?
Well, we were coming off 2020, right, where data was playing this big role.
And, you know, I think that The Social Dilemma came out, right, on Netflix, that talks about
the usage of data for marketing, especially around the US presidential campaign.
And now we're seeing this backlash against, for example, big tech.
Big tech used to be untouchable.
And now they're almost on the other side where everyone wants to regulate them no matter what. So I think individuals in particular
became more aware of the data
that's going around them
and especially more of the sensitive data,
such as health data, right?
We now have debates around vaccine passports.
Do you do them? Do you not?
Is it a sort of break in privacy?
And the tricky thing for businesses is that every business knows they
want to, they need to track data. They need to store it. They need to use it, but they're not at
the sophistication level needed to protect that data correctly from any threat. It's really hard.
It's not really something that I think businesses are completely familiar with yet.
I think I was reading the Wall Street Journal today or yesterday, right, about that major hack that's going on with hundreds of companies.
It's just, it's really hard to protect data.
So businesses of all kinds are being put into that position.
It's no longer just the government or Fortune 500 companies.
And it's going to take some time for companies, I think, to really get a good
grasp as to what that looks like and how to protect it properly among their employees, which then
naturally transitions to consumers. And do you trust this company that you're giving your data
to, whether it's just an e-commerce store, right, that you're providing your credit card and your
address and your name, or more involved companies,
perhaps the health companies that you're providing blood tests
and other health markers for personalized diets.
So it's this entire world where we know now what our data is worth
or what it could look like in the wrong hands.
And it's not quite clear how ready we all are to protect that data.
Yeah, absolutely.
I think if something 2020 and 2021 taught me is that actually data technology and infosec
technology, they're probably going like hand in hand.
And if we see progress in one, we will definitely also see like progress and changes happening
to the other.
And I keep saying usually that like this,
like the decade of the 20s is going to be all around data.
But actually, I think I should change that a little bit
and be more like it's going to be about data
and it's also going to be about security.
I think these two are going to be both super important
and there are going to be also, I think, very interesting, like, ethical conversations
and conversations on a, like, collective level that we are going to do, which, anyway,
it's going to be very, very interesting.
But let's try not to make it, like, the whole conversation too philosophical.
Cool.
Let's go back to technology, Ruben. And let's talk a little bit more about
parts of the data stack, let's say, or like
the technology you think that got boosted by this whole adoption of data and digitization
that COVID brought.
Which technologies do you think benefited out of this?
The ones that come to mind is customer systems.
So anything like a CRM, of course, email marketing tools.
I think digital advertising got a boost out of this, and Facebook and Google, of course,
take the big chunk, but any kind of digital advertising. And BI tools in general might be
the category. I think I was listening to a podcast the other day about Databox as a business
intelligence tool and how well they were doing and how well they were growing. And I think it's
just a reflection of companies who are looking for easier ways to visualize their ever-growing
volume of data that doesn't stop. It just keeps growing and going, in some cases exponentially,
and trying to visualize it into some kind of usable format is a really big challenge.
So I think all those things got a boost from the pandemic as you couldn't reach customers in person.
And you had to reach them where they were, which were their phones, their emails, their social media,
and how companies can get their message to
the customers in those channels.
Makes sense.
In terms of BI, because you mentioned BI tools, my feeling with BI is that actually before
the pandemic or like the beginning of the pandemic, we were pretty much like at the
close, at the end of like an innovation cycle in this space. We had the acquisition of Looker,
we saw the acquisition of Tableau by Salesforce, we had the merger of Sisense with Periscope Data.
Do you see this market as a space that has space for more innovation? Do you feel like there's new stuff that we need there?
And of course, do you think that this whole situation
with COVID and all this obsession around data
is going to accelerate the innovation,
particularly in the BI space?
That's a good question.
And you're right that a lot of the major players
were acquired.
There's always space for smaller companies
and sort of the SME market that they might be, one, switching quickly between tools, but also coming across new tools.
To me, I think that the most interesting element of BI isn't so much the building the dashboards
and reports. I think that is a problem that has been effectively solved, right? There's great ways of building charts.
There's great ways of doing it that is user-friendly.
It doesn't require SQL.
And if you do know SQL or something more advanced,
you can use that.
So that problem seems to be solved
by pretty much every player.
And I think the remaining problem is still
how easy can you integrate data sources into your BI tool, right? You might
have five or 10 advertising channels, your CRM, your email provider, maybe some custom sources,
and how many clicks does it take to sort of bring all that either into a central data warehouse or
to just bring into like a virtual space and then visualize it.
And that seems to be the trend. If you look at, like, Domo, right, you know, a lot of
integrations. Smaller players like Grow.com, and of course, I think Tableau has some connectors.
And that will probably be the future of the BI world, just more and more integrations,
so it's point and click and then anyone can just sort of plug in
their data sources
and you sort of get up and running
with reports and dashboards.
And in some cases,
maybe even just templates, right?
Because if you know what the data schema looks like
when you bring data from Shopify
and you bring data from Facebook,
it's easy to then create
this sort of pre-built templates
that you can just create in a few clicks.
So that to me is the future of the BI world.
Whether that's new players or some of the older players take this on, that will be interesting. I'm not sure about that.
Yeah, that's an interesting point. Do you think that all this data accessibility problem and creating these connectors
and getting access to all the different sources
that the companies have,
do you think that this is like a BI problem
or did you see a different category
that's going to exist out there
that's going to focus mainly on that?
The reason that I ask, of course,
I'm a little bit biased
and I have my personal reasons
because I started the company around this, Blendo.
But I see that like this particular category right now, we see more and more companies
appearing, right?
Like we still have like Fivetran, which has become like a pretty big company.
And we have many open source solutions that they are appearing.
And we keep seeing more and more companies who are like trying to solve the connectivity, the data connectivity and accessibility problem. So what do you think
about this? Do you see a consolidation there? Do you think that this category at the end is
going to merge with the BI category? Or do you see two completely different categories to
keep growing?
A couple of points here. I think from a technology perspective, I could see
the separate categories going as we might be talking about different buyers here in a company.
The more data engineer side or just an engineering department in general, when it comes to moving data from point A to point B.
And I work with a lot of clients where I'm usually brought in by the marketing team or a sales team and so on.
And they just tell the engineering team, this is the data we want, and this is where we
want it.
Just move it.
Don't worry about the data schema.
Don't worry about what we're going to do with it.
Just move it from point A to point B.
And that's already the data engineering and movement of data, right?
The Fivetran world.
And in the BI tool, typically I find on the non-technical side, we want to build dashboards.
This is what they look like.
We want an executive dashboard to summarize our KPIs.
So I see those worlds being somewhat separate.
There might be some overlap, right?
More companies take that sort of Domo approach where they make the connectors into one tool.
But my second point in general that I was going to say here is I think a lot of these
issues are becoming less technical as time goes on and more people related.
That is when I look at technology, just even the past five years and the data technology,
it is significantly much easier today to take data from common data sources, the Salesforce, the Marketos,
Facebook ads, Google ads. If you use a common data stack, it can be really quite straightforward to
plug things in and get data into a warehouse or a BI tool and so on. And I think that trend will
continue. More vendors, more SaaS companies are making data accessibility really easy. But what's not
getting easier is what companies do with it. So they collect it all. What do you do with it, right?
How do you analyze it? How do you turn it into insights? How do you deal with political issues?
I work with a few finance companies and crypto companies. And I was working with a client once where we had this great plan.
We were going to have a CDP and this entire data stack,
very modern, very advanced.
And the entire plan was vetoed by the legal department.
And they just said,
you cannot have data in a cloud environment that we don't control.
And the whole plan just fell apart.
So to me,
these are political issues that can affect how
data flows from one place to the other.
And those, I think, will continue to be trickier, especially as
companies and, let's say, legal departments or compliance departments
realize that if some of this stuff leaks out, it's going to be a big issue.
Like we're going to have fines, reputation damage.
So the safest thing possible is to not let the data flow freely, to really have tight restrictions. And then that
limits how companies use it. And that's a tricky problem. I think that's not something you can solve
with technology as easily. But nonetheless, companies will have to find a way to sort of
get their head around it.
Yeah. Yeah. I think that's, I think that's super interesting. And
jumping back just a couple of points, it's interesting. You see players in the space
sort of approaching the same outcome from different angles, right? So you have,
like you said, Domo, which is sort of building the connectors in.
Metabase recently spun out of GitLab.
And they are building some sort of connectors for the ingestion piece so that they're starting to dabble in sort of the adding pipelines and not just sort of being a BI tool that sits on top of the warehouse.
And then you have companies from the other side approaching it, right? So companies such as Fivetran, and of course, obviously, RudderStack sends data.
And from that regard, you sort of have a lot of times just the raw pipeline piece, but then
close partnerships with companies like dbt, where you're sort of crossing the bridge between just being a delivery pipeline and sort of enabling analytics, but not delivering the last mile of actual dashboards.
So it'll be interesting to see how the dynamics within an organization change depending on what tool they're using and where it comes from, right?
Because if you start from the BI and then sort of move towards the pipeline, as opposed to starting with the pipeline
and moving towards BI, there's very different team dynamics involved in that. You sort of,
the marketing or analytics org is sort of the key leader of the project versus maybe
someone more on the engineering side. So yeah, it'll be really interesting. And I think the
owner of the project internally has a really big impact on what the sort of the
political implications are of what the final outcome is within the organization.
You mentioned something very interesting. I guess from your perspective, what do you think are
perhaps some of the toughest challenges around data pipelines in the future, based on the context
that we're discussing? What are some of those things that are just going to get harder and harder,
despite some of these improvements in technology that we're debating here?
Costas, I'm going to let you handle that because you work on pipeline
projects every day from a product perspective.
Yeah.
Yeah.
That's an excellent question.
I think that as we solve, let's say,
the problem of accessibility when it comes to data,
I think the next big question around data
is going to be, can we trust the data
and how we answer this question?
And that doesn't have a simple answer, actually.
There are answers, actually, on very different levels, in my opinion.
So I see that what is going to be like a great effort from now on
is how we can separate the noise from the signal
in all these huge amounts of data that we can collect today
and very cheaply put them on a data warehouse and ask questions.
So I see a lot of space for innovation when it comes to anything that has to do with data quality, data exploration,
but not in terms of just doing the BI reports that we typically have seen, that are reports for business decisions,
and a lot around data governance: who accesses the data, why they access the data,
what's the lifecycle of the data, how it moves from one place to the other,
how it has been transformed, and what's the lineage of the data.
I think that we are going to see, and by the way,
these are not like new problems, right?
It's a problem that probably large enterprises have been dealing with
for quite a while.
But I think that the problems that the large enterprises were dealing with,
and especially, like, from heavily regulated spaces like banking,
we are going to see it happening to pretty much every company
and everyone who wants to do business out there and wants like to be data-driven, as we said at the beginning.
So that's how I see at least the next two or three years of what I expect to see happening
out there.
What do you think, Ruben?
Yeah, I mean, I agree with one of the very first points you made about trust.
I remember the first time I came across a trust issue and I was working with a team
and we went through,
we checked the data, we made some fixes to it, and then we presented new numbers. And the executive
team was like, those numbers make no sense. This data is completely incorrect. And we had kind of
triple, we had triple checked the numbers. So we knew it made sense. Like we had gone sort of
column by column and made sure that the things were adding up. And I realized very soon that it was actually not,
they were not having a technical issue. They were having a trust issue. The data had been incorrect
for so long, in this case, a year plus, that they had very little trust in anything that came from
it. And we had to build the trust back up. And I learned that you sort of lose trust one report at a time, and you have to build trust one report at a time again. And trust, to me, is a fascinating problem
that I talk about in the book, because it's, to me, fundamentally psychological. It's something that
you have to work with people, understand where they're coming from, how much data expertise
they have versus they don't have. Are the expectations correct with what this number is supposed to be?
One of the most common questions I get is when companies are looking at comparing their paid spending, the conversions that something like Facebook has attributed them versus some external provider, like a mobile attribution provider or even just a web attribution provider.
And they're not the same, right?
They almost never
match. And that can cause a lot of stress and which number is correct, which number should we trust?
And that's, it's a matter of training, of reshifting expectations and getting people
comfortable with the data they have and making sure they're using it in the right context.
Yeah, a hundred percent. I totally agree with you. And I think it's something that you also mentioned a little bit earlier. And I think it's a very good opportunity
to start talking also a little bit more about your book. Because I know that one of the things
that you deal a lot in your book is about people and how an important dimension it is when it comes
to data and how we use data. And of course, that's where also trust comes into play.
And I think that's something that we forget, especially people that are coming from, let's
say, more of an engineering background.
But I think that this is also kind of a perception that, let's say, the whole of humanity is building
out there, is that numbers are something objective, right? You come up with
the numbers and that's it. Your work ends there. I mean, everything that has to be told is told
through the numbers. But actually, I don't think that this is true. Because you have the numbers,
you might have your visualizations, you might have built whatever you want to build. But at the
end, you need the people there to tell the story of the data, right? And this story is super important, and it's also what is going to build or rebuild
the trust. And that's my take as a person that has worked in this space for like a bit more than 10
years now. What do you think about this? And can you tell us a little bit more about the importance
of people?
Yeah. We, of course, started talking about the pandemic. And I think it's a fantastic
case study for how numbers get interpreted or misinterpreted. Pretty much all over the world,
we were all seeing COVID case numbers and things like that. But it became very clear that
not everyone was interpreting numbers in the same way.
Here in Vancouver, we had protests, anti-lockdown protests, as many countries did.
And it was a clear distinction between people who would see the daily or weekly COVID numbers
and they thought, okay, this is what we should do.
And an entire different group of people saw the same numbers and took a different decision.
And that's the same thing that happens in companies, right?
Any number, any report gets interpreted by the biases and preferences of the people who
are running them.
And it's this element of people that becomes the most unknown or perhaps the most volatile variable
in data. We can get the right technology. We can build the right pipelines. We can get the
sort of the best ways to build dashboards and things like that. But then how those numbers
get interpreted, that's the thing where the people element comes in. And when I wrote the
book, I realized that most of the books on the market
on data, which there's not that many, maybe five or 10, but most of them were really focused on
the technical side of things. How do you build reports? How do you run queries? How do you
analyze numbers and statistical models for analyzing numbers? And I thought, you know what,
those are useful, but I think they're missing the huge element that if you teach someone basic probabilities and statistics, but then they have a bias in some way or another, the results they'll get are completely different from what you may expect. And because we were talking about this before we started recording here, you're an engineer,
Costas, and I have a slight engineering background as a front-end engineer.
I work with a lot of engineers, and I realize engineers can sometimes see the world as very
mathematical.
It's like step one, step two, step three, and you take the numbers through a clear logical
calculation, and there's only one answer here.
This is like math grade three.
It's only one answer you can get to, only one way you can get to it.
Yeah, exactly.
That's not really the case for a lot of data, especially the toughest decisions around strategy and what a company
should do, what products they should develop, what markets you go into, how to build features into
a product. And that needs to be recognized.
And then you deal with it, right?
It's not something that can be a complete disruptor to how companies approach data.
You just have to understand that and deal with it.
So in the book, I talk about trust.
I talk about expectations.
I talk about training and how to make sure people have the basic skills needed to work
with numbers, to understand
them. And that tends to provide a really good foundation for all the other technology stuff
that companies are going to do really well.
Ruben, question for you. We talked about
patterns. The idea that the sort of very common data integration problems are going to be solved in sort of an elegant way
and be very accessible, I think is accurate, right? We sort of see commoditization there.
So if you sort of remove that element, have you seen similar patterns on the people side as it
relates to data? Almost like if you think about architectures from a data stack standpoint,
you have a constellation of people in the company
working with data.
Are you seeing patterns that are proving
to be really successful sort of across teams
between engineers and those sort of consuming the data?
I'd love to know.
I'd just love to know what you're seeing there.
Yeah, one of the most interesting patterns
or perhaps trends actually relates to data and specifically to machine learning and AI,
but not in the way that perhaps companies think about it, where you're building it out yourself.
I think we're starting to see that AI and machine learning are being built into the specific SaaS tools, right?
So you have an email marketing tool like Salesforce, or Pardot specifically,
or Marketing Cloud, any of those.
And it has a built-in way of running A/B tests, right?
So you can take two subject lines, you test it, and they'll
tell you which was the best one.
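As a rough illustration of what "telling you which was the best one" can involve, here is a minimal sketch of a two-proportion z-test on open rates. It is not how any particular email tool works; the function name and the example counts are assumptions made up for illustration.

```python
import math


def two_proportion_z_test(opens_a, sent_a, opens_b, sent_b):
    """Compare the open rates of two subject lines.

    Returns (z, p_value), where p_value is two-sided.
    Hypothetical helper for illustration, not a vendor API.
    """
    rate_a = opens_a / sent_a
    rate_b = opens_b / sent_b
    # Pooled open rate under the assumption that both lines perform the same.
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    if std_err == 0:
        raise ValueError("open rate is 0% or 100% in both groups; test undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: subject line A opened 200 of 1000, subject line B 250 of 1000.
z, p = two_proportion_z_test(200, 1000, 250, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in open rates is unlikely to be noise, but it says nothing about why one subject line did better.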
But I think what's interesting to see now
is a lot of that is being taken to the next level
and all this machine learning
is doing all this analysis sort of behind the scenes
and then giving some kind of insight to the user.
So instead of asking them,
look through the past hundred emails
and then see what kind of patterns exist
among those hundred emails that you can see,
subject lines and content and open rate and so on,
the tool is just doing that all automatically and then just spitting out some kind of insight,
right? Saying, you know what, typically when you send an email around 8 a.m. and you include this
in the subject line and you have two images, those tend to do better than your other ones.
We see it in Google Analytics, right? We'll surface insights.
And not all the time they're useful.
Sometimes they're just really random.
But that's, I think, a really interesting pattern
for the people component.
Because instead of expecting them to be able to run
very sophisticated pattern analysis
and take data to Excel and those kind of things,
the software will do it for them. And all they have to do is just try a bunch of stuff, right?
Just try a bunch of subject lines, try a bunch of types of emails. Maybe they have to do some
kind of setup to get the A/B test going or make sure the test is going to work properly,
but there's going to be a lot of heavy lifting done for them. And I think the same
thing applies even when you look at a field like product analytics.
So the world of Mixpanel and Amplitude and Snowplow and all that.
And you see a lot of these companies are investing really heavily in their machine learning reports.
So product companies, for example, really want to know what features tend to correlate with conversions, like signups or people becoming paying subscribers.
And that's an analysis you can run and you can sort of run in different ways to get the entire picture.
Or the software vendor can just build it in, build the algorithm in, you feed the data and it does it for you for the most part.
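To make the "which features correlate with conversions" analysis concrete, here is a minimal sketch that compares conversion rates for users who did and did not use each feature. The data shape, the feature names, and the function name are assumptions for illustration; real product analytics tools use their own, more sophisticated methods.

```python
def conversion_rate_by_feature(users):
    """For each feature, compare conversion among users with and without it.

    `users` is a list of dicts with keys "features" (a set of feature
    names) and "converted" (bool). Returns {feature: (rate_with, rate_without)}.
    rate_without is None when every user has the feature.
    """
    all_features = set()
    for user in users:
        all_features |= user["features"]

    rates = {}
    for feature in sorted(all_features):
        with_f = [u["converted"] for u in users if feature in u["features"]]
        without_f = [u["converted"] for u in users if feature not in u["features"]]
        rate_with = sum(with_f) / len(with_f)
        rate_without = sum(without_f) / len(without_f) if without_f else None
        rates[feature] = (rate_with, rate_without)
    return rates


# Made-up sample: four users and two hypothetical features.
sample_users = [
    {"features": {"export", "invite"}, "converted": True},
    {"features": {"export"}, "converted": True},
    {"features": {"invite"}, "converted": False},
    {"features": set(), "converted": False},
]
for name, (with_rate, without_rate) in conversion_rate_by_feature(sample_users).items():
    print(name, with_rate, without_rate)
```

This only shows correlation. A feature that converted users happen to use is not necessarily the reason they converted.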
And it's not perfect yet, but those are also things that I think
we'll continue to see going forward.
And if you go back to the BI tools,
I think perhaps there'll be an element of BI tools
where it's not just about displaying the data,
it's about doing something with it
and trying to highlight insights around segments
or specific attributes or something that you miss,
but the software is able to surface automatically.
Yeah, super interesting.
It's almost like if you think about,
okay, we have all this data across the company
and we want to do AI, right?
That's a very sort of ambiguous, like challenging,
okay, what are the inputs?
What are you defining?
All that sort of stuff.
But if you think about AI almost as a localized service
within a particular
tool that a specific team is using to sort of accomplish a specific or drive or understand
a specific part of the customer journey, it makes total sense. And I agree, it's definitely
getting better. It's not perfect, but it's definitely getting better. Kostas, I'd love
to know what you think about this. Yeah, my approach with that stuff is a little bit more influenced
by engineering in general, to be honest, in the sense that we should always start from the problem,
try to solve the problem, and find the right tools. And AI might be, or machine learning might
be the right tool, or might not be the right tool, right? This is something that I think
it's a journey that as engineers, we always have to take
when we try to solve like a new problem.
And I think this is like the approach
that we should approach like everything
when we are building like something like a company,
for example, trying to build a product.
I understand that as humans,
we always want to not miss an opportunity, right?
Or we don't want to,
or we want to work with the latest shiny toy out there.
But at the end, we might just not need it
or it might not be suitable, right?
And it goes back to trust.
We tend to trust new things a little bit easier at the end
compared to how we should.
And we have much more elementary problems
that we have to solve when it comes to data
and the culture around using the data, I think.
And I think that Ruben has touched it in a very good way
by talking about people.
When we are talking about how we can educate
all the stakeholders inside the company
to become more data literate
with stuff like elementary statistics
or understand like what
bias means and all that stuff. I think there's a huge gap there between doing this, which is a
necessity, and actually putting like an AI black box there that magically is going to solve everything.
So yeah, that's how I see things. I don't know.
What do you think, Ruben?
It's funny you mentioned that, right?
Because I think we had a period,
which perhaps is ending now,
but maybe from 2010 to 2020,
where it seemed like any company
just added AI onto their name
or their product or description.
And that all of a sudden
made it really interesting.
But it turns out that,
one, it wasn't really AI.
It was perhaps at most machine learning and most of the algorithms were just being reused,
right?
There were things that were built for other purpose and they were just finding a different
use case for it.
So I think it became a bit of a crutch for a lot of companies.
And I saw all kinds of companies, companies that were going to write copy for you, or were going to write like Facebook ads for you and all this kind of stuff
using AI. And I'm not sure how many of those are actually useful. So we'll see, we'll see what,
what happens. I think SaaS companies will continue to add this machine learning AI,
and they might call it AI even if it's not truly AI, as easy ways to make the product more useful.
But as you mentioned, when we look at the best companies out there, when it comes to being data driven and you likely have examples for me, which is hard, is they've done very consistent
training or education or coaching at a company level, at a culture level. And people in general
have a high level of comfort with data and whatever that looks like. And it may not even
be fancy. It may just be very simple excels and things like that, but they have the ability to work with it and get insights and then try things and then innovate on it.
And that's hard, right?
The AI might help a little bit with the insights, maybe with experimentation and making sure you can experiment faster, but there's still things that you can't really get around.
You have to really train the people or hire the right people or build the right company culture. Sure. Company size, I think, is a really interesting component
of this conversation because speaking in general terms, you sort of exclude large swaths of the
market depending on company size, right? So for example, a really early stage
startup company may not even have enough data for AI or ML to be applicable, right? There's just not
enough there. They're too early. The product's changing really quickly. So I think that's a
really interesting insight around the needs of the particular, and really it's what you said,
Kostas, right? Like what's the problem and what are you trying to solve? And that can vary
significantly depending on the stage of company, the complexity of the stack, the size of the
data set, and there are just so many variables in there. Yeah. You mentioned a great point. I
have a lot of startups who reach out to me very early talking about pre-product market fit, three people, pre-beta
even.
There might even be no product out there.
And they reach out because they want to set up this entire data stack, right?
And say, hey, we know we want a CDP and we want product analytics and we want a BI tool
and we want something for surveys and five or six tools.
And they have almost no users.
And it's just too early. Even if they were to have those things,
there wouldn't be enough data to make it useful. They might be looking at,
I mean, literally a hundred users and trying to understand how those hundred users are using the
product. But instead of trying to look at those users through charts, funnels and line charts and
things like that, it's probably more valuable to just talk to them, you know, just get on a phone
call, do an interview. And this is where I think companies, especially executives need to understand
the context of data. If you're in that stage, it's almost, it's not really a waste of time,
but it's not a very good use of your time to try and set up these very advanced ways of visualizing data when you don't have any, instead of just talking to people and going low tech, low data, right?
More qualitative than quantitative.
Now, if you're at the extreme, you're a public company, you have millions of users, then there might still be a role for interviews and qualitative data,
but you want to make sure you have the ML and the quantitative component as well.
Right.
So it depends on the situation. It depends where you're coming from, but trying to find the right
or the best use of your data for your purpose. I think what Kostas was saying,
what is the real problem here? And what's perhaps the most effective way of getting there,
not the most sophisticated or the most exciting.
Yeah, I was talking with someone who is a data engineer at some really incredible companies
like Heroku and some other companies, like from very large companies to startups.
And he was talking about the stages you go through in terms of data engineering.
And he talked about your three people in a garage startup.
And he said, you want to go out and buy fancy analytics tools?
He's like, just query your production Postgres database.
He's like, you just don't have enough data for it to be meaningful.
And then it reminds me, actually, Kostas, do you remember we were talking with Alex
from the Pool app? He's a founder who went through YC. And we asked him about analytics at a very early stage because he had 100 users. I don't know how many, but in He said, I used my product analytics tool.
I don't remember which one he was using.
He said, I use it literally just to figure out who I should talk to.
What are anomalies or people who adopt really quickly?
I thought that was really, really interesting and aligns exactly with what you said.
Yeah, yeah, 100%.
Like at an early stage where people, I mean, it's not just like the number.
It's like it's also you are at a stage where also as a founder, you educate yourself.
You need to build your intuition, right?
And you are not going to build this intuition fast enough if you just look on a screen and try to figure out what's going on with the numbers. Probably you're going
to just waste your time, to be honest, because there's too much noise in this data instead of
signal. It's much, much better to go out there and pick up your phone and talk with someone.
And I think this is something that is relevant for a much longer time when it comes to B2B
companies, because the numbers grow much slower.
But of course, like with B2C, you reach a point
where doing analytics and trying to aggregate the data
in an automatic way is necessary
because you just have too much data, right?
Imagine a company like DoorDash.
Yeah, like of course you need analytics there.
Of course you need advanced analytics
because otherwise you have like so much data that
a human being cannot interpret them, right?
So yeah, I think that anyone who has started the company, and they went through that, they
have come to this realization of, I can build whatever model I want with the early stage
data that I have, but 90% of the times it just fails.
Yeah. I'll give you an example of where this plays out. Last year I worked with tourism agencies here in British Columbia, Canada, and of course, hardest hit industry by the pandemic,
unprecedented drops in volume and visitors to the country and so on. And I worked with one
in particular that was really
quite frozen by all this. It was a tough situation to be in. And they really wanted to look towards
the numbers to kind of figure out what to do next. And at some point, I remember telling them,
I mean, the numbers are not going to change. We know they're not going to change. Regulations
are not going to change that quickly. We know you're sort of 90% down from regular averages historically.
We know all this.
So the question is just, what are we going to do about it, right?
Is there local travel we can encourage?
Is there financial decisions that have to be taken at a company level?
These were hard decisions to be taken.
But I noticed that they felt quite stuck and sort of paralyzed. And they wanted
numbers and reports and dashboards, specifically quantitative numbers, to tell them where to go
and where to move. And I think that was one of the weaknesses they had to deal with. It was hard to
overcome. And the same thing could happen in early stage companies: wanting numbers, wanting machine learning to tell you something about those 100 users, instead of just talking to them. And in tough decisions that can be a bit of a crutch. And I think that's one of the challenges I was seeing with companies that really want to be data driven, or even just individuals: whether in a crisis or not, they'll run into situations where
they don't have enough data.
There's just not enough data to have any kind of statistical validity.
But you still have to make decisions nonetheless.
You can't just wait until all the numbers are in.
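One way to see why roughly 100 users is "not enough data" is to look at how wide the uncertainty around a simple rate is at that size. The sketch below uses the Wilson score interval for a proportion; the function name and the example numbers are assumptions for illustration, not figures from the conversation.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    `successes` out of `n` trials; z=1.96 corresponds to ~95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Made-up example: the same 10% conversion rate at two very different sample sizes.
small = wilson_interval(10, 100)
large = wilson_interval(1000, 10000)
print(f"100 users:    {small[0]:.3f} to {small[1]:.3f}")
print(f"10,000 users: {large[0]:.3f} to {large[1]:.3f}")
```

With 100 users, an observed 10% rate is consistent with anything from roughly 5% to 17%, which is often too wide to settle a decision on its own.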
And I think having a comfort and being able to go both ways effectively, right?
Be able to use data to analyze patterns and make decisions and being able to make decisions
despite a lack of data.
I think that that's what kind of builds resilience among companies and individuals.
Yeah, I agree.
One thing that I was talking to someone about recently is that the one thing that I've noticed, I think,
as I've had the opportunity to work at companies at various stages of maturity, but really
with teams that have sort of deep experiences, entrepreneurs and sort of taking companies
to market and, of course, leveraging data to do that is when it comes to
decision-making, I find that people who are really good at it, they've built an incredible amount of
muscle at breaking down numbers into sort of the simplest form and sort of only looking at the
necessary components, right? As opposed to saying, let's try and build some very complex model, which is necessary sometimes, right? But in many cases, you're trying to make a high level
sort of almost directional decision. And it's really interesting to see really, really smart
people actually break numeric things down into pretty simple stuff that makes decisions a lot more clear, right? Because, you know, I think
clarity is a huge deal when it comes to making decisions based on numbers. I cannot believe that
we have run through the entire time. We did not get to the third topic we wanted to discuss,
which is CDPs. So Ruben, we'll have to do that on another show because, boy, that's a loaded term and you see so much of it. But before we're going to have to end the show
soon, but before we go, I'm thinking about our listeners who are considering the people side of
the equation, which you mentioned. And as I've reflected in the conversation throughout the
recording, I really think that there's this element of building technical discipline around getting the data correct in the systems, and then building muscle around the people side of interacting with and actually sort of using
data and making decisions around data in the organization. So for our listeners who are
actively working on the technical piece, but maybe want to get started with just some really
practical things, maybe they could start working on this week to build muscle on the people side,
what are the top two or three things you would recommend they do in terms of getting started? Yeah. First, I think they need to take stock of
who's going to be using the data and what their skills are. That is, do they know SQL? Are they
comfortable with technical topics? Or are they going to have lots of questions around what may
seem like basic things? How does this number work?
How is this data collected?
What's the formula here?
Things like that.
Based on that, then you have sort of different paths, right?
If you have a highly technical team, then your rollout, how you get sort of this in
the hands of people will be slightly different.
You're likely going to want to make sure that you allow people to run their own SQL,
that you have a lot of diagnostic information
so they can explore the data on their own,
and that everyone just has the right permissions
and access for it.
If you have a less technical team,
then you need to make sure that there are nice interfaces for interacting with the data that don't require a knowledge of SQL or similar things.
And you want to kind of get a gauge as to what skills might be best suited for kind of training, whether it's, again, basic statistics, basic probabilities, how to make decisions, how to read charts, what the
difference is between a line chart and a bar chart, how the chart design might completely change the
meaning of a KPI or a number, things like that. And then the third step, kind of figure out the
ways to close those gaps. And the topics we're talking about here, they're not university
PhD-level topics, so they can be covered in, you know, informal workshops, one-on-ones and things
like that. But you have to know what you want to teach. And if you're dealing with some of
the skills, there may be some discovery or research that's needed, and I mean that as in talking to
people, because if you ask someone, are you comfortable
with numbers, they might just say yes. But if you dig a little deeper, you might find out that,
you know what, like, probabilities, it's not something that comes natural to you. Statistics
was not your favorite class in college. So you start to kind of figure out what are some of
those skills that you start to work on as a team or as a company. I think that's incredible advice.
And I'll actually say,
I really did enjoy statistics in college.
Math isn't my strongest subject,
but it's been really helpful.
And Kostas, I will say some nice things about you
because I've had the opportunity
to actually work with you on some internal reporting.
And with your engineering background, thinking about things like cohort analysis and some
other components there where you have the ability to explain some of those concepts
to me in a very practical way as we're looking at a data set together has been really, really
helpful, especially for me not necessarily having a technical background. So I can say
that if you're listening to the show and there's someone who's non-technical, but who interacts with
the data you're producing, please take the time to help them because it's been hugely helpful for me.
Thank you so much, Eric. I really enjoy doing it. So I'm happy to do it anytime.
Great. Well, Ruben, thank you so much for joining us. And if people are interested
in your book,
where should they go
to get more information on that?
Yeah, The Data Mirage is available everywhere you can buy books.
So Amazon, Barnes & Noble, Chapters if you're in Canada, Google Play.
And they can also go to my website at rubenugarte.com and they'll find links to the books
and blog posts and videos and other free resources.
Great. Well, thank you so much for joining us and we'll talk again soon.
Thank you for having me. I really loved the part of the discussion where we were talking about
the ways that the pandemic accelerated so many things. Obviously it was a hugely tragic and
challenging event in so many ways. In terms of
data and digital transformation, though, it really forced a lot of companies to do just a lot of
different things to update the way that they are dealing with data and creating customer experiences.
And I think, as I was reflecting on the conversation you were having with Ruben about
that, Kostas, which was really, really enjoyable to listen to, there's this phrase that every company is becoming a software company.
And I'm not going to trademark this, but in many ways, the pandemic forced every company
to act like an e-commerce company in the way that they deal with data. E-commerce many times
is sort of on the sharp end of trying to figure out how to leverage data to grow
and really drive the customer experience with data.
So I think that was my big takeaway.
And I think that's something I'll be thinking about
in the upcoming week.
What stuck out to you?
Yeah, that's a great point, Eric.
For me, I think it's the validation
of like a kind of recurring theme
that we see in our conversations,
which is the relationship between data and people.
I mean, you heard like Ruben, he said that a big part of his book is actually dedicated
to how important people are when we are trying to be like a data-driven company and how many
things are missing there.
And I think this is like, as I said,
another validation of the concept
that data is not here to substitute people, right?
It's here to be another tool for people.
I know that we've said that many times before,
especially with people that are coming
from the ML space or the AI space.
And let's say the most advanced, let's say, use cases
where everyone is afraid like the AI overlords
will come and take our jobs and all that stuff.
But at the end, from what it seems
and what becomes like more and more obvious
is that data is just another tool, right?
And it's another tool that augments
like the capabilities that humans have.
And it happens like at every stage
and with almost like every problem out there.
And so we don't only need to build new technologies,
we also need to educate people
on how to use these technologies
if we want the technologies to succeed.
So that's like what I keep from our discussion
and I'm looking forward to chat with him again in the future.
That was a very succinct and elegant summary
of a philosophical perspective on data.
And I appreciate your ability to do that on hand.
So if you want more concise philosophical predictions
about the future of data from Kostas and maybe me, join us on the
next show. Tons of exciting episodes coming up through the rest of the summer. And thanks for
joining us. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your
favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me,
Eric Dodds, at eric@datastackshow.com. That's E-R-I-C at datastackshow.com.
The show is brought to you by Rudderstack, the CDP for developers.
Learn how to build a CDP on your data warehouse at rudderstack.com.