The Data Stack Show - 44: Leveraging Data in a Post-Covid World with Ruben Ugarte of Practico Analytics

Episode Date: July 14, 2021

Highlights from this week's episode:
Ruben's background (2:36)
Massive shifts in data caused by COVID (4:47)
Big Tech is no longer untouchable (9:54)
Accelerations in the BI space (15:17)
A focus on people and on trust (23:43)
Numbers are filtered by the biases of the people viewing them (28:46)
AI trends and adoption (38:06)
Using qualitative data for insights, particularly at early stages (40:56)
Recommendations for taking stock of who is using the data and assessing what their skills are (50:06)

The Data Stack Show is a weekly podcast powered by RudderStack. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at rudderstack.com. Welcome back to the show. Today, we're going to talk with Ruben Ugarte, and he is a data professional. He's worked in the data space for many years and does all sorts of projects. One
Starting point is 00:00:40 thing that I think is so interesting is that along with sort of doing the technical side of things and helping companies understand how to build their stack out, he also helps companies and data teams learn how to make decisions with data and sort of operationalize data across teams within an organization. So tons to talk about there. One thing that I think we should ask Ruben about that I'm really interested in is what he's seen as a result of the pandemic. That's not a subject we've covered extensively on the show, but Ruben's dealt with some companies and industries that both were experiencing unbelievable growth and industries like travel and tourism that face some really, really hard times due to the pandemic. And there are massive data implications in both areas. So I'm really excited to chat with Ruben about that.
Starting point is 00:01:30 How about you, Costas? Yeah, that's going to be super interesting, I think. One of the things that are very unique with people like him is that as he's consulting many different companies, he has a much broader, let's say, view of what's going on in the industry. So that's something that I would like to discuss with him, like what kind of patterns he sees out there in terms of what the companies are doing, what are the problems overall in adopting data-related technologies, and possibly also what are the solutions based on his experience.
Starting point is 00:02:00 So I think this is going to be like a couple of different areas that my questions are going to be focused on. Great. Well, let's dive in and chat with Ruben. Let's do it. Ruben, welcome to the show. We are so excited to chat with you about all sorts of different things data related, especially your new book, The Data Mirage. So thanks for taking the time to join us on the show. It's a pleasure to be here, Eric. All right. Well, really enjoyed getting to know you a bit, just chatting before we hit record, but I'd love for you to just give a background to the audience on who you are and what you do. Yeah. My background is, of course, in data and decision-making in particular.
Starting point is 00:02:40 And I work with companies of all sizes, from startups to public companies, and we're typically trying to figure out how to use data to make better decisions at its core. And that may mean that we'll need to select technology, we'll need to implement it, we might need to design strategy around how to use data, and of course working with the data itself to come up with insights and answers and next steps. And this is something I've been doing now for just a little over six years, and just seeing all those different use cases and things that come up as a company tries to be more data-driven. Very cool. I love this topic because there is just a lot of, it's really hard to be data-driven. It's kind of like the idea of just getting, if you think about customer 360, you're sort of
Starting point is 00:03:37 moving towards warehouse-based analytics and some of these trends in digital transformation, just doing those things is really hard. And then you oftentimes see companies get to a point where they're like, okay, we have all the data in the warehouse. And then they say, well, now what do we do? It's sort of that you accomplish this really, really difficult task technologically. And then you realize, okay, well, the real work actually is now just beginning. So really excited to get your perspective on that. But one thing we were chatting about before we hit record was that regulations for COVID are now starting to lift in various regions. And we saw just an unbelievable sort of change in the digital landscape in so many
Starting point is 00:04:28 areas. But you working with so many different companies, tell us what are you seeing on the ground across companies and across industries that has resulted from COVID and now that we're coming out of COVID? What types of things are you seeing related to data and what companies are doing? Yeah. So I think there's an even faster push to get data adopted, to make sure it works, especially for remote teams. That is, it's accessible by anyone, wherever they are. It doesn't require you to perhaps go to someone physically, like a data analyst. So lots of companies trying to say, hey, how do we build dashboards? How do we build reports? How do we make sure data doesn't have a lag, right? That we can look at numbers somewhat
Starting point is 00:05:16 real time, perhaps last 24 hours, last day, and not have to wait two weeks so we can get the performance in the last month. So that's one thing. The second is, of course, this massive digital transformation of businesses. Some of them, by force, their industry might be hard hit, like tourism. And others, they realize that there's this huge potential to go e-commerce, to do more digital products, to really take advantage of digital channels. And companies, I think, use the pandemic, or at least the best companies use the pandemic as a great opportunity to undertake this. And the third thing, where I think it remains to be seen how it plays out, is just how communication takes place in a more remote environment. This is now perhaps a big debate at a lot of companies.
Starting point is 00:06:12 Do you go 100% remote? Do you do two days, three days? And then from there, whatever you decide, then it's a matter of, okay, how do we make sure we share data and information with everyone to keep everyone in sync and perhaps not overload people with meetings just because they are remote? So all those different things, I think there's an undertone of data and how it might play out or not play out for companies as this next period of what seems to be really high consumer demand across probably every industry, almost every industry takes place. So I have a question about that, Ruben. You mentioned that it's across like every industry. Do you think that there are like specific industries that are going to be, I mean, are already like more adopting faster these new trends around like data?
Starting point is 00:07:06 And if yes, why do you think that this is happening? Is it because like COVID affected them differently or there's some other reasons? That's a good question. I mean, the ones that come to mind, of course, are the ones we might think about: technology companies, e-commerce companies, for example, that weren't really affected by COVID. They just sort of ran through it. I think even what may have been low-tech industries are starting to change. We look at car dealerships for now, right?
Starting point is 00:07:33 And there's no supply of cars in countries like the US and Canada. So how do they adjust to that approach? Do they go to a more digital way of selling cars, of taking bookings, wait lists? Restaurants, for example, I think are really interesting. They all had to implement booking systems for restaurants. Here in Vancouver, there were a lot of restaurants that didn't have reservations. You showed up and if there was a line, you waited. That's what you did. But a lot of these businesses had to implement some kind of reservation system to make the social distancing regulations work.
Starting point is 00:08:07 And those things will probably stay. And then once they have that, I think they can then lay over things like takeout and other digital ways of interacting with the restaurant without having to be physically present. So those are the ones that come to mind. I would say those industries that were either really hard hit by regulations or they were hard hit by success: there was just so much demand for their products as people were at home and they were shopping
Starting point is 00:08:31 online that they had to completely change. And the change that we're seeing today is not something that happened last month. It's been going on for a year, you know, a year and a half. So there's no going back for a lot of these businesses. And I'm sure you might be seeing similar things where you're coming from. Yeah, absolutely. And I can't stop thinking while you're saying all that stuff about some industries that traditionally, let's say, were lagging behind in terms of adopting technology. Two of them that come to mind: one is shipping. And the other one is anything that has to do with supply chains. And I think that they really had during this
Starting point is 00:09:15 crisis with COVID to catch up and do it really fast. And we are talking about some very critical industries, right? I mean, I think everyone hears about all the issues that we have with supply chains right now. And of course, like shipping is part of that. So yeah, it's very interesting to see how the next couple of years we'll be more, let's say, in a position to evaluate the impact that COVID had. And I think in most cases, it's going to be a positive,
Starting point is 00:09:42 like it's going to be an accelerator, actually. Do you think that there's also a negative impact in some cases when it comes to the adoption of technology and anything that has to do with data in particular? Well, we were coming off 2020, right, where data was playing this big role. And, you know, I think that The Social Dilemma came out, right, on Netflix, which talks about the usage of data for marketing, especially around the US presidential campaign. And now we're seeing this backlash against, for example, big tech. Big tech used to be untouchable.
Starting point is 00:10:16 And now they're almost on the other side where everyone wants to regulate them no matter what. So I think individuals in particular became more aware of the data that's going around them and especially more of the sensitive data, such as health data, right? We now have debates around vaccine passports. Do you do them? Do you not? Is it a sort of break in privacy?
Starting point is 00:10:43 And the tricky thing for businesses is that every business knows they want to, they need to track data. They need to store it. They need to use it, but they're not at the sophistication level needed to protect that data correctly from any threat. It's really hard. It's not really something that I think businesses are completely familiar with yet. I think I was reading the Wall Street Journal today or yesterday, right, about that major hack that's going on with hundreds of companies. It's just, it's really hard to protect data. So businesses of all kinds are being put into that position.
Starting point is 00:11:17 It's no longer just the government or Fortune 500 companies. And it's going to take some time for companies, I think, to really get a good grasp as to what that looks like and how to protect it properly among their employees, which then naturally transitions to consumers. And do you trust this company that you're giving your data to, whether it's just an e-commerce store, right, that you're providing your credit card and your address and your name, or more involved companies, perhaps the health companies that you're providing blood tests and other health markers for personalized diets.
Starting point is 00:11:51 So it's this entire world where we know now what our data is worth or what it could look like in the wrong hands. And it's not quite clear how ready we all are to protect that data. Yeah, absolutely. I think if there's something 2020 and 2021 taught me, it's that actually data technology and infosec technology, they're probably going like hand in hand. And if we see progress in one, we will definitely also see like progress and changes happening to the other.
Starting point is 00:12:24 And I keep saying usually that like this, like the decade of the 20s is going to be all around data. But actually, I think I should change that a little bit and be more like it's going to be about data and it's also going to be about security. I think these two are going to be both super important and there are going to be also, I think, very interesting, like,
Starting point is 00:12:46 ethical conversations and conversations on a, like, collective level that we are going to do, which, anyway,
Starting point is 00:12:52 it's going to be very, very interesting. But let's try not to make it, like, the whole conversation too philosophical. Cool. Let's go back to technology,
Starting point is 00:13:00 Ruben. And let's talk a little bit more about parts of the data stack, let's say, or like the technology you think that got boosted by this whole adoption of data and digitization that COVID brought. Which technologies do you think benefited out of this? The ones that come to mind is customer systems.
Starting point is 00:13:23 So anything like a CRM, of course, email marketing tools. I think digital advertising got a boost out of this, and Facebook and Google of course take the big chunk, but any kind of digital advertising. And BI tools in general might be the category. I think I was listening to a podcast the other day about Databox as a business intelligence tool and how well they were doing and how well they were growing. And I think it's just a reflection of companies who are looking for easier ways to visualize their ever-growing volume of data that doesn't stop. It just keeps growing and going, in some cases exponentially, and trying to visualize it into some kind of usable format is a really big challenge.
Starting point is 00:14:09 So I think all those things got a boost from the pandemic as you couldn't reach customers in person. And you had to reach them where they were, which were their phones, their emails, their social media, and how companies can get their message to the customers in those channels. Makes sense. In terms of BI, because you mentioned BI tools, my feeling with BI is that actually before the pandemic or like the beginning of the pandemic, we were pretty much like at the close, at the end of like an innovation cycle in this space. We had the acquisition of Looker,
Starting point is 00:14:47 we saw the acquisition of Tableau by Salesforce, we had the merger of Sisense with Periscope Data. Do you see this market as a space that has space for more innovation? Do you feel like there's new stuff that we need there? And of course, do you think that this whole situation with COVID and all this obsession around data is going to accelerate the innovation, particularly in the BI space? That's a good question. And you're right that a lot of the major players
Starting point is 00:15:20 were acquired. There's always space for smaller companies and sort of the SME market that they might be, one, switching quickly between tools, but also coming across new tools. To me, I think that the most interesting element of BI isn't so much the building the dashboards and reports. I think that is a problem that has been effectively solved, right? There's great ways of building charts. There's great ways of doing it that is user-friendly. It doesn't require SQL. And if you do know SQL or something more advanced,
Starting point is 00:15:53 you can use that. So that problem seems to be solved by pretty much every player. And I think the remaining problem is still how easy can you integrate data sources into your BI tool, right? You might have five or 10 advertising channels, your CRM, your email provider, maybe some custom sources, and how many clicks does it take to sort of bring all that either into a central data warehouse or to just bring into like a virtual space and then visualize it
Starting point is 00:16:26 and that seems to be the trend. If you look at like Domo, right, a lot of integrations, smaller players like Grow.com, and of course I think Tableau has some connectors. And that will probably be the future of the BI world, just more and more integrations, so it's point and click and then anyone can just sort of plug in their data sources and you sort of get up and running with reports and dashboards. And in some cases,
Starting point is 00:16:52 maybe even just templates, right? Because if you know what the data schema looks like when you bring data from Shopify and you bring data from Facebook, it's easy to then create these sort of pre-built templates that you can just create in a few clicks. So that to me is the future of the BI world.
Starting point is 00:17:11 Whether that's new players or some of the older players take this on, that will be interesting. I'm not sure about that. Yeah, that's an interesting point. Do you think that all this data accessibility problem and creating these connectors and getting access to all the different sources that the companies have, do you think that this is like a BI problem or do you see a different category that's going to exist out there that's going to focus mainly on that?
Starting point is 00:17:38 The reason that I ask, of course, I'm a little bit biased and I have my personal reasons because I started the company around this, Blendo. But I see that like this particular category right now, we see more and more companies appearing, right? Like we still have like Fivetran, which has become like a pretty big company. And we have many open source solutions that they are appearing.
Starting point is 00:18:01 And we keep seeing more and more companies who are like trying to solve the connectivity, the data connectivity and accessibility problem. So what do you think about this? Do you see a consolidation there? Do you think that this category at the end is going to merge with the BI category? Or do you see two completely different categories to keep growing? A couple of points here. I think from a technology perspective, I could see the separate categories going as we might be talking about different buyers here in a company. The more data engineer side or just an engineering department in general, when it comes to moving data from point A to point B. And I work with a lot of clients where I'm usually brought in by the marketing team or a sales team and so on. And they just tell the engineering team, this is the data we want, and this is where we
Starting point is 00:18:46 want it. Just move it. Don't worry about the data schema. Don't worry about what we're going to do with it. Just move it from point A to point B. And that's already the data engineering and movement of data, right? The Fivetran world. And in the BI tool, typically I find on the non-technical side, we want to build dashboards.
Starting point is 00:19:05 This is what they look like. We want an executive dashboard to summarize our KPIs. So I see those worlds being somewhat separate. There might be some overlap, right? More companies take that sort of Domo approach where they make the connectors into one tool. But my second point in general that I was going to say here is I think a lot of these issues are becoming less technical as time goes on and more people related. That is when I look at technology, just even the past five years and the data technology,
Starting point is 00:19:38 it is significantly much easier today to take data from common data sources, the Salesforce, the Marketos, Facebook ads, Google ads. If you use a common data stack, it can be really quite straightforward to plug things in and get data into a warehouse or a BI tool and so on. And I think that trend will continue. More vendors, more SaaS companies are making data accessibility really easy. But what's not getting easier is what companies do with it. So they collect it all. What do you do with it, right? How do you analyze it? How do you turn it into insights? How do you deal with political issues? I work with a few finance companies and crypto companies. And I was working with a client once where we had this great plan. We were going to have a CDP and this entire data stack,
Starting point is 00:20:29 very modern, very advanced. And the entire plan was vetoed by the legal department. And they just said, you cannot have data in a cloud environment that we don't control. And the whole plan just fell apart. So to me, these are political issues that can affect how data flows from one place to the other.
Starting point is 00:20:49 And those, I think, will continue to be trickier, especially as companies and, let's say, legal departments or compliance departments realize that if some of this stuff leaks out, it's going to be a big issue. Like we're going to have fines, reputation damage. So the safest thing possible is to not let the data flow freely, really have tight restrictions. And then that limits how companies use it. And that's a tricky problem. I think that's not something you can solve with technology as easily. But nonetheless, companies will have to find a way to sort of get their head around it. Yeah. Yeah. I think that's, I think that's super interesting. And
Starting point is 00:21:27 jumping back just a couple of points, it's interesting. You see players in the space sort of approaching the same outcome from different angles, right? So you have, like you said, Domo, which is sort of building the connectors in. Metabase recently spun out of GitLab. And they are building some sort of connectors for the ingestion piece so that they're starting to dabble in sort of the adding pipelines and not just sort of being a BI tool that sits on top of the warehouse. And then you have companies from the other side approaching it, right? So companies such as Fivetran, and of course, obviously, RudderStack sends data. And from that regard, you sort of have a lot of times just the raw pipeline piece, but then close partnerships with companies like dbt, where you're sort of crossing the bridge between just being a delivery pipeline and sort of enabling analytics, but not delivering the last mile of actual dashboards.
Starting point is 00:22:31 So it'll be interesting to see how the dynamics within an organization change depending on what tool they're using and where it comes from, right? Because if you start from the BI and then sort of move towards the pipeline, as opposed to starting with the pipeline and moving towards BI, there's very different team dynamics involved in that. You sort of, the marketing or analytics org is sort of the key leader of the project versus maybe someone more on the engineering side. So yeah, it'll be really interesting. And I think the owner of the project internally has a really big impact on what the sort of the political implications are of what the final outcome is within the organization. You mentioned something very interesting. I guess from your perspective, what do you think are
Starting point is 00:23:17 perhaps some of the toughest challenges around data pipelines in the future, based on the context that we're discussing? What are some of those things that are just going to get harder and harder, despite some of these improvements in technology that we're debating here? Costas, I'm going to let you handle that because you work on pipeline projects every day from a product perspective. Yeah. Yeah. That's an excellent question.
Starting point is 00:23:45 I think that as we solve, let's say, the problem of accessibility when it comes to data, I think the next big question around data is going to be, can we trust the data and how we answer this question? And that doesn't have a simple answer, actually. There are answers, actually, on very different levels, in my opinion. So I see that what is going to be like a great effort from now on
Starting point is 00:24:13 is how we can separate the noise from the signal in all these huge amounts of data that we can collect today and very cheaply put in a data warehouse and ask questions. So I see a lot of space for innovation when it comes to anything that has to do with data quality, data exploration, but not in terms of just doing the BI reports that we typically have seen, that are reports for business decisions, and a lot around data governance: who accesses the data, why they access the data, what's the lifecycle of the data, how it moves from one place to the other, how it has been transformed, and what's the lineage of the data.
Starting point is 00:25:01 I think that we are going to see, and by the way, these are not like new problems, right? It's a problem that probably large enterprises have been dealing for quite a while. But I think that the problems that the large enterprises were dealing with,
Starting point is 00:25:14 and especially in heavily regulated spaces, like banking, we are going to see it happening to pretty much every company and everyone who wants to do business out there and wants like to be data-driven, as we said at the beginning. So that's how I see at least the next two or three years of what I expect to see happening
Starting point is 00:25:33 out there. What do you think, Ruben? Yeah, I mean, I agree with one of the very first points you made about trust. I remember the first time I came across a trust issue and I was working with a team and we went through, we checked the data, we made some fixes to it, and then we presented new numbers. And the executive team was like, those numbers make no sense. This data is completely incorrect. And we had kind of triple, we had triple checked the numbers. So we knew it made sense. Like we had gone sort of
Starting point is 00:25:59 column by column and made sure that the things were adding up. And I realized very soon that it was actually not, they were not having a technical issue. They were having a trust issue. The data had been incorrect for so long, in this case, a year plus, that they had very little trust in anything that came from it. And we had to build the trust back up. And I learned that you sort of lose trust one report at a time, and you have to build trust one report at a time again. And trust, to me, is a fascinating problem that I talked about in the book, because it's, to me, fundamentally psychological. It's something that you have to work with people, understand where they're coming from, how much data expertise they have versus they don't have. Are the expectations correct with what this number is supposed to be? One of the most common questions I get is when companies are looking at comparing their paid spending, the conversions that something like Facebook has attributed them versus some external provider, like a mobile attribution provider or even just a web attribution provider.
Starting point is 00:27:02 And they're not the same, right? They almost never match. And that can cause a lot of stress: which number is correct, which number should we trust? And that's, it's a matter of training, of reshifting expectations and getting people comfortable with the data they have and making sure they're using it in the right context. Yeah, a hundred percent. I totally agree with you. And I think it's something that you also mentioned a little bit earlier. And I think it's a very good opportunity to start talking also a little bit more about your book. Because I know that one of the things that you deal with a lot in your book is people and how important a dimension that is when it comes
Starting point is 00:27:42 to data and how we use data. And of course, that's where also trust comes into play. And I think that's something that we forget, especially people that are coming from, let's say, more of an engineering background. But I think that this is also kind of a perception that, let's say, the whole of humanity has built out there, that numbers are something objective, right? You come up with the numbers and that's it. Your work ends there. I mean, everything that has to be told is told through the numbers. But actually, I don't think that this is true. Because you have the numbers, you might have your visualizations, you might have built whatever you want to build. But at the
Starting point is 00:28:22 end, you need the people there to tell the story of the data, right? And this story is super important, and it's also what is going to build or rebuild the trust. And that's my take as a person that has worked in this space for like a bit more than 10 years now. What do you think about this, and can you tell us a little bit more about the importance of people? Yeah. We, of course, started talking about the pandemic. And I think it's a fantastic case study for how numbers get interpreted or misinterpreted. Pretty much all over the world, we were all seeing COVID case numbers and things like that. But it became very clear that not everyone was interpreting numbers in the same way. Here in Vancouver, we had protests, anti-lockdown protests, as many countries did.
Starting point is 00:29:12 And it was a clear distinction between people who would see the daily or weekly COVID numbers and they thought, okay, this is what we should do. And an entire different group of people saw the same numbers and took a different decision. And that's the same thing that happens in companies, right? Any number, any report gets interpreted by the biases and preferences of the people who are reading them. And it's this element of people that becomes the most unknown or perhaps the most volatile variable in data. We can get the right technology. We can build the right pipelines. We can get the
Starting point is 00:29:53 sort of the best ways to build dashboards and things like that. But then how those numbers get interpreted, that's the thing where the people element comes from. And when I wrote the book, I realized that most of the books on the market on data, which there's not that many, maybe five or 10, but most of them were really focused on the technical side of things. How do you build reports? How do you run queries? How do you analyze numbers and statistical models for analyzing numbers? And I thought, you know what, those are useful, but I think they're missing the huge element that if you teach someone basic probabilities and statistics, but then they have a bias in some way or another, the results they'll get are completely different from what you may expect. And because we were talking about this before we started recording here, you're an engineer, Costas, and I have a slight engineering background as a front-end engineer.
Starting point is 00:30:50 I work with a lot of engineers, and I realize engineers can sometimes see the world as very mathematical. It's like step one, step two, step three, and you take the numbers through a clear logical calculation, and there's only one answer here. This is like math grade three. There's only one answer you can get to, only one way you can get to it. Yeah, exactly. That's not really the case for a lot of data, especially the toughest decisions around strategy and what a company should do, what products they should develop, what markets they should go into, how to build features into
Starting point is 00:31:23 a product and that needs to be recognized. And then you deal with it, right? It's not something that can be a complete disruptor to how companies approach data. You just have to understand that and deal with it. So in the book, I talk about trust. I talk about expectations. I talk about training and how to make sure people have the basic skills needed to work with numbers, to understand
Starting point is 00:31:45 them. And that tends to provide a really good foundation for all the other technology stuff that companies are going to do really well. Ruben, question for you. We talked about patterns. The idea that the sort of very common data integration problems are going to be solved in sort of an elegant way and be very accessible, I think is accurate, right? We sort of see commoditization there. So if you sort of remove that element, have you seen similar patterns on the people side as it relates to data? Almost like if you think about architectures from a data stack standpoint, you have a constellation of people in the company working with data.
Starting point is 00:32:31 Are you seeing patterns that are proving to be really successful sort of across teams between engineers and those sort of consuming the data? I'd love to know. I'd just love to know what you're seeing there. Yeah, one of the most interesting patterns or perhaps trends actually relates to data and specifically to machine learning and AI, but not in the way that perhaps companies think about it, where you're building it out yourself.
Starting point is 00:32:58 I think we're starting to see that AI and machine learning are being built into the specific SaaS tools, right? So you have an email marketing tool like Salesforce or product specifically, or Marketing Cloud, any of those. And it has a built-in way of running A/B tests, right? So you can take two subject lines, you test them, and it'll tell you which was the best one. But I think what's interesting to see now is a lot of that is being taken to the next level
Starting point is 00:33:27 and all this machine learning is doing all this analysis sort of behind the scenes and then giving some kind of insight to the user. So instead of asking them, look through the past hundred emails and then see what kind of patterns exist among those hundred emails that you can see, subject lines and content and open rate and so on,
Starting point is 00:33:46 the tool is just doing that all automatically and then just spitting out some kind of insight, right? Saying, you know what, typically when you send an email around 8 a.m. and you include this in the subject line and you have two images, those tend to do better than your other ones. We see it in Google Analytics, right? We'll surface insights. And not all the time they're useful. Sometimes they're just really random. But that's, I think, a really interesting pattern for the people component.
Starting point is 00:34:14 Because instead of expecting them to be able to run very sophisticated pattern analysis and take data to Excel and those kind of things, the software will do it for them. And all they have to do is just try a bunch of stuff, right? Just try a bunch of subject lines, try a bunch of types of emails. Maybe they have to do some kind of setup to get the A/B test going or make sure the test is going to work properly, but there's going to be a lot of heavy lifting done for them. And I think the same thing is applying even when you look at a field like product analytics.
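As an editorial aside, here is a minimal sketch of the kind of check an email tool might run behind the scenes for the subject-line A/B test described above. The function name and the open counts are hypothetical, and the two-proportion z-test is only one common choice; the speakers do not say which method any particular vendor uses.

```python
import math


def two_proportion_z_test(opens_a, sent_a, opens_b, sent_b):
    """Compare the open rates of two subject lines.

    Returns (z, p_value), where p_value is the two-sided p-value under
    the null hypothesis that both subject lines have the same open rate.
    """
    if sent_a <= 0 or sent_b <= 0:
        raise ValueError("each variant needs at least one sent email")
    rate_a = opens_a / sent_a
    rate_b = opens_b / sent_b
    # Pooled open rate across both variants, used for the standard error.
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    if se == 0:
        raise ValueError("open rates of exactly 0% or 100% cannot be tested")
    z = (rate_b - rate_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: the same 20% vs 26% open rates at two list sizes.
print(two_proportion_z_test(200, 1000, 260, 1000))
print(two_proportion_z_test(20, 100, 26, 100))
```

With these made-up numbers, the same gap in open rates is convincing on a list of 1,000 per variant and inconclusive on a list of 100, which is why the tool can only do the heavy lifting once enough emails have gone out.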
Starting point is 00:34:48 So the world of Mixpanel and Amplitude and Snowplow and all that. And you see a lot of these companies are investing really heavily in their machine learning reports. So product companies, for example, really want to know what features tend to correlate with conversions, like signups or people becoming paying subscribers. And that's an analysis you can run and you can sort of run in different ways to get the entire picture. Or the software vendor can just build it in, build the algorithm in, you feed the data and it does it for you for the most part. And it's not perfect yet, but those are also things that I think we'll continue to see going forward. And if you go back to the BI tools,
Starting point is 00:35:29 I think perhaps there'll be an element of BI tools where it's not just about displaying the data, it's about doing something with it and trying to highlight insights around segments or specific attributes or something that you miss, but the software is able to surface automatically. Yeah, super interesting. It's almost like if you think about,
Starting point is 00:35:49 okay, we have all this data across the company and we want to do AI, right? That's a very sort of ambiguous, like challenging, okay, what are the inputs? What are you defining? All that sort of stuff. But if you think about AI almost as a localized service within a particular
Starting point is 00:36:06 tool that a specific team is using to sort of accomplish a specific or drive or understand a specific part of the customer journey, it makes total sense. And I agree, it's definitely getting better. It's not perfect, but it's definitely getting better. Kostas, I'd love to know what you think about this. Yeah, my approach with that stuff is a little bit more influenced by engineering in general, to be honest, in the sense that we should always start from the problem, try to solve the problem, and find the right tools. And AI might be, or machine learning might be the right tool, or might not be the right tool, right? This is something that I think it's a journey that as engineers, we always have to take
Starting point is 00:36:46 when we try to solve like a new problem. And I think this is like the approach that we should approach like everything when we are building like something like a company, for example, trying to build a product. I understand that as humans, we always want to not miss an opportunity, right? Or we don't want to,
Starting point is 00:37:05 or we want to work with the latest shiny toy out there. But at the end, we might just not need it or it might not be suitable, right? And it goes back to trust. We tend to trust new things a little bit more easily than we should. And we have much more elementary problems that we have to solve when it comes to data
Starting point is 00:37:26 and the culture around using the data, I think. And I think that Ruben has touched it in a very good way by talking about people. When we are talking about how we can educate all the stakeholders inside the company to become more data literate with stuff like elementary statistics or understand like what
Starting point is 00:37:46 bias means and all that stuff, I think there's a huge gap there between doing this, which is a necessity, and actually putting an AI black box there that magically is going to solve everything. So yeah, that's how I see things. I don't know. What do you think, Ruben? It's funny you mentioned that, right? Because I think we had a period, which perhaps is ending now, but maybe from 2010 to 2020,
Starting point is 00:38:14 where it seemed like any company just added AI onto their name or their product or description. And that all of a sudden made it really interesting. But it turns out that, one, it wasn't really AI. It was perhaps at most machine learning and most of the algorithms were just being reused,
Starting point is 00:38:32 right? There were things that were built for other purpose and they were just finding a different use case for it. So I think it became a bit of a crutch for a lot of companies. And I saw all kinds of companies, companies that were going to write copy for you, or we're going to write like Facebook ads for you and all this kind of stuff using AI. And I'm not sure how many of those are actually useful. So we'll see, we'll see what, what happens. I think SaaS companies will continue to add this machine learning AI, and they might call it AI. Maybe it's not truly AI as easy ways to make the product more useful.
Starting point is 00:39:06 But as you mentioned, when we look at the best companies out there when it comes to being data driven, which is hard, and you likely have examples too, they've done very consistent training or education or coaching at a company level, at a culture level. And people in general have a high level of comfort with data and whatever that looks like. And it may not even be fancy. It may just be very simple Excel sheets and things like that, but they have the ability to work with it and get insights and then try things and then innovate on it. And that's hard, right? The AI might help a little bit with the insights, maybe with experimentation and making sure you can experiment faster, but there are still things that you can't really get around. You have to really train the people or hire the right people or build the right company culture. Sure. Company size, I think, is a really interesting component of this conversation because speaking in general terms, you sort of exclude large swaths of the
Starting point is 00:40:23 market depending on company size, right? So for example, a really early stage startup company may not even have enough data for AI or ML to be applicable, right? There's just not enough there. They're too early. The product's changing really quickly. So I think that's a really interesting insight around the needs of the particular company, and really it's what you said, Kostas, right? Like what's the problem and what are you trying to solve? And that can vary significantly depending on the stage of company, the complexity of the stack, the size of the data set, and there are just so many variables in there. Yeah. You mentioned a great point. I have a lot of startups who reach out to me very early, talking about pre-product market fit, three people, pre-beta
Starting point is 00:41:07 even. There might even be no product out there. And they reach out because they want to set up this entire data stack, right? And say, hey, we know we want a CDP and we want product analytics and we want a BI tool and we want something for surveys and five or six tools. And they have almost no users. And it's just too early. Even if they were to have those things, there wouldn't be enough data to make it useful. They might be looking at,
Starting point is 00:41:34 I mean, literally a hundred users and trying to understand how those hundred users are using the product. But instead of trying to look at those users through charts, funnels and line charts and things like that, it's probably more valuable to just talk to them, you know, just get on a phone call, do an interview. And this is where I think companies, especially executives, need to understand the context of data. If you're in that stage, it's not really a waste of time, but it's not a very good use of your time to try and set up these very advanced ways of visualizing data when you don't have any, instead of just talking to people and going low tech, low data, right? More qualitative than quantitative. Now, if you're at the other extreme, you're a public company, you have millions of users, then there might still be a role for interviews and qualitative data,
Starting point is 00:42:25 but you want to make sure you have the ML and the quantitative component as well. Right. So it depends on the situation. It depends where you're coming from, but trying to find the right or the best use of your data for your purpose. I think what Kostas was saying, what is the real problem here? And what's perhaps the most effective way of getting there, not the most sophisticated or the most exciting. Yeah, I was talking with someone who is a data engineer at some really incredible companies like Heroku and some other companies, like from very large companies to startups.
Starting point is 00:42:58 And he was talking about the stages you go through in terms of data engineering. And he talked about your three-people-in-a-garage startup. And he said, you want to go out and buy fancy analytics tools? He's like, just query your production Postgres database. He's like, you just don't have enough data for it to be meaningful. And then it reminds me, actually, Kostas, do you remember we were talking with Alex from the Pool app? He's a founder who went through YC. And we asked him about analytics at a very early stage because he had 100 users. I don't know how many, but he said, I used my product analytics tool. I don't remember which one he was using.
Starting point is 00:43:47 He said, I use it literally just to figure out who I should talk to. What are anomalies or people who adopt really quickly? I thought that was really, really interesting and aligns exactly with what you said. Yeah, yeah, 100%. Like at an early stage where people, I mean, it's not just like the number. It's like it's also you are at a stage where also as a founder, you educate yourself. You need to build your intuition, right? And you are not going to build this intuition fast enough if you just look on a screen and try to figure out what's going on with the numbers. Probably you're going
Starting point is 00:44:26 to just waste your time, to be honest, because there's too much noise in this data instead of signal. It's much, much better to go out there and pick up your phone and talk with someone. And I think this is something that is relevant for a much longer time when it comes to B2B companies, because numbers grow much slower. But of course, like with B2C, you reach a point where doing analytics and trying to aggregate the data in an automatic way is necessary because you just have too much data, right?
Starting point is 00:44:59 Imagine a company like DoorDash. Yeah, like of course you need analytics there. Of course you need advanced analytics because otherwise you have like so much data that a human being cannot interpret it, right? So yeah, I think that anyone who has started a company and went through that has come to this realization of, I can build whatever model I want with the early stage data that I have, but 90% of the time it just fails.
Starting point is 00:45:32 Yeah. I'll give you an example of where this plays out. Last year I worked with tourism agencies here in British Columbia, Canada, and of course, hardest hit industry by the pandemic, unprecedented drops in volume and visitors to the country and so on. And I worked with one in particular that was really quite frozen by all this. It was a tough situation to be in. And they really wanted to look towards the numbers to kind of figure out what to do next. And at some point, I remember telling them, I mean, the numbers are not going to change. We know they're not going to change. Regulations are not going to change that quickly. We know you're sort of 90% down from regular averages historically. We know all this.
Starting point is 00:46:09 So the question is just, what are we going to do about it, right? Is there local travel we can encourage? Are there financial decisions that have to be taken at a company level? These were hard decisions to be taken. But I noticed that they felt quite stuck and sort of paralyzed. And they wanted numbers and reports and dashboards, specifically quantitative numbers, to tell them where to go and where to move. And I think that was one of the weaknesses they had to deal with. It was hard to overcome. And the same thing could happen in early stage companies, wanting numbers, wanting machine learning to tell you something about those 100 users, instead of just talking to them. And in tough decisions that can be a bit of a crutch. And I think that that's one of the challenges I was seeing with companies that really want to be data driven, or even just individuals, whether in a crisis or not, they'll run into situations where
Starting point is 00:47:05 they don't have enough data. There's just not enough data to have any kind of statistical validity. But you still have to make decisions nonetheless. You can't just wait until all the numbers are in. And I think having a comfort and being able to go both ways effectively, right? Being able to use data to analyze patterns and make decisions, and being able to make decisions despite a lack of data. I think that that's what kind of builds resilience among companies and individuals.
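As an editorial aside, the point about not having enough data for statistical validity can be made concrete with a small sketch. It computes a Wilson score interval for a conversion rate, a standard way to express how uncertain a proportion is at a given sample size. The function name and the user counts are hypothetical and are not taken from the conversation.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    successes: number of users who converted
    n: total number of users observed
    z: normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Hypothetical example: a 10% conversion rate seen at two very different scales.
for converted, users in [(10, 100), (10_000, 100_000)]:
    low, high = wilson_interval(converted, users)
    print(f"{users} users: plausible conversion rate between {low:.3f} and {high:.3f}")
```

With 100 users the plausible range spans more than ten percentage points, so a chart of that number says very little, while with 100,000 users the range is a fraction of a point. That is one way to see why interviews tend to be the better tool at the early stage.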
Starting point is 00:47:36 Yeah, I agree. One thing that I was talking to someone about recently is that the one thing that I've noticed, I think, as I've had the opportunity to work at companies at various stages of maturity, but really with teams that have sort of deep experiences, entrepreneurs and sort of taking companies to market and, of course, leveraging data to do that is when it comes to decision-making, I find that people who are really good at it, they've built an incredible amount of muscle at breaking down numbers into sort of the simplest form and sort of only looking at the necessary components, right? As opposed to saying, let's try and build some very complex model, which is necessary sometimes, right? But in many cases, you're trying to make a high level
Starting point is 00:48:30 sort of almost directional decision. And it's really interesting to see really, really smart people actually break numeric things down into pretty simple stuff that makes decisions a lot more clear, right? Because, you know, I think clarity is a huge deal when it comes to making decisions based on numbers. I cannot believe that we have run through the entire time. We did not get to the third topic we wanted to discuss, which is CDPs. So Ruben, we'll have to do that on another show because, boy, that's a loaded term and you see so much of it. We're going to have to end the show soon, but before we go, I'm thinking about our listeners who are considering the people side of the equation, which you mentioned. And as I've reflected on the conversation throughout the recording, I really think that there's this element of building technical discipline around getting the data correct in the systems, and then building muscle around the people side of interacting with and actually sort of using
Starting point is 00:49:47 data and making decisions around data in the organization. So for our listeners who are actively working on the technical piece, but maybe want to get started with just some really practical things, maybe something they could start working on this week to build muscle on the people side, what are the top two or three things you would recommend they do in terms of getting started? Yeah. First, I think they need to take stock of who's going to be using the data and what their skills are. That is, do they know SQL? Are they comfortable with technical topics? Or are they going to have lots of questions around what may seem like basic things? How does this number work? How is this data collected?
Starting point is 00:50:27 What's the formula here? Things like that. Based on that, then you have sort of different paths, right? If you have a highly technical team, then your rollout, how you get sort of this in the hands of people will be slightly different. You're likely going to want to make sure that you allow people to run their own SQL, that you have a lot of diagnostic information so they can explore the data on their own,
Starting point is 00:50:52 and that everyone just has the right permissions and access for it. If you have a less technical team, then you need to make sure that there are nice interfaces for interacting with the data that don't require a knowledge of SQL or similar things. And you want to kind of get a gauge as to what skills might be best suited for training, whether it's, again, basic statistics, basic probabilities, how to make decisions, how to read charts, what the difference is between a line chart and a bar chart, how the chart design might completely change the meaning of a KPI or a number, things like that. And then the third step, kind of figure out the ways to close those gaps. And the topics we're talking about here, they're not university
Starting point is 00:51:45 PhD-level topics, so they can be covered in, you know, informal workshops, one-on-ones, and things like that. But you have to know what you want to teach. And if you're dealing with some of the skills, there may be some discovery or research that's needed. And I mean that as in talking to people, because if you ask someone, are you comfortable with numbers, they might just say yes. But if you dig a little deeper, you might find out that, you know what, like, probabilities, it's not something that comes natural to you. Statistics was not your favorite class in college. So you start to kind of figure out what are some of those skills that you start to work on as a team or as a company. I think that's incredible advice.
Starting point is 00:52:25 And I'll actually say, I really did enjoy statistics in college. Math isn't my strongest subject, but it's been really helpful. And Kostas, I will say some nice things about you because I've had the opportunity to actually work with you on some internal reporting. And with your engineering background, thinking about things like cohort analysis and some
Starting point is 00:52:50 other components there where you have the ability to explain some of those concepts to me in a very practical way as we're looking at a data set together has been really, really helpful, especially for me not necessarily having a technical background. So I can say that if you're listening to the show and there's someone who's non-technical, but who interacts with the data you're producing, please take the time to help them because it's been hugely helpful for me. Thank you so much, Eric. I really enjoy doing it. So I'm happy to do it anytime. Great. Well, Ruben, thank you so much for joining us. And if people are interested in your book,
Starting point is 00:53:26 where should they go to get more information on that? Yeah, the Data Mirage is available everywhere you can buy books. So Amazon, Barnes & Noble, Chapters,
Starting point is 00:53:34 if you're in Canada, Google Play. And they can also go to my website at rubenugarte.com and they'll find links to the books and blog posts
Starting point is 00:53:43 and videos and other free resources. Great. Well, thank you so much for joining us and we'll talk again soon. Thank you for having me. I really loved the part of the discussion where we were talking about the ways that the pandemic accelerated so many things. Obviously it was a hugely tragic and challenging event in so many ways. In terms of data and digital transformation, though, it really forced a lot of companies to do just a lot of different things to update the way that they are dealing with data and creating customer experiences.
Starting point is 00:54:17 And I think, as I was reflecting on the conversation you were having with Ruben about that, Kostas, which was really, really enjoyable to listen to, there's this phrase that every company is becoming a software company. And I'm not going to trademark this, but in many ways, the pandemic forced every company to act like an e-commerce company in the way that they deal with data. E-commerce many times is sort of on the sharp end of trying to figure out how to leverage data to grow and really drive the customer experience with data. So I think that was my big takeaway. And I think that's something I'll be thinking about
Starting point is 00:54:55 in the upcoming week. What stuck out to you? Yeah, that's a great point, Eric. For me, I think it's the validation of like a kind of recurrent theme that we see in our conversations, which is the relationship between data and people. I mean, you heard like Ruben, he said that a big part of his book is actually dedicated
Starting point is 00:55:15 to how important people are when we are trying to be like a data-driven company and how many things are missing there. And I think this is like, as I said, another validation of the concept that data is not here to substitute people, right? It's here to be another tool for people. I know that we've said that many times before, especially with people that are coming
Starting point is 00:55:37 from the ML space or the AI space. And let's say the most advanced, let's say, use cases where everyone is afraid like the AI overlords will come and take our jobs and all that stuff. But at the end, from what it seems and what becomes like more and more obvious is that data is just another tool, right? And it's another tool that augments
Starting point is 00:55:58 like the capabilities that humans have. And it happens like at every stage and with almost like every problem out there. And so we don't only need to build new technologies, we also need to educate people on how to use these technologies if we want the technologies to succeed. So that's like what I keep from our discussion
Starting point is 00:56:20 and I'm looking forward to chatting with him again in the future. That was a very succinct and elegant summary of a philosophical perspective on data. And I appreciate your ability to do that offhand. So if you want more concise philosophical predictions about the future of data from Kostas and maybe me, join us on the next show. Tons of exciting episodes coming up through the rest of the summer. And thanks for joining us. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your
Starting point is 00:56:58 favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by Rudderstack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.
