Drill to Detail - Drill to Detail Ep.67 'Thinking Probabilistically and Scaling-Out the Modern BI Stack' with Special Guest Dylan Baker
Episode Date: June 3, 2019
Mark Rittman is joined by Dylan Baker, freelance analytics consultant, to talk about thinking probabilistically, analytics within venture-funded startups, and DevOps and its role in scaling-out the modern BI stack.
Links:
dbanalytics.co
lookeraccess utility
Annie Duke – Wanna Bet? – [Invest Like the Best, EP.125]
London dbt Meetup #2
Transcript
So welcome to Drill to Detail, the podcast series about modern cloud analytics and the people
and products driving innovation in this world.
So my name is Mark Rittman, and I'm joined in this episode by Dylan Baker, a freelance
analytics consultant who many of you may know from the Looker and dbt Discourse forums, his previous role as BI lead at
Growth Street and from the DBT London meetup which met for the first time a month or so ago.
So welcome to the show Dylan and it's great to have you here. It's great to be here, thank you
very much. So Dylan, just for anyone who hasn't heard of you, just do a brief intro as to who you are and how you came to be in London.
Sure. So as you said, I'm a freelance analytics consultant.
I help predominantly startups in London get their head around analytics and particularly kind of the infrastructure and modeling sides of that.
I've been, I've got a bit of a funny accent. I've been in the UK for about 13
years. I grew up, I was born in Canada, grew up in Canada and then moved to the UK as a teenager.
And I've been here ever since really. And how did I get to London? I guess I came originally
actually as a journalist. I came and did my master's in London as a broadcast journalist,
bizarrely enough, having hated my maths degree, and then
very quickly moved away from that into BI proper and have been in startups doing BI ever since.
Okay. And you're moving back to Toronto soon, aren't you? Or moving over there very shortly?
I am. Later this year, we're planning a move back to Toronto. I grew up in Montreal.
Well, okay. Brilliant. So how I know you is from some brilliant posts and blog posts and things you've written about Looker and analytics development in general, and what we call the modern analytics stack. But you work primarily with what you refer to as VC-funded startups. I mean, tell us, if anybody doesn't know, what are VC-funded startups, and why did that area interest you, really? So I initially ended up in startups.
My first job in London was as a tech reporter at a small B2B tech publication that focused
on tech startups in London around Silicon Roundabout, which I guess is a term that we use
less and less these days. And I kind of fell in love with startups. I think there's a speed and agility to many or
most startups that I found very contagious as somebody writing about them whilst also kind of
working in a very small publishing business myself. And I've never left really. I very briefly ran
a digital publishing business after that and have been in startups ever since.
And so to answer your question as to, I guess, what are VC-backed startups for anyone who doesn't
know, when somebody thinks up an idea, they'll often go and raise a small amount of seed funding
early on, maybe a couple hundred thousand or a couple of low millions. And then often if they find product
market fit, they'll raise a larger amount of money.
And that's typically when I will come in and work with a business.
It's typically around the stage that I've had my permanent roles with businesses.
And it's really when they often start thinking about analytics in a serious way and start
wanting to put meaningful infrastructure and reporting in place for the first time, putting
in place a modeling layer, putting in place a data warehouse.
And that's right around the point that I typically work with businesses, helping them set that up.
Okay, okay. So actually I was listening to a podcast that was recorded with Tristan Handy from Fishtown the other day.
And he was talking about the growth stages that startups go through, and at what point they need analytics, and at what point they start to kind of hire in, and so on. And interestingly,
the example he kind of gave at the start was where people hire in a head of BI, really,
you know, after they get some funding, they get a head of BI. And that was the role you
did at Growth Street, wasn't it? I mean, maybe tell us a bit about how you came in
to join Growth Street and what your original remit was there and how
things looked at the time. So I joined Growth Street as an analyst, as a business intelligence
analyst, really, but as the first hire in the business to look at data, and with a remit
to grow that out and start thinking about what, particularly on the infrastructure side,
needed to be put in place to facilitate reporting around the business. And they were really at that
kind of growth stage. They were starting to think very seriously about these things.
They had some reporting historically, a lot of which was in this ecosystem of Google Sheets,
which we can talk a bit more about. And so I came in as the first hire to build out a small team,
to put in place the infrastructure, which was a great time. I think it's a very, it's part of the reason I continue to work with businesses like that. It's very fun
and can be very satisfying to build these things from the ground up and really go from zero to one,
helping people answer things that they'd never had the answer to before, or answer things much
more quickly than they really thought, given where they'd been, you know, a couple months earlier.
And so I came in there, I was at Growth Street for two years.
We grew a small analytics team, brought in a BI tool,
Looker, built out a kind of modeling infrastructure
around a warehouse.
And that was largely kind of what my responsibility was.
Okay, okay.
So was it not the case that, say, the engineers
in the company would build out the analytics stack, or maybe not a DBA, but certainly the infrastructure team?
Is that not what happens normally?
I think there were two bits and some of it's specific to the business and some of it's just where we are in terms of analytics these days.
I think generally you want to have your engineers, particularly at that stage in the business, really focus on the product.
Rapidly changing business, Growth Street's a financial technology company, a huge amount to be done on the product side.
And really, we wanted to not distract them from that work.
I think we're also at a point, in terms of where analytics technology is, where you can set up a lot of it
without leaning too heavily on an engineering team. I think the tooling available to people
is now increasingly accessible to be able to set it up from scratch without having to lean too much
on those types of people in the business. That said, we definitely did work with them a fair
bit to set up Redshift and just make sure from a security point of view,
all of that was set up.
So it wasn't that we were completely detached from the engineering team
and we worked very closely with them to get data
from the applications into the warehouse.
But I think increasingly, you're seeing very self-sufficient
analytics teams within organizations who don't fully self-serve
from a tooling point of view, but definitely have the ability
to do a lot of the heavy lifting themselves.
Okay.
Okay.
So I guess, again, in terms of the tools you kind of got hold of and how you work, there
were various options you had there.
Back in the old days of the first dot-com boom, the first thing any startup
would do is go and spend millions with big vendors and buy everything from one vendor:
the analytics, the database, all that kind of stuff. Or another extreme might be to go and do
it with open source. But you went down the route of getting, I want to say best of breed, but
certainly a modular approach to how you put your stack together. I mean, just tell us about that
and some of the thinking behind that really and how that worked out for you.
So we were very cognizant of the fact that
what we were going to put in place on day one was going to have to change a little bit as we grew,
that the requirements of a business when they're 40 people is different to what they need when
it's 100 people. And it's different again when they're at 200 people. And I think we wanted to
make sure that we weren't locked into anything, and that we had, for where we were at that time, a best-in-class solution for each component of our analytics stack.
And so it ended up bringing us to a place where we did have a very, as you said, very
modular stack.
We had, for the most part, third-party ETL tools to get the data into the warehouse
itself.
We used Redshift as the warehouse. We then used dbt, an open source tool, to do a lot of the modeling, and Looker and Mode as the front-end tools. The stack, and the amount of work that we did
in different parts to get that right, did continue to evolve as the business grew.
Okay. Okay. So we talked about Looker and Mode there as two sort of front-end tools,
but what about the, I suppose, the data engineering side, the backend, the data pipeline side? How did
you go about solving that problem and getting services set up for that?
So we were lucky that we, as a business, used predominantly third-party tools that were well
supported by the ETL vendors, Stitch, Fivetran, Alooma, and actually Stitch covered for us
initially all of it, except for our Postgres database, I think, which we did through
AWS's database migration service. So we were very lucky
that from kind of off-the-shelf tooling from AWS, as well as Stitch, which can be very cost-effective
at the early stages of a startup, we were able to do most of it, or all of our getting the data
into the warehouse through those tools. In time, we added kind of tools across the business that we
used that were more niche, that weren't as well supported, and contracted people to build Singer
taps, which again worked really well with Stitch. And that's kind of why we stayed with Stitch:
we were supported for the bulk of what we needed. And then where we weren't, and frankly
no one supported the tools, some of them were very niche things, we were able to integrate that with Stitch by building on Singer, the open source framework that they've released.
Okay.
Okay.
So you were at Growth Street for a while.
But then something must have happened or certainly that led on to where you are now, which is that you've moved now into freelancing and I suppose
offering your services to multiple companies. What was the kind of motivation and what led to you
posting that thing famously on Twitter saying, you know, you've finally done it now, you're
actually going to go ahead and move into the kind of freelance world? What was the thinking and the
drivers behind that? Part of it was personal. Part of it was driven by a move to Canada later this year and kind of maximizing for
optionality and consulting definitely ticked that box.
But a lot of it was, I think, a desire to help businesses do analytics better.
I think lots of people, I think analytics has changed a lot over the last five years.
And I feel very fortunate that I've been able to see a reasonably new way of doing analytics
at a few different businesses over the past five years or so.
I've been very lucky with the experiences that I've got at various different startups.
And I really enjoy helping businesses start that and be able to make those correct decisions
from day one. I think I've seen a lot of businesses make very poor decisions early on with how they set up analytics and it sets them
back years sometimes. They end up far down the road with things that just don't scale or aren't
modular or reproducible. And I really feel very strongly about helping businesses in that space.
And I think there's a lot of demand there.
I think people, businesses generally are really more and more seeing the value of data, really
understanding that it's key to building a great business.
And so I wanted to help facilitate that.
I really enjoy working with those businesses.
And so that seemed like the right move to do that.
Okay, okay.
And I think that's often, I mean, it's certainly a driver for me
that you've got a bit of a mission really
and a philosophy about how you do things.
And as you say, I suppose analytics has changed a lot
in the last few years.
And it's a lot more, I think about more,
it's more aligned with software development.
It's more technical.
But the scale at which you can do things and the impact you can have is amazing, and this is the kind
of thing I want to talk about really in this interview with you: I suppose
a couple of things that you've become particularly associated with, some
of the thinking around the Looker API, or just DevOps and so on in general with analytics, and
later on talk about another talk you did, which is about how we come to make decisions and so on. So let's start off, really. A few weeks ago I
came along to the Looker, sorry, the dbt meetup in London. Actually, no, it was the
Looker one, wasn't it? It was the Looker JOIN event, actually, which funnily enough I organized, and I invited
you along as a speaker, and you were talking there about the Looker API and scaling
security. Just taking a step back, what was the genesis of the idea around that presentation?
And at a high level, what were you talking about? And we can drill into some of the things,
topics after that. Yeah. So when I was at Growth Street, we were heavy users of Looker. Most teams
around the business used it. And a
problem that we had was that we wanted to be able to have an audit trail of who had access to what
data and when. I think Looker is great as a tool. I think the kind of advent of being able to
have so much data available to end users across the business is fantastic. But the other side of
that is you need to think a lot about data security and what you make available to people.
And at the very least, in my opinion, you should have an audit trail of who had access to what data,
and who had access to what model. Increasingly, as businesses become more sophisticated,
you need to report on those things to any number of parties.
And that was something that we found the Looker platform didn't do for us as well as we really
wanted to.
But luckily, they have an API that they refer to as having 110% coverage of what you can
do in the tool.
And so we used it to build or start building, which I've since finished working on,
a tool that allows you to change and control your groups, roles, model sets, and permission sets
via the command line. So you define it all in a YAML file and you can run it either locally or
as part of a continuous integration setup. And what that means is you can make those YAML files version controlled
and ensure that you've got an audit trail of all the changes
to part of your security settings for Looker,
ensuring you know who was able to do what and when in the platform.
Okay, okay.
So what's the advantage?
I mean, because obviously within Looker,
you can define roles and groups and permissions and model sets and so on.
What was the driver for having it,
say, doable from the command line or from a YAML file? And what does that add to things,
really? And what does that align with? So I think there's two components there. I think
the first one is having it in a YAML file, and that fundamentally makes it version controllable.
So every time you make a change to it, you can use Git and you can ensure that that change is being tracked every time that file changes.
And that's really useful if you want to be able to look back and go, who had access to this model or explore at this point in time?
And that will be able to make it clear to you by going back in your Git history.
The other side of it was just being able to have these things, the kind of configuration as code, being able to edit these things en masse, doing bulk changes,
which you can do. You don't have to change things role by role. You can make all the changes
to the YAML file at once and then push them, which is also nice from a workflow point of view.
It means that you can kind of see in one place all of that in text, in your text editor,
and push it in one go and be confident that those changes have all been made.
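[Editor's note: below is a minimal sketch of the "permissions as code" pattern described here, not the actual lookeraccess utility. Group definitions live in a version-controlled YAML file, and a script reconciles them against the instance over the REST API. The endpoint paths follow Looker's API 3.1 reference; the instance URL, environment variable names, and YAML shape are illustrative assumptions.]

```python
# A minimal sketch of "permissions as code" -- not the actual lookeraccess
# utility. Groups are declared in a version-controlled YAML file such as:
#
#   groups:
#     - finance
#     - marketing
#
# Endpoint paths follow Looker's API 3.1 reference; the instance URL, env var
# names, and the YAML shape are illustrative assumptions.
import os

import requests
import yaml  # PyYAML


BASE_URL = "https://mycompany.looker.com:19999/api/3.1"  # hypothetical instance


def login() -> requests.Session:
    """Exchange API3 client credentials for a short-lived access token."""
    session = requests.Session()
    resp = session.post(
        f"{BASE_URL}/login",
        data={
            "client_id": os.environ["LOOKER_CLIENT_ID"],
            "client_secret": os.environ["LOOKER_CLIENT_SECRET"],
        },
    )
    resp.raise_for_status()
    session.headers["Authorization"] = f"token {resp.json()['access_token']}"
    return session


def sync_groups(session: requests.Session, config_path: str) -> None:
    """Create any group named in the YAML file that Looker doesn't have yet."""
    with open(config_path) as f:
        desired = set(yaml.safe_load(f)["groups"])
    existing = {g["name"] for g in session.get(f"{BASE_URL}/groups").json()}
    for name in sorted(desired - existing):
        session.post(f"{BASE_URL}/groups", json={"name": name}).raise_for_status()


if __name__ == "__main__":
    # Run locally or from CI: the YAML file is the audited source of truth.
    sync_groups(login(), "looker_access.yml")
```

Because the YAML file lives in Git, every change carries an author, a timestamp, and a reviewable diff, which is exactly the audit trail discussed above.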
Okay, okay. So, I mean, I suppose automation and DevOps in general with Looker projects
and analytics projects, it's becoming an increasingly important thing, isn't it?
And there's lots of kind of, I suppose, frameworks and platforms you need to integrate with now in companies.
What does good look like, really, on a project that you're involved in that touches on these sorts of areas?
I think good is about making the developer experience as seamless and as comfortable as possible. I think for years, we've had really
high quality tooling for software engineers, for developers. They can push a pull request,
or they can create a pull request, and they'll receive 12 different flashing lights on their
GitHub pull requests that tell them, yep, this has passed, this has passed, this has passed,
this hasn't, you should go look at that. And I think for a long time, analytics has lagged
behind that. We haven't provided the same level of sophistication or tooling to people who do
analytics work. And it just makes their day-to-day work less smooth and a bit more frustrating. You
have to do more stuff manually. And I think we're seeing now a move in that direction.
I think we're seeing a huge improvement
in what type of tooling can be made available.
And so for me, good is about,
well, how do we make the analytics developer,
analytics engineer, BI analyst role as smooth as possible?
How do we let them make changes
and get feedback on those changes as quickly as possible
so that they're not wasting time on things that could be automated and that, frankly, they don't enjoy doing?
And it's been great to see Looker over the past few years start building themselves some of those open source things.
So they released a tool called Look At Me Sideways recently, which provides some linting and other feedback on
Looker changes. I know Warby Parker recently released a LookML linter. And so more and more,
there is stuff being built in that space. And so for me, it's how do I make life as easy as
possible for the other people who are working in these code bases, working in these tools? How do I let them work as quickly as possible
and have confidence that they're pushing kind of high-quality,
reliable code?
Okay.
So, I think in Growth Street,
you ended up managing a team of several kind of, I suppose,
analysts and so on.
I'm interested, what kind of skills would you look for
in someone who is an analyst that you'd be recruiting, really?
So if you have that role within a company like, say, Growth Street or the companies you're working with, what is the typical kind of skill set and I suppose outlook on how you develop things you'd
be looking for when you recruit? So I think a lot of the skills haven't changed from what
people would have been looking for before. It's being curious, it's having good softer skills
in particular. I think engaging
with end users is a key part of it. I think where it's particularly changed is looking for people
who have a desire to do things in a scalable way and not write the same SQL query for the
seventh time or a version of the same SQL query for the seventh time, but really think about,
particularly if you're an organization that uses Looker, what's the underlying data model and what's the underlying kind of explore
in Looker that allows me to do this once and take a bit longer, but to answer those seven questions
and a number of other questions really well. I think it's people who think in that way and
aren't just trying to answer the question once, but are really forward thinking about,
well, these are the other questions that this type of person could ask. These are the other
questions that may be of interest. And often it's not that that person's going to go off and
answer them, but that they can kind of foresee what an analyst in a different team that
uses Looker may want to do. And it's obviously hard to test. I think often it's people who
think in a kind of software engineering type way, though that's not really where we had the most success hiring.
I think, actually, the analyst I left at Growth Street, who was there as part of the team, was a grad we'd hired, and had really just demonstrated an ability to think in that way and do things in a scalable way that would serve the business going forward really well.
Okay. So you mentioned the Looker API there. I mean, you've built things off of that and I'd
be interested to understand what else you've been doing with that as well. But for anyone who
doesn't know what the Looker API is, just maybe just kind of what is it and how do people make
use of it and what is it there primarily to do really? So the Looker API is a RESTful API that you can make kind of HTTP requests to.
And it allows you to basically control your Looker instance.
So almost anything that you can do in your Looker instance, you could control via your API,
whether that's creating a user, creating a group, or seeing what LookML models exist.
You can really kind of get all the information that you'd want, or almost all the information
you want, from your Looker instance via the API. And so you can do things like automating the
creation of users or automating the creation of groups. Or one thing that we're starting to look
at a lot, we use dbt, which is where a lot of our underlying data lives.
And people have to straddle both those systems when they make changes.
If they change a field in dbt, they need to make sure that the corresponding fields in Looker are all kind of referencing the same thing.
And Looker doesn't give that to you.
It tells you if there are kind of references within Looker, content that is built on fields that no longer exist, but it won't actually tell
you if the underlying column in the underlying database still exists. And that's something that
we're thinking about using the Looker API and dbt to test. So ensuring, pulling down every
dimension from every explore, and then running those against the warehouse in an automated
fashion, and making sure that nobody's going to run a query in the Looker front end
that gives them an error because of something that they pushed in dbt. And part of the problem
is that you don't realize that there's a problem until somebody actually tries to run that query
in Looker. And that often won't end up being you as the analyst; it's someone in
marketing or someone in sales. And when that error
comes up, you immediately erode the confidence that user has in you. And so I think with
self-service, which I firmly believe in as a model for kind of sharing analytics around a business,
you need to be really confident that what you're shipping works, because it's really easy to erode
confidence with the end user if they get the wrong answer a few times, or
can't get the answer they want at all a few times. And so that's really where I think about using the Looker API.
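[Editor's note: below is a rough sketch of the dbt/Looker consistency check described above, not production tooling. It pulls each explore's dimensions from the Looker API and compiles their SQL against the warehouse with LIMIT 0. The explore endpoint follows Looker's API 3.1 reference; the ${TABLE} substitution and the assumption that each explore is backed by a single table are simplifications, and real LookML is often less regular than this.]

```python
# A rough sketch of the dbt/Looker consistency check described above. We pull
# each explore's dimensions from the Looker API and compile their SQL against
# the warehouse with LIMIT 0, so broken references surface in CI rather than
# in front of a marketing or sales user. Assumes a `session` authenticated as
# in the earlier sketch and a psycopg2 connection to Redshift.
import psycopg2
import requests


BASE_URL = "https://mycompany.looker.com:19999/api/3.1"  # hypothetical instance


def broken_dimensions(session: requests.Session, conn, model: str, explore: str) -> list[str]:
    """Return the dimensions whose SQL no longer runs against the warehouse."""
    detail = session.get(f"{BASE_URL}/lookml_models/{model}/explores/{explore}").json()
    table = detail["sql_table_name"]  # simplification: explore backed by one table
    broken = []
    for dim in detail["fields"]["dimensions"]:
        sql = (dim.get("sql") or dim["name"]).replace("${TABLE}", table)
        try:
            with conn.cursor() as cur:
                # LIMIT 0 makes the warehouse parse and bind the column
                # without actually moving any data.
                cur.execute(f"SELECT {sql} FROM {table} LIMIT 0")
        except psycopg2.Error:
            conn.rollback()  # clear the aborted transaction before continuing
            broken.append(dim["name"])
    return broken
```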
It's also, frankly, just useful for scripting things, things that you might want
to automate, whether that's just deleting old Git branches. You can control and view all the
Git branches through the Looker API, and I have a script that regularly deletes all unused Git branches that are more than
three months old. And that's again something that you can have someone go and do through the UI,
but if you're capable of using an API, and capable of writing Python in my case, it's something you can do really quickly.
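[Editor's note: below is a sketch of the kind of branch-cleanup script described here. The endpoints follow Looker's API 3.1 reference (all_git_branches / delete_git_branch), but the GitBranch payload fields used, `personal` and `commit_at`, are assumptions to verify against your own instance.]

```python
# A sketch of the branch-cleanup script described above. Endpoint paths follow
# Looker's API 3.1 reference; the payload fields (`personal`, `commit_at`) are
# assumptions to verify on your instance. Assumes a `session` authenticated as
# in the earlier sketch.
import time

import requests


BASE_URL = "https://mycompany.looker.com:19999/api/3.1"  # hypothetical instance
THREE_MONTHS = 90 * 24 * 60 * 60  # roughly three months, in seconds


def delete_stale_branches(session: requests.Session, project_id: str) -> None:
    """Delete personal dev branches with no commits in the last three months."""
    branches = session.get(f"{BASE_URL}/projects/{project_id}/git_branches").json()
    cutoff = time.time() - THREE_MONTHS
    for branch in branches:
        # Never touch shared or production branches.
        if not branch.get("personal"):
            continue
        if branch.get("commit_at", cutoff + 1) < cutoff:
            session.delete(
                f"{BASE_URL}/projects/{project_id}/git_branch/{branch['name']}"
            ).raise_for_status()
```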
Okay, okay. So you mentioned testing there as well. And I remember it was
interesting, I think they announced at Looker JOIN last year that there was going to be a regression-testing framework in Looker itself, and I don't know whether that's still happened or not. But that just raises a bigger question, and I've found this on projects I've been working on recently as well: how do you get people to build the tests at the time that they build out the Looker content? I know there are tests within dbt you can build, and I know there are ways of doing this,
but I find it quite hard sometimes to actually know what it is to measure the results
against from the source system. It's not quite as easy in practice as it is
in theory. I mean, do you have any thoughts really on
building robust tests and regression things and so on into things as you do it?
Or is that more of a kind of an aspiration, really?
No, we do less on the Looker side, but more on the dbt side.
We do think a lot about testing.
I'd love to advance the conversation with testing around Looker.
I agree.
I saw the release that came out last year.
We were both at JOIN in San Francisco,
and I'm reasonably excited to see what they do
around regression testing in Looker.
It's something that we don't do a huge amount of yet.
We trust that the data that we put in via dbt is well tested.
And we do do it at the time.
We kind of, in different organizations,
make having tests on every model that gets created the standard. There's an article about bringing the kind of testing that software engineers take for granted, and the rest of their
tooling, to data, and I'd highly recommend people go and seek that article out, because it
really kind of advanced my thinking in that area, and I think it's a really thought-provoking article, for lots of
people, around testing and analytics. Because I think it's not a thing
that people have done for the most part. I think we write SQL queries and ship them and often don't really think about it.
Tests fundamentally are things that validate our assumptions about the data. And so you have an
assumption about a table, and you should go and write a test that allows you to automate confirming
whether that assumption is true or not. We do it a lot as part of
our work, but I think it's historically not a thing that people have thought enough about in analytics.
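[Editor's note: for anyone who hasn't seen dbt's tests, below is a minimal schema.yml showing what "a test that validates an assumption about a table" looks like; the model and column names are made up.]

```yaml
# A minimal dbt schema.yml illustrating "tests as encoded assumptions" -- the
# model and column names here are made up. `dbt test` turns each entry into a
# SQL query that fails if the assumption doesn't hold.
version: 2

models:
  - name: orders            # hypothetical model
    columns:
      - name: order_id
        tests:
          - unique          # assumption: one row per order
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
```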
Okay, what about performance? I mean, again, the thing that I've experienced a lot of is something's
built out, but it performs so slowly as to be unusable, and, you know, the thinking
is, well, we'll sort that out later on. When you build out stuff in analytics and Looker, do you address performance as a sort of
number one thing, or is it addressed as you go along? I'm just
curious, what are your thoughts on that?
It's more something that we address as we go along.
We definitely, as we test out a new thing, ensure that it doesn't run
incredibly slowly, but we don't, in any way other than user testing, test it out initially,
but we definitely do testing kind of after the fact and on an ongoing basis. Looker's,
actually, your talk at the Looker meetup was all about the System Activity explore that they
released. It allows you to understand how long queries take and
what's being used. And we use that a lot with a number of my clients. We use it a lot at Growth
Street to understand if there were areas of the data that were regressing in terms of the speed
that they could be queried, or whether performance was degrading on the whole. And so it's less a thing that we
think about as we push work, though we do validate that it's able to be queried in a reasonably timely manner, but really,
on an ongoing basis, we think a lot about how the warehouse is performing and what might have
happened to cause that, or whether it's just a degradation generally, and how we can tackle that.
Okay, okay. So the other talk I heard from you was actually at the meetup that you organized,
which was the dbt one, and I came in, I have to admit, halfway
through, because my train back from Brighton was delayed. But you were talking about
how you make decisions, and how you think about probability and so on there, and I'm curious
to kind of hear a bit more about that talk, really. So just to recap, in summary, what
was the talk about? And what led to that? What were your motivations, before we get into the detail of it? So the talk itself was about how you can help
people to think probabilistically, or how you yourself can think probabilistically, and how
you as an analyst can help your organization think more probabilistically, particularly around kind
of understanding uncertainty or being comfortable with uncertainty and also how we make better predictions and how we kind of assess our confidence in predictions
that we make day to day around things like how much work will get done or what the timeline of
a project will be or what the success of a feature will be. And so it was really all around
how do we help people think and calibrate the predictions that they make day to day.
It was inspired by a handful of things.
I'd recently been on holiday and I read Nate Silver's The Signal and the Noise.
And he talks a lot about it, writes a lot in the book, about how poor we are at making predictions. He shows a graph, which is the results of
an edition of the Survey of Professional Forecasters in the US, which happens either
quarterly or monthly. And a bunch of professional forecasters, who are economists and other things,
were asked: what do you think the GDP growth of the US will be next year? And this was, I guess, November 2008. And they all said,
we think it'll grow 3% or 2.7%, and put the likelihood of it shrinking as infinitesimal,
and the idea of it shrinking by more than 2% as almost impossible. And for anyone who remembers 2008, 2009,
which will be most of your listeners, the economy definitely did not, the US economy definitely did
not grow 3% that year. It shrunk by almost 3%. And these are people whose jobs professionally
were to make forecasts. So, you know, let alone you or I, who don't put the words professional
forecaster in our titles. And it just kind of reiterates how poor we often are. Confirmation bias is a thing. We don't seek
out all the information available to us. And just generally speaking, we're not great. Humans
generally aren't great at making predictions. And so the talk was about, half of it's about
how flawed we are as human beings and the various things we do. It's inspired by Annie Duke's work around resulting,
which is how we look at the outcome of a decision to indicate the quality of the decision itself.
And actually, sometimes those things are inherently linked, but often there is a level of kind of uncertainty around, you know, there's a probability of certain outcomes,
and you can make the right decision and have the wrong outcome. But often, if we see the wrong
outcome, we assume that the wrong decision was made. And it was just starting to get people
thinking about that. And so I outlined through kind of suggestions by Annie Duke and Nate Silver
in his book, and particularly the guys at Twitch, the video game streaming company,
who have done a lot of work in this area, things that you can do to kind of improve in
this area. And those were running prediction training in your organization. So helping people
actually think about these things. And I guess giving talks a bit like the one I did,
making them aware of how bad we typically are at these things, but then asking them
questions. And so training often looks like: you ask them 10 questions that have a quantitative
answer. And you say, I don't want you to guess the answer, but I want you to tell me your 80%
confidence interval. So what is the lower end and upper end of the answer that you'd give that
makes you 80% confident that the answer will be in that range?
And often you ask them 10 questions and four of them are in the range or 10 of them are in the
range. And so they've either gone way too broad or they're overconfident on what their answer
should be. And obviously if they were kind of well calibrated, their 80% confidence range,
eight out of 10 would fall in the range.
And so you then give them another 10, having kind of learned and adjusted their priors and their confidence from that first round.
And they almost always immediately improve.
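[Editor's note: below is a toy sketch of scoring the calibration exercise just described; the questions and guesses are invented examples.]

```python
# A toy sketch of scoring the calibration exercise described above. Each answer
# is (actual, low, high): the true value plus the 80% confidence interval the
# person gave. The questions and guesses below are invented examples.

def calibration_rate(answers: list[tuple[float, float, float]]) -> float:
    """Fraction of questions whose true value fell inside the stated interval."""
    hits = sum(1 for actual, low, high in answers if low <= actual <= high)
    return hits / len(answers)


answers = [
    (8_849, 8_000, 9_500),  # height of Everest in metres -- captured
    (68.4, 40, 60),         # wingspan of a 747-8 in metres -- missed (too narrow)
    (1969, 1965, 1975),     # year of the first Moon landing -- captured
]
print(f"{calibration_rate(answers):.0%} of intervals contained the answer")
# Well under 80% suggests overconfidence; well over 80% suggests the
# intervals are wider than they need to be.
```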
And so you can run training sessions to get people making better predictions day to day, even about small things, because it can have a meaningful effect on the work that we do and how we run our organizations. And then the three other things
were making those predictions day-to-day. So outside of training, actually making flash
forecasts on a day-to-day basis, writing them down and returning to them, has been shown to
really improve our ability to make better predictions. Running premortems is a thing that can help us
understand the risks a bit better and kind of really understand kind of going into a project,
giving people the ability to raise the kind of uncertainty and the risks that they see,
because often people are aware of them. We just don't give them the forum to
voice them. And that can help people kind of think more probabilistically about what the
outcomes are. And then finally, Annie Duke has this idea of just saying, do you want to bet?
And the second you say that to someone, they immediately start thinking about how they could
be overconfident or how they could be underconfident or what the actual probabilities of something are
or what information isn't available to them. And, you know, you can't do it
antagonistically. You need to have a culture that allows, you know, that type of thing,
but just saying 'do you want to bet?' to someone when they're thinking about something, or they've
said something, can really help them kind of think more in depth about that decision. So that,
you know, in a kind of long-winded way, is what that talk was about.
Yeah, that's good. I mean, so in practical terms, how did you get that presentation into a dbt event? And you're obviously involved in the Looker world. How can you start to express this confidence
factor, really, in the stuff you do in those tools? Because we tend to sort of say,
you know, there's a number on the dashboard, it is this number, here it is saying that your
utilization is this, you know, whatever. It's not as common to show
maybe a range of values or things like that. How do you express that kind of concept, really?
So I don't think, in much of the work we do, it gets expressed necessarily in a dashboard itself. But I think the manifestation of it is that, fundamentally, you're shifting the point of
analysis or some analysis from people that 10 years ago would have been an analyst in your BI
team. And now you're allowing people around the business to do that. And that's fantastic, but
you need to kind of get them thinking the right way that allows them to understand,
particularly where there's correlation versus causation. I think that's one of the things that we jump to a lot and you see lots of people, you know, see a number and go, well, this correlates
with that. And this type of talk and doing that type of work, I believe gets people thinking
more about, you know, what they know, what they don't know, what the other reasons for things
could be. There's that great website, Spurious
Correlations, which shows 0.95 correlations between something like the amount of cheese
people eat in a year and the number of suicides. Or maybe it's Nick Cage films and suicides in the
US in a given year. And they correlate incredibly closely. And so, you know, we as humans want to draw narratives and tell, you know,
stories from data. And it's just about getting people to think about whether the story that
they're telling is the right one, or is the only one you can tell, and kind of inherently digging a bit
deeper, because we want these people to do that analysis. And we just need to promote a way of
thinking that gets the most out of it.
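[Editor's note: below is a quick illustration of how easily trending series correlate, unrelated to the actual Spurious Correlations datasets: two independent random walks routinely show correlations far from zero.]

```python
# A quick numpy illustration of the Spurious Correlations effect mentioned
# above: two independent random walks -- stand-ins for any pair of trending
# yearly series -- routinely show a correlation far from zero.
import numpy as np

rng = np.random.default_rng(seed=42)
a = rng.normal(size=365).cumsum()  # e.g. a daily "cheese consumption" index
b = rng.normal(size=365).cumsum()  # an entirely independent series
r = np.corrcoef(a, b)[0, 1]
print(f"correlation between two independent walks: {r:+.2f}")
# |r| is often large here despite zero causal link -- which is exactly why
# "it correlates" is not, on its own, a story.
```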
Okay, fantastic.
So just to kind of round things up, really,
how do people find out about you?
And how do people get hold of maybe these talks
or maybe hear you speak or whatever, really?
So my website is dbanalytics.co,
where they can find, get in touch with me or find my Twitter feed,
which is where I can be found about 19 hours a day.
And so that's the best place to get in touch with me. If they want to hear me talk more than they just have, I host the dbt
London meetups. The next one's going to be June 20th at Simply Business. If you go to meetup.com,
you should be able to find it. We're very excited. We've got some great speakers lined up.
I won't be speaking, but I'll be hosting and you'll be able to hear some great talks
by other people. And then generally I'm often around the dbt and Locally Optimistic Slack
channels in particular, which I think are places where I engage a lot in these types of conversations,
in threads with people who think about the same types of things. And they're two great communities,
so I'd highly recommend them. And for the five hours a day I'm not on Twitter, that's where I am.
Excellent. And will you be at the Looker hackathon that's running in May in London?
Yes, I will. Absolutely. So May 17th in London, I'm really looking forward to it. I think it'll
be exciting to see. I've got some clients coming along as well, which is great. And so it'll be
great to see what people build. I think all the way through from kind of developments on the
dashboarding side, all the way through to, I guess, where I'll be focusing more of my time,
which is around the API and thinking about how you can improve the developer experience
in Looker and other tools with what they make available.
Excellent. Well, Dylan, it's been great having you on the show. And thank you very much for
taking the time to speak to us about these things here.
And hopefully I'll see you at the dbt meetup and you'll see us at the Looker one in London,
hopefully sometime around that time as well.
Other than that, thank you very much.
And it's been great to have you.
Awesome.
Thank you very much for having me.
It's been a lot of fun.