Drill to Detail - Drill to Detail Ep.59 'Looker, dbt and Digital Analytics Today' With Special Guests Tristan Handy and Stewart Bryson
Episode Date: October 2, 2018

Mark Rittman is joined in this Looker JOIN 2018 special by long-term friends of the show Tristan Handy from Fishtown Analytics and Stewart Bryson from Red Pill Analytics, to talk about dbt and enabling data engineering for data analysts; the state of modern data analytics consulting today; and what we're looking forward to hearing about at next week's Looker JOIN 2018 conference in San Francisco, CA.

Show notes:
Looker JOIN 2018, San Francisco
dbt (data build tool)
"Future Proof your Analytics Stack"
Transcript
Welcome back to the Drill to Detail podcast after our long summer break. I'm your host, Mark Rittman. Next week is Looker's annual conference in San Francisco, and I'm joined in this Looker JOIN 2018 special episode by long-term friends of the show, Tristan Handy and Stewart Bryson. So Tristan, welcome back, and
why don't you introduce yourself and tell us where you come from. Yeah for sure, thanks so much and
glad to be back. My name is Tristan Handy and I'm the founder and CEO of Fishtown Analytics.
We do two things, we help venture-backed companies implement analytics stacks,
and we also build an open-source data modeling tool called dbt.
Fantastic.
And Stewart, nice to have you back as well.
Do some introductions and tell us where you come from, what you do.
Thanks so much, Mark.
It is a pleasure to be back.
It's also a pleasure to be with Tristan. This is going to be really interesting.
So I'm CEO and founder of Red Pill Analytics. We're an analytics company about four years old.
And, you know, we work primarily or at least our history is primarily from the Oracle stack, but we're only about 60% Oracle now, and the rest is pretty much public cloud.
And we use a lot of the tools you discuss on this podcast,
so it's going to be a lot of fun today.
Great. Okay.
So, Tristan, when we did the first interview
in the first episode with you a while ago,
you talked about this tool called dbt that you'd built.
And I must admit, at the time,
I hadn't really had a chance to look at it,
and I hadn't really got what the context of it would be but since then i've
been i've noticed it being kind of mentioned a few times on projects i know stewart's had some uh
had some exposure to it as well so why don't you just tell us what this dbt tool is and what is
the problem you're trying to solve with it first of all yeah uh for sure the and you had no good
reason to know about it a year ago because almost no one was using it.
So dbt stands for data build tool.
Currently, it is a purely command-line tool that helps analysts model data in their data warehouses.
And it really grew out of a need that I had working at my last company. I actually worked at RJMetrics,
which became Stitch. And so we were big users of the Stitch product. And we had all of our
data loaded into Redshift. And I was actually trying to analyze marketing automation data. It was from Pardot, actually.
And I was just having a real hard time.
I was a longtime SQL user, but this was the first real experience I'd had analyzing data in this Redshift environment. I found I needed to build these pre-aggregations, because otherwise my eventual analytic queries were so freaking complicated that I couldn't keep track of what was going on. And so my co-founder Drew and I decided that we just needed this thing. And as we built more and more of it, we found that it was more and more useful. And we couldn't believe that there wasn't something out there to do this. So a big part of the impetus of starting Fishtown was just to try to bring this thing to life, and see what it became. Okay, okay. So I think I can understand what you're saying about,
I suppose, you know, needing a tool like this. I mean, I was doing some work on Redshift recently, and the number of times I was using the same kind of design patterns, things like trying to find out what the first transaction was in a set of transactions, or using a CASE statement to break out and decode a field, the amount of repeated code I was creating in there was terrible. And I was looking at dbt at the time, and there were two things particularly it potentially did: this thing about not repeating yourself, but also the testing framework you had there as well. I mean, was that the kind of thing you were thinking of, really, when you were putting it together?
Yeah, I mean, to be honest, the very initial need was something so trivial and stupid. But when you have a data warehouse that's loaded by modern data pipelines like Stitch and Fivetran, the data shows up in your warehouse in a very raw format, and in each different integration, the table and field names are aligned with the API endpoints that those tools are getting the data from. And so in some integrations you'll have underscores in between the different parts of the name, and in some of them the casing will be different.
And it was just very challenging to have all these different data sources that had different
styles of naming.
And so literally the very first thing that we did was just create a set of what we now call base models that were one-to-one with the raw data, but provided renaming for all the fields and for the table itself, and then filtered out test records, records that shouldn't exist. And then we did all of our analytics on top of these base models, and everything felt very clean and consistent. But you're totally right: doing that type of transformation downstream, whether you're creating very simple case statements or going all the way to Kimball-style star schema modeling, that's stuff that people use dbt for as well. Okay. So, I mean, you've been looking at dbt,
and you've got quite a background history in, I suppose,
templated code and ETL tools and so on.
What was it that interested you about dbt,
and what was it that, I suppose,
made you think this is worth looking at, really?
Yeah, I mean, one of the founding principles of Red Pill
when we founded the company really was we wanted to reintroduce analytics, data and analytics, I'd say, to the software development lifecycle.
I mean, we started trying to make enterprise-y tools work in that mode, which is not necessarily easy.
And we've written several things and frameworks and that sort of thing to try to make that happen.
And then when I discovered dbt, I just felt kindred because the whole idea of trying to make a tool,
whether that be some sort of ETL tool or data integration tool,
work as your sort of total source to target in today's environment, I think, is sort of a fool's errand.
As Tristan mentioned, you know, Fivetran and Stitch deliver your data very efficiently to your cloud-native data warehouses.
And so we don't really need to focus on that side of it. And
what I loved about dbt is it really focused on the back-end side. And now hearing Tristan sort
of describe his desire to build this makes perfect sense to me. When you look at dbt,
it's very much about building one interface at a time, one step in your process at a time, but not just that,
but doing it in a way that works in modern development environments. So I'm thinking
continuous integration, continuous deployment and testing. These enterprise ETL tools and data
integration tools that we've both worked with in the past just never really thought about
what does it mean to test ETL code?
What does it mean to do the sort of things that developers are used to doing, which is
stubbing and mocking and thinking about what it means to write a test where you have the
anticipated results of what you expect that piece of your pipeline to do and be able to test it.
What dbt does is sort of abstract out just the real basics of what you need to write
in a templating engine.
And it really just takes care of the rest.
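To make the testing point concrete: in dbt, tests are declared in a schema file next to the models, and `dbt test` compiles each declaration into a SQL query that looks for violating rows. This is only a sketch, and the model and column names are hypothetical, but `unique`, `not_null`, and `accepted_values` are dbt's built-in schema tests:

```yaml
# schema.yml (sketch; model and column names are hypothetical)
version: 2

models:
  - name: base_orders
    columns:
      - name: order_id
        tests:
          - unique      # fails if any order_id appears twice
          - not_null    # fails if any order_id is missing
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
```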
So I'm a big fan, Tristan.
Thanks.
I really appreciate that.
Yeah, and you did a way better job there than I've ever done before.
So I need to bring you along when I'm trying to convince people to use it.
Tristan, I mean, so for anybody listening, what we've described here is interesting,
but it's almost in an abstract sense.
You've described, you know, reusable code and templating and so on.
But Tristan, just maybe a very kind of basic level, explain, you know, I mean, what is dbt in terms of how people might kind of encounter it and install it and use it and so on?
Because it's like writing SQL, from what I understand, but it's in a reusable kind of templated way and so on.
I mean, how do people actually use the product?
Yeah.
So the goal is to make data engineering accessible for analysts. And so there are certain things in the workflow that are a little bit unusual for analysts to have to do.
But I think that we've tried to make the whole process as accessible as possible.
So you install dbt by firing up your command line and you do a brew install dbt.
And the install process is fairly
straightforward at this point. You make a blank project, you supply some database connection
credentials, and then you get to work writing transformations. And every new SQL file that you
create in your project becomes a new table or view. And the analyst doesn't actually have to write CREATE TABLE or CREATE VIEW,
sort keys, dist keys, partitions on BigQuery.
The analyst doesn't have to think about any of that stuff.
They express their business logic in a select statement.
And then dbt figures out how it needs to materialize that into the database, including if that needs to be materialized in some incremental view that builds over time.
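As a rough sketch of what Tristan describes, a dbt model is a single SQL file containing just a SELECT, optionally with a Jinja config block choosing the materialization; dbt generates the surrounding DDL itself. The source table and column names here are hypothetical:

```sql
-- models/base_orders.sql (sketch; raw.orders and its columns are hypothetical)
-- dbt wraps this SELECT in CREATE TABLE / CREATE VIEW, or incremental
-- insert logic, depending on the materialization chosen below.
{{ config(materialized='table') }}

select
    id          as order_id,     -- rename raw, API-style field names
    accountid   as account_id,
    createddate as created_at,
    status
from raw.orders
where is_test_record = false     -- filter out test records
```

Running `dbt run` builds the table; changing `materialized` to `'view'` or `'incremental'` changes only the generated DDL, not the business logic. This also doubles as a "base model" in the sense described earlier: one-to-one with the raw table, renaming fields and filtering out junk.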
Okay. So is this an ETL tool or is it kind of an analysis tool or both or what, really?
Yeah, I think that the – so people are so used to talking about, quote unquote, ETL. And now the order of the letters has been mixed up so that it's really ELT in the modern stack.
But I think that really tools like Stitch and Fivetran are really EL tools.
And dbt is the key part of that. So if Stitch and Fivetran are responsible for
extracting and loading your data into the warehouse, then dbt is the transformation
layer that then sits on top of that. Because if you, you know, originally in the modern stack,
people were just loading in raw data and then trying to write reports on top of that. And
that's a very challenging thing to do, as anyone who's worked in data warehousing in prior generations will know. So dbt is trying to fill that hole in the modern stack. What I love about dbt is the concept that when we use Fivetran or Stitch,
we can get our data loaded to a cloud data warehouse. And we can knock out a lot of requirements sort of initially.
And what I really like about dbt is that for those requirements
that can be solved by the delivery of sort of the raw schemas
from the source systems,
we can have analytics developers go ahead
and start trying to knock out some of those requirements
while the back-end transformation developers, whether those be analysts or real developers,
start to address the not-so-low-hanging fruit.
And I think that in traditional projects, we spent so much upfront time ETLing
before we could even get to a state where we could build analytics.
And so by the time we built analytics on top of these transformed models, that's the first time we find out from the user or the business that we didn't really capture their requirements correctly.
What I love is that we can go ahead and start building analytics, building it on the Fivetran or Stitch delivered data models, get some stuff in front of the user or the business, start iterating with them.
And then dbt responds to those requirements in such a reactive way that we're actually doing things with data instead of architecting projects. One of the main goals was to have really fast iteration speed, because, like you said, the process of answering a question is that you're going to do some analysis in a front-end tool, and then when you change a model in dbt, you should be able to deploy that in like five to ten seconds. And so that speed allows analysts to actually get their work done so much faster. Okay, okay. And that's quite an interesting lead into something else I wanted to talk to you about, Tristan, and also to Stewart as well.
I mean, about, I think the last podcast we talked, we talked about the kind of the analytics market and the sort of technical stacks and so on at that point.
And since then, you kind of did a podcast or you did a webinar, I think, with Mode Analytics.
And you generally talk about things being, well, you tell us, what do you think the software stack, the analytics stack is like at the moment?
And where do you think it's kind of going?
And I mean, just start off by saying that, and then we'll kind of maybe have a discussion around that with Stuart as well.
Yeah, I think that there are several pieces of the analytics stack that have kind of evolved since Redshift was released in, I guess that was 2012 or 2013. But the advent of Redshift, I think, is kind of the forcing function that really caused the modern analytics stack to come into existence. It was first Redshift, and then both Fivetran and Looker came out at a really similar time. Then you start seeing tools like Snowplow pop up to dump data directly into Redshift. And then dbt came along in 2016. Matillion's another good tool that fits in that layer of the stack.
But so you now have this whole modern tech stack
that is made up of event pipelines,
EL tools, extraction, loading, like Fivetran and Stitch,
data warehouses, data transformation,
and then BI, these five layers.
And what I think is so interesting is that the layers of the stack have been very stable for the past several years, and even at this point, the players in each of those parts of the stack have remained fairly consistent. There hasn't been some brand new BI tool since the era of, like, Looker, Mode, Periscope, to come out and grab a ton of the market. So I think that we're starting to see things stabilize a little bit for this era of the tech stack. Okay. I mean, so for Stewart and I, we came from the Oracle world, and from the more, I suppose, enterprise databases and so on. And I think, certainly for me, when Redshift came out, I mean, I was obviously aware of it, and I was aware that it was disruptive and so on, but I don't think Redshift had quite the same resonance with us at the time. But obviously now we see the kind of role that it has. I mean, Stewart, you came into this around the same sort of time as me. What was your take on when Redshift came out, and the way the market has gone, really?
So the beautiful thing about Redshift at that time was the idea of provisioning something quickly. And I think that when you come from the world that you and I come from, Mark, the sort of legacy
world just the idea that you could provision a data warehouse with clicks, and then you discover
underneath that it's really an API call, so you could automate that. I mean, that was just
revolutionary. But now what I look at, like BigQuery and Snowflake, really has taken that
model further, which is separating compute and storage. And I think that is just gangbusters for what we
can do. And I'm sure Redshift's going to address some of these issues. It doesn't quite have the elasticity of the other two I mentioned. But I think the idea of, I mean, Mark, on these
legacy projects that we worked on so many years ago, I mean, how long did it take for the data warehouse machine to actually be stood up?
I mean, you know, we always built in a couple of weeks in the project to kind of stand around and wait.
And I think that's the thing Redshift really changed: you can get to work on day one.
Yeah, I think the worst one was that the project went for a year and a half,
and the machine actually got delivered about a week after the project finished. So that was the Exadata one that you and I probably know about, in London, as well. I mean, so one of the things that you've talked about, Tristan, is the missing parts, and where there are opportunities for the stack, maybe more layers to be put in, or things that are missing at the moment. I mean, at this point now, what do you think is still missing, or where do you think there are opportunities to do things better, or add features into the stack we work with now? Yeah, I definitely am waiting for a lot of products to come out, and I wrote that blog post almost a year ago now. And there actually hasn't been...
So the first one that I mentioned there is that I feel like with a combination of data network effects plus AI or ML,
there should be a way to cleanse data more effectively than there is today.
You know, there are thousands of people now who all have Salesforce data loaded into a data warehouse.
And yet everybody has to do the same manual cleansing
over and over again.
It's still a very human thing.
And maybe Salesforce isn't the perfect example.
Maybe that is something a little
more automated like Stripe or something like that. But it seems to me like that problem will be
solved at some point. Probably the one that is the biggest pain point for people that we work with
today is what I called data reintegration. There's probably a real term for this, but that was what I had called it,
where you've built this tremendous amount of value
in your data warehouse.
You've integrated data from all these different sources,
then you've applied business logic to it.
But then if you want to get that data
back into the systems that actually run your business,
like you want to get it into Salesforce so that sales reps can do things with it, or you want to get it into a marketing automation platform so that it can trigger campaigns, there's really not a great answer for that today. There are a couple of answers that, you know, you're starting to see take shape, and Looker does a little bit of that, but I don't honestly think that's a mature space with a good answer today. Okay, sure. I mean, you and I know that kind of area quite well. What are your thoughts on that, and also your thoughts on the stack at the moment? Yeah, I mean, I agree with Tristan. I think machine learning has promise, and whether or not it will deliver on that promise is, I think, debatable. But, you know, the idea that you open these tools,
whether they be analytics tools or they be integration tools, they should have an opinion.
I mean, machine learning should allow these tools to have an opinion. Here are some
expected joins I think you might be thinking about from the data integration side.
Here's from the analytics side. Here are some opinions that the tool can make
because they have machine learning underneath them that can give you some visualizations when
you open the tool. I think what we'll see, and I'm hoping that we'll see from both the analytics
and the integration stacks, is just these tools having an opinion when you open them and go ahead
and do some of the work for you without you having to tell them to. One of the tools that does do this to a certain degree is
Glue, so Amazon Glue. At least there's a crawler in the background that's going ahead and finding
your schemas and taking a first pass at defining those schemas. I think we'll build on those sorts of things moving forward where these
tools we use because of advancements and what machines can do on their own. We'll start to see
these tools, you know, not give us blank slates or blank palettes when we open them and start to
help us sort of guess business logic and improvise in the delivery of analytics and data in such a way that
now all we're really doing is tweaking it or really defining what our, you know,
specific needs or use cases or requirements are. Okay, okay. I mean, so do both of you think that the inevitable thing with the market is you'll start to get consolidation? At the moment we've got these very separate companies doing their separate things: you've got Stitch, you've got Fivetran, you've got Looker. Do you think there's an opportunity or a need for an equivalent of, say, Oracle to come along and buy up these companies and create an integrated stack? Or, Tristan, do you think there's benefit in having these things separate and modular now? I think that modular is more customer-centric for the slice of the market that we work with.
The VC-backed startups are investing in smart data talent today. And I think that people want control; they want the ability to put together best-of-breed
solutions.
Is that the right solution for the enterprise?
I honestly don't really know the answer to that. It may not
be. And historically, it clearly hasn't been. One of the things that I do think is interesting
from a consolidation perspective is that the warehouses, and especially BigQuery, are starting to do more than traditional analytics
under the hood of a SQL prompt,
which I think is an interesting type of consolidation.
They seem to believe that SQL is the lingua franca
and that you're going to be able to do more and more things from SQL.
They recently released this regression capability.
You can now create a, quote-unquote, machine learning model inside of BigQuery,
and then it automatically keeps the weights up to date,
and you can query it directly in BigQuery.
As a function call.
I mean, as a function call. It's brilliant.
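The feature Tristan is describing is BigQuery ML, where training and prediction both happen in SQL. A minimal sketch, with a hypothetical dataset and columns:

```sql
-- Train a linear regression entirely inside BigQuery
-- (analytics.weekly_metrics and its columns are hypothetical)
CREATE OR REPLACE MODEL analytics.revenue_model
OPTIONS (model_type = 'linear_reg', input_label_cols = ['revenue']) AS
SELECT ad_spend, site_visits, revenue
FROM analytics.weekly_metrics;

-- Query the model "as a function call", directly in SQL
SELECT predicted_revenue
FROM ML.PREDICT(MODEL analytics.revenue_model,
                (SELECT ad_spend, site_visits FROM analytics.next_week));
```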
Yeah.
So, I mean, Stewart, we've seen this before.
We've seen from the Oracle world that kind of,
first of all, sort of consolidation,
but also SQL being the lingua franca of everything
is a story we kind of know well, really.
I mean, what do you think on this?
Do you think there's going to be consolidation?
I mean, also, I'm interested to understand from you, Stewart,
we've looked at things like, I suppose, Oracle's data warehouse cloud service that's trying to copy, or at least be similar to, the ease of provisioning that you get with BigQuery. Do you think that players like Oracle come along here and just repeat history, or what? You're putting me on the carpet there, aren't you? So I think what Oracle has today
with their autonomous data warehouses
is more similar to, say, Redshift
than is to, say, BigQuery or Snowflake,
which is still very much local storage.
It's still very much non-elastic, I'll say. I mean, there's definitely some elastic
capabilities within Autonomous, and we've got a few customers using it. But I think the key is the complete separation of storage and compute, you know, serverless. I mean, what Autonomous is not is serverless at this point. I'm sure they'll get there. As far as the modular discussion, I mean, I completely agree
with Tristan. I don't see a need for a complete stack. I think that's regressive in the way we've
moved forward with these modern stacks and thinking about the cloud giving us APIs, whether
those APIs are SQL-based functions, which I still consider an API, in ML, or just real honest-to-goodness
RESTful APIs that we can just call. I mean, building the plumbing is what we used to have
to spend so much time doing, Mark, is building this integration and plumbing. And actually,
the cloud vendors are exposing that. I mean, you look at Looker, you look at Stitch, you look at
Fivetran, you look at Mode. These things are all RESTful-based. And the idea is that we can plug them together in the way that makes sense for our requirements. And I hope it doesn't change; I hope we move more toward modular pieces that can be plugged together to get just exactly what we need. Okay, okay. And it's interesting, I mean, again, leading into the Looker event that's happening shortly. You know, this should go out around the same sort of time that the Looker JOIN conference is happening, and
I was there last year, and I was impressed with what they were bringing out. But I think
back to the point that Tristan made a lot of the features they were bringing out were things that
either I'd seen before or they were enterprise features. I mean Tristan what's your take on
Looker the product at the moment and perhaps kind of maybe what you're looking forward to seeing at the conference being announced, really?
What's your sort of what's your state of the union really for Looker at the moment?
Yeah, Looker still occupies an end of the market that nobody else does. It's amazing at allowing analysts to write code that describes the relationships in their database
and then painting a user interface in front of business users that they can use to drag and drop
and create reports and dashboards. And I think that Looker's ability to translate the efforts
of these analysts and business users into functional reports and
SQL that actually performs, in most cases, performs pretty well, that doesn't exist
anywhere else on the market. And I'm a little surprised that it doesn't exist anywhere else
in the market because it seems like such a big problem. But they still continue to be the very best at that
from my perspective.
And that's what we use Looker for.
We don't really use Looker for data actions, and we don't use some of its more sophisticated permissioning sometimes.
But that core query model, the LookML-to-explore-to-dashboard process, that actually has been very stable since about 2015. And I think that most of what has been released since then, even Looker 5, which I did like, was mostly kind of extensions on top of this core experience, and not real innovations on top of that core experience.
Okay, okay. So, I mean, we'll come back to those points in a second. But Stewart, what's your take on the state of the world with Looker at the moment? I mean, like me, it's a relatively new thing; the last couple of years we've been using it, and there's been a lot of growth there and so on. But what do you think are the key selling points of Looker, and the things that perhaps you want to see announced at the event this year? So, I mean, first off, Red Pill is exhibiting at JOIN this year, so we're really excited about that, and we're going to have quite an involvement while we're there. So, you know, we have open eyes in that experience, and we're really excited to be truly invested in the Looker community. As far as, you know, one of the things I wanted to sort of say
while Tristan was speaking is it's interesting that, Mark, the tools we worked on for years were trying to
abstract away SQL. They were trying to make it so that you didn't have to write SQL, or at least
in dbt and Looker's way of thinking, you wrote a SQL-like template. And all the while, the big data technologies were moving back to SQL at the same time. I think it's interesting that at some point in the enterprise tool set, it was decided that SQL, or SQL-like things, were not the way to express data. And I haven't seen
anything that expresses data as effectively as SQL.
I mean, from the days of being a database administrator, which is where I started, SQL was always the way I thought about data, and it still is the most powerful. So I like seeing tools use SQL as a sort of building block, or at least an interface between tools, and us moving more toward that direction. We've got a project going on right now where we're doing data integration in Kafka using KSQL, and it's just so enjoyable to think that SQL is not a dirty word. And I think, if there's one thing I could say, a theme that kind of ties together all the different things we're talking about today, it's that SQL is alive and well. It's being promoted as a first-class citizen in a lot of the open-source technologies that didn't have it: Apache Beam, Spark (which already has it), KSQL on Apache Kafka. And the thing that Looker got right is that we don't need to abstract that away into a GUI. There is an expression language that should be defining our data sets, and it's something very, very SQL-like.
And I think that the LookML and sort of dbt way of expressing SQL in easy-to-use, bite-sized morsels, kind of a model at a time, is really exciting. So I'm hoping just to see a whole lot more investment from Looker in that LookML model. I mean, that's the thing that for us really drives enterprise customers to consider non-traditional, non-big-stack software: that LookML model. And that's the thing that we can take to bigger customers. So on that point, and I'll hand this off to Tristan in a second, on that point there was a lot of attention and talk last year about data blocks and analytic blocks and source blocks, and this general kind of templating, or creating these pre-built or pre-packaged analytic solutions in templates. How well do you think that's worked, Tristan? Has that been useful? Has there been some take-up of that in the market?
Yeah, I agree with the priority that's placed around this. More and more businesses are using SaaS products as their systems of record. All those SaaS products have identical schemas, or rather, all the users of those SaaS products have identical schemas to one another. And then when you load the data in via common tools, Stitch, Fivetran, or other smaller ones, the data looks the same when it gets to the warehouse. So it seems like an obvious thing that you want to do. And we've thought a lot about that problem with dbt as well.
Where the rubber meets the road is that no data are ever quite the same.
And so you not only need to figure out how to distribute this code to large
numbers of people, but you need to figure out what are the ways in which it makes sense for
those businesses to customize that to fit their own unique environment. But while they customize it,
you have to make sure not to break the upstream link to the core package.
Because one of the main reasons you want to use a package that's been developed by somebody else is that you will inherit the upstream improvements that get made. And you lose that if you break that link completely, for example, by copy-pasting a bunch of the logic so that you can then make changes to it. And I think that the ways of distributing the code with blocks are great, and the intention is there, but I don't know that we've evolved into this sophisticated package management system where, you know, a thousand-plus companies can all collaborate on common data sets together. I just don't think we're quite there yet.
Okay, okay. So Stewart and I are quite used to this idea of packaged code and packaged solutions and so on.
And Stewart, as the person that introduced you to agile development and Git, you know, I feel like I can ask you: what's your opinion on this, really?
You know, you must have encountered this and thought about this as well.
What do you think about blocks and what might you do differently
or hope they might announce differently around that sort of
area? I'll let you get away
with that, Mark.
I remember the first time you checked out a Git
repository, you checked it out into your
Dropbox folder.
We'll just let that one go.
All jokes aside... That's where my code is still stored, yeah. Yeah, exactly. So I think, I mean, the
Git side of this is obviously, you know, so important. I mean, we're just
now seeing,
and in some cases not seeing,
traditional tools move in that direction,
and it just seems mindless.
I mean, when you talk to the people that are building these big box analytics tools,
they're obviously using Git and committing
and continuous integration,
but, and this is broad strokes here,
they don't think analytics or data integration
is the kind of thing that is a subset of software development, to use Tristan's terminology.
And I very much think it is.
And so one of the first things about Looker that blew me away was its built-in integration with Git, and it seems so
easy and obvious. And I think that boilerplate is kind of the term we use for copying and pasting things around
unnecessarily. And I absolutely hope to see more enhancements from Looker in that direction to make inheritance more of a first-class development style.
They do have LookML extends, so you can sort of define core models and extend them.
And I think that's a great move in the right direction.
It only allows you to extend on top of something that's been built, but true inheritance allows
you to override. And I think what would be great is to see the extends functionality allow us to
start doing easy overrides of things that exist in sort of core packages. And I think if we see that,
and hopefully we will, I think it'll be easy to express
a difference from a core model as just sort of an incremental change. And I think that's,
hopefully I'm not putting words in your mouth, Tristan, but I think that's kind of what you're hoping for. Yeah, I think that that's right.
And sometimes it's really very basic stuff.
I think that the ability to just specify configuration parameters for Blocks as simple variables would be nice.
And the way that Looker frequently deals with state in the Liquid layer is a little bit wonky, and you can't always do the things that you want with it.
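As a rough sketch of the LookML extends mechanism the discussion refers to, this shows a core view being inherited by a local one. The view and field names are invented, and this is illustrative rather than a definitive reference:

```lookml
# A core view, perhaps shipped in a Block (all names are hypothetical):
view: orders_core {
  sql_table_name: analytics.orders ;;

  dimension: order_id {
    primary_key: yes
    sql: ${TABLE}.id ;;
  }

  measure: total_revenue {
    type: sum
    sql: ${TABLE}.amount ;;
  }
}

# A local view that inherits everything from the core view and layers
# company-specific fields on top of it:
view: orders {
  extends: [orders_core]

  dimension: region {
    sql: ${TABLE}.sales_region ;;
  }
}
```

The richer behaviour the guests are asking for is the ability to cleanly redefine pieces of `orders_core` itself from the extending layer, so a local project's delta from a shared Block stays small and explicit.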
Okay. Okay. So, I mean, obviously the Looker event is in a week's time, but in terms of the open source market and just generally other innovation you're seeing in the market and analytics, I mean, Tristan, certainly dbt is open source, isn't it?
But what else is there you're seeing from the other vendors and open source projects and that sort of thing as well?
Yeah, I have been tracking a couple of open source BI tools, Metabase and Redash, for several years now. Both of those are making incremental improvements, but neither of them makes me want to encourage clients to stop paying for the proprietary solutions, at least where they are today.
There is GitLab, which is an open source GitHub competitor. They are doing some interesting things with a project that they're calling
Meltano, where they're essentially trying to either build or integrate a solution
within every layer of the stack.
And one of the areas where they're spending most of their time is the BI layer.
And I don't know exactly when they're planning on revealing that to the world,
but I've had the opportunity to see a couple of behind the scenes looks at that.
And it is pretty exciting.
So I do think that we are going to see the open source infrastructure
at every layer of the stack continue to move forwards.
Okay.
And Stuart, you've been doing quite a bit of work.
I know it's not open source, but with Google Data Studio.
So what's the current state of the union with Google Data Studio at the moment?
So when you look at something like Data Studio, and obviously Amazon has QuickSight, and you
look at those sorts of things, which I put in a different category than I put Looker,
they're very much about visualizing data, and they expect you to bring
conformed models or at least curated models to the table. I think what's interesting about,
Tristan mentioned Metabase, and obviously Looker fits into this model of going back to this sort of
metadata way of thinking. And I think right now, as Tristan's right to say, at this point
only Looker has really been trying to fill that middle ground. But open source technologies are starting to recognize
that there needs to be a handoff from data integration. As great as dbt and other things are,
we don't necessarily want to have to do all of it there.
We certainly can, and there are lots of projects that do that.
I think if you're going to use Data Studio, you're going to do a lot of your work in BigQuery or Dataflow, or even Dataprep. And that's what we've seen. So I think
when we start thinking about just-in-time analytics, where we can do a certain amount
of the final delivery or final preparation of data in an analytics tool, or at least something like Metabase, which sits close to the analytics tool,
that's great. And obviously Looker does a great job at this.
What you really want is the ability for each user or each business or each project to decide
how much they want to push down to the database and how much they want to push to the analytics tool. And I think as we start to see overlap, well, overlap's not a bad thing.
I mean, we often think about, well, should we do that in DBT or should we do that in Looker?
And really, when there's overlap, that's just options you have.
And it's great that we start to see some of these tools overlapping between them
because that gives us the ability to refactor
in advance, but also just choose what's right for us on this particular project. Okay, okay, we're
almost out of time now. So Tristan, do you want to just tell people how they can get hold of
dbt and find out more about it, and tell us about any kind of, I suppose, activities you've got
going on, or whether you'll be at Join in a week's time?
Yeah, we will be at Join as attendees, but we're also having a pregame event on the evening of
registration where everyone's going to hang out, talk about their usage of dbt, and then head over
to registration all together. You can find out about dbt online at getdbt.com. And probably the
first place that you go from there should be the big red button that says sign up for Slack,
because we've got about 800 people in Slack today. And I really love hanging out with that community.
They have such smart things to say, and they're really helpful and nice.
Okay. And Stuart, just remind us, you're exhibiting now, I think, at Join? And tell us about any things that you're doing there as well.
Yeah, we're exhibiting.
I think we're going to have about four or five of us from Red Pill there, myself included.
So we're really excited about that.
And the DBT channel on Slack is great, Tristan.
You know, it's almost table stakes today to have a Slack channel,
and I spend so much time in Slack trying to get help from these open source slash enterprise vendors,
companies that are both a business and an open source project, and I love that model.
Quick, really fast answers to questions.
Also, I'll be speaking at Oracle Code this year. So let me throw a little plug in there. I'll be building a
machine learning model on taxi fare prediction. I'm doing that with a gentleman called Björn
Rost from Pythian. So anybody that's out there that's going to be going to Oracle Code,
check me out there. Excellent. That's good. Brilliant.
Well, it's been great speaking to you both.
And thank you very much for coming on the show.
And hopefully I'll see both of you at Join in a couple of weeks' time.
Thanks so much. Talk to you soon. Thank you.