Drill to Detail - Drill to Detail Ep.119 ‘Data Control Planes, SDF Labs and dbt Cloud Visual Editor’ with Special Guest Greg McKeon

Episode Date: February 12, 2025

Mark Rittman is joined in this episode by Greg McKeon, Staff Product Manager at dbt Labs, to talk about their recent acquisition of SDF Labs, the vision for dbt as the control plane for data collaboration at scale, and the upcoming drag-and-drop Visual Editor that's soon to be part of dbt Cloud.

Links:
- dbt Cloud: The control plane for data collaboration at scale
- About the Visual Editor
- Coalesce 2024 and the Launch of dbt's Visual Editing Experience
- dbt Labs Acquires SDF Labs to Introduce Robust SQL Comprehension into dbt and Supercharge Developer Efficiency
- Drill to Detail Ep.115 'Airbnb, DataOps and SQLMesh's Data Engineering Innovation' with Special Guest Toby Mao

Transcript
Starting point is 00:00:00 I remember talking to Tristan years ago and it would have been, in a way, the last thing he'd wanted was to have a graphical interface on top of dbt. What does this mean about, I suppose, the soul of dbt? So I think the soul of dbt is let more people work like engineers, right? Let more people who are doing analytics work like engineers. And so it really is everything that people love about dbt, just made a bit more accessible. So hello and welcome to the Drill to Detail podcast sponsored by Rittman Analytics. And I'm your host, Mark Rittman. So I'm very pleased to be joined today in this episode by someone I last saw wearing a chef's hat in Las Vegas, none other than Greg McKeon from dbt Labs.
Starting point is 00:00:52 So welcome to the show, Greg, and tell us what you do when you're not wearing a chef's hat in Las Vegas. Hey, Mark, it's great to be here. I'm a staff product manager over at DBT Labs, and I focus on all of our developer experience in cloud. So what that means for us is the cloud IDE, the cloud CLI, and then now the new visual editor that you saw me on stage
Starting point is 00:01:12 in the Let Them Cook show at Coalesce in Vegas. Fantastic. So, Greg, let's get a bit of a picture about you first of all. So tell us a bit about, I suppose, your journey into what you're doing now. So I think you've worked in the past at MongoDB, for example. You know, how did you end up at dbt labs and what were you doing before that really? Yeah, for sure. So I sort of found my way to product in an interesting roundabout way, right? I studied engineering in undergrad and then I actually went and worked at Red Hat
Starting point is 00:01:38 for a little bit as a consultant working on OpenShift. And I'd say that was the first time I kind of interacted with enterprise software. And it sort of blew my mind, right? When you're actually working on a small team of two or three people and deploying software out in school, you don't ever think about like, how do you handle things at scale, kind of DevOps and all these kinds of best practices that are just,
Starting point is 00:01:57 you sort of have to have to be a functioning enterprise. And so from Red Hat, I actually made the move over to MongoDB, which was a piece of software that I had used to help me do my homework in my databases course in about two days instead of in three months. That was really where I first saw the power of these alternate methods and ways of thinking about developer experience
Starting point is 00:02:16 that can really benefit developers. That continued at Cloudflare, where I worked on storage products. We launched a competitor, S3, called R2, which cut egress fees to zero. Really fun product to work on. at Cloudflare where I worked on storage products, we launched a competitor S3 called R2, which cut egress fees to zero, really fun product to work on. And then found my way over to DBT and now work on developer experience here,
Starting point is 00:02:34 specifically for analytics engineers, but sort of the same kind of mindset, how do you really help enterprises and help the developers and the workers in enterprises work better, right? And kind of have a better experience with the tooling that they have to use to get their jobs done every day. Okay. So before we get into the detail of what you do,
Starting point is 00:02:54 particularly at dbtLabs, for anybody that lives in a cave and hasn't been part of the modern data stack experience the last 10 years or so, just give us a little bit of a kind of briefing on what dbt is and what DBT Cloud is in particular. Yeah. So DBT is basically a way to structure your data warehouse. And so DBT creates a number of models, which are SQL statements that can be version controlled using Git, and then combined together to go ahead and build out that full warehouse structure. So it provides a composable way and kind of repeatable way to do engineering work inside
Starting point is 00:03:29 of your data warehouse. And that comes along with everything you'd expect from kind of a composable, repeatable toolset. So macros and functions that you can call to promote code reusability and the ability to set specific control functions basically over the code you're writing in your warehouse. Okay. So dbt cloud, how does that fit into it?
Starting point is 00:03:49 Yeah, definitely. So dbt core is our open source used by thousands of companies to go ahead and structure out their warehouse and then cloud layers on sort of this control plane for dbt. So what that means for my products, we have a cloud IDE that really helps you write dbt code in a way that's sort of a better developer experience than doing so locally. Same thing with the cloud CLI, we have an orchestrator. So you want to run these dbt models on a cadence, on a schedule, make sure the data in your warehouse is always fresh and up to date.
Starting point is 00:04:17 You have an orchestrator that's sort of purpose built for dbt that does that. And then we have tools like explorer, which is sort of a lightweight data catalog to help you visualize what's actually happening in the underlying warehouse. And I would say a lot of features that we're building out now, including with the visual editor and some other things that are coming soon, are really about how DBT is at the center
Starting point is 00:04:36 of the modern data stack and is driving sort of the transformation piece and the expansion out into other parts of the modern data stack over time, if that makes sense. Yeah, sure. So we had Tristan on the show many, many years ago, and he was, I'm proud to say that his company was as small as mine then, and mine has stayed as small as that, but yours has grown quite a bit in that time. And so paint a picture now really, I suppose, what does dbt, the labs, the company look like now in terms of size and how many product managers are there and how has it sort of, I suppose, grown up a bit since maybe sort of a few years ago, really?
Starting point is 00:05:12 Yeah, definitely. I mean, this was one of the reasons why I joined dbt, right? When I was looking around for what I wanted to do next, I've asked a common question of all of my friends, which is like, what's cool? What are you using now? What do you love? And at the time, those back in only a few years ago in 2023, and they were like, dbt is like, what's cool? What are you using now? What do you love? And at the time, that was back in only a few years ago in 2023, and they were like, DBT is amazing, right?
Starting point is 00:05:29 It's this tool that gives, I feel like I have superpowers when I'm using it. And I think that that kind of core piece of the company, Empowering Analytics Engineers, Empowering Analysts is so there today. We're a few hundred people larger than when I started. So I think we're around 600 or so people now. My team's probably 10 to 15 product managers.
Starting point is 00:05:49 And there's one thing I want to call it. We have this really special role at dbt, which is the developer experience advocate. It's someone who actually has used dbt and uses it every day and kind of was an analytics engineer who sits alongside the product manager and kind of helps guide the development of the product. And I think you see that in all the products that we build. If you notice, I mentioned like we have an IDE, but that IDE is tailored specifically to DBT.
Starting point is 00:06:10 All of those pieces of cloud really bringing that kind of core love for the analytics engineer through. So I think that that's sort of the lifeblood of the company and what makes it really exciting to be there day to day. And just hear how you're impacting kind of core users, everyday life, right? People who are analytics engineers or analysts, how they've interacted with the product. Okay.
Starting point is 00:06:27 Okay. So let's get into this, get into really the reason you're here. So, so, um, I was in a presentation that you gave at, uh, KLS last year and you, um, you kind of unveiled or you sort of showed this new thing called the visual editing experience. I probably maybe got the name wrong there, but, but, you know, a, a drag and drop editor I probably maybe got the name wrong there, but a drag and drop editor for DBT users, which is in many, many ways is quite a significant thing really. Just maybe just start off by explaining what this product is or the features that you're looking after.
Starting point is 00:06:57 I suppose what is the high level problem is trying to solve for who? The visual auditor is basically a way to build DBT models via drag and drop interface, like you said, right? And it's funny, we started thinking about this product a little over a year ago now, Drew Bannon, who I think is also a bit odd, and I kind of sat down and talked to some customers and they really were pulling this out of us, right? They came to us and said, I have these downstream teams, I know they should be working in DBT, but they're not, right? And we had this question of like, well, why aren't they working in DBT?
Starting point is 00:07:29 What's the challenge? And the initial hypothesis was, well, they don't know SQL, right? Maybe they're a less technical team. And over our conversations, we found out, that's not quite right. These teams are extremely technical and extremely capable, right?
Starting point is 00:07:42 They're writing some of the most complex expressions in Excel or in Google Sheets I've ever seen in my life. Or they're using Alteryx and they're chaining together these very complex transformations. What we really found was that building analytics for them was a small percentage of their total workflow. In a lot of cases, you have some workflow that someone's built out 10 years ago, they leave the company, and then there's no way to modernize that workflow that's just sort of been sitting around. And so what we realized is really we needed to lower the barrier to entry to DBT and to building DBT models.
Starting point is 00:08:13 And what we saw was that the state of the technology had kind of gotten to a place where that was possible, right? We might not get directly into this, but if you look at kind of the SDF acquisition and how we think about that, there's really this idea that you need a compiler for SQL. You need to be able to have semantic understanding of SQL. And the visual editor is built directly around that, right? We were able to say, we're going to build dbt models for you with a drag and drop interface, but we're still writing SQL under the hood. And this means that every model you already wrote in your dbt project is actually a visual model as well. You can go and visualize that right away right and
Starting point is 00:08:47 we also were able to maintain kind of the access to version control and the Get first approach that dbt has so there's no sort of change to the workflow As we kind of went and built this out And so I think the idea here is that if you're if you're someone who's seen someone else built dbt models And you've always wanted to try that but the barrier to entry has been a little bit high, this is a great product for you, right? And I think that that use case is gonna expand out more and more over time as we start,
Starting point is 00:09:12 we'll talk about roadmap a little bit and where things can go. But that's really where we got to as we talked to customers and thought about what they really needed. Okay, okay, so we'll come back to the visual editor in a moment, but you mentioned this, the SDF acquisition. So just tell us about that really. What is that about?
Starting point is 00:09:27 How does that relate to what you're talking about now? And what does that bring to DBT in general? So we were lucky enough to have some of the members of the SDF team come and talk to us at an offsite we were recently at. And what they've built is really amazing. They've built essentially a compiler for SQL that understands the actual semantics of the SQL code you're writing. And so why is that impressive and why is that cool? Well, back in the day, DBT kind of provided structure to your analytics
Starting point is 00:09:53 project and let you rewrite and reuse SQL in a very interesting way. Part of the way it did that was with Jinja. Jinja is basically a string templating language, right? So you could go ahead and write a function that will go ahead and output some string, but you don't actually know what that is producing. So it's very difficult to do things like type checking. A lot of the features that you would expect in a modern IDE experience, or that will help you sort of write code yourself,
Starting point is 00:10:20 are very difficult to build atop the original dbt experience. And this was still state of the art at the time, right? But what the sdf team has done is kind of built that semantic understanding and built a real compiler for sql that actually Understands what you're trying to do, right? And so that's really gives you features in two areas, right? One is it gives you developer experience features, right? So you can do things like go to the definition of a function. SDF, for example, understands all of the warehouse functions that are actually being called. So if you call a split part, it knows
Starting point is 00:10:49 how to actually structure the arguments to that function. It can help you write that SQL. The other thing is it can help you optimize that SQL, or even transpile it. So because SDF is actually building up an understanding of the code you're trying to write, it has the opportunity to say, hey, maybe you actually want to reorder these joins.
Starting point is 00:11:04 Or hey, maybe you want to do something different here. Or, hey, maybe you can't use a hash join algorithm if you actually write the SQL this way, write it this way instead. Right? And so I think as kind of the usage of SQL has exploded, and it does pain me as a former MongoDB employee to say that, but with SQL as the absolute standard, you really do need a compiler that understands the language.
Starting point is 00:11:23 And that's just something that hasn't existed before. And I think that's really the excitement here around the acquisition. Okay, so I realize it might not be your area of specialty, but how does that compare to SQL? SQL Glots or SQL Glots? Is that a similar sort of thing, really? Yeah, I think that's right. SQL Glot has a little bit less of the the semantic understanding in it. And I think that that's sort of the main difference there. There's a lot of technical architecture that I will absolutely butcher for you semantic understanding in it. And I think that that's sort of the main difference there. There's a lot of technical architecture
Starting point is 00:11:47 that I will absolutely butcher for you right now. But the way that this actually goes through and builds up the understanding of the SQL that you're writing, I know that SDF is world-class at that and sort of built on some great Apache projects that have come out recently to help you build essentially a local version of a parser and a compiler. Okay, okay.
Starting point is 00:12:10 So now I appreciate this is a speech only podcast really, but can you talk maybe through mime or through kind of expression, can you try and paint a picture really of what this visual agents or experience looks like? So how would people encounter it really when they're using dbt cloud? And what would the steps be or the things they see on the screen be like on this? Just maybe try and paint a picture of that.
Starting point is 00:12:34 It's a drag and drop interface. And maybe I'll talk a little bit about how we want you to feel when you're using this tool. So what we have is a way to reference an upstream model. So DBT use these statements called ref statements to refer to a prior built model. You can go ahead and ref to an existing model and then you can add what we call transformation operators
Starting point is 00:12:56 onto the canvas. So maybe that could be a join, maybe that could be a filter, maybe it's adding a new column, sorting something, right? Kind of all the operations you could imagine doing to transform and change this data. And then lastly, you add some sort of output to the canvas. And so today that is a dbt model that gets created. And then there's a pretty strict governance flow around this.
Starting point is 00:13:16 So one thing we heard from a lot of customers who use tooling like this, they really want to see how the data is changing at each step in the transformation. And there's a whole bunch of really interesting work that came out of Google and a few other places, languages like prequel that are basically pipeline SQL. What this means is rather than having this complex SQL statement that you have to parse out yourself in your brain, you have this left to right flow that then compiles down to the actual SQL equivalent basically. I think that is a
Starting point is 00:13:46 big part of why the visual representation is so explainable is that you can look at it step by step and say, okay first I went and grabbed this data, then I went ahead and joined it with this data, etc. etc. And so you can go ahead and preview at any one of those steps. So if you want to see how the data looks after the join, you can do that. If you want to see how the data looks after say a filter, you can do that as well. There's a lot of advantages to this. I'm sure we'll talk about kind of AI and what we've put in there as well. But that is sort of the core workflow is you're placing these transformation operators
Starting point is 00:14:15 onto the canvas, you're wiring them together, and then you're going ahead and previewing it each step to see how the data came out. Okay, okay. So so in each each, I suppose canvas you have in kind of the visual editor is one model in dbt. Is that correct? It's not maybe a sequence of them. It's one model, in other words, one SQL statement or yeah. Is that how it works?
Starting point is 00:14:34 Yeah, that's right. And so you're able to chain these together. You will be able to at GA to chain multiple models together just like you would in kind of a commit or if you're working in dbt itself. Okay, okay. So within so what, so within these, these visual kind of models you build out, you, you mentioned a few things that are interesting. So, and, and so one thing you showed in the demo, um, when I was at KLS was you could effectively visualize any existing, well, a lot of DBT
Starting point is 00:14:59 existing models within, within there. So you go in there and you could actually, you know, you'd have to build it all from within the editor, you could go and visualize ones that were already there and you could then edit those. Well, I suppose what's the compatibility between existing dbt content and the stuff you build actually in the visual editor and what's the goal with that really? Yeah, I mean, this really is where kind of the state
Starting point is 00:15:19 of the art in the field has advanced, right? It's something that SDF gives us is the ability to go ahead and compile the SQL that's inside a model down and visualize it. There's a big surface area there. I've worked on products before that need to maintain SQL compatibility. SQL is a big surface area. What's nice about a lot of these queries is there actually is a relatively common set of functions that are being called and transformations that you're actually doing.
Starting point is 00:15:43 This is one of the things that we're working with our beta partners on now is sort of what do you actually need? What do you expect to see in this sort of an interface? What sort of transformations are most common for you and the SQL code you write? And then expanding that over time. So I'd say today we hit probably around 70, 80% of SQL can be represented in the visual editor. And then there's a whole nother conversation which is sort of the Jinja side of things
Starting point is 00:16:05 and how you represent existing models, Jinja that we could talk about for hours as well. What about, I suppose, there's the transformation side of things. Then you've got, I suppose, the data modeling and the building of kind of tables and you've got the Jinja side as well. So maybe with the data modeling side,
Starting point is 00:16:19 where is it now and what's your goal really with the data modeling side of things? Yeah, so I think there's a lot of automation that we can add in here that we just haven't had before. And that the visual editor provides us this graphical interface that we can do that with. Right. So if you're sitting inside of our IDE, it's an IDE, it's a hosted IDE, right?
Starting point is 00:16:34 Or if you're using the cloud CLI, you're developing at the command line. It's really hard to have kind of a nice, uh, build your warehouse from scratch flow in either of those tools. Um, and that's something I think we'll add here, right? The ability to go and auto-identify sources, to be able to go out and auto-generate staging tables from those sources and have kind of a zero to warehouse in a few seconds piece with a little bit of interesting AI support there as well.
Starting point is 00:16:58 So I think on the modeling side, that's really where I see kind of the upstream piece. For building actual models itself, it comes back to that SQL robustness piece and kind of what you can actually represent today on the canvas. And then of course you can go ahead and build that out into, into staging environment or into production or your local development schema as well. Okay. Okay. So, so I had a brief, brief kind of couple of years in product management
Starting point is 00:17:19 myself in the past, and one of the things I was always kind of told was, it's all about saying no, right, it's, it's what the product that is kind of, that is important. So what is the visual editor not going to be? And, you know, where do you draw the line between tools like this and maybe Alteryx? And we'll come on to the maybe history of these tools later, but where do you draw the line in it? And what's it focused in and what is it not going to be? Well, the first rule of product management is you say no internally and you say yes on the podcast to everything. But I think one of the big differentiators here, it's sort of what we're choosing to do when, right? I wouldn't say there are things we're saying a hard no to, but we're sort of thinking about how this develops over time.
Starting point is 00:17:58 The fact is there's already tons of Alterex workflows that are moving to DBT today, right? Because people want the governance, they want to be able to work in SQL, they want it to run on their actual warehouse. And so we see migrations from AlterX all the time. What I will say we're doing here is giving a place for users who felt like they didn't have a home in DBT, honestly, or felt like they needed to go learn a whole new skill set for a workflow they looked at once a month in order to be able to interact with DBT.
Starting point is 00:18:24 They now have a home and they now have a place where they can sort of come and onboard to dbt and start exploring. So I think one of the big features of Alteryx as well that I'm sure we'll touch on a little bit is sort of data loading and thinking about how to grab, you know, seven or eight disparate CSVs or source data files. We're thinking about all those things, right? And really the case for us is there's already migrations happening from Alteryx. How do we make that easier?
Starting point is 00:18:44 How do you make it easier? How do we make it easier for people to have kind of a governed workflow? And what do we need to add to make that to kind of bring the power of dbt to even more of those users, if that makes sense? I've seen this all before, right? So so I've been working in this kind of industry for 20 odd years to hell and and things you're talking about now things that companies were talking
Starting point is 00:19:04 about 20 years ago, with Informatica and things you're talking about now, things that companies were talking about 20 years ago with Informatica and tools like that really. What does this bring to the industry and the state of the art? Yeah. I mean, I think the first thing you have to realize is that there's been a technical leap forward in the industry. There is this concept of pipeline SQL now that I think we rely pretty heavily on that just wasn't around before, honestly, and is designed just to be more explainable, right? It's not necessarily more performant, but it really gets back to that developer experience point of if something's easier to use, that's actually really valuable to large companies, right? And then I think I would look at SDF and kind of the state of the art in SQL compilation. I mean, SDF is partly built on this project called Data Fusion, an Apache project.
Starting point is 00:19:47 It's only a few years old, right? These are kind of cutting edge technologies. I remember as I get my go seven, eight years ago, right? And there was a question of like, is SQL the language of the future, right? That was sort of what we were talking about as an industry then. And I think that's clearly a yes, right?
Starting point is 00:20:02 I think clearly we standardized more and more on top SQL, but the technical pieces that have come around in the past few years just give you so much more power. I mean, you could go even all the way back and think about how Alteryx runs on its own compute, right? And if you were building out a system now, you go, that's crazy, right? You have these powerful data warehouses
Starting point is 00:20:18 that can handle anything you throw at them. Why would you run on your own compute? And so I think that the state of the art technically in the industry has just shifted. I think what those tools are trying to do is really well-intentioned. I think it really was a better developer experience, but just you were constrained by the technical capabilities of the time. And then I think the second piece that we really think about and the team thinks about all the time is how to create that sense of delight with the product.
Starting point is 00:20:44 That is the thing that is so special to me about dbt core. Someone touches it, it doesn't matter how technical or not technical they are. And they go, this thing makes me feel smart. This thing makes me feel like I'm really accomplishing something and building something that the rest of my organization can build atop. And I think if you touch the visual editor,
Starting point is 00:21:02 that love and that feeling of you're going to have fun working with this is something that we really tried to push through. And I'm not sure if that exists in a lot of kind of enterprise software. So how do we kind of stay out of your way, help you kind of have this great experience and feel a real love for the tool? And then also kind of the technical pieces that have come in that have just made that possible. Yeah. People didn't tend to feel love for Informatica back in the day. It was a marriage of convenience, I think, rather than a sort of an affair of the heart, really. But what about another genre of products that is in this space, which is the
Starting point is 00:21:36 data preparation products? So you've got things like, and all tricks now, I think Bolt, TriFactor, but you have every BI tool has a kind of data prep element to it. So, again, where's the different, where do you draw the line between what they're doing and what you're doing? And is it a different problem for different people or what? Tell me.
Starting point is 00:21:55 Yeah, I mean, I think it's a problem we edge our way towards over time, right? The way that we think about it is, we have all these users who are doing DBT model development today, let's help them out, right? And what we found from talking to customers is that they've asked us for some kind of data prep integration as well.
Starting point is 00:22:09 They go, well, I actually need help with the data one step upstream of DBT. I think the real thing there is sort of loading and understanding how we can bring kind of DBT's approach to ingesting data, right? Into taking, you know, maybe it's a one-off CSV, maybe it's a Google sheet that your team just really needs in the warehouse. How can we handle that?
Starting point is 00:22:28 And I think that that's sort of tied in with the data prep piece, because to your point, the actual transformation is sort of trivial, right? Like we have the operators for that. We can do that. It's how do we get the data that you want to transform and kind of bring that into dbt. And so I think that's something you'll see us have some thoughts about in the next few months i remember i remember talking to tristan you know years ago and and it would have been in a way the last thing he'd wanted was to have a graphical interface on top of dbt um and and and and yet and yet you know we have this now so what does it
Starting point is 00:22:59 to your mind what does this mean about i suppose the soul of dbt so i think the soul of dbt is let more people work like engineers, right? Let more people who are doing analytics work, work like engineers. And kind of the delight and joy that comes with doing that. And I think that this just expands that aperture out. And really it is those technical innovations and kind of the SDF piece that lets us do that. It is how do you still use Git to version control your work?
Starting point is 00:23:26 How do you still write SQL so that these models are transferable? Can be used anywhere, right? You can go run them with core. You can go run them with dbt cloud. Any analytics engineer on your team can look at this code and tweak it or make performance optimizations to it, right? You're not working in some other software with this kind of back and forth. And so it really is everything that people love about dbt just made a bit more accessible. Okay. And do you think the people that use the
Starting point is 00:23:50 visual editor will still be analysis engineers? Or is it a different kind of I suppose role or persona you're thinking of ready for this? Yeah. So I can tell you again, back at your go your first building site, I was talking to Grace and Jeremy, who certainly inspired the chef's hat that you saw at Coalesce with their great costumes, who are kind of hardcore DBT users. And at first they were like, I don't see how I would use this, right? I don't see how this would be helpful for me. But that piece from being able to parse an existing model and go ahead and actually take a look at what the SQL is doing, it's actually a pretty great way to show your stakeholders or get up to speed yourself on what a model is doing because it goes back to that kind of pipeline SQL view that is really accessible. We'll have dedicated workflows here as well that I think will make
Starting point is 00:24:31 their way into the IDE over time, but maybe you want to add a test, right? And maybe you actually just want a guided step-by-step workflow or you want some help from an AI to actually go and build out a great unit test for this model. Those kind of experiences I think are just as applicable to or a unit test for this model. Those kind of experiences I think are just as applicable to a hardcore analytics engineer who's used to working with dbt at the command line as it is to an analyst at the other end. And so I think it will really be up to people.
Starting point is 00:24:53 Do I think that analytics engineers will be in the visual auditor 24 seven? No, right? I think it'll be a less used surface for them. But just kind of like all dbt cloud, there's different points where you might wanna engage. Imagine if there is a dbt 101 course in the future, where new people new to dbt are encountering that the product first of all, do you think they'll use the visual editor first?
Starting point is 00:25:14 I think it's an interesting question, right? I think it will depend a lot on their familiarity with dbt. It'll depend a lot on their familiarity with data modeling, honestly. But the more that we can do to make your first touch with dbt, that same delightful moment that you have at the command line, I think that that's great stuff, right? And so I think it's reasonable.
Starting point is 00:25:34 Yeah, I mean, it definitely isn't a less skilled job. I mean, it's interesting in back in my old world of kind of enterprise data warehouse and projects and Informatica and those tools,. You certainly got very, very complex and intricate mapping built in Informatica, for example. They'd be very complex. You'd go on a training course for a week, two weeks to learn these. You'd just build it with a visual editor with a terrible source control system in the background. No. It's funny, it wasn't any less technical. But it was a lower status job. And the interesting thing is, is that certainly, I think the great thing that
Starting point is 00:26:12 analytics engineering did was kind of elevated the role of the kind of the person doing the transformations from being the person doing the kind of the routine, but less glamorous work to being this kind of, you know, sort of very kind being this job everyone wants to have, this is the job of an engineer. It's interesting. It'll be interesting to see how it affects the audience for this. But certainly for addressing the addressable market for dbt, you can see why it's going to massively increase it. Yeah, I think that's exactly right. It's funny. I saw a very similar thing at MongoDB. You used to have these dedicated database engineers. In a lot of cases, they were viewed as blockers.
Starting point is 00:26:47 It was like, well, I can't make this schema change I want to make. This is blocking my app from rolling out. The frustration you feel. Businesses recognize that. You have a very different brand internally if you're the person who's always saying no rather than the person who's saying yes. I think that tool bringing the ability
Starting point is 00:27:04 to just quickly stand up an app to anyone, it sort of shifted the whole narrative, right? It changed things from a negative to a positive, it changed things from we can't build this to how can we go build this right now, right? And I think that's where the visual editor takes you, right? If you're downstream team, if you have to ask the central data team every single time you need a new model built, and you don't even know if that model is going to be useful, the two of you are going to get
Starting point is 00:27:23 frustrated with each other, right? This really is a process problem. It's going to be like, I asked for this and you built this and that's not quite what I needed and now I didn't use it. And now you go, well, you didn't use the thing I built you last time. Why would I build you anything in the future? There's real human problems here. And I think what's really exciting is now the downstream team could go and try something out themselves and get to explore that and make sure the thing they're asking for is actually going to be really useful for them. And that's why we're so passionate about it and kind of what makes this really exciting. Tell us how the version of control part of the visual editor
Starting point is 00:27:50 experience works. And I suppose multiple people working on the same project really, how does that work now? And how is it going to work in the future? Yeah, so I think you have to start kind of from first how we saw people working today, right, which is you can sort can do anything you want. It's the Wild West. And it was really actually important for us to maintain that first touch experience just in a way that wouldn't affect your teammates
Starting point is 00:28:15 and wouldn't affect the underlying warehouse. And so I mentioned this preview experience. You can go build and it will automatically reference your staging models that you have in DBT. Go out and build a whole workflow. You can see what the data looks like. You can mess around with it as much as you want. And as long as that's happening with just previews, it's not actually affecting the warehouse at all, right?
Starting point is 00:28:32 You're not building anything. You're just kind of exploring. When you decide, hey, this might be useful for me, you have the option to actually go ahead and run that model, and it'll build into your local development schema. You can wire that schema up to any downstream tool you're using, right? So maybe you want to go build a dashboard in Hex or in Power BI, you want to go take a look at how the data is actually working. You can just integrate that schema into there.
Starting point is 00:28:52 And again, using the power of deferral in dbt cloud, we won't build anything that you don't need to build. We'll refer to staging back for all of that. And then that's the only time where you actually hit the Git flow. So you go ahead and actually, this is useful for me. I want this model to make it back into production. I go ahead and I create a pull request that then goes back against the main project. And maybe that's reviewed by the central data team.
Starting point is 00:29:14 If I'm working in the main project with mesh, I could be working in my own kind of cordoned off project that just references public models. And so then within that mesh project, I can go ahead and approve my own PR, don't do that. But I could go ahead and kind of get something back into production quickly, right? And kind of follow the rules of the repo that I'm actually in. And so we think this gives us a good balance, right?
Starting point is 00:29:34 Between kind of being able to be hands-on and ad hoc and move quickly, with being able to say, okay, this actually matters for the business. And maybe I want an analytics engineer to actually get eyes on this and make sure it's performant and make sure it's written the way that we sort of want our code to be written here. And yeah, that's sort of the Git flow there. I've been using dbt cloud a bit more
Starting point is 00:29:53 myself recently. I hadn't touched it for a while. I'd been losing other tools, but I went back to dbt cloud and I was actually surprised at how much new functionality there was compared to what I used to know as being the product a few years ago. So a couple of years ago, all it really was was maybe the scheduler and quite a basic IDE for doing coding. But again, just paint a picture of what dbt cloud looks like now to someone who maybe hasn't used it for a couple of years. It's quite a bit different really, is it? Yeah. Yeah. I think there's a lot of functionality that we've been able to ship. And again, all of this has kind of been, you go talk out to all of the customers.
Starting point is 00:30:27 DBT is used everywhere. This kind of blew my mind when I first joined. It's like the smallest nonprofits find space in their budget for DBT all the way up to the largest kind of, you know, Fortune 500, Fortune 100 companies. It really is applicable across the entire stack. And so the question is like, what problems do those companies have when they work with data? It's actually a really common set, right? And so I would say now in dbt cloud you have Explorer, which is a great lightweight data catalog, right? Kind of pulls in all the information about
Starting point is 00:30:55 not just your dbt models, but also how they're being used downstream can make recommendations to you on how you might want to tweak or modify those models. Really just kind of a catalog for dbt specifically, and kind of expanding out from there. The orchestration capabilities, I think have gotten better and better. We have features like advanced CI now, where we can actually go ahead and show you the difference between a model that you initially built
Starting point is 00:31:19 and the new model when you go ahead and make a change in CI related to sort of data observability and making sure that when you introduce a and make a change in CI related to data observability and making sure that when you introduce a change, you don't break something downstream. So I think that's been great. The IDE itself, I have to call it, the IDE has gotten a lot better, gotten a lot more performant, a lot new features there.
Starting point is 00:31:36 We launched the Cloud CLI, which is a way to work with dbt the same way you would from the command line by getting the power of dbt cloud. So that means deferral, which I mentioned before. Don't build models you don't need to build based on your current change. Also support for dbt mesh, which is the ability to have multiple different projects that sort of define an interface to each other. So you can say these are the public models available to downstream projects from maybe my main analytics project and kind of chain things together and have a real, a real set of governance rules around those projects. That's really exciting. So those are the big ones that come to mind for me
Starting point is 00:32:08 kind of explore a lot of improvement to the developer experience, this advanced CI functionality we've built out and then kind of just incremental improvements to the platform across the board as well, including the visual editor. So again, something else I picked up on at KLS last year was the idea of kind of one dbt, right? So I think one of idea of 1DBT. We're a dbt labs partner and one of the challenges that we find in recommending dbt cloud for customers is there's quite a jump sometimes in cost and the perception of the difference between dbt core and dbt cloud and things that you sell really. So again, if it's not your area, please say, but this concept of one dbt, what's it about and how is it trying to address that gulf really? Yeah, I mean, I think for us it's that dbt should feel like dbt across all of the
Starting point is 00:32:59 surfaces, right? So whether you're using dbt core locally at the command line, whether you're using the cloud CLI, whether you're using the IDE, whether you're in VSbt core locally at the command line, whether you're using the cloud CLI, whether you're using the IDE, whether you're in VS code, and all of these experiences really are dbt experiences and dbt services. You mentioned some legacy vendors and providers before. I think a big difference between dbt and those providers, you can't take your code out.
Starting point is 00:33:20 You can't actually just go run that on your own. That's such a core part to the business is kind of the community and the breadth of access, right? Um, and kind of leaning into that, I think, thinking about how do we, uh, have features that maybe land in core and support core users, right? How do we have kind of this one surface area that spans across, um, all the different ways you can interact with dbt and really thinking about them
Starting point is 00:33:44 together, right? Because there, there is always this tension at an open source company, right? I saw this at Mongo for sure. You have the open source product and then you have the paid product and so what goes where, right? Where do things land, right? And I think there's a lot that's going to be coming in the next few months around SDF, et cetera, that really emphasizes that one dbt message. And really emphasizes that no matter where you're working, you should have a delightful experience with dbt. Okay, okay. And what's what's the data control plane in dbt, dbt world?
Starting point is 00:34:14 Yeah, so I would analogize it over to Cloudflare, where I also worked right Cloudflare is sort of the the front door to the internet, right, you put your website behind Cloudflare, and your client and your website's protected, it's cached, you have access to this globally distributed compute network that really defends your website, right? And you can think of transformation as a similar position for your data warehouse, right? It's how you actually structure your data warehouse. Sort of the first thing that happens after the data is loaded is it's transformed into an actual
Starting point is 00:34:41 usable format. And so what that means is whether you're going ahead and you're an analyst who's going ahead and querying this data, or if you're someone on the data engineering side who needs observability and kind of monitoring on this data, they all sort of base what they're doing off of the models themselves, off of that transformation layer. And so I think data control plane
Starting point is 00:34:59 is really about this idea that DBTs should do a little bit of that in each place, right? We have great partners, there's great tools that go really deep on each of these options, but it should just work out of the box, right? You should have some sort of observability tooling out of the box. You should have some sort of catalog out of the box. And I think that's where you're seeing us sort of move towards is really thinking about how do we provide a great experience that may not be entirely dbt model centric, right, it may be more warehouse
Starting point is 00:35:26 centric, that can think about things like cost optimization, kind of the total cost of your warehouse. And I think what the great thing about dbt is you get to start from the transformation layer, which is that front door to the rest of the warehouse. Okay, do you see do you see the semantic layer being part of part of the coverage of the visual editor in the future? I presume the answer is yes, but do you think there are some opportunities there or is it
Starting point is 00:35:50 part of your thinking? Yeah, absolutely. I mean, we talk about this all the time. There's a way to generate semantic models in the IDE now with AI. I think over time, you'll see us add more and more support. As an example today, we don't support materialization modes for your output models in the visual editor just because we haven't built that yet. You talked a little before about saying no and what comes now versus later. I think semantic models are absolutely on that list of things that we would like to have down the road. Because
Starting point is 00:36:17 if you think about an analyst coming into the dbt cloud environment, they would love to build a model, sure, but they would also love to build a semantic model that they can then go use in any of their downstream maps. I think absolutely coming is just a question of priority and one. Okay. To wrap things up really, getting into the I suppose details of access and so on, right? Is this feature GA yet? I don't think it is, but is it GA yet? How is it being packaged in terms of licensing and so on? We're not generally available yet. We're still in a private beta. We're working with close customers. I mentioned those, what transformations do people need? Questions, how do people want us to handle seeds and upstream sources? All these questions we're working through.
Starting point is 00:37:00 We're getting some pretty good answers there. I'd say we're going to have a lot to say in the next month or two around GA and kind of making access more broad. And then same thing on kind of the licensing question. It's honestly licensing for me every project I've launched ever. It's always been a race to the finish, right? To figure out exactly how you're going to price and package things. And so we're still working on this. So I would really encourage anyone who's interested in trying things out, reach out to me. Firstname.less, they have a dbt labs. We're happy to have more people in and kind of testing out things. And we really appreciate that. And that's kind of how I would talk about access going forward. Yeah, sure. And so there's a page isn't there not on the actual dbt labs website, we've got some basic details about the feature in there. That's right. Yeah. So between now and presumably
Starting point is 00:37:44 sort of coalesce, hopefully we should see this out in sort of production and people can get their hands on it, really. Yeah, I would expect you'll see it significantly before coalesce. Next couple of months or so, I think we'll be making an announcement about that. Fantastic. Anything else coming down the line
Starting point is 00:37:58 with DBT Labs that we could be told about? I mean, obviously you mentioned the SDF thing, but anything else that's interesting coming down the line? Yeah, the SDF thing is so cool, right? I think you can sit there for a long time and just kind of game theory out all the ways that having a compiler for SQL really opens up new opportunities, right?
Starting point is 00:38:15 I think you're gonna hear some exciting stuff there. I think there's ways to think about, yeah, just really having a semantic understanding of SQL, what that can do for the developer experience, what it can do for the developer experience, what it can do for teams that are trying to monitor their warehouse and operate the actual warehouse itself. It covers every angle.
Starting point is 00:38:33 And we talked a little bit about transformation as the front door to the actual warehouse. You can go a lot of places once you're behind the front door. I think that's really cool. So I think you'll be hearing a lot more about that the next few months. Tell us how, give us an example then. On the basis someone wouldn't be understanding that, why does understanding the actual meaning of the sequel and so on, why is that so important?
Starting point is 00:38:55 Yeah, definitely. I'll take an example from the visual editor. Let's say you go ahead and inadvertently build a join. Sort of causes this explosion. You do this inner join and you kind of pick the wrong column to set your equality condition on. If you do that in a normal tool that's just parsing the SQL,
Starting point is 00:39:17 it has no idea what the actual impact of this is. And so every data warehouse actually has a cost-based optimizer in it. It will sit there and go, how much is this actually going to cost me to run? And is this going to be, what's the fastest way to optimize this? What's my query plan? What am I going to do? But that logic is all in the warehouse.
Starting point is 00:39:34 It's after you've already run the query. For lack of an interpret, you're screwed at that point. You're already going to pay the cost for running that query itself. Whereas if you can bring that logic in earlier and say the visual editor understands the query plan, it's possible to validate that and say like, hey, actually this is gonna cause a huge explosion of costs and this query is gonna run for two hours. Is that what you wanted?
Starting point is 00:39:53 Probably not, right? And so I think that that's just one small example where you can put a little flag onto the join node and say, hey, this probably isn't the right condition for you right now. Or maybe it is and you wanna go ahead and do this, but probably not, right? That's sort of the semantic understanding you can have. And you can only get that if you're actually compiling the sequel and going ahead and understanding
Starting point is 00:40:12 the cardinality of these two tables that are being joined together, if that makes sense. Fantastic. It's been brilliant speaking to you, Greg. Thank you very much. And look forward to seeing the feature coming out, probably. And it's been fantastic to have you on the show. Thank you very much. Yeah, thank you, Mark. This was awesome. Thank you.
