Drill to Detail - Drill to Detail Ep.115 ‘Airbnb, DataOps and SQLMesh’s Data Engineering Innovation’ with Special Guest Toby Mao

Episode Date: November 1, 2024

Mark is joined in this latest Drill to Detail episode by Tobias (Toby) Mao, CTO and co-founder at Tobiko Data, to talk about SQLGlot, the innovation culture at Airbnb and the data engineering challenges solved by SQLMesh.

Links:
- SQLGlot
- Introducing Minerva — Airbnb's Metric Platform
- SQLMesh homepage
- Tobiko Cloud
- Running dbt in SQLMesh
- dbt Mesh
- dlt (Data Load Tool)

Transcript
Starting point is 00:00:00 So welcome to the Drill to Detail podcast, sponsored by Rittman Analytics. I'm your host, Mark Rittman, and I'm very excited to be joined today by Toby Mao, CTO and co-founder of Tobiko Data, the people behind SQLMesh. So Toby, great to have you on the show. Thanks for having me, Mark. So Toby, tell us a bit about yourself, really, and how you got into this industry. Yeah, so I got my start in my career as a pharmaceutical consultant. I was a SAS and kind of VBA programmer, working on Excel and stuff like that, and I enjoyed working with the data, but I realized I didn't really like making decks as much,
Starting point is 00:00:48 and so I decided to become a real software engineer. And I got my first software job building an iOS app for magazines. I realized then that although I enjoyed software engineering, I didn't really enjoy building UI, so I moved on to a company called Script, where I started data engineering. I started building recommendation systems for Script. After that, I moved on to Netflix, where I led the experimentation team, building out their experimentation infrastructure.
Starting point is 00:01:15 And then after that, I moved on to Airbnb, where I led analytics. And that included both the experimentation team as well as the metrics team, which built out Minerva, Airbnb's semantics layer. Okay, interesting. So many consequential data companies have come out of people who've worked at Airbnb. So obviously there's Max Buschermann and other things and so on, but why was Airbnb such a great place to work in this area, and why do you think it's spawned so many people and companies? Yeah, it's interesting.
Starting point is 00:01:50 Airbnb has a lot of people contributing to data. It's a lot of people writing and contributing metrics and pipelines. And so the culture there is a little bit sort of the Wild west but in a sense it means that there's a need for these kind of data tools right they they kind of have a democratized approach to people getting data and metrics and so that's why tools like airflow and minerva kind of arised there it's interesting that's interesting so so um obviously, we know you now from Tobika Data and SQL Mesh. So tell us maybe the founding story of Tobika Data and SQL Mesh. And I suppose, what was the problem you were trying to solve at the start with it, really?
Starting point is 00:02:35 Sure. I think it started at Netflix in the sense that at Netflix, I was kind of building tools to allow data scientists to contribute to build metrics for experimentation. And the data scientists like to write SQL, right? But we had a challenge back then. And big companies like Netflix and Airbnb, they generally use Spark and Presto, which are very different dialects. And so if a scientist just contributes a Presto query, then it can't really run in Spark. And so I realized that there was a need for this transpilation process. And so at Airbnb, I created SQL Glot, which is an open source SQL parser and transpiler to gain the ability to be
Starting point is 00:03:18 able to swap between dialects. At Airbnb, I really started to use it in production, and I introduced it into Minerva. And that really kind of boosted Minerva's ability to be able to kind of understand SQL, execute it across Spark and Presto, etc. While I was at Airbnb, I really wanted to kind of build a metrics company because I saw the potential of Minerva. And I got a demo of dbt while I was there. And when I saw it, I had a lot of questions. And I asked the person giving the demo, how do you just backfill one day of data? Or like, how do you make a dev environment? And the answer I got was, well, you don't really do that. You can just refresh your entire warehouse. When I heard that response, I was like, what? You can't refresh your warehouse. At all the companies I've ever worked at, you can't just refresh your warehouse. You need to do things incrementally. You need to be
Starting point is 00:04:19 able to backfill, have one day of data. You need to have state, right? And so at that point, I realized that there was kind of an opportunity here to make a data transformation framework that could scale at any company as well as leverage SQL Glot to really have a first-class experience with SQL as opposed to just kind of sending raw string templates using Jinja to their warehouse. Okay, okay. So I guess, I mean, my background prior to working in this area, I used to work for companies like Oracle and work with technologies like Informatica and so on, that arguably have solved those kind of problems in a different way. But certainly what you're looking to do wasn't something new. Why did you and why did airbnb build something new um uh and say netflix and so on rather than going out and buying this technology from an enterprise
Starting point is 00:05:10 render you're saying why don't we just use like informatica or oracle or something like that yeah yeah yeah i mean what was your thinking at the time yeah well a couple things here those kind of etl tools are um kind of gui based right and of GUI-based, right? And they're also tied to, you know, proprietary technologies like Oracle, and those kind of things don't really scale at web companies. So, like, for example, at Netflix, you know, we would be getting, I don't know, 2 million events per second, right,
Starting point is 00:05:43 storing petabytes of data. And so trying to do that in Oracle would probably not be possible. And so those companies are mostly built around open source technologies like Presto and Spark. And so there's a lot of custom things that you need to do there to make things work like that. And so trying to do that in an ETL GUI builder just isn't really scalable or something that companies with a lot of data engineers want to do. And so instead of trying to like build a GUI tool, our focus was really to build tools for developers, tools for data engineers.
Starting point is 00:06:22 Like we understand that data engineers are good they know how to code how can we build tools to make them more effective that that was kind of our starting point okay okay so just maybe just a sort of high level really just set out what sql mesh does um and um uh yeah just give us a kind of like an overview of the functionality and what it does and what it doesn't do, really. Sure. So businesses make decisions based on data, right? And so there's a lot of data flowing in from your applications and you need to kind of transform or move and apply business logic to that data so that at the end of the day, an executive or somebody can get some metrics and make some business decisions. And so what SQL Mesh does is it gives you a framework for developers to write either SQL code or Python code.
Starting point is 00:07:14 And with that, because SQL Mesh actually understands your SQL, it can do things, a lot of things for you. For example, it can tell you the exact order in which your SQL files can run, right? Because we can parse and understand the SQL, we can infer all the dependencies automatically without you having to explicitly tell us, and then we can run them. Additionally, it keeps track of when and how often these models need to run so that you don't have to manage that manually. And it gives you the ability to kind of test and deploy these SQL queries. And so all this comes together
Starting point is 00:07:53 so that the engineers or the developers can really just focus on writing their business logic and handling things like operations, running it, backfilling it, all those kind of things to the tool. Okay. Okay. And another area I think that, obviously having used SQL Mesh myself, the idea of versioning of environments and so on, that was something that seemed to be a concept that I hadn't seen in tools like dbt. I mean, is that a fundamental part of how SQL Mesh works?
Starting point is 00:08:21 What does that involve? Absolutely. So the virtual data environments is one of the core parts of SQL Mesh, right? What it essentially allows is a couple of things. One, it allows you to actually have a zero cost development environment using production data, right? So traditionally, when you're doing kind of development data, it's very difficult today because either you're going to be using a different warehouse with fake data.
Starting point is 00:08:51 The problem with using fake data is now your analysis is not representative. And for example, if you're doing a machine learning model, you're doing inference off of fake data. So it's really not useful. And if you're using production data, it can be expensive because you don't want to have to copy everything to a testing bed. And so what SQL Mesh does is that it understands and has an abstraction over data. And so we split it between a virtual layer and a physical layer. So the physical layer are where all the actual tables are stored. And so when you
Starting point is 00:09:22 create a development environment, we just create views in a developer namespace and point those views directly to those physical tables. And so in development, you get a full copy, a full isolated copy of all the production data. And then whenever you make changes, SQL Mesh understands that those changes are different from what's in production. And so that it's going to only backfill
Starting point is 00:09:44 and create those necessary additional tables, giving you kind of a seamless and cost-effective experience. And tying it all together, like when you want to go to production, since you've already backfilled and done all the work and development, you can actually promote these to prod directly without recomputing them.
Starting point is 00:10:02 Using a tool like dbt, you kind of merge it into main and it reruns. So you don't really know what's going to happen until it happens. With SQL Mesh you can actually validate test all your changes and then instantly deploy those tables to prod by just switching the
Starting point is 00:10:17 views. So it's the first of its kind like a true data deployment. Okay so forgive my ignorance but the features in say Snowflake for doing zero copy clones and so on. So, and that's quite useful. I mean, how does what you're talking about compare to or differ from the ability
Starting point is 00:10:33 just to create these kinds of clones of production that only then get written to or whatever if they change? Yeah, that's a great question. So one way that you could do a development is to do a zero copy clone. And sometimes SQL Mesh does do that. But the issue with that is that clone is a clone, right? It's going to diverge.
Starting point is 00:10:51 And so if production ever updates or gets new data, you're not going to get that in development, right? And so the issue with a clone is that it goes stale and it diverges. Additionally, once you start developing on that clone, you can't merge it back to production because now it's different, right? You're not going to clone a clone to go back to production. And so using the actual physical tables in development with the right associations is much more powerful and makes sure that the data is correct, unlike a clone, which will go stale. Okay. Okay. So you mentioned earlier on when we were talking about, you said a couple of times
Starting point is 00:11:23 this word transpilation, okay? And I think we all nod at that point and say we know what you mean, really. But what do you mean by transpilation? And why was that a foundational, I suppose, technology and innovation that is useful and led to SQL Mesh? Right. So there are many, many SQL dialects out there. That's because every warehouse has slightly different SQL. And first of all, although there's an ANSI SQL standard,
Starting point is 00:11:52 no one adheres to it. And these warehouse vendors, they're kind of incentivized to make their SQL obscure because then that kind of increases the lock-in of using that particular vendor. So transpilation for us has a couple different benefits. One is that there are many companies who want to switch, who don't want to be vendor locked, right? And so they leverage SQL Mesh in order to ensure that they don't have a very costly and time-consuming migration to go from, for example, Redshift to BigQuery or something else or Snowflake to Databricks. So that's one use case, kind of the migration story. The second is actually unit testing. So let's say you're using BigQuery or you're using Snowflake and you want to unit test your data.
Starting point is 00:12:38 So actually, let me clarify real quickly. Unit testing in data is different from doing a data quality check, right? A data quality check is like checking for nulls, checking for uniqueness. It's something that you do after a run to check upstream data quality. A unit test, on the other hand, is given this set of fixed rows, what is the expected output? So I would say today in the industry, most people don't do unit testing. And a big reason for that is it's difficult because Snowflake and BigQuery cannot run locally, right? They're like, you know, proprietary warehouses that run the cloud. And so something
Starting point is 00:13:16 that we've done is we use transpilation to transpile your Snowflake queries into DuckDB so that now you can easily validate your business logic in CICD without touching the cloud. Yeah, yeah. I mean, actually, one of my colleagues built an open source project that lets you do that using DBT. But it's hard work. You know, the idea of being able to do your testing in DuckDB makes a lot of sense because it's a lot cheaper. But actually, to get it all then to tie together and have the same dialect and it all still work is complicated. So why wouldn't you just use Presto for that? I mean, isn't that, isn't that, is it Presto the one that the, the, the database
Starting point is 00:13:51 layer, I suppose, that will translate between different sort of like platforms. Is that not the same thing? Not really. I think Presto has some federation, but in terms of the dialect, the Presto dialect is very specific. Right, right, right. So it's more like Federation. Yeah, yeah, yeah. Okay, it's interesting. So I suppose the elephant in the room here is DBT, right? So to the average person listening to what you're saying about the features of SQL Mesh,
Starting point is 00:14:19 they would appear to be very similar things to what DBT does. And so conceptually, it's line it's it's a toolkit and so on there um so so and even to the point of dbt has this thing called dbt mesh so fundamentally how is your product built differently to dbt and why would somebody consider what was the benefit of the approach you've taken really so i think dbt DBT is great, and DBT has done a lot for industry, right? It's a huge kind of like step in the right direction to move away from things like stored procedures. I guess like I would view SQL Mesh similarly in that it kind of develops on the shoulder of giants, right? And so SQL Mesh takes a lot of ideas from DBT but tries to improve on them.
Starting point is 00:15:06 So I would say there's a couple key differences between SQL Mesh and DBT and how they were designed. One, SQL Mesh was built first and foremost to have state, right? So I think DBT's design is really built around a stateless nature. And so that's kind of why people default to things like full refresh. And even if you use incremental models, it's kind of an advanced use case in the dbt world
Starting point is 00:15:33 because you have to manage it yourself. There's no backfilling, and you have to alter your logic to get incrementality to work. SQL Mesh, on the other hand, was designed with backfilling and incrementality as first class citizen, because I wanted it to work at all the companies I've worked at, right? And fully refreshing your data warehouse is not an option. And so in order to accommodate that, you have to understand like what has been done and what needs to be done. And in order to do that,
Starting point is 00:16:05 you need to have state. So SQL Mesh basically keeps track of all of these things in order to be able to leverage this. And as a simple example, with SQL Mesh incremental models, you can actually specify a start date and end date. And SQL Mesh understands for a model what those should be, right? DBT recently introduced a similar concept, but because there's no state, it doesn't really work well. And there's like a lot of issues that could happen if you try to use that. It basically just defaults to yesterday. And so if you skip a day, you're going to have data gaps, which is a big problem. But SQL Mesh stores all this information in a state database and then is able to populate it appropriately. So it's not a new idea either, right, with SQL Mesh and state.
Starting point is 00:16:55 Tools like Airflow and Dagster, they all have a concept of state. And so SQL Mesh just melds that together. So that's one of the biggest differences. Okay, okay. Any others in there? Anything else that's like one of the biggest differences. Okay, okay. Any others in there? Anything else that's significant about how you built it? Yeah, absolutely. So SQL Mesh actually understands SQL, right?
Starting point is 00:17:13 We felt like a data transformation platform that operates on SQL should understand it, right? And so with dbt's approach, and this is because, you know, SQL Glot didn't exist at the time dbt was created, but they just use Ginger, right? And so it's basically a bunch of string concatenation and templating. And it doesn't really know anything about the SQL, and it just sends it to the warehouse. And so that's why things like table dependencies are manual, right? In dbt, if you want the DAG to be built, you have to manually specify ref. And so SQL Mesh, because it's built on SQL Glot, doesn't need any of that. It actually understands the SQL and you can get all your dependencies automatically. Additionally,
Starting point is 00:17:56 you even get syntax and checking, right? So for example, in SQL Glot, or sorry, in SQL Mesh, if you miss a parentheses or you name a column wrong, right, you reference the wrong column, SQL Mesh will tell you ahead of time and give you a compile time error or warning telling you that what you've done is incorrect. And so one of the benefits of that is developer time and cost, right? Because with dbt, if you make a mistake like that, you're going to be running 5, 10 models waiting 15 minutes before the warehouse tells you, oh, oops if you make a mistake like that you're going to be running 5 10 models waiting 15 minutes before the warehouse tells you oh oops you missed a parentheses or oh oops you referenced the wrong column with sql mesh because it's a compile time check we can let you know instantly
Starting point is 00:18:34 and you won't have to waste time kind of waiting for the warehouse to tell you that you did something very silly okay that's interesting because it sounds i mean i've worked in in database sort of development before as well and it sounds sounds what you're talking about sounds to be getting i suppose there's a bit of a blur there between features that the database would do as well so the database is able to understand obviously it knows understands its own sql it understands you know if you if you're using bigquery it would and you get a column name wrong it might say to you well do you mean this column so i suppose how do you decide where that goes in the database and where it should be in the tool?
Starting point is 00:19:08 And should SQL Mesh be closer to the database, really? Well, it's all about developer experience, right? And so we're trying to, ultimately, SQL Mesh is a tool to make developers more productive and more effective. And so whatever we can do to achieve that, I mean, we're going to do. And we also want to do it in a way so that you're not vendor locked to one warehouse. And so if you want to do something in BigQuery, but then the next day you want to do it in DuckDB or Snowflake, we want to enable that as well. Okay.
Starting point is 00:19:34 Okay. So a big part of the name, SQL Mesh, you've got mesh in there. All right. So how much is mesh, the data mesh concept part of what you're talking about? And how does your implementation of that compare to say the dbt version yeah so the kind of data mesh and data contracts is a core part to how sql mesh works
Starting point is 00:19:54 so because we understand the sql right you get automatic column level lineage i would say that most of the column level lineage tools that other products use today are built off of kind of our work, right? Because SQL Glot released it. Anyways, so SQL Mesh has a core concept of column level lineage.
Starting point is 00:20:13 And that means right off the bat, whenever you make a change in SQL Mesh, we can tell you exactly what is impacted downstream. We know for every model that uses those columns, right? And we can say, hey, you made this change. These downstream models, they've been affected. What do you want to do about that? And so our approach to kind of contracts and data mesh
Starting point is 00:20:40 is like really in terms of automation because systems that rely on manual defining of schemas and updating them they go out of date they're a hassle and it's hard to use additionally because sql mesh has a shared state you can have multiple repositories all interacting with each other so if you have multiple github repositoriesories with SQL Mesh projects in them, you can link them together and get column-level lineage and backfilling throughout the whole system. So how does that differ from dbt?
Starting point is 00:21:13 Well, the dbt kind of data mesh, first of all, is I think it's not available in open source. I think it's available only in the cloud version. But additionally, it's very manual, right? You have to manually specify all the columns and schemas by hand. So first of all, that's tedious. But second of all, it goes out of date, right? Additionally, I believe it's only to the effect of like metadata, right?
Starting point is 00:21:41 It doesn't really allow you to do things like do backfilling across repositories. It doesn't even allow you to see impact across repositories. And so I think it's kind of more like a facade of an API. And so we view what dbt has done as more like a schema contract, as opposed to like a true kind of, you know, data mesh, right? Because if you have a true data mesh, you should be able to kind of like interact with things and control things throughout the whole system and understand the whole flow of data, right?
Starting point is 00:22:16 And so that's kind of how we view the difference. Okay. So let me talk about the developer experience. So I've been playing around with SQL Mesh myself, and I was particularly taken by the fact that, you know, you just use the plan command, for example, very much like sort of, I've got the name now, the tool you use to lay out infrastructure. Yeah, Terraform.
Starting point is 00:22:36 Terraform, I can't remember that, yeah. So SQL Mesh uses the plan command, and I suppose the whole experience is quite elegant. So maybe just maybe just kind of walk through what a typical developer sort of life cycle will be with your tool and again where you've done the innovations there sure so with SQL Mesh we did take a lot of inspiration from Terraform right be being able to understand your changes before you actually apply them right and we do that because of our background, because at companies like
Starting point is 00:23:05 Airbnb and Netflix, running a model, running SQL can be expensive, and it can take a long time. So with SQL Mesh, you would make a change to your code, you know, you would change your SQL, and then you would do SQL Mesh plan dev or something like that. The first thing that's going to do, it's going to say, hey, you made these changes, and all of these downstream models are effective. Do you want to apply those? So if you say, okay, what it's going to do is it's going to figure out only the changes that you need to run. So all the other changes that don't need to run, they don't need to do anything, right? Because SQL Mesh already can use the virtual environments to reuse all the data. Then it's going to do the minimal amount of work in
Starting point is 00:23:46 your dev environment to basically make what you've declared in your SQL true in your development environment. So now you've got a full development environment with exactly your changes. You can validate it, you can do data diffing, you can do all your checks. and then when you're good with that, you can either go directly to prod or you can use the CICD bot to basically make a PR and go through that process. When that gets merged into main, SQL Mesh will automatically swap the data that you've done in your staging environment to prod instantly. So it's like a true blue-green deployment, right? And so we really wanted to kind of make the developer lifecycle of data with SQL Mesh what you would experience with modern software engineering tools.
Starting point is 00:24:36 Okay. And so interestingly, at the KLS conference, dbt Labs had launched their graphical interface for DBT. So the kind of the visual experience there. I know that obviously SQL Mesh has got a graphical IDE. Where do you draw the line there about how far you'd go with that? And would you see SQL Mesh at some point having a point and click environment or is that maybe a line too far really?
Starting point is 00:25:01 Well, who knows where we'll be in the next couple of years. But I would say that for sure, for the immediate future, our focus and our goals is really to empower the data engineer and the analytics engineer. So people who write SQL and Python, so people who write code. I think that kind of the UI point and territory is is a little bit of a different demographic and given that we're a small team it's not something that we want to really spread out to at the moment yeah okay so that's interesting point because you you also it's interesting to understand how far you go with the tool and what how many problems you try and solve and so so where would the boundary be really in the area you want
Starting point is 00:25:46 to play in? Would you see yourself as covering the functionality of Airflow or tools like DLT? How do you play with others and where's the boundaries of the product, do you think? Sure, that's a good question. So I would say that we do not go into real-time space. So if you're looking at real-time ingestion, so DLT or Fivetran, that is not what SQL Mesh is for. Like SQL Mesh can do up to five-minute micro-batches. So every five minutes, you can run SQL Mesh and that will be handled incrementally. But if you want like real-time, low-latency, sub-second streaming processing, SQL Mesh doesn't do that. And so, you know, we've integrated directly with DLT,
Starting point is 00:26:30 we have a DLT like kind of adapter, where if you have a DLT project, you can kind of create a skeleton SQL Mesh project built on top of all your definitions. Now, in terms of Airflow, Daxter, etc. I would say, there's a couple things here. One is, we do integrate with those as well. Like have integrations with Airflow currently, but we have seen a lot of people using SQL Mesh directly without Airflow. Because in a sense, they operate in similar spaces. They can run things in the right order. They can handle dependencies and all those kind of things. They have scheduling. They have audits and a UI to manage that. And so SQL Mesh can either work with Airflow or replace it, I would say. Okay. What about the semantic layer?
Starting point is 00:27:19 So you mentioned Minerva a little while ago, and I met your colleague at the Cube event at Coalesce so do you see it is is a semantic layer naturally a single standalone product or do you see that as being i know where again do you see the boundaries there around around um the semantic layer part of uh of the stack so originally as i said when we started the company we were thinking about doing a semantics layer but ultimately we decided not to go there first because in order to have good metrics, a good semantic layer, you need to have good data, right? And so we really focused our efforts
Starting point is 00:27:53 on building the best transformation platform we could. Now, given our backgrounds in semantics layer, it's definitely somewhere where we wanna go. And we actually already have a prototype of what a semantics layer with SQL Mesh could look like. So I would say, you know, it's definitely on the long-term roadmap, but given where we're at, you know, we're a small company and we're trying to make our existing customers very successful with transformation. It's not our primary focus at the moment. Okay. And so you talk about your customers. I heard a rumor that you'd been adopted by
Starting point is 00:28:27 Fivetran. You'll be being used at Fivetran quite a bit, which is interesting. Is that the case? And maybe tell us a bit about within the boundaries what you can, what they're doing with it. Yeah. I mean, at the end of the day, many customers come to SQL M with similar problems right they have an existing transformation framework maybe it's dbt maybe it's raw airflow right and these companies want to a reduce the money they're spending on transformation so reducing the money they're paying bigquery but to increasing the developer productivity right they shouldn't be spending 15, 20 minutes waiting for the system to tell them that they're missing a parentheses, right?
Starting point is 00:29:12 And so when they adopt SQL Mesh and give it a try, they can very quickly see the tangible benefits, right? They can see the value of having compile time checks. They can see the value of having development environments and they can see the value of having state. The first time you run plan, it's going to be a similar time to running a dbt. But then the second time you do it, you're going to see the state kick in because SQL Much is like,
Starting point is 00:29:38 ah, you've already done this. You don't need to do this again. You only need to do this very small thing. And that's really where the savings comes in, right? Because you're not constantly reprocessing data naively. Interesting. So what's the business model for your company then? Because you've got Tobiko data and you've got SQL Mesh, which is open source. So how do you make money? Yeah, like many open source companies, we kind of have a similar model. We have the open source product, SQL Mesh and SQL Bot. And with Topeka Cloud, it's a hosted
Starting point is 00:30:11 version of SQL Mesh with additional functionality. So you get hosted state. So one of the big factors of SQL Mesh is state management, right? And so you can do that yourself in open source, but it's a bit challenging. But with cloud, that's all managed for you, right? And so you can do that yourself in open source, but it's a bit challenging. But with cloud, that's all managed for you, right? You get the flagship observability product in cloud, which tells you things like run times, costs, and it's a great way to debug things because it tracks all the audit history. You can have custom metrics, et cetera.
Starting point is 00:30:41 Additionally, the SQL mesh that's included with Cloud is actually much more advanced. So I kind of alluded to SQL Mesh core being able to do things like understanding what needs to be done, right? So for example, let's say you're making a change and you add a column, right? In SQL Mesh Core, it knows that while you added a column, nothing downstream is using that column. So it's safe to not recompute everything, right? But it's very naive. That's all the core can do, SQL Mesh Core. With the enterprise product, if you make a change to a column, right, you change X to X plus one, Tobical Cloud will say, oh, okay, you made a change, but only these models downstream use that column. So only those will need to be backfilled. And so Topical Cloud kind of adds to single mesh core to make it even more efficient so that you'll
Starting point is 00:31:38 save even more money by not backfilling as much. Okay. Okay. And presumably you've got quite a community around the open source product as well. well i mean do you have kind of external contributors to it or how does that kind of work yeah definitely you know we have contributors from you know many different companies in particular like harness and um uh sorry the name eludes me others others other other companies are big contributors so we definitely gain a lot from the open source interesting right so so to kind of wrap things up really um so how do people find out more about sql mesh but and also if somebody has an existing dbt sort of investment in in kind of transformations and so on how can they how can they transition
Starting point is 00:32:20 that or try that out with your product sure so we understand that lots of people are using dbt today, which is great. But if you're looking kind of to the next step, if you're having some pain with dbt, you can easily try out SQL mesh. So there's a couple of ways you can do it. One is you can use our dbt adapter. So SQL mesh is compatible with dbt. So you can just run your existing dbt project on top of the SQL Mesh engine, gaining many of the benefits that you would get. Now, we've seen that for most people with, you know, under, let's say, 500 models, it's not too difficult to lift and shift and use SQL Mesh native. And we find that if you can do that, that's definitely the best option to try out SQL Mesh, because then you can start out with a clean state and use all of SQL Mesh's native kind of properties. Additionally, with a dbt SQL Mesh hybrid
Starting point is 00:33:11 project, you can do things incrementally. So if you just want to migrate some of your SQL Mesh models to dbt, you can do that as well and have that all plain. Okay. And how did people find out more about SQL Mesh in general then? Where would they go to and how would they get their first experience really working with it? Yeah, so we have a very healthy and growing Slack community. You can join at tobicodata.com slash Slack. We've got over 3,000 people there now and we're very active in the community. And you can kind of learn from other folks who have given SQL Mesh a shot or use it in production and chat with them.
Starting point is 00:33:47 Fantastic. Well, Toby, it's been fantastic speaking to you. Really appreciate you giving us the background to the product and the thinking behind it. And best of luck with it going forward with it. Yeah, thanks so much for having me. Thank you.
