Drill to Detail - Drill to Detail Ep.110 'Building a Sequel to SQL with Malloy' featuring Special Guests Lloyd Tabb and Carlin Eng

Starting point is 00:00:00 Somebody at Looker once said, we'll be successful when we have 1,000 true fans. That's our goal, right? We have to get it into enough places that you can use it in your day. What we found is if people have a chance to use Malloy in their day, they love it. Like, I don't think I've ever heard anybody say they love SQL. But there is a real joy in, you know,

Starting point is 00:00:23 if you make people more powerful, they will love you for it. And that's what we're trying to do. Welcome to this very special episode of Drill to Detail, and I'm your host, Mark Rittman. So I'm joined today by returning guest Lloyd Tabb, together with Carlin Eng, to talk about their new project called Malloy. So, Lloyd, why don't you start by introducing yourself? I've been working in data for a really long time. I'm one of the Looker co-founders, the designer of the LookML language, and, yeah, been making software for a long bit, for a bit. Fantastic.

Starting point is 00:01:13 It's great to have you here, Lloyd. You're a personal kind of hero of mine, really, and certainly someone who's been so influential in my career over the last kind of, I don't know, sort of how many years it is since Looker came into my life, really. So it's great to have you back here, really. But Carlin, why don't you introduce yourself as well? Yeah, great to chat here, Mark. So my name is Carlin.

Starting point is 00:01:33 I'm a product manager here at Google. And similar to Lloyd, I've been working in data for a while, not quite as long as Lloyd, but basically my entire career, wearing a lot of different hats. So I spent some time as a data engineer, spent some time as a software engineer, spent some time in sales, and now I'm here at Google in a product role. Okay. So, Lloyd, for anybody who doesn't recognize who you are, just give us your backstory, really. How did you end up founding Looker back in the day?

Starting point is 00:01:59 I mean, at a very high level, because, I mean, the story is pretty well-known. Sure. But how did you end up doing that, and how did you end up doing what you're doing now at Google? You know, I was looking at Wikipedia this morning. Picasso painted 50, 150,000 images. 150,000. I have been working on databases and languages since 1987, when I wrote my first native code dBase compiler, and then I worked at Borland on databases and languages, developing languages at Borland and working there. And then I was at Netscape. And after that, I founded a whole bunch of other companies.

Starting point is 00:02:32 And every time I was in one of these companies, what I was doing was making people who were working at the companies be able to see what was going on through the data, right? So you have these internet companies. The only way to understand what's going on is through data. And my job was to figure out how to expose it to them. And then so the core of Looker was basically something I had written over and over again at startups to enable visibility into data. And I realized that my programming language background, again, that it really needed to be a language.

Starting point is 00:03:00 And then Malloy is just the next painting, actually. Interesting. Interesting. Interesting. So, Carlin, how did you end up working for Google? And what's your kind of backstory into this kind of world? Yeah, so I think the good place to start would be my time as a data engineer working at Strava. This was around 2015. So we were a relatively early Looker customer.

Starting point is 00:03:21 And I guess my entire career I've worked in data and I've always been really attracted to tools that have let me, I guess, provide leverage to others in the organization and just allow them to see data and just see insights in the data. And LookML immediately was obviously like a really powerful tool to me. So I did a Looker implementation at Strava. And one of the stories I love to tell about that implementation was, I loved Looker, the VP of product at Strava absolutely loved Looker because it gave them the ability to explore data in ways that they could never do before. But the data scientists, the analysts that I worked with, never really loved Looker as much as I did. And I always kind of struggled to get them on board with it.

Starting point is 00:04:06 And while I was always trying to convince them of the magic of Looker, I could still kind of empathize with their viewpoint. And that viewpoint was really that LookML as a semantic layer, as a tool, is really helpful for the folks that are not as technical and don't know the database as well. But for the folks who know the database really well, they know the data really well. It's just another thing that they have to do. So it kind of restricts their freedom in a way.

Starting point is 00:04:33 And I got really interested in Malloy because I was such a fan of Looker. And I love the way that Lloyd had run the company, aside from the product aspect. And when I saw kind of what he was trying to build with Malloy, it looked like he was kind of addressing all of the pain points that I saw with LookML and all of the reasons that the analysts and the data scientists, I think, didn't quite love LookML as much as I did. So it was kind of the best of both worlds in my mind. Fantastic. And what's it like working with Lloyd?

Starting point is 00:04:59 I mean, it sounds fantastic, really. How did you manage to luck out on that then? It is an amazing experience. It's actually kind of a funny story. Because Malloy is an open source project, I think I heard about it through one of the many data and analytics newsletters that are going around these days. I just started playing with it.

Starting point is 00:05:20 And I got really interested in what it was. And like I said, the message really resonated with me. So I actually started writing about it on my blog. I shared the blog posts that I wrote with Lloyd, or on the Malloy community Slack channel, actually, I shared it. And I think the Malloy team read it. And I think it was clear to them through the writing that I kind of understood what they were trying to build. So I just kind of developed a relationship with the team and that kind of grew over time. And eventually the opportunity came up to work with Lloyd and the rest of the Malloy team.

Starting point is 00:05:53 And it was definitely an opportunity I couldn't pass up. So I've been wanting to do this interview in this episode for a long time now. And I said this to you actually before the recording, that I wanted to have a little play around myself with Malloy first of all, to try and get an understanding of what it is and how it works and so on. I'm fascinated by it, really. But for anybody that has maybe heard of Malloy but doesn't really know what it is, maybe just outline: what is Malloy? What's the high-level, I suppose, pitch for the product or the project and the problem it's solving

Starting point is 00:06:24 really at this stage. So Malloy is a data programming language, the same way that LookML is a data programming language. It's designed so that, so SQL has, you know, it's 50 years old. It has no reusability. You know, if you define a calculation in one SQL query, you're going to be redefining it in another one. But Malloy, in its core, allows you to program data, basically a semantic model, much like LookML's semantic model, but in addition it has a query language, which allows you to compose queries. And the queries are much simpler than they are in SQL. At their core, they're the same, which is like a group by or project or select, right?
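The reusability point can be illustrated with a minimal Python sketch. This is not Malloy syntax and not how Malloy is implemented; the `MEASURES` dictionary, the `compile_query` helper, and the `orders` table are all hypothetical names chosen for illustration. The idea it shows is only that a calculation is written once and then reused by every query built on top of it, instead of being restated in each SQL statement.

```python
import sqlite3

# Hypothetical "semantic model": each calculation is defined exactly once.
MEASURES = {
    "order_count": "COUNT(*)",
    "total_revenue": "SUM(price)",
}

def compile_query(table, group_by, measures):
    """Build one SQL string from a dimension name and a list of measure names."""
    select = [group_by] + [f"{MEASURES[m]} AS {m}" for m in measures]
    return (
        f"SELECT {', '.join(select)} FROM {table} "
        f"GROUP BY {group_by} ORDER BY {group_by}"
    )

# Small in-memory table so the generated SQL can actually be run.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT, region TEXT, price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("shipped", "EU", 10.0), ("shipped", "US", 20.0), ("returned", "US", 5.0)],
)

# Two different queries reuse the same measure definitions.
by_status = conn.execute(
    compile_query("orders", "status", ["order_count", "total_revenue"])
).fetchall()
by_region = conn.execute(
    compile_query("orders", "region", ["total_revenue"])
).fetchall()

print(by_status)
print(by_region)
```

If the definition of `total_revenue` ever changes, it changes in one place and both queries pick it up, which is the property being described.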

Starting point is 00:07:06 But the composability is that you don't have to restate all of the calculations every time you build up one of these queries. You can pipe them together. You can build very complicated things that are very easy to read and reason about. The goal of it is, you know, one of my heroes is Anders Hejlsberg, and he wrote Turbo Pascal and Delphi and C#. And then eventually he wrote TypeScript. And the other languages were amazing, but TypeScript has taken the world by storm. And the reason is that it's open source. And it's an open language which anybody can contribute to.

Starting point is 00:07:48 And in data, there have been a lot of languages, but we've decided that we were going to build an open source data language so that it could be used everywhere that SQL could be used. So our goal, actually, is to be able to create a programming language that can be used everywhere that SQL can be used. Now, today, it can be used with DuckDB or Postgres or BigQuery. But the goal, the design goal, is that this language will be the language that people choose to use with data.

Starting point is 00:08:12 So that's what we're trying to do. And then all the stuff that we're building is in service of that. Okay. And so what problem does it solve and for who, really? Okay. So you have to think about it differently than, so the problem space today is small, but it's the core of all of the places that you use SQL. So I'll pose it back to you: where does SQL get used today, Mark? It gets used in

Starting point is 00:08:37 transformation and BI tools and, you know, for feeding machine learning and for feeding data science and doing this. So all of those places are the places that you would use SQL. Where is Malloy today? Not in all of those places, obviously. Today you can explore data with it. Today you can do data transformation with it. Today you can use it as basically an analytical ORM for your applications. So like if you're building a web app, you could use Malloy as the core analytical data

Starting point is 00:09:10 or object relational model that you could use for that. So we have Python integrations and we have NPM integrations. And we've built a thing called Composer, which is kind of a data exploration tool as an example. And we've built a very good development environment that is built into Microsoft VS Code so that you basically point it, you basically configure your data and you can explore your data directly from there and build data models and queries from there.

Starting point is 00:09:37 Okay, okay. So we'll go into a lot more detail on Malloy in a second, but for Carlin, so obviously you're on this call, and so far Lloyd's been talking about this being an open source project, but you're obviously working for Google. What is the status of Malloy as an open source project and an initiative within Google? How does that kind of work? Yeah, I can try to answer this question, and I might ask for a little bit of clarification. But in my mind, I think Malloy definitely falls in the category of innovation. And, you know, we're innovating, and we have an extremely ambitious goal. You know, as Lloyd stated, you know, our goal is to replace analytical

Starting point is 00:10:15 SQL. Our goal is not to build a SaaS product to sell and, you know, hit a certain amount of ARR. But, you know, one of our theses is that the state of analytics and data science is really constrained today by the tooling. So SQL is really the lingua franca of data. And it's a poor language. And it's a poor language for a lot of the reasons that Lloyd stated. So if we're able to provide analysts and data scientists with a much better tool with much better mental ergonomics, I think that has the potential to actually really grow the entire pie of data and analytics. So I think ultimately, that's our goal is to make data analytics easier, to make data analytics possible, or to make things that are possible.

Starting point is 00:10:56 Let me rephrase that, I guess, to make things that should be simple, much easier. Today, it's very difficult to do simple things in SQL. And as a result, I think analytics and data science gets a little bit of a bad rap. I think a lot of the questions that business users are asking data scientists and analysts are actually really hard for those analysts to answer. And it's not because the analysts aren't smart or good at their jobs or bright. It's literally, I think, because the ergonomics of the tool are really, really terrible. And Malloy offers a way to really expand that, grow the pie and make, just push the entire industry forward. Now, Google being one of the companies with what I would argue is one of the best data platforms in the world, stands to benefit massively by that expansion.

Starting point is 00:11:44 Okay. And it's an open source project that is, what's the license you use out of interest for the project? MIT. MIT. So it's a pretty permissive license really, isn't it? Yes, it is. Yeah. Right.

Starting point is 00:11:54 Brilliant. Okay. That's interesting. So I suppose, Lloyd, listening to some of your presentations about Malloy, you talk about the fact you spoke to a lot of people, you've listened to customers, you've done your research on this, really. And I suppose in a way, you know, the aim of replacing SQL is a fairly big one and a fairly

Starting point is 00:12:15 kind of, I suppose, not divisive, but certainly it's a bold statement. So what did you hear out in the market, and what led you to do this? And I suppose in some respects, because obviously Looker was acquired by Google, you could have had a lot easier life, really. So why did you do this, what prompted you to do this, and why take on what is probably one of the most controversial things, which is to try and replace SQL? You know, I haven't had to work for a really long time. I work because I love making things. You know, I feel so fortunate that I get to do my hobby, which is making things, with a brilliant group of people.

Starting point is 00:12:55 The Malloy team is just fabulous. It's, you know, a large percentage of the, you know, some of these people I've been working with for 25 years. You know, it's not short-term relationships. It's long-term relationships. And we get, we get to make something. We, you know, we've been thinking, we thought deeply about it when we made LookML and we made some mistakes. We, you know, we,

Starting point is 00:13:18 we learned a bunch of stuff and there were great ideas that came in as we were building LookML from other people, but we had already made early decisions in LookML. And so that I get sponsored to go do this work that I love to do is just fabulous from my point of view. And it's not just me being sponsored. I mean, I could go off and work alone, but I don't work alone. I work with Carlin and I work with Michael Toy and I work with Ben Porterfield, my co-founder, and many other brilliant people on my team, on our team that make Malloy. I just feel just so fortunate that I get to do that. And why the problem space?

Starting point is 00:13:54 You know, data is a really interesting problem. These are fascinating problems. They're not... you know, we solved reusability in imperative languages. So SQL is a declarative language. You kind of declare the state of things and then the computer goes and figures stuff out. And most other languages are imperative. Python is imperative: you just step, you know, with branching, step by step. But a declarative language is kind of like, state the problem and then state what you want the

Starting point is 00:14:29 answer to look like, and you let it go figure that out for you. It's a lot more like Midjourney or, you know, like ChatGPT, which is like, let me just state the problem and you compute the answer without me knowing how it's doing it necessarily. And so reusability is really hard. And so Malloy is very simple. Like our semantic models can be five lines of code, right? They just state the join relationships, for example. That's a semantic model. And so the density of Malloy is much better than our prior attempts at this, and the readability is much better.

Starting point is 00:15:08 Okay. So, Lloyd, you took on Justin – Julian Hyde? Justin Hyde? Sure, Julian Hyde. Yeah, Julian. He's great. He's terrific. Who basically was the guy behind a lot of kind of, I suppose,

Starting point is 00:15:19 aspects of SQL. Basically, SQL is something that a lot of people are invested in, and it's been something that has certainly been my career for the last kind of 20, 30 years. So why did you think about replacing SQL rather than just making SQL better? So first of all, I love Julian and we have these great conversations, right? And we're actually working in very similar problem spaces where we think about things, the problems in dealing with data, a lot. And he actually has taught me a lot. So I've just loved working with him. And the thing about, you know, there's a solution which is adding one more feature to SQL. And then there's a solution which is like rethinking the

Starting point is 00:16:00 problem, right? At some point, you rethink the problem. So let me, I'll just give you a basic example. So MySQL does not have arrays, for example, but Postgres does and DuckDB does and Snowflake does and BigQuery does. And every one of those implementations is very different. Like the way that they handle ARRAY_AGG is very different. The way they handle nested structures is different. The way they handle JSON is very, very different. And so what's happened is that everybody has been adding features to SQL, but not in an orderly way. And so even time zones that got added to SQL are different in every single SQL dialect. And so there isn't a standard of SQL. There's a strategy, which is add the new feature to SQL, right? But they've done this by increasing complexity and increasing and increasing and increasing complexity.

Starting point is 00:16:56 And what Malloy attempts to do is actually to radically simplify the problem so that you could learn it. If you know basic SQL, you could learn Malloy in a few minutes. You learned Malloy, I'm sure, in a couple of hours, right? But you still have all of that power, so that power is folded in in a really simplistic way. So Malloy's take on arrays, for example, or nested queries, those are super powerful, right? But you couldn't add that feature to SQL in that way. Yeah, yeah. And do you think, I think I've heard you say in the past that the kind of assumption that all data is tabular is a limitation that we're limited by when we use SQL. Is, I

Starting point is 00:17:35 suppose, breaking away from everything being tabular part of the motivation of this as well, or part of the kind of realization that led you to this? Yeah, so I mean, actually, we first figured this out at Looker, right? So before Looker, the way that you dealt with data was the dimensional model. And the dimensional model basically says that the reusability of data is actually data fragments, right? You materialize, you have a big transactional table with a bunch of dimensions, a bunch of basically scalar attributes, and you produce these tables

Starting point is 00:18:08 that look like dimension and calculation, right? And you produce a whole bunch of those, and then you join them together as the result. And this is like the, this is the fact tables. And so there's this whole dimensional model, you know, DAX is based on it. The MDX is all based on it.

Starting point is 00:18:26 Cubing is all based on it. It's all the old technology for data. And then when we start talking about the modern data stack, there's this big wide table view of data, which is that you join everything together, right? And then you have this visualization of this wide table, and you can do calculations against this wide table. And some of them use symmetric aggregates, which allow you to do calculations against some of the sub things.
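The wide-table view, and the double-counting problem that symmetric aggregates guard against, can be sketched in a few lines of Python. The data and field names below (`orders`, `shipping`, `items`) are invented for illustration and the code is not taken from Looker or Malloy. It only shows why an order-level value gets counted once per item after everything is joined into one wide table, and why keeping the nested shape avoids that.

```python
# Hypothetical nested data: each order carries its own list of items.
orders = [
    {"id": 1, "shipping": 5.0,
     "items": [{"sku": "a", "price": 10.0}, {"sku": "b", "price": 20.0}]},
    {"id": 2, "shipping": 7.0,
     "items": [{"sku": "a", "price": 10.0}]},
]

# Wide-table view: one row per item, with order-level columns repeated.
wide = [
    {"order_id": o["id"], "shipping": o["shipping"], **item}
    for o in orders
    for item in o["items"]
]

# Summing an order-level column over the wide table counts it once per item.
naive_shipping = sum(row["shipping"] for row in wide)

# Aggregating at the level where the value lives gives the right answer.
nested_shipping = sum(o["shipping"] for o in orders)

# Item-level values can still be aggregated by walking down into the items.
item_revenue = sum(i["price"] for o in orders for i in o["items"])

print(naive_shipping, nested_shipping, item_revenue)
```

Here order 1 has two items, so its shipping cost appears twice in the wide table; a symmetric aggregate is one way to correct for that while staying in a flat table.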

Starting point is 00:18:52 And Looker pioneered that and invented that. But it's still a wide table, right? So your mental model, first of all, the dimensional mental model is very complex, right? The big wide table is a lot simpler because you have dimensional freedom, which is that you can pick any dimension and any measure and compute the result correctly. So that's what Looker gave you, right, in LookML. And what Malloy does is, instead of treating it as a wide table, you have an orders table, and the order has items in it, and the order has the user in it. And the items each have a pointer to a product and a pointer to an item as it was in inventory, right? So there's a tree there. It's a tree, and you can hang calculations anywhere within

Starting point is 00:19:34 that tree, or you can make calculations anywhere within that tree, but you don't lose the tree. And the reason that not losing the tree is important is that when you're querying against something like a nested data set, like a JSON data object, you can use the same tree logic to query against any data without modeling it. So what Malloy lets you do is query against, like, you can just have an event log and you can query it directly without having to do any data modeling. And so that's the benefit of having data in a tree versus having data in a big wide table. Okay. So I imagine that anybody listening in now is thinking,

Starting point is 00:20:14 yeah, I kind of get this. I get the kind of motivations and the drivers, but what does programming or querying or modeling in Malloy look like, really? Now you've obviously got the VS Code extension. You've got the kind of thing with the cloud IDE. But if you can maybe, I suppose in a way, walk through what a session might look like using Malloy

Starting point is 00:20:35 and how the modeling bit goes first and the querying bit comes along, and try and paint the picture, really, of how a querying and modeling session with Malloy would go. What would it look like? Yeah, well, the hard problem: how do you teach the world a new programming language? Like, this is a really hard problem, Mark, right? And we've spent a lot of time trying to figure out how we're going to teach the world a new way to program, especially with data. So if you go to Malloy's documentation, this is the experience. You can go to Malloy's documentation and it will show you examples

Starting point is 00:21:11 of how to do histograms and how to do time zone comparisons and how to do cohort analysis. And so all of those analyses we're building into our documentation. And then on any one of those pages, if you hit open in VS Code, that document will open as a notebook in VS Code

Starting point is 00:21:29 and you can play with it and you can change the code and you can learn it. So the experience is, basically what you do is you open VS Code and you can start writing data models and then you can start writing queries within your model and there's a little run button above every query that you can click on, and it will run. And you can also do notebooks, which are like Jupyter notebooks, where you can

Starting point is 00:21:54 have a Malloy model, and then you can write queries in each of the cells, and you can put Markdown in it too. And so that's what you open when you open it up. Yeah, I was just going to say here that I think one of the things that makes modeling in Malloy really special for me is just how integrated the entire experience is. So one of my reflections on using LookML for a long time was it was really split across many different tools when you're doing your data modeling. So you would initially do the exploratory data analysis in SQL using some kind of like direct query IDE, either like, you know, maybe the Snowflake web UI or the BigQuery web UI. And then you would jump into Looker and you would kind of use the Looker web UI to write your LookML models. And then to test that LookML

Starting point is 00:22:36 model, you would then jump into the Looker explore page. So you're kind of split across three different surfaces when you're actually doing the modeling, when really you're doing one task, you're doing data modeling. In Malloy, all of that can happen directly within the VS Code IDE. You can write queries in your IDE. You can explore the data. You can look at different cuts in the data. And then instantly you can kind of do a copy paste directly within that same VS Code window into your model and then build upon that model.

Starting point is 00:23:06 So it's kind of like a much faster feedback loop, a much more iterative process. And I think as a result it makes it a lot easier for you as a data modeler to kind of get into a flow state. In my mind, data modeling is like a very cognitively complex task and it's really benefited by just smooth interaction and getting into that flow state and being able to iterate over stuff very quickly.

Starting point is 00:23:26 So that was part of the magic for me. Okay. Okay. I mean, what about, I mean, I suppose my first, my most significant impression of using Malloy was how it, so going into one of your examples, for example, on using VS Code or using the Cloud IDE. What sort of struck me was in Looker, you would define your semantic model in LookML, okay, and then you would query it using the front end, for example, or you might go via the API, or I think recently you can go via JDBC and so on.

Starting point is 00:24:01 But certainly the modeling and the querying are two separate kind of, I suppose, scripts and through two separate engines and so on there. Whereas with Malloy, you know, a Malloy sort of like session, for example, or a query, it has the kind of, I suppose you define the data model at the start. So you'd go in there and you'd say, what are the dimensions? What are your tables you're getting data from? So the modeling bit is there.

Starting point is 00:24:25 And then the actual querying of the model comes along afterwards. So you've got the same language for querying and for modeling. But I mean, are you defining the models on the fly each time? And is that the correct impression of it, that you're both modeling and querying in the same language? Is that correct, Lloyd? Yeah, it is. And why?

Starting point is 00:24:48 And what's the point of that? And what's the benefit of that? Sure. Well, once you have a semantic model with queries in it, you can ask for the data at a very high level, right? So here's a code representation of give me this data back, and you can have a query and then add filters to it after the fact or add limitations to it after the fact.

Starting point is 00:25:08 So from an API point of view, it's very simple. It's just like, give me the… Just to jump in there, the thing that really struck me with that was when you had the example of querying the GA4 data with the nesting and so on, where the model was easy to define and the actual querying was easy to do as well. Right. And so, like I said, it's a radical simplification.

Starting point is 00:25:29 If you've ever tried to query against GA360 data, it's tough. And Malloy just makes it trivial. But the other thing is that in other BI tools, the query is stored in the BI tool, not in the language. And the problem is that if the model changes, you might break something that's separate from it. So if the only thing that the BI tool is using is the name of the query,

Starting point is 00:25:52 then it's going to always get the data back in a reasonable structure. But if the semantic model changes from beneath the query, you end up with this validation problem where you have to go through and check all of the existing queries that you know about and make sure that the semantic model still supports it. And if it's in the language, you'll get a compiler error when you try to compile that query.

Starting point is 00:26:14 And so having the query along with the semantic model makes for much tighter code quality. Okay. But does that mean that each person who's going to query some data has to build their own model along with that? And is that not a danger then? Oh, no. Right, okay, explain how that works then. Oh, no, no, no. So you can import a model and then query against it, right? So, like the GA360 thing, you can start by just saying import GA360 and then start writing your queries without having to know much about the structure of that. And it has like an inheritance thing. You can inherit from it and extend it and refine it. We call them extending and refining.
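The two ideas here, extending a shared base model and having saved queries checked against the model, can be sketched roughly as follows. This is a toy in Python, not Malloy's import or extend mechanism; `extend`, `check_query`, `ModelError`, and every field name are hypothetical. It only illustrates that when the query lives next to the model, a missing field can be reported up front rather than discovered later in a separate tool.

```python
class ModelError(Exception):
    """Raised when a query references a field the model does not define."""

def extend(base, **extra_fields):
    """Return a new model with the base fields plus extra ones; base is untouched."""
    return {**base, **extra_fields}

def check_query(model, query_fields):
    """Fail early, at 'compile' time, if any referenced field is missing."""
    missing = [f for f in query_fields if f not in model]
    if missing:
        raise ModelError(f"unknown field(s): {', '.join(missing)}")
    return True

# A shared base model (hypothetical field names and SQL expressions).
base_model = {
    "country": "geo_country",
    "session_count": "COUNT(DISTINCT session_id)",
}

# An analysis extends the base model without modifying it.
analysis_model = extend(base_model, bounce_rate="AVG(is_bounce)")

saved_query = ["country", "bounce_rate"]
ok = check_query(analysis_model, saved_query)

# Simulate someone later removing a field the saved query depends on.
changed_model = {k: v for k, v in analysis_model.items() if k != "bounce_rate"}
try:
    check_query(changed_model, saved_query)
    error = None
except ModelError as exc:
    error = str(exc)

print(ok, error)
```

Keeping the model and its queries in one repository means a check like this can run on every change, which is the validation workflow being described.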

Starting point is 00:27:00 But you can extend it so that you can start with a base model and then do an analysis that adds to it. And the whole thing can be stored in the same repository so that it can be validated. And so that if anybody ever made a change, then your code would get checked also. Right. So you'd bring in the models via, via imports and for example,

Starting point is 00:27:18 that sort of thing. But, but certainly every, every query needs to have a model to work off of, off of, doesn't it though? One, one thing I want to add here is you don't actually need to have a model defined up front to start querying.

Starting point is 00:27:30 The simplest Malloy model is just a table. So you can specify, hey, this is a source that's on a table. And at that point, you're basically in a land that's similar to SQL where you have a bunch of tables. So one of the things I love about Malloy is the ability to kind of progressively model your data set and develop your model from the ground up. Right. Okay. That's interesting. And that's an interesting topic because I suppose you've got Colin Zima,

Starting point is 00:27:56 who went and formed Omni, who obviously was very involved with Looker, and we've got yourselves with Malloy. And it's kind of interesting to sort of, it's interesting to sort of look at the different takes on what would be the better, what Looker could be if it was done again or certainly some of the concepts

Starting point is 00:28:14 or the learnings from that really. I mean, the thing you just said there about progressively building up the model, did you think about different ways of doing that and maybe just on the fly, what's your kind of view on what Omni have done with their approach? You know, we're focused on the language at the core, and I think that Omni is focused on the person using a visual tool to build something, and, you know, empowering that person to create something that is usable beyond them.

Starting point is 00:28:45 Right. So, you know, listen, I worked at Borland. I've been writing programming languages for years. This is obviously my approach. Right. And Colin's approach is also really valuable, too. I mean, I love working with Colin and all of those people at Omni. Those are some of my favorite folks. And we're just tackling the problems from slightly different angles. I feel like I'm playing the long game, or Malloy is the long game, which is like, if we're successful, we've changed the whole stack, right? If we're successful, there's a whole new stack. I don't know if we're going to be successful. I'd like us to be successful, but that's what we're trying to do.

Starting point is 00:29:25 Okay, so Malloy currently compiles down to SQL, doesn't it? How does that work? And is that an intermediate sort of step for you? Or how does that work? And what's your vision for where that's going in the future? It does. So every single Malloy query compiles to a single SQL query. And there's a really good reason for this, which is that means

Starting point is 00:29:52 that any place that SQL is being used, you can use Malloy instead. So it's a core tenet of what we're doing. So the purpose is to make sure that what we're doing can fit into the architectures that come with it. So that's the, I'm sorry, there's another part of your question, Mark, and I just spaced it. Yeah, it was, I suppose, is compiling to SQL the end game for you on this? Or is replacing SQL the end game, do you think? So, you know, there are... One, Malloy is very efficient because of what it does. There could be... You could definitely build a database that spoke native Malloy.

Starting point is 00:30:39 That's not in the cards for us right now, right? That's not what we're planning on building. You know, speaking SQL allows us to run in all the places that SQL runs. So that's obviously the first thing that we should do, right? So in order for this to work, you know, Malloy has to be available in all the places that you type SQL, and we're not there, right? In order for it to work, you have to be able to have the choice, and we don't have that choice yet. So we're still very early. So what about data pipelines?
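
To make the "one query in, one SQL statement out" idea concrete, here is a minimal Python sketch of a toy compiler. It is not Malloy and says nothing about how Malloy's real compiler works; `MODEL`, `compile_query`, and the `orders` table are hypothetical names invented for illustration. It only shows the shape of the idea: measures are defined once in a model, and every request against that model expands to exactly one SQL statement that any SQL engine (SQLite here) can run.

```python
import sqlite3

# Hypothetical toy "semantic model": a table plus measures defined once.
MODEL = {
    "table": "orders",
    "measures": {
        "order_count": "COUNT(*)",
        "total_revenue": "SUM(amount)",
    },
}


def compile_query(model, group_by, aggregate):
    """Expand a (group_by, aggregate) request into a single SQL string."""
    select_parts = list(group_by)
    for name in aggregate:
        # Reuse the one definition of each measure stored in the model.
        select_parts.append(f"{model['measures'][name]} AS {name}")
    sql = f"SELECT {', '.join(select_parts)} FROM {model['table']}"
    if group_by:
        columns = ", ".join(group_by)
        sql += f" GROUP BY {columns} ORDER BY {columns}"
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("apple", 2.0), ("apple", 3.0), ("pear", 5.0)],
)

# Two different requests reuse the same measure definitions.
by_product_sql = compile_query(MODEL, ["product"], ["order_count", "total_revenue"])
overall_sql = compile_query(MODEL, [], ["total_revenue"])

by_product = conn.execute(by_product_sql).fetchall()
overall = conn.execute(overall_sql).fetchall()
print(by_product_sql)
print(by_product)
print(overall)
```

Because the output is plain SQL text, a sketch like this can sit anywhere a SQL string is accepted, which is the property Lloyd describes.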

Starting point is 00:31:08 So we've been thinking about the concept of queries and analysts and so on. But is this something that you envisage as being used to build more complex data pipelines, really? Sure. So we introduced notebooks a while ago, which are great. So basically, in a notebook, you can define a Markdown section, or you can define a Malloy section, which has some Malloy in it, the output of which will be like a graph or something. But we've also introduced SQL into these notebooks, where you can have the SQL have embedded Malloy in it, which is, so follow me for a second. So you have a Malloy query that defines a projection that you're very interested in. So you're grouping by product and date and you have a whole bunch of measures around that.

Starting point is 00:32:00 And you want to give that to somebody who's using Tableau. So you write all your semantic model in Malloy, right? So that has all your calculations in it. And then in your notebook, you add a create table as select, and then you give it a Malloy query. That Malloy query, when it hits that Malloy SQL block and it realizes that there's Malloy in there, it compiles that Malloy to SQL and then runs the statement, which creates the table for you or creates a view for you. And so we can do a lot of the things. So it's a very simplistic dbt-like functionality that we do today, which is

Starting point is 00:32:37 that you can build transformation notebooks that will take data in one state and produce it in another state for use in other tools. So for data pipelines, you can already use it. Okay. Okay. So I suppose if you're going to take on and replace SQL, you might as well take on and replace dbt as well in some respects. And so is dbt also something that's in your kind of,

Starting point is 00:32:57 in your focus, really, as something that could be improved on with what you're doing? I would rather they took Malloy and embedded it. I don't want to be in the business of replacing tools. I would rather the tool is open. The language is open architecture.

Starting point is 00:33:15 And there are a number of people who are actually embedding Malloy into their, into their tooling, right? Which is what we want to happen, right? So I would rather they pick up Malloy and embed it than have me do it. And that's one of Carlin's jobs.
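
The "create table as select" notebook pattern described a little earlier can be sketched roughly as follows. This is an illustration under stated assumptions, not Malloy's actual embedding syntax: the `{{ name }}` placeholder, the `COMPILED_QUERIES` registry, and the `expand` helper are all invented here, and the "compiled" SQL is written by hand where a real setup would get it from the Malloy compiler. The point is only the two-step flow: substitute compiled SQL into the statement, then run the statement so another tool can read the resulting table.

```python
import re
import sqlite3

# Hypothetical registry of named queries that have already been compiled to SQL.
# In the workflow described above, this text would come from the Malloy compiler.
COMPILED_QUERIES = {
    "sales_by_product": (
        "SELECT product, SUM(amount) AS total_revenue "
        "FROM orders GROUP BY product"
    ),
}

# The {{ name }} placeholder syntax is invented for this sketch.
STATEMENT = "CREATE TABLE product_summary AS {{ sales_by_product }}"


def expand(statement, compiled):
    """Replace each {{ name }} placeholder with its compiled SQL text."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: compiled[match.group(1)],
        statement,
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("apple", 2.0), ("apple", 3.0), ("pear", 5.0)],
)

final_sql = expand(STATEMENT, COMPILED_QUERIES)
conn.execute(final_sql)  # materializes the table for downstream tools

summary = conn.execute(
    "SELECT product, total_revenue FROM product_summary ORDER BY product"
).fetchall()
print(summary)
```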

Starting point is 00:33:32 Yeah, Carlin. So you're attracted by the open source nature of this, and there's no more famous open source community in this area than the dbt community. So what would you say to them as you walked into a bar full of people with dbt T-shirts on? Yeah, I mean, dbt is a tool that's built around SQL. And there's no reason that dbt can't adopt other languages.

Starting point is 00:33:52 So you see already that today dbt supports Python. So dbt could definitely support Malloy as a language. Now, I think Malloy has a huge opportunity in the data transformation space, because dbt has proved out that more and more people want to do their data transformations with an infinitely scalable compute layer like the data warehouse. So that means a lot of these transformations end up getting pushed into the data warehouse using SQL. Now, as we've discussed many, many times already on this podcast, one of the really core problems, or one of the things that SQL is really bad at, is reusability. So we have this whole discipline called data engineering. And people like to talk about how data engineering is like software engineering. But data engineering is really missing one of the most fundamental components of software

Starting point is 00:34:38 engineering. And that's basic reusability. So when you're writing software, you can write functions, you can write classes that get reused elsewhere. When you're writing SQL queries, that doesn't exist. And that's really what Malloy provides. So I think the opportunity to bring one of the kind of most fundamental units or core concepts of abstraction to this discipline that's been lacking it for many years, I think is really where I see the opportunity. Okay. Okay. So, I mean, I suppose digging into some of the more sort of details of Malloy. So things like joins, how does Malloy handle joins? And how does it maybe make it easier to join data together when people get confused currently with inner joins, outer joins, all these kinds of things, really? So how does Malloy handle joins and how does it make that easier to work with? Yeah.

Starting point is 00:35:29 So, well, first of all, it makes it so that you really only have to join once, right? So in every SQL query, you're repeating the join pattern, so it gets very complicated, right? But in Malloy, you can declare your joins exactly one time. And Malloy's joins are much more simple to reason about. We have three joins. We have join one, which is where you're doing a many to one join. So think of you have an order and the order has a user. That's a join one.
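
As a rough illustration of "declare the join once", the Python sketch below keeps a many-to-one join (each order has one user) in a single place and reuses it from two queries. It uses plain SQL in SQLite rather than Malloy; the `ORDERS_WITH_USER` constant, the table layouts, and the choice of a LEFT JOIN are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'CA'), (2, 'NY');
    INSERT INTO orders VALUES (10, 1, 4.0), (11, 1, 6.0), (12, 2, 5.0);
    """
)

# Declared once: each order points at a single user (many-to-one).
ORDERS_WITH_USER = "orders LEFT JOIN users ON orders.user_id = users.id"

# Two different queries reuse the same join declaration.
revenue_by_state = conn.execute(
    f"SELECT users.state, SUM(orders.amount) FROM {ORDERS_WITH_USER} "
    "GROUP BY users.state ORDER BY users.state"
).fetchall()
joined_rows = conn.execute(f"SELECT COUNT(*) FROM {ORDERS_WITH_USER}").fetchone()[0]
order_rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

print(revenue_by_state)
print(joined_rows, order_rows)
```

Because the join is many-to-one, the joined row count equals the number of orders, so sums over order columns are not inflated by the join.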

Starting point is 00:36:02 And then you might have a join many, which is I'm starting with users, and each user has many orders. So if I was in the users table and I was joining orders, that would be a join many. And then there's a cross join, which is a matrix. But that's it. Those are the joins. If you want to do an inner join,

Starting point is 00:36:23 you can add a where clause to limit the thing that you joined. And other than that, you don't really need to know too much else. So again, we thought about it, we looked at it, we said, you know, these are really the patterns. We don't need all this other stuff that nobody really understands anyway. Let's do it in a way that people can understand it. What about complex calculations that people always struggle with, things like time

Starting point is 00:36:47 comparison calculations and percent of totals and that sort of thing? How do you make those easier? Yeah, so, you know, one of the really nice things that Tableau invented in 2017 was the level of detail calculations. These are basically so that when you're within a table and you're grouping, you can exclude the grouping. So the really simple thing is, say you're doing a query that's grouping by state and you're counting the number of airports. So your table looks like state and airport count, which is the count of airports. In Malloy, there's an all function, which basically escapes all of the grouping

Starting point is 00:37:30 so that you get a column which would have just the total. But the beautiful thing about that is that you can use it within a calculation. So you can have airports count over all of airports count and get the percent of total. It becomes very easy to write these kinds of calculations. And Malloy has a very sophisticated level of detail calculations. So this is something that's very difficult to write in SQL. It's a lot like writing a cube, like a data cube. Most human beings can't write them.
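
For comparison, here is one way the airports percent-of-total calculation can be written in plain SQL, run through SQLite from Python. This is a sketch of the SQL side only, not of Malloy's `all` function; the `airports` table and its four rows are made up for the example. The scalar subquery plays the role of the "escape the grouping" total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airports (code TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO airports VALUES (?, ?)",
    [("SFO", "CA"), ("LAX", "CA"), ("SAN", "CA"), ("JFK", "NY")],
)

rows = conn.execute(
    """
    SELECT
        state,
        COUNT(*) AS airport_count,
        -- total that ignores the GROUP BY
        (SELECT COUNT(*) FROM airports) AS all_airport_count,
        COUNT(*) * 1.0 / (SELECT COUNT(*) FROM airports) AS percent_of_total
    FROM airports
    GROUP BY state
    ORDER BY state
    """
).fetchall()

for row in rows:
    print(row)
```

Even in this small case the ungrouped total has to be restated as a separate subquery, which hints at why these calculations get hard to write by hand as the grouping becomes more involved.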

Starting point is 00:37:58 But Malloy builds it in so that it becomes easy for you. That's an example of one type of thing that we do. Nesting is the other thing that we do that is really hard to do in SQL, but just is absolutely trivial in Malloy. And it's kind of the thing that's like, it's like describing, I don't know, kind of like describing sex. You can't do it. You have to look at it, play with it.

Starting point is 00:38:20 I can't do it justice. Like I can sit here all day trying to talk about it, but really play with Malloy or you won't really understand it, right? Yeah, yeah. That's exactly what I did, actually. And something that struck me when I was playing around with Malloy was the fact you've got some data visualization features in there as well. I mean, how does that work?

Starting point is 00:38:39 And what was the purpose of that? And, yeah, just talk about those features. Yeah, so one of the mistakes that everybody makes is that they tie their visualization layer to their semantic modeling layer. And really what you want to be able to do is you want to be able to declare the calculations and the structure of the data. And then when you use it, there are lots of consumers of this semantic model. So one of the things that is consuming it might be a rendering library to do visualizations. And so what Malloy allows you to do is to tag anything that's got a name you can hang a tag on.

Starting point is 00:39:12 And so if you have a query named query and it looks like a bar chart, which means it has a dimension and a measure, you can tag it with a bar chart. And then, instead of showing it like a table, it'll show it like a bar chart. And then in Malloy, because of all the nesting, you can have nested line charts and bar charts and lists, and all that you have to do is just drop a little hashtag, like, you know, hashtag bar chart or hashtag line chart, in it, and then it works that way. But it also works for other systems too. Like, our doc system uses it for deciding how to render the page, or you could have machine learning use it for

Starting point is 00:39:49 figuring out what are the other labels of things. So the tag architecture is open architecture, but it's a separate layer from the semantic layer. Again, a question people might be asking or thinking when they're listening to this is, well, what actually is Malloy? Is it a server application that we talk to? Is it something we download? I mean, what actually is Malloy as in the actual thing you kind of interact with? Is it a program? What is it really?
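
The separation between tags and the semantic layer can be sketched in a few lines of Python. Everything here is hypothetical and is not Malloy's tag syntax or rendering library: the `RESULTS` structure, the `"bar_chart"` tag string, and the renderer functions are invented to show one consumer (a renderer) choosing its output from a tag, while the data itself stays the same.

```python
# Hypothetical query results, each carrying a free-form list of tags.
RESULTS = {
    "by_state": {"tags": ["bar_chart"], "rows": [("CA", 3), ("NY", 1)]},
    "untagged": {"tags": [], "rows": [("CA", 3), ("NY", 1)]},
}


def render_table(rows):
    """Default rendering: one 'label | value' line per row."""
    return "\n".join(f"{label} | {value}" for label, value in rows)


def render_bar_chart(rows):
    """Text bar chart: one '#' per unit of the measure."""
    return "\n".join(f"{label} {'#' * value}" for label, value in rows)


# This consumer only understands one tag; others are ignored.
RENDERERS = {"bar_chart": render_bar_chart}


def render(result):
    """Pick a renderer from the first recognized tag, else fall back to a table."""
    for tag in result["tags"]:
        if tag in RENDERERS:
            return RENDERERS[tag](result["rows"])
    return render_table(result["rows"])


print(render(RESULTS["by_state"]))
print(render(RESULTS["untagged"]))
```

A different consumer, such as a documentation generator, could read the same tags and interpret them in its own way without any change to the rows or the query that produced them.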

Starting point is 00:40:18 And how do they get hold of it? Yeah, great question. So Malloy today exists in what I would say three forms. The first form, and I think the most important, is the language itself. So Malloy is a programming language. It's a syntax, semantics attached to that syntax, and it's a set of tools that will compile the Malloy

Starting point is 00:40:44 that you've written into SQL that runs against a database. The second and third parts are kind of interlinked, but there's also a development environment, which is today our VS Code extension. And then the third part is the various Malloy runtimes, and that's kind of where does Malloy code actually execute, how do you actually use it in your day-to-day.

Starting point is 00:41:02 So those are kind of the three forms. How do people use it today? You know, the actual interaction is most often going to happen with VS Code. VS Code is kind of our combined development environment and runtime. But we are actively developing new runtimes that will allow people to execute Malloy code in different contexts. So for example, we have an npm package, and you can import the npm package and have the Malloy query compiler service available to you. And you can also have our various database connectors to allow your application to directly speak Malloy and use this Malloy package to interface with the data warehouse. We also have a Python package that will allow some people with Python applications to do something very similar.

Starting point is 00:41:50 And then we're in the process of developing a command line interface tool. So you can imagine setting up a scheduler to execute Malloy queries via our CLI tool. So you might give it a file. That file contains some Malloy queries, and those queries end up getting executed on your database. The command line is how you'd, for example, do a transformation pipeline. Okay. Okay, so if somebody was a dbt and Looker developer at the moment,

Starting point is 00:42:17 where would it fit into their workflow, and how would it interact with or complement or work with Looker, really, for example? So is it more the dbt side, or what, really, in this scenario? Well, the transformation pipeline is great for building a set of views in your database that other people could consume. So that's the first thing. Like, the very first thing is you go and you build a model. You want to publish governed data to other tooling. You can create views for other tools to consume.

Starting point is 00:42:49 And you just go into the VS Code. You build a notebook and do that. So that's the first easiest thing that you could do with almost no effort. But there are other things to do. I'll leave it to Carlin. Yeah, so I think in the scenario that we're outlining here, it does fill a little bit more of the dbt side of the equation in terms of transformation.

Starting point is 00:43:10 I think ultimately our goal is to cover or to use Malloy for the whole spectrum. Again, we want to replace analytical SQL, and that's kind of anywhere where you run SQL. That could be dbt. That could be within your BI tool. Instead of running SQL in those places, Malloy should be the language. Now, we have seen some use cases, some really

Starting point is 00:43:30 compelling use cases in the wild, where people are writing their transformation pipelines in Malloy, and it is much easier to maintain, much more concise, for several reasons. The main one being the reusability. So like, you're building pipelines, and oftentimes in those pipelines, you need the same calculation run over and over again. And in a SQL pipeline, that means the data engineer has to write that calculation in many different places and has to maintain that calculation in many different places. Whereas in Malloy, that calculation gets placed in one part of the model. And therefore, there's fewer places to maintain, fewer places to break. The other thing that I think is maybe a little underappreciated

Starting point is 00:44:12 is the syntax of Malloy is just much simpler and much nicer, much more compact than SQL. And I think that's something that's easy to maybe just say, yeah, it's incrementally better. But I think having much better syntax has pretty profound implications on what the analysts or what the data engineers choose to express. So one of my favorite quotes in the world is, you know, we shape our tools, then our tools shape us. And I think, you know,

Starting point is 00:44:38 initially, we shaped SQL in the 1970s. And we've, you know, the analysis that we've chosen to do has been really shaped by kind of the language design choices, which, you know, that Lloyd has this joke about SQL, about how it's like, there's two write only languages in the world. One of them is Perl, the other SQL. And, you know, it's really easy to write SQL, but then you come back six months later, and you look at the SQL query that you've written and you think to yourself, you know, what the hell is this? Wait,

Starting point is 00:45:09 what idiot wrote this? And you look at the git command. It's like, Oh, that was me six months ago. So, so if, so I suppose we start from my side,

Starting point is 00:45:16 if this was anybody else other than Lloyd doing this, I'd worry this was going to be like the Esperanto of query languages, in that it would be, I suppose, intellectually very interesting but used only by a small bunch of cranks, sort of thing. But with this, you know, you've always put some thought behind it, and it's solving a real problem. How are the two of you going to make this actually resonate and have an impact, really?

Starting point is 00:45:45 Because I'm sure you're not doing this for the sake of it, really. So what's your strategy for really getting take-off with this? And, you know, what would be, I suppose, a goal for you, really, or a sign of success for you? Somebody at Looker once said, we'll be successful when we have a thousand true fans. That's our goal. We have to get it into enough places that you can use it in your day. What we found is if people have a chance to use Malloy in their day, they love it. I don't think I've ever heard anybody say they love SQL. But there is a real joy in, you know, if you make people more powerful, they will love you for it. And that's what we're trying to do, right?

Starting point is 00:46:27 If we make you more powerful, you can do more by using this tool, then you'll be successful. And the thing that we have to do is make it so that it's easy for you to adopt. It's easy for you to learn. And so the goal is if you know SQL, you should learn Malloy really fast. Like basic Malloy should come to you very quickly. And, and, and we've been relentless about it.

Starting point is 00:46:49 We're, we're actually on the fourth iteration of the language, if you can believe it. We're like, this is the fourth. We're like, we've, we've reached,

Starting point is 00:46:58 we've restructured the syntax of the language four times so that it's more readable. And we keep going back and seeing, can you read it? And so we're getting there. I think we're at the precipice of getting in. And so we're trying, Mark. I don't know that we'll be successful, but we're trying. I can see a goal for this would be people learn this.

Starting point is 00:47:19 If someone new coming into our industry or maybe learning this at college or whatever, they learn this rather than learning SQL. That for me would be, when that happens, you've kind of done your job really, haven't you? Because I suppose you've opened up this topic to a lot more people. There's a lot more of an understandable and easy to kind of like learn language. And they learn this and don't learn SQL. Would that be a goal, do you think? I mean, or certainly a kind of...

Starting point is 00:47:43 That'd be lovely. And it's reasonable too. I mean, we used to teach assembly language in college and then we started teaching C, right? Because C had the right reusability abstractions and portability that assembly language didn't have. And I think we're kind of in the same state with SQL today. So I'm hoping we can. Yes, that's a vision. Yes.

Starting point is 00:48:10 So, Carlin, for you, so you're on the product side and you're looking after the open source project or certainly involved in that. So what's the kind of roadmap for Malloy going forward then, really? And what do you hope to achieve with the open source project? Yeah, I mean, I think Lloyd said it really well. If in five years, 10 years, I think one other thing to add there is we're under no illusion that this is going to happen overnight. I think we take a lot of inspiration from the TypeScript project. TypeScript is 12

Starting point is 00:48:36 years old. I think it's 12 years old at this point. You can backtrack me on that. But it's only within the past few years has it really started getting mainstream adoption. So, you know, language adoption takes a long time and it takes kind of like just relentless hand to hand work where we need to step in and teach people and show people the way. So, you know, it's there's no hack here. There's no kind of there's no illusion that we're going to wake up overnight and everybody's going to see things our way. So a large part of our focus in terms of Malloy, in terms of language, there's a lot of language functionality that we are still really excited about that hasn't really been. Yeah, yeah.

Starting point is 00:49:17 So what's coming in the short-term roadmap, really? What's on the short-term roadmap for Malloy? Yeah, so I think one of the big things that we're excited about is parameters. So parameterizing queries. I think there's also some functionality around – okay, so let me step back a little bit. I think one of the other things that Lloyd mentioned here is that people love us when they can use us in their day-to-day.

Starting point is 00:49:45 And I think the focus right now on the Malloy team is figuring out that runtime aspect. Because right now, you can run Malloy in your VS Code environment. And it's really easy for you to write queries and share data. You get a really great feel for the language. And you can experience the joy of Malloy in your VS Code environment. But I think one thing that is not really where it needs to be yet is that runtime environment where people can extend that personal experience that they had to the rest of the organization. So you can now, like, so giving people the ability to run Malloy in more places, getting people the ability to run Malloy

Starting point is 00:50:21 as part of their day job. So, you know, that CLI tool that I mentioned is sort of the V0 of that, of like, hey, I can actually take these queries that I've run, and I can materialize assets in my database using, you know, my semantic model to create tables that other tools in my ecosystem can consume. So that's kind of where our efforts are focused in the short term. So primarily on kind of the language front, there's a lot of features and functionality that we're really excited about. And then on the runtime front.

Starting point is 00:50:54 Okay. Okay. Fantastic. And one last question for me, the name Malloy, where does that come from? Just out of interest. So if you Google the urban dictionary,

Starting point is 00:51:03 let me go do that here for Malloy. I'll see if I'll read it to you. It's essentially a man who plays by nobody's rules, has nothing to lose, but always gets results. While his superiors are often infuriated by his general disregard for established or authoritative norms in his given career, his success rate ultimately appeases their anger. I basically have real authority problems. So most of the people on our team, we kind of like to see it our way and we're going to go for it. And it's a little unorthodox, but that's how we're going to get results. If they go to www.malloydata.dev, they'll get our web page, and the best

Starting point is 00:51:51 thing to do is just to play with our documentation and see the use cases there. The easiest way to learn it is to go open the documentation and then click on any of the VS Code links, and it'll let you play with it. Fantastic. And, Carlin, what about the Slack community? Yeah, the Slack community is probably the best place to showcase what you've built, ask questions, give feedback on the language. Lloyd and I are both pretty active on the Slack channel. I think at Looker, one of the things that I loved about the company, I never worked there, but I was a very loyal customer.

Starting point is 00:52:22 It was kind of a kitchen table attitude, where the kitchen table is where people go to learn. It's just the community of people where no question is stupid. Everybody should feel free and safe to ask questions about how to do something. And I think the Slack online community is kind of our equivalent of that kitchen table. Fantastic. That's really good. Well, thank you very much, both of you. It's been a great conversation and yeah, I've certainly been very impressed with what I've seen so far with Malloy. So best of luck and thank you very much for coming on the show.

Starting point is 00:52:53 Thanks, Mark. Thank you, Mark.
