Drill to Detail - Drill to Detail Ep.110 'Building a Sequel to SQL with Malloy' featuring Special Guests Lloyd Tabb and Carlin Eng
Episode Date: August 14, 2023
Mark Rittman is joined in this special episode by Looker co-founder and inventor of the LookML language Lloyd Tabb, together with Carlin Eng, Product Manager at Google Cloud, to talk about Malloy, a new query language to replace SQL for analytics.
Drill to Detail Ep.60 'A Deeper Look Into Looker' With Special Guest Lloyd Tabb
Malloy : An experimental language for data
A Sequel to SQL? An introduction to Malloy
Why SQL syntax sucks, and why it matters
Composing with Queries
Dimensional Flexibility, one of the things that makes Malloy Special
Data is Rectangular and other Limiting Misconceptions
Malloy's Near Term Roadmap
Transcript
Somebody at Looker once said,
we'll be successful when we have 1,000 true fans.
That's our goal, right?
We have to get it into enough places that you can use it in your day.
What we found is if people have a chance to use Malloy in their day,
they love it.
Like, I don't think I've ever heard anybody say they love SQL.
But there is a real joy in, you know,
if you make people more powerful, they will love you for it.
And that's what we're trying to do.
Welcome to this very special episode of Drill to Detail. I'm your host, Mark Rittman.
So I'm joined today by returning guest Lloyd Tabb, together with Carlin Eng, to talk about their new project called Malloy.
So, Lloyd, why don't you start by introducing yourself?
I've been working in data for a really long time.
I'm one of the Looker co-founders, the designer of the LookML language,
and, yeah, I've been making software for a long time.
Fantastic.
It's great to have you here, Lloyd.
You're a personal kind of hero of mine, really,
and certainly someone who's been so influential in my career over the last kind of,
I don't know, sort of how many years it is since Looker came into my life, really.
So it's great to have you back here, really.
But Carlin, why don't you introduce yourself as well?
Yeah, great to chat here, Mark.
So my name is Carlin.
I'm a product manager here at Google.
And similar to Lloyd, I've been working in data for a while, not quite as long as Lloyd,
but basically my entire career, wearing a lot of different hats.
So I spent some time as a data engineer, spent some time as a software engineer,
spent some time in sales, and now I'm here at Google in a product role.
Okay.
So, Lloyd, for anybody who doesn't recognize who you are, just give us your backstory, really.
How did you end up founding Looker back in the day?
I mean, at a very high level because, I mean, the story is pretty well-known.
Sure.
But how did you end up doing that, and how did you end up doing what you're doing now at Google?
You know, I was looking at Wikipedia this morning. Picasso painted 150,000 images. 150,000.
I have been working on databases and languages since 1987,
when I wrote my first native code dBASE compiler. And then I worked at Borland on databases and languages,
developing languages at Borland and working there.
And then I was at Netscape.
And after that, I founded a whole bunch of other companies.
And every time I was in one of these companies,
what I was doing was making people who were working at the companies
be able to see what was going on through the data, right?
So you have these internet companies.
The only way to understand what's going on is through data.
And my job was to figure out how to expose it to them.
And then so the core of Looker was basically something I had written over and over again at startups to enable visibility into data.
And I realized, with my programming language background again, that it really needed to be a language.
And then Malloy is just the next painting, actually.
Interesting. Interesting. Interesting.
So, Carlin, how did you end up working for Google?
And what's your kind of backstory into this kind of world?
Yeah, so I think the good place to start would be my time.
I was a data engineer working at Strava.
This was around 2015.
So we were a relatively early Looker customer.
And I guess my entire career I've worked in data and I've always
been really attracted to tools that have let me, I guess, provide leverage to others in the
organization and just allow them to see data and just see insights in the data. And LookML
immediately was obviously like a really powerful tool to me. So I did a Looker implementation at Strava. And one of the stories
I love to tell about that implementation was I loved Looker, the VP of product at Strava
absolutely loved Looker because it gave them the ability to explore data in ways that they could
never do before. But the data scientists, the analysts that I worked with,
never really loved Looker as much as I did. And I always kind of struggled to get them on board with it.
And while I was always trying to convince them of the magic of Looker,
I could still kind of empathize with their viewpoint.
And that viewpoint was really that LookML as a semantic layer,
as a tool, is really helpful for the folks that are not as technical
and don't know the database as well.
But for the folks who know the database really well, they know the data really well.
It's just another thing that they have to do.
So it kind of restricts their freedom in a way.
And I got really interested in Malloy because I was such a fan of Looker.
And I love the way that Lloyd had run the company, aside from the product aspect.
And when I saw kind of what he was trying to build with Malloy,
it looked like he was kind of addressing all of the pain points that I saw
with LookML and all of the reasons that the analysts and the data scientists,
I think, didn't quite love LookML as much as I did.
So it was kind of the best of both worlds in my mind.
Fantastic. And what's it like working with Lloyd?
I mean, it sounds fantastic, really.
How did you manage to luck out on that then?
It is an amazing experience.
It's actually kind of a funny story.
Because Malloy is an open source project,
I think I heard about it through one of the many data and analytics newsletters
that are going around these days.
I just started playing with it.
And I got really interested in what it was.
And like I said, the message really resonated with me.
So I actually started writing about it on my blog. I shared the blog posts that I wrote
with Lloyd and, or on the Malloy community Slack channel, actually, I shared it. And I think the
Malloy team read it. And I think it was clear to them through the writing that I kind of understood
what they were trying to build. So just kind of developed a relationship with the team and that kind of grew over time.
And eventually the opportunity came up to work with Lloyd and the rest of the Malloy
team.
And it was definitely an opportunity I couldn't pass up.
So I've been wanting to do this interview in this episode for a long time now.
And I said this to you actually before the recording that I wanted to have a little play
around myself with Malloy first of all, to try and get an understanding of what it is and how it works and so on.
I'm fascinated by it really. But for anybody that
has maybe heard of Malloy but doesn't really know what it is,
maybe just outline what is Malloy? What's the high-level, I suppose,
pitch for the product or the project, and the problem it's solving
really at this stage?
So Malloy is a data programming language, the same way that LookML is a data programming language.
SQL is, you know, 50 years old, and it has no reusability.
If you define a calculation in one SQL query, you're going to be redefining it in another one. But Malloy, at its core, allows you to program data,
basically a semantic model, much like LookML's semantic model,
but in addition it has a query language, which allows you to compose queries.
And the queries are much simpler than they are in SQL.
At their core, they're the same, which is like a group by or project or select, right?
But the composability is that you don't have to restate all of the calculations every time you build up one of these queries.
You can pipe them together.
You can build very complicated things that are very easy to read and reason about.
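As a rough illustration of the reuse point, here is a minimal Python sketch. It is not Malloy and does not use Malloy's syntax or API; the `MEASURES` dictionary and the `compile_query` helper are hypothetical names invented for this example. Each calculation is defined once, and two different queries reuse it, with each request compiling to a single SQL string that is run against an in-memory SQLite table.

```python
import sqlite3

# Hypothetical mini "semantic model": each calculation is defined exactly once.
MEASURES = {
    "order_count": "COUNT(*)",
    "total_revenue": "SUM(price)",
    "avg_price": "AVG(price)",
}

def compile_query(table, group_by, measures, where=None):
    """Compile a (dimension, measure names) request into one SQL string."""
    select = [group_by] + [f"{MEASURES[m]} AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return sql + f" GROUP BY {group_by} ORDER BY {group_by}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (category TEXT, status TEXT, price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("shoes", "complete", 50.0), ("shoes", "returned", 70.0), ("hats", "complete", 20.0)],
)

# Two different queries reuse the same measure definitions by name.
by_category = conn.execute(
    compile_query("orders", "category", ["order_count", "total_revenue"])
).fetchall()
by_status = conn.execute(
    compile_query("orders", "status", ["total_revenue"], where="price > 25")
).fetchall()
print(by_category)
print(by_status)
```

If the definition of `total_revenue` changes, it changes in one place and both queries pick it up, which is the property being described here.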
The goal of it is, you know, one of my heroes is Anders Hejlsberg, and he wrote Turbo Pascal and Delphi and C#.
And then eventually he wrote TypeScript.
And the other languages were amazing, but TypeScript has taken the world by storm.
And the reason is that it's open source.
And it's an open language which anybody can contribute to.
And in data, there have been a lot of languages,
but we've decided that we were going to build an open source data language so that it could be used everywhere that SQL could be used.
So our goal, actually, is to be able to create a programming language
that can be used everywhere that SQL can be used.
Now, today, it can be used with DuckDB or Postgres or BigQuery.
But the goal, the design goal,
is that this language will be the language
that people choose to use with data.
So that's what we're trying to do.
And then all the stuff that we're building
is in service of that.
Okay.
And so what problem does it solve and for who, really?
Okay.
So you have to think about it differently. The problem space today is small, but it's the core of all of the places that you use SQL.
So I'll pose it back to you: where does SQL get used today, Mark? It gets used in
transformation and BI tools and, you know, for feeding machine learning and for feeding data science.
So all of those places are the places that you would use SQL.
Where is Malloy today?
Not in all of those places, obviously.
Today you can explore data with it.
Today you can do data transformation with it.
Today you can use it as basically an analytical ORM for your applications.
So if you're building a web app, you could use Malloy as the core analytical data
or object-relational model for that.
So we have Python integrations and we have NPM integrations.
And we've built a thing called Composer, which is kind of a data exploration tool as an example.
And we've built a very good development environment that is built into Microsoft VS Code
so that you basically point it,
you basically configure your data
and you can explore your data directly from there
and build data models and queries from there.
Okay, okay.
So we'll go into a lot more detail in Malloy in a second,
but for Carlin, so obviously you're on this call,
and so far Lloyd's been talking about this being an open source project, but you're obviously working
for Google. What is the status of Malloy as an open source project and
an initiative within Google? How does that kind of work?
Yeah, I can try to answer this question, and I
might ask for a little bit of clarification. But in my mind, I think Malloy definitely falls in the category of innovation. And, you know, we're innovating, and we have an
extremely ambitious goal. You know, as Lloyd stated, you know, our goal is to replace analytical
SQL. Our goal is not to build a SaaS product to sell and, you know, hit a certain amount of ARR.
But, you know, one of our theses is that the state of
analytics and data science is really constrained today by the tooling. So SQL is really the
lingua franca of data. And it's a poor language. And it's a poor language for a lot of the reasons
that Lloyd stated. So if we're able to provide analysts and data scientists with a much better
tool with much better mental ergonomics, I think that has the potential to actually really grow
the entire pie of data and analytics. So I think ultimately, that's our goal is to make data analytics easier,
to make data analytics possible, or to make things that are possible.
Let me rephrase that, I guess, to make things that should be simple, much easier. Today,
it's very difficult to do simple things in SQL. And as a result,
I think analytics and data science gets a little bit of a bad rap. I think a lot of the questions
that business users are asking data scientists and analysts are actually really hard for those
analysts to answer. And it's not because the analysts aren't smart or good at their jobs or
bright. It's literally, I think, because the ergonomics of the tool are really, really terrible. And Malloy offers a way to really expand that, grow the pie and make, just push the entire industry forward.
Now, Google being one of the companies with what I would argue is one of the best data platforms
in the world, stands to benefit massively by that expansion.
Okay. And it's an open source project that is,
what's the license you use out of interest for the project?
MIT.
MIT.
So it's a pretty permissive license really, isn't it?
Yes, it is.
Yeah.
Right.
Brilliant.
Okay.
That's interesting.
So I suppose, Lloyd, I've listened to some of
your presentations about Malloy.
You talk about the fact you spoke to a lot of people,
you've listened to customers, you've done your research on this really.
And I suppose in a way, the aim of replacing SQL is a fairly big one and a fairly,
I suppose not divisive, but certainly it's a bold statement. So what did you hear
out in the market, and what led you to do this? And I suppose in some respects, why didn't you just,
because obviously Looker was acquired by Google, you could
have had a lot easier life really. So why did you do this, and what prompted you to do
this, and why take on what is probably one of the most controversial things, which is to try
and replace SQL?
You know, I haven't had to work for a really long time. I work because I love making things.
You know, I feel so fortunate that I get to do my hobby,
which is making things, with a brilliant group of people.
The Malloy team is just fabulous.
It's, you know, a large percentage of the, you know,
some of these people I've been working with for 25 years.
You know, it's not short-term relationships.
It's long-term relationships. And we get, we get to make something.
We, you know, we've been thinking,
we thought deeply about it when we made LookML and we made some mistakes.
We, you know, we,
we learned a bunch of stuff and there was great ideas that came in as we were
building LookML from other people,
but we had already made early decisions in LookML. And so getting sponsored to go do this work that I love to do
is just fabulous from my point of view. And it's not just me being sponsored. I mean,
I could go off and work alone, but I don't work alone. I work with Carlin and I work with Michael
Toy and I work with Ben Porterfield, my co-founder, and many other brilliant people on my team, on our team that make Malloy.
I just feel just so fortunate that I get to do that.
And why the problem space?
You know, data is a really interesting problem.
These are fascinating problems.
They're not, you know, we solved reusability in imperative languages.
So SQL is a declarative language. You kind of declare the state
of things, and then the computer goes and figures stuff out. And most other languages are
imperative. Python is imperative: you just go step by step, with
branching. But
a declarative language is kind of like: state the problem, and then state what you want the
answer to look like, and you let it go figure that out for you. It's a lot more like Midjourney
or ChatGPT, which is like, let me just state the problem and you compute the
answer without me knowing how it's doing it necessarily. And so reusability is really hard.
And so Malloy is very simple.
Like our semantic models can be five lines of code, right?
They just state the join relationships, for example.
That's a semantic model. And so the density of Malloy is much better than our prior attempts at this,
and the readability is much better.
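To make the imperative versus declarative distinction in this answer concrete, here is a small Python sketch using only the standard library. The table and column names are made up for illustration. The first half spells out how to compute per-category totals step by step; the second half states the desired result in SQL and leaves the "how" to the SQLite engine.

```python
import sqlite3
from collections import defaultdict

rows = [("shoes", 50.0), ("hats", 20.0), ("shoes", 70.0)]

# Imperative: spell out how to compute the totals, step by step.
totals = defaultdict(float)
for category, price in rows:
    totals[category] += price
imperative_result = sorted(totals.items())

# Declarative: state what the answer should look like; the engine decides how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (category TEXT, price REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
declarative_result = conn.execute(
    "SELECT category, SUM(price) FROM orders GROUP BY category ORDER BY category"
).fetchall()

# Both approaches arrive at the same per-category totals.
print(imperative_result == declarative_result)
```

The imperative half can be wrapped in a function and reused like any other code; the SQL string has no comparable built-in mechanism for reusing `SUM(price)` across queries, which is the gap being discussed.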
Okay.
So, Lloyd, you took on Justin – Julian Hyde?
Justin Hyde?
Sure, Julian Hyde.
Yeah, Julian.
He's great.
He's terrific.
Who basically was the guy behind a lot of kind of, I suppose,
aspects of SQL and SQL.
Basically, SQL is something that a lot of people are invested in,
and it's been something that's certainly been my career for the last kind of 20, 30 years.
So why did you think about replacing SQL rather than just making SQL better?
So first of all, I love Julian and we have these great conversations, right?
And we're actually working in very similar problem spaces, where we think a lot about the problems in dealing with data. And he actually has taught me a lot.
So I've just loved working with him. And the thing about, you know, there's a solution which
is adding one more feature to SQL. And then there's a solution which is like rethinking the
problem, right? At some point, you rethink the problem. So let me, I'll just give you a basic example. So MySQL does not have arrays, for example, but Postgres does and DuckDB does
and Snowflake does and BigQuery does. And every one of those implementations is very different.
Like the way that they handle ARRAY_AGG is very different. The way they handle nested structures
is different. The way they handle JSON is very, very different. And so what's happened is that
everybody has been adding features to SQL, but not in an orderly way. And so even time zones that
got added to SQL are different in every single SQL dialect. And so there isn't a standard of SQL.
There's a strategy, which is add the new feature to SQL, right?
But they've done this by increasing and increasing complexity.
And what Malloy attempts to do is actually to radically simplify the problem so that you could learn it.
If you know basic SQL, you could learn Malloy in a few minutes.
You learned Malloy, I'm sure, in a couple of hours, right?
But you still have all of that power, so that power is folded in in a really simple way. So
Malloy's take on arrays, for example, or nested queries, those
are super powerful, right? But you couldn't add that feature to SQL in that way.
Yeah. And do
you think, I think I've heard you say in the past that the kind of assumption that all data
is tabular is a limitation that we're limited by when we use SQL. Is
breaking away from everything being tabular part of the motivation of this as well,
or part of the kind of realization that led you to this?
Yeah, so I mean, actually, we first figured this out at Looker, right?
So before Looker, the way that you dealt with data was the dimensional model.
And the dimensional model basically says that the reusability of data is actually data fragments,
right?
You materialize, you have a big transactional table with a bunch of dimensions, a bunch of basically scalar attributes,
and you produce these tables
that look like dimension and calculation, right?
And you produce a whole bunch of those,
and then you join them together as the result.
And this is like the,
this is the fact tables.
And so there's this whole dimensional model,
you know, DAX is based on it.
The MDX is all based on it.
Cubing is all based on it.
It's all the old technology for data.
And then when we start talking about the modern data stack,
there's this big wide table view of data,
which is that you join everything together, right?
And then you have this visualization of this wide table,
and you can do calculations against this wide table.
And some of them use symmetric aggregates, which allow you to do calculations against some of the sub things.
And Looker pioneered that and invented that.
But it's still a wide table, right?
So your mental model, first of all, the dimensional mental model is very complex, right?
The big wide table is a lot simpler because you have dimensional freedom, which is that you can pick any dimension and any measure and compute the result correctly.
So that's what Looker gave you, right, in LookML.
And what Malloy does is, instead of treating it as a wide table, you have an orders table, and the order has items in it, and the order has the user in it.
And the items each have a pointer to a product and a pointer to an item as it was in inventory.
Right, so there's a tree there. It's a tree, and you can hang calculations anywhere within
that tree, or you can make calculations anywhere within that tree, but you don't lose the tree.
And the reason that not losing the tree is important is that when you're querying against something like a nested data set, like a JSON data object, you can use the same tree logic to query against
any data without modeling it. So what Malloy lets you do is query against, like, you can just have
an event log and you can query at it directly without having to do any data modeling. And so,
so that's the,
that's the benefit of having data in a tree versus having data in a big wide table.
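The difference between the wide-table view and the tree view can be sketched in a few lines of Python. This is only an illustration under assumed data: the `orders` and `items` tables and the `shipping_fee` column are hypothetical, and the nested list of dictionaries stands in for the tree, not for how Malloy represents it internally. Joining orders to items repeats each order once per item, so an order-level sum is overcounted unless something like symmetric aggregates corrects for it. Keeping the items nested under their order lets each aggregate be computed at the level where the value lives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, shipping_fee REAL);
CREATE TABLE items (order_id INTEGER, price REAL);
INSERT INTO orders VALUES (1, 5.0), (2, 5.0);
INSERT INTO items VALUES (1, 10.0), (1, 20.0), (2, 30.0);
""")

# Wide-table view: the join repeats order 1 for each of its two items,
# so a naive order-level sum counts its shipping fee twice.
wide_total = conn.execute(
    "SELECT SUM(o.shipping_fee) FROM orders o JOIN items i USING (order_id)"
).fetchone()[0]

# Tree view: items stay nested under their order, and each aggregate
# is computed at the level of the tree where the value lives.
tree = [
    {"order_id": 1, "shipping_fee": 5.0, "items": [{"price": 10.0}, {"price": 20.0}]},
    {"order_id": 2, "shipping_fee": 5.0, "items": [{"price": 30.0}]},
]
tree_shipping = sum(o["shipping_fee"] for o in tree)
tree_revenue = sum(i["price"] for o in tree for i in o["items"])

print(wide_total, tree_shipping, tree_revenue)
```

The true shipping total is 10.0, which the tree computation returns, while the naive join reports 15.0.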
Okay.
So I imagine that anybody listening in now is thinking,
yeah,
I kind of get this.
I get the kind of the motivations and the drivers,
but what does,
what does programming or querying or modeling in Malloy look like really?
Now you've obviously got,
you've got the VS Code extension. You've got the kind of thing with the cloud IDE. But if you can maybe, I
suppose in a way, walk through what a session might look like using Malloy,
and how the modeling bit goes first and the querying bit comes along, and try and paint
the picture really of how a querying and modeling session with Malloy would go.
What would it look like?
Yeah, well, the hard problem is how do you teach the world
a new programming language? This is a really hard problem, Mark, right? And we've spent a
lot of time trying to figure out how we're going to teach the world a new way to program, especially
with data. So if you go to Malloy's documentation,
this is the experience.
You can go to Malloy's documentation and it will show you examples
of how to do histograms
and how to do time zone comparisons
and how to do cohort analysis.
And so all of those analyses
we're building into our documentation.
And then on any one of those pages,
if you hit open in VS Code,
that document will open as a notebook in VS Code
and you can play with it
and you can change the code and you can learn it.
So the experience is,
basically what you do is you open VS Code
and you can start writing data models
and then you can start writing queries within your model
and there's a little run button above every query that you can click on,
and it will run. And you can also do notebooks, which are like Jupyter notebooks, where you can
have a Malloy model, and then you can write queries in each of the cells, and you can put
Markdown in it too. And so that's what you open when you open it up.
Yeah, I was just going to
say here that I think one of the things that makes modeling in Malloy really special for me is just how integrated the entire experience is.
So one of my reflections on using LookML for a long time was it was really split across many
different tools when you're doing your data modeling. So you would initially do the exploratory
data analysis in SQL using some kind of like direct query IDE, either like, you
know, maybe the Snowflake web UI or the BigQuery web UI. And then you would jump into Looker and
you would kind of use the Looker web UI to write your LookML models. And then to test that LookML
model, you would then jump into the Looker explore page. So you're kind of split across three
different surfaces when you're actually doing the modeling. When really you're doing one task, you're doing data modeling.
In Malloy, all of that can happen directly within the VS Code IDE.
You can write queries in your IDE.
You can explore the data.
You can look at different cuts in the data.
And then instantly you can kind of do a copy paste directly within that same VS code window into your model and
then build upon that model.
So it's kind of like a much faster feedback loop,
much more iterative process.
And I think as a result makes it a lot easier for you as a data modeler to
kind of get into a flow state.
In my mind,
data modeling is like a very cognitively complex task and it's really
benefited by just smooth interaction and getting into that flow state and
being able to iterate over stuff very quickly.
So that was part of the magic for me.
Okay.
Okay.
I mean, what about, I mean, I suppose my first, my most significant impression of using Malloy was how it, so going into one of your examples, for example, on using VS Code or using the Cloud IDE.
What sort of struck me was in Looker,
you would define your semantic model in LookML, okay,
and then you would query it using the front end, for example,
or you might go via the API, or I think recently you can go via JDBC and so on.
But certainly the modeling and the querying are two separate kind of,
I suppose, scripts and through two separate engines and so on there.
Whereas with Malloy, you know, a Malloy sort of like session,
for example, or a query, it has the kind of,
I suppose you define the data model at the start.
So you'd go in there and you'd say, what are the dimensions?
What are your tables you're getting data from?
So the modeling bit is there.
And then the actual querying of the model comes along afterwards.
So you've got the same language for querying and for modeling.
But I mean, are you defining the models on the fly each time?
And is that the correct impression of it,
that you're both modeling and querying in the same language?
Is that correct, Lloyd?
Yeah, it is.
And why?
And what's the point of that?
And what's the benefit of that?
Sure.
Well, once you have a semantic model with queries in it,
you can ask for the data at a very high level, right?
So here's a code representation of give me this data back,
and you can have a query and then add filters to it after the fact
or add limitations to it after the fact.
So from an API point of view, it's very simple.
It's just like, give me the…
Just to jump in there, the thing that really struck me with that
was when you had the example of querying the GA4 data
with the nesting and so on, where the model was easy to define
and the actual querying was easy to do as well.
Right.
And so, like I said, it's a radical simplification.
If you've ever tried to query against GA360 data, it's tough.
And Malloy just makes it trivial.
But the other thing is that in other BI tools,
the query is stored in the BI tool, not in the language.
And the problem is that if the model changes,
you might break something that's separate from it.
So if the only thing that the BI tool is using
is the name of the query,
then it's going to always get the data back
in a reasonable structure.
But if the semantic model changes from beneath the query,
you end up with this validation problem
where you have to go through
and check all of the existing queries that you know about
and make sure that the semantic model still supports it.
And if it's in the language, you'll get a compiler error when you try to compile that query.
And so having the query along with the semantic model makes for much tighter code quality.
Okay, but does that mean that each person who's
going to query some data has to build their own model along with that? And is that not a
danger then?
Oh, no.
Right, okay, explain how that works then.
Oh, no, no. So you can import
a model and then query against it, right? So like the GA360 thing, you can
start by just saying import GA360 and then start writing your queries without having to know much about the structure of that.
And it has like an inheritance thing.
You can inherit from it and extend it and refine it.
We call them extending and refining.
But you can extend it so that you can start with a base model and then do an analysis that adds to it.
And the whole thing can be stored in the same repository so that it can be
validated.
And so that if anybody ever made a change,
then your code would get checked also.
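The idea of keeping named queries next to the model, extending a base model, and having changes checked can be sketched as follows. This is a toy Python stand-in, not Malloy's import or extend mechanism: the `Model` class, its `extend` and `validate` methods, and the field names are all hypothetical. The point it shows is that when queries live with the model, a change to the model can be checked against every stored query in one pass.

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    """Hypothetical stand-in for a semantic model: fields plus named queries."""
    fields: set
    queries: dict = field(default_factory=dict)  # query name -> fields it uses

    def extend(self, extra_fields=(), extra_queries=None):
        """Return a new model that adds to this one without modifying it."""
        return Model(
            fields=self.fields | set(extra_fields),
            queries={**self.queries, **(extra_queries or {})},
        )

    def validate(self):
        """Return {query name: missing fields} for every broken query."""
        errors = {}
        for name, used in self.queries.items():
            missing = sorted(set(used) - self.fields)
            if missing:
                errors[name] = missing
        return errors

base = Model(fields={"event_date", "page", "user_id"},
             queries={"daily_pages": ["event_date", "page"]})

# An analysis starts from the base model and adds its own field and query.
analysis = base.extend(extra_fields=["country"],
                       extra_queries={"by_country": ["country", "user_id"]})
print(analysis.validate())

# Someone renames `page` in the base model; the stored query now fails the check.
changed = Model(fields={"event_date", "page_path", "user_id"}, queries=base.queries)
print(changed.validate())
```

In this sketch the check is an explicit `validate()` call; in the setup described in the conversation, the equivalent failure would surface as a compiler error when the query is compiled.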
Right.
So you'd bring in the models via,
via imports and for example,
that sort of thing.
But,
but certainly every,
every query needs to have a model to work off of,
off of,
doesn't it though?
One,
one thing I want to add here is you don't actually need to have a model defined up front to start querying.
The simplest Malloy model is just a table.
So you can specify, hey, this is a source that's on a table.
And at that point, you're basically in a land that's similar to SQL where you have a bunch of tables.
So one of the things I love about Malloy is the ability to kind of progressively model your data set
and develop your model from the ground up.
Right. Okay. That's interesting.
And that's an interesting topic
because I suppose you've got Colin Zima,
who went and formed Omni,
who obviously was very involved with Looker,
and we've got yourselves with Malloy.
And it's kind of interesting to sort of, it's interesting to
sort of look at the different takes on
what would be the better, what
Looker could be if it was done again
or certainly some of the concepts
or the learnings from that really. I mean, the thing
you just said there about progressively building up the model,
did you think about different ways of doing that
and maybe just on the fly, what's your kind
of view on what Omni have done with their approach?
You know, we're focused on the language at the core, and I think that Omni is focused on
the person using a visual tool to build something, and then
empowering that person to create something that is usable beyond them.
Right. So, listen, I worked at Borland.
I've been writing programming languages for years. This is obviously my approach. Right.
And Colin's approach is also really valuable, too. I mean, I love working with Colin and all of those people at Omni.
Those are some of my favorite folks. And we're just tackling the problems from slightly different
angles. I feel like I'm playing the long game, or Malloy is the long game, which is like,
if we're successful, we've changed the whole stack, right? If we're successful,
there's a whole new stack. I don't know if we're going to be successful. I'd like us to be successful,
but that's what we're trying to do.
Okay, so Malloy currently compiles down to SQL, doesn't it?
How does that work?
And is that an intermediate sort of step for you?
Or how does that work?
And what's your vision for where that's going in the future?
It does.
So every single Malloy query
compiles to a single SQL query. And there's a really good reason for this, which is that it means
that any place that SQL is being used, you can use Malloy instead. So it's a
core tenet of what we're doing. So the purpose is to make sure that what we're doing can fit into the architectures that come with it.
So that's the, I'm sorry, there's another part of your question, Mark, and I just spaced it.
Yeah, it was, I suppose, is compiling to SQL the end game for you on this? Or is replacing SQL the end game, do you think?
So, you know, there are...
One, Malloy is very efficient because of what it does.
There could be...
You could definitely build a database that spoke native Malloy.
That's not in the cards for us right now, right?
I'm not...
That's not what we're planning on building.
You know, speaking SQL allows us to run in all the places that SQL runs. So that's obviously the first thing that we should do, right? So in order for this to work, you know,
Malloy has to be available in all the places that you type SQL, and we're not there, right?
In order for it to work, you have to be able to have the choice,
and we don't have that choice yet. So we're still very early.
So what about data pipelines?
So we've been thinking about the concept of queries and analysts and so on.
But is this something that you envisage as being used to build more complex data pipelines, really?
Sure.
So we introduced notebooks a while ago, which are great.
So basically, in a notebook, you can define a Markdown section, or you can define a Malloy section, which has some Malloy in it, the output of which will be like a graph or something.
But we've also introduced SQL into these notebooks, where you can have the SQL have embedded Malloy in it. So follow me for a second.
So you have a Malloy query that defines a projection that you're very interested in.
So you're grouping by product and date and you have a whole bunch of measures around that.
And you want to give that to somebody who's using Tableau.
So you write all your semantic model in Malloy, right? So that has all your calculations in it.
And then in your notebook, you add a create table as select, and then you give it a Malloy query.
When it hits that Malloy SQL block and it realizes that there's Malloy in there,
it compiles that Malloy to SQL and then
runs the statement, which creates the table for you or creates a view for you. And so we can do
a lot of the things. So it's a very simplistic dbt-like functionality that we do today, which is
that you can build transformation notebooks that will take data in one state and produce it in
another state for use in other tools. So for data pipelines, you can already use it.
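As a rough illustration of that flow, here is a minimal Python sketch. It is not Malloy itself: the toy `compile_query` function is hypothetical and only stands in for a real compiler, turning a small query description into one SQL SELECT, and `materialize` wraps that SELECT in a create table as select statement. The `orders` table, its columns, and the use of SQLite are all assumptions made for the example.

```python
import sqlite3


def compile_query(spec):
    """Toy stand-in for a compiler: one query spec in, one SQL SELECT out."""
    dims = ", ".join(spec["group_by"])
    aggs = ", ".join(f"{expr} AS {name}" for name, expr in spec["aggregate"].items())
    return f"SELECT {dims}, {aggs} FROM {spec['source']} GROUP BY {dims}"


def materialize(conn, target, select_sql):
    """Wrap the compiled SELECT in CREATE TABLE AS and run the statement."""
    conn.execute(f"DROP TABLE IF EXISTS {target}")
    conn.execute(f"CREATE TABLE {target} AS {select_sql}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product TEXT, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("widget", "2023-08-01", 10.0),
        ("widget", "2023-08-01", 5.0),
        ("gadget", "2023-08-02", 7.5),
    ],
)

# The projection described above: group by product and date, with measures.
spec = {
    "source": "orders",
    "group_by": ["product", "order_date"],
    "aggregate": {"order_count": "COUNT(*)", "total_sales": "SUM(amount)"},
}
sql = compile_query(spec)
materialize(conn, "sales_by_product_date", sql)

# Any other tool can now read the materialized table with plain SQL.
rows = conn.execute(
    "SELECT * FROM sales_by_product_date ORDER BY product"
).fetchall()
print(rows)
```

The point of the sketch is only the shape of the pipeline: the calculations live in one description, the compile step produces a single SQL statement, and the result is an ordinary table that a tool with no knowledge of the model can query.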
Okay.
Okay.
So I suppose if you're going to take on and replace SQL,
you might as well take on and replace dbt as well in some respects.
And so is dbt also something that's in your focus, really,
as something that could be improved on with what
you're doing?
I would rather they took Malloy and embedded it.
I don't want to be in the business of replacing tools.
I would rather the tool is open.
The language is open architecture.
And there are a number of people who are actually embedding Malloy into
their tooling, right?
Which is what we want to happen, right?
So I would rather they pick up Malloy and embed it than have me do it.
And that's one of Carlin's jobs.
Yeah, Carlin.
So you're attracted by the open source nature of this,
and there's no more famous open source community in this area
than the dbt community.
So what would you say to them as you walked into a bar full of people
with dbt T-shirts on?
Yeah, I mean, dbt is a tool that's built around SQL.
And there's no reason that dbt can't adopt other languages.
So you see already that today dbt supports Python.
So dbt could definitely support Malloy as a language.
Now, I think Malloy has a huge opportunity in the data transformation space,
because dbt has proved out that more and more people want to do their data transformations with an infinitely scalable compute layer like the data warehouse.
So that means a lot of these transformations end up getting pushed into the data warehouse using SQL.
Now, as we've discussed many, many times already on this podcast, one of the really core problems, or one of the things that SQL is really bad at, is reusability. So we have this whole discipline called data engineering.
And people like to talk about how data engineering is like software engineering.
But data engineering is really missing one of the most fundamental components of software
engineering. And that's basic reusability. So when you're writing software, you can write
functions, you can write classes that get reused elsewhere. When you're writing SQL queries, that doesn't exist. And that's really what Malloy provides. So bringing one of the most fundamental units or core concepts of abstraction to this discipline that's been lacking it for many years is really where I see the opportunity.
Okay. Okay.
So, I mean, I suppose digging into some of the more sort of details of Malloy.
So things like joins, how does Malloy handle joins?
And how does it maybe make it easier to join data together when people get confused currently with inner joins, outer joins,
all these kinds of things, really?
So how does Malloy handle joins and how does it make that easier to work with?
Yeah.
So, well, first of all,
it makes it so that you really only have to join once, right?
So in every SQL query, you're repeating the join pattern,
so it gets very complicated, right?
But in Malloy, you can declare your joins exactly one time.
And Malloy's joins are much more simple to reason about.
We have three joins. We have join one, which is where you're doing a many to one join.
So think of you have an order and the order has a user. That's a join one.
And then you might have a join many, which is I'm starting with users,
and each user has many orders.
So if I was in the users table and I was joining orders,
that would be a join many.
And then there's a cross join, which is a matrix.
But that's it.
Those are the joins.
If you want to do an inner join,
you can add a where clause to limit the thing that you joined.
And other than that, you don't really need to know too much else.
So again, we thought about it, we looked at it,
we said, you know, these are really the patterns.
We don't need all this other stuff
that nobody really understands anyway.
Let's do it in a way that people can understand it.
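To make the join directions concrete, here is a small Python and SQLite sketch written in plain SQL rather than Malloy. It assumes the default join keeps unmatched rows the way a SQL LEFT JOIN does, which is what the remark about adding a where clause suggests. The `users` and `orders` tables and their contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, NULL, 9.0);
""")

# Many-to-one ("join one" direction): each order picks up at most one user.
orders_with_user = conn.execute("""
    SELECT o.id, u.name
    FROM orders AS o LEFT JOIN users AS u ON u.id = o.user_id
    ORDER BY o.id
""").fetchall()

# One-to-many ("join many" direction): a user row repeats once per order.
users_with_orders = conn.execute("""
    SELECT u.name, o.id
    FROM users AS u LEFT JOIN orders AS o ON o.user_id = u.id
    ORDER BY u.id, o.id
""").fetchall()

# Inner-join behaviour recovered by filtering on the joined side.
filtered = conn.execute("""
    SELECT o.id, u.name
    FROM orders AS o LEFT JOIN users AS u ON u.id = o.user_id
    WHERE u.id IS NOT NULL
    ORDER BY o.id
""").fetchall()

# The explicit inner join gives the same rows as the filtered left join.
inner = conn.execute("""
    SELECT o.id, u.name
    FROM orders AS o JOIN users AS u ON u.id = o.user_id
    ORDER BY o.id
""").fetchall()

print(orders_with_user)
print(users_with_orders)
print(filtered == inner)
```

Order 12 has no user, so it survives the left join with an empty name and disappears once the filter is applied, which is the only difference between the two join styles here.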
What about complex calculations that people always struggle with, things like time
comparison calculations and percent of totals and that sort of thing? How do you make
those easier?
Yeah, so, you know, one of the really nice things that Tableau invented in 2017
was the level of detail calculations. These are basically so that when you're within
a table and you're grouping, you can exclude the grouping.
So the really simple thing is, say you're doing a query that's grouping by state and you're counting the number of airports.
So your table looks like state and airports.
State and airport count, which is the count of airports.
In Malloy, there's an all function, which basically escapes all of the grouping
so that you get a column which would have just the total.
But the beautiful thing about that is that you can use it within a calculation.
So you can have airports count over all of airports count and get the percent of total.
It becomes very easy to write these kinds of calculations.
And Malloy has very sophisticated level of detail calculations.
So this is something that's very difficult to write in SQL.
It's a lot like writing a cube, like a data cube.
Most human beings can't write them.
But Malloy builds it in so that it becomes easy for you.
That's an example of one type of thing that we do.
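For comparison, this is roughly what the percent of total calculation looks like when written by hand in SQL, run here through Python's built-in SQLite. The `airports` table and its four rows are made up for the example, and the scalar subquery is just one of several ways to get an ungrouped total in SQL; it is not a claim about the SQL that Malloy generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airports (code TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO airports VALUES (?, ?)",
    [("SFO", "CA"), ("LAX", "CA"), ("SAN", "CA"), ("JFK", "NY")],
)

# Grouped count per state, plus an ungrouped total that "escapes" the
# GROUP BY, and the ratio of the two as the percent of total.
rows = conn.execute("""
    SELECT state,
           COUNT(*) AS airport_count,
           (SELECT COUNT(*) FROM airports) AS all_airport_count,
           1.0 * COUNT(*) / (SELECT COUNT(*) FROM airports) AS percent_of_total
    FROM airports
    GROUP BY state
    ORDER BY state
""").fetchall()

for state, airport_count, all_count, pct in rows:
    print(state, airport_count, all_count, pct)
```

Even in this tiny case the total has to be spelled out as a second query inside the first, and it has to be repeated wherever it is used, which is the kind of boilerplate the all function is described as removing.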
Nesting is the other thing that we do that is really hard to do in SQL,
but is just absolutely trivial in Malloy.
And it's kind of the thing that's like, it's like describing,
I don't know, kind of like describing sex.
You can't do it.
You have to look at it, play with it.
I can't do it justice.
Like I can sit here all day trying to talk about it,
but really, play with Malloy or you won't really understand it, right?
Yeah, yeah.
That's exactly what I did, actually.
And something that struck me when I was playing around with Malloy
was the fact you've got some data visualization features in there as well.
I mean, how does that work?
And what was the purpose of that?
And, yeah, just talk about those features.
Yeah, so one of the mistakes that everybody makes is that they tie their visualization layer to their semantic modeling layer.
And really what you want to be able to do is you want to be able to declare the calculations and the structure of the data.
And then when you use it, there are lots of consumers of this semantic model.
So one of the things that is consuming it might be a rendering library to do visualizations.
And so what Malloy allows you to do is tag things: anything that's got a name,
you can hang a tag on.
And so if you have a query named query and it looks like a bar chart,
which means it has a dimension and a measure,
you can tag it with a bar chart.
And then when it gets it, instead of showing it like a table,
it'll show it like a bar chart.
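The separation described here, where the result stays the same and a tag only tells a consumer how to present it, can be sketched in a few lines of Python. Everything in the sketch is hypothetical: the `bar_chart` tag name, the `RENDERERS` registry, and the text-only renderers are illustrations of the idea, not Malloy's actual tag syntax or rendering library.

```python
def render_table(rows):
    """Default presentation: one tab-separated line per row."""
    return "\n".join(f"{label}\t{value}" for label, value in rows)


def render_bar_chart(rows):
    """Text bar chart: one '#' per unit of the measure."""
    return "\n".join(f"{label} {'#' * value}" for label, value in rows)


# Hypothetical registry mapping tag names to renderers.
RENDERERS = {"bar_chart": render_bar_chart}


def render(rows, tags=()):
    """Use the first tag this consumer understands, else fall back to a table."""
    for tag in tags:
        if tag in RENDERERS:
            return RENDERERS[tag](rows)
    return render_table(rows)


# One dimension and one measure, the shape that suits a bar chart.
rows = [("CA", 3), ("NY", 1)]

print(render(rows))
print(render(rows, tags=["bar_chart"]))
```

Because unknown tags are simply ignored, a different consumer could read the same tags for its own purposes without the query or the model changing.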
And then, because of all the nesting in Malloy, you can have nested line charts and bar charts and lists, and all that you have to do is just drop a little
hashtag, like, you know, hashtag bar chart or hashtag line chart, in it, and then it works that
way. But it also works for other systems too. Like, our doc system uses it for deciding how to render the page, or you could have machine learning use it for
figuring out what are the other labels of things. So the tag architecture is open architecture,
but it's a separate layer from the semantic layer.
Again, a question people might
be asking or thinking when listening to this is, well, what actually is Malloy?
Is it a server application that we talk to?
Is it something we download?
I mean, what actually is Malloy as in the actual thing you kind of interact with?
Is it a program?
What is it really?
And how do they get hold of it?
Yeah, great question.
So Malloy today exists in what I would say three forms.
The first form, and I think the most important,
is the language itself.
So Malloy is a programming language.
It's a syntax, semantics attached to that syntax,
and it's a set of tools that will compile the Malloy
that you've written into SQL
that runs against a database.
The second and third parts are kind of interlinked,
but there's also a development environment,
which is today our VS Code extension.
And then the third part is the various Malloy runtimes,
and that's kind of where does Malloy code actually execute,
how do you actually use it in your day-to-day.
So those are kind of the three forms. How do people use it today? You know, the actual interaction is most often going to happen
with VS Code. VS Code is kind of our combined development environment and runtime. But we are
actively developing new runtimes that will allow people to execute Malloy code in different
contexts. So for example, we have an npm package, and you can
import the npm package and have the Malloy query compiler service available to you. And you can
also have our various database connectors to allow your application to directly speak Malloy and
use this Malloy package to interface with the data warehouse. We also have a Python package that will allow people with Python applications to do
something very similar.
And then we're in the process of developing a command line interface tool.
So you can imagine setting up a scheduler to execute Malloy queries via our CLI tool.
So you might give it a file.
That file contains some Malloy queries, and those queries end up getting executed on your database.
The command line is how you'd, for example,
do a transformation pipeline.
Okay.
Okay, so if somebody was a dbt and Looker developer at the moment,
where would it fit into their workflow,
and how would it interact with or complement
or work with Looker, really, for example?
So is it
more the dbt side, or what, really, in this scenario?
Well, the
transformation pipeline is great for building a set of views in your database that other people
could consume. So that's the first thing. Like, the very first thing is you go and you build a
model. You want to publish governed data to other tooling. You can create views for other tools to consume.
And you just go into VS Code.
You build a notebook and do that.
So that's the first, easiest thing that you could do with almost no effort.
But there are other things to do.
I'll leave it to Carlin.
Yeah, so I think in the scenario that we're outlining here,
it does fill a little bit more of the dbt side of the equation
in terms of transformation.
I think ultimately our goal is to cover
or to use Malloy for the whole spectrum.
Again, we want to replace analytical SQL,
and that's kind of anywhere where you run SQL.
That could be dbt.
That could be within your BI tool.
Instead of running SQL in
those places, Malloy should be the language. Now, we have seen some really
compelling use cases in the wild, where people are writing their transformation pipelines in Malloy,
and it is much easier to maintain, much more concise, for several reasons. The main one being
the reusability. So, like, you're building pipelines, and oftentimes in those pipelines, you need the same calculation run over and over again.
And in a SQL pipeline, that means the data engineer has to write that calculation in many different places and has to maintain that calculation in many different places.
Whereas in Malloy, that calculation gets placed in one part of the model.
And therefore, there are fewer places to maintain,
fewer places to break.
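The maintenance argument can be shown with a short Python and SQLite sketch. It imitates the idea of a calculation defined once in a model by keeping the expression in a single constant and reusing it across two pipeline steps. The `net_revenue` definition, the `orders` columns, and the helper `revenue_by` are all invented for the example; in hand-written SQL the same expression would normally be copied into each query.

```python
import sqlite3

# The calculation is defined in exactly one place.
NET_REVENUE = "SUM(amount - discount)"


def revenue_by(dimension):
    """Build one pipeline query that reuses the shared calculation."""
    return (
        f"SELECT {dimension}, {NET_REVENUE} AS net_revenue "
        f"FROM orders GROUP BY {dimension} ORDER BY {dimension}"
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (product TEXT, region TEXT, amount REAL, discount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("widget", "east", 10.0, 1.0),
        ("widget", "west", 20.0, 2.0),
        ("gadget", "east", 5.0, 0.0),
    ],
)

# Two different steps in the pipeline, one definition of net revenue.
by_product = conn.execute(revenue_by("product")).fetchall()
by_region = conn.execute(revenue_by("region")).fetchall()

print(by_product)
print(by_region)
```

If the definition of net revenue changes, only the constant changes, and every query built from it picks up the new logic.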
The other thing that I think is maybe a little underappreciated
is the syntax of Malloy is just much simpler
and much nicer, much more compact than SQL.
And I think that's something that's easy to maybe just say,
yeah, it's incrementally better.
But I think having much better syntax
has pretty profound implications
on what the analysts or what the data engineers choose to express. So one of my favorite quotes
in the world is, you know, we shape our tools, then our tools shape us. And I think, you know,
initially, we shaped SQL in the 1970s. And, you know, the analysis that we've chosen to do has been really
shaped by the language design choices. You know, Lloyd has this joke about
SQL, about how there's two write-only languages in the world. One of them is Perl,
the other is SQL. And, you know, it's really easy to write SQL, but then you come back six months later,
and you look at the SQL query that you've written and you think to yourself,
you know,
what the hell is this?
Wait,
what idiot wrote this?
And you look at the git blame.
It's like,
Oh,
that was me six months ago.
So I suppose, from my side,
if this was anybody else other than Lloyd doing this,
I'd worry this was going to be like the Esperanto of
query languages, in that it would be
intellectually very interesting but used only by a
small bunch of cranks, sort of thing. But with this, you know, you've
always put some thought behind it, and it's solving a real problem. How are
the two of you going to make this actually resonate and have an impact, really?
Because I'm sure you're not doing this for the sake of it. So what's your strategy
for really getting this to take off? And, you know, what would be
a goal for you, or a sign of success for you, really?
Somebody at Looker once said, we'll
be successful when we have a thousand true fans. That's our goal. We have to get it into enough places that you can use it in your day.
What we found is if people have a chance to use Malloy in their day, they love it.
I don't think I've ever heard anybody say they love SQL.
But there is a real joy in making – if you make people more powerful, they will love you for it.
And that's what we're trying to do, right?
If we make you more powerful, you can do more by using this tool, then you'll be successful.
And the thing that we have to do is make it so that it's easy for you to adopt.
It's easy for you to learn.
And so the goal is if you know SQL, you should learn Malloy really fast.
Like basic Malloy should come to you very quickly.
And we've been relentless about it.
We're actually on the fourth iteration of the language,
if you can believe it.
We've restructured the syntax of the language four times so that it's more
readable.
And we keep going back and seeing, can you read it?
And so we're getting there.
I think we're at the precipice of getting in.
And so we're trying, Mark.
I don't know that we'll be successful, but we're trying.
I can see a goal for this would be people learn this.
If someone new coming into our industry or maybe learning this at college
or whatever, they learn this rather than learning SQL.
That for me would be, when that happens, you've kind of done your job really, haven't you?
Because I suppose you've opened up this topic to a lot more people.
It's a much more understandable and easy-to-learn language.
And they learn this and don't learn SQL.
Would that be a goal, do you think?
I mean, or certainly a kind of...
That'd be lovely. And it's reasonable
too. I mean, we used to teach assembly language in college and then we started teaching
C, right? Because C had the right reusability abstractions
and portability that assembly
language didn't have. And I think we're kind of in the same state with SQL today. So I'm hoping
we can.
Yes, that's a vision.
Yes.
So, Carlin, for you, so you're on the product side and you're looking after the open source project
or certainly involved in that.
So what's the kind of roadmap for Malloy going forward then, really?
And what do you hope to achieve with the open source project?
Yeah, I mean, I think Lloyd said it really well.
If in five years, 10 years...
I think one other thing to add there is we're under no illusion that this is going to happen
overnight. I think we take a lot of inspiration from the TypeScript project. TypeScript is 12
years old. I think it's 12 years old at this point. You can fact-check me on that. But it's
only within the past few years that it has really started getting mainstream adoption.
So, you know, language adoption takes a long time, and it takes just relentless hand-to-hand work where we need to step in and teach people and show people the way.
So, you know, there's no hack here.
There's no illusion that we're going to wake up overnight and everybody's going to see things our way.
So a large part of our focus in terms of Malloy, in terms of language,
there's a lot of language functionality that we are still really excited about that hasn't really been...
Yeah, yeah.
So what's coming in the short-term roadmap, really?
What's on the short-term roadmap for Malloy?
Yeah, so I think one of the big things that we're excited about is parameters.
So parameterizing queries.
I think there's also some functionality around –
okay, so let me step back a little bit.
I think one of the other things that Lloyd mentioned here
is that people love us when they can use us in their day-to-day.
And I think the focus right now on the Malloy team is figuring out that runtime aspect. Because
right now, you can run Malloy in your VS Code environment. And it's really easy for you to
write queries and share data. You get a really great feel for the language. And you can experience
the joy of Malloy in your VS Code environment. But I think one thing that is not really where it needs to be yet
is that runtime environment where people can extend
that personal experience that they had to the rest of the organization.
So giving people the ability to run Malloy
in more places, giving people the ability to run Malloy
as part of their day job.
So, you know, that CLI tool that I mentioned
is sort of the V0 of that, of like, hey, I can actually take these queries that I've run,
and I can materialize assets in my database using, you know, my semantic model to create
tables that other tools in my ecosystem can consume. So that's kind of where our efforts are focused in the short term.
So primarily on kind of the language front,
there's a lot of features and functionality that we're really excited about.
And then on the runtime front.
Okay.
Okay.
Fantastic.
And one last question for me,
the name Malloy,
where does that come from?
Just out of interest.
So if you Google the Urban Dictionary,
let me go do that here for Malloy.
I'll read it to you.
It's essentially a man who plays by nobody's rules, has nothing to lose, but always gets results.
While his superiors are often infuriated by his general disregard for established or authoritative norms in his given career,
his success rate ultimately appeases their anger.
I basically have real authority problems. So most of the people on our team,
we kind of like to see it our way and we're going to go for it. And
it's a little unorthodox, but that's how we're going to get results.
If they go to www.malloydata.dev, they'll get our web page, and the best
thing to do is just to play with our documentation and see the use cases there. That's the easiest way
to learn it: go open the documentation and then click on any of the VS Code links, and
it'll let you play with it.
Fantastic. And, Carlin, what about the Slack community?
Yeah, the Slack community is probably the best place to showcase what you've built,
ask questions, give feedback on the language.
Lloyd and I are both pretty active on the Slack channel.
I think at Looker, one of the things that I loved about the company,
I never worked there, but I was a very loyal customer.
It was kind of a kitchen table attitude, where the
kitchen table is where people go to learn. It's just the community of people where
no question is stupid. Everybody should feel free and safe to ask questions about how to do
something. And I think the Slack online community is kind of our equivalent of that kitchen table.
Fantastic. That's really good. Well, thank you very much, both of you.
It's been a great conversation and yeah,
I've certainly been very impressed with what I've seen so far with Malloy.
So best of luck and thank you very much for coming on the show.
Thanks, Mark.
Thank you, Mark.