Drill to Detail - Drill to Detail Ep.59 'Looker, dbt and Digital Analytics Today' With Special Guests Tristan Handy and Stewart Bryson
Episode Date: October 2, 2018

Mark Rittman is joined in this Looker JOIN 2018 special by long-term friends of the show Tristan Handy from Fishtown Analytics and Stewart Bryson from Red Pill Analytics, to talk about dbt and enabling data engineering for data analysts; the state of modern data analytics consulting today; and what we're looking forward to hearing about at next week's Looker JOIN 2018 conference in San Francisco, CA.

Show notes:
Looker JOIN 2018, San Francisco
dbt (data build tool)
"Future Proof your Analytics Stack"
Transcript
Welcome back to the Drill to Detail podcast after our long summer break. I'm your host, Mark Rittman. Next week is Looker's annual conference in San Francisco, and I'm joined in this Looker JOIN 2018 special episode by long-term friends of the show, Tristan Handy and Stewart Bryson. So Tristan, welcome back, and
why don't you introduce yourself and tell us where you come from. Yeah for sure, thanks so much and
glad to be back. My name is Tristan Handy and I'm the founder and CEO of Fishtown Analytics.
We do two things, we help venture-backed companies implement analytics stacks,
and we also build an open-source data modeling tool called dbt.
Fantastic.
And Stewart, nice to have you back as well.
Do some introductions and tell us where you come from, what you do.
Thanks so much, Mark.
It is a pleasure to be back.
It's also a pleasure to be with Tristan. This is going to be really interesting.
So I'm CEO and founder of Red Pill Analytics. We're an analytics company about four years old.
And, you know, we work primarily or at least our history is primarily from the Oracle stack, but we're only about 60% Oracle now, and the rest is pretty much public cloud.
And we use a lot of the tools you discuss on this podcast,
so it's going to be a lot of fun today.
Great. Okay.
So, Tristan, when we did the first interview
in the first episode with you a while ago,
you talked about this tool called dbt that you'd built.
And I must admit, at the time,
I hadn't really had a chance to look at it,
and I hadn't really got what the context of it would be but since then i've
been i've noticed it being kind of mentioned a few times on projects i know stewart's had some uh
had some exposure to it as well so why don't you just tell us what this dbt tool is and what is
the problem you're trying to solve with it first of all yeah uh for sure the and you had no good
reason to know about it a year ago because almost no one was using it.
So dbt stands for data build tool.
Currently, it is a purely command-line tool that helps analysts model data in their data warehouses.
And it really grew out of a need that I had working at my last company. I actually worked at RJMetrics,
which became Stitch. And so we were big users of the Stitch product. And we had all of our
data loaded into Redshift. And I was actually trying to analyze marketing automation data. It was from Pardot, actually.
And I was just having a real hard time.
I was a longtime SQL user, but this was the first real experience I'd had analyzing data in this Redshift environment. I found I needed to build these pre-aggregations, because otherwise my eventual analytic queries were so freaking complicated that I couldn't keep track of what was going on. And so my co-founder Drew and I decided that we just needed this thing. And as we built more and more of it, we found that it was more and more useful. And we couldn't believe that there wasn't something out there to do this. So a big part of the impetus of starting Fishtown was just to try to bring this thing to life, and see what it became. Okay, okay. So I think I can understand what you're saying about,
I suppose, you know, needing a tool like this. I mean, I was doing some work on Redshift recently, and the number of times I was using the same kind of design patterns, things like trying to find out what the first transaction was in a set of transactions, or using a CASE statement to break out and decode a field, the amount of repeated code I was creating in there was terrible. And I was looking at dbt at the time, and there were two things particularly it potentially did: this thing about not repeating yourself, but also the testing framework you had there as well. I mean, was that the kind of thing you were thinking of, really, when you were putting it together?
Yeah, I mean, to be honest, the very initial need was something so trivial and stupid. But when you have a data warehouse that's loaded by modern data pipelines like Stitch and Fivetran, the data shows up in your warehouse in a very raw format, and in each different integration, the table and field names are aligned with the API endpoints that those tools are getting the data from. And so in some integrations you'll have underscores in between the different parts of the name, and in some of them the casing will be different.
And it was just very challenging to have all these different data sources that had different
styles of naming.
And so literally the very first thing that we did was just create a set of what we now call base models that were one-to-one with the raw data, but provided renaming for all the fields and for the table itself, and then filtered out test records, records that shouldn't exist. And then we did all of our analytics on top of these base models, and everything felt very clean and consistent. But you're totally right: doing that type of transformation downstream, whether you're creating very simple case statements or going all the way to Kimball-style star schema modeling, that's stuff that people use dbt for as well. Okay. So, I mean, you've been looking at dbt,
and you've got quite a background history in, I suppose,
templated code and ETL tools and so on.
What was it that interested you about dbt,
and what was it that, I suppose,
made you think this is worth looking at, really?
Yeah, I mean, one of the founding principles of Red Pill
when we founded the company really was we wanted to reintroduce analytics, data and analytics, I'd say, to the software development lifecycle.
I mean, we started trying to make enterprise-y tools work in that mode, which is not necessarily easy.
And we've written several things and frameworks and that sort of thing to try to make that happen.
And then when I discovered dbt, I just felt kindred because the whole idea of trying to make a tool,
whether that be some sort of ETL tool or data integration tool,
work as your sort of total source to target in today's environment, I think, is sort of a fool's errand.
As Tristan mentioned, you know, Fivetran and Stitch deliver your data very efficiently to your cloud-native data warehouses.
And so we don't really need to focus on that side of it. And
what I loved about dbt is it really focused on the back-end side. And now hearing Tristan sort
of describe his desire to build this makes perfect sense to me. When you look at dbt,
it's very much about building one interface at a time, one step in your process at a time, but not just that,
but doing it in a way that works in modern development environments. So I'm thinking
continuous integration, continuous deployment and testing. These enterprise ETL tools and data
integration tools that we've both worked with in the past just never really thought about
what does it mean to test ETL code?
What does it mean to do the sort of things that developers are used to doing, which is
stubbing and mocking and thinking about what it means to write a test where you have the
anticipated results of what you expect that piece of your pipeline to do and be able to test it.
What dbt does is sort of abstract out just the real basics of what you need to write
in a templating engine.
And it really just takes care of the rest.
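To make the testing point concrete: in dbt, tests are declared in a schema file next to the models, and `dbt test` compiles each declaration into a SQL query that looks for violating rows. This is only a sketch, and the model and column names are hypothetical, but `unique`, `not_null`, and `accepted_values` are dbt's built-in schema tests:

```yaml
# schema.yml (sketch; model and column names are hypothetical)
version: 2

models:
  - name: base_orders
    columns:
      - name: order_id
        tests:
          - unique      # fails if any order_id appears twice
          - not_null    # fails if any order_id is missing
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
```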
So I'm a big fan, Tristan.
Thanks.
I really appreciate that.
Yeah, and you did a way better job there than I've ever done before.
So I need to bring you along when I'm trying to convince people to use it.
Tristan, I mean, so for anybody listening, what we've described here is interesting,
but it's almost in an abstract sense.
You've described, you know, reusable code and templating and so on.
But Tristan, just maybe a very kind of basic level, explain, you know, I mean, what is dbt in terms of how people might kind of encounter it and install it and use it and so on?
Because it's like writing SQL, from what I understand, but it's in a reusable kind of templated way and so on.
I mean, how do people actually use the product?
Yeah.
So the goal is to make data engineering accessible for analysts. And so there are certain things in the workflow that are a little bit unusual for analysts to have to do.
But I think that we've tried to make the whole process as accessible as possible.
So you install dbt by firing up your command line and you do a brew install dbt.
And the install process is fairly
straightforward at this point. You make a blank project, you supply some database connection
credentials, and then you get to work writing transformations. And every new SQL file that you
create in your project becomes a new table or view. And the analyst doesn't actually have to write CREATE TABLE or CREATE VIEW,
sort keys, dist keys, partitions on BigQuery.
The analyst doesn't have to think about any of that stuff.
They express their business logic in a select statement.
And then dbt figures out how it needs to materialize that into the database, including if that needs to be materialized in some incremental view that builds over time.
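As a rough sketch of what Tristan describes, a dbt model is a single SQL file containing just a SELECT, optionally with a Jinja config block choosing the materialization; dbt generates the surrounding DDL itself. The source table and column names here are hypothetical:

```sql
-- models/base_orders.sql (sketch; raw.orders and its columns are hypothetical)
-- dbt wraps this SELECT in CREATE TABLE / CREATE VIEW, or incremental
-- insert logic, depending on the materialization chosen below.
{{ config(materialized='table') }}

select
    id          as order_id,     -- rename raw, API-style field names
    accountid   as account_id,
    createddate as created_at,
    status
from raw.orders
where is_test_record = false     -- filter out test records
```

Running `dbt run` builds the table; changing `materialized` to `'view'` or `'incremental'` changes only the generated DDL, not the business logic. This also doubles as a "base model" in the sense described earlier: one-to-one with the raw table, renaming fields and filtering out junk.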
Okay. So is this an ETL tool or is it kind of an analysis tool or both or what, really?
Yeah, I think that the – so people are so used to talking about, quote unquote, ETL. And now the order of the letters has been mixed up so that it's really ELT in the modern stack.
But I think that really tools like Stitch and Fivetran are really EL tools.
And dbt is the key part of that. So if Stitch and Fivetran are responsible for
extracting and loading your data into the warehouse, then dbt is the transformation
layer that then sits on top of that. Because if you, you know, originally in the modern stack,
people were just loading in raw data and then trying to write reports on top of that. And
that's a very challenging thing to do, as anyone who's worked in data warehousing in prior generations will know. So dbt is trying to fill that hole in the modern stack. What I love about dbt is the concept that when we use Fivetran or Stitch,
we can get our data loaded to a cloud data warehouse. And we can knock out a lot of requirements sort of initially.
And what I really like about dbt is that for those requirements
that can be solved by the delivery of sort of the raw schemas
from the source systems,
we can have analytics developers go ahead
and start trying to knock out some of those requirements
while the back-end transformation developers, whether those be analysts or real developers,
start to address the not-so-low-hanging fruit.
And I think that in traditional projects, we spent so much upfront time ETLing
before we could even get to a state where we could build analytics.
And so by the time we built analytics on top of these transformed models, that's the first time we find out from the user or the business that we didn't really capture their requirements correctly.
What I love is that we can go ahead and start building analytics, building it on the Fivetran or Stitch delivered data models, get some stuff in front of the user or the business, start iterating with them.
And then dbt responds to those requirements in such a reactive way that we're actually doing things with data instead of architecting projects. One of the main goals was to have really fast iteration speed, because, like you said, the process of answering a question is that you're going to do some analysis in a front-end tool, and then when you change a model in dbt, you should be able to deploy that in like five to ten seconds. And so that speed allows analysts to actually get their work done so much faster. Okay, okay. And that's quite an interesting lead into something else I wanted to talk to you about, Tristan, and also to Stewart as well.
I mean, about, I think the last podcast we talked, we talked about the kind of the analytics market and the sort of technical stacks and so on at that point.
And since then, you kind of did a podcast or you did a webinar, I think, with Mode Analytics.
And you generally talk about things being, well, you tell us, what do you think the software stack, the analytics stack is like at the moment?
And where do you think it's kind of going?
And I mean, just start off by saying that, and then we'll kind of maybe have a discussion around that with Stuart as well.
Yeah, I think that there are several pieces of the analytics stack that have kind of evolved since Redshift was released in, I guess that was 2012 or 2013. But the advent of Redshift, I think, is kind of the forcing function that really caused the modern analytics stack to come into existence. It was first Redshift, and then both Fivetran and Looker came out at a really similar time. Then you start seeing tools like Snowplow pop up to dump data directly into Redshift. And then dbt came along in 2016. Matillion's another good tool that fits in that layer of the stack.
But so you now have this whole modern tech stack
that is made up of event pipelines,
EL tools, extraction, loading, like Fivetran and Stitch,
data warehouses, data transformation,
and then BI, these five layers.
And what I think is so interesting is that the layers of the stack have been very stable for the past several years, and even at this point, the players in each of those parts of the stack have remained fairly consistent. There hasn't been some brand new BI tool since the era of, like, Looker, Mode, Periscope, to come out and grab a ton of the market. So I think that we're starting to see things stabilize a little bit for this era of the tech stack. Okay. I mean, so for Stewart and I, we came from the Oracle world, and from the more, I suppose, enterprise databases and so on. And I think, certainly for me, when Redshift came out, I mean, I was obviously aware of it, and I was aware that it was disruptive and so on, but I don't think Redshift had quite the same resonance with us at the time. But obviously now we see the kind of role that it has. I mean, Stewart, you came into this around the same sort of time as me. What was your take on when Redshift came out, and the way the market has gone, really?
So the beautiful thing about Redshift at that time was the idea of provisioning something quickly. And I think that when you come from the world that you and I come from, Mark, the sort of legacy
world just the idea that you could provision a data warehouse with clicks, and then you discover
underneath that it's really an API call, so you could automate that. I mean, that was just
revolutionary. But now what I look at, like BigQuery and Snowflake, really has taken that
model further, which is separating compute and storage. And I think that is just gangbusters for what we
can do. And I'm sure Redshift's going to address some of these issues. It doesn't quite have the elasticity of the other two I mentioned. But I think the idea of, I mean, Mark, on these
legacy projects that we worked on so many years ago, I mean, how long did it take for the data warehouse machine to actually be stood up?
I mean, you know, we always built in a couple of weeks in the project to kind of stand around and wait.
And I think that's the thing Redshift really changed: you can get to work on day one.
Yeah, I think the worst one was that the project went for a year and a half,
and the machine actually got delivered about a week after the project finished. So that was the Exadata one that you and I probably know about, in London, as well. I mean, so one of the things that you've talked about, Tristan, is the missing parts, and where there are opportunities for the stack, maybe more layers to be put in, or things that are missing at the moment. I mean, at this point now, what do you think is still missing, or where do you think there are opportunities to do things better, or add features into the stack we work with now? Yeah, I definitely am waiting for a lot of products to come out, and I wrote that blog post almost a year ago now. And there actually hasn't been...
So the first one that I mentioned there is that I feel like with a combination of data network effects plus AI or ML,
there should be a way to cleanse data more effectively than there is today.
You know, there are thousands of people now who all have Salesforce data loaded into a data warehouse.
And yet everybody has to do the same manual cleansing
over and over again.
It's still a very human thing.
And maybe Salesforce isn't the perfect example.
Maybe that is something a little
more automated like Stripe or something like that. But it seems to me like that problem will be
solved at some point. Probably the one that is the biggest pain point for people that we work with
today is what I called data reintegration. There's probably a real term for this, but that was what I had called it,
where you've built this tremendous amount of value
in your data warehouse.
You've integrated data from all these different sources,
then you've applied business logic to it.
But then if you want to get that data
back into the systems that actually run your business,
like you want to get it into Salesforce so that sales reps can do things with it, or you want to get it into a marketing automation platform so that it can trigger campaigns, there's really not a great answer for that today. There are a couple of answers that, you know, you're starting to see take shape, and Looker does a little bit of that, but I don't honestly think that's a mature space with a good answer today. Okay, sure. I mean, you and I know that kind of area quite well. What are your thoughts on that, and also your thoughts on the stack at the moment? Yeah, I mean, I agree with Tristan. I think machine learning has promise, and whether or not it will deliver on that promise is, I think, debatable. But, you know, the idea that you open these tools,
whether they be analytics tools or they be integration tools, they should have an opinion.
I mean, machine learning should allow these tools to have an opinion. Here are some
expected joins I think you might be thinking about from the data integration side.
Here's from the analytics side. Here are some opinions that the tool can make
because they have machine learning underneath them that can give you some visualizations when
you open the tool. I think what we'll see, and I'm hoping that we'll see from both the analytics
and the integration stacks, is just these tools having an opinion when you open them and go ahead
and do some of the work for you without you having to tell them to. One of the tools that does do this to a certain degree is
Glue, so Amazon Glue. At least there's a crawler in the background that's going ahead and finding
your schemas and taking a first pass at defining those schemas. I think we'll build on those sorts of things moving forward where these
tools we use because of advancements and what machines can do on their own. We'll start to see
these tools, you know, not give us blank slates or blank palettes when we open them and start to
help us sort of guess business logic and improvise in the delivery of analytics and data in such a way that
now all we're really doing is tweaking it or really defining what our, you know,
specific needs or use cases or requirements are. Okay, okay. I mean, so do both of you think that the inevitable thing with the market is you'll start to get consolidation? At the moment we've got these very separate companies doing their separate things: you've got Stitch, you've got Fivetran, you've got Looker. Do you think there's an opportunity or a need for an equivalent of, say, Oracle to come along and buy up these companies and create an integrated stack? Or, Tristan, do you think there's benefit in having these things separate and modular now? I think that modular is more customer-centric for the slice of the market that we work with.
The VC-backed startups are investing in smart data talent today. And I think that people want control; they want the ability to put together best-of-breed
solutions.
Is that the right solution for the enterprise?
I honestly don't really know the answer to that. It may not
be. And historically, it clearly hasn't been. One of the things that I do think is interesting
from a consolidation perspective is that the warehouses, and especially BigQuery, are starting to do more than traditional analytics
under the hood of a SQL prompt,
which I think is an interesting type of consolidation.
They seem to believe that SQL is the lingua franca
and that you're going to be able to do more and more things from SQL.
They recently released this regression capability.
You can now create a, quote-unquote, machine learning model inside of BigQuery,
and then it automatically keeps the weights up to date,
and you can query it directly in BigQuery.
As a function call.
I mean, as a function call. It's brilliant.
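The feature Tristan is describing is BigQuery ML, where training and prediction both happen in SQL. A minimal sketch, with a hypothetical dataset and columns:

```sql
-- Train a linear regression entirely inside BigQuery
-- (analytics.weekly_metrics and its columns are hypothetical)
CREATE OR REPLACE MODEL analytics.revenue_model
OPTIONS (model_type = 'linear_reg', input_label_cols = ['revenue']) AS
SELECT ad_spend, site_visits, revenue
FROM analytics.weekly_metrics;

-- Query the model "as a function call", directly in SQL
SELECT predicted_revenue
FROM ML.PREDICT(MODEL analytics.revenue_model,
                (SELECT ad_spend, site_visits FROM analytics.next_week));
```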
Yeah.
So, I mean, Stewart, we've seen this before.
We've seen from the Oracle world that kind of,
first of all, sort of consolidation,
but also SQL being the lingua franca of everything
is a story we kind of know well, really.
I mean, what do you think on this?
Do you think there's going to be consolidation?
I mean, also, I'm interested to understand from you, Stewart,
we've looked at things like, I suppose, Oracle's data warehouse cloud service that's trying to copy, or at least be similar to, the ease of provisioning that you get with BigQuery. Do you think that players like Oracle come along here and just repeat history, or what? You're putting me on the carpet there, aren't you? So I think what Oracle has today
with their autonomous data warehouses
is more similar to, say, Redshift
than is to, say, BigQuery or Snowflake,
which is still very much local storage.
It's still very much non-elastic, I'll say. I mean, there's definitely some elastic
capabilities within Autonomous, and we've got a few customers using it. But I think the key is the complete separation of storage and compute, you know, serverless. I mean, what Autonomous is not is serverless at this point. I'm sure they'll get there. As far as the modular discussion, I mean, I completely agree
with Tristan. I don't see a need for a complete stack. I think that's regressive in the way we've
moved forward with these modern stacks and thinking about the cloud giving us APIs, whether
those APIs are SQL-based functions, which I still consider an API, in ML, or just real honest-to-goodness
RESTful APIs that we can just call. I mean, building the plumbing is what we used to have
to spend so much time doing, Mark, is building this integration and plumbing. And actually,
the cloud vendors are exposing that. I mean, you look at Looker, you look at Stitch, you look at
Fivetran, you look at Mode. These things are all RESTful-based. And the idea is that we can plug them together in the way that makes sense for our requirements. And I hope it doesn't change; I hope we move more toward modular pieces that can be plugged together to get just exactly what we need. Okay, okay. And it's interesting, I mean, again, leading into the Looker event that's happening shortly. You know, this should go out around the same sort of time that the Looker JOIN conference is happening, and
I was there last year, and I was impressed with what they were bringing out. But I think
back to the point that Tristan made a lot of the features they were bringing out were things that
either I'd seen before or they were enterprise features. I mean Tristan what's your take on
Looker the product at the moment and perhaps kind of maybe what you're looking forward to seeing at the conference being announced, really?
What's your sort of what's your state of the union really for Looker at the moment?
Yeah, Looker still occupies an end of the market that nobody else does. It's amazing at allowing analysts to write code that describes the relationships in their database
and then painting a user interface in front of business users that they can use to drag and drop
and create reports and dashboards. And I think that Looker's ability to translate the efforts
of these analysts and business users into functional reports and
SQL that actually performs, in most cases, performs pretty well, that doesn't exist
anywhere else on the market. And I'm a little surprised that it doesn't exist anywhere else
in the market because it seems like such a big problem. But they still continue to be the very best at that
from my perspective.
And that's what we use Looker for.
We don't really use Looker for data actions, and we don't use some of its more sophisticated permissioning sometimes.
But that core query model, the LookML-to-explore-to-dashboard process, that actually has been very stable since about 2015. And I think that most of what has been released since then, even Looker 5, which I did like, was mostly kind of extensions on top of this core experience, and not real innovations on top of that core experience.
Okay, okay. So, I mean, we'll come back to those points in a second. But Stewart, what's your take on the state of the world with Looker at the moment? I mean, like me, it's a relatively new thing; the last couple of years we've been using it, and there's been a lot of growth there and so on. But what do you think are the key selling points of Looker, and the things that perhaps you want to see announced at the event this year? So, I mean, first off, Red Pill is exhibiting at JOIN this year, so we're really excited about that, and we're going to have quite an involvement while we're there. So, you know, we have open eyes in that experience, and we're really excited to be truly invested in the Looker community. As far as, you know, one of the things I wanted to sort of say
while Tristan was speaking is it's interesting that, Mark, the tools we worked on for years were trying to
abstract away SQL. They were trying to make it so that you didn't have to write SQL, or at least
in dbt and Looker's way of thinking, you wrote a SQL-like template. And all the while, the big data technologies were moving back to SQL at the same time. I think it's interesting that at some point in the enterprise tool set, it was decided that SQL, or SQL-like things, were not the way to express data. And I haven't seen
anything that expresses data as effectively as SQL.
I mean, from the days of being a database administrator, which is where I started, SQL was always the way I thought about data, and it still is the most powerful. So I like seeing tools use SQL as a sort of building block, or at least an interface between tools, and us moving more toward that direction. We've got a project going on right now where we're doing data integration in Kafka using KSQL, and it's just so enjoyable to think that SQL is not a dirty word. And I think, if there's one thing I could say, a theme that kind of ties together all the different things we're talking about today, it's that SQL is alive and well. It's being promoted as a first-class citizen in a lot of the open-source technologies that didn't have it: Apache Beam, Spark (which already has it), KSQL on Apache Kafka. And the thing that Looker got right is that we don't need to abstract that away into a GUI. There is an expression language that should be defining our data sets, and it's something very, very SQL-like.
And I think that the LookML and sort of dbt way of expressing SQL in easy-to-use, bite-sized morsels, kind of a model at a time, is really exciting. So I'm hoping just to see a whole lot more investment from Looker in that LookML model. I mean, that's the thing that for us really drives enterprise customers to consider non-traditional, non-big-stack software: that LookML model. And that's the thing that we can take to bigger customers. So on that point, and I'll hand this off to Tristan in a second, on that point there was a lot of attention and talk last year about data blocks and analytic blocks and source blocks, and this general kind of templating, or creating these pre-built or pre-packaged analytic solutions in templates. How well do you think that's worked, Tristan? Has that been useful? Has there been some take-up of that in the market?
Yeah, I agree with the priority that's placed around this. More and more businesses are using SaaS products as their systems of record. All those SaaS products have identical schemas, or rather, all the users of those SaaS products have identical schemas to one another. And then when you load the data in via common tools, Stitch, Fivetran, or other smaller ones, the data looks the same when it gets to the warehouse. So it seems like an obvious thing that you want to do. And we've thought a lot about that problem with dbt as well.
Where the rubber meets the road is that no data are ever quite the same.
And so you not only need to figure out how to distribute this code to large
numbers of people, but you need to figure out what are the ways in which it makes sense for
those businesses to customize that to fit their own unique environment. But while they customize it,
you have to make sure not to break the upstream link to the core package.
Because one of the main reasons you want to use a package that's been developed by somebody else is that you will inherit the upstream improvements that get made. And you lose that if you break that link completely, for example, by copy-pasting a bunch of the logic so that you can then make changes to it. And I think that the ways of distributing the code with blocks are great, and the intention is there, but I don't know that we've evolved into this sophisticated package management system where, you know, a thousand-plus companies can all collaborate on common data sets together. I just don't think we're quite there yet.
Okay, okay. So Stewart and I are quite used to this idea of packaged code and packaged solutions and so on.
And Stewart, as the person that introduced you to agile development and Git, you know, I feel like I can ask you: what's your opinion on this, really?
You know, you must have encountered this and thought about this as well.
What do you think about blocks and what might you do differently
or hope they might announce differently around that sort of
area? I'll let you get away
with that, Mark.
I remember the first time you checked out a Git
repository, you checked it out into your
Dropbox folder.
We'll just let that one go.
All jokes aside... That's where my code is still stored, yeah. Yeah, exactly. So I think, I mean, the
Git side of this is obviously, you know, so important. I mean, we're just
now seeing,
and in some cases not seeing,
traditional tools move in that direction,
and it just seems mindless.
I mean, when you talk to the people that are building these big box analytics tools,
they're obviously using Git and committing
and continuous integration,
but, and this is broad strokes here,
they don't think analytics or data integration
is the kind of thing that is a subset of software development, to use Tristan's terminology.
And I very much think it is.
And so one of the first things about Looker that blew me away was its built-in integration with Git, and it seems so
easy and obvious. And I think that boilerplate is kind of the term we use for copying and pasting things around
unnecessarily. And I absolutely hope to see more enhancements from Looker in that direction to make inheritance more of a first-class development style.
They do have LookML extends, so you can sort of define core models and extend them.
And I think that's a great move in the right direction.
It only allows you to extend on top of something that's been built, but true inheritance allows
you to override. And I think what would be great is to see the extends functionality allow us to
start doing easy overrides of things that exist in sort of core packages. And I think if we see that,
and hopefully we will, I think it'll be easy to express
a difference from a core model as just sort of an incremental change. And I think that's,
hopefully I'm not putting words in your mouth, Tristan, but I think that's kind of what you're hoping for. Yeah, I think that that's right.
And sometimes it's really very basic stuff.
I think that the ability to just specify configuration parameters for Blocks as simple variables would be nice.
And the way that Looker frequently deals with state in the Liquid layer is a little bit wonky, and you can't always do the things that you want with it.
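As a rough sketch of the LookML extends mechanism the discussion refers to, this shows a core view being inherited by a local one. The view and field names are invented, and this is illustrative rather than a definitive reference:

```lookml
# A core view, perhaps shipped in a Block (all names are hypothetical):
view: orders_core {
  sql_table_name: analytics.orders ;;

  dimension: order_id {
    primary_key: yes
    sql: ${TABLE}.id ;;
  }

  measure: total_revenue {
    type: sum
    sql: ${TABLE}.amount ;;
  }
}

# A local view that inherits everything from the core view and layers
# company-specific fields on top of it:
view: orders {
  extends: [orders_core]

  dimension: region {
    sql: ${TABLE}.sales_region ;;
  }
}
```

The richer behaviour the guests are asking for is the ability to cleanly redefine pieces of `orders_core` itself from the extending layer, so a local project's delta from a shared Block stays small and explicit.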
Okay. Okay. So, I mean, obviously the Looker event is in a week's time, but in terms of the open source market and just generally other innovation you're seeing in the market and analytics, I mean, Tristan, certainly dbt is open source, isn't it?
But what else is there you're seeing from the other vendors and open source projects and that sort of thing as well?
Yeah, I have been tracking a couple of open source BI tools, Metabase and Redash, for several years now. Both of those are making incremental improvements, but neither of them makes me want to encourage clients to stop paying for the proprietary solutions, at least where they are today.
There is GitLab, which is an open source GitHub competitor. They are doing some interesting things with a project that they're calling
Meltano, where they're essentially trying to either build or integrate a solution
within every layer of the stack.
And one of the areas where they're spending most of their time is the BI layer.
And I don't know exactly when they're planning on revealing that to the world,
but I've had the opportunity to see a couple of behind the scenes looks at that.
And it is pretty exciting.
So I do think that we are going to see the open source infrastructure
at every layer of the stack continue to move forwards.
Okay.
And Stuart, you've been doing quite a bit of work.
I know it's not open source, but with Google Data Studio.
So what's the current state of the union with Google Data Studio at the moment?
So when you look at something like Data Studio, and obviously Amazon has QuickSight, and you
look at those sorts of things, which I put in a different category than I put Looker,
they're very much about visualizing data, and they expect you to bring
conformed models or at least curated models to the table. I think what's interesting about,
Tristan mentioned Metabase, and obviously Looker fits into this model of going back to this sort of
metadata way of thinking. And I think right now, as Tristan's right to say, at this point
only Looker has really been trying to fill that middle ground. But open source technologies are starting to recognize
that there needs to be a handoff from data integration. As great as dbt and other things are,
we don't necessarily want to have to do all of it there.
We certainly can, and there are lots of projects that do that.
I think if you're going to use Data Studio, you're going to do a lot of your work in BigQuery or Dataflow, or even Dataprep. And that's what we've seen. So I think
when we start thinking about just-in-time analytics, where we can do a certain amount
of the final delivery or final preparation of data in an analytics tool, or at least something like Metabase, which sits close to the analytics tool,
that's great. And obviously Looker does a great job at this.
What you really want is the ability for each user or each business or each project to decide
how much they want to push down to the database and how much they want to push to the analytics tool. And I think as we start to see overlap, well, overlap's not a bad thing.
I mean, we often think about, well, should we do that in DBT or should we do that in Looker?
And really, when there's overlap, that's just options you have.
And it's great that we start to see some of these tools overlapping between them
because that gives us the ability to refactor
in advance, but also just choose what's right for us on this particular project. Okay, okay, we're
almost out of time now. So Tristan, do you want to just tell people how they can get hold of
dbt and find out more about it, and tell us about any kind of, I suppose, activities you've got
going on, or whether you'll be at Join in a week's time?
Yeah, we will be at Join as attendees, but we're also having a pregame event on the evening of
registration where everyone's going to hang out, talk about their usage of dbt, and then head over
to registration all together. You can find out about dbt online at getdbt.com. And probably the
first place that you go from there should be the big red button that says sign up for Slack,
because we've got about 800 people in Slack today. And I really love hanging out with that community.
They have such smart things to say, and they're really helpful and nice.
Okay. And Stuart, just remind us, you're exhibiting now, I think, at Join? And tell us about any things that you're doing there as well.
Yeah, we're exhibiting.
I think we're going to have about four or five of us from Red Pill there, myself included.
So we're really excited about that.
And the DBT channel on Slack is great, Tristan.
You know, it's almost table stakes today to have a Slack channel,
and I spend so much time in Slack trying to get help from these open source slash enterprise vendors,
companies that are both a business and an open source project, and I love that model.
Quick, really fast answers to questions.
Also, I'll be speaking at Oracle Code this year. So let me throw a little plug in there. I'll be building a
machine learning model on taxi fare prediction. I'm doing that with a gentleman called Björn
Rost from Pythian. So anybody that's out there that's going to be going to Oracle Code,
check me out there. Excellent. That's good. Brilliant.
Well, it's been great speaking to you both.
And thank you very much for coming on the show.
And hopefully I'll see both of you at Join in a couple of weeks' time.
Thanks so much. Talk to you soon. Thank you.