Drill to Detail - Drill to Detail Ep.23 ‘Looker, BigQuery and Analytics on Big Data’ With Special Guest Daniel Mintz
Episode Date: March 28, 2017
Mark Rittman is joined by Daniel Mintz from Looker to talk about BI and analytics on Google BigQuery, data modelling on the new generation of cloud-based distributed data warehousing platforms, and Looker's re-introduction of semantic models to big data analytics developers.
Transcript
So hello and welcome to Drill to Detail, and I'm your host, Mark Rittman. So my guest on this week's show is someone who you may have seen present at the recent Google Next event in San Francisco, alongside one of the first guests we had on the show back in October last year, Dan McClary. So the guest this week is Daniel Mintz, who is the Chief Data Evangelist at Looker. And I'm very pleased to welcome him on to Drill to Detail to talk about Looker, data modeling on Google BigQuery, and where BI tools are going in this new world of cloud and big data. So Daniel, welcome, and thank you for coming on the show, and maybe just introduce yourself properly to the audience.
Yeah, my pleasure. Thanks for having me, Mark.
So yeah, I mean, I think maybe I'll start with the funny story of how I came to Looker, because I actually was a Looker customer for a long time before I was a Looker employee. My previous role before I came to Looker was working at the media company Upworthy. When I came on, they were about a year into their history, and they were just starting to get serious about data. They didn't really have a data stack to speak of. This was in early 2013, and so BigQuery, excuse me, Redshift, Amazon Redshift had just come out of beta and was first really available to the public. I kind of looked at our options and quickly settled on that as our data warehouse.
But then the next question was, well, what's going to sit on top of our data warehouse?
What's our BI tool going to be so that people can actually access the data?
Obviously, you know, me and my team, we were comfortable enough using a SQL runner, but most of the folks at Upworthy weren't going to do that. So I was looking at different BI tools, and I think a co-investor in both Upworthy and Looker said, oh, if you're looking at BI tools, you should check out this little startup out of Santa Cruz that I've invested in, called Looker.
I said, sure. So I hopped on the phone with Lloyd Tabb, who's Looker's co-founder, and Keenan Rice, who was the only business guy at the company at that point. And we got on the call, and Keenan realized pretty quickly, as a good, smart business guy, that Lloyd and I spoke the same language, and so he should really get out of the way. And so Lloyd and I started chatting, and
Lloyd said, well, what Looker is, is we're a platform: we sit on top of your database, you tell Looker what your data means, you define the relationships between your tables and the business logic that you have in LookML, which is our data modeling language, our sort of abstraction of SQL, and then Looker will write the SQL. And so then you can expose to the business user just a sort of drag-and-drop interface where they can ask questions of the data themselves, without having to rely on an analyst to write SQL for them.
And I, being the sort of skeptical analyst that I am, was less than convinced that this was a real thing. Because, I will say, I kind of imagined that that kind of thing had existed, but I'd never really seen it in action. And yet within 10 minutes, Lloyd had connected Looker to our Redshift cluster and was saying, ask me anything. And so I was asking him questions, and he was answering them and showing me things in my data that I had never seen before, which was a pretty weird experience, for someone I had just met to be finding things in our data that I didn't know about. And so I think that experience, of seeing a tool that could really write whatever SQL I needed it to, and do so in a performant, clean way that gave everybody in our company the ability to ask questions of the data and get reliable answers pretty quickly, was pretty amazing. And so over the next three years, the tool really grew.
I will say that when we first signed up for Looker, I think we were customer number 21. And the promise was there, but the tool was not quite where it needed to be yet. There were no dashboards. There were very basic visualizations, and no ability to do calculations on results that had already come back from the database, so you couldn't add column A to column B to get column C; everything had to be done in SQL. But over the course of three years, the tool really got better and stayed on the trajectory to live up to the promise that Lloyd had laid out in that very first meeting.
And so when I was starting to think about what my next move would be, as I was getting ready to leave Upworthy, I was talking to Lloyd and he said, well, what do you want to do? And I said, I'm not sure. And he said, well, when you figure it out, tell me. And I said, well, I do miss writing and thinking and talking about how to do data. And he said, it sounds like you want to evangelize for data. And I said, yeah, I guess that's a reasonable summation of what I want to do. And he said, well, we need an evangelist; why don't you come be the evangelist at Looker? And so here I am, a year later.
Fantastic, fantastic. And so, just as a kind of context: I'm
using Looker day in, day out at the moment. The company I'm doing some work with at the moment is using Looker to build a BI platform, I suppose, or tool, on top of Google BigQuery. And it's an interesting exercise to go through, because a lot of the concepts that you're talking about with Looker, and we'll get onto this later on in the conversation, are fairly established concepts in the world of, I suppose, enterprise BI and CRM and so on: writing SQL against a model, generating SQL against that, abstraction layers and so on. But that's something that has, up till now, been fairly unknown in the big data world, because of various differences in that world compared to, say, working with small data sets in structured databases.
Just tell us a bit about the foundation story of Looker. What was the thinking behind Looker as a company and the product? And in this new market, what problem is it trying to solve, and for who, really?
Yeah, I mean, I think when we look back at the history of BI, as you said, this isn't a brand-new idea by any means. When you look way back to the 80s, at folks like Cognos and Business Objects and MicroStrategy, they had this idea. It was there. Now, obviously, the underlying database technology back then was not where it is today, to say
the very least, right?
And so, you know, like, first of all, the data volumes that they were dealing with were
far, far smaller, orders of magnitude smaller, partially just because rather than, you know,
machines generating the data, you actually had people generating the data. So there just wasn't that much of it. But even so, you know,
what you saw was as data volumes grew, the appliances couldn't keep up. And so, you know,
you did cubing, you did summarization, you did anything that you needed to to get the data small
enough so that you could access it and get your answers. But that idea of a sort of rigid model
was really core to that sort of first wave of BI, right?
This idea that IT is going to own the tool.
They're the ones who are going to own the definitions of the data,
the business logic.
And so that's great because it means when you ask a question of your tool,
you know that the answer is right.
It's been vetted.
It's not up for grabs. And so that was where people were at. But over time, people got frustrated with that rigidity, right? They got frustrated with the summarization, frustrated with the cubing, frustrated with the inability to be agile and get answers to new questions. And so you see the rise of the second wave of tools, which are mostly visual discovery tools, tools like Tableau and Qlik, where the idea is more around self-service. And I think when we look at the
choices that they made, I would say we agree with a lot of them. The idea of self-service is great.
The idea of, you know, getting rid of cubes, getting rid of summarization, keeping the sort of
higher resolution in the data, we totally agree with that.
But I think they maybe threw the baby out with the bathwater.
And so throwing out that model, I think, was a real mistake.
Throwing out a shared understanding of what the data means makes it much harder to get real business value from your data. And I don't think it was a sort of dumb decision that they made; I think they were really constrained by the technology of their time. Those tools were mostly running on the faster laptops and desktops that people had at the time, but they were disconnected from the warehouse, because the warehouse certainly couldn't keep up with the demands of an agile workflow. And so, because you lost your connection to the mothership there, you lost this model that told you what the data meant. And so that sort of self-service workbook BI had real advantages in terms of speed to getting answers, but it also exposed you to this possibility that you were getting an answer, and you were getting it fast, but it wasn't the right answer. And so with the advent of this new wave of tools, you know, I think really starting with
Redshift, frankly; not that there weren't amazing MPP databases before then, but they were mostly on premise, and they were very expensive. And then you see Redshift come out as a cloud MPP columnar data store, and that really changes the game. And then you see BigQuery, which had been Google's internal tool for a long time, repurposed as a product that can be sold to the market. You see Vertica become available in the cloud; you see a big wave of SQL-on-Hadoop tools that are cloud-based; you see Snowflake. So
you just see this huge wave of really, really fast, cheap data warehouses. And all of a sudden,
that makes it possible to keep the agility that you wanted from that second wave tool,
that self-service idea, but also keep a model that had come from the
first wave tool. And so you kind of get the best of both worlds. And that's really where Looker is
aiming. We think of it as a third wave where you've got all the agility, all the self-service
that you had with the sort of desktop workbook-based tools, but also the shared understanding
of the data and the trustworthiness that comes with that,
the reliability that comes with that, that you had with the old school BI tools.
Yeah. So, I mean, within the world that I'm working in at the moment, Looker is the hot new BI tool, the hot new startup. But Looker might not be so well known to, I suppose, people who are more used to using Oracle and Cognos and so on. So who's the typical customer for Looker? And who are the typical developers and users that are working with the tool?
Yeah, I mean, in terms of companies that are using us, we're up over 800. And, you know, as you said, the first place that we slotted in, where people really got Looker very quickly, was the sort of venture-backed tech Silicon Valley group. Not hugely surprisingly: lots of people speak SQL there, and they don't have big million-dollar investments in existing stacks where there's a sort of sunk cost. But more and more we're seeing larger companies, enterprise companies, who have these longstanding tools in place.
They're both starting to transition their on-premise data warehouses to the cloud and also starting to consider new, more flexible tools.
And we generally don't see them saying, oh, well, we've spent $5 million on this BI tool over the last 20 years, we're going to throw it out and start fresh. That is generally not the way that these things go.
It is much more frequently that they have a new project they're working on and they want more flexible data access. And so, rather than continuing to add to the rigid architecture that they had, they start with a new implementation of Looker. And then another team inside the company sees that and says, well, why can they ask any question that they want and get an answer in seconds? I want that. And so they sort of evangelize it internally, and more teams start to pick it up.
Yeah, I mean, certainly the experience we had in the place I'm at now was that internally there was use of Tableau, for example. And Tableau connects to BigQuery, as you're probably aware, and you can create some pretty nice data visualizations in there. And that was typically the tool that was given to, I suppose, people working as strategists or analysts or whatever. But the limitation with that was a few things.
So it was very much kind of silo-based.
And so people would build a report based on a particular topic area.
And then typically, they'd need to write a custom piece of SQL
to populate the Tableau extract.
Because really, and this was one of the things that won the engineers over to Looker: Looker was very, very good at retrieving the data live from, say, BigQuery, whereas Tableau was always really going via a cache. And certainly in that respect, as a BI tool, we found that Looker was a better SQL engine, for example, than Tableau. But the thing that really made the impact with people I've seen use Looker is the use of the model, the use of this kind of semantic model, where you can just explore your data and there's no kind of having to join data and so on. But do you find that's a concept that's sometimes quite hard to explain to people, to engineers particularly? I mean, it's almost like a sort of Stockholm syndrome with a lot of engineers, isn't there, where you're used to data analysis being hard; to try and explain to them that it can be easy must be hard sometimes.
Yeah, I mean, I tend to think of our users in sort of three big buckets. There's the data analysts, who speak SQL, who have had ready access to the data, who know the power of being able to answer their own questions with data. And for them, I would say the biggest advantage of Looker is that they
spend a lot less time writing one-off queries for their colleagues, right?
Rather than spending their time that way, someone comes with a request, and then they come back a day later and they say: so I know you wrote me that query, but it turns out I need six months of data, and I need it broken down by state, not by country.
It's the classic thing, isn't it, in BI: every question you have has another question after it, really.
For sure. And so, you know, just as an analyst myself, that's really not a fun use of your time.
It just feels like a never-ending queue; it's always growing.
It is, you know, and it just prevents you from doing higher-value analysis, because you're
just stuck answering one-off questions.
And so the analysts like Looker for that reason.
But I would say that the people for whom Looker is most sort of life-changing are the people
who are very analytically minded.
You know, they may be an Excel wizard or a Tableau Desktop wizard,
but they've always relied on IT or the analyst organization to do the data extraction for them
because they don't speak SQL, or maybe they speak just enough SQL to get themselves in trouble,
but not enough to get themselves out of trouble. And so all of a sudden they go from having to
wait for each new workbook to be built so that they can explore in it,
to being able to access the data directly, right? And that is a radical change for them.
All of a sudden, the speed with which they can iterate goes from one iteration per day to one iteration per minute. They come up with a new question based on the answer, that gives them an answer, and then they come up with a new question and get the answer to that, all instantly. And that's just a huge change. And these are the
people who really already know how to use good data to make better decisions. And so that's a
huge boost to the enterprise as a whole because all of a sudden the people who are analytically
minded have access to the data.
And then I think the third group are the sort of more data consumers, the people who need
their reports, need their dashboards, but they kind of rely on those and they're not
going to do a ton of exploration.
But what we do see with those folks is more and more of them get the dashboard, the Looker
dashboard in their email.
It arrives on Monday morning and they say, huh, that's weird.
I wonder why sales were down on Friday.
And rather than having to go to an analyst to get an answer to that question, they instead can just click on the dashboard.
And that drills into and brings them to a place where they can start exploring.
And whereas a sort of blank template would be pretty intimidating to them, the thing about Looker is that it brings them right into a context that already exists. Someone has already built an exploration and answered that first question: sales were down on Friday. And they say, well, maybe I'll look by country, to see if maybe there was a holiday in one country and that's why they were down. So all they have to do is click one thing to add that new dimension, and all of a sudden they've got more information, and they can see not just what happened, but maybe why it happened.
Yeah, absolutely. I mean,
the way I describe it is about two things. If you're an analyst and you're trying to understand data and explore it, we're looking to reduce the friction involved in that process. And certainly, by making everything pre-joined, a semantic model and so on, it means that they can follow the train of thought much more easily, really, and explore the data and understand it. And that group you described, what I would call strategists really, they are the ones, I think, that really benefit from this. In the past, their job has often been like chess by post, in some respects, where you'd write a move down and post it off, and a week later you'd get an answer back. The difference has been astounding, really, for people working with this: just being able to explore their data and, I suppose, make data-driven decisions, whereas in the past it was hard to do that, really. So let's look a little bit
into it. So, for all listeners on this show, certainly those that come from a more traditional background, what you're describing is something very familiar. But there are certain ways that I think Looker has approached this problem that are different to before. And I think one of the ways that's interesting, and it would be good for you to talk us through it, is this concept of LookML and the way that the modeling works. Because, as you rightly said, this has been done before, but it's been done in a way that was not all that agile, and probably was a bit out of step with how things are developed now. So maybe tell us a bit about what LookML is, what problem it is solving, and paint a picture a little bit of what it does and the approach that you've got with that, really.
Yeah, I mean, I talked a bunch about this at Google Next. But I think when you look at the development
of programming languages, you know, there's been enormous progress over the last, you know, really
since the beginning of programming languages. But you look at, like, C, right, which still fundamentally runs the world; you know, 98% of programs eventually compile down to C and then go from there, right? Or I don't know exactly what the stat is, but fundamentally, it is still there at the base.
But most people most of the time are not writing in C. And that's because they mostly don't care
about the low level concerns that C makes very easy to control. So things like garbage collection,
you know, most of the time, you're okay with the higher level programming
languages worrying about memory management, you know. And so folks have invented these higher
level languages, these abstraction layers that allow people to not worry about the little stuff
so that they can worry about the bigger questions, which is how do I quickly, you know,
prototype an application? And that would be
a lot faster in Python or Rails than it's going to be in C. And so, you know, when you look at
the parallel evolution on the data side, you know, you see SQL comes about in the same decade as C
and still really does fundamentally run the world in terms of how people access data. But the amount of advancement that has been made
in SQL is pretty minimal, right? You know, T-SQL comes out in the late 80s, and then MySQL and Postgres in the mid-90s; you get a new version of Postgres or MySQL every so often, and not much more than that. And so we're still at the same level of abstraction, right? We're still worrying about: oh, which dialect am I in? Does this one use subqueries, or does it use common table expressions? How does this one do date handling? All of these little things that you just really shouldn't have to worry about.
Computers are great at worrying about those things, and humans have bigger things to worry
about. And so what LookML is, is an abstraction layer on top of SQL. It still maintains all of the power and the versatility, the provenness of SQL, but it allows you not to worry about things like: oh, what is the foreign-key/primary-key relationship between those two tables? I will define that once, and then anytime that I want to join those tables, I'll let the computer know that I have defined that join, and it will figure out what the correct join logic is. And so LookML is a way to define all of that logic, all that business logic, once, and then let Looker write performant, clean SQL on top of that.
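As a rough illustration, a minimal sketch in current LookML syntax, with hypothetical view and column names rather than anything discussed here: the join relationship is declared once in the model, and Looker generates the join SQL whenever a query needs fields from both views.

```lookml
# Declared once in the model: how orders relates to users.
explore: orders {
  join: users {
    type: left_outer
    sql_on: ${orders.user_id} = ${users.id} ;;
    relationship: many_to_one
  }
}
```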
And one of the amazing things, I was just looking at this: with the addition of Google Cloud Spanner, which they just released two weeks ago, I believe we now support 33 different dialects of SQL. And the thing about LookML is that, because it's an abstraction of SQL, it doesn't care what the underlying dialect is. And so we've actually had customers who have built these whole models in LookML, have been on one database, been using one dialect, and then decided that they wanted to switch databases and switch dialects. And that process, which normally would mean rewriting every single line of SQL, is seamless, because they just point Looker at the new connection, and Looker says: oh, I speak that dialect too. So when I compile my LookML down to SQL, rather than compiling it to SQL dialect A, I will compile it to SQL dialect B.
And so, you know, I think my experience with SQL is that there are a bunch of problems with it. It's not reusable. I frankly can't even read SQL that I've written a week or two ago; rather than trying to understand that giant block of SQL that I wrote, I just start from scratch, because it's easier. And reading somebody else's SQL, in their own style with their own syntactic conventions, is basically impossible. So it's not collaborative, it's not reusable, and it's not version controlled, because it's not organized the way that normal programming languages are. And then it just makes you deal with all these low-level things. You know, when I was at the first company where
I was learning SQL, there was an order table.
And to make sure that you excluded all the failed credit card transactions, you had to say order
status equals completed. And you had to do that on every query that you ran against the orders table.
Even though 95% of the time when you ran a query against that table, you didn't want the failed
credit card transactions, right?
But as a default, if you wanted to exclude them,
you had to remember that where clause.
And if you didn't, you would all of a sudden think,
wow, sales were through the roof, but they weren't.
You just were including all these failed transactions.
And so, you know, even for somebody who is very good at writing SQL, it becomes very easy to make stupid mistakes. And so LookML says: oh, when I explore the orders table, by default, apply the order status equals completed filter. And I can remove it if I want; but since I know that 95% of the time I'm going to want it, let's apply that by default.
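In LookML terms, that default looks roughly like the sketch below, with hypothetical names, using the always_filter parameter, which applies a default filter value that the person querying can then change:

```lookml
explore: orders {
  # Applied by default to every query on this explore;
  # a user can change the filter value at query time.
  always_filter: {
    filters: [orders.status: "completed"]
  }
}

view: orders {
  dimension: status {
    type: string
    sql: ${TABLE}.order_status ;;
  }
}
```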
And it doesn't make you reshape the underlying data, which is hard. Once your model and your data get intermingled, it gets very difficult to be agile, because every change requires reprocessing all your data, reformatting your schemas, all of that. But by keeping the modeling layer agile and separate from your data, I can say order status equals completed; and then we change something in the application, and now it's order status equals complete. But I can make that change in the model, and as soon as I do, it propagates out. And now every query that runs against my database, when it hits the orders table, knows that it's order status equals complete instead of completed, or whatever it is. And so that agility is incredibly useful, because in today's businesses, the structure of the data is just changing all the time.
Yeah, I mean, it was interesting, again, coming from a history of using tools with these kinds of semantic models, abstraction layers, and query-generation engines. It was interesting that Looker took the approach of, I suppose, making SQL more agile and more abstracted, rather than, say, building a graphical interface and saying, look, you just define things like dimensions and metrics and fact tables and so on. Was there a specific decision to go down the route of a sort of markup-type language, as opposed to using a graphical tool for this?
For sure. That was not an accident. You know, Lloyd Tabb, the founder who I was talking about, has been around for a while. He was a database and languages architect back at Borland; he was the architect on Netscape Navigator Gold. So he's seen a fair amount. And his experience with GUI-based languages that allow you to define relationships and things like that is that they get very unwieldy very quickly.
You know, and that's been my experience, too, with all kinds of graphical languages, for the same reason that we don't use cuneiform, that we use actual words, right? Languages are great; they are really good at communicating things. And different kinds of languages excel at different things. Natural language is great, but it's not extremely precise, I would say, whereas computer languages, programming languages, are extremely precise, because they need to be. Analytics is a really precise discipline. And I think part of the reason that analysts love SQL so much is because it is structured the way that they think about problems. It forces them to think about the analytic problems they're trying to solve in the right way. And so we didn't want to reinvent that wheel. You know, like I said, SQL has been around for 40 years. It's everywhere. And I think even in the sort of last 10 years,
where you saw Hadoop start to make a push against SQL,
I think you've pretty quickly seen SQL push back successfully.
You've seen all of the columnar MPP data stores
which use some version of SQL,
and even then all the Hadoop dialects
which actually are building SQL on Hadoop structures,
whether it's Spark SQL or Impala or Presto or Hive, you know, you see that the Hadoop
communities have realized that people really do want to write SQL when they're doing analytics.
And so, given that, why reinvent that wheel, right? The same way that programmers didn't invent whole new languages that went around C; they invented languages that built on top of C.
Yeah. So another thing that's been quite useful for what we're doing, certainly within, say, the e-commerce space: there's a lot of what we'd call, in the database world, things like many-to-many relationships, and lots of opportunities for things like double counting of aggregates and all this kind of stuff. And one of the things that we found LookML particularly good for is dealing with those things, like double counting and the way that you do aggregation within these kinds of data sets and so on. Is that something, again, that was deliberate? Is that something you built in particularly for this kind of market and the type of data you're working with?
Yeah, I mean, so I think you're referring specifically to something
that we call symmetric aggregates. Yes. When you fan out, you know, the classic example is users and orders, right? A user can have multiple orders. And so if you just count the users table, you get the correct answer; but if you count users after you've joined the orders table, now it's going to tell you that there are a lot more users than there actually are. And so you have to do count distinct on user ID, or whatever it is, right? And what we found with LookML, and particularly with its sort of reusability paradigm, is that you can actually achieve things with SQL which you normally wouldn't think were achievable. Because SQL has been a write-only language, right? Something that you write, but you never go back and read or elaborate on. I think we've underestimated what it's capable of. And so, particularly around symmetric aggregates, we actually use a function which I never knew was part of the ANSI SQL standard, called sum distinct. You know, everybody's familiar with count distinct, but it turns out there's actually average distinct and sum distinct as part of the ANSI SQL standard. And so if you know what the primary key is, you can avoid summing its value more than once, or averaging its value more than once, by looking at that primary key.
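To make that concrete, here is a simplified sketch with hypothetical users and orders tables and a made-up ltv column; Looker's actual generated SQL uses a hashed variant of the same primary-key trick:

```sql
-- Joining orders fans out the users rows: each user repeats once per
-- order, so a plain SUM(users.ltv) over-counts. Keying each value to the
-- row's primary key lets DISTINCT collapse the join duplicates without
-- collapsing genuinely equal values from different users.
-- (Assumes ltv is non-negative and smaller than the 1000000 offset.)
SELECT
  users.country,
  COUNT(DISTINCT users.id) AS user_count,
  SUM(DISTINCT users.ltv + users.id * 1000000)
    - SUM(DISTINCT users.id * 1000000) AS total_ltv
FROM users
LEFT JOIN orders ON orders.user_id = users.id
GROUP BY users.country;
```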
And so this is something that you would never, ever want to write if you were writing SQL by hand. But when LookML is doing it for you, when Looker is doing the SQL writing for you, it becomes very simple. And these things that look pretty complicated are actually not particularly computationally expensive, and so the databases can achieve them quite easily. So symmetric aggregates is a great example, but there are
lots of other things where you can define really complicated logic, like case statements, for
example, nested case statements, where again, you never want to try to keep track of those in raw
SQL. But in LookML, you can say: here's a set of case statements that defines this one dimension. But then I want to define another dimension, and I want to use the case statement that I've already defined as part of that definition. And I don't have to rewrite that whole thing; I can just reference it. It makes it much easier to build these complex things in SQL, because you're not the one writing the SQL. And so I think that idea of reusability, which is really core to programming (DRY is the acronym that they use: don't repeat yourself), is that you do something once, you define it in one place, and then you reuse it rather than redefining it.
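A minimal sketch of that reuse, again with hypothetical names: one dimension's CASE logic is defined once and then referenced by another dimension through LookML's ${...} substitution.

```lookml
view: orders {
  dimension: price_tier {
    type: string
    sql: CASE
           WHEN ${TABLE}.amount < 10  THEN 'low'
           WHEN ${TABLE}.amount < 100 THEN 'mid'
           ELSE 'high'
         END ;;
  }

  # Reuses price_tier by reference; change the CASE logic above once
  # and every dependent field picks up the new definition.
  dimension: is_high_value_order {
    type: yesno
    sql: ${price_tier} = 'high' ;;
  }
}
```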
And that has a ton of advantages because it allows you to build more complex structures.
It also means that when you change something, which certainly happens a lot in the data world, you change it in one place and it propagates out into all the other places. You don't have to then go track down every occurrence of some variable and change it in all those different places, which is a pretty common occurrence when you're looking at a bunch of SQL recipes and you go: all right, I have to search this whole directory for every occurrence of this field, because that field has now changed its meaning, and find and replace all of them.
Yeah. And I'd like to get on in a moment to talk about,
I suppose, specifics of working on BigQuery and how we interact with the database there and so on.
But one of the other things that was particularly a plus point for you guys was the fact that all your metadata, I suppose, is stored in Git, or certainly in GitHub repos and that sort of thing. That's an immediate plus with any kind of engineering department now. I mean, just tell us about how that works, and the use of things like GitHub and your Looker Blocks and that sort of thing. How does that work?
Yeah, I mean, so it's actually Git; we generally use GitHub, but you can use whichever Git versioning repository you want. But, you know, that's been built into our platform since the very beginning, because again, it's
something that engineers have figured out a while ago and data analysts have not.
And so again, rather than reinventing the wheel, we said, yeah, that's a great idea.
It would be very useful to be able to roll back to an earlier state or understand who
changed this definition at what point in the past and why.
And that's all things that are deeply integrated into developer workflows, but haven't been
part of analyst workflows. And so because LookML is code, but very simple code, code that really makes sense to SQL writers, and not a GUI, it's very easy for us to build in versioning. And so it's been there since the beginning. Whereas if you're looking at a GUI-based system, it's much harder to understand how you would do versioning in that context. So, you know, I think that goes to this core idea of: let's not reinvent the wheel. Same thing with our visualizations, where Looker runs entirely in your browser; it's web native. And so again, we could have written a library to do our visualizations from scratch. But, you know, D3 is amazing and gorgeous and incredibly flexible.
And there's a huge community out there around that.
So why wouldn't we make use of visualizations that already exist?
And so that gives us, you know, dynamic, interactive visualizations, you know, that are JavaScript that run right in the browser really beautifully.
Fantastic. So the last time I saw you in the flesh was when you presented alongside Dan at the Google Next event, and the two of you did a very good presentation about data modeling on BigQuery and so on. And something that was interesting in there that I'd like to talk to you about is: what are analytics and BI like on a platform like Google BigQuery? How do the table structures look? How do you form queries, particularly things like joining and indexes, and stuff that we would take for granted in a more traditional database? That's different, really, isn't it? And Dan opened it quite nicely, I think, and you talked about it as well, saying let's be aware that we're working with a distributed data store here, and it's different to a relational database. I mean, tell us a bit about that. What do we have to be aware of when we're doing BI on top of a platform like BigQuery?
Yeah, I mean, so I think for anybody who hasn't experienced BigQuery, you should go try it out.
I think your first terabyte of processing per month is free. So there's no reason not to just
go check it out.
I mean, I have to say, across all of the cloud data warehouses, that is one of the things that I love, that you don't have to go buy an appliance.
You just log in, and they give you some free credits, and you can play around.
And my experience with them is that, coming from the world of MySQL and Postgres, they pretty uniformly blow my hair back.
I just can't believe how fast
they are. But I think BigQuery particularly, it is so simple in terms of the DevOps requirements
that you stop worrying about things that you're just used to worrying about. Indexing is a great
example. There's no concept of indexing in BigQuery. That's just not something that you
have to worry about. They worry about it. Compression is another thing: let Google handle your compression. It's not something that you have to worry about. And in fact, it's not something you even can worry about; they handle it for you. And so I think, you know, to Dan's point: I talked about this from the BI side, and he talked about it from the schema side and the database side. But I think our joint point was that when
you're dealing with this whole new environment, where a lot of the constraints that have
historically attached to, to the ways that you do things are gone, when those are gone, you really
need to go back to first principles and ask yourself,
why am I doing this? Rather than just assuming that, well, I'm going to do it this way because
it's always been done this way, right? And so I think if your goals are around simplicity, accurately representing the real world in your data, and making it easy for other people to understand your schema, then you're keeping those first principles in mind, rather than saying: oh, well, we do snowflake schemas because we always do snowflake schemas; oh, we do third normal form because that is how we have always done it. Right? That is how we've always done it, but we've always done it that way for a reason, and those first principles are the reasons. And because of the limitations
of the hardware, we had to do it. That was the only way you could do that. But that is not the
only way you can do it now. And so you really do need to go back and examine those first principles
and understand where they came from, understand a little bit about the history. And frankly,
those are questions we get all the time from customers, people who are coming from, you know, a big, heavyweight Oracle data warehouse that they've had for 20 years on premise. And they say: all right, we're ready to move to BigQuery, we're ready to move to the cloud. What do we do? How do we structure our data? What's the right way to structure our data? I mean, I will say that one of the great things about BigQuery is that it's very forgiving. It is so powerful that you can make a lot of little mistakes and they really don't hurt you. You know, the query takes 2.1 seconds to return instead of two seconds to return. So I think you do start to get up to data volumes where you want to be thoughtful about how you model your data, how you structure it. And I think there are a lot of neat things that BigQuery makes possible to improve that. But certainly if you're talking about megabytes or gigabytes of data, millions of rows instead of billions of rows, the penalty is very, very small. So you really do have a very forgiving database, whereas in the past, if you made those kinds of mistakes, you'd be spinning for 24 hours and never get a result.
So do you typically find that in a project that would involve BigQuery and
Looker, there's going to be a, in quotes, optimization or performance optimization phase in those projects? I mean, certainly my finding has been that it just works; the whole thing broadly just works. And I guess in BigQuery you're trying to avoid joins, and there are no indexes, but that's because it's fast anyway. I mean, in your experience, do the projects typically work okay performance-wise, or is there a process that goes on to optimize the BigQuery table structures and so on? What's your experience on that?
Yeah, I mean, I think it really depends on the scale of the data that the customers are
dealing with. So if you're dealing with, you know, if you're coming out of a Postgres or MySQL or
SQL server environment, the reality is your data can't possibly be big enough to challenge BigQuery,
right? BigQuery is going to laugh at your data scale and make quick work of it no matter how
badly you model it.
If, on the other hand, you're coming out of a Hadoop world, where you had a giant data lake and you're trying to make that performant, then you really do start to think about it. And that goes to one of the keys to our architecture, right? Because we're architected without our own proprietary analytic engine, we are reliant on the underlying data warehouses.
And so as a result, we have to work really, really well with them, right?
We have to take full advantage of all of their tricks and all their power.
And so, you know, when BigQuery rolled out standard SQL to replace their legacy SQL last summer, that was immediately in our next release, a couple of weeks later. When they rolled out an integration with Google Sheets, where you can use a Google spreadsheet as if it were a table in BigQuery, so you could have a billion-row table in BigQuery and a 50-row mapping table in Google Sheets and join those two together as if they both lived in your data warehouse: next release, that was out in a couple of weeks.
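Assuming a Sheets-backed external table has already been defined, the query side is plain SQL; a sketch with hypothetical dataset, table, and column names:

```sql
-- events is a billion-row native BigQuery table; channel_mapping is a
-- 50-row Google Sheets-backed external table, queried like any other.
SELECT
  m.channel_group,
  COUNT(*) AS event_count
FROM analytics.events AS e
JOIN analytics.channel_mapping AS m
  ON m.channel = e.channel
GROUP BY m.channel_group;
```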
The same with support for nested tables, which I talked about a bunch at Google Next. Similarly, they just rolled out a HyperLogLog implementation for BigQuery, which allows you to do things like count distincts on aggregates and, really importantly, allows you to keep the intermediate state of those aggregates so that you can take advantage of it. So you can initialize a HyperLogLog calculation and keep a running sketch of it. For example, we might use that on IP addresses in our internal logs. That's a thing we log, something about where a user logs in from; we don't actually need the underlying information, but we do want to know how many different addresses are being used. And so we can keep a running tally, a running count of the distincts, using this intermediate calculation that BigQuery exposes to you.
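In BigQuery standard SQL, that pattern looks roughly like the sketch below, assuming a hypothetical logs.events table with ts and ip_address columns, and a daily_ip_sketches table that the first query's results are saved into:

```sql
-- Step 1: aggregate each day's IPs into a mergeable HLL sketch
-- (HLL_COUNT.INIT returns the intermediate state, not a number).
SELECT
  DATE(ts) AS day,
  HLL_COUNT.INIT(ip_address) AS ip_sketch
FROM logs.events
GROUP BY day;

-- Step 2: later, merge the stored sketches to estimate distinct IPs
-- over any date range without rescanning the raw logs.
SELECT
  HLL_COUNT.MERGE(ip_sketch) AS approx_distinct_ips
FROM logs.daily_ip_sketches
WHERE day BETWEEN '2017-03-01' AND '2017-03-31';
```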
And so things like that are really powerful ways to improve the performance of BigQuery. And I will say, with BigQuery, the funny thing is that you have to work very hard to get BigQuery to go slow. But BigQuery can get expensive, because BigQuery charges on a per-query basis; it depends on how much data you scan. So we've made sure to take full advantage of the performance optimizations, and we've also built things into the tool where Looker will use BigQuery's API to tell you, before you run the query, how much it is going to cost, how many gigabytes you are going to scan. And you can even set a limit so that people can't accidentally run a runaway query that's going to cost $100 or $1,000. You can do all that right in Looker's connection settings.
And, you know, I think that speaks to our sort of really deep native integration with these tools that we want to take full advantage of those things.
This is not a sort of afterthought. It's not that we were architected for one world and now we're trying to figure out how to exist in the new world; this goes deeply to the way that we were architected from the very beginning.
And I'd say that is not to be underestimated. I mean, certainly, again, my experience with tools that came from the more, I suppose, on-premise, small-data-set world, so for example Tableau or Power BI, or any of those tools really that came along with BigQuery support later: the issue with those is that they're based on having a kind of cache layer. I mean, I know you've obviously got persistent tables in Looker, but in general, in that kind of environment, particularly with BigQuery, you can't be relying on having a cache there.
Well, I mean, why bother?
Why bother scaling up and going to the cloud
and using this incredibly powerful warehouse
and then use it to pull an extract out
and put it on your local server?
It's not practical.
Another thing really is,
and that's a good point you made there
about being mindful of the way these things are charged.
So if you look, again, for example, at how certain BI tools work: they will go and retrieve the entire data set first of all, and then they will kind of lay it out for you to select the columns.
Whereas BigQuery, because of the way, obviously, that it charges,
it charges per amount of data you bring back and so on.
The thing that clinched it for us with Looker was the fact that it was much more economical
in how it retrieved data.
It was much more cognizant, really, of how kind of BigQuery works. And I'm saying BigQuery
all the time here, but this is kind of a similar story to all of these new style columnar kind of,
you know, hybrid big data sort of systems that run in the cloud now. You've got to be aware of how
they charge for things and work along with that. Yeah, we're very cognizant. We only join a table
when you actually need a dimension from that table, right?
And it results in SQL that you can inspect: the SQL that Looker's writing is always available, right behind a tab. Just one click and you can see exactly what SQL Looker is writing, which is great because you can debug, right?
If you're an analyst and someone says, hey, I don't think that this is quite right, you
can look at the SQL immediately and say, oh, you're right.
You know what?
We made a little mistake in the model.
Let's go fix that.
You fix that.
You push it to production.
They hit refresh on their browser and it works properly.
And so we do not believe in black boxes. We think black boxes and true analysis, real valuable analysis, don't get along very well together. And so we want to be really transparent about what we're doing, and that means that we have to be clean. You know, sunlight is the best disinfectant. So we want to be clear and transparent about what we're doing, and we make it very visible so that folks can trust what we're doing.
Okay. So to wrap up then, I mean, just tell the listeners: how would they
get some exposure to Looker? How would they maybe get hold of a trial or how would they kind of
understand some of these concepts around LookML? What's the next step if somebody's interested and
they want to kind of take this further and understand a bit more really? Yeah. I mean,
we'd love to talk to them. They can come to Looker.com
and sign up for a demonstration or a free trial.
We give a free 30-day trial
where we'll work with you to set up a Looker instance
right on your database
so that you can explore your own data
and see how powerful it is.
Our documentation is fully available on the web publicly,
so you can go sort of check out LookML
if you're really interested in the concepts there.
But I really do think,
Lloyd actually said this to me
in our very first conversation
when he was trying to sell me Looker for the first time.
But he said, you know,
I'm not that good at explaining what Looker is.
Let me just show you.
I think that's really our experience
that we're getting better at explaining what it is,
but there's still nothing more powerful than seeing it in action.
Fantastic. Well, look, Daniel, it's been fantastic speaking to you, and I'll put the link to the video of your presentation with Dan McClary in the show notes as well. I mean, it's a fantastic story, from where we were in the past with 3NF models and so on, through to where we are now. It's been great to speak to you; thank you very much for coming on the show.
Thanks, Mark. Thanks for having me.
Cheers. Thanks.