The Data Stack Show - 165: SQL Queries, Data Modeling, and Data Visualization with Colin Zima of Omni

Episode Date: November 22, 2023

Highlights from this week’s conversation include:

- Colin's Background and Starting Omni (1:48)
- Defining “good” at Google search early in his career (4:42)
- Looker's Unique Approach to Analytics (9:48)
- The paradigm shift in analytics (10:52)
- The architecture of Looker and its influence (12:04)
- Combatting the challenge of unbundling in the data stack (14:26)
- The evolution of analytics engineering (21:50)
- Enhancing user flexibility in Omni (23:44)
- The evolution of BI tools (32:53)
- What does the future look like for BI tools? (35:14)
- The role of Python and notebooks in BI (39:48)
- The product experience of Omni and its vision (45:27)
- Expectations for the future of Omni (47:52)
- The relationship between algorithms and business logic (50:51)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at rudderstack.com. Welcome back to the Data Stack Show. Costas, I'm so excited because we get to talk with Colin. He's at Omni, but he, I mean, literally helped build Looker over almost a decade, seven or eight years. And Looker has had a huge impact in multiple ways
Starting point is 00:00:49 on the data industry, from analytics to architecture to modeling. It has spawned entirely new categories of companies. And I am so interested to hear about what Colin learned at Looker that he is building his new company on, right? Because, I mean, Looker is still a great tool, right? So the people who built Looker, like, what are they trying to build? I think that's what I want to figure out. How about you? Okay, I'm very interested to see how someone who is involved in such a
Starting point is 00:01:26 successful product and company starts another company with another product in the same industry. So I want to learn about that. Like the what, why, and how. So I don't know. I think it's going to be super, super interesting to have this conversation with him today. Yeah, I agree. Well, let's dig in. Let's do it. Colin, welcome to the Data Stack Show. Thanks for having me. Okay, you have a fascinating background. So give us the story and, you know, especially how you ended up starting Omni.
Starting point is 00:02:00 Yeah, sure. So right out of school, I created synthetic CDOs. So think of them as credit instruments that no one needs anymore. From there I went to Google as a statistician, actually doing ranking for Google. And we helped evaluate search ranking results. So we were sort of like the judge team for how search was working. Started a company actually with one of my co-founders at Omni. Ended up selling that to a company called Hotel Tonight. Hotel Tonight was actually Looker's fourth customer. So I led the data team at Hotel Tonight following that acquisition, got very close with the Looker team,
Starting point is 00:02:50 eventually said, hey, I love the product. I want to come work on it. And I joined as around the 40th employee, leading originally customer success and support alongside analytics, eventually took over the product team, kind of moved in and out of those roles, was there for eight years through the Google acquisition. Frankly, like got a little
Starting point is 00:03:12 bit tired, just as we scaled up, sort of the culture was changing and wanted to fire it up again. So that's how we started up Omni. Very cool. So many questions about the background, but really quickly, can you just give us a quick explanation of what Omni is? Yeah. So it's going to be very familiar to people that are familiar with Looker, but the core of Omni is that we balance the analytical process. So we give you all of the power of a data model to write queries and self-serve for end users. And then we also give you all of the freedom and openness of something like writing SQL or extract-based analytics.
Starting point is 00:03:52 And the idea is that users can actually mix and match between those two versions of the world. They can move very quickly in sort of SQL and Freeland. And over time, we help them build a data model so that other users can self-serve. Very cool. Okay. Well, I have to ask, of course, as a marketer, I have to ask a little bit about being a statistician at Google. Sure. Because I think we were chatting before the show that was, you know, sort of, you know, what time period was that when you were doing? 2007 to 2011. Okay, wow. So yeah,
Starting point is 00:04:26 so like, man, that's like, when search advertising was, you know, going through a crazy hockey stick. What types of projects did you work on? Like, what types of things were the engineers building into the algorithm? Because it sounds like you supported the engineers building the algorithm. Like, what types of problems were you trying to solve? Or what types of things were you trying to understand? I mean, it's going to sound kind of funny, but we were just trying to define mostly what good is for search. And I know that sounds sort of obvious, but kind of similar to like a lot of analytics actually in sort of businesses, Google is using a mix between live ranking signals. So think things that people click on. And then they're also using objective or subjective, I guess, evaluation of ranking results. So it's not just
Starting point is 00:05:14 a black box that looks at clicks and promotes things up to the top of the result set. And similarly, it's not just a survey that says kind of here's one algorithm, here's another algorithm, which one's better. It's a mix of those things. And the whole job was sort of creating the process and the framework for doing that sort of evaluation. So an example that I love to give, because there's almost a mix of like philosophy and statistics here, is one of these queries that would always come up was Harry Porter. So Harry and then P-O-R-T-E-R. And it actually gets into the philosophy of what people are searching for when they look for ranking results. Because Google evolved this idea of spell correcting aggressively over time. And I think
Starting point is 00:05:59 thinking through how frequently does a user need to be looking for Harry Potter to intersperse or return Harry Potter results versus exclusively providing Harry Potter results? And how do you create frameworks for doing those sorts of things across sort of the whole surface area of search? Yeah. So it's this process of trying to use statistics to create frameworks, but also sort of tying that to the logic of what people are trying to do when they're searching. Yeah, super interesting. And what was the output of your work, you know, sort of like the specific work product? Was that an input to the algorithm? Like, what did that exchange look like when you shipped a product or a product?
Starting point is 00:06:42 Yeah, so I mean, the simplest way to explain it is that engineers are constantly coming up with refinements to the algorithm. So they're saying, I have a new sort of layer in the way that we return search results, and it changes search results in a certain way. So it might affect 1% of all queries sampled in some sort of weighted basis or something like that. And these are the results that my algorithm would give. And these are the current results. Like, how do we create a framework for deciding whether that change is actually positive for users? That was our team's job was to create that framework, try to explain it to leadership, and then leadership essentially made decisions. And sometimes it's cut and dry. It's just like, we're finding the exact thing that the person
Starting point is 00:07:22 is searching for 10 times more. Very frequently, it's a lot more subtle than that. Certain results get better, certain ones get worse. They change in ways that are not obvious. So we're also creating a framework for how to evaluate those things. That was the whole process for what we did. Yeah. Super interesting. Okay. One more question. How did you integrate newer, but sort of like peak trending search topics and then optimize around those? And like, how did you weight the work against like, this is like very timely and very important, like world of finance or whatever? Yeah. Yeah. So I mean, there are, the simplest way to explain is there's lots of different modules in the search algorithm and freshness is sort of a whole area of
Starting point is 00:08:02 search. And one of those modules, we had to create guidelines for how timely results get expired over time and how much boost they get when they are occurring. So it's sort of all part of the framework. It's what is so subjective about a good search result. If you search for a politician today, how different should the results be from yesterday based on news? And how newsworthy an event is, is very subjective, but it was sort of vaguely trying to come up with how you describe these things so they could be evaluated, and then obviously using a mixture of click signals. So sometimes clicks
Starting point is 00:08:45 can give you answers to these sorts of things in terms of what people are looking for. There are also problems with clicks, like clickbait is a thing. So you have to sort of adjust for things like that as well in terms of how ranking works. So that's why it couldn't just be click signals. Yeah. Well, as a marketer, I can say that you did quite a good job over time of really limiting the ability to game the SEO type of things. Yeah. I mean, it was an impossible
Starting point is 00:09:12 ongoing battle and I was a micro piece of it. Yeah. Okay, well, let's jump. So many questions about Omni, but I want to jump from there to Looker because it seems like there's a really clear connection between, you know,
Starting point is 00:09:30 freshness was a word you mentioned, tried to define what good is you mentioned analytics and I mean, to me it's like, well, a lot of those things got baked into Looker, you know, because that's, I mean, as a Looker user, I've experienced a lot of those things. Is that true? I think certainly Looker had a unique take on the analytics world. I thought it was really interesting when I started using Looker and eventually joined. I remember the founding team, so Lloyd and Ben, took a lot of pride in not looking at the analytics landscape very much. That was good in some ways and bad in others. But it meant that Looker did have a fairly unique perspective on how to approach BI in a way that was honestly scary for people. We got a lot of criticism from folks like Gartner
Starting point is 00:10:19 for the whole life of the company that operating exclusively in database was crazy. I remember we would get back a Gartner survey and there'd be a hundred questions on your in-memory engine. And we'd just have to write NA for a sixth of the survey. Do those analysts still work at Gartner? They do. I mean, they're still slowly coming around to the concept of in-database analytics as like an exclusive way of doing things. And it's kind of funny because now Omni is building in-memory layers above the database. So it's like what's old is new constantly. Totally. I love that.
Starting point is 00:10:53 But I mean, we were trying to do things a very different way. And that it was this really strict compilation of SQL down into the database, heavily governed. And like it was a backlash to things like Tableau where extract was the focus. So I mean, in many ways, we had to sort of teach people the way that is normal to think about analytics today, which is like centralized data in a data warehouse, put something that looks like a data model on top of it and let people query freely in the database. Those were scary concepts at the time. One of the biggest reasons that we lost deals early was because people couldn't get their data in a database. And I think now the idea of buying Fivetran or Stitch and getting that done
Starting point is 00:11:42 might even happen before you buy a BI tool in many contexts. So there was this sort of paradigm shift that was happening. And it was really Redshift and then later Snowflake and BigQuery that actually opened that up. But the idea of an analytical database that you could just really exercise heavily from the BI layer is sort of what unlocked the whole world. Yeah, super interesting. And in terms of the architecture, where did that come from, right? Because Looker in many ways sort of introduced this entire new architecture. I mean, most companies today, I would say, that are setting up a new data stack or trying
Starting point is 00:12:22 to modernize their data stack are, you know, sort of in a way influenced by the architecture that Looker championed. Where did that come from at Looker, right? Because like you said, it's a scary concept. But it really changed, I think, a lot of the ways that people think. And I think, I mean, literally launched a lot of new companies. Yeah. The core pieces of the concepts came from software engineering, which was just this idea of layers that sit on top of each other, where an API from below is sort of what the layer above can work with. This whole microservices approach to doing analytics, connecting to Git, building a code
Starting point is 00:13:02 based model. So a lot of people don't even know that the first version of the LookML model could only be interfaced with through the command line. So it was truly a SQL compiler to start. It was not a BI tool. I mean, it was a BI tool, but it was a SQL compiler first. And then sort of the BI layers flowed out from there. But the real core was just this modeling layer that could describe essentially queries in a more abstracted way. So rather than writing SQL with characters, now we can write it with fields, filters, pivots, or structural concepts that users think about. Yeah, that makes total sense. Okay. So Looker sort of reinvented the way that a lot of people approach architectures. What I see now, and feel free to disagree, but it's sort of, you know, Looker introduced a couple of
Starting point is 00:13:55 layers, but they were integrated, right? And so, like, the wonderful thing about Looker is you sort of like put it on top of raw data and then you can model it and then you visualize it or whatever. And so now that's been unbundled on a number of levels, right? And so companies took Looker. I mean, DBT is obviously the elephant in the room for LookML. So let's abstract that out. Yep. Which is interesting. Thoughts on like, is it good that the unbundling is happening? Yeah, I mean, I think in some ways it's natural, but I think that there's also equilibrium that you have to deal with in terms of like how many of these tools that you want to manage over time. So the challenge with Looker is always, or with sort of any BI tool that's bundled with a modeling layer, so Omni included, is that you want to
Starting point is 00:14:46 create a data model to build analytics. And then inevitably you want to use those things in other places. So you push things further and further down the stack. The challenge is that the further down the stack that you push things, so into DBT or into an ETL process or something like that, the more rigid that transformation becomes. So the more difficult it is to adjust and adapt to what users are doing. And I think in many ways, LookML's superpower was this concept of development mode, where I can go into a branch, I can edit a thing, I can immediately see what reports are impacted, and I can go push that thing out into production. And moving it further down into the
Starting point is 00:15:25 stack, so into the ETL pipeline or into DBT or something like that, creates this discontinuity, where now the API layer below is producing a data set, and the thing above needs to consume that data. It can't really as easily interact with that thing. And so the trend that I've seen is more and more people doing things like producing reporting tables, almost cubing a la sort of 2005. And the advantages of things like that are you do get standardization. So you get tables that your BI stack can consume, your data science stack can consume, your now reverse ETL stack can consume. And so you do get that standardization. The challenge is that rigidity is a business problem also. The number of people that can touch that layer naturally drops over time. And I think very early, that was an advantage for dbt. Like almost you have now a modeling layer that fewer people can touch.
Starting point is 00:16:23 So people can't screw things up. Exactly. We can maintain it more tightly and its inaccessibility is an advantage. But if you play that forward to a whole organization dependent on materialized modeling, the challenge then becomes very few people can touch it. So now we're waiting on the data team for the next column or something like that. And I think these concepts play really heavily into the way that we're thinking about building product, which is, obviously, we do have a modeling layer kind of a la LookML that is doing just-in-time transformation and pushing SQL down. I think the mistake that we made with Looker was hoping that
Starting point is 00:17:01 our modeling layer could be everything for everyone so that we could do all transformation for the company. And I do think the things that DBT has shown people and just sort of the evolution of the data stack has shown people is that there are concepts that may start in your BI layer that need to get pushed down and standardized for everyone. And it's even obvious to say, but like, I remember we made a customer health score, our business depended on it. We actually picked it up out of LookML and we put it into an ETL process in Airflow. And the reason, because we didn't want people to touch it. So there are good reasons for those things. I think the challenge is that you need now these divergent layers to be
Starting point is 00:17:42 able to speak with each other. So I need to be able to start and produce a report. And I don't want to start by doing ETL and do that. I want to be able to iterate on it quickly, publish it, validate it, and then decide whether things need to get standardized. So our point of view is that you do need those modeling layer pieces, but we need to be more pragmatic about the things that we truly need to own and orchestrate and the things that we should push out. So like the example in Omni is I want you to start by writing a piece of SQL, and then I want to take the pieces of that SQL, so maybe the joins or fields, and model them for you. And then if that sort of
Starting point is 00:18:22 virtualized view becomes important, I actually want you to pull it out of Omni and I want to publish it into DBT. And I want all of our reporting in Omni to continue to function silently. So I don't want to care where that lives, but I want a user to be able to make that quickly and then harden it as needed. And some things should go down into those lower layers and some things actually should not. And I think that is sort of what the ecosystem is missing here is there's almost this view that everything should be in the centralized metrics layer, when the reality is like that requires time and investment, and some things should have that level of care from the data team, with SLAs for
Starting point is 00:19:06 data sets and sort of alerting and things like that. And some things should not; it's not worth that effort. And so we're trying to sort of inject some pragmatism into the modeling experience for the user. Yeah. Can you talk about that in terms of learnings that you had at Looker around, you know, like a KPI, like you said, like a customer health score. It's core to the business and it shouldn't be touched, but so much of good analytics is actually exploratory, right? And so when you, I think it was like when you were talking about going deeper and deeper into the stack, like one way to look at that is that you decrease people's ability to explore because you're really constricting parameters. What did you see at Looker and even with Omni?
Starting point is 00:19:56 Exploration actually is the way to figure out maybe what you need to harden. But if you just start with the modeling layer, you make so many assumptions that you end up having to go back and change it, and it's very slow. Yeah, I think that's exactly right. I mean, the way I would sort of summarize it is I think there's a lot of things that are bottoms up and a lot of things that are tops down. And what that means is that there are certain data sets that you can publish out where making them easy to work with is the superpower of them.
Starting point is 00:20:22 So maybe it's like a revenue time series or something like that, where you're not going deep and accessibility is the most important thing. And then there are other data sets where there's no amount of sort of preemptive manicuring that you can do to make it effective for people. And I think event analytics is a great example of this. Like we're building new features. Maybe we fought a little bit about tracking. Maybe we haven't. We've got nested JSON blobs all over the place. I'm not going to be able to, as the data analyst, predict what my product manager needs to do. They're going to have a question and I need to give them as much as they possibly can
Starting point is 00:20:56 to go answer that question. And then we can think about reshaping that data set. So like, again, I think it's about data teams focusing on where the cleanup that they do has the most leverage. So maybe it's Salesforce does need a lot of manicuring and they do need to build published data sets there. But these long tail sets just require getting data into people's hands and enabling them and then reacting to what they're doing with it. It's actually very similar to sort of like the MVP process of building a company, which is like, we can overbuild our product before people are
Starting point is 00:21:30 using it and we're not going to learn from it. And if we put out younger things that are less complete, we can see how they're getting consumed and then react to it. But if you put up walls in front of people, you're going to disengage them. So I think trying to sort of dumb it down for the user can be advantageous, but not universally. Yep. Super interesting. How do you think about your user at Omni based on what you learned at Looker? Because one term that's cropped up in the last several years is sort of analytics engineer.
Starting point is 00:22:04 And in many ways, Looker basically created that because it's like, you know, you sort of have someone who does a bunch of data modeling, but they're not an actual analyst, and they actually become an analyst and then like, yep. And so Looker enabled this like really interesting environment where it sort of gave superpowers on both ends. So how do you think about that? No, it's true. It's like giving SQL people superpowers. I think that we talked about this a little bit in the pre-show, but I think in a lot of ways at Looker, we needed to really simplify our message because we were teaching people a new way of doing things.
Starting point is 00:22:37 So this idea of a centralized modeling layer that's highly governed and highly controlled was very appealing. And so templating SQL was a piece of that, but the core message of Looker was governing data. And I think the flip side of that was that it's very hard to compromise your most core message. And the core message was like, everything is governed and it's really tight. And what that meant was that when people needed to do pragmatic things, so they needed to transpose before we had transpose, or they needed to write a piece of SQL that they didn't want to model, they were picking it up and injecting it into the data model.
Starting point is 00:23:13 And it wasn't governed. It was just a raw piece of SQL that was getting dumped in a practical way. And so I think one of my big takeaways was we, in some ways, weren't allowed to be pragmatic about our user. We weren't allowed to give them more SQL things because we had to simplify. Yeah. And that's a lot of sort of the opportunity of doing this again is now rather than teaching people the Looker way, we can build on people that understand that and DBT and Fivetran.
Starting point is 00:23:44 And now we can say like, great, you want modeled things. I can give you modeled things. But if you want to poke through the model and write some SQL, I want to let you do that too. And then you can decide to model it later. And so it's sort of like we can give people nicer things because we don't need to protect them from themselves. And that was a lot of the balances that we felt like we had to be very unopinionated about the product. And if the developer of a model was not good, that was not our responsibility. And I think now we're taking a more opinionated point of view and being a little bit more aggressive. So a simple example is we don't operate exclusively in database.
Starting point is 00:24:28 If you write a query to a table and you write select star to that table, we'll actually pull the whole thing back, put it in the browser, and let you re-query that data set. Because it's sort of more pragmatic and faster and better. And so we sort of get to take some of these foundational concepts and go two steps further in terms of what we're allowed to do with them. That's what's been so fun about doing this again is we know how to build the core foundational pieces. And now we get to build those things just
Starting point is 00:24:57 outside the customer promise that are sort of so exciting. I always used to joke that I wrote more SQL, like raw SQL than any other Looker user ever. And now I just get to write raw SQL alongside a data model. And it's like, it's what I've always wanted. So it's, I get to build for me a little bit. Very cool. Okay, one more question for me
Starting point is 00:25:20 before I hand it over to Costas. So, you know, we talked a little bit about unbundling. Can we talk about the relationship between the visualization layer and the modeling layer? As a Looker user, I think that was one thing that was really nice, was that, you know, you sort of have the ability to, like, drill, and then it's like, okay, well, if you want to look under the hood, you can look under the hood, which is really nice. And so, you know, but with the unbundled model, you don't really get to do that, right? Like, yep, that becomes a ticket for someone who's, you know, doing the dbt
Starting point is 00:25:59 model or whatever. And so, yep, the user... Like, I love that about Looker. Is that something you're trying to retain in Omni? Yeah. And actually, we're trying to even sort of push it a step further. So we sort of talked about this in terms of there's a level of prep on a data set that you can make for a user, and the more prepared it is, the less flexibility the user gets. And so the simplest version is you make a reporting table, and that's all the user can touch. And the Looker version was we give you sort of this modeled schema and you can touch anything inside the model. And we're almost going a step further, which is you can touch the model and do anything.
Starting point is 00:26:34 And if you want to even poke through the model and write SQL, we'll let you go that step further. But the key here is always sort of this interplay between trying to structure things more over time. So if you do let someone write SQL, what we're trying to do is sort of pull out those sort of granular concepts that can make the next question simpler. So a really obvious example is that if you join two tables together, we know that we can make a virtualized view over that table.
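To make that concrete, here is a rough sketch in plain SQL of the kind of reuse Colin is describing, using hypothetical orders and users tables (this is not Looker's or Omni's actual syntax, just the underlying idea): a join a user writes once in an ad-hoc query can be captured as a virtualized view, so the next question does not have to restate it.

```sql
-- The one-off question a user might write in a workbook (hypothetical tables and columns).
SELECT o.order_id,
       o.amount,
       u.email
FROM   orders o
JOIN   users  u ON u.user_id = o.user_id;

-- The same join, captured once as a reusable "virtualized" view so later
-- questions can start from orders_with_users instead of rewriting the join.
CREATE VIEW orders_with_users AS
SELECT o.order_id,
       o.user_id,
       o.amount,
       u.email
FROM   orders o
JOIN   users  u ON u.user_id = o.user_id;
```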
Starting point is 00:27:02 And so I don't need to write that join the next time. And kind of the more of those pieces that we can help you build fluidly. So I don't need to drop into a model and publish the model out right now. It's just, I can write a join and now it looks like it's modeled. And then we can kind of structure that model more and more over time. That fluidity, I think, is really the superpower of what sort of the modeling layer helps with, the compilation. But it's not just having a model there that does it. It's the model and the ability to adjust the model based on what the user needs. So a great example that's sort of constant is that you have some sort of internal product and you want to filter out your internal users. The version of it where you're doing this in the ETL cycle is like, go back in the ETL cycle,
Starting point is 00:27:51 rewrite the queries, filter the rows. What Looker's true innovation was on a query result, you can drop into the model, put a where clause on everything that's hitting a table and then boom, it's gone. And it's that coupling that is the real power of the model is that it takes that events view and it really makes it events view where user is not internal. And it makes it super accessible. And I think then what we're trying to sort of build on is sort of how do we then refine that work that user did and make it as fast as possible? And then if that where needs to push all the way down into dbt, can we make that really simple as well? So it's make it really fast for the user to answer the question and then make it really robust or as robust as the company wants for controlling the logic later. Yeah, that's super
Starting point is 00:28:42 interesting. It's, it sounds as if like, you know, in many ways, and I know this may be an oversimplification, but it's almost like reclaiming the value of the thousands of ad hoc, you know, activities that people are performing on a weekly or daily basis because there's so much value in what they're trying to do,
Starting point is 00:29:04 but largely it goes wasted. You know, that's exactly it. No. And I'd say like, even it's trying to also knock down that sort of decision node of like, do I make this scalable now? Or do I answer the question? Like, I want you to answer the question and I want you to make it scalable later. And sort of like the original version of the world is like, do I go pull up Mode and write it in SQL? Or do I go take the time to like think about a data model and model it? And I want you to just answer the question. And then I want to pull out like, hey, we found three things that are modelable. Do you want these things?
Starting point is 00:29:39 Like, boom, let's model them. It still can stay orphaned. Like it's okay to have one-off SQL and it's pragmatic and users are doing it and we can't argue with them. So it's like, that is what I, that is the subtlety here is let's let users put themselves in trouble a little bit, but let's try to help them like make it scalable and make it better. Yep. Love it.
Starting point is 00:30:01 So fascinating. Costas, please. Thank you, Eric. So Colin, I have, okay, let's start with like a question about the past and how it relates to today, right? Like, yeah, Looker was, I think the company was founded in 2012 or something like that. That sounds about right. Yeah, like 10 years, right? So? And 10 years after we have Omni. So what has remained the same and what has changed between 2012 and 2022 when it comes to the problems that Omni today is addressing? I mean, I think a lot of it.
Starting point is 00:30:41 So at first, I think there's a really big difference between 2012 and 2015. So I think in some ways, Looker got a little lucky. Great example was our first Hotel Tonight instance was actually on top of our production MySQL. I took down the app a couple times querying in Looker. Not recommended. Set up your production replicas. But after Redshift and Snowflake, columnar databases on the web and essentially this sort of mixing
Starting point is 00:31:06 between the data lake and the data warehouse and just the ability to query lots of data has become just obvious and normal to people. You don't need to walk in and say, do you know what Snowflake is? Are you ready to start doing that? That thing has just become completely normal. I think the idea of compiling SQL and sort of SQL familiarity and that being a core component of your data stack has just become normal. And similarly, like this idea of just all of your data from 10 different sources showing up in your data warehouse or your data lake or whatever it is has become normal.
Starting point is 00:31:44 So I'd say early in Looker's life, we were teaching people these concepts and sort of learning about them. Early in Omni's life, we can assume that you have Fivetran set up, you started your Snowflake, you've got DBT implemented, and you have two people that know how to write SQL, great. And you're ready to start going. And so like we get to start with a lot of those concepts existent in the user base. I think the demands of end users have not changed at all. Like people, the dashboard is not dead. Like most people are looking for dashboards.
Starting point is 00:32:17 They're looking for interactive analytics. They're looking for some version of self-service so that a marketer can go look at their channels over the last three weeks and see the evolution of lead generation across them. I think all those things have become just normal and standard for people. I think one big obvious thing that's sitting out there is there hasn't been a generational BI company since Looker. So Tableau and Qlik and to some extent Power BI were the wave before Looker, and Looker, like, I mean, I obviously have a very Looker-centric point of
Starting point is 00:32:53 view, but I think Looker grew up in an isolated... like, it was the isolated winner of its generation, and there's a couple other tools out there that are sort of similar generations. I think the current generation has not quite been figured out yet. And so there's some white space. But at the same time, Tableau and Looker and Power BI have become extremely commonplace in people's data stacks. So I do think people are somewhat comfortable with the stack that they have now, which is a little bit different because we got to be very different when we were Looker. We were pitching something very new.
Starting point is 00:33:29 Yeah, yeah, 100%. Okay, you mentioned something very interesting. I would like to spend like a few minutes on it and hear your thoughts on that. So going back to the Looker cohort of BI tools, there were a couple of them. We had Chart.io, we had Periscope Data, Mode, which is still around. It's still out there. What's the other one that got merged with Periscope Data?
Starting point is 00:34:03 Sisense. Sisense, yeah. So we have all these companies growing, and at some point we get the acquisition of Looker by Google. And it almost felt like a cycle was closed in the market. Companies got merged, IPOs were canceled. Sisense was talking about IPOing for a while. It hadn't happened yet.
Starting point is 00:34:41 And then we also had events like Tableau, for example, from being public, getting acquired. And we still have, by the way, as you said, Microsoft with Power BI, which we don't chat that much about it, but it's huge. Enormous. Biggest by far. Yeah, exactly. So can you give us a little bit of like what happens with these cohorts? And after you do that, also like how do you see the future, like the next iteration? Yeah.
Starting point is 00:35:15 I mean, I think the tools sort of divide into a few different buckets. So I think the thing that Looker did really well was it was very opinionated about what it did in terms of this modeling layer. But I think we also understood to be a big successful company, we needed to serve enterprises effectively. And so while we started really focused on the hotel tonights of the world, the venture-backed tech companies that were young and sort of first early adopters of product, we built a lot of the things that gigantic companies needed to be successful with enterprise analytics for 100,000 people. And I think that was one of the reasons that we were able to be so successful as a company is
Starting point is 00:35:55 we thought a lot about the business as we built out the product. That wasn't always best for the product, to be clear. And there's always some tension between the business and the product that you've got to deal with. But I think that was a lot of the reason that we were allowed to be successful was because we thought about the trajectory of what could support the business. I remember having conversations about how to get to a billion dollars in revenue. And when you're having those types of conversations, it makes it much easier to think about sort of what the business looks like five, 10 years from now and what it needs to be successful. And I think some of those companies that you're listing were more focused, a little bit more down market, and maybe a little
Starting point is 00:36:34 less focused on sort of the sustainable economics of the business. Though again, like some are surviving and like sort of continue to grow. And the SaaS model is great for things like that. It's really hard for me to figure out what the next generation is actually. And obviously I'm trying to build one of them, but I think one of the things is that when we were starting Looker, I looked back for a lot of inspiration at MicroStrategy, Cognos, Business Objects, like the first generation of BI. And I think in some ways I was literally looking through MicroStrategy's docs and they've got like the little folder menus, and it looks like sort of Windows 2001 as you're using the product. And I think using some of these tools, like even Tableau to some extent, feels a little bit dated in terms of sort of the web interactions and sort of the user interaction models. And I think a lot of the opportunity is to sort of update really at the margin some of these concepts in terms of like how we interact with the database. So a great example here is
Starting point is 00:37:33 as Looker was growing up, we had the columnar database growing with us. So Snowflake, BigQuery, Redshift. And just the idea of using them was sort of the new concept. We're not going to extract everything. We're going to operate in database. Not only is it going to be okay, it's going to be faster than if you were working on an extract basis and it's real time. I think now we've reached the sort of pain point in that sort of trajectory, which is like my snowflake bill is a million dollars. Is my BI tool too recklessly consuming that layer? And I think DBT is probably the contributor to this as well. And so this is where now we can take some of those core concepts and say, okay, what is the borrowed concept from historical BI that we can actually layer in here? So
Starting point is 00:38:13 the example here is that we're silently putting in-memory layers into our product. And again, no new concepts here. BI tools have done this for 30 years. But I think the concept of operating entirely in database when you need to be, so that you're real time and you're working with Fivetran well, but I'm able to build a dashboard where I can download the whole data set in memory and cross filter it instantaneously. I think that's actually what users want. So it's sort of like, how do we look at the things that are great about columnar in database and then build on them? Or Fivetran having all your data there, or DBT, the familiarity of SQL, those sorts of pieces.
Starting point is 00:38:52 Yeah, that makes a lot of sense. And we've been talking a lot about SQL, but there's also like a lot of discussion about Python, right? Yep. And the reason I'm asking that is because, okay, I get it for dbt, for example, to having a lot of requests around Python because they are working a lot with ETL. ETL traditionally was always preferred to have Python. Yeah, but there are also a couple of,
Starting point is 00:39:23 let's say, BI products out there, but they try to merge the two paradigms together. And it usually happens through the notebook. The notebook, yep. Yeah. So what's your take on that? Do you think that this can be the new iteration of BI, or at the end, notebooks is something different
Starting point is 00:39:44 and it's not BI? I think it certainly can be BI. Like, I think all data consumption is sort of overlapping Venn diagram circles where like similar users are doing similar things. I think that I have found it in the past that it's a very different type of user and that data science activities tend to be done less scalably and less to be shared and sort of more force-directed diagram style analytics than creating a self-service environment for end users and you're making different trade-offs. So like we are very focused on the SQL side of things for now and sort of the query consumption side. I would say that like, as databases make querying in different ways more available to end users, I think that people will
Starting point is 00:40:33 want to use them. It's just the frequency with which I need to use looping or sort of like higher order construction, I think tends to be more important in data engineering type activities and data science versus consumption and reporting and things like that. So for me, I would say we're actually going in the other direction, which is like, I want to build a functional library that looks a little bit more like Excel on top of our data. So a Google Sheets style interface on top of queries rather than thinking about Python, because I think that's what makes it more accessible to more users. But sort of to your point, I think you need all of these things. And that is why I don't want to lock the business logic into our layer. If we help you build business logic, I want you to push it into the
Starting point is 00:41:25 right place so that a Python user can go pick it up. And maybe we'll do that in sort of the infinity of time. Like I would like to do everything eventually, but our focus is very much on like the SQL compilation, the consumption, the reporting, the functional sort of consume layer more than the sort of deeper data science pieces. Yeah, that makes total sense. Okay, I want to share with you what I always considered the genius of Looker. And then based on that, I want to ask you about Omni. So what I found like amazing about Looker was how, I mean, it was like, you have a product that
Starting point is 00:42:07 in order to deliver value, you had to engage two completely different personas. One was the business user who is consuming the data and does the reports. And then you have the analyst or the data engineer who has to prepare this data and all that stuff. And Looker for me created this very distinct experience between LookML, that was for, let's say, the data engineer, and then for the business user, where you pretty much could only do what you know really well how to do, which is like pivots, right? Yep.
Starting point is 00:42:45 Like for me, that was like, just like switching from like the developer mode, like to the real world, let's say. It was like amazing. Like it was like amazingly smart what happened. So I always considered like a very successful and unique example of a product that can serve like two personas
Starting point is 00:43:04 like almost equally well, right? Which is hard. It's super hard to do. And based on that, let's talk about Omni. Who is the user of Omni? Do you have this duality again? Absolutely. No, I think you perfectly actually... It's amazing because we really tried to profess that point of view strongly internally. We actually had personas for each of those. It was called the Fox and the Hound. Like they were top level user types in the company and we care deeply about them.
Starting point is 00:43:37 I actually think the funny thing is, I think we stopped just short of really diverging the product enough to serve both of those well. So almost what we're trying to do is take one more step in both directions. And that is, I want to give those technical users more superpowers in SQL and more fluidity to model and do things and fork away. And I want to make the end user experience more Excel-y and more sort of interactive, still based on the pivot table. But like a simple example is use point and click interactions to make functions instead of typing functions in a modal. Like how can we elevate it so that
Starting point is 00:44:20 any user can consume things? But I think to your point, even going back to sort of the success of those previous businesses, I think the most important thing to Looker's success was understanding that we're selling to a data person, but we're selling to a data person whose job and what makes them successful is making other people successful with data products. And that is exactly our focus is we're not building a product for data people. We're building a data product for data people to serve business users. And when you sort of shift the thinking a little bit, it still needs to be outstanding for the data person. Like that is what got Looker bought. And we got plenty of criticism from our non-technical users, but they're doing that work not to do research on an island.
Starting point is 00:45:07 They're doing that work to build a self-service environment for people. And so that self-service environment needs to be truly great. Ours is still getting better, but that's exactly what we are trying to do is build a great environment for end-user self-service that lets data people go a little bit further too. That's great. All right. One last question from me, because we're getting close to the end here, and I want to give some time to Eric to ask any follow-up questions that he has.
Starting point is 00:45:40 Can you give us, tell us like a little bit about what Omni is today? Like what's the product experience? And also share with us like the vision that you have, like what we should expect like in a year from now, because Omni is a new company. You've been around for a little. So please do that. Yeah. So what we are doing is really balancing these two worlds between the directness of writing SQL and the sort of governance and the richness of accessing a data model.
Starting point is 00:46:11 So what you see when you're using Omni today is all the analytics is built through a centralized data model. But on top of that data model, users can essentially embellish in SQL and go beyond and sort of ask open-ended questions and do open things. And then they can take the components of those analysis and push them down and centralize them. So the idea is that I can start in a workbook, I can do analysis with a mix of SQL and pivot table and front-end fields and UI, and I can do that in isolation. And I can pick that isolated thing up, push it down into a centralized data model and share it with everyone. Or I can leave it in isolation. And so the idea is that I can really straddle these worlds between doing things in a free and
Starting point is 00:46:56 open way and sharing them directly with my neighbor. Or I can build a data environment that everyone can self-serve from. And I can sort of evolve that over time. So I could start in a very open, sort of sloppier analytical pattern, and I can slowly have it look a little bit more like a mature Looker instance over time. And what's happening behind the scenes, what's powering that is sort of our model management piece that is picking out the sort of components of your SQL, turning them into a data model, and then we're able to pull them sort of out of SQL queries and push them into our data model and push them out of our data model and into the database.
Starting point is 00:47:36 So you can almost think of it as just Looker meets a SQL runner and lets you sort of move back and forth between them completely fluidly. And what should we expect? Like, give us something to, you know, like... Yeah, I mean, certainly, like, more and more maturity around these sorts of experiences. Like, the magical super motion that I want to see in the future is that an analyst starts a one-off analysis, and they start writing SQL, and they share it with their team, and their team wants to do interactive analysis with that thing. We're able to hit a button and quote-unquote model that SQL down, put it in a centralized modeling layer, and give people self-service with it. Now a data
Starting point is 00:48:20 science team decides that they want to work with that same data set. Again, we can pick up that business logic, go persist it in DBT through some sort of cron schedule. And all of the metrics work silently through the Omni layer. So it's a self-service environment for your end users. And it's sort of a technical iterative environment for your technical users. And we do all the orchestration of the business logic between those layers. So visualization and reporting obviously come along with those things. And then you're going to see more and more in terms of end user experience. So things like spreadsheet style analysis, CSV upload, acceleration, a lot around those pieces, so that you're on a dashboard and you hit a button and now your dashboard filtering is instantaneous and it's all just sort of happening magically behind the scenes.
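As a rough illustration of what persisting that kind of business logic in dbt could look like, here is a minimal, hypothetical dbt model; the model, source, and column names are made up, and the scheduling piece would just be a normal scheduled dbt run rather than anything Omni-specific:

```sql
-- models/orders_with_users.sql (hypothetical dbt model)
-- The join and filter logic that started life in an ad-hoc workbook query,
-- now materialized on a schedule so BI, data science, and reverse ETL tools
-- can all read the same table.
{{ config(materialized='table') }}

select
    o.order_id,
    o.user_id,
    o.amount,
    u.email
from {{ source('app_db', 'orders') }} as o
join {{ source('app_db', 'users') }} as u
  on u.user_id = o.user_id
where u.is_internal = false  -- the "filter out internal users" rule, hardened here
```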
Starting point is 00:49:10 Okay. That's great. Eric, all yours from here. All right. And so this is a question that kind of combines multiple parts of your past experience. So one really interesting thing when it comes to data, the data space in general, but in analytics as well, is that you have machine learning and AI getting a lot of...
Starting point is 00:49:42 There are a lot of headlines out there about, you know, ML and AI and, you know, automated insights and, you know, anyone who's actually, you know, tried to use Google Analytics to use their, you know, automated, like AI-based insights, like they know how real they are. If you're like a real business trying to scale, that stuff is like very difficult to do. But you also have a lot of experience, you know, and sort of, you know, from your Google experience, like feeding algorithms that are making decisions, you know, building products that create a lot of data that go into algorithms. And then in some ways, like, it sounds like Omni is trying to make intelligent decisions or at least make decisions around what options to give you based on what you're trying to do, which is really interesting.
Starting point is 00:50:33 Yep. Not that that's AI or ML necessarily in a formal sense. But what's the relationship? Because in some ways, it's dangerous to introduce, like, an algorithm into business logic. Like, I actually think you sort of nailed it right at the end there, which is, like, the option. Like, I think the most underrated concept around all these things is, like, light human in the loop on these sorts of concepts. So, like, we've even actually noticed this in our product. There's a really big difference between writing automated joins on your behalf and telling you that we think
Starting point is 00:51:10 that we found very good joins. And the difference is being right a hundred percent of the time. I think these are the sorts of concepts that tie to like self-driving and things like that, which is the bar to being correct in a lot of contexts is honestly, it can be very close to 100%. And so I think that you can use these tools in ways that are extremely powerful, but that you just then want to present them to users in a way that's more interactive. So an example that we're thinking of in the future is, I don't really want to have people writing joins in our product. Like you can, if you show up and you have a list of joins that you want to punch in,
Starting point is 00:51:49 I would much rather see two tables that you want to join together and say, it looks like these two keys join and these are the three most likely other couplets of fields. And this is why we think that like hit yes or no. Yeah. To me, that is like super magical. And it's like a half step back from just doing it for you. But I think those are the types of pieces that we're trying to layer in. And it's actually the same with our SQL parsing. If you write SQL, we parse it out and try to write fields. We don't just immediately stick those fields in the model because we're wrong. We're wrong,
Starting point is 00:52:25 but we're comfortable being wrong. And we can accelerate your ability to make those fields. And we can do that in a way that's very expressive. So I think we're trying to layer those pieces in. We're just trying to do them in a way that puts the user in control as much as possible. So very little black boxing, but black boxes that point you at decisions to go make. Yeah. Fascinating. I love it. Yeah. I mean, you can almost, like, as those patterns become established, then those become even more powerful, right? It's almost linting for queries or something. That's exactly it. And like they can become automated, but like, let's make the bar really high. Yeah. Yep. Love it.
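As a toy sketch of the kind of join suggestion being described, surfacing likely keys for a person to confirm rather than joining automatically, a tool could look for columns that two tables share by name and type; the table names here are hypothetical, and real warehouses vary in their information_schema details:

```sql
-- Candidate join keys between two hypothetical tables, orders and users:
-- columns matching on name and data type are surfaced as suggestions
-- for a human to accept or reject, not applied silently.
SELECT a.column_name,
       a.data_type
FROM   information_schema.columns AS a
JOIN   information_schema.columns AS b
  ON   b.column_name = a.column_name
 AND   b.data_type   = a.data_type
WHERE  a.table_name = 'orders'
  AND  b.table_name = 'users';
```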
Starting point is 00:53:09 Okay. We're at the buzzer here, but before we jump off, where can people learn about and try Omni? Yep. exploreomni.com. Just fill out the form there. Like we're about to probably put something real public out there, but we're still young. So we like to sort of handhold through the early process for now, or just shoot me an email, Colin at exploreomni.com, and I'll take you on the tour myself. Awesome. Sounds great. Well, Colin, this has been an amazing conversation. Thank you so much for giving us the time. Of course. Thanks. It's been fun. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds, at eric
Starting point is 00:53:55 at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.
