The Data Stack Show - 248: AI and BI: The Future of Data Analytics with Michael Driscoll of Rill Data

Starting point is 00:00:00 Hi, I'm Eric Dodds. And I'm John Wessel. Welcome to the Data Stack Show. The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work. Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. Before we dig into today's episode,

Starting point is 00:00:30 we want to give a huge thanks to our presenting sponsor, RudderStack. They give us the equipment and time to do this show week in, week out, and provide you the valuable content. RudderStack provides customer data infrastructure and is used by the world's most innovative companies to collect, transform, and deliver their event data wherever it's needed, all

Starting point is 00:00:49 in real time. You can learn more at rudderstack.com. Welcome back to the show, everyone. Mike, thank you for coming back on. We have Mike Driscoll here. He's a co-founder of Rill Data, and we have some amazing topics to cover today. But Mike, for those who did not hear the first episode with you, give us a brief background.

Starting point is 00:01:11 Yeah, thanks, Eric. Thanks, Brooks and John, for having me. We talked when we first met about, I think, Rill as an emerging BI tool and some of the unique lens we had on admittedly a crowded market. And we were excited to show off Rill Developer that had just been launched in Rill Cloud. So it's been a couple of years. We're now doing a lot more in, of course, the world of AI. And I'm just thrilled to share with the audience at the Data Stack Show what we've been cranking on.

Starting point is 00:01:41 Yeah. So Mike, one of the topics that we talked about before the show that I'm excited to dig in on is kind of the philosophy behind the BI tool. When you're in a crowded market, one of the things you can do is have a really strong philosophy about how you approach problems. And I've really seen that with Rill. So we're going to dig into that. But what are some topics you want to hit?

Starting point is 00:01:59 Well, you know, I'd be remiss if I didn't say that I think the most exciting thing we're seeing in the world right now is, of course, AI. And I think naturally there's some connections between AI and BI and particularly with our embrace of BI as code, we think that there's some really cool intersections between those two. Love it. Well, let's dig in. Awesome, let's do it.

Starting point is 00:02:25 Mike, welcome back to the show. It's been way too long and a lot has happened since we last had you on the show. Thanks for having me, Eric and Brooks. And great to see you here, John, joining the crew. All right, well, for those who did not listen to the first show, just give us a brief background on you and Rill, just your career,

Starting point is 00:02:41 and then how Rill came about. Absolutely. Rill was originally started actually as a company called Metamarkets, which sold to Snapchat in the late 2010s. And we actually started Rill by recognizing at Snapchat that this technology we had built for BI was being used not just for advertising analytics,

Starting point is 00:03:05 but for a whole lot more. And so, Rill was founded with the idea that we could take some tech we knew well around interactive exploratory dashboards and maybe take it to more markets beyond advertising. Since 2024, we launched Rill Cloud. We've had a tremendous number of adopters. We've got thousands of users out in the world,

Starting point is 00:03:27 everyone from Comcast to Ericsson and fintech, AIOps companies using these opinionated BI-as-code dashboards that Rill Cloud powers. And we've also got an open source tool called Rill Developer that thousands of folks are using to run a local version of our stack, a free and local version. And, you know, I guess the inspiration for this company, in some ways it is my life's work,

Starting point is 00:03:53 I would say, is just, you know, making data more accessible, more explorable for, you know, human brains. And really the genesis of all of this was I actually started my career in data in computational biology. So believe it or not, Rill is probably the result of decades of working with data sets.

Starting point is 00:04:14 It started with genomic data sets, wow, over 20 years ago. Wow, very cool. Okay, you mentioned when we were talking before the call, which we probably should have recorded because it was such a fun conversation. But you mentioned you were coding this morning. What were you working on?

Starting point is 00:04:30 Can you tell us or is it top secret? No, definitely not. One of the things that we've been working on the last few weeks here at Rill is we've been building an MCP server. And so what I was working on this morning was actually a demo for a prospect. Really, some of the best ideas you have in business often come from your customers.

Starting point is 00:04:53 And so we had a customer ask, you know, Mike, we love this dashboard that you've given us, but we'd really love it if you could actually have a natural language prompt that could also allow us to interact with it. We put all this data into Rill, we pay you guys a lot of money. We know that natural language prompts are how AI agents work. Could you guys build us a Rill agent that could answer questions

Starting point is 00:05:16 without us having to click around? And so we've been building an MCP server where you can ask the same data that lives in your Rill dashboard, you can ask, hey, why is revenue down over the last 24 hours? And the MCP server will kind of go through and examine all of the dimensions associated with that metric and provide insights.
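A rough illustration of the "why is revenue down" question described above: one simple way to examine the dimensions behind a metric is to total the metric per dimension value in two time windows and rank the biggest drops. This is only a sketch under assumptions, not Rill's actual MCP implementation; the function name `explain_metric_change`, the row layout, and the sample data are all hypothetical.

```python
from collections import defaultdict


def explain_metric_change(rows, metric, dimensions, top_n=3):
    """Rank (dimension, value) pairs by how much `metric` changed
    from the "previous" period to the "current" period.

    Hypothetical helper: each row is a dict with a "period" key
    ("previous" or "current"), one key per dimension, and the metric.
    """
    contributions = []
    for dim in dimensions:
        totals = defaultdict(lambda: {"previous": 0.0, "current": 0.0})
        for row in rows:
            totals[row[dim]][row["period"]] += row[metric]
        for value, t in totals.items():
            contributions.append((dim, value, t["current"] - t["previous"]))
    # Most negative change first, i.e. the biggest contributors to a drop.
    contributions.sort(key=lambda c: c[2])
    return contributions[:top_n]


# Made-up sample data for two 24-hour windows.
rows = [
    {"period": "previous", "device": "iphone", "country": "US", "revenue": 500.0},
    {"period": "previous", "device": "android", "country": "US", "revenue": 300.0},
    {"period": "previous", "device": "iphone", "country": "GB", "revenue": 200.0},
    {"period": "current", "device": "iphone", "country": "US", "revenue": 250.0},
    {"period": "current", "device": "android", "country": "US", "revenue": 310.0},
    {"period": "current", "device": "iphone", "country": "GB", "revenue": 190.0},
]

drivers = explain_metric_change(rows, "revenue", ["device", "country"])
for dim, value, delta in drivers:
    print(f"{dim}={value}: {delta:+.2f}")
```

An agent would then turn the top-ranked rows into a sentence ("revenue is down mostly because of iPhone"), and the dashboard is where a human can check that claim.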

Starting point is 00:05:38 So yeah, I've been hacking on our MCP server, which has been pretty fun. So fun. Okay, it's funny, we got two internal MCP servers up and running this week. Only two? It was hacky as well, but it's super fun. I want to get into some nuts and bolts for the nerds out there on this.

Starting point is 00:06:05 How did you deploy it? How did you decide to deploy it for V1? What did you guys use for your two? It's just running locally. Yeah. So yeah, I mean, there's sort of two interfaces. So I wired everything up in Cursor, but some people are using Claude as the interface as well. And so there's just an internal team that's, you know, kicking the tires on it. Yeah.

Starting point is 00:06:37 So we just did it locally, but it's great. Hey, local first development, you know, we're a huge fan of local first development. Yeah. Yeah. So right now we're actually using a Docker container. So we built it in Python, you know, from looking at the MCP servers out there, the Python community, no surprise, is the most mature. But right now there are a number of steps to get it working

Starting point is 00:07:03 to use Claude. On my Claude right now, I've got some, you know, JSON config where I can specify the name of the Docker container that you spin up and then talk to. And then it, you know, then it talks to the Rill runtime. I think we are going to rewrite it in Golang. And so that will just be embedded. Rill is a downloadable binary.

Starting point is 00:07:25 So if we put the MCP server into that, we could have that entire experience running locally. So we're using Claude and a Python back end at this point. Ours is in Go, actually. OK. Yeah. Very nice. OK, well, maybe I'll hit you up after the show.
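For the nerds following along, the Claude JSON config mentioned above might look roughly like the output of the sketch below. It assumes the `mcpServers` layout that Claude Desktop reads from its config file for local MCP servers; the server name `rill` and the Docker image name are hypothetical placeholders, not something stated in the conversation.

```python
import json


def build_claude_config(server_name, docker_image):
    """Build a Claude Desktop style config that launches an MCP server
    in a Docker container and talks to it over stdin/stdout."""
    return {
        "mcpServers": {
            server_name: {
                "command": "docker",
                # -i keeps stdin open for the stdio transport,
                # --rm removes the container when the session ends.
                "args": ["run", "-i", "--rm", docker_image],
            }
        }
    }


# Hypothetical server and image names, for illustration only.
config = build_claude_config("rill", "example/rill-mcp-server:latest")
config_text = json.dumps(config, indent=2)
print(config_text)
```

Embedding the server in a single downloadable binary, as discussed, would shrink this to a config whose `command` points at that binary with no Docker step.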

Starting point is 00:07:38 Hit me up. I mean, I don't know. I'm not a developer by trade. And so someone else opened the pull request. We did internally. I think this is a great jumping off point. MRR trending over time, or tweaking your JSON config in the interface. So can you speak to that a little bit, Mike? Because it's pretty wild to think about, you know, if you had told someone 10 years ago, like your BI interface is actually going to be a chat bot made by another company that is like on the surface a generic intelligence tool.

Starting point is 00:09:01 You know, people would be like, that's hard to even conceive of. But I mean, earlier before prepping, like you literally showed us that you were doing BI in Claude. Yeah, I think it's, you know, to use an overused adjective out there, it's wild, right? It really is wild to see what's happening in the world of applications. I think what we're really seeing is a shift in the interfaces for software. How do humans engage with software?

Starting point is 00:09:31 And so we saw folks talking about, you know, do you really need to click around Salesforce to understand your sales pipeline? Or do you just need to ask, you know, your AI Salesforce agent, hey, who are our top 10 leads this month? I think BI is going to be just as affected, if anything potentially more so impacted by this shift

Starting point is 00:09:53 in the way that people interface with software. And to be honest, I think, you know, BI really was never about dashboards, right? We've always said BI is actually about insights, right? It should be about, you know, making better decisions. And I think a great analogy is the world of maps, right? No one actually wants a map, right? They just wanna get to where they're going.

Starting point is 00:10:19 And you know, with the advent of something like Uber and Lyft and you know, now of course even Waymo, it wouldn't surprise me if people are actually using mapping software less frequently, because they didn't actually ever want a map, they were just trying to get from point A to point B. I think the important thing to note is that you kind of do need both,

Starting point is 00:10:41 because especially when it comes to AI agents, right? You want, the reason why the visualization software, the dashboards still matter, is that if an AI agent tells you that revenue is down in the last 24 hours because of the following correlated dimensions, and let's talk advertising, you know, it's because iPhone clicks are down, you still want to be able to trust the agent, but verify their work. And so without visualizations,

Starting point is 00:11:09 it's very hard for humans to kind of comb through reams and reams of data. Visualizations are still the right-sized interface for human eyes. Yep. I also think there's a very practical utility for visualizations around things like accountability. If I'm holding a team accountable to reaching some metric, or if I'm holding a team accountable to analyzing adoption of features or other things like that,

Starting point is 00:11:40 well, how do you do that? That's just one example among many where when you actually use them practically to operate a function or other things like that, I think there's a ton of utility there. Yeah. I always think about sports. What would sports be like without a scoreboard, right? And because that's true, just giving somebody a blank piece of paper and saying, like, what do you want to know about your data? Like, that probably has a place, but it's also not the same as, like, look, here's a scoreboard with whatever your point is, and then, like, higher is better, lower is worse. Yeah, like that basic, I think, is still gonna be a thing. Yeah, it's just simple and it helps people.

Starting point is 00:12:45 Mike, I'm interested to know, how does this change the way you think about building Rill and even measuring the success of the product? Because if you think about a metric like product usage or engagement, a company deploys Rill, they're using it to do all sorts of things. But ultimately they're using it to derive insights from their data sets and then hopefully make decisions that improve the business. But now what's interesting is part of that engagement data will now be API calls through an MCP server.

Starting point is 00:13:21 The fidelity there is very different. In many ways they now have sort of three interfaces, right? So you have the, you know, let's call it the traditional UI and the tools that are available via a GUI. You have Rill Developer, which is like config-based, you know, in conjunction with Rill Developer. And so the surface area of your product and like the number of users and different types of people who can interact with Rill, the surface area gets way wider.

Starting point is 00:14:12 So how do you think about that? Yeah, I think it's a very fluid environment right now. Yep. Where I think everyone is trying to ask themselves. The first question we asked ourselves when we were building out this, you know, Rill BI agent is where should it live? Should it live next to the dashboard?

Starting point is 00:14:31 Right, people log into their dashboard at rilldata.com and there's a chat bot. Should it live in Slack? Should we meet them where they're already doing a lot of their knowledge work? And should it be a Slack channel? Should it be an app for OpenAI or for, you know, for Claude, right?

Starting point is 00:14:47 So I think we're sort of, that's starting to shake out. I think one insight, you know, that we can take from, I mentioned the Sequoia folks did a conference last week about AI and they put a slide up and they showed, you know, for all the technological innovations of the last couple of decades, which companies were able to achieve

Starting point is 00:15:06 more than a billion dollars in revenue. And you look at that slide and you see the vast majority of the companies that get to a billion in revenue, not market cap, revenue, are application companies. And if you look at OpenAI's decision to acquire Windsurf, there are some hints that, you know, people think that a lot of the value, maybe even not value

Starting point is 00:15:27 creation, but certainly the value extraction, the ability to actually get dollars from customers is going to come at the application layer. So I think a company like Rill needs to think, okay, well, if we just become, you know, an API that plugs into someone else's application, does that put us at risk in terms of our ability to command dollars? So again, I think the answer is often going to be, you know, just like we think about API layers, you need both. You need to have a UI, you need to have an

Starting point is 00:16:01 API, you need to have multiple ways that you can drive value for your customers. And yeah, so I think it's gonna be interesting to see how this plays out the next, you know, coming months. Yeah, it'll be fun. One question, and I'm thinking about our listeners who, you know,

Starting point is 00:16:21 I'm sure there's a range, right? I'm sure we have listeners who work in Power BI, they run very traditional pipelines. We probably have some listeners who turn on the show while they're tinkering with their MCP server setup. And so there's a very broad spectrum. But I want to think about the ones who sort of are looking at this and they're saying, okay, I get it. I can see that there are some pretty tectonic shifts like about to happen. How should they begin to think about, you know, as a data professional, they're managing

Starting point is 00:16:54 a stack, they have internal stakeholders, they are going to face demands for delivering things through different interfaces just because the chat interface is so user friendly and it's becoming muscle memory for so many people. How do they navigate that as they think about tools, as they think about their stakeholders? It's a great question. And I think there's a few shifts that I think are worth paying attention to

Starting point is 00:17:20 as a data practitioner. I think one of the first, which I think is gonna be accelerated by AI, and which we've already seen, is that I think data professionals are increasingly embracing code-powered workflows and code-defined stacks. So we saw, you know, dbt created, in some ways,

Starting point is 00:17:40 an analyst that understood how to use Git. And at first that seemed like, my gosh, you'll never get anyone to adopt a data tool that you have to go to the command line and commit stuff to Git. But the power of that approach was such that I think it overwhelms some of the, frankly, challenging developer ergonomics

Starting point is 00:18:02 of working with command lines and code. I think that's gonna happen in other areas of data. So obviously Rill's embracing BI as code, right? You can load up a project in, you know, in Cursor and vibe code it, vibe update it. I think across the data stack from ETL to, you know, to spinning up databases, right? To defining metrics layers and semantic layers, to defining dashboards. I think we're gonna see code

Starting point is 00:18:29 backed by Git as a predominant theme there. And I think that is a big thing that data professionals should be embracing. That's on the creation side. And then I'm happy to talk about the consumption side for their stakeholders, right? That doesn't mean that their stakeholders need to know Git, but if you're a data professional today,

Starting point is 00:18:46 you should be writing your data pipelines and your metrics layers and your dashboards. You should be writing those in code and not with loaded UIs. Speak a little bit about the, because I've had personal experience with this too, about the developer efficiency and speed gain there. Because I mean, even talking before AI, honestly,

Starting point is 00:19:05 and with AI I think there's a multiplier. So if I'm used to a traditional stack and I have three different GUIs and I log into this one GUI and then make a change to a pipeline and hit save and then log into this other, you can kind of visualize, I think, what I'm talking about. And I know we're kind of just guessing, but what do you think the efficiency gain is, even pre-AI, with a full, like, as-code workflow: pipelines, modeling, and, you know, BI? So for any listeners there, if you want to read a long blog post, a guy named Simon Späti wrote one called The Declarative Data Stack. He actually talks about this in

Starting point is 00:19:47 terms of what are some of the gains of, you know, your entire data stack, again, from pipelines to database ingestion code, being, you know, declared entirely with, you know, YAML and SQL. I think there's a number of gains. First, let's be honest, there's a higher technical barrier to that, right? So the great thing about UIs is almost anyone can navigate a UI. When you start working in code, you do need to know, you know, programming languages. I think one of the biggest gains is that you don't need to move between multiple

Starting point is 00:20:22 different tools. If you parameterize your project in, you know, as code files, you can kind of move between layers of your stack by just moving between code files, right? You don't have to go and look at some other tool to figure out what's going on with your pipeline. You may have to observe it. But

Starting point is 00:20:39 if everything is declared in code, I think it means moving between layers of your stack gets a lot easier. I'm going to fast forward to the era of AI. It used to be hard if you had to move between languages like SQL or Python. Not everyone's an expert in every one of these languages. But I do think that Cursor and Copilot and, you know, their brethren do allow individual developers to do a lot more. We used to have this idea of, right, you had a database admin,

Starting point is 00:21:10 and then you have maybe someone good at writing pipelines and someone good at building dashboards. But I do think that when you sort of define everything in code, an individual developer can do a lot more. They can kind of be a 10x developer and move from writing Python code to SQL code to CSS code in a single day's work.

Starting point is 00:21:27 The last thing I'll say, it's talking about moving between layers because I think that's so powerful, right? But the last thing is that debugging issues gets a lot easier if everything is, again, in a, we think the largest companies in the world still use monorepos. A lot of things in the world of data are dealing with dependencies between systems, a lot of glue code. And when you have kind of a monorepo for your data stack, even if it contains, you know, SQLGlot for your transformations and Dagster for your orchestration, maybe Rill for your metrics layer and dashboards. When all this stuff, and maybe Snowflake for some,

Starting point is 00:22:05 you know, warehouse code, when all of that lives in a single repo, I think debugging what's wrong gets dramatically faster for folks. So I think it just accelerates what, you know, that we talk about the one person trillion dollar company, right? That's the dream that I think a lot of VCs talk about.

Starting point is 00:22:23 I think you can have a one-person data team these days. Yeah. I think that's what we're moving toward. Here's a kind of anecdotal example. There's a lot of companies that are kind of in that middle level. Maybe they've decided, we're going to use Git for transformations, use a tool like dbt. But on the one hand we kind of have a traditional BI tool, and maybe on the other end we kind of have traditional pipelines. But just something interesting that I ran into the other day. This was a fairly progressive stack,

Starting point is 00:23:00 essentially all the way up to the BI layer was all like as code, you know, kind of modern tooling. And then it was a newer, it was, I don't know what Gen we would be on in BI, let's call it Gen 3, maybe, I don't know. But kind of that like, that Snowflake era, like BI, several other tools that came out. They're on one of those, trying not to call them out by name. And we had to rename something, right? And we renamed it, and then I think we like, named it, like changed the name back. So it's like, what's wrong with this is fine. Like everything should be fine here.

Starting point is 00:23:35 So update everything. And then, so everything goes through, everything's fine in the code. And then you've got this layer, and you flip over into the UI, and everything is broken. Nothing is mapped anymore. You have to like manually click, and you have like this indeterminate amount of work that will haunt you for like several weeks

Starting point is 00:24:05 because you missed like one little minute detail when you rebuilt it. So I feel like that to me is the reason like, all right, BI as code sounds wonderful. I mean, imagine how many clicks it's gonna take to go into like four or five different GUIs, right? To rename that one field, right? And by the way, sometimes it's not a choice.

Starting point is 00:24:25 It's not that you decide to rename something, it's an upstream choice. Sales or marketing for the fifth time decides to rename X, right? Yeah. Customers get renamed. Or customers, sure. Yeah, yeah, yeah, good point, yeah.
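To make the rename story above concrete: when models, metrics, and dashboards all live as text files in one repo, renaming a field is a search-and-replace plus a reviewable diff, rather than clicks across several GUIs. The sketch below assumes a made-up project layout held in memory; the file names, the YAML keys, and the `rename_field` helper are hypothetical and do not reflect any particular tool's schema.

```python
import re


def rename_field(files, old, new):
    """Rename a field everywhere it appears as a whole word.

    `files` maps a path to file contents. Returns the updated files
    and a per-file count of replacements so the change can be reviewed.
    """
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    updated, counts = {}, {}
    for path, text in files.items():
        new_text, n = pattern.subn(new, text)
        updated[path] = new_text
        if n:
            counts[path] = n
    return updated, counts


# Hypothetical as-code project: a SQL model, a metrics file, a dashboard.
project = {
    "models/orders.sql": (
        "SELECT customer_name, SUM(amount) AS revenue "
        "FROM raw_orders GROUP BY customer_name"
    ),
    "metrics/orders.yaml": (
        "dimensions:\n  - column: customer_name\n"
        "measures:\n  - expression: SUM(revenue)\n"
    ),
    "dashboards/overview.yaml": "title: Overview\nmetrics: orders\n",
}

updated, counts = rename_field(project, "customer_name", "account_name")
for path, n in counts.items():
    print(f"{path}: {n} replacement(s)")
```

In a real repo the same idea would run over files on disk and land as a single Git commit, which is what makes the rename reversible when someone changes the name back.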

Starting point is 00:24:38 We're gonna take a quick break from the episode to talk about our sponsor, RudderStack. Now, I could say a bunch of nice things as if I found a fancy new tool, but John has been implementing RudderStack for over half a decade. Customer data can get messy. Running production instance of RudderStack at six years and going. Yes, I can confirm that. And one of the things about the implementation

Starting point is 00:25:40 that has been so common over all the years is that it wasn't a wholesale replacement of your stack. It fit right into your existing tool set. Yeah, and even with technical tools, Eric, things like Kafka or PubSub, but you don't have to have all that complicated customer data infrastructure. Well, if you need to stream clean customer data to your entire stack, including your data infrastructure tools,

Starting point is 00:26:00 head over to rudderstack.com to learn more. I think one thing that's interesting is thinking about infrastructure as code. So let's talk about data infrastructure as code. If you think about there are patterns around this. Someone had to have a huge amount of willpower to actually get the data stack there. And so it's like, okay, when we finally get there, this is amazing.

Starting point is 00:26:30 But what's incredible, I think, is that, and this is just one dimension of the change happening here, right? But if you think about tools like Rill, where you have a bunch of config files and it literally is BI as code, teams are gonna intentionally choose tools like that you have a bunch of config files there and there's ways for the MCP servers to go and whatever, but like, it's going to be, because everyone loves the dream of IAC, but it was just like, okay, is the juice worth the squeeze to like force absolutely everything in Terraform and have all the central governance

Starting point is 00:27:16 around all of it and like do all the weird customizations to like make all of it work? And it's like all of that's gone. Like just. Yeah. And every single like Terraform provider from my like DevOps history of like the like one make all of it work. tools, right? it. And I was like, that is commitment. But to your point, most people don't do that and there's some kind of creep and there's like, well, we started off good. The other thing that's amazing is, and you mentioned this too, Mike, I am not a developer by trade,

Starting point is 00:28:20 but I can go in and work with an MCP server and reason about YAML files, like fully sufficiently. And then especially if there's a companion UI where I can see the materialization of that, like I can understand what's going on, right? It's totally doable. And I'll give a great example. And this is where I think, you know, these infrastructure as code, BI as code,

Starting point is 00:28:42 kind of everyone's sort of standing on the shoulders of each other, right? So it actually also means that you can really lean into open standards that again, AI bots are great at reading open standards. And so one example of a standard that I love is D3 format. Formatting strings and formatting numbers has always been a bit of a bitch, right?

Starting point is 00:29:07 And there's two ways to provide your users with the ability to format how they want a number to appear in their dashboard. One is to build a very complex UI with a pull-down menu for currency format. I don't know how many times I've gotten the Excel button that, like, do I increase precision or decrease precision? I know, and you click the wrong direction. Oh man, I removed it. So the great thing about what you're talking about, Eric and John, is that you could just go and I don't need to even look at the D3 format documentation. Rill just implements D3 format. And I've done this.

Starting point is 00:29:45 I can say, hey, Rill, Cursor, update this currency format to be British pounds. And there's a whole set of things that need to go into making something look like British pounds. And even the commas are different in terms of the separators. And it just does it.
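As a small illustration of why a text-based format standard is easy for an AI assistant to edit: D3 format specifiers are modeled on Python's format specification mini-language, so a specifier like `,.2f` (thousands separator, two decimals) means the same thing in both. The sketch below uses plain Python as a stand-in and is not Rill's or D3's code; the `format_currency` helper and its parameters are hypothetical.

```python
def format_currency(amount, symbol="£", spec=",.2f", thousands=",", decimal="."):
    """Format a number as currency from a format specifier.

    `spec` follows Python's format mini-language; `thousands` and
    `decimal` let a locale swap the separator characters.
    """
    digits = format(abs(amount), spec)
    # Swap separators in one pass so "," and "." cannot clobber each other.
    digits = digits.translate(str.maketrans({",": thousands, ".": decimal}))
    sign = "-" if amount < 0 else ""
    return f"{sign}{symbol}{digits}"


pounds = format_currency(1234567.891)
euros = format_currency(1234567.891, symbol="€", thousands=".", decimal=",")
refund = format_currency(-42.5)

print(pounds)  # £1,234,567.89
print(euros)   # €1.234.567,89
print(refund)  # -£42.50
```

Changing a dashboard from dollars to pounds then becomes a one-line edit to a specifier or locale setting in a config file, which is the kind of change an assistant can make and a human can review.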

Starting point is 00:30:02 And not only does it do it based on the documentation of D3 format, which kudos to Mike Bostock, they've done a great job documenting, but it also looks at thousands and thousands of examples out there of D3 format. And it can learn from what's already been done before us. Yep, yep, I love it. Okay, well, let's, there are so many things

Starting point is 00:30:23 we could keep talking about, but Mike, you are one of the people that we talk about when we wanna think about the analytics landscape in general. So we just talked about the bleeding edge, right? Honestly, we should have you back on because there are probably a couple more hours of discussion to have about that stuff generally. We'll do that after John and I get the Rill MCP server up and running and do some podcast analytics.

Starting point is 00:30:45 Yeah, we'll do a nerd out show. But let's zoom out from the bleeding edge. And John, actually, I know this is going to strike very close to home for you because you work with a lot of clients who are still running very traditional analytics stacks, Power BI, et cetera. So the bleeding edge is changing at breakneck pace, but so much of the world still operates running analytics like the same way that it's been done for years, decades. Mike, is that fair?

Starting point is 00:31:14 Change is hard. When a tool gets embedded, it's very hard to pull it out. So Mike, talk to us about the analytics landscape generally. John, please weigh in because you see this every day with your professors. I actually want to add to the question for Mike. I also want to know from your perspective how the current velocity is impacting adoption. Because I have a theory about when it's high velocity,

Starting point is 00:31:41 there's a certain group of people that are like, I'm sitting off to the side until this thing slows down or levels off. So let's talk about that too, but I'm really curious about your perspective on the landscape out there. Great. I'll say, so I think first in terms of like the shifts

Starting point is 00:31:55 that we're seeing, and then we'll talk about velocity. I think that the level of change right now is significant enough. I don't think this is any longer evolutionary change that we're seeing in terms of how people are gonna build their data stacks. I think there's an analogy to in the early days of telecommunications, right?

Starting point is 00:32:15 Some folks were like, should we put down wires? And at some point cellular became enough of a powerful technology that certain countries never even laid down wire. They just went right to cellular, right? They just leapfrogged that entire phase. And so I think there's a lot to be learned about people who haven't built anything yet. What are they building with? If you're starting a company from scratch today, what would you do? Would you lay down a bunch of wire? Would you sort of

Starting point is 00:32:45 go out and buy Snowflake and go out and buy dbt and Informatica, you know, Fivetran and all the other modern data stack? Or would you do something different? I think we can look at what these leapfrog architectures might look like. And I'll predict a few things that are components of a leapfrog architecture for BI. Yep. The first is data lakes. I think everyone starts with the data lake. That is the foundational substrate of where your data lives, right? Namely Iceberg, not data lakes with a bunch of Parquet

Starting point is 00:33:16 and JSON spread all over and no catalog, but a governed, structured data lake with Iceberg at the core. It's really a lake of tables, right? And your databases just talk to that data lake, as do many applications, right? That's the first piece. And then the second layer,

Starting point is 00:33:35 I would say is gonna be a fast layer. I actually think we don't need cloud data warehouses if you've got a data lake. There's no point in moving data from your data lake into a warehouse and paying all sorts of taxes. What you really want for that second tier is a serving layer, a fast layer. So I think real-time analytical databases are where you want to put data that needs to be quickly accessible to applications. If it can be slow, just query the data lake directly. If it needs to be fast, put it in a real-time analytical database.
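The two-tier idea described here can be sketched as a simple placement rule. This is only an illustration under stated assumptions: the tier names, the `Dataset` type, and the `needs_fast_access` flag are hypothetical and do not come from Rill or any product mentioned in the episode.

```python
from dataclasses import dataclass

# Hypothetical tier labels; the episode names no specific API.
LAKE = "data_lake"        # e.g. Iceberg tables, queried in place
FAST = "serving_layer"    # e.g. a real-time analytical database


@dataclass(frozen=True)
class Dataset:
    name: str
    needs_fast_access: bool  # True if applications need interactive latency


def choose_tier(dataset: Dataset) -> str:
    """Serve latency-sensitive data from the fast layer; query the rest in the lake."""
    return FAST if dataset.needs_fast_access else LAKE


# Made-up example datasets.
datasets = [
    Dataset("raw_clickstream_archive", needs_fast_access=False),
    Dataset("dashboard_metrics", needs_fast_access=True),
]
placement = {d.name: choose_tier(d) for d in datasets}
```

In this sketch there is no warehouse tier at all, which mirrors the claim that data only moves out of the lake when an application needs it quickly.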

Starting point is 00:34:06 What are those? Those are the fastest growing class of databases today, in my opinion, and that's things like ClickHouse, of course, who are now talking about raising at a $6 billion valuation, according to The Information, if they're to be trusted, last week. You've got obviously MotherDuck and DuckDB as a fast analytical database. And then you've got your StarTree, Pinot, StarRocks, right? There's a whole group of folks that are building fast database engines. And then the layer above that, we talk about BI. Of course, now I'm biased, but I don't think you would start

Starting point is 00:34:43 with Looker if you were building it from scratch. Certainly, you might look at something like Omni, which is the next-gen Looker. But I think this is where the exciting stuff happens. It really is, can we think about not just BI as code, but really, you know, AI as BI, right? Could we actually consider not even, you know, there's that joke in Back to the Future, he says, where we're going, there are no roads. Maybe where we're going, there are no dashboards, right?

Starting point is 00:35:10 Maybe we can just have a pure AI interface that interacts with that fast metric store that's in ClickHouse or DuckDB or something else. And we just start from scratch. We have Claude as our interface to all of our, you know, company's metrics and, you know, business insights. So that's just a speculation. And then on the second question,

Starting point is 00:35:31 how does velocity impact adoption? I think people are nervous. No one wants to, you know, commit millions of dollars and find out they picked the wrong horse. So I think people are, you know, wading into things like Iceberg and there's some experimentation. I don't think, you know, FedEx is going to rip out, you know,

Starting point is 00:35:54 their existing database infrastructure yet. But I think the area where we're seeing the most velocity, maybe no surprise, is around AI initiatives. So if people are doing something with AI, they might be building analytics on that AI, and they're willing to experiment on their analytics stack for a particular AI initiative that they're working on. Well, and I think for large companies,

Starting point is 00:36:16 it's not fair to think of it as all or nothing either, right? Like there are some that are gonna have very cutting-edge divisions of the company that are working on X or Y, and some where, like, the division runs a mainframe and will always run a mainframe. Yeah. Yeah. I think Iceberg is the clear one that you'll see people probably adopt soonest because

Starting point is 00:36:40 it's easy to swap out. You can still run Snowflake on Iceberg, right? It's just an external table. And the other reason that people are, I think, going there is cost. Iceberg represents a huge decrement in cost for companies. And so that's where I think, you know, people get excited about a cost reason,

Starting point is 00:36:59 not just an innovation reason. Yeah, and I think the velocity thing too, when people can arrive on a standard, where CSV is a standard still, Parquet, I think, is really becoming a standard, Iceberg is a big deal. When you can land on that and feel really comfortable that I have complete control, it's open source, nobody's going to take it from me, nobody's going to change the licensing terms, nobody's going to, like, whatever. It's like, well, do we want to switch the whole company from Snowflake to Databricks?

Starting point is 00:37:44 Not really. Do we want this new chief AI officer and all his team, who know Databricks, to have to learn Snowflake? Not really. Do we just buy both? Yes.

Starting point is 00:37:55 And if you look at the numbers, I mean, Eric, you've seen this too, the number of overlapping companies between Snowflake and Databricks is really high. Yeah, it's really high, sure. And I think it's funny, and that's true of other tooling too, right? But I think when you have those open standards

Starting point is 00:38:08 and essentially Snowflake and Databricks already said, hey, we separated compute and storage. And Snowflake essentially has their own algorithm of how they access storage, right? But it's still the same principle, whether it's in Snowflake's table format or an open table format, like it's the same principle as far as design.

Starting point is 00:38:25 So there's one more question we need to get to before the end. But to share one last thought, I love the concept of leapfrog, Mike. And we actually talked with a company at Data Council called Mooncake. Speaking of fast databases. But it's okay, you build your app, of infinite interoperability with whatever other tools you want. And that leapfrog is wild to think about, right? And it's so subtle and so easy, right? But it's like the option value for the future is automatically baked in, and you can basically build an extremely scalable enterprise stack on that foundation. And you're just building your app on scratch.

Starting point is 00:40:05 whatever you want, whatever a number of reasons, it's such a smart set of decisions they've made there. But I think, you know, this is why we see, in general, Postgres, right, is the de facto standard database that everyone starts with. And to your point, John, you know, everyone knows, people understand Postgres, they know how it works.

Starting point is 00:40:43 And of course, again, I think increasingly we're not just building, you talk about people that have Power BI. I think one other shift we'll see is certainly among those of us who are building tools is that the tool users increasingly will not be humans. The tool users will be agents. And a quote from Nikita who sold Neon

Starting point is 00:41:03 for a billion dollars to Databricks last week, Nikita, the founder there, and Neon was also built on serverless Postgres, right? So another good... Yep. He said that the majority of databases created on Neon in the last year were created by agents, not by humans. Yeah, insane. Yeah, I mean it really is agents and Vercel, between those two. Yeah. Well, I mean, you've probably, you know, v0. Yeah, if you've actually done it, it is magical, you know, but the punchline here is that the

Starting point is 00:41:36 major trade-offs in critical decisions are actually being removed from the equation. Yeah, that is tectonic, right? Where it's like, you know, just these decisions that had decades-long impact, you don't have to make that decision anymore, you know? Which is crazy. Okay, we have to hit this,

Starting point is 00:41:56 and we should have saved more time because there's so many fun things to talk about, but BI is super crowded, and John, you actually told your story recently, because you started using Rill and your reaction was like, I don't know if I want to explore another BI tool. Yeah, yeah, we really should have saved more time for it. But yeah, we'll definitely have to have you back, Mike. So several friends were like, you've got to check out Rill, you've got to check out Rill. And like when you're doing consulting,

Starting point is 00:42:22 I'm switching back and forth between so many contexts and so many BI tools that it's like, ah, I don't know if I want to see another BI tool. But finally, like, through Data Council and people, it's like, all right, I gotta check it out. But anyways, long story short, working with one of the large consulting companies that's entire BI, thinking about Rill from Data Council, just, all right, I'm gonna dig into this.

Starting point is 00:42:42 And one of the best parts, and I want you to share kind of the backstory of the philosophy here, is we've talked a lot about BI as code and AI, but there's actually a really opinionated philosophy behind Rill that really has nothing to do with either of those. It's more on like good analytics and like what people actually do. So just the one thing that I noticed right off the bat is date-time comparisons, like, I want to see week over week percentage change, month over month percentage change, year over year. Of course, and to build that out in Tableau, Power BI, a lot of these other tools, it's

Starting point is 00:43:16 a fair amount of effort. But everybody wants to answer that question. And Google Analytics is actually the only other tool that I can think of, maybe Looker, but specifically Google Analytics, where that was like a native part of it. That's why some people still use it. Yeah, I think that's probably true. Yeah, and so anyways, I'm sure there's other stuff built in, because I haven't spent tons of time yet with the tool. But like, tell us some about that philosophy and maybe even some other opinionated decisions you guys have made building the tool. Well, thanks, John. I'm glad you were able to overcome your initial hesitancy

Starting point is 00:43:49 about checking out yet another BI tool. I think really one of the core principles that folks appreciate once they get going with Rill is actually, as you said, it's not just the BI as code and it's not just fast dashboards. At the center of Rill is our metrics-first philosophy. And so a lot of BI tools have you build dashboards on data tables.

Starting point is 00:44:14 And our opinion is that data tables are kind of too raw. Fact tables are too raw to really build on. You can do it, but you have to put a lot of logic in to define measures. And on the other hand, like reports are too baked, they're too rigid. And so what we consider the, we call the metrics layer, you know, metrics are really, they are like

Starting point is 00:44:36 what every business uses to think about their company, you know, revenue, MAU, you know, campaign spend, return on advertising spend. And so metrics are these flexible aggregate functions that can be explored in different contexts, right? A metric can be, you can look at revenue by country, revenue by customer, revenue by product category. And so I think that's the essence of almost any,

Starting point is 00:45:01 every BI tool is this metrics layer, this metrics model, and very few BI tools force you to define those metrics from the start. They kind of do it like as a side effect. And so Rill puts that front and center. And once you have that metrics model, everything else flows from that. So really, you actually don't build a dashboard. Once you've defined your metrics and your business dimensions,

Starting point is 00:45:26 we actually give you a dashboard without you making any decisions. And so I think that's the heart of it. And so then it makes things like revenue week over week, revenue month over month, all of those time series comparisons of metrics are made much easier because you also have to pick your time column.
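To make the metrics-first idea concrete, here is a minimal sketch, not Rill's actual implementation or configuration format. It treats a metric as an aggregate function over fact rows, slices it by any dimension, and uses the chosen time column for a week-over-week comparison. The rows, column names, and helper functions are all made up for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Made-up fact rows: (day, country, category, revenue).
COLUMNS = ("day", "country", "category", "revenue")
ROWS = [
    (date(2025, 5, 5), "US", "books", 100.0),
    (date(2025, 5, 6), "DE", "games", 50.0),
    (date(2025, 5, 12), "US", "books", 120.0),
    (date(2025, 5, 13), "DE", "games", 60.0),
]


def revenue(rows):
    """The metric: an aggregate over whatever rows it is handed."""
    return sum(r[COLUMNS.index("revenue")] for r in rows)


def by_dimension(metric, rows, dimension):
    """Evaluate the same metric once per value of a dimension."""
    idx = COLUMNS.index(dimension)
    groups = defaultdict(list)
    for r in rows:
        groups[r[idx]].append(r)
    return {key: metric(group) for key, group in groups.items()}


def week_over_week(metric, rows, week_start, time_column="day"):
    """Percent change of a metric versus the previous 7-day window."""
    t = COLUMNS.index(time_column)

    def window(start):
        return [r for r in rows if start <= r[t] < start + timedelta(days=7)]

    current = metric(window(week_start))
    previous = metric(window(week_start - timedelta(days=7)))
    if previous == 0:
        return None  # change is undefined without a baseline
    return (current - previous) / previous * 100.0


revenue_by_country = by_dimension(revenue, ROWS, "country")
revenue_wow = week_over_week(revenue, ROWS, date(2025, 5, 12))
```

Because the metric and the time column are declared once, the comparison helper works for any metric without per-dashboard logic, which is the design point being made in the conversation.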

Starting point is 00:45:43 And so there's some hard decisions that people make upfront, but then the downstream effects make a lot of things very intuitive and easy afterwards. One other thing that I thought was brilliant that I've never seen before is when you build out this metrics layer and then you hit go, the default is essentially, I believe this is the default, or it's the option I chose, I can't remember. But essentially, show me everything is an option, and then I will tell you what I don't want. Which is different than most BI, where it's like I'm going to drag and drop each little thing on here.

Starting point is 00:46:20 But the show me everything with a quick option of like, oh, actually hide these five things, is actually kind of refreshing. Because like, I don't know what I want, like, let me see some things. So I thought again that was just another one of those things that was, yeah, such a neat design pattern. Yeah, I mean, you know, I'll say this, we've been building a version of this product for probably 15 years. And so I think sometimes, you know, they say, where do good products come from? Oftentimes, I think a lot of founders are building a product that they would have wanted, right? Yeah. Yeah. You know, this is the product that I always wanted to have. I was a

Starting point is 00:46:53 data scientist, I was a data engineer. And so I always wanted this product. And so we've learned the hard way over the years what not to do. And some of these choices are, yeah, school of hard knocks in the data analytics space. Nice. Awesome. Well, Mike, we're at the buzzer, as we like to say, but we have so much more to cover. So let's have you back on the show soon.

Starting point is 00:47:15 We can keep talking about AI and dig even deeper into Rill. And where do people go to check out the tool? Oh, yes. rilldata.com. R-I-L-L, data.com. You can download it with a single curl command. So, yeah, I hope people will check it out.

Starting point is 00:47:32 Awesome. Well, thanks again, Mike. Thanks, Eric. Thanks, John. Great to be here. The Data Stack Show is brought to you by RudderStack. Learn more at rudderstack.com.

