The Data Stack Show - Re-Air: AI and BI: The Future of Data Analytics with Mike Driscoll of Rill Data

Episode Date: April 22, 2026

This episode is a re-air of one of our most popular conversations, featuring insights worth revisiting. This week on The Data Stack Show, Eric and John welcome Michael Driscoll, Co-Founder and CEO of Rill Data. Mike discusses the transformative impact of AI on business intelligence (BI) and data analytics. He also explores the shift from traditional dashboard-based tools to more dynamic, code-driven, and AI-powered interfaces that provide deeper insights. During the conversation, the group emphasizes the importance of a metrics-first approach, the potential of leapfrog architectures using technologies like data lakes and real-time analytical databases, and how AI agents are increasingly becoming the primary users of data tools. The conversation highlights the evolving landscape of data infrastructure, where open standards, flexibility, and intelligent interfaces are reshaping how businesses interact with and understand their data, and more.

Highlights from this week's conversation include:

Welcome Back Mike Driscoll (1:11)
Philosophy Behind BI Tools (2:04)
Building a Natural Language Processing Server (4:33)
Deployment of MCP Servers (6:07)
The Role of Visualizations in BI (10:09)
Measuring Product Success (12:43)
Navigating Changes as Data Professionals (16:13)
Efficiency Gains with Code (19:00)
The Future of Data Teams (22:29)
Long-term Use of RudderStack (25:16)
Analytics Landscape Overview (30:59)
Future of BI Architecture (33:04)
AI's Role in Analytics (35:07)
Interoperability and Talent Pool (39:41)
The Crowded BI Market (42:03)
Final Thoughts and Takeaways (46:07)

The Data Stack Show is a weekly podcast powered by RudderStack, customer data infrastructure that enables you to deliver real-time customer event data everywhere it's needed to power smarter decisions and better customer experiences.
Each week, we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com. Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

Transcript
Starting point is 00:00:00 Hey, everyone. Before we dive in, we wanted to take a moment to thank you for listening and being part of our community. Today, we're revisiting one of our most popular episodes in the archives, a conversation full of insights worth hearing again. We hope you enjoy it, and remember, you can stay up to date with the latest content and subscribe to the show at datastackshow.com. Hi, I'm Eric Dodds. And I'm John Wessel. Welcome to The Data Stack Show. The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work. Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. Before we dig into today's episode, we want to give a huge thanks to our presenting sponsor, RudderStack.
Starting point is 00:00:52 They give us the equipment and time to do this show week in, week out, and provide you the valuable content. RudderStack provides customer data infrastructure and is used by the world's most innovative companies to collect, transform, and deliver their event data wherever it's needed, all in real time. You can learn more at rudderstack.com. Welcome back to the show, everyone. Mike, thank you for coming back on. We have Mike Driscoll here. He's a co-founder of Rill Data, and we have some amazing topics to cover today. But Mike, for those who did not hear the first episode with you, give us a brief background. Yeah, thanks, Eric. Thanks, Brooks and John for having me. We talked when we first met about, I think, Rill as an emerging BI tool and some of the unique lens we had on an admittedly crowded
Starting point is 00:01:42 market. And we were excited to show off a real developer that had just been launched in real cloud. So it's been a couple of years. We're now doing a lot more in, of course, the world of AI. And I'm just thrilled to share with the audience at the Datastack show what we've been cranking on. Yeah. So, Mike, one of the topics that we talked about before the show that I'm excited to dig in on It's kind of the philosophy behind the BI tool. When you're in a crowded market, one of the things you can do is have a really strong philosophy about how you approach problems. And I've really seen that with Rill. So we're going to dig into that. But what are some topics you want to hit? Well, you know, I'd be remiss if I didn't say that I think the most exciting thing we're seeing in the world right now is, of course, AI. And I think naturally there's some connections between AI and BI and particularly with our embrace of BIAS code, we think that there's some really cool intersections between those two. Love it. Well, let's do again. Awesome. Let's do it. Mike, welcome back to the show. It's been way too long and a lot has happened since we last had you on the show. Thanks for having me, Eric and Brooks.
Starting point is 00:02:51 And great to see you here, John, joining the crew. All right. Well, for those who did not listen to the first show, just give us a brief background on you and Rill, just your career and then, you know, how Real came about. Absolutely. Rill was originally started actually as a company called Meta Markets, which sold to Snapchat in the late 2010s. And we actually started Rill by recognizing at Snapchat that this technology we had built for B.I was being used not just for advertising analytics, but for a whole lot more. And so Rill was founded with the idea that we could take some tech. We knew well around interactive. exploratory dashboards and maybe take it to more markets beyond advertising. Since 2024, we launched RealCloud. We've had a tremendous number of adopters. We've got thousands of users out in the world, everyone from Comcast to Erickson and Fintech AI ops companies using these opinionated BIS code dashboards that VIL cloud powers. And we've also got an open source tool called build developer that thousands of folks are using to run a local version of
Starting point is 00:04:03 our stack, a free and local version. And I guess the inspiration for this company in some ways it is my life's work, I would say, is just making data more accessible, more explorable for human brains. And really the genesis of all this was I actually started my career in data in computational biology. So believe it or not, REL is probably the result of decades of working with datasets. It started with genomic data sets, wow, over 20 years ago. Wow, very cool. Okay, you mentioned when we were talking before the call, which we probably should have recorded because it was such a fun conversation. But you mentioned you were coding this morning. What were you working on? Can you tell us, or is it top secret? No, definitely not. One of the things that we've been working on the last
Starting point is 00:04:55 A few weeks here at Rill is we've been building a MCP server. And so what I was working on this morning was actually a demo for a prospect who really some of the best ideas you have in business often come from your customers. And so we had a customer ask, you know, Mike, we love this dashboard that you've given us. But we really love if you can actually have a natural language prompt that could also allow us to interact. We put all this data into Rill. we pay you guys a lot of money. We know that natural language prompts are how AI agents work. Could you guys build us a real agent that could answer questions without us having to click around?
Starting point is 00:05:37 And so we've been building an MCP server where you can ask the same data that lives in your real dashboard, you can ask, you know, hey, why is revenue down over the last 24 hours? And the MCP will kind of go through and examine all of the dimensions associated with that metric. and provide insight. So yeah, that's, I've been hacking on our MCP server, which has been pretty fun. Okay. We actually, it's funny, we got two, two internal MCP servers up and running this week. Only two. Only two. I mean, anything, it was hacky. It was hacky as well, but it's super fun. So, yeah, I've been doing the same thing this week. I want to get in some nuts and bolts for like the nerds out there on this. How did you deploy it? Because I'm curious, I've seen some cool stuff, Cloudflare is doing.
Starting point is 00:06:24 around MCP servers? Yep. There's some other cool ones. How, for both of you, like, how did you decide to deploy it for, like, V1? All right, I'm going to let Eric go first, and then I'll see if...
Starting point is 00:06:34 There we go. What did you guys use for your two? Oh, I mean, we just... It's just running locally, yeah. Okay. Yeah, I mean, there's sort of two interfaces, so I wired everything up in cursor, but some people are using Claude as the interface
Starting point is 00:06:49 as well. And so there's just an internal team that's, you know, kicking the tires on it. So we just did it locally. It's great. Hey, local first development. You know, we're a huge fan of local first development.
Starting point is 00:07:00 Yeah, yeah. We, so right now we are, we're actually using, it's a Docker container. So we built it in Python. You know, from our, looking at the, you know, the MCP servers out there, the Python community, no surprise, is the most mature, the Python-based one. But so right now, it is a number of steps to get it working. We use Clod. we have a cloud. I've got on my cloud right now,
Starting point is 00:07:26 I've got some, you know, Jason config where I can specify the name of the Docker container that you skin up and then talk to and then it, you know, then it talks to the real runtime. I think we are going to rewrite it in GoLine. And so that will just be embedded. You know,
Starting point is 00:07:42 real is a downloadable binary. So if we put the MCP server into that, we could have that entire experience running locally. So that's using cloud and a Python backend at this point. Rars is in Go, actually. Okay. Yeah. Nice.
Starting point is 00:07:55 Okay, well, maybe I'll hit you up after this show. Hit me up. I mean, I'm not a developer by trade, and so someone else opened the poll request we did internally. But I think this is a great jumping off point to talk about how most of the world of BI still lives in dashboards, right? Like your customer is like, this is a great dashboard. And they're useful. They're not going anywhere, right?
Starting point is 00:08:17 To look at, you know, MRR trending over time. Like, that's great to have a visual. to have it in a chart. Exactly. It's helpful, right? Right. Yeah. But the interface is changing dramatically right now.
Starting point is 00:08:31 I know it's in early phases, right? I mean, I love that we talked about getting, you know, MCP servers up and running locally, you know, or through a Docker container or tweaking your JSON config. You know, and there is a lot of fragmentation in, you know, how is the, how are the standards going to be developed, you know, for MCP servers generally. However, I think for everyone who's going to,
Starting point is 00:08:52 gotten a taste of it, it's very clear that we're about to undergo some very dramatic changes in the interface. So can you speak to that a little bit, Mike? Because it's pretty wild to think about, you know, if you had told someone 10 years ago, like your BI interface is actually going to be a chat bot made by another company that is like on the surface a generic intelligence tool. you know, people would be like, that's hard to even conceive of. But I mean, earlier before we were prepping, like, you literally showed us that. You were doing BI and Claude. Yeah.
Starting point is 00:09:30 I think it's, you know, to use an overused adjective out there, it's wild, right? It really is wild to see what's happening in the world of applications. I think what we're really seen is a shift in the interfaces for software. You know, how do humans engage with software? So we saw folks talking about, you know, how do you really need to click around Salesforce to understand your sales pipeline? Or do you just need to ask, you know, your AI sales force agent, hey, who are our top 10 leads this month? I think BI is going to be just, in fact, if anything, potentially more so impacted by this shift in the way that people interface with software.
Starting point is 00:10:14 And to be honest, I think, you know, BI really was never about. dashboards, right? We always say, B.I is actually about insights, right? It should be about, you know, making better decisions. And I think a great analogy is the world of maps, right? No one actually wants a map, right? They just want to get to where they're going. And, you know, with the advent of something like Uber and Lyft and, you know, now, of course, even Waymo, it wouldn't surprise me if people are actually using mapping software less frequently because they didn't actually ever want them out. They were just trying to get from point A to point B. I think the important, I think the important thing to note is that you kind of do need both,
Starting point is 00:10:59 because especially when it comes to AI agents, right? You want the reason why the visualization software, the dashboards still matter is that if an AI agent tells you that revenue is down in the last 24 hours because of the following correlated dimensions. And let's talk to, advertising. It's because iPhone clicks are down. You still want to be able to trust the agent, but verify their work. And so without visualizations, it's very hard for humans to kind of comb through, you know, reams and reams of data. Visualizations are still the right-sized interface for human eyes. Yeah. I also think there's a very practical, there's a very practical utility for visualizations around things like accountability, right? If I'm holding a team accountable
Starting point is 00:11:50 to reaching some metric or if I'm holding a team accountable to analyzing, you know, adoption of features or other things like that, right? Well, how do you do that? You know, it's like that it's a good reference point. And that's just one example among many where, you know, when you actually use them practically to operate a function, you know, or other things like that, I think there's a ton of utility there. Yeah. How I think about sports? And I think like what would sports be like without like a scoreboard, right? Like it's a very simple metric of like to, you know, like whatever the sport is of like points than the definition of winning and seeing who wins. And then, but like in business like that that is it's not always
Starting point is 00:12:31 clear like what winning looks like to be fair. But a lot of that is still very possible. Yeah. Right. And that because that's true, just giving somebody a blank piece of paper and saying like, what do you want to know about your data? That probably has a place, but it's also not the same as like, look, here's a scoreboard with whatever your point is and then like up is like higher is better, lower is worse.
Starting point is 00:12:56 Like that basic, I think is still going to be a thing. It's just simple and it helps people. Mike, I'm interested to know how does this change the way you think about building real and even measuring the success of the product? Right. Because if you think about a metric like product usage, right, or engagement.
Starting point is 00:13:15 So, okay, a company deploys real, they're using it, you know, to do all sorts of things, right? But ultimately, they're using it to derive insights from their datasets and then, you know, hopefully make decisions that improve the business, right? But now what's
Starting point is 00:13:31 interesting is, you know, part of that engagement data will now be API calls through an MCP server, right? Like the fidelity there is very different. You You also, you know, you in many ways, they're now have sort of three interfaces, right? So you have the, you know, let's call it the traditional UI and the tools that are available via a GUI.
Starting point is 00:13:55 You have real developer, which is like config based, you know, and you're actually in an IDE. And then you have this kind of in between where the MCP server can deliver like an end user interface like through a, chat bot, but also can be used within a tool like cursor, you know, in conjunction with real developer. And so the surface area of your product and like the number of users and different types of people who can interact with real, the surface area gets way wider. So how do you think about that? Yeah. I think it's a very fluid environment right now. Yep. Where I think everyone is trying to ask themselves. The first question we asked ourselves when we were building out this, you know, real BI agent is where should it live? Should it live next to the dashboard, right? People log
Starting point is 00:14:50 into their dashboard at real data.com and there's a chat bot. Should it live in Slack? Should we meet them where they're already doing a lot of their knowledge work? And should it be a Slack channel? Should it live in, and should it be a, you know, an app for Open AI or for, you know, for client, right? So I think that, I think we're sure that's starting to shake out. I think one. One insight, you know, we can take from, I mentioned the Sequoia folks did a conference last week about AI. And they put a slide up and they showed, you know, for all the technological innovations of the last couple of decades, which companies that were able to achieve more than a billion dollars
Starting point is 00:15:25 in revenue? And you look at that slide and you see the vast majority of the companies that get to a billion in revenue, not market cap revenue, or application companies. So it's, and if you look at Open AIs decision to acquire, when. Andsurf, there are some hints that, you know, people think that a lot of the value, maybe even not value creation, but certainly the value extraction, the ability to actually get dollars from customers is going to come at the application layer. So I think they're a company like real needs to think, okay, well, if we just become, you know, an API that plugs into someone else's application, does that put us at risk in terms of our ability to command dollars? So I think, again, the answer is often, it's going to be, you know, just like we think about API layers, you need both. You need to have a UI. You know, you need to have a UI. You need to have an
Starting point is 00:16:20 API. You need to have a UI. You need to have multiple ways that you can drive value for your customers. And yeah, so I think it's going to be interesting to see how this plays out the next, you know, coming months. Yeah, it'll be fun. One question, and I'm thinking about, I'm thinking about our listeners who, you know, I'm sure there's a range, right? I'm sure we have listeners who, you know, are, you know, they work in Power BI, they run, you know, sort of like very traditional pipelines. You know, we probably have some listeners who, you know, they turn on the show while they're, you know, tinkering with their MCP server, you know, set up, right? And so there's a very broad spectrum. But I want to think about the ones who sort of are looking at this and they're saying,
Starting point is 00:17:01 okay, I get it. I can see that there are some pretty tectonic shifts like about to happen. how should they begin to think about, you know, as a data professional, they're managing a stack, they have internal stakeholders, they are going to face demands for delivering things through different interfaces just because the chat interface is so user-friendly and it's becoming muscle memory for so many people. How did they navigate that as they think about tools, as they think about, you know, as they think about their stakeholders? It's a great question. And I think there's a few shifts that are worth, that I think are worth paying attention to as a data practitioner. I think the one of the first, I think that's going to be accelerated by AI, which is,
Starting point is 00:17:46 and we've already seen, is that I think data professionals are increasingly embracing code-powered workflows and code, you know, code-defined stacks. So we saw, you know, DBT created in some ways the fact, you know, an analyst that understood how to use Git. And at first, that seemed like, my gosh, you'll never get anyone to adopt a data. data tool that you have to go to the command line and commit stuff to get. But the power of that approach was such that I think it overwhelmed some of the, frankly, the developer, you know, challenging ergonomics of working with command lines and code. I think that's going to happen
Starting point is 00:18:24 in other areas of data. So obviously Rills embracing BIS code, right, where you can load up a project in, you know, in cursor and vibe code it, vibe update it. I think across the data stack from ETL to spinning up databases, right, to defining metrics layers and semantic layers to defining dashboards. I think we're going to see code backed by Git as a predominant theme there. And I think that is a big thing that that data professionals should be embracing. That's on the creation side. And then I'm happy to talk about the consumption side for their stakeholders, right? That doesn't mean that their stakeholders need to know Git. But if you're a data professional today, you should be writing your data pipelines and your metrics layers and your
Starting point is 00:19:09 dashboards, you should be writing those in code and not with, you know, loaded UIs. Speak a little bit about the, because I've had personal experience with us too, about the developer efficiency and speed gain there. Because I mean, I've been talking before AI, honest and with AI, I think there's a multiplier. But I, so if I'm used to a traditional stack and I have like three different guis. I'm like log into this one GUI and then like make a change to a pipeline and hit save and a login in this other. I mean, you can kind of visualize, I think, what I'm talking about. What do you, and I know we're kind of just guessing, but what do you think the efficiency game is even pre-AI with a full like BI, a full as code workflow, pipelines, modeling and, you know,
Starting point is 00:19:54 B.I. So for any of the listeners there, if you want to read a long blog post that a guy named Simon Spady wrote, called the declarative data stack. He actually talks about this in terms of what are some of the gains of your entire data stack, again, from pipelines to database ingestion code, to being declared entirely with YAML and SQL. I think there's a number of gains. First, let's be honest, there's a higher technical barrier to that, right? So the great thing about UIs is almost anyone can navigate a UI. When you start working in code, you do need to know, you know, you need to know programming languages. I think one of the biggest gains is that you don't need to move between multiple different tools. If you've, if you parameterize your
Starting point is 00:20:43 project in, you know, as code files, you can kind of move, move between layers of your stack by just moving between code files, right? You don't have to go and look at some other tool to figure out what's going on with your pipeline. You may have to observe it. But if everything is declared in code, I think it means moving between layers. your stack gets a lot easier. I think that in the, okay, I'm going to fast forward to the era of AI, it used to be hard if you had to move between languages like SQL or Python. Not everyone's an expert in every one of these languages. But I do think that cursor and copilot and, you know, their brethren do allow for folks to individual developers can do a lot more. We used to have this
Starting point is 00:21:25 idea of, right, you had a database, you know, admin and then you have maybe someone good at writing pipelines and some good building dashboards. But I do think that when you sort of define everything in code, an individual developer can do a lot more. They can kind of be a 10x developer and move from writing Python code to SQL code to CSS code in a single day's work. The last thing I'll say,
Starting point is 00:21:47 I talk about moving between layers because I think that's so powerful, right? But the last thing is that debugging issues gets a lot easier if everything is, again, and we think the largest companies, the world still use mono. repos. A lot of things in the world of data are dealing with dependencies between systems, a lot of glue code. And when you have kind of a mono repo for your data stack, even if it contains
Starting point is 00:22:11 a to be, you know, it contains a SQL glot for your transformations and Dagster for your orchestration, maybe Ril for your metrics layer and dashboards, when all this stuff and maybe snowflake for some, you know, warehouse code, when all of that lives in a single repo, I think debugging what's wrong gets dramatically faster for folks. So I think it just accelerates what, you know, we talk about the one person trillion dollar company, right? That's the dream that I think a lot of VC is talking about. I think you can have a one person data team these days. Yeah, right. That's, I think that's what we're moving towards. A one person data team powered by these tools. Yep. Yeah. Here's like kind of a anecdotal example. And I think there's a lot,
Starting point is 00:22:56 there's a lot of companies that still have kind of traditional stack right but I think there's a pretty good number of companies that are kind of that middle level maybe they've decided like our we're going to use get for transformations user too light dbt but we but on the one end we kind of a traditional bi tool and maybe on the other end we kind of have traditional pipelines but just something interesting that I ran into the other day this was a fairly progressive stack essentially like all the way up to the bi layer was all like as code you know kind of modern tooling. And then it was a newer, it was, I don't know what gen we would be on in BI. Let's call it Gen 3 maybe. I don't know. But kind of that like that Snowflake era like, like BI, several of the
Starting point is 00:23:35 tools that came out. They're on one of those trying not to call them out the name. And we had to rename something. Right. And we renamed it and then I think we like named it like changed the name back. So it's like what like what's wrong with? This is fine. Like everything should be fine here. So update everything. So everything goes through. Everything's fine in the code. And then you've got this layer and you flip over into the UI and everything is broken. Nothing is mapped anymore. You have to like manually click in and like reset date for just like all the little like things. And I know that like just about every person that's worked in BI like knows that feeling of like, ah, we updated something and you just have to. And you have like this indeterminate amount of work
Starting point is 00:24:21 that will haunt you for like several weeks because you missed like one little minute detail when you rebuilt it. So I feel like that to me is the reason like all right, BIS code sounds wonderful. I mean, imagine how many clicks it's going to take to go into like four or five different guis, right? To rename that one field, right? That start. And by the way, sometimes it's not a choice. It's not that you decide to rename something. You're upstream. Sales or marketing for the fifth time decides to rename X, right? Yeah. Right.
Starting point is 00:24:52 Customers get renamed. Or customers, sure. Yeah. Yeah. Great point. Yeah. We're going to take a quick break from the episode to talk about our sponsor, Rudder Stack.
Starting point is 00:25:00 Now, I could say a bunch of nice things as if I found a fancy new tool. But John has been implementing Rudder Stack for over half a decade. John, you work with customer event data every day and you know how hard it can be to make sure that data is clean and then to stream it everywhere it needs to go. Yeah, Eric. As you know, customer data can get messy. and if you've ever seen a tag manager, you know how messy it can get. So rudder stack has really been one of my team's secret weapons.
Starting point is 00:25:27 We can collect and standardize data from anywhere, web, mobile, even server side, and then send it to our downstream tools. Now, rumor has it that you have implemented the longest running production instance of rudder stack at six years and going. Yes, I can confirm that. And one of the reasons we picked Rudderstack was that it does not store the data and we can live, stream data to our downstream tools. One of the things about the implementation that has been so common over all the years and with so many rudder stack customers is that it wasn't a wholesale replacement of your stack. It fit right into your existing tool set.
Starting point is 00:26:04 Yeah. And even with technical tools, Eric, things like Kafka or PubSub, but you don't have to have all that complicated customer data infrastructure. Well, if you need to stream clean customer data to your entire stack, including your data infrastructure tools, head over to rudderstack.com to learn more. I think, you know, one thing that's interesting is thinking about infrastructure is code, right? So let's talk about data infrastructure is code.
Starting point is 00:26:29 If you think about there are patterns around this, right? Yeah. But it used to be, like someone had to have a huge amount of willpower to actually, like, to actually get the data stack there, right? And so it's like, okay, when we finally get. they're like, this is amazing. Yeah. But what's so, like, what's incredible, I think is that one,
Starting point is 00:26:52 and this is just one dimension of the change happening here, right? But if you think about tools like real, where you have a, you know, you have a bunch of config files and it's, you know, it literally is, you know, BIS code, like teams are going to intentionally choose tools like that, and the MCP servers are going to let them run IAC, like, practices with almost none of the previous paint. Like, it's going to be amazing, you know? And I think some people are probably there.
Starting point is 00:27:18 And there's ways for the MCP servers to go and whatever. But like it's going to be, because everyone loves the dream of IAC. But it was just like, okay, is the juice worth the squeeze to like force absolutely everything in the Terraform and have all the central governance around all of it. And like do all the weird customizations to like make all of it work. And it's like all of that's gone. Like just gone. Yeah. And every single like terraform provider from my like DevOps history of like one edge case, it's not really.
Starting point is 00:27:46 covered in it and like, ah, what are we going to do? Yeah. And this is a funny, like, story around that. So with, like, Terraform has like, obviously really good coverage in DevOps tools, but like kind of iffy coverage and data tools, right? Yeah. Like, back in DevOps land, I have a friend, I've only ever heard of this at one company, but there's a friend of mine that they went all in, like, we're doing infrastructure
Starting point is 00:28:06 as code. And they're in AWS, I believe. So they had this bot that would go through and just hard delete anything that wasn't in infrastructure as code, like, weekly. It would just wipe it. And I was like, that is commitment. Like, that is some really solid commitment. But to your point, most people don't do that.
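The "delete anything not in IaC" bot described here boils down to simple set logic: diff the resources running in the cloud against the resources declared in code, and remove the difference. This is a minimal sketch of that pattern; the resource names are stand-ins, and a real bot would pull the live inventory from a cloud API and the declared inventory from Terraform state rather than hard-coded sets:

```python
# Sketch of the "reaper bot" pattern: periodically delete any live cloud
# resource that is not declared in infrastructure-as-code. The inventories
# below are illustrative stand-ins, not real AWS/Terraform lookups.

def find_unmanaged(live: set[str], declared: set[str]) -> set[str]:
    """Return resources running in the cloud but absent from IaC definitions."""
    return live - declared

live_resources = {"vpc-main", "db-prod", "ec2-scratch-test"}
iac_declared = {"vpc-main", "db-prod"}

for resource in sorted(find_unmanaged(live_resources, iac_declared)):
    # A real bot would call the cloud provider's delete API here.
    print(f"would hard-delete unmanaged resource: {resource}")
```

The whole trick, as the anecdote suggests, is organizational commitment rather than code: the diff is trivial, but actually deleting everything outside of IaC is what keeps the drift ("creep") from ever accumulating.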
Starting point is 00:28:25 And there's some kind of creep. And it's like, well, we can turn it off, right? But yeah. Well, I mean, the other thing that's amazing, and you mentioned this too, Mike: I am not a developer by trade. Right. But I can go in and work with an MCP server and reason about YAML files fully sufficiently.
Starting point is 00:28:45 And then especially if there's a companion UI where I can see the materialization of that, I can understand what's going on, right? It's totally doable. And I'll give a great example. And this is where I think this infrastructure as code, BI as code, kind of everyone's standing on the shoulders of each other, right? So it actually also means that you can really lean into open standards, and again, AI bots are great at reading open standards.
Starting point is 00:29:14 And so one example of a standard that I love is d3-format. You know, formatting strings and formatting numbers has always been a bit of a bitch, right? And there are two ways to provide your users with the ability to format how they want a number to appear in their dashboard. One is to build a very complex UI with a pull-down menu for currency, a pull-down menu for format. I don't know how many times I've hit the Excel button, like, do I increase precision or decrease precision? I know. And you click the wrong direction. Oh, man. I removed it. So the great thing about what you're talking about, Eric and John, is that you could just go in. I don't even need to look at the d3-format documentation. Rill just implements d3-format. Yeah. And I can, and I've done this, I can say, hey, Cursor, update this currency format to be British pounds. And there's a whole set of things that need to go into making something look like British pounds.
Starting point is 00:30:15 The commas are different in terms of the separators. And it just does it. And not only does it do it based on the documentation of d3-format, which, kudos to Mike Bostock, they've done a great job documenting, but it also looks at thousands and thousands of examples of d3-format out there, and it can learn from what's already been done before us. Right. Yep.
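For a concrete feel of what's being described: d3-format is a JavaScript library, but its specifier mini-language was modeled on Python's format specification (PEP 3101), so the same specifier strings largely work with Python's built-in `format`. This is an approximation for illustration, not d3 itself; locale details such as currency symbols and separator characters differ between the two:

```python
# d3-format-style specifiers, evaluated with Python's built-in formatter.
# d3-format's mini-language was modeled on Python's format spec, so specs
# like ",.2f" behave the same; locale handling (e.g. currency symbols,
# British-style separators) is where the two diverge.

def fmt(value: float, spec: str) -> str:
    """Apply a d3-style format specifier using Python's format mini-language."""
    return format(value, spec)

print(fmt(1234567.891, ",.2f"))  # thousands separators + 2 decimals -> 1,234,567.89
print(fmt(0.1234, ".1%"))        # percentage with one decimal       -> 12.3%
print(fmt(42, "05d"))            # zero-padded integer               -> 00042
```

The point made in the conversation holds either way: because the specifier is a small, well-documented string standard with thousands of public examples, an AI assistant can produce the right spec directly instead of a user hunting through precision buttons in a UI.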
Starting point is 00:30:37 Yep. I love it. Okay, well, there are so many things we could keep talking about, but you are one of the people we talk about when we want to think about the analytics landscape in general. So we just talked about the bleeding edge, right? Honestly, we should have you back on because there are probably a couple more hours of discussion to have about that subject generally. We'll do that after John and I get the Rill MCP server up and running. That's right. Yeah. Yeah. And we'll do it. Yeah, we'll do a podcast analytics show. But let's zoom out from the bleeding edge, okay? And John, actually, I know this is going to strike very close to home for you because you work with a lot of clients, you know, who are still running very traditional analytics stacks, you know, Power BI, etc. So the bleeding edge is changing at breakneck pace, but so much of the world still operates running analytics the same way that it's been done for years, decades.
Starting point is 00:31:31 I think that's fair. Change is hard; when a tool gets embedded, it's very hard to pull it out. So Mike, talk to us about the analytics landscape generally. John, please weigh in because you see this, you know, every day with your... Well, I actually want to add to the question for Mike. I also want to know from your perspective how the current velocity is impacting adoption. Because I have a theory that when it's high velocity, there's a certain group of people that are like, I'm sitting off to the side until this thing slows down or levels off. So let's talk about that too, but I'm really curious about your perspective
Starting point is 00:32:09 on the landscape out there. Great. I'll say, so I think first in terms of the shifts that we're seeing, and then we'll talk about velocity. I think the level of change right now is significant enough that I don't think this is evolutionary change any longer, in terms of how people are going to build their data stacks. I think there's an analogy to the early days of, you know, telecommunications, right? Some folks were like, should we put down wires?
Starting point is 00:32:36 And at some point, cellular became enough of a powerful technology that certain countries never even laid down wire. They just went right to cellular. Right. They just leapfrogged that entire phase. And so I think there's a lot to be learned from people who haven't built anything yet. What are they building with? If you were starting a company from scratch today, what would you do?
Starting point is 00:33:00 Would you lay down a bunch of wire? Would you go out and buy Snowflake, and go out and buy dbt and Informatica, you know, Fivetran and all the rest of the modern data stack? Or would you do something different? I think we can look at what these leapfrog architectures might look like. And I'll predict a few things that are components of a leapfrog architecture for BI. Yep. The first is data lakes. I think everyone starts with the data lake. That is the foundational substrate of where your data lives, right? Namely, Iceberg. Not data lakes with a bunch of Parquet and JSON spread all over and no catalog, but a governed, structured data lake with
Starting point is 00:33:41 Iceberg at the core. It's really a lake of tables, right? And your databases just talk to that data lake, many applications, right? That's the first piece. And then the second layer, I would say, is going to be a fast layer. I actually think we don't need cloud data warehouses if you've got a data lake. There's no point in moving data from your data lake into a warehouse and paying all sorts of taxes. What you really want for that second tier, as a
Starting point is 00:34:09 serving layer, is a fast layer. So I think real-time analytical databases are where you want to put data that needs to be quickly accessible to applications. If it can be slow, just query the data lake directly. If it needs to be fast, put it in a real-time analytical database. What are those? Those are the fastest growing class of databases today, in my opinion. And that's things like ClickHouse, of course, now talking about raising at a $6 billion valuation, according to The Information, if they're to be trusted, last week. You've got obviously MotherDuck and DuckDB as a fast analytical database. And then you've got StarTree, Pinot, StarRocks, right? There's a whole group of folks that are building fast database engines.
Starting point is 00:34:48 And then the layer above that, we talk about BI. Of course, now I'm biased. I don't think you would start with Looker if you were building from scratch. Certainly you might look at something like Omni, which is the next-gen Looker. But I think this is where the exciting stuff happens. It really is. Can we think about not just BI as code, but really AI as BI, right? Could we actually consider, you know, there's that joke in Back to the Future. He says, where we're going, there are no roads.
Starting point is 00:35:26 Maybe where we're going, there are no dashboards, right? Maybe we can just have a pure AI interface that interacts with that fast metrics store that's in ClickHouse or DuckDB or something else. And we just start from scratch. We have Claude as our interface to all of our, you know, company's metrics and business insights. So that's just speculation. And then on the second question, how is velocity impacting adoption? I think people are nervous.
Starting point is 00:35:54 No one wants to, you know, commit millions of dollars and find out they picked the wrong horse. So I think people are, you know, wading into things like Iceberg, and there's some experimentation. I don't think, you know, FedEx is going to rip out their existing database infrastructure yet. But I think the area where we're seeing the most velocity, maybe no surprise, is around AI initiatives.
Starting point is 00:36:20 So if people are doing something with AI, they might be building analytics on that AI, and they're willing to experiment on their analytics stack for a particular AI initiative, you know, that they're working on. Well, and I think for large companies, it's not fair to think of it as all or nothing either, right? Like, some are going to have very cutting edge divisions of the company working on X or Y, and some are going to have a division that runs a mainframe and, like, will always run a mainframe. Yeah, yeah.
Starting point is 00:36:52 I think Iceberg is the clear one that you'll see people probably adopt soonest, because it's easy to swap in; you can still run Snowflake on Iceberg, right? It's just an external table. And the other reason people are going there, I think, is cost. Iceberg represents a huge decrease in cost for companies. And so that's where I think, you know, people get excited for a cost reason, not just an innovation reason. Yeah. And I think the velocity thing too, like, when people can arrive on a standard, CSV is a standard, Parquet I think is really becoming a standard, Iceberg I think is too, because when you can land on that and feel really comfortable that I have complete control, it's open source, nobody's going to take it from me, nobody's going to change the licensing terms,
Starting point is 00:37:39 nobody's going to, like, whatever, all those things that people, you know, fear if you've been around for long enough. So you can get on that standard and then, especially in a larger org, realize, all right, so we're a Snowflake shop or something, and then we hire the chief AI officer and he has a Databricks background. It's like, well, do we want to switch the whole company from Snowflake to Databricks?
Starting point is 00:38:02 Not really. Do we want this new chief AI officer and all his team, who know Databricks, to learn Snowflake? Not really. Do we just buy both? Do we just buy both? And if you look at the numbers, I mean, Eric, you've seen this too.
Starting point is 00:38:16 The number of overlapping companies between Snowflake and Databricks is really high. Yeah, it's really high, for sure. And I think it's funny. And that's true of other tooling too, right? But I think when you have those open standards, and essentially Snowflake and Databricks already said, hey, we separated compute and storage, and Snowflake essentially has, you know, their own algorithm for how they access storage, right?
Starting point is 00:38:37 But it's still the same principle, whether it's in Snowflake's table format or an open table format. Like, it's the same principle as far as design. You know, so there's one more question we need to get to before the end. But to share one last thought, I love the concept of leapfrog, Mike. And we actually talked with a company at Data Council called Mooncake. Yeah. Speaking of fast databases. Yeah, speaking of fast databases.
Starting point is 00:39:01 Well, this is what's interesting when we think about the leapfrog, right? And obviously, we talked about this a little bit, but I was thinking the other day, it's like, okay, if I was going to build a product, what would the tech stack be, right? Well, you know, one thing that's really fascinating about a technology like Mooncake, I mean, they're early, right? But it's like, okay, you build your app, it's running on Postgres, and then you run Mooncake, and then you just have Iceberg and DuckDB. And what's really insane about that is you gave yourself future access to an enterprise
Starting point is 00:39:35 data stack, and you can have sort of infinite interoperability with whatever other tools you want. And that leapfrog is wild to think about, right? And it's so subtle and so easy, right? But the option value for the future is automatically baked in, and you don't have to, you know, you can basically build an extremely scalable enterprise stack on that foundation while you're just building your app. And the flexibility, like we were talking about, I think especially if you get into advanced use cases and you really want
Starting point is 00:40:07 experts in whatever your company does, you get a broader pool of people if you can say, hey, we're on this one open standard, but bring your own tool. Whatever license you want, whatever your compute layer is, whatever your data science layer is, bring that and we can do it. And I think that expands your talent pool. Instead, you just tend to have companies right now, on data, where you're a Microsoft shop or you're kind of a Tableau shop and the talent is segmented based on the software. And I think you might see a little bit less of that. Yeah. I mean, it's the Postgres thing. I was looking up Mooncake here. I think, for a number of reasons, it's such a smart set of decisions they've made there.
Starting point is 00:40:47 But totally. But I think, you know, this is why we see, in general, Postgres is the de facto standard database that everyone starts with. And to your point, John, everyone knows, people understand Postgres. They know how it works. And of course, again, I think increasingly we're not just building, you talk about people knowing Power BI. I think one other shift we'll see, certainly among those of us who are building tools, is that the tool users increasingly will not be humans.
Starting point is 00:41:17 The tool users will be agents. And a quote from Nikita, who sold Neon for a billion dollars to Databricks last week. Nikita, the founder there, and Neon was built on serverless Postgres, right? So another good example. He said that the majority of databases created on Neon in the last year were created by agents, not by humans. Yeah. Insane. Yeah. I mean, it really is. Agents and Vercel. Between those two. Yeah. Well, I mean, you know, v0, yeah. If you've actually done it, it is magical, you know. But the punchline here is that the major tradeoffs and critical decisions
Starting point is 00:41:57 are actually being removed from the equation. That is tectonic, right? These decisions that had decades-long impact, you don't have to make them anymore, you know, which is crazy. Okay, we have to hit this, and we should have saved more time because there are so many fun things to talk about, but BI is super crowded, and so
Starting point is 00:42:15 because there are so many fun things to talk about, but B.I. is super crowded, and so and John you actually you tell your story recently because like you actually like started using real and your reaction was like I don't know if I want to explore another VI too. Yeah yeah we really should save more time for that but yeah we'll definitely have to have you back Mike so several friends are like you got to check out real you got to check out real and like when you're doing consulting I'm switching back and forth but so many contexts and so many BI tools is like I don't know if I want to see another BI tool but finally like through like data counsel and people's like all right I got to check it out. But anyways, long story short, working with one of the large consulting companies that's Empire thinking about RIL from Data Council. Just all right, I'm going to dig into this. And one of the best parts, and I want you to share kind of the backstory of the philosophy
Starting point is 00:43:04 here, is we've talked a lot about BI as code and AI. But there's actually a really opinionated philosophy behind Rill that has nothing to do with either of those. It's more about good analytics and what people actually do. So the one thing I noticed right off the bat is date-time comparisons: I want to see week-over-week percentage change, month-over-month percentage change, year-over-year. Of course. And to build that out in Tableau, Power BI, a lot of these other tools, it's a fair amount of effort. But everybody wants to answer that question. And Google Analytics is actually the only other tool that I can think of, maybe
Starting point is 00:43:43 Looker, but specifically Google Analytics, where that was a native part of it. That's why some people still use it. I think that's probably true. Yeah. And so anyways, I'm sure there's other stuff built in, because I haven't spent tons of time yet with the tool. But tell us some about that philosophy and maybe even some other opinionated decisions you guys have made building the tool. Well, thanks, John. I'm glad you were able to overcome your initial hesitancy about checking out yet another BI tool. I think one of the core principles that folks appreciate once they get going with Rill is actually, as you said, it's not just the BI as code, and it's not just, you know, fast dashboards. At the center of Rill is our metrics-first philosophy. A lot of BI tools have you build dashboards on, you know, data tables. And our opinion is that,
Starting point is 00:44:35 you know, data tables are kind of too raw. You know, fact tables are too raw to really build on. You can do it, but you have to put a lot of logic in to define measures. And on the other hand, reports are too baked. They're too rigid. And so what we call the metrics layer, you know, metrics are really what every business uses to think about their company: revenue, MAU, campaign spend, return on advertising spend. Metrics are these flexible aggregate functions that can be explored in different contexts, right? You can look at revenue by country, revenue by customer, revenue by product category. And so I think that's the essence of almost every
Starting point is 00:45:20 tool is this metrics layer, this metrics model, and very few BI tools force you to define those metrics from the start. They kind of do it as a side effect. And so Rill puts that front and center. And once you have that metrics model, everything else flows from that. And really, you actually don't build the dashboard. Once you've defined your metrics and your business dimensions, we actually give you a dashboard without making any decisions. And so I think that's the heart of it. And then it makes things like revenue week over week, revenue month over month, all of those time series comparisons of metrics, much easier, because you also have to pick your time column. And so there are some hard decisions that people make up front, but then the downstream effects make a lot of things very intuitive and easy afterwards.
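The metrics-first idea being described, defining measures, dimensions, and a time column once in config so that dashboards and time comparisons fall out automatically, can be pictured as a small YAML file. This is an illustrative sketch of the pattern rather than Rill's exact schema; the field names here are assumptions:

```yaml
# Hypothetical metrics-view config sketch (field names illustrative,
# not necessarily Rill's exact schema). Declare the metrics model once;
# the dashboard and time comparisons are derived from it.
type: metrics_view
model: orders            # the underlying fact table / model
timeseries: ordered_at   # the time column that powers WoW / MoM / YoY comparisons
dimensions:
  - column: country
  - column: product_category
measures:
  - name: revenue
    expression: SUM(amount)
  - name: order_count
    expression: COUNT(*)
```

With a definition like this, the week-over-week comparison John asked about is just (current period minus prior period) divided by prior period over the declared time column, which is why the tool can offer it without any per-dashboard setup.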
Starting point is 00:46:01 One other thing that I thought was brilliant, that I've never seen before, is when you build out this metrics layer and then you hit go, the default, I believe this is the default, or it's the option I chose, I can't remember, or I was part of an A/B test, but essentially, show me everything is an option. And then I'll tell you what I don't want, which is different from most BI, where it's like, I'm going to drag and drop each little thing on here and decide. But show me everything, with a quick option of, oh, actually hide these five things, is actually kind of refreshing. Because I don't know what I want. Let me see some things. So I thought, again, that was just another one of those things that was so neat a
Starting point is 00:46:44 pattern. Yeah. I mean, you know, I'll say this. We've been building a version of this product for probably 15 years. And so, you know, they say, where do good products come from? Oftentimes, I think a lot of founders are building a product that they would have wanted, right? Yeah. Yeah. You know, this is the product that I always wanted to have. I was a data scientist. I was a data engineer. And so I always wanted this product. And so we've learned the hard way over the years what not to do. And some of these choices are, yeah, school of hard knocks in the data analytics space. Nice. Awesome. Well, Mike, we're at the buzzer, as we like to say, but we have so much
Starting point is 00:47:31 more to cover. So let's have you back on the show soon. We can keep talking about AI and dig even deeper into Rill. And where do people go to check out the tool? Oh, yes. Rilldata.com. R-I-L-L-data.com. You can download it with a single curl command. So, yeah, I hope people will check it out. Awesome. Well, thanks again, Mike. Thanks, Eric. Thanks, John.
Starting point is 00:47:53 Great to be here. The Data Stack Show is brought to you by RudderStack. Learn more at rudderstack.com.
