The Data Stack Show - 248: AI and BI: The Future of Data Analytics with Michael Driscoll of Rill Data
Episode Date: June 11, 2025
This week on The Data Stack Show, Eric and John welcome Michael Driscoll, Co-Founder and CEO of Rill Data. Mike discusses the transformative impact of AI on business intelligence (BI) and data analytics. He also explores the shift from traditional dashboard-based tools to more dynamic, code-driven, and AI-powered interfaces that provide deeper insights. During the conversation, the group emphasizes the importance of a metrics-first approach, the potential of leapfrog architectures using technologies like data lakes and real-time analytical databases, and how AI agents are increasingly becoming the primary users of data tools. The conversation highlights the evolving landscape of data infrastructure, where open standards, flexibility, and intelligent interfaces are reshaping how businesses interact with and understand their data, and more.
Highlights from this week’s conversation include:
Welcome Back Mike Driscoll (1:11)
Philosophy Behind BI Tools (2:04)
Building a Natural Language Processing Server (4:33)
Deployment of MCP Servers (6:07)
The Role of Visualizations in BI (10:09)
Measuring Product Success (12:43)
Navigating Changes as Data Professionals (16:13)
Efficiency Gains with Code (19:00)
The Future of Data Teams (22:29)
Long-term Use of RudderStack (25:16)
Analytics Landscape Overview (30:59)
Future of BI Architecture (33:04)
AI's Role in Analytics (35:07)
Interoperability and Talent Pool (39:41)
The Crowded BI Market (42:03)
Final Thoughts and Takeaways (46:07)
The Data Stack Show is a weekly podcast powered by RudderStack, customer data infrastructure that enables you to deliver real-time customer event data everywhere it’s needed to power smarter decisions and better customer experiences.
Each week, we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Hi, I'm Eric Dodds.
And I'm John Wessel.
Welcome to the Data Stack Show.
The Data Stack Show is a podcast where we talk about the technical, business, and human
challenges involved in data work.
Join our casual conversations with innovators and data professionals to learn about new
data technologies and how data teams are run at top companies.
Before we dig into today's episode,
we want to give a huge thanks
to our presenting sponsor, RudderStack.
They give us the equipment and time
to do this show week in, week out,
and provide you the valuable content.
RudderStack provides customer data infrastructure
and is used by the world's most innovative companies
to collect, transform, and deliver their event data wherever it's needed, all
in real time.
You can learn more at rudderstack.com.
Welcome back to the show, everyone.
Mike, thank you for coming back on.
We have Mike Driscoll here.
He's a co-founder of Rill Data, and we have some amazing topics to cover today.
But Mike, for those who did not hear the first episode with you,
give us a brief background.
Yeah, thanks, Eric. Thanks, Brooks and John for having me.
We talked when we first met about, I think, Rill as an emerging BI tool
and some of the unique lens we had on admittedly a crowded market.
And we were excited to show off Rill Developer, which had just been launched, and Rill Cloud.
So it's been a couple of years.
We're now doing a lot more in, of course, the world of AI.
And I'm just thrilled to share with the audience at the Data Stack Show what we've been cranking
on.
Yeah.
So Mike, one of the topics that we talked about before the show that I'm excited to
dig in on is kind of the philosophy behind the BI tool.
When you're in a crowded market, one of the things you can do is have a really strong
philosophy about how you approach problems.
And I've really seen that with Rill.
So we're going to dig into that.
But what are some topics you want to hit?
Well, you know, I'd be remiss if I didn't say that I think the most exciting thing we're
seeing in the world right now is, of course, AI.
And I think naturally there's some connections between AI and BI and particularly with our
embrace of BI as code, we think that there's some really cool intersections between those
two.
Love it.
Well, let's dig in.
Awesome. Let's do it.
Mike, welcome back to the show.
It's been way too long and a lot has happened
since we last had you on the show.
Thanks for having me, Eric and Brooks.
And great to see you here, John, joining the crew.
All right, well, for those who did not listen
to the first show, just give us a brief background
on you and Rill, just your career,
and then how Rill came about.
Absolutely.
Rill was originally started actually
as a company called Metamarkets,
which sold to Snapchat in the late 2010s.
And we actually started Rill by recognizing at Snapchat
that this technology we had built for BI
was being used not just for advertising analytics,
but for a whole lot more.
And so, Rill was founded with the idea
that we could take some tech we knew well
around interactive exploratory dashboards
and maybe take it to more markets beyond advertising.
Since 2024, we launched Rill Cloud.
We've had a tremendous number of adopters.
We've got thousands of users out in the world,
everyone from Comcast to Ericsson and fintech,
AI ops companies using these opinionated BI-as-code dashboards
that Rill Cloud powers.
And we've also got an open source tool called Rill Developer
that thousands of folks are using to run a local version
of our stack,
a free and local version.
And, you know, I guess the inspiration for this company in some ways it is my life's work,
I would say, is just, you know, making data more accessible, more
explorable for, you know, human brains.
And really the genesis of all of this was
I actually started my career in data
in computational biology.
So believe it or not,
Rill is probably the result of decades
of working with data sets.
It started with genomic data sets,
wow, over 20 years ago.
Wow, very cool.
Okay, you mentioned when we were talking before the call,
which we probably should have recorded
because it was such a fun conversation.
But you mentioned you were coding this morning.
What were you working on?
Can you tell us or is it top secret?
No, definitely not.
One of the things that we've been working on
the last few weeks here at Rill
is we've been building an MCP server.
And so what I was working on this morning
was actually a demo for a prospect. Really, some of the best ideas you have in business often come from your customers.
And so we had a customer ask, you know, Mike, we love this dashboard that you've given
us, but we'd really love it if you could actually have a natural language prompt that could
also allow us to interact with it. We put all this data into Rill,
we pay you guys a lot of money.
We know that natural language prompts
are how AI agents work.
Could you guys build us a Rill agent
that could answer questions
without us having to click around?
And so we've been building an MCP server
where you can ask the same data
that lives in your Rill dashboard,
you can ask, hey, why is revenue down over the last 24 hours?
And the MCP will kind of go through and examine
all of the dimensions associated with that metric
and provide insights.
So yeah, I've been hacking on our MCP server,
which has been pretty fun.
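As a rough illustration of the "examine all of the dimensions associated with that metric" step described above, here is a minimal Python sketch. It is not Rill's implementation; the function name `rank_dimension_drivers`, the row layout, and the sample numbers are all hypothetical. It simply totals a metric per dimension value for a previous and a current period and ranks the values by how much they changed.

```python
from collections import defaultdict


def rank_dimension_drivers(rows, metric, dimensions):
    """Rank (dimension, value) pairs by the change in `metric` between periods.

    Each row is a dict with a "period" key ("previous" or "current"),
    one key per dimension, and a numeric metric column.
    """
    totals = defaultdict(lambda: {"previous": 0.0, "current": 0.0})
    for row in rows:
        for dim in dimensions:
            totals[(dim, row[dim])][row["period"]] += row[metric]

    drivers = [
        (dim, value, t["current"] - t["previous"])
        for (dim, value), t in totals.items()
    ]
    # Most negative change first: the biggest contributors to a drop.
    return sorted(drivers, key=lambda d: d[2])


# Hypothetical sample data for two 24-hour windows.
rows = [
    {"period": "previous", "device": "iPhone", "country": "US", "revenue": 500.0},
    {"period": "previous", "device": "Android", "country": "US", "revenue": 300.0},
    {"period": "previous", "device": "iPhone", "country": "UK", "revenue": 200.0},
    {"period": "current", "device": "iPhone", "country": "US", "revenue": 250.0},
    {"period": "current", "device": "Android", "country": "US", "revenue": 310.0},
    {"period": "current", "device": "iPhone", "country": "UK", "revenue": 190.0},
]

drivers = rank_dimension_drivers(rows, "revenue", ["device", "country"])
for dim, value, delta in drivers:
    print(f"{dim}={value}: {delta:+.2f}")
```

An agent answering "why is revenue down?" could summarize the top few entries of such a ranking in natural language, while a dashboard lets a human check the same breakdown visually.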
So fun.
Okay, it's funny, we actually got two internal MCP servers up and running this week.
Only two?
It was hacky as well, but it's super fun.
I want to get into some nuts and bolts for the nerds out there on this.
How did you deploy it?
How did you decide to deploy it for V1?
What did you guys use for your two?
It's just running locally. So yeah, I mean, there's sort of two interfaces.
So I wired everything up in Cursor, but some people are using Claude as the interface as
well.
And so there's just an internal team that's, you know, kicking the tires on it.
Yeah.
So we just did it locally, but it's great.
Hey, local first development, you know, we're a huge fan of local first development.
Yeah.
Yeah.
So right now we're actually using
a Docker container. We built it in Python; from looking at
the MCP servers out there, the Python community, no surprise, is the most mature.
But right now it is a number of steps to get it working.
To use Claude, I've got on my Claude right now
some JSON config
where I can specify the name of the Docker container
that you spin up and then talk to.
And then it talks to the Rill runtime.
I think we are going to rewrite it in Golang.
And so that will just be embedded.
Rill is a downloadable binary.
So if we put the MCP server into that,
we could have that entire experience running locally.
So we're using Claude and a Python back end at this point.
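The JSON config mentioned here can be pictured with a small Python sketch that builds one. This is an assumption-laden illustration, not Rill's actual setup: the `mcpServers` layout follows the shape commonly used in Claude Desktop's `claude_desktop_config.json`, while the server name `rill`, the image `example/rill-mcp-server:latest`, and the `RILL_API_TOKEN` variable are hypothetical placeholders.

```python
import json


def docker_mcp_config(server_name, image, env=None):
    """Build a config entry that launches an MCP server as a Docker container."""
    # "-i" keeps stdin open so the client can talk to the server over stdio;
    # "--rm" removes the container when the session ends.
    args = ["run", "-i", "--rm"]
    for key in sorted(env or {}):
        # "-e KEY" with no value forwards KEY from the launching environment.
        args += ["-e", key]
    args.append(image)

    entry = {"command": "docker", "args": args}
    if env:
        entry["env"] = dict(env)
    return {"mcpServers": {server_name: entry}}


# Hypothetical names: replace with the real image and credentials.
config = docker_mcp_config(
    "rill",
    "example/rill-mcp-server:latest",
    {"RILL_API_TOKEN": "<token>"},
)
print(json.dumps(config, indent=2))
```

Embedding the server in a single downloadable binary, as discussed next, would likely reduce this to pointing `command` at that binary instead of at `docker`.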
Ours is in Go, actually.
OK.
Yeah.
Very nice.
OK, well, maybe I'll hit you up after the show.
Hit me up.
I mean, I don't know.
I'm not a developer by trade.
And so someone else opened the pull request.
We did internally. I think this is a great jumping off point
MRR trending over time, or tweaking your JSON config.
in the interface. So can you speak to that a little bit, Mike? Because it's pretty wild to think about, you know, if you had told someone 10 years ago, like your BI interface is actually going to be
a chat bot made by another company that is like on the surface a generic intelligence tool.
You know, people would be like, that's hard to even conceive of.
But I mean, earlier before prepping, like you literally showed us that you were doing
BI in Claude.
Yeah, I think it's, you know, to use an overused adjective out there, it's wild, right?
It really is wild to see what's happening in the world of applications.
I think what we're really seeing is a shift
in the interfaces for software.
How do humans engage with software?
And so we saw folks talking about, you know,
how do you really need to click around Salesforce
to understand your sales pipeline?
Or do you just need to ask, you know,
your AI Salesforce agent,
hey, who are our top 10 leads this month?
I think BI is going to be just as affected,
if anything potentially more so impacted, by this shift
in the way that people interface with software.
And to be honest, I think, you know,
BI really was never about dashboards, right?
We've always said BI is actually about insights, right?
It should be about, you know, making better decisions.
And I think a great analogy is the world of maps, right?
No one actually wants a map, right?
They just wanna get to where they're going.
And you know, with the advent of something like Uber
and Lyft and you know, now of course even Waymo, it wouldn't surprise me if people are actually using
mapping software less frequently,
because they didn't actually ever want a map,
they were just trying to get from point A to point B.
I think the important thing to note is that
you kind of do need both,
because especially when it comes to AI agents, right?
You want, the reason why the visualization software, the dashboards still matter,
is that if an AI agent tells you that revenue is down in the last 24 hours
because of the following correlated dimensions,
and let's talk advertising, you know, it's because iPhone clicks are down,
you still want to be able to trust the agent,
but verify their work.
And so without visualizations,
it's very hard for humans to kind of comb through,
reams and reams of data.
Visualizations are still the right-sized interface
for human eyes.
Yep.
I also think there's a very practical utility for visualizations around things like accountability. If I'm holding a team accountable to reaching some metric,
or if I'm holding a team accountable to analyzing adoption of features or other things like that,
well, how do you do that?
That's just one example among many where when you actually use them practically to operate a function or other things like that, I think there's a ton of utility there.
Yeah. I always think about sports.
What would sports be like without a scoreboard, right?
And because that's true, just giving somebody a blank piece of paper and saying, like, what do you want to know about your data?
Like, that probably has a place, but it's also not the same as, like, look, here's a scoreboard with whatever your point is, and then higher is better, lower is worse.
Yeah, that basic, I think, is still gonna be a thing. Yeah, it's just simple and it helps people.
Mike, I'm interested to know, how does this change the way you think about building Rill
and even measuring the success of the product?
Because if you think about a metric like product usage or engagement,
a company deploys Rill, they're using it to do all sorts of things. But ultimately they're using it to derive insights
from their data sets and then hopefully make decisions
that improve the business.
But now what's interesting is part of that engagement data
will now be API calls through an MCP server.
The fidelity there is very different.
In many ways they now have sort of three interfaces, right? So you have the, you know, let's call
it the traditional UI and the tools that are available via a GUI. You have Rill Developer,
which is like config-based, you know,
in conjunction with Rill Developer. And so the surface area of your product
and like the number of users and different types of people
who can interact with Rill,
the surface area gets way wider.
So how do you think about that?
Yeah, I think it's a very fluid environment right now.
Yep.
Where I think everyone is trying to ask themselves.
The first question we asked ourselves
when we were building out this, you know,
Rill BI agent is where should it live?
Should it live next to the dashboard?
Right, people log into their dashboard at rilldata.com
and there's a chat bot.
Should it live in Slack?
Should we meet them where they're already doing
a lot of their knowledge work?
And should it be a Slack channel?
Should it live in, should it be an app for OpenAI
or for, you know, for Claude, right?
So I think that, I think we're sort of,
that's starting to shake out.
I think one insight, you know, that we can take from,
I mentioned the Sequoia folks did a conference last week
about AI and they put a slide up and they showed, you know,
for all the technological innovations
of the last couple of decades,
which companies were able to achieve
more than a billion dollars in revenue.
And you look at that slide and you see
the vast majority of the companies
that get to a billion in revenue,
not market cap, revenue, are application companies.
So it's, and if you look at OpenAI's decision
to acquire Windsurf, there are some hints that,
you know, people think that a lot of the value maybe even not value
creation, but certainly the value extraction, the ability to
actually get dollars from customers is going to come at
the application layer. So I think a company like Rill
needs to think, okay, well, if we just become, you know, an API
that plugs into someone else's
application, does that put us at risk in terms of our ability to command dollars? So
I think, again, the answer is often going to be, you know, just like we think about API
layers, you need both. You need to have a UI, you need to have an
API, you need to have multiple ways that you can drive value
for your customers.
And yeah, so I think it's gonna be interesting
to see how this plays out the next, you know,
coming months.
Yeah, it'll be fun.
One question, and I'm thinking about
our listeners who, you know,
I'm sure there's a range, right?
I'm sure we have listeners who work in Power BI,
they run very traditional pipelines.
We probably have some listeners who they turn on the show while they're tinkering with their MCP server setup.
And so there's a very broad spectrum. But I want to think about the ones who sort of are looking at this and they're saying, okay, I get it. I can see that there are some pretty tectonic shifts like about to happen.
How should they begin to think about, you know, as a data professional, they're managing
a stack, they have internal stakeholders, they are going to face demands for delivering
things through different interfaces just because the chat interface is so user friendly
and it's becoming muscle memory for so many people.
How do they navigate that as they think about tools,
as they think about their stakeholders?
It's a great question.
And I think there's a few shifts that are worth,
that I think are worth paying attention to
as a data practitioner.
I think one of the first,
which I think is gonna be accelerated by AI,
and which we've already seen,
is that data professionals
are increasingly embracing code-powered workflows
and code-defined stacks.
So we saw, you know, dbt created, in some ways,
an analyst that understood how to use Git.
And at first that seemed like, my gosh,
you'll never get anyone to adopt a data tool
that you have to go to the command line
and commit stuff to Git.
But the power of that approach was such
that I think it overwhelms some of the, frankly,
challenging developer ergonomics
of working with command lines and code.
I think that's gonna happen in other areas of data.
So obviously Rill's embracing BI as code, right?
You can load up a project in, you know, in Cursor and vibe code it, vibe update it.
I think across the data stack from ETL to, you know, to spinning up databases, right?
To defining metrics layers and semantic layers,
to defining dashboards.
I think we're gonna see code
backed by Git as a predominant theme there.
And I think that is a big thing
that data professionals should be embracing.
That's on the creation side.
And then I'm happy to talk about
the consumption side for their stakeholders, right?
That doesn't mean that their stakeholders need to know Git,
but if you're a data professional today,
you should be writing your data pipelines
and your metrics layers and your dashboards.
You should be writing those in code
and not with loaded UIs.
Speak a little bit about the,
because I've had personal experience with this too,
about the developer efficiency and speed gain there.
Because, I mean, even talking before AI, honestly, and with AI I think there's a multiplier.
So if I'm used to a traditional stack and I have
three different GUIs and I log into this one GUI and then make a change to a pipeline and hit save and then log into this other,
you can kind of visualize, I think, what I'm talking about. And I know we're kind of just guessing, but what do you think the efficiency gain is,
even pre-AI, with a full as-code workflow: pipelines, modeling, and, you
know, BI?
So for any listeners there, if you want to read a long blog post, a guy named Simon
Späti wrote one called The Declarative Data Stack. He actually talks about this in
terms of what are some of the gains of, you know, your entire
data stack again, from pipelines to database ingestion code to
being, you know, declared entirely with, you know, YAML and
SQL. I think there's a number of gains. First, let's be
honest, there's a higher technical barrier
to that, right? So the great thing about UIs is almost anyone can navigate a UI. When you
start working in code, you do need to know programming
languages. I think one of the biggest gains is that you don't need to move between multiple
different tools. If you parameterize your project in
you know, as code files, you can kind of
move, move between layers
of your stack by just moving between code files,
right? You don't have to go
and look at some other tool to figure out what's going
on with your pipeline. You may have to
observe it. But
if everything is declared in code, I think it
means moving between layers of your stack gets a lot
easier. I'm going to fast forward to the era of AI. It used
to be hard if you had to move between languages like SQL or Python. Not everyone's an expert
in every one of these languages. But I do think that Cursor and Copilot and, you know,
their brethren do allow individual developers to do a lot more.
We used to have this idea of, right,
you had a database admin,
and then you have maybe someone good at writing pipelines
and someone good at building dashboards.
But I do think that when you sort of define everything
in code, an individual developer can do a lot more.
They can kind of be a 10x developer
and move from writing Python code to SQL code
to CSS code in a single
day's work.
The last thing I'll say, talking about moving between layers, because I think that's
so powerful, right?
The last thing is that debugging issues gets a lot easier if everything is
in a, again, we think the largest companies in the world still use monorepos.
A lot of things in the world of data are dealing with dependencies between systems, a lot of glue code.
And when you have kind of a monorepo for your data stack, even if it contains, you know, SQLGlot
for your transformations and Dagster for your orchestration, maybe Rill for your metrics layer and dashboards,
when all this stuff, and maybe Snowflake for some,
you know, warehouse code,
when all of that lives in a single repo,
I think debugging what's wrong gets
dramatically faster for folks.
So I think it just accelerates what, you know,
that we talk about the one person trillion dollar company,
right?
That's the dream that I think a lot of VCs talk about.
I think you can have a one-person data team these days.
I think that's what we're moving toward.
Here's a kind of anecdotal example.
There's a lot of companies that are kind of in that middle level.
Maybe they've decided, we're going to use Git for transformations, use a tool like dbt.
But on the one hand we kind of have a traditional BI tool, and maybe on the other end we kind of have traditional pipelines.
But just something interesting that I ran into the other day. This was a fairly progressive stack,
essentially all the way up to the BI layer was all like as code, you know, kind of modern tooling.
And then it was a newer, it was, I don't know what Gen we would be on in BI, let's call it Gen 3, maybe, I don't know.
But kind of that like, that Snowflake era, like BI, several other tools that came out.
They're on one of those, trying not to call them out by name.
And we had to rename something, right? And we renamed it, and then I think we like,
named it, like changed the name back.
So it's like, what's wrong with this is fine.
Like everything should be fine here.
So update everything.
And then, so everything goes through,
everything's fine in the code.
And then you've got this layer,
and you flip over into the UI,
and everything is broken.
Nothing is mapped anymore. You have to like manually click,
and you have like this indeterminate amount of work that will haunt you for like several weeks
because you missed like one little minute detail
when you rebuilt it.
So I feel like that to me is the reason like,
all right, BI as code sounds wonderful.
I mean, imagine how many clicks it's gonna take
to go into like four or five different GUIs, right?
To rename that one field, right?
And by the way, sometimes it's not a choice.
It's not that you decide to rename something,
it's an upstream choice.
Sales or marketing for the fifth time
decides to rename X, right?
Yeah.
Customers get renamed.
Or customers, sure.
Yeah, yeah, yeah, good point, yeah.
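To make the rename scenario concrete, here is a minimal Python sketch of what "rename that one field" can look like when every layer lives as code in one repo. It is only an illustration under stated assumptions: the helper `rename_field`, the file layout, and the field names are hypothetical, and a real project would lean on Git review and its tools' own validation rather than a bare search and replace.

```python
import re
import tempfile
from pathlib import Path


def rename_field(repo_root, old, new, suffixes=(".sql", ".yaml", ".yml")):
    """Rename a field in every SQL/YAML file under repo_root.

    Returns a list of (file name, replacement count) for the files changed.
    """
    # \b keeps "customer_name" from matching inside "customer_name_raw".
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    changed = []
    for path in sorted(Path(repo_root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            updated, count = pattern.subn(new, path.read_text())
            if count:
                path.write_text(updated)
                changed.append((path.name, count))
    return changed


# Build a tiny throwaway "monorepo" to demonstrate the rename.
with tempfile.TemporaryDirectory() as repo:
    root = Path(repo)
    (root / "models").mkdir()
    (root / "dashboards").mkdir()
    (root / "models" / "orders.sql").write_text(
        "SELECT customer_name, total FROM raw_orders\n"
    )
    (root / "dashboards" / "sales.yaml").write_text(
        "dimensions:\n  - column: customer_name\n"
    )
    (root / "README.md").write_text("customer_name is documented here\n")

    changed = rename_field(root, "customer_name", "account_name")
    sql_after = (root / "models" / "orders.sql").read_text()
    readme_after = (root / "README.md").read_text()

print(changed)
```

The point of the sketch is the contrast with the GUI case described above: one reviewable change touches the model and the dashboard together, instead of an unknown number of manual clicks per tool.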
We're gonna take a quick break from the episode
to talk about our sponsor, RudderStack.
Now, I could say a bunch of nice things
as if I found a fancy new tool, but John has been implementing RudderStack for over half a decade.
customer data can get messy. running production instance of RudderStack
at six years and going.
Yes, I can confirm that.
And one of the things about the implementation
that has been so common over all the years is that it wasn't a wholesale replacement of your stack. It fit right into your existing tool set.
Yeah, and even with technical tools, Eric,
things like Kafka or PubSub,
but you don't have to have all that complicated
customer data infrastructure.
Well, if you need to stream clean customer data
to your entire stack,
including your data infrastructure tools,
head over to rudderstack.com to learn more.
I think one thing that's interesting is thinking about infrastructure as code.
So let's talk about data infrastructure as code.
If you think about there are patterns around this.
Someone had to have a huge amount of willpower to actually get the data stack there.
And so it's like, okay, when we finally get there,
this is amazing.
But what's incredible, I think, is that,
and this is just one dimension
of the change happening here, right?
But if you think about tools like Rill,
where you have a bunch of config files
and it literally is BI as code, teams are gonna intentionally choose tools like that,
and there's ways for the MCP servers to go and whatever, but like, it's going to be, because everyone loves the dream of IaC, but it was just like, okay, is the juice worth
the squeeze to like force absolutely everything in Terraform and have all the central governance
around all of it and like do all the weird customizations to like make all of it work?
And it's like all of that's gone.
Like just.
Yeah.
And every single like Terraform provider from my like DevOps history of like the like one make all of it work.
tools, right? it. And I was like, that is commitment. But to your point, most people don't do that and there's some kind of creep and there's like, well, we started off good.
The other thing that's amazing is, and you mentioned this too, Mike,
I am not a developer by trade,
but I can go in and work with an MCP server and reason about YAML files, like fully sufficiently.
And then especially if there's a companion UI
where I can see the materialization of that,
like I can understand what's going on, right?
It's totally doable.
And I'll give a great example.
And this is where I think, you know,
infrastructure as code, BI as code,
kind of everyone's sort of standing on the shoulders of each other, right?
So it actually also means that you can really lean
into open standards that again,
AI bots are great at reading open standards.
And so one example of a standard that I love is D3 format.
Formatting strings and formatting numbers
has always been a bit of
a bitch, right?
And there's two ways to provide your users with the ability to format how they want a
number to appear in their dashboard.
One is to build a very complex UI with a pull-down menu for currency format. I don't know how many times I've gotten the Excel
button that's like, do I increase precision or decrease precision? I know, and you click the
wrong direction. Oh man, I removed it. So the great thing about what you're talking about,
Eric and John, is that you could just go and I don't need to even look at the D3 format
documentation. Rill just implements D3 format. And I've done this.
I can say, hey, Cursor,
update this currency format to be British pounds.
And there's a whole set of things
that need to go into making something
look like British pounds.
And even the commas are different
in terms of the separators.
And it just does it.
And not only does it do it based on the documentation
of D3 format, which kudos to Mike Bostock,
they've done a great job documenting,
but it also looks at thousands and thousands of examples
out there of D3 format.
And it can learn from what's already been done before us.
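For a feel of what a format specifier buys you, here is a small Python sketch. d3-format's specifier syntax is modeled on Python's format specification mini-language, so Python's own `,.2f` is used here as a stand-in; this is not d3-format itself and not how Rill applies it. The helper `format_currency` and its parameters are hypothetical. In d3-format the currency symbol and separators typically come from a locale definition rather than from the specifier string.

```python
def format_currency(amount, symbol="£", thousands=",", decimal="."):
    """Format a number as currency with two decimals and grouped thousands."""
    sign = "-" if amount < 0 else ""
    # ",.2f" means: group thousands with commas, fixed-point, 2 decimals.
    body = f"{abs(amount):,.2f}"
    # Swap in locale-specific separators via a temporary placeholder.
    body = body.replace(",", "\0").replace(".", decimal).replace("\0", thousands)
    return f"{sign}{symbol}{body}"


print(format_currency(1234567.891))            # British pounds
print(format_currency(-42.5))                  # negative amount
print(format_currency(1234567.891, "€", ".", ","))  # swapped separators
```

Because the whole format is one short declarative string plus a locale, it is easy for a person or an AI assistant to change in a config file, which is the contrast being drawn with a complex formatting UI.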
Yep, yep, I love it.
Okay, well, let's, there are so many things
we could keep talking about, but Mike, you are one of the people that we talk about
when we wanna think about the analytics landscape in general.
So we just talked about the bleeding edge, right?
Honestly, we should have you back on
because there are probably a couple more hours of discussion
to have about that stuff generally.
We'll do that after John and I get the Rill MCP server
up and running and do some podcast analytics.
Yeah, we'll do a nerd out show.
But let's zoom out from the bleeding edge.
And John, actually, I know this is going to strike very close to home for you because
you work with a lot of clients who are still running very traditional analytics stack,
Power BI, et cetera.
So the bleeding edge is changing at breakneck pace, but so much of the world still operates
running analytics like the same way that it's been done for years, decades.
Which is fair.
Change is hard. When a tool gets embedded, it's very hard to pull it out.
So Mike, talk to us about the analytics landscape.
Generally, John, please weigh in
because you see this every day with your professors.
I actually want to add to the question for Mike.
I also want to know from your perspective
how the current velocity is impacting adoption.
Because I have a theory about when it's high velocity,
there's a certain group of people that are like,
I'm sitting off to the side until this thing
slows down or levels off.
So let's talk about that too,
but I'm really curious your perspective
on the landscape out there.
Great.
I'll say, so I think first in terms of like the shifts
that we're seeing, and then we'll talk about velocity.
I think that the level of change right now
is significant enough.
I don't think this is any longer evolutionary change
that we're seeing in terms of how people
are gonna build their data stacks.
I think there's an analogy to the early days
of telecommunications, right?
Some folks were like, should we put down wires?
And at some point cellular became enough
of a powerful technology that certain countries
never even laid down wire.
They just went right to cellular, right? They just leapfrogged that entire phase. And so
I think there's a lot to be learned about people who haven't built anything yet. What are they
building with? If you're starting a company from scratch today, what would you do? Would you lay
down a bunch of wire? Would you sort of
go out and buy Snowflake and go out and buy dbt and Informatica, you know, Fivetran and all the
other modern data stack? Or would you do something different? I think
we can look at what these leapfrog architectures might look like. And I'll predict
a few things that are components of a leapfrog architecture for BI. Yep. The first is data lakes.
I think everyone starts with the data lake.
That is the foundational substrate
of where your data lives, right?
Namely Iceberg, not data lakes with a bunch of Parquet
and JSON spread all over and no catalog,
but a governed, structured data lake with Iceberg
at the core.
It's really a lake of tables, right?
And your databases just talk to that data lake,
many applications, right?
That's the first piece.
And then the second layer,
I would say, is gonna be a fast layer.
I actually think we don't need cloud data warehouses
if you've got a data lake.
There's no point in moving data from your data lake into
a warehouse and paying all sorts of taxes. What you really want for that second tier is a
serving layer, a fast layer. So I think real-time analytical databases are where you want to put
data that needs to be quickly accessible to applications. If it can be slow,
just query the data lake directly. If it needs to be fast, put it in a real-time analytical database.
What are those?
Those are the fastest growing class of databases today, in my opinion, and that's things like
ClickHouse, of course, which is now talking about raising at a $6 billion valuation, according
to The Information, if they're to be trusted, last week. You've got obviously MotherDuck and
DuckDB as a fast analytical database.
And then you've got your StarTree, Pinot, StarRocks, right? There's a whole group of
folks that are building fast database engines. And then the layer above that, we talk about
BI. Of course, now, I'm biased, but I don't think you could start
with Looker if you were building it from scratch.
Certainly, you might look at something like Omni, which is the next-gen Looker.
But I think this is where the exciting stuff happens.
It really is, can we think about not just BI as code,
but really, you know, AI as BI, right? Could we actually consider not even, you know,
There's that joke in Back to the Future,
he says, where we're going, there are no roads.
Maybe where we're going, there are no dashboards, right?
Maybe we can just have a pure AI interface
that interacts with that fast metric store
that's in ClickHouse or DuckDB or something else.
And we just start from scratch.
We have Claude as our interface to all of our, you know,
company's metrics and, you know, business insights.
So that's just a speculation.
And then on the second question,
how does velocity impact adoption?
I think people are nervous.
No one wants to, you know, commit millions of dollars
and find out they picked the wrong horse.
So I think people are, you know,
wading into things like Iceberg, and there's
some experimentation.
I don't think, you know, FedEx is going to rip out, you know,
their existing database infrastructure yet.
But I think the area where we're seeing the most velocity,
maybe no surprise, is around AI initiatives.
So if people are doing something with AI, they might be building
analytics on that AI,
and they're willing to experiment on their analytics stack
for a particular AI initiative that they're working on.
Well, and I think for large companies,
it's not fair to think of it as all or nothing either, right?
Like there are some that are
gonna have very cutting edge divisions of the company
that are working on X or Y, and some where, like, the division runs a mainframe
and will always run a mainframe.
Yeah.
Yeah.
I think Iceberg is the clear one that you'll see people probably adopt soonest because
it's easy to swap out.
You can still run Snowflake on Iceberg, right? It's just an external table.
And the other reason that people are,
I think going there is cost.
Iceberg represents a huge decrement in cost for companies.
And so that's where I think, you know,
people get excited about a cost reason,
not just an innovation reason.
Yeah, well, and I think the velocity thing too,
when people can arrive on a standard, where, like, CSV is still a standard, Parquet I think is really becoming a standard,
Iceberg is a big deal.
When you can land on that and feel really comfortable that I have complete control, it's open source, nobody's going to take it from me,
nobody's going to change the licensing terms, nobody's going to, like whatever.
It's like, well, do we want to switch the whole company from Snowflake to Databricks?
Not really.
Do we want this new chief AI officer
and all his team, who know Databricks,
to have to learn Snowflake?
Not really.
Do we just buy both?
Yes.
And if you look at the numbers,
I mean, Eric, you've seen this too,
the number of overlapping companies between Snowflake
and Databricks is really high.
Yeah, it's really high, sure.
And I think it's funny,
and that's true of other tooling too, right?
But I think when you have those open standards
and essentially Snowflake and Databricks already said,
hey, we separated compute and storage.
And Snowflake essentially has their own algorithm
of how they access storage, right?
But it's still the same principle,
whether it's in Snowflake's table format
or an open table format,
it's the same principle as far as design.
So there's one more question we need to get to before the end.
But to share one last thought, I love the concept of leapfrog, Mike.
And we actually talked with a company at Data Council called Mooncake. Speaking of fast databases. But it's, okay, you build your app,
of infinite interoperability with whatever other tools you want. And that leapfrog is wild to think about, right?
And it's so subtle and so easy, right?
But it's like the option value for the future is automatically baked in, and you can basically build an extremely scalable enterprise stack on that foundation.
And you're just building your app from scratch.
whatever you want, whatever a number of reasons,
it's such a smart set of decisions they've made there.
But I think, you know, this is why we see,
in general, Postgres, right,
is the de facto standard database that everyone starts with.
And to your point, John, you know,
everyone knows, people understand Postgres,
they know how it works.
And of course, again, I think increasingly
we're not just building,
you talk about people that have Power BI.
I think one other shift we'll see
is certainly among those of us who are building tools
is that the tool users increasingly will not be humans.
The tool users will be agents.
And there's a quote from Nikita, who sold Neon
for a billion dollars to Databricks last week.
Nikita's the founder there, and Neon was built on serverless Postgres, right? So another good
Yep
He said that the majority of databases created on Neon in the last year were created by agents, not by humans.
Yeah, insane. Yeah, I mean, it really is agents and Vercel, between those two. Yeah.
Well, I mean if you've actually probably you know, v0
Yeah, if you've actually done it, it is magical, you know, but the punchline here is that
the
major trade-offs in
critical decisions are actually being removed from the equation. That is tectonic, right?
Where it's like, you know, just these decisions
that had decades-long impact,
you don't have to make that decision anymore, you know?
Which is crazy.
Okay, we have to hit this,
and we should have saved more time
because there's so many fun things to talk about,
but BI is super crowded,
and John, you actually told your story recently, because
you started using Rill and your reaction was like, I don't know if
I want to explore another BI tool. Yeah, yeah, we really should have saved more time for it. But
yeah, we'll definitely have to have you back, Mike. So several friends were like, you
gotta check out Rill, you gotta check out Rill. And, like, when you're doing consulting,
I'm switching back and forth between so many contexts and so many BI tools, it's like,
ah, I don't know if I want to see another BI tool.
But finally, through Data Council
and people, it's like, all right, I gotta check it out.
But anyways, long story short,
working with one of the large consulting companies
that's entire BI, thinking about Rill from Data Council,
just, all right, I'm gonna dig into this.
And one of the best parts,
and I want you to share kind of the backstory of the philosophy
here is we've talked a lot about BI as code and AI, but there's actually a really opinionated
philosophy behind Rill that really has nothing to do with either of those.
It's more on like good analytics and like what people actually do.
So just the one thing that I noticed right off the bat is date-time comparisons, like, I want to
see week over week percentage change, month over month percentage change, year over year.
Of course, and to build that out in Tableau, Power BI, a lot of these other tools, it's
a fair amount of effort.
But everybody wants to answer that question.
And Google Analytics is actually the only other tool that I can think of, maybe Looker, but specifically Google Analytics,
where that was like a native part of it. That's why some people still use it. Yeah, I think that's probably true.
Yeah, and so anyways, I'm sure there's other stuff built in, because I haven't spent tons of time yet with the tool.
But tell us some about that philosophy and maybe even some other opinionated decisions you guys have made building the tool.
Well, thanks, John.
I'm glad you were able to overcome your initial hesitancy
about checking out yet another BI tool.
I think really one of the core principles
that folks appreciate once they get going with Rill
is actually, as you said, it's not just the BI as code
and it's not just fast dashboards.
At the center of Rill is our metrics-first philosophy.
And so with a lot of BI tools, you build dashboards on
data tables.
And our opinion is that, data tables are kind of too raw.
Fact tables are too raw to really build on.
You can do it, but you have to put a lot of logic
in to define measures.
And on the other hand, like, reports are too baked,
they're too rigid.
And so what we consider the, we call the metrics layer,
you know, metrics are really, they are like
what every business uses to think about their company,
you know, revenue, MAU, you know, campaign spend,
return on advertising spend.
And so metrics are these flexible aggregate functions
that can be explored in different contexts, right?
A metric can be, you can look at revenue by country,
revenue by customer, revenue by product category.
And so I think the essence of almost
every BI tool is this metrics layer, this
metrics model, and very few BI tools force you to define those metrics
from the start.
They kind of do it as a side effect.
And so Rill puts that front and center.
And once you have that metrics model, everything else flows from that.
So you really, and really you actually don't build a dashboard.
Once you've defined your metrics and your business dimensions,
we actually give you a dashboard
without making any decisions.
And so I think that's the heart of it.
And so then it makes things like revenue week over week,
revenue month over month,
all of those time series comparisons of metrics
are made much easier because
you also have to pick your time column.
And so there's some hard decisions that people make upfront, but then the
downstream effects make a lot of things very intuitive and easy afterwards.
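To make the metrics-first idea above concrete, here is a minimal sketch in plain Python. It is not Rill's actual model or syntax; the row fields, metric names, and helper functions are all invented for illustration. It only shows the general shape being described: a metric is an aggregate defined once, it can be sliced by any dimension, and picking a time column up front makes a week-over-week comparison a small generic function.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical fact rows; the field names are assumptions for this sketch.
ROWS = [
    {"day": date(2025, 6, 2), "country": "US", "revenue": 100.0},
    {"day": date(2025, 6, 3), "country": "DE", "revenue": 50.0},
    {"day": date(2025, 6, 9), "country": "US", "revenue": 120.0},
    {"day": date(2025, 6, 10), "country": "DE", "revenue": 45.0},
]

# A metric is a named aggregate over rows, defined once up front.
METRICS = {
    "revenue": lambda rows: sum(r["revenue"] for r in rows),
    "orders": lambda rows: len(rows),
}


def metric_by(metric, dimension, rows):
    """Evaluate one metric grouped by any dimension column."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[dimension]].append(r)
    return {key: METRICS[metric](grp) for key, grp in groups.items()}


def week_over_week(metric, rows, week_start, time_col="day"):
    """Percent change of a metric for the 7 days starting at week_start
    versus the 7 days before it. Returns None if the prior week is zero."""

    def window(start):
        end = start + timedelta(days=7)
        return [r for r in rows if start <= r[time_col] < end]

    current = METRICS[metric](window(week_start))
    previous = METRICS[metric](window(week_start - timedelta(days=7)))
    if previous == 0:
        return None
    return (current - previous) / previous * 100.0
```

In this sketch, `metric_by("revenue", "country", ROWS)` slices revenue by country, and `week_over_week("revenue", ROWS, date(2025, 6, 9))` compares that week with the one before it. The same two helpers work for any metric added to `METRICS`, which is the point of defining metrics before dashboards.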
One other thing that I thought was brilliant that I've never seen before is when you
build out this metrics layer and then you, like, hit go, the default is
essentially like,
I believe this is the default, or it's the option I chose. I can't remember.
But essentially, show me everything is an option, and then I will tell you what I don't want.
Which is different than most BI, it's like I'm going to drag and drop each little thing on here.
But the show me everything with a quick option of, like, oh, actually hide these five things, is actually kind of refreshing.
Because like I don't know what I want like let me see some things
So I thought, again, that was just another one of those things that was, yeah, such a neat design pattern.
Yeah, I mean, you know
I'll say this we've been building a version of this product for probably 15 years
And so I think sometimes, you know, they say, where do good products come from? Oftentimes, I think a lot of founders are building a
product that they would have wanted, right? Yeah. Yeah. You
know, this is the product that I always wanted to have. I was a
data scientist, I was a data engineer. And so I always wanted
this product. And so we've learned the hard way over the
years what not to do. And some of these choices are, yeah,
school of hard knocks in the data analytics space.
Nice.
Awesome. Well, Mike, we're at the buzzer, as we like to say,
but we have so much more to cover.
So let's have you back on the show soon.
We can keep talking about AI and dig even deeper into Rill.
And where do people go to check out the tool?
Oh, yes. rilldata.com.
R-I-L-L, data.com.
You can download it with a single curl command.
So, yeah, I hope people will check it out.
Awesome.
Well, thanks again, Mike.
Thanks, Eric.
Thanks, John.
Great to be here.
The Data Stack Show is brought to you by RudderStack.
Learn more at rudderstack.com.