The Data Stack Show - 193: Introducing the Cynical Data Guy: Is Data-Driven a Myth?
Episode Date: June 12, 2024
Highlights from this week’s conversation include:
Introducing a special edition of the show with the cynical data guy (0:19)
Metadata and LLMs (2:32)
Data-driven culture (8:44)
No-code orchestration too...ls (17:09)
No Code vs. Low Code (19:58)
Enterprise Challenges with No Code Solutions (20:08)
No Code Tools for Small Companies (21:40)
Inappropriate Use of Tools (23:06)
Final thoughts and takeaways (24:05)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Hi, I'm Eric Dodds.
And I'm John Wessel.
Welcome to the Data Stack Show.
The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work.
Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies.
Welcome back to the show. We have a very special kind of episode
that we're going to try and start doing monthly. And you're going to love the name for this show.
It's called The Cynical Data Guy. And it's because we have a special guest who is
the Cynical Data Guy. Matt Kelliher-Gibson is joining John and I. So Cynical Data Guy,
welcome to the show.
Thanks for having me.
I'm so excited. I can't wait to dig into it. But we do need to give the listeners some context here. So how did Cynical Data Guy come about? Well, first of all, John's data consulting practice is called Agreeable Data. So we kind of jokingly call him, you know, the Agreeable Data Guy. stack from deep in the bowels of corporate America doing all sorts of data stuff. And you have tons
of war stories and tons of scars from trying to do data at very large organizations. And so when
we talk about topics, you often have, let's say, a salty view that's been dashed against the rocks of corporate reality.
I prefer to think of it as a realistic view,
but others seem to disagree.
Well, of course, when John and Matt and I chat in the office
about topics or LinkedIn posts that we see,
we just enjoyed Matt's hot takes so
much that we started jokingly calling him the cynical data guy. And then of course, one day
we stopped and said, this has to go on the podcast. So we got to do it. We got to do it. Okay. So here
we are. Here's the format. So I'm going to act as the moderator. We have the cynical data guy. We have the agreeable data guy.
And I just pulled some LinkedIn posts that I think are interesting topics to discuss.
Okay.
So Matt, I'm going to just present some of that.
I'm going to read some snippets here.
Okay.
Some of them I'm going to keep anonymous.
Others I won't for obvious reasons.
Okay.
Are we ready?
We're going to do, we're going to try and do three lightning rounds here.
So if I interrupt you and move on, sorry.
Not sorry, I guess.
Sorry, not sorry.
Definitely not sorry.
Let's do it.
I'm definitely not sorry, and neither is Brooks.
Okay, I'm going to read.
Here's the first one.
You ready?
Go. Yeah.
Metadata will have a profound
impact on the success of modern LLMs. With better assets, developers can leverage APIs to access and
utilize their organization's data more efficiently in their applications, enhancing functionality
and capabilities and streamlining the development of AI models. Okay. Metadata and LLMs.
I'm sure that's true.
Once CEOs actually start caring about metadata.
Do CTOs even care about metadata?
I don't think I've ever met a person
who cares about metadata
like in an actual corporate.
So is this just trying to sell software?
Is this a pipe dream?
I mean, everyone's trying to sell software.
Is it a pipe dream? I don't know.
They'll probably say, hey, I can do it for
them, but I don't know that's going to happen.
What about the data quality aspect?
Yeah, so how much
data quality do you see corporate America
really investing in?
Not a lot.
In your job, in your most recent job prior to RudderStack, at a publicly traded company, did you ever use the word metadata in a meeting?
Like with a business user or ever?
Ever.
Might have come up two, three times in a year and a half.
Okay.
Truthfully, yeah, it doesn't come up that much.
Yeah, yeah, yeah.
I remember the first time I think I heard metadata
and I was like, what is that?
John, give us an agreeable take here.
I think that is the right answer.
But what percentage of companies have metadata in place
that AI would be useful?
Like today?
0.1 maybe, right?
I want to be there when you meet with another company
and say, let me tell you about metadata.
Metadata.
I want to see how quickly their eyes glaze over.
I think it's an easier sell than data quality.
I think that's my take on it, right?
Because data quality was a thing.
It's like, oh yeah, data quality, gotta have better quality data.
Data you can trust.
That's still a thing.
But now it's like, you know, really
it's about the metadata. And people
don't understand metadata more than they
don't understand data quality.
So does that mean then metadata
is just going to become like, we're just going to shove
all of data quality into metadata?
It's like, you know, it's about
the metadata. Like, making
sure your pipelines don't break.
No, I think people... This is going to sound cynical.
You've corrupted someone in the first, like, minutes of the show.
No, it's been longer. It's just been offline.
Right. Yeah.
No, I think when people sell this, metadata has the benefit of being less clear, right? Because with data quality, it's like, oh yeah, data's wrong, data's right.
Metadata is like, people will use it
and not know what it means.
I think everyone's going to think
you're talking about the metaverse.
Is that like data in the metaverse?
Or meta, right?
The company, yeah.
All right.
Okay, so man, co-host corruption in six minutes.
That's a record.
That is a record.
Technically, John and I have known each
other for like eight years.
Yeah. So I've been working a long
time. Oh, that is true. That is very true.
Yeah, he's been
undermining it a long time.
Okay, so the future,
the very future of LLMs
is based on
an ambiguous concept that no one cares
about and that nobody actually has.
What I'm getting from this is the key to LLMs and selling it is picking terms that nobody understands what they mean.
That's probably true for a lot.
Yeah, it is true. But my agreeable take on it is that there is some progress, I think, in the BI space, with people doing some neat stuff with LLMs. Like Zenlytic, for example, has a pretty neat semantic layer that you can put on top of your data, and then the LLM interacts with the semantic layer, which does work better than, like, hey, generate SQL from GPT.
That is the right answer. People are doing it.
But I think there's a lot of overhead.
And in some ways,
if you're a small company with not that much data
and the semantic layer is like,
well, all my data came from Shopify and my ERP,
somebody could do that for you, right?
And you could have something reasonably usable.
I think where you get into like corporate,
it's like, man, this is just like
an impossible amount of work.
So scalability, like everything.
Yeah, yeah, yeah, yeah.
But you can have those early like proofs of like,
hey, it actually worked for this like smaller company.
And then people will extrapolate,
oh, like, yeah, it's going to take over the world.
That'll be the dream of every... the dream of, look, we did it with one data source.
How hard could it be with one?
Right, right.
Okay, okay. Ding!
Moving on to the next round here.
I can't wait for this one.
I'm just going to read a snippet here.
This author will also remain anonymous.
I'm going to choose a couple of pieces here.
Data-driven people
do not equal people looking at dashboards.
You don't achieve data centricity through the wide adoption of a BI tool.
Skipping down a little bit.
While access to business data is a crucial first step in achieving a data-centric outcome,
it is only a small and early step in the overall journey.
And then where's the zinger here?
True data centricity or data drivenness is achieved
when there are tangible commercial and operational outcomes
stemming from the use of data at all decision-making levels in the business.
Are you using data to effectively generate more value for the business?
Are the top leaders openly asking, what does the data say?
Or have we tested this assumption yet?
Okay.
So data-driven culture is a myth?
No, they just don't.
They don't care.
Everyone says they want it,
but when it really comes down to it,
you're fighting against
usually a VP or someone
who spent 30 years
fighting their way to the top of that corporate structure
and they haven't been using data.
The idea that they're going to suddenly care
about what the data says now, kind of naive.
John?
Yeah, I don't even respond to that.
He stunned you.
The correct response is, why, yes, that's correct.
What comes to mind is the whole, like, data-informed thing.
Like, it was like, we need to be data-driven.
And then it's like, let's pump the brakes a little bit
and, like, go back to, like, data-informed
because there's this, like, space for, like, intuition and blah, blah, blah.
Which I got to hear your take on that before I move on.
But what's your take on the phrasing?
Well, I mean, data-informed?
What does that mean?
I gave you data.
Yep.
I'm doing whatever I want anyways.
Okay.
You were informed.
Right.
Right. So I think the take on it for me is data is absolutely helpful from a forensic standpoint. Like, I need to find out what happened: super helpful to have. It's helpful from a behavior standpoint, of like, almost all of us have, like, watches now that track steps, and if you track steps, like, do you walk more? Yeah, like, you do, if you care about it, right? So I think that type of data is super useful, and, like, that is a form of data-driven, of like, we have a goal, like we're not, like, really clearly tracking this activity and we need to get here every single day. Like, that's a wonderful use of data. Like, the stuff beyond that, where it's predictive or it's, you know, like, recommendations, like, where you get into the AI/ML stuff, I think mileage may vary.
Looking back on it, part of it, I've been in plenty of meetings
where it was essentially pick your metric.
Oh, we just did a big campaign. How did it do?
I don't know. Pick the top three metrics that show the best results. That's what we're
now saying.
And the data says, which is amazing.
Look at the number of views.
I mean, I've been in situations where, you know, literally,
like you're looking at stuff like year over year or whatever,
and it's down.
It's like, well, it's down, but it could have been down by more.
So, you know, really, this is a success
and we should roll this out everywhere.
And then that argument won.
So is it, okay, so the, okay,
let's talk about the executive who battled their,
you know, 30 years through the valley
to emerge on the other side
and they don't use any data.
Why aren't they using data? I mean, it probably
started out partially just because there wasn't as much available
when they were going up, right? Yeah. I mean,
this isn't like a, oh,
they just have always been
data's for nerds. Yeah.
I mean, they most
likely had to do stuff without it.
They either didn't know it was available or it wasn't available.
Yeah. And they had to make decisions. And one of the things we do as we're successful is we reinforce
and say, well, this is what got me here. So when you're then going to a person and saying,
hey, I know your gut or whatever you've been doing or however you've been going about it
has been working for the last 20 years, but I have some numbers that say you should do the opposite. You're fighting against nature.
You're not going to win.
One other thing that I've noticed is that a lot of really good executives,
if you break a business down into its most basic building blocks, there aren't a lot of numbers
that actually drive the business forward.
And there really are only a couple
that are mission critical
to move in the right direction
from the executive standpoint.
Now, there's, of course, like a ton of data
and a ton of stuff that like ladders up to that.
But I think that there can often be this,
you know, everyone needs to be really data-driven,
meaning there needs to be this mass democratized access to easily drill down reports and all that
sort of stuff. When in reality, that person's probably been successful because they know which
two numbers matter and they push aside literally everything except for the stuff that moves those numbers in
the right direction.
Right.
Yeah.
I think also it's one of those if you are in a position where, you know, you're in a
company and like, just to be honest, the VPs are probably only going to be on your side
if the numbers agree with what they want to do.
Your best hope long term is to start going at people who are still early in their managerial
career, where they're still forming these habits and what they trust, and work with them. They're
more likely to be open, and they're more likely to work with you and see opportunities and ways
they can make better decisions with that.
I mean, it's a little bit of like,
there's also probably a chance they're going to move on to another company in three years,
but it's still a better approach
that you're going to have
than trying to really convince that 65-year-old CEO,
you really got to trust my numbers right here.
Yeah, I think tying things to financials
is like the best way
to be data-driven
in most companies
because the numbers that matter
are like profit and loss.
Like if you're a VP over something,
like it's whatever your profit
and loss is going to show up
at the end of the month,
at the end of the quarter.
Yep.
So if you can say like,
hey, these are drivers that impact P&L, then, like, that's, I think, a conversation you can have and get a VP on your side, of like, oh, okay, cool, yeah, we should work on this, we should track this. But what I think Matt's referring to, which, yeah, I've sat in those meetings too, where it's like, to pick on marketing, marketing's not doing well again this quarter, and they just rotate through vanity metric after vanity metric, switching out views and sessions and ROAS: high ROAS on campaigns that were, like, a hundred dollars, and high views on campaigns with awful ROAS. Just, like, the
shuffle.
Totally.
Totally.
Just say it, Matt.
You shied away from the mic.
Well, I will say it also matters when you catch them, because I've literally sat in meetings where we were giving a very financially based... it had to do with the pricing thing. And the meeting started with the CEO saying, well, you know, we've got to do whatever the data tells us. We showed them why raising prices was not going to be a good idea, and literally the decision was, well, but we put it in the budget at the beginning of the year. So if you don't catch it at the right time, it's like, yeah, but the budget says that.
Right. Yeah.
You're not going to hit those numbers. Yeah, but that's what the budget is. I mean, we've got to go with the budget.
Yeah. It has a very, like, you know, I don't make the rules, I just think them up and write them down.
Yeah, I agree, timing matters.
Yeah, yeah. Okay, lightning round three. Are we ready?
Go for it. Yeah.
Okay, I'm going to read this one. Actually, this is a great... this is great. So I'm going to mention,
so this is Adam Lenning who's a data platform engineer at Ben Labs. And I'm just going to read this. It's
kind of a short post, but it's great. And it ends on my question for the cynical data guy.
Ever heard of a tar pit idea? Basically, a tar pit idea is one that seems very appealing,
and many people have tried to make it work,
but ultimately no one has really achieved product market fit. Already so good, huh?
So as I've been thinking about it, I'm starting to believe that no code orchestration tools and no code ETL tools may be tar pit ideas. Tools like Fivetran, Airbyte, Gather, and literally a billion
others all claim to handle moving data from A to B with low/no code, and they work in 90% of use cases.
The issue lies in the last 10%, which will almost always need a code solution.
While these tools may be useful, if we always need two plus tools to get data into our warehouses,
are they really worth it?
I think many people would argue yes. But what's your take, Cynical Data Guy? Is that a tar pit idea?
Yes. Yeah, I mean, yeah, and I think there's a lot of those out there. I mean, you know, anything text too, I feel like a lot of times it's just got that siren call for a lot of technology people, and when you actually sit there, you go, who actually cares about this? So yeah, there's a lot of those out there, and they just get recycled over and over again.
What... break down the no... are you skeptical of no-code? I mean, you've built tons of data pipelines in your career, but you've also interacted with, you know, non-technical users
or semi-technical users. Is the no-code thing a myth?
Yeah.
I mean, let's be honest.
When you're like, well, we've got non-technical users and they can do this.
Do you really want non-technical users building this stuff with a bunch of building blocks?
I don't think you actually want that most of the time.
Eventually, you need someone with more expertise.
I mean, you know, it has the feeling of basically saying,
hey, we're going to build data like McDonald's,
where we're going to just have a handbook
and anyone can cook a burger then.
I just don't think that works.
Agreeable data guy?
On the no code side I
really like it
or like embedded
analyst like I can think back
to a role I had where we had
an analyst that sat like on the floor
with ops like was there
probably at the company like seven years
and like
did so much really incredible
stuff with, I'd call it, probably low code would be the right thing.
And by the end, he could kind of code.
Did some really incredible operational stuff.
From an IT and governance and quality,
well, quality was decent,
but IT governance perspective,
scalability, I mean, awful, right?
It just doesn't work.
But from a business knowledge capture
to something IT people could take
and then go do it the right way and scale and stuff like I think that's a great use case.
And what companies end up doing is they let that get out of control.
They hire a bunch of analysts and they all have their own things and like that's where it becomes like a big problem.
Also there's a difference between no code and low code.
Low code gives you some ability in there.
No code is like just trust us. It's all gonna
work. That's a problem.
Which I would argue there are actually very few
no code solutions out there.
Alteryx is a great example
of like it has a GUI
but that is not a no code solution.
They've got all sorts of little
spots.
But to go back to what you said, it comes down
also to your discipline with it, right?
So as you said, if you have one person who's really there and they can do these things and then it gets handed off, that's great. The way this stuff typically ends up happening, and the way it gets sold to people to a certain extent, is, well, you can give this to everybody, you can get it everywhere. And now, oh, and we'll sell you the tool that will help orchestrate, collect all of these, which are all just out of control, and they're just duplicates of the same four things with slight changes in them. And it just becomes this mess that then they try to hand off to data or IT, like, you can just fix that.
I think specifically in the orchestration space, I think Fivetran does a really nice job of no code.
They do a really nice job.
They have the ability to do low code.
Well, it's not even low code.
They have the ability to do custom connectors
in their platform.
I think they're a really good example of, if you're...
again, things break down at scale for them, mainly at cost, right? Like, it's just very expensive to run them at scale. But for small companies, like, they do a nice job of connectors that a lot of these small companies need. They're all together. Like, yeah, it can get a little expensive, but it really is pretty much no code.
And I think it's kind of like know your level, too. It's perfectly fine if you're
a small company or a startup and you just
need these tools in order to get it.
It's when you insist on trying to use
them as you get bigger and more complex.
Or the other one is, well,
we bought it for one small part
of the organization. I think
that's a big trap you can get into.
We just need it right here,
but eventually it's got to come outside if it's going to be there. A lot of enterprise stuff,
there's a reason they've hired teams for a lot of this. They have a lot of edge cases. It's got to
be fit to their exact specifications. You're just not going to get that with no-code stuff.
Yeah. I agree with Adam's post that, like many topics,
it's very dependent on the context, but completely no code.
I agree. I don't think it's realistic for this stuff.
But the one thing I would say is if we always need two plus tools
to get our data into warehouses,
are they really worth it?
But the reality in the enterprise
is you're going to have far more than two different people.
It's not like you can,
I mean, I think that is actually one of the fallacies
of like you should have one single tool that handles.
I just don't, I mean, is it possible?
Yes.
Is it reality?
I don't think so.
Yeah, I think it's less like, oh, we have to have one tool that does this, and it's more the idea of, like, people are going to want to use this stuff in inappropriate ways, and can you contain that? Right? You're like, well, we just use it for this. Really? Really? I mean, so, like, the biggest example of this to me
is every, like, Jupyter notebook-ish
type thing where they're like,
oh well but you're not supposed to use this
for production I'm like really then why
does it say schedule notebook
like oh it's really good in this
isolated situation but we
made it so that you can use it hog walk
it just doesn't work.
It's like, this is just for development.
We'll put no restrictions on it to use it in production.
All right.
Well, I think we're going to schedule this episode
to go into production and end on a high note.
Cynical Data Guy, thanks for joining us
and we'll see you again in a couple of weeks.
Great to be here.
The Data Stack Show is brought to you by RudderStack,
the warehouse-native customer data platform.
RudderStack is purpose-built to help data teams
turn customer data into competitive advantage.
Learn more at rudderstack.com.