The Data Stack Show - Shop Talk: Is It Possible for Excel To Die?
Episode Date: October 10, 2022
In this bonus episode, Eric and Kostas talk shop around the wide world of data. ...
Transcript
Welcome to the Data Stack Show Shop Talk.
I believe this is our third Shop Talk, Kostas, if I'm not wrong.
And I love this format because so far we actually have not done any pre-work
and we each bring a question that the other person has had no time to think about, which makes for a really good conversation.
Yeah, it's fun.
I really enjoy it too.
And yeah, we'll continue doing it.
Oh yeah.
Okay.
It's my week.
So here's my question for you.
What do you think it would take, in terms of data tooling (and "tooling" probably isn't the best word), for there to be a world where Excel, or Google Sheets, largely goes away?
Because it's like the most widely used data application in the world.
Yeah, I think... I don't think that it will ever go away.
I think that's like a...
Why should we do that?
Why...
I'm not saying we should.
I'm just saying, what do you think it would take?
What would that world look like?
Yeah.
I don't think that it has to do with tooling, to be honest.
I think it has more to do with, let's say, access to technology and
how easy it is to use technology.
The way I think about it, you can think of it as a pyramid.
And it's not just about Excel.
You've probably seen people debating:
is Python what we need for data, or is it SQL?
Why don't we do everything with SQL?
Or do we also need Python?
And I think that's a false question. We shouldn't be asking that. What is actually happening
out there is that you have this pyramid of needs. Remember Maslow's hierarchy, or whatever
it's called? Think of it in a similar way. At the base, you have Excel.
And that has to do with how accessible Excel is.
People out there, pretty much everyone can use Excel.
You just need to know how to type on a computer to use it.
And then on top of that, you have SQL,
which is a smaller group of people that can use that.
But still, a lot of people out there can use that.
And then you have a smaller group of people that can use Python.
Now, these people also have different needs around data.
And that's what is important.
The person at the bottom who is doing
Excel stuff will never need to go and do things with Python. What are they
going to do? Use Python as a calculator? What's the point?
I think people do more in Excel than just use it as a calculator.
Yeah. Okay. It's an extreme example of what I mean.
Yeah, yeah.
Okay.
But let's say, for example, I'm using Excel
to do my budget for the month, right?
Yep.
Are you going to do that like with Python?
You can.
But are you going to do it?
Like, I know Python.
Am I going to use Python?
No.
Why?
Because why would I do that?
Like, it's not built for that stuff.
Right?
So what I'm trying to say here is that there's such a big
diversity of needs around data.
Yep.
I don't think that like the whole population of planet Earth is going to
turn into data engineers anytime soon.
So I don't see why they would need to use something like Python to do that.
Excel is perfect for this.
We have, I don't know, two or three generations of people right
now that have been trained on this.
Like it's almost like intuition to use it.
So don't think of Excel as the product itself.
Think of the spreadsheet model of interacting with data,
which is part of the way we grow up now
and the way we learn to deal with numbers
and do things with data.
And I don't see any reason for this to go away.
Like it's a great tool.
Totally.
Kostas' hierarchy of data needs.
Yes.
Now, I do think...
I like that.
It's a good idea.
Excel, SQL, Python.
Kostas' hierarchy of data needs.
That's sweet.
Yeah.
Now, okay, so you bring up an interesting distinction.
And yes, you're totally right.
Like, it is an unfair question.
Just like Python versus SQL is an unfair question, right?
It unnecessarily oversimplifies an issue
and creates a comparison that doesn't reflect the reality
of what's happening out there on the ground.
But I will say the example you gave, I think, is actually interesting.
So you gave the example of building a basic budget in a spreadsheet, right? I do think there is a
high possibility that the complex use cases that spreadsheets, or Excel specifically, are used for
will be displaced, and this is getting into semantics. I'll caveat that by saying I don't think the
interface for those more complex use cases will necessarily be replaced, but I do think the entire
infrastructure under the hood will, in my opinion, likely be displaced. So I'll give you an
example. Okay, a personal budget, totally. People even use spreadsheets for planning projects
or whatever, right? But think about marketing, like my budget for the marketing activities,
right? I always start by modeling that out in a basic spreadsheet. It's really simple, right?
You have 12 months and the line items and all that sort of stuff. But once you start to get into
more complex equations, and you start to involve different types of data, and you're referencing
across multiple tabs, and then you get into VLOOKUPs and macros... I mean, people literally build
software in Excel, which is totally wild. I think some of those more advanced use cases... and actually,
I can't remember the name of the company,
but I think there are some companies
that literally provide a spreadsheet interface
that sits on top of an actual database, right?
Which is really interesting.
So I do think those use cases... because, you know,
there's the power user set, which is in between; there's another layer.
We'll call it Eric's layer in Kostas' hierarchy of data needs.
Excel, Eric's layer, SQL, Python.
Yeah. Because really, under the hood, modern databases and tooling, and even interfaces that can
generate complex SQL, are becoming more and more common, right? And there are more and more patterns
around that, which I think is super exciting, because you can take an Excel power user and essentially
give them a familiar interface on top of a wildly powerful, potentially infinitely scalable infrastructure
that has all sorts of different types of data. Right. And then you don't have to worry about file sizes.
I think that's super interesting.
Oh yeah. Yeah. I need to clarify something here.
When I'm talking about Excel, Python, and SQL, I just consider them APIs:
the API through which a human interacts with data. Whatever implementation
happens behind the scenes is a completely different story, right?
It's the same way that you can have, let's say, Spark, and you can use
Spark SQL, but at the same time you can also use PySpark or even the pandas API on Spark.
The processing engine behind them is the same, okay? The data that you can access
is the same, but the API you interact with is different, exactly because the people involved are different, and
the interfaces they have learned, which are more intuitive
and better for their use cases, are different, right?
So yeah, you could have, I don't know, a
supercomputer running behind Excel, whatever.
Right?
Yeah.
But what is important is the interface and the mental model that people
use to conceptualize the data for each one of these three different
interfaces. So I'm talking about them just as interfaces; as for the rest,
yeah, I totally agree with you.
We could see, I don't know, Sheets on top of Snowflake
or something like that.
Yeah.
Yeah.
Yeah.
Super interesting.
No, that doesn't surprise me, but it's really helpful:
the mental model of thinking about those as just APIs
with a different interface on top.
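As an illustration only (not something from the episode), here is a minimal Python sketch of the "same data, different API" idea: hypothetical budget line items are totaled per category once through a SQL interface (the standard library's sqlite3, standing in for an engine like Spark SQL) and once in plain Python. The table name, categories, and amounts are made up for the example.

```python
import sqlite3

# Hypothetical monthly budget line items (category, amount); values are made up.
ITEMS = [("groceries", 450.0), ("utilities", 120.0), ("groceries", 80.0), ("transport", 60.0)]


def totals_via_sql(items):
    """Sum amounts per category through a SQL interface (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE budget (category TEXT, amount REAL)")
        conn.executemany("INSERT INTO budget VALUES (?, ?)", items)
        rows = conn.execute(
            "SELECT category, SUM(amount) FROM budget GROUP BY category ORDER BY category"
        ).fetchall()
        return dict(rows)
    finally:
        conn.close()


def totals_via_python(items):
    """Sum amounts per category with a plain Python dictionary."""
    totals = {}
    for category, amount in items:
        totals[category] = totals.get(category, 0.0) + amount
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    # Same data, two interfaces, one answer.
    print(totals_via_sql(ITEMS))
    print(totals_via_python(ITEMS))
```

Both functions print `{'groceries': 530.0, 'transport': 60.0, 'utilities': 120.0}`; only the interface used to express the computation differs.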
What?
Okay.
Another question.
So, Google Sheets is obviously a fairly pervasive spreadsheet interface, right?
Tons and tons of people use it. I don't have the numbers, but this is Shop Talk, so we don't have to actually be accurate. But my assumption would be that Microsoft
Excel, as packaged software that runs on your hard drive rather than in a browser, outstrips
Google Sheets usage by a massive margin.
Mm-hmm.
Do you think that... Well, and actually, this is interesting to think about.
I was thinking about your budget.
So when you think about Google Sheets and having cloud compute power behind a spreadsheet,
God, that sounded so buzzwordy.
Cloud compute power.
Your spreadsheet with the power of cloud compute.
Am I a product marketer working in data?
I think you are, yeah.
I'm waiting for the moment that you are going to use the term hyperscaler.
Oh, man.
Multi-node horizontal scaling?
Can you imagine Google Sheets, but with multi-node horizontal scaling?
Oh, that would be so good.
Okay, so one interesting thing, if you take the budget example, right:
if you adopt the paradigm of BigQuery ML, which runs on BigQuery and enables non-data scientists to do very data-scientist-y things using simple SQL, it's not a huge step to think about that same model being applied to a spreadsheet. If you have something standardized
that you're trying to do in a spreadsheet, like a budget or something
of that nature, you could conceivably have a spreadsheet that
uses machine learning to help you do your task. Optimize your budget, right?
You have a template in your spreadsheet, and machine learning can actually
help you optimize your budget. It's kind of frightening to think about Google having access
to all that data, but do you think that machine-learning-enabled spreadsheet usage
(I don't even know if "assistance" is the right word)
could drive a lot of the offline packaged
software running on your hard drive online
in order to access that type of thing?
I mean, I always have the impression
that Wall Street
runs on a spreadsheet.
So, no, no, no, seriously.
I think the amount of, let's say, modeling and processing that
you can do on a spreadsheet is crazy. I mean, okay, we say ML
and we think that ML is
image recognition or something like that, but no, like 90% of ML
use cases are statistical models, and the financial
sector has been doing that stuff forever, right?
And they are doing it in Excel.
Excel is a very expressive system.
In the end, there's no difference in what you can do
between SQL, Excel, and Python.
Okay.
They are equivalent.
It's not like with one of them you can do something more than with the other.
The question is how easy it is to do it,
or how well it works with the rest of the tools that you have.
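To make the point about statistical models concrete, here is a minimal sketch, assuming a simple ordinary least-squares trend line is a fair stand-in for the kind of model meant here. The function computes the same slope and intercept that Excel's SLOPE and INTERCEPT worksheet functions return, written in plain Python; the function name and the month/spend figures are hypothetical.

```python
from typing import Sequence, Tuple


def least_squares(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept), the values Excel's SLOPE and INTERCEPT
    functions give for the same known_y's and known_x's.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and the cross-deviation sum of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical monthly spend for months 1 through 4.
    print(least_squares([1, 2, 3, 4], [100, 110, 120, 130]))
```

For these made-up figures the call returns `(10.0, 90.0)`: spend grows by 10 per month from a baseline of 90, the same answer a spreadsheet user would get with two built-in functions.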
Whether it's capable of hyperscale.
Of course.
What I'm trying to say here is that, yeah, we could see that.
Probably, if you go to the App Store for Google Sheets,
there might be tools that optimize your budget.
I don't know.
Maybe.
Right?
I think what is important here is that we need to deeply understand
why we end up having different interfaces
and what the needs are of the people behind each one of these interfaces.
And that's what will guide us in building, let's say, the right tooling or coming up with
the right business opportunities and all that stuff. Because yeah, if you ask me,
do you think it's possible
to use Google Sheets as an interface
for ML training?
Maybe it is.
But like why?
Like you would be crazy to try and build that stuff
because no one who is actually building
and training models will ever care about that, right?
Yeah.
And the opposite.
Can I come up with a Python library that
does budgeting for my household?
Yeah.
But, I don't know, do you want to go to your father and give him
a Python library to install with pip so he can budget what to buy from Costco
next week?
I don't think so.
I mean, you can try.
Right?
I'm just thinking about sitting down
to work on the budget with my wife,
and I'm like, it's pip install and make a meal.
Right?
And really, we just need to acknowledge together that we need more milk, without
having to run notebooks.
I love it.
Yeah.
Like why not?
Anyway, seriously now, I think these interfaces are a very
interesting window into the needs of the people behind them.
And humanity, let's say, has matured enough to create
clear boundaries between different groups of people based on the needs that they have.
And that's where the opportunities for productization are, right?
If someone wants to build a business or figure out a product,
that's where the opportunity is: go figure out what is missing there and build it.
I agree.
All right.
If anyone listening to this has a great idea based on this, then we want at least
a sliver of the equity, since we helped encourage it.
Ah, yeah.
And please, if you mention
the hierarchy of data needs...
Royalty.
...give a reference to the Data Stack Show, okay?
Yes, royalties.
Kostas needs to work that into his budget.
Yeah, let's add some virality to this show.
Come on, let's do that.
All right.
Well, thank you for joining us on Shop Talk.
We'll have more good banter for you coming up in future episodes. Catch you on the next one.