The Data Stack Show - Shop Talk: Is It Possible for Excel To Die?

Episode Date: October 10, 2022

In this bonus episode, Eric and Kostas talk shop around the wide world of data. ...

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show Shop Talk. I believe this is our third Shop Talk, Kostas, if I'm not wrong. And I love this format because so far we actually have not done any pre-work and we each bring a question that the other person has had no time to think about, which makes for a really good conversation. Yeah, it's fun. I really enjoy it too. And yeah, we'll continue doing it. Oh yeah.
Starting point is 00:00:35 Okay. It's my week. So here's my question for you. What do you think it would take in terms of data tooling, let's say (tooling is probably not the best word), for there to be a world where Excel largely goes away? Excel, Google Sheets. Because it's like the most widely used data application in the world. Yeah, I think... I don't think that it will ever go away. I think that's like a... Why should we do that?
Starting point is 00:01:14 Why... I'm not saying we should. I'm just saying, what do you think it would take? What would that world look like? Yeah. I don't think that it has to do with tooling, to be honest. I think it has more to do with, let's say, access to technology and like how easy it is like to use technology.
Starting point is 00:01:39 The way that I think about it, you can think of it as a pyramid. And it's not just about Excel. You probably have, like, okay, people are debating: is Python what we need for data, or is it SQL? And why don't we do everything with SQL? And we also need Python. And I think that's like a false question. We shouldn't be asking that. What is actually happening out there is that you have this pyramid of needs, remember Maslow, whatever, how it's
Starting point is 00:02:18 called this thing, think of it in a similar way. And you have at the base, you have Excel. And that has to do with how accessible Excel is. People out there, pretty much everyone can use Excel. You just need to know how to type in a computer to use it. And then on top of that, you have SQL, which is a smaller group of people that can use that. But still, a lot of people out there can use that. And then you have a smaller group of people that can use Python.
Starting point is 00:02:56 Now, these people also have different needs around data. And that's what is important. The person at the bottom who is doing Excel stuff, they will never need to go and do things with Python. Like, what are they going to do? Use Python as a calculator? Like, what's the point? I think people do more in Excel than just use it as a calculator. Yeah. Okay. It's an extreme example of what I mean. Yeah, yeah. Okay.
Starting point is 00:03:28 But like, let's say you are doing, like I'm using Excel, for example, okay, to do my budget for the month, right? Yep. Are you going to do that like with Python? You can. But are you going to do it? Like, I know Python. Am I going to use Python?
Starting point is 00:03:43 No. Why? Because like, why I would do that? Like, it's not built for that stuff. Right? So what I'm trying to say here is that like, there's such like a big diversity of needs around data. Yep.
Starting point is 00:03:55 I don't think that like the whole population of planet Earth is going to turn into data engineers anytime soon. So I don't see why they would need to use something like Python to do that. Excel is perfect for this. We have like, I don't know, like two, three generations of people right now that they have been trained into this. Like it's almost like intuition to use it. So don't think about Excel as like the product itself.
Starting point is 00:04:23 Think of Excel as the spreadsheet model of interacting with data, which is part of the way that we grow up now, the way that we learn how to deal with numbers and how to do things with data. And I don't see any reason for this to go away. It's a great tool. Totally. Kostas' hierarchy of data needs.
Starting point is 00:04:44 Yes. Now, I do think... I like that. It's a good idea. Excel, SQL, Python. Kostas' hierarchy of data needs. That's sweet. Yeah.
Starting point is 00:04:57 Now, okay, so you bring up an interesting distinction. And yes, you're totally right. Like, it is an unfair question. Just like Python versus SQL is an unfair question, right? It doesn't, it unnecessarily oversimplifies an issue and creates a comparison that actually doesn't help answer, you know, it doesn't reflect the reality of, you know, what's happening out there on the ground.
Starting point is 00:05:21 But I will say the example you gave, I think, is actually interesting. So you gave the example of like, I'm building a budget, you know, like a basic budget in a spreadsheet, right? I do think that there is a high possibility that the complex use cases that spreadsheets, and this is getting into semantics, but that spreadsheets or Excel specifically are used for will be displaced. And I will caveat that by saying, I don't think the interface for those more complex use cases will be replaced necessarily. But I do think the entire infrastructure under the hood will, in my opinion, likely be displaced. So like, I'll give you an
Starting point is 00:06:13 example. Okay, like a personal budget, totally. Like, people even use spreadsheets for planning projects or whatever, right? If I think about marketing, like my budget for the marketing activities, right? I always start by modeling that out in a basic spreadsheet. It's really simple, right? It's like you have 12 months and, you know, the line items and all that sort of stuff. But once you start to get into more complex equations, and you start to involve additional different types of data, and you're referencing across multiple tabs, and then you get into, you know, obviously VLOOKUPs, macros. I mean, people literally build software in Excel, which
Starting point is 00:07:00 is totally wild. I think some of those more advanced use cases, and I can't remember the name of the company, but I think there are some companies that literally just provide a spreadsheet interface that sits on top of an actual database, right? Which is really interesting. So I do think that those use cases,
Starting point is 00:07:21 because, you know, it's like the power user set, which is like in between, there's like another layer. We'll call it Eric's layer in Kostas' hierarchy of data needs. Excel, Eric's layer, SQL, Python. Yeah. Because really, under the hood, modern databases and tooling, I mean, whatever, even interfaces that can generate complex SQL are becoming more and more common, right? And there are more and more patterns around that, which I think is super exciting because you can take an Excel power user and essentially give them a familiar interface on top of a wildly powerful, sort of potentially infinitely scalable infrastructure that has all sorts of different
Starting point is 00:08:14 types of data. Right. And then you don't have to worry about file sizes. I mean, I think that's super interesting. Oh yeah. Yeah. I need to clarify something here. When I'm talking about Excel, Python, SQL, I just consider them like the APIs. Like the API that a human uses to interact with data. What implementation happens behind the scenes is a completely different story, right? In the same way that you can have, let's say, Spark, and you can use, let's say, Spark SQL, but at the same time you can also use PySpark or even like PySpark Pandas. The processing engine behind is the same, okay? The data that you can access
Starting point is 00:09:00 is the same, but the API that you have to interact with is different, exactly because the people involved are different, and the interfaces that they have learned, that are more intuitive and better for their use cases, are different, right? So yeah, you can have behind Excel, I don't know, like a supercomputer running, whatever. Right? Yeah. But what is important is the interface and the mental model that people
Starting point is 00:09:30 use to conceptualize the data for each one of these three different, let's say, interfaces. So think of it just as an interface. The rest, yeah, I totally agree with you. We can see, I don't know, Sheets on top of Snowflake or something like that. Yeah. Yeah. Yeah.
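[Editor's sketch] The point above, that SQL and Python are different interfaces onto the same data, can be illustrated with a small example. This is not Spark and not anything shown on the episode: it uses Python's built-in sqlite3 module as a stand-in SQL interface and a plain loop as the Python interface, over the same made-up household expenses. The names EXPENSES, totals_with_sql, and totals_with_python are hypothetical and exist only for this illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical household expenses: (month, category, amount in whole dollars).
EXPENSES = [
    ("2022-10", "groceries", 420),
    ("2022-10", "rent", 1500),
    ("2022-10", "groceries", 85),
    ("2022-11", "rent", 1500),
    ("2022-11", "groceries", 390),
]


def totals_with_sql(rows):
    """Total spend per category, expressed through a SQL interface."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE expenses (month TEXT, category TEXT, amount INTEGER)"
        )
        conn.executemany("INSERT INTO expenses VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT category, SUM(amount) FROM expenses GROUP BY category"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


def totals_with_python(rows):
    """The same totals, expressed through a plain Python loop."""
    totals = defaultdict(int)
    for _month, category, amount in rows:
        totals[category] += amount
    return dict(totals)


if __name__ == "__main__":
    print(totals_with_sql(EXPENSES))
    print(totals_with_python(EXPENSES))
```

Both functions return the same per-category totals. What differs is the mental model the person writing them needs, which is the distinction being drawn in the conversation; a spreadsheet user would get the same numbers from a pivot table or a SUMIF.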
Starting point is 00:09:51 Super interesting. No, that doesn't surprise me, but it's really helpful. The mental model of thinking about those as actually just APIs with a different interface on top. Okay. Another question. So I think Google Sheets is obviously a fairly pervasive spreadsheet interface, right?
Starting point is 00:10:16 Tons and tons of people use it, and I don't have the numbers, but this is Shop Talk, so we don't have to actually be accurate. But I'd assume actual Excel, Microsoft Excel, as packaged software that runs on your hard drive, not in a browser, surely outstrips Google Sheets usage by a massive margin. Mm-hmm. Do you think that... Well, and actually, this is interesting to think about. I was thinking about your budget. So when you think about Google Sheets and having cloud compute power behind a spreadsheet, God, that sounded so buzzwordy.
Starting point is 00:11:05 Cloud compute power. Your spreadsheet with the power of cloud compute. Am I a product marketer working in data? I think you are, yeah. I'm waiting for the moment that you are going to use the term hyperscaler. Oh, man. Multi-node horizontal scaling? Can you imagine Google Sheets, but with multi-node horizontal scaling?
Starting point is 00:11:28 Oh, that would be so good. Okay, so one interesting thing, if you take the budget example, right, is that if you basically adopt the paradigm of BigQuery ML, which runs on BigQuery and enables non-data scientists to do very data scientist-y type things using simple SQL or whatever, it's not a huge step to actually think about that same model being applied to a spreadsheet, right? If you have a standardized something that you're trying to do in a spreadsheet, like a budget or something of that nature, you could conceivably think about a spreadsheet that could essentially use machine learning to help you do your task or whatever, right? You have a template in your spreadsheet and machine learning can actually
Starting point is 00:12:30 help you optimize your budget. It's kind of frightening to think about Google having access to all that data, but do you think that something of that nature, where machine learning type... I don't even know if assistance is the right word, but machine learning enabled spreadsheet usage, could drive a lot of the offline packaged software running on your hard drive online in order to access that type of thing? I mean, I always have the impression
Starting point is 00:13:04 that Wall Street runs on a spreadsheet. So, no, no, no, seriously. I think that the amount of, let's say, modeling and processing that you can do on a spreadsheet is crazy. I mean, okay, we say the word ML and we think that ML is image recognition or something like that, but no, like 90% of ML use cases, they are statistical models. I mean, the financial
Starting point is 00:13:38 sector has been doing that stuff, like, forever, right? And they are doing it in Excel. Excel is a very expressive system. There's no difference in what you can do at the end between SQL, Excel, and Python. Okay. They are equivalent. It's not like with one of them you can do something more than the other.
Starting point is 00:13:58 The question is, how easy is it to do? Or how well does it work with the rest of the tools that you have? Whether it's capable of hyperscale. Of course. What I'm trying to say here is that, yeah, we can see that. And probably if you go to the App Store for Google Sheets, there might be tools that optimize your budget. I don't know.
Starting point is 00:14:26 Maybe. Right? I think what is important here is that we need to understand deeply why we end up having different interfaces and what are the needs of the people behind each one of these interfaces. And that's what will guide us in building, let's say, the right tooling or coming up with the right opportunities for business and all that stuff. Because yeah, if you ask me, do you think it's possible to
Starting point is 00:15:07 use Google Sheets as an interface to go and do ML training? Maybe it is. But why? You would be crazy to try and build that stuff, because no one who is actually building and training models will ever care about that, right? Yeah.
Starting point is 00:15:23 And the opposite. Can I come up with a Python library that does budgeting for my household? Yeah. But, I don't know, do you want to go to your father and give him a Python library to install with pip to go and budget what to buy from Costco next week? Like, I don't think so.
Starting point is 00:15:43 I mean, you can try. Right? I'm just thinking about sitting down to work on the budget with my wife, and I'm like, it's pip install and make a meal. Right? And really, we just need to acknowledge together that we need more milk, but we're not having to run notebooks.
Starting point is 00:16:09 I love it. Yeah. Like why not? Anyway, I think these interfaces, seriously now, I think they're a very interesting window into the needs of the people behind them. And humanity, let's say, has matured enough to start creating clear boundaries between different groups of people based on the needs that they have. And that's where the opportunities are for productization, right?
Starting point is 00:16:41 If someone wants to build a business, figure out a product, that's where the opportunity is. Go and figure out what is missing from there and build it. I agree. All right. If anyone listening to this has a great idea based on this, then we want at least a sliver of the equity since we helped encourage it. Ah, yeah. And please, if you mention
Starting point is 00:17:07 the hierarchy of data needs... Royalty. Add a reference to the Data Stack Show, okay? Yes, royalties. Kostas needs to work that into his budget. Yeah, let's put some virality to this show. Come on, let's do that. All right.
Starting point is 00:17:20 Well, thank you for joining us on Shop Talk. We'll have more good banter for you coming up in future episodes. Catch you on the next one.
