The Data Stack Show - The PRQL: Will we ever get rid of the CSV?

Episode Date: November 8, 2021

...

Transcript
Starting point is 00:00:00 Welcome to a Data Stack Show debrief. We've done a handful of these. This is where we talk about sort of the high-level topics that were discussed on a show, and Kostas and I can kind of go back and forth and maybe get a little bit more opinionated on the topics that were discussed. Okay. Here's my question for you, Kostas. We were talking about data design. I learned a ton of things, but the subject of loading CSVs came up and column naming came up. Okay. Which I think has just been the perennial problem of dealing with data since the beginning of a CSV file. But here's my question.
Starting point is 00:00:50 We kind of laugh about that because it seems archaic. I think a lot of companies are still doing this. In fact, I would say most companies are still, somewhere in their business process, loading a CSV. Yeah, of course, and they will never stop doing that. I don't think that this will ever disappear. There is a reason that we have so many CSV files out there. It's a very easy data format to generate and handle, in a way. But okay, it becomes really problematic when you load this into a database and then you start working with it
Starting point is 00:01:32 because you don't have any of the strict guarantees that you get from using a proper data management system, like a database, when you're managing your data. We always need to remember that probably the most convenient or the most well-known data manipulation system on Earth is Excel and Google Sheets, right?
Starting point is 00:01:55 And I don't think that this is something that's going to change anytime soon. And the data model behind these systems is like a CSV at the end. That's how you interoperate with these systems if you want to load and unload data. So we will keep having to do that. And okay, it's fine. I mean, I don't think it's so bad. I think there are probably worse things out there.
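To make the point about missing guarantees concrete, here is a minimal sketch of the kind of checks that end up in a load script when a spreadsheet export goes into a database. It is an illustration only: the sample data, the column names, and the `normalize` and `load_rows` helpers are all hypothetical and were not discussed on the show.

```python
import csv
import io

# Hypothetical CSV export, as it might come out of a spreadsheet.
RAW = "Customer ID, Signup Date ,Plan\n1,2021-11-08,pro\n2,,free\nx3,2021-11-09,pro\n"


def normalize(name):
    """Turn a spreadsheet-style header into a snake_case column name."""
    return "_".join(name.strip().lower().split())


def load_rows(text):
    """Parse CSV text and return (valid_rows, errors).

    A CSV carries no types or constraints, so every value arrives as a
    string and has to be checked by hand.
    """
    reader = csv.reader(io.StringIO(text))
    header = [normalize(h) for h in next(reader)]
    valid, errors = [], []
    # Data starts on line 2 of the file; line 1 is the header.
    for line_no, values in enumerate(reader, start=2):
        row = dict(zip(header, values))
        try:
            row["customer_id"] = int(row["customer_id"])
        except ValueError:
            errors.append((line_no, "customer_id is not an integer"))
            continue
        if not row["signup_date"]:
            errors.append((line_no, "signup_date is empty"))
            continue
        valid.append(row)
    return valid, errors


valid_rows, errors = load_rows(RAW)
print(valid_rows)
print(errors)
```

A database with a typed, non-null schema would reject the two bad rows at insert time. With a CSV, nothing complains until a script like this one looks.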
Starting point is 00:02:21 But it's funny, it keeps coming to my mind, this kind of pendulum behavior that we have with that stuff. So I don't know how many of our listeners remember XML, right? The XML format is the complete opposite of a CSV. Everything is so verbose. Everything has to be so strict. You have a schema, you have attributes, you have objects, you have trees. It's crazy, right? So the first attempt humans make is, okay, this is a mess, let's formalize it to death. Let's lock it down. Yeah. And it's not useful anymore. It's just so strict that it kills any kind of productivity or ability
Starting point is 00:03:13 to maintain or extend the systems. And we keep going from one extreme to the other until we find the right balance, right? Yeah. So I think that's what is very, very interesting for me when I see how technology is developed, because you can find many of these extremes with many different technologies. And of course you can find that with data, right? Another example is the NoSQL movement. We had the relational databases. We were like, no, too strict. Let's go to NoSQL. Then, oh shit, it doesn't work. Let's go back. And then we have this NewSQL thing, which is something in between. And yeah, it's a very interesting back and forth until we find, through experimentation, what works at the end. Yeah.
Starting point is 00:04:06 Okay, second question. We probably have to end on this one because these things are supposed to be five minutes. But second question. So the one would be super interesting. Yeah, I guess we run the show. We can go as long as we want, right? Debrief for an hour. The other question.
Starting point is 00:04:21 So a really interesting point was brought up about the end user interacting with the data, be that the end user of a consumer mobile app, or, I think the point that was brought up that was really interesting, thinking about your own employees as end users. Just because of the nature of humans interacting with data, different processes, all that sort of stuff, there tend to be problems that come with humans interacting with or manipulating data because it's part of their job. Right. And so the point was brought up that actually restricting that with pretty tight guardrails is empathetic, which I thought was a really interesting concept because it solves a lot of downstream problems. But I'm going to raise a counterpoint and I want your opinion. There's a trade-off. I don't disagree with locking everything down on the interface side. And goodness gracious, everyone who's still using Salesforce Classic knows that there are problems in the interfaces that employees inside of companies and users of data experience, and that those create major challenges. So I think the basic point is very valid.
Starting point is 00:05:41 But there needs to be a level of flexibility. So I'll just give you one example that came to mind when we were recording the show. Let's say you have different salespeople within an organization, right? And they're going to follow the high-level process that's prescribed and maybe even mandated by a business, but they're going to have their own individual style. And their individual style actually relates to the way that they're going to interact with data that's in Salesforce or HubSpot or whatever tool they're using. And in reality, if you let them use their own style, they will probably become more productive, which means the business makes more money. Or at least I think there's a very strong case for that. How do you find that balance, right? The end interface needs to be
Starting point is 00:06:30 locked down so that you avoid downstream problems, so that you don't have variations in process. I mean, there are a lot of things like that, but if you lock it down too hard, you actually inhibit people from doing their best and most creative work. Your thoughts? Yeah, 100%. I totally agree with you. It's very easy to end up introducing bureaucracy instead of offering guardrails, even if you're using technology, right? And I'll give you an example as a customer, okay? How this might affect you. I don't know if you have ever experienced going to a T-Mobile shop or an AT&T shop. You want to buy a new subscription or a new phone and you have this happy employee
Starting point is 00:07:11 there coming with the tablet and like, yeah, let's activate your new account, right? And suddenly there is a gap in time where they're typing, you're waiting, waiting, waiting, waiting. Things do not work very well. Maybe they have to make a phone call. Too many guardrails there. You see what I mean? That's exactly what they're trying to do, and you can see that it might even have an impact at the end. I mean, okay, I'm not annoyed that much because I don't have to do this every day, but I always think, okay, why did it take so long?
Starting point is 00:07:48 What is the person doing? It's exactly all these forms and all these guardrails that they are trying to put on very complex CRM systems that make it difficult for the people to do their job at the end, right? Yeah. And in many cases, they just bypass it again. That's the thing: whatever you are going to do,
Starting point is 00:08:07 you will reach a point where you will hit diminishing returns or something. It's not going to add any additional value and still errors will exist. So you will still have to go and put all these checks on your database systems, run your scripts, clean your data,
Starting point is 00:08:27 and all that stuff. That doesn't mean that we don't need the guardrails, but I'm just trying to reinforce that it's really hard to find this trade-off, and there is a trade-off there. Totally. Yeah, that's a really good point. I was just thinking about a couple of calls I've had recently with the utility company, right? Because I need to update something on the account where we pay our water bill, and it's so common to hear, "Oh, I'm sorry, hold on, my computer's taking a while to do this." Or you can tell that they're looking up information. And so, yeah, that's such an interesting point. All right.
Starting point is 00:09:06 Well, we're probably over time. Super interesting episode. Subscribe to the podcast if you haven't already, and you'll get notified when the episode comes up. Data Design with Kevin Gervais, and we will catch you soon.
