Postgres FM - Modern SQL

Episode Date: February 2, 2024

Michael is joined by Markus Winand, creator of use-the-index-luke.com and modern-sql.com, and author of SQL Performance Explained, to discuss Modern SQL — what Markus means by it, why it's ...important, some benefits, some examples, and at least one phrase that should be on a t-shirt. Here are some links to things they mentioned:Modern SQL (site) https://modern-sql.comUse the index, Luke! (site) https://use-the-index-luke.comSQL Performance Explained (book) https://sql-performance-explained.comOur episode on NULLs https://postgres.fm/episodes/nulls-the-good-the-bad-the-ugly-and-the-unknownIS DISTINCT FROM (null-safe not equals comparison) https://modern-sql.com/caniuse/is-distinct-fromUNIQUE NULLS DISTINCT https://modern-sql.com/caniuse/unique-nulls-distinctModernes SQL ist mehr als SELECT * FROM (a German language podcast Markus was on recently) https://engineeringkiosk.dev/podcast/episode/99-modernes-sql-ist-mehr-als-select-from-mit-markus-winandMarkus’ website https://winand.at~~~What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!~~~Postgres FM is produced by:Michael Christofides, founder of pgMustardNikolay Samokhvalov, founder of Postgres.aiWith special thanks to:Jessie Draws for the elephant artwork 

Transcript
Discussion (0)
Starting point is 00:00:00 Hello and welcome to PostgresFM, a weekly show about all things PostgresQL. I am Michael, founder of PGMustard, and today I am joined by Marcus Winnand, creator of the popular sites UseTheIndexLuke, ModernSQL, author of SQL Performance Explained, performance trainer and consultant, and according to your LinkedIn, SQL Renaissance Ambassador. It's a pleasure to have you here today, Marcus. Thank you for joining us. Yes, thank you. Awesome. Well, I suggested the topic this week, but it is very much your topic. It's modern SQL.
Starting point is 00:00:30 And I'd love you to give us an intro into what you mean by modern SQL and also how you came to be such an ambassador for it. So it basically goes back when I realized that many, many people think that the old SQL, so what is more technically described as SQL 92, is pretty much the only SQL. We have joins, yeah, we have them for decades, and people think that's it because that's what they learned decades ago and so on. They didn't realize that the story didn't stop there. It went on. And they didn't realize that the story didn't stop there. It went on. And we got what I refer to as more modern features, which are not rooted in the relational idea,
Starting point is 00:01:13 like window functions or recursive queries, or more modernly like JSON support, document support at all. We have XML for quite a while in SQL. This is what I kindly refer to as modern SQL, as opposed to the grandparents SQL everybody has learned like decades ago. So do you see this more as an issue with people that have been around for a few decades and got stuck in their ways? Or do you see it also with new folks coming along and getting taught by those people or the lessons they're learning only include the most basic rudimentary SQL? I would draw the line a little bit differently there.
Starting point is 00:01:53 I think the problem is more related to ORM users and SQL users. So you see, SQL users, there are also two kinds of users. Those who just learned it once and never get an update. That definitely exists. And SQL users. So you see, SQL users, there are also two kinds of users. Those who just learned it once and never get an update. That definitely exists. But I think the most problematic area is actually in the ORM world or rooted in the ORM world, where the people just really think because the ORM tool is doing the best because, well, they are specialized on using SQL. They're authors of the ORM tool. So this is probably the best you can do. But that's unfortunately totally wrong.
Starting point is 00:02:32 So the whole idea that SQL is just about relational things is just outdated. That was right in 92, but it was already wrong in 99. And somehow all the people managed to miss that message. They didn't get the memo, so to say. Yeah, that's such an interesting point. I hadn't put two and two together with, I've listened to your talks before on this around the change from being relational only to being able to handle other data types or other data models. But the R in ORM, is that what you mean by the relational part of it? Yeah. Also, the whole idea that the database is just for persistency.
Starting point is 00:03:17 Look at Java, JPA, the Java Persistency API. Just the idea that the database just stores data. That's a wrong idea for SQL. It might be a right idea for some other systems. But for SQL, the fundamental idea of SQL is that we have a separation of the persistent data layout and the transient data layout that we need for one query right now. Because one other thing that I like to put emphasis on is that SQL is not a query language. SQL is more like a transformation language. If you look at query languages, go to any web page, open the CSS and look at the selectors of CSS.
Starting point is 00:04:05 This is a query language. A query language can pick out one small piece of something larger, and that's it. But now look at SQL. With SQL, we are most of the time focusing on transforming data, transforming the persistent data we have stored on the disk to last forever, more or less, to answer a question we have at hand right now. That might be a different question tomorrow. It's most certainly a different question.
Starting point is 00:04:32 And the whole idea of SQL is that we normalize. Yeah, there is this bad word, normalization. We normalize the data persistently on the disk so that the shape of the data does not change often. The shape not, the data itself, the contents of the table, but not the shape, not so often. Now compare how static is the on-disk schema if you have probably normalized it compared to the questions you try to answer with those data. You get everyday new questions from marketing or from reporting or from whatever. And the idea of SQL is to have a flexible transformation between the rather static schema you have on disk, because once you have collected terabytes of data on disk, then it's hard to change the model you have there.
Starting point is 00:05:18 No matter how you do it, it will be hard compared to the dynamic we have on the question side. And SQL is the glue between these two layers. One thing you mentioned there is normalization being a bad word. I'm curious what you meant by that. Yeah, with normalization, all of us have learned these levels. First normal form, second normal form, third normal. Okay. Of course, that's totally correct and useful.
Starting point is 00:05:46 But to another extent, for the layman people, just causing more confusion than it is helpful. What I like to explain normalization is just, okay, don't focus on the third normal form or on any of those. Just try to figure out in which tables would the data feel comfortable. Yeah. Just try to figure out in which tables would the data feel comfortable. And naturally, there are some things like if you have a one-to-many relationship, then it needs to be two tables in SQL. Just keep it on that level. Don't go into the scientific definitions of all of these six normalization levels and some extras. Try to keep it simple, and you will still get the gain from the normalization yeah so that you have a rather static schema on disk nice i really like in what tables
Starting point is 00:06:32 with the data feel comfortable i'm i'm gonna steal that or i think that should be on t-shirts if you if you make them i'll buy awesome right back to the modern s modern SQL topic a little bit more. You mentioned that a lot of people are stuck a few decades ago with very basic SQL. The counter argument might be, well, they're solving their problems. They can do a lot of things. What are the big downsides or limitations for those folks who haven't explored some of the newer features that have been added. I'll give you an example. With SQL 92, it was quite common to do self-joins. It was required to do self-joins because some problems could only be solved by using a self-join. But self-joins are troublesome in two ways. First of all, they are complex. And second of all, they are slow because they are touching every data item twice. And now we have pretty much for every situation, we have some other syntax, most importantly window functions.
Starting point is 00:07:35 Window functions are one of the big self-join killers. And now they become – actually, the syntax is more concise. It's more clear, actually, when you look at it, rather than scratching yourself this way. You just scratch yourself this way. So it's actually better to read the query. It's actually also better to write the query once you know the syntax. And it will perform better.
Starting point is 00:07:59 And you will have less maintenance nightmares because you don't have the redundancy in the code. Like with a self-join, if you add a condition to one of the tables, you might need to add it to the other one as well or to the join condition. And this is easy to forget. Then you get wrong results. Yeah, I love those. So clarity and performance. So clarity not just for yourself writing things, but also for people coming back to it, reading it yourself in future.
Starting point is 00:08:24 And performance is a man after my own heart on that front i think a lot of people assume that or looking at new features they assume they'll get new functionality and forget that those other two things are the bigger part and because of the way sql works we likely could have done it before. That's a really good point. But it's actually true because SQL 92 was from one perspective, it was quite complete. So it was completing the relational idea. And of course you can solve almost everything, asterisk recursion,
Starting point is 00:09:00 almost everything with the old SQL. And that also makes it difficult to learn new stuff because I can solve it already so why do I need to learn something new that worked for me like decades it's still working so what is it about and that's of course a little bit more difficult to explain people I love how simply you did it
Starting point is 00:09:24 so clarity of what's happening when you look at the sql and performance what can they what what choices does the planner have when executing that yeah really cool and also i guess edge cases as well like these features can be designed to be more well we've done whole episodes on handling nulls and they can have thought about those more than, or if you have to roll your own versions of these features, you have to handle nulls yourself, which can be painful to say the least.
Starting point is 00:09:58 So one of my favorite features in this regard is the is not distinct from operator. It's the knife safe operator. So that if you have two knives on both sides that you still get through yeah you could say it's a modern sql feature according to my definition it was added after sql 92 and if you look at on my website modern sql you can check out which systems support that operator and yeah unfortunately not so many. Although I consider it something quite essential. But still, also the vendors didn't get the memo
Starting point is 00:10:32 that there is something new they need to catch up with. Dare I ask, I actually haven't used it. Is it in PostgreSQL? Do you remember? It is there. It does work syntactically. You get the right results. Okay. But there are some gotchas when it comes to optimizations.
Starting point is 00:10:50 Interesting performance. There we go. So I have had just like a month ago, I've had a case where I would like using it, but it didn't work out for performance reasons. That's so good to know. That's very very very interesting i love those charts on modern sql on when when you look up at one of these newer features and you have the charts of one row per database per dbms and then a timeline of it whether it supported it at which point it supported it,
Starting point is 00:11:25 whether there are any caveats. And every time I look at that, I think, wow, this is impressive. But also this must be so much work to maintain. I was going to ask, how do you do it? Yeah, I'm happy you asked that because this is one of the untold stories. So I'm just a one-man show
Starting point is 00:11:43 and doing this website as a side project actually i have kind of a day job as well yeah so the way i make it nowadays is that everything is covered by tests by tests that i can run automatically i test very deeply like i always check for the night case i always check for collations i always check for this and that and a lot of down to the SQL state. So quite a lot of tests. So right now it's, it's more than a hundred thousand test cases I have in my system. Wow.
Starting point is 00:12:13 But the good thing is I running then through like 10 databases. Yeah. And yeah, well then I get the result. And if everything is going fine, then, then updating the website for a new version takes me like maybe an hour of work it really depends on the system like i'm also featuring google big
Starting point is 00:12:34 query which is a cloud only system and you can imagine that i have higher latencies compared to the vm on my box yeah so for that it takes a little bit longer to run the tests. But after that, it's basically just saying make. And everything is rendered again and updated again. And yeah, well, that kind of works. And now I'm working on getting actually more features in there. Oh, nice. You mean, so what's coming up?
Starting point is 00:13:03 Yeah, so I try to catch up. Of course, I like to focus on the modern SQL features. And I also like to focus on features that are already implemented in some systems, because this is more valuable, of course. And I also have always an eye on the new features added by some systems. Like if we talk about Postgres, I think it was even in 15 that we got the functionality of the unique constraints where we can choose how it should treat the null values so this is whether you can determine that you are only allowed to have a single null value
Starting point is 00:13:38 exactly so it's the nice distinct let me just check out my own. Oh, I like it. NullsDistinct. So yeah, it was introduced with 15. So now you can choose for a unique constraint if it has a nullable column whether one null should rule out other nulls or not. And that was now added to the new
Starting point is 00:14:00 standard of SQL we have in the year 2023, so last year. We got a new standard and it's in there. But even a year before it was put into the standard, it arrived in Postgres. Oh, nice. So this is kind of the features I tried to catch up with the new stuff introduced in the standard
Starting point is 00:14:19 as well as in newer releases of the systems. Awesome. I watched a talk of yours, I think it was about five years ago now, though. So I've only heard your opinions of like the previous SQL standards, not the latest one. What were your main thoughts on 2023?
Starting point is 00:14:36 So 2023, it took quite a long while. The one before was 2016. So that's longer than intended. So there's a target of like four to five years. Oh, really? Good. The big thing in the latest release of the standard is the query language for querying graphs in a more native syntax. So if you know the Cypher query language, then you get the idea.
Starting point is 00:15:03 It's basically about drawing with S-IARDS a path through a graph that you want to match on. And there is now a standard of its own next to SQL, which is basically independent of SQL. So it's a new language, a new international standard that defines the QV language or generally the language to work with such databases, like how to insert something into a graph and how to query the graph. So this is kind of independent of SQL, except that the same people who do the SQL standard also do this new standard. And obviously, we would like to have that capability also in SQL. So additionally,
Starting point is 00:15:48 besides that entirely new standard, there is now a new part of the SQL standard because the SQL standard is actually made up of several parts. And now there's a new part so that we can use the query functionality part of the new standard inside SQL, in our queries. So that basically in the from clause, so we have match operators, and we can use that SGART kind of querying directly in SQL. And that's most certainly the killer feature of 2023. But of course, it may take a while until vendors catch up. There are some commercial implementations are available, but they predate the standard. Therefore, they don't implement
Starting point is 00:16:33 the standard down to the last sentence. So I'm curious how this evolves over time and if the standard can still have the power to unify these languages that's something that we have to watch out for but besides that one mega topic yeah and that's probably the reason why it took so long yeah besides that we have got many small things like the null handling that i was mentioning one really nice thing is that we can now into literal numeric values. So if you write a number like 1000, you can now put in the underscore as a separator for let's say 1000 digits so that you can easily spot when it's a million or a billion what it is, so that you can make it more readable. You see, it doesn't change anything what you can do with the language, it just makes it nicer and more maintainable.
Starting point is 00:17:28 And there are also, in respect to JSON, there are also some additions, actually quite fundamental additions. Now the standard has an actual JSON type. It didn't have one before. The 2026 standard introduced a lot of JSON functionality. Most importantly, the JSON query language. That's, most importantly, the JSON query language. That's a query language, the JSON path language. But it didn't introduce a type for JSON.
Starting point is 00:17:51 And now with 2023, they have added the type for JSON, along with some other simplifications and extensions. So there are quite a few small enhancements that just make everyday life easier. But the big thing is, I think, the property graph language. Yeah, makes sense. I imagine that's going to be a heck of a lot of work for folks to implement. But, well, if it's going to implement it at all. True. Because the SQL standard, it has the core functionality,
Starting point is 00:18:24 which every product is supposed to implement. And by testing it, I know it's not working out. And on top of that, it has optional features. And all the advanced features are basically optional features. So, Windows can choose to either implement it or not. But if they implement it, then they should follow the standard. Got it. Okay. it or not but if they implement it then they should follow the standard got it okay and i think i read somewhere but maybe this is misguided or outdated that vendors can claim that they support sql if they support sql 92 but do they how does it work for the non-optional features for the
Starting point is 00:19:01 for the core standards yeah um true the short answer is not at all okay yeah and that's part of the story behind modern sql and the charts you mentioned yeah yeah i think a standard is nice okay but if nobody cares about it then then it doesn't have any any power to unify the language but as soon as you see the differences and the mistakes the products have, then there is a motivation, then there is an outside pressure to actually comply with the standard. And I can see it. I've actually noticed it today that when I publish something on my website,
Starting point is 00:19:41 the vendors are looking there and they are questioning when I say, okay, it doesn't work in that product. Then they ask, hey, I thought this is supposed to work. Why not? And then I can show them the test case I have and that it doesn't work. And some of them fix those issues quite
Starting point is 00:19:59 quickly, like in two weeks. Brilliant. So sometimes it just happens. I publish a new article mentioning 10 systems, and one of them is behaving differently, maybe wrong. And just coincidentally, two weeks later, there's a new release by that system that behaves correctly. So that happened.
Starting point is 00:20:18 Yeah, that happens. Would you like to give any kudos or praise to the folks that are managing to turn those around so quickly? In that case, it was the H2 database. Oh, really? The H2 database. On the other hand, of course, there are systems that have longer cycles and then it takes longer or maybe happens never.
Starting point is 00:20:35 But on the other hand, if I look at Oracle, if I may talk about Oracle briefly. Yeah, please do. The latest release, the 23 release, finally fills the gaps that are there for decades. Like they finally introduced a Boolean type. They have ignored it for so many decades and have just said, okay, make it a car one. Why? No. And that's fine.
Starting point is 00:20:56 And now ultimately, there's movement coming in there. Even from Oracle. Even from Oracle, yeah. The whole industry has actually much more movement than like 10 years before. Like that MySQL is getting Windows function support and recursion support and all of that was something I didn't expect, actually. When I was starting with modern SQL in 2015, it was rather static. That there were a few products doing the modern stuff and there were a few products not doing the modern stuff. These were the two things.
Starting point is 00:21:29 I did not expect the non-modern stuff to catch up. It was more like, yeah, well, they will stick there forever. But then, ultimately, they started to get a lot of dynamics into the language support that they offer.
Starting point is 00:21:46 Nice. And we all benefit, right? And thanks for bringing up Oracle. I think they're a great case. I think they often are one of the first to support some of these newer things, even if they had in the past some of these gaps. It definitely gets brought up quite a lot to me when people choose, like when people mention why they chose PostgreSQL standard compliance and things working as expected which aren't exactly the same thing but are quite close to the same thing gets put up quite often as one of the reasons they trust PostgreSQL over other at the time open source databases but even yeah commercial databases so it is important, and we all benefit ultimately if everybody can work together on these things,
Starting point is 00:22:29 if it works as expected. That's cool. Other than your site, of course, being very important in this shift, what else do you think might have helped with this change? I think the MySQL move is actually pretty important because now the most commonly used database actually in the world, besides SQLite, of course, but even SQLite has moved. Those systems supporting these features means these are not exotic features of a few vendors. I think as soon as this is there in the MySQL documentation, then it becomes more official being SQL rather than just being Postgres or Oracle or whatever.
Starting point is 00:23:13 I think that fact that MySQL 8 got all of some of these modern features makes quite a big difference how people look at these features. Yeah, that was a huge huge release i remember my sql it was one of the eight releases that included and this is not related to sql well not related to sql standard at all but they got explained analyzed as well which was a big step forward in the things i look at so yeah huge huge release and and we're going to get hate mail for this but perhaps oracle deserves some credit
Starting point is 00:23:45 for that now being the stewards of the mysql code base and project yeah i was not expecting that when mysql ended up at oracle i was rather expecting it like being just yeah come over to the real database like like this this kind of of thing but that didn't happen. And if we now look at MariaDB, which is basically the continuation of the old team at MySQL, then we can see, well, they diverge quite a lot. And while MySQL was rather focusing on getting the basics right, MariaDB is more focusing on getting features in. Yeah. And on the long run, I think the basics are worth it.
Starting point is 00:24:28 And I think we can see that with Postgres because in Postgres, the basics were right from the beginning, I would say, at least most of them. And that pays off on the long run. So I'm also happy to see MariahDB being still there. Yeah. It's another competitor, yeah, especially against MySQL in the open source world.
Starting point is 00:24:47 But on the long run, let's see how long they can survive. Yeah, and as users, we benefit from the competition, right? Like as obviously as community members, like I'm definitely very biased, but I do like, I want there to be competitors.
Starting point is 00:25:04 It would be, I don't think Postgres as a project would benefit from being the only relational database standing in 20 years time. Historically, I don't think that's been the place where most innovation happens or the most like customer benefit happens. So, yeah, I love to see it and appreciate your work on this as well. So I didn't want to minimize that. I do think you have had an impact, but it's cool your work on this as well. So I didn't want to minimize that. I do think you have had an impact, but it's cool to see other reasons as well. Of course, there are many reasons for that. You also see the NoSQL movement, remember?
Starting point is 00:25:33 Yeah. In the meanwhile, most of the NoSQL vendors that are still there have some QL, whether they claim it is SQL or it is just some similar QL, whether they claim it is SQL or it is just some similar QL. Then we have new vendors that start up with new SQL dialects, like DuckDB and so on, that also bring fresh ideas into the SQL dialect. I think some of the ideas they are bringing up into their own dialect are actually worth thinking about bringing into the standard
Starting point is 00:26:02 and maybe other products in in either way so that as i said there's a lot of dynamics at the moment let's go back to what you would like to see let's say if somebody's like well you know what i i do mostly just use the sql i learned when i first learned yeah okay maybe i do a bit of json stuff here and there i've learned what i've had to use a lateral join or window function once but you know that's about it what what should they be doing how could they go about learning more well this this is pretty much the same for the last not 20 but 10 years yeah the most undervalued feature of modern sql is and always was window functions. Interesting.
Starting point is 00:26:51 If you don't master window functions yet, then stop listening now. We will wait for you and learn window functions. And then continue listening. So this is really the single most important and powerful feature. That's definitely my recommendation if you are not into that as well. Other than that, let me give you one problem I see more often nowadays to change the mind about how to use new features because you were mentioning Chasen as well. Yeah. I see Chasen used in a way that I'm not sure it's the best way. Okay, great. It's quite often.
Starting point is 00:27:27 And the way I'm explaining it is that data types like JSON or JSONB or even like Boolean, they have several use cases, actually three use cases. A data type can be used in the persistent schema. It's a great table, you can see a data type can be used in the persistent schema. It's a great table, you can say. A data type. But also important, a data type can be used during the transformation. And I'll give you an example there. Like the Boolean type, we use most of the time just during the transformation.
Starting point is 00:28:02 Because in the where clause, in the on clause, in the when clause of case, and so on, and so on, and so on, we have Boolean expressions, and we're using Boolean values just during the transformation. It's not coming from the base table from the disk. It's not going up to the clients through the wire just for the transformation. That's the second use case for data types. And the third use case is the API layer.
Starting point is 00:28:24 So meaning how do we communicate with the clients? And of course, we think about tabular representation. Yeah, of course, of course, of course. And maybe a normalized one. But here, I say no. The normalization is fine for the disk,
Starting point is 00:28:39 but it's not so much fine for the API layer. And here, I love JSON. Yeah. Because if you think of a tag system, you have some entities and you can add tags on them, then obviously the normalized way to do it is to have two tables. And I still think that's a good approach in many, many cases.
Starting point is 00:29:00 Yeah. But on the API layer, if you join them, you will get multiples of the entity data, which is meaningless. That doesn't make sense. But here comes basically the JSON array and you aggregate the tags into a JSON field that you
Starting point is 00:29:16 then give to the client API. And then you just keep the one row of the entity and have the list, a list, yeah, of tags there. And here you see, this is where I personally think that that's the most important use case for JSON. It's not about storing JSON on disk. Yeah, that can make sense sometimes as well, no question.
Starting point is 00:29:38 Yeah. But the main use case I see for JSON is on the API side. Yeah. Or for what it was originally intended right yeah maybe but what i see the people when they say yeah we're using chasen what they mean is they store it on disk yes yes good point awesome well what is there anything you'd point people at have you given a talk on anything like that before any good materials for people not so recently you know there was this this corona break so no conferences i'm still not talking a lot in public like like there was another podcast recorded the german one like a few months back but that's it so follow my blog
Starting point is 00:30:18 yeah modern sql.com and also use the index loop.com. This is where I publish my updates. And what you can expect is basically these matrices, who can do what, use cases for the features that I test. That's, of course, also important. What is the benefit of knowing, okay, that feature does work in that system, but I don't know what to use that feature for. Yeah, well, that's what you can expect on a more or less regular basis, like once a month or twice a month, something like that. Nice. And I think I just might have even been this morning was reading your half year update as well, which was cool. So, yeah, I highly recommend those resources. We've shouted them out a few times on the podcast in previous episodes. And your book as well, SQL Performance Explained, was fantastic.
Starting point is 00:31:05 So thank you for that. Actually, on that note, when's Modern SQL, the book, coming? Yeah, well, I started in 2015 with the intention to make it a book. Oh, really? Yes, that was the original intention. So when I did the other book, the SQL Performance Explained, it also started as a blog and then was later on put into a book. And the blog is still available for free on
Starting point is 00:31:27 use-the-index-look.com. And the idea was to do the same thing in modern SQL, but then happened what I've already told. Something which I thought it was static became very dynamic. Yeah. And now it's hard to catch up. But there is progress and there
Starting point is 00:31:43 will be a book. I'm still working on it. I'm using the draft as material in my trainings. So it is getting better. But there is no definite timeline for it. As should always be the way with things in the future. Well, good luck with that. Thank you so much for coming on.
Starting point is 00:32:02 It's been a pleasure having you. Best of luck. Yeah, thank you. And thanks for having me. Bye-bye. Bye.

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.