Postgres FM - Modern SQL
Episode Date: February 2, 2024Michael is joined by Markus Winand, creator of use-the-index-luke.com and modern-sql.com, and author of SQL Performance Explained, to discuss Modern SQL — what Markus means by it, why it's ...important, some benefits, some examples, and at least one phrase that should be on a t-shirt. Here are some links to things they mentioned:Modern SQL (site) https://modern-sql.comUse the index, Luke! (site) https://use-the-index-luke.comSQL Performance Explained (book) https://sql-performance-explained.comOur episode on NULLs https://postgres.fm/episodes/nulls-the-good-the-bad-the-ugly-and-the-unknownIS DISTINCT FROM (null-safe not equals comparison) https://modern-sql.com/caniuse/is-distinct-fromUNIQUE NULLS DISTINCT https://modern-sql.com/caniuse/unique-nulls-distinctModernes SQL ist mehr als SELECT * FROM (a German language podcast Markus was on recently) https://engineeringkiosk.dev/podcast/episode/99-modernes-sql-ist-mehr-als-select-from-mit-markus-winandMarkus’ website https://winand.at~~~What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!~~~Postgres FM is produced by:Michael Christofides, founder of pgMustardNikolay Samokhvalov, founder of Postgres.aiWith special thanks to:Jessie Draws for the elephant artworkÂ
Transcript
Discussion (0)
Hello and welcome to PostgresFM, a weekly show about all things PostgresQL.
I am Michael, founder of PGMustard, and today I am joined by Marcus Winnand,
creator of the popular sites UseTheIndexLuke, ModernSQL, author of SQL Performance Explained,
performance trainer and consultant, and according to your LinkedIn, SQL Renaissance Ambassador.
It's a pleasure to have you here today, Marcus. Thank you for joining us.
Yes, thank you.
Awesome. Well, I suggested the topic this week, but it is very much your topic.
It's modern SQL.
And I'd love you to give us an intro into what you mean by modern SQL
and also how you came to be such an ambassador for it.
So it basically goes back when I realized that many, many people think that the old SQL, so what is more technically described as SQL 92, is pretty much the only SQL.
We have joins, yeah, we have them for decades, and people think that's it because that's what they learned decades ago and so on.
They didn't realize that the story didn't stop there.
It went on. And they didn't realize that the story didn't stop there. It went on.
And we got what I refer to as more modern features,
which are not rooted in the relational idea,
like window functions or recursive queries,
or more modernly like JSON support, document support at all.
We have XML for quite a while in SQL.
This is what I kindly refer to as modern SQL, as opposed to the
grandparents SQL everybody has learned like decades ago. So do you see this more as an issue
with people that have been around for a few decades and got stuck in their ways? Or do you
see it also with new folks coming along and getting taught by those people or the lessons they're learning only include the most basic rudimentary SQL?
I would draw the line a little bit differently there.
I think the problem is more related to ORM users and SQL users.
So you see, SQL users, there are also two kinds of users.
Those who just learned it once and never get an update. That definitely exists. And SQL users. So you see, SQL users, there are also two kinds of users.
Those who just learned it once and never get an update.
That definitely exists.
But I think the most problematic area is actually in the ORM world or rooted in the ORM world,
where the people just really think because the ORM tool is doing the best because, well, they are specialized on using SQL.
They're authors of the ORM tool. So this is probably the best you can do. But that's unfortunately totally wrong.
So the whole idea that SQL is just about relational things is just outdated. That was
right in 92, but it was already wrong in 99.
And somehow all the people managed to miss that message.
They didn't get the memo, so to say.
Yeah, that's such an interesting point.
I hadn't put two and two together with, I've listened to your talks before on this around the change from being relational only to being able to handle other data types or other data models.
But the R in ORM, is that what you mean by the relational part of it?
Yeah. Also, the whole idea that the database is just for persistency.
Look at Java, JPA, the Java Persistency API.
Just the idea that the database just stores data.
That's a wrong idea for SQL.
It might be a right idea for some other systems.
But for SQL, the fundamental idea of SQL is that we have a separation of the persistent data layout and the transient data layout that we need for one query right now.
Because one other thing that I like to put emphasis on is that SQL is not a query language.
SQL is more like a transformation language.
If you look at query languages, go to any web page, open the CSS and look at the selectors of CSS.
This is a query language.
A query language can pick out one small piece of something larger, and that's it.
But now look at SQL.
With SQL, we are most of the time focusing on transforming data,
transforming the persistent data we have stored on the disk to last forever, more or less,
to answer a question we have at hand right now.
That might be a different question tomorrow.
It's most certainly a different question.
And the whole idea of SQL is that we normalize.
Yeah, there is this bad word, normalization.
We normalize the data persistently on the disk so that the shape of the data does not change often.
The shape not, the data itself, the contents of the table, but not the shape, not so often.
Now compare how static is the on-disk schema if you have probably normalized it compared
to the questions you try to answer with those data.
You get everyday new questions from marketing or from reporting or from whatever.
And the idea of SQL is to have a flexible transformation between the rather static schema you have on disk, because once you have collected terabytes of data on disk, then it's hard to change the model you have there.
No matter how you do it, it will be hard compared to the dynamic we have on the question side.
And SQL is the glue between these two layers.
One thing you mentioned there is normalization being a bad word.
I'm curious what you meant by that.
Yeah, with normalization, all of us have learned these levels.
First normal form, second normal form, third normal.
Okay.
Of course, that's totally correct and useful.
But to another extent, for the layman people, just causing more confusion than it is helpful.
What I like to explain normalization is just, okay, don't focus on the third normal form or on any of those.
Just try to figure out in which tables would the data feel comfortable.
Yeah. Just try to figure out in which tables would the data feel comfortable. And naturally, there are some things like if you have a one-to-many relationship, then it needs to be two tables in SQL.
Just keep it on that level.
Don't go into the scientific definitions of all of these six normalization levels and some extras.
Try to keep it simple, and you will still get the gain from the
normalization yeah so that you have a rather static schema on disk nice i really like in what tables
with the data feel comfortable i'm i'm gonna steal that or i think that should be on t-shirts if you
if you make them i'll buy awesome right back to the modern s modern SQL topic a little bit more. You mentioned that a lot of people are stuck a few decades ago with very basic SQL. The counter argument might be, well, they're solving their problems. They can do a lot of things. What are the big downsides or limitations for those folks who haven't explored some of the newer features that have been added. I'll give you an example.
With SQL 92, it was quite common to do self-joins.
It was required to do self-joins because some problems could only be solved by using a self-join.
But self-joins are troublesome in two ways.
First of all, they are complex.
And second of all, they are slow because they are touching every data item twice.
And now we have pretty much for every situation, we have some other syntax, most importantly window functions.
Window functions are one of the big self-join killers.
And now they become – actually, the syntax is more concise.
It's more clear, actually, when you look at it,
rather than scratching yourself this way.
You just scratch yourself this way.
So it's actually better to read the query.
It's actually also better to write the query once you know the syntax.
And it will perform better.
And you will have less maintenance nightmares
because you don't have the redundancy in the code.
Like with a self-join, if you add a condition to one of the tables, you might need to add it to the other one as well or to the join condition.
And this is easy to forget.
Then you get wrong results.
Yeah, I love those.
So clarity and performance.
So clarity not just for yourself writing things, but also for people coming back to it, reading it yourself in future.
And performance is a man after my own heart on that front i think a lot of people assume that
or looking at new features they assume they'll get new functionality and forget that those other
two things are the bigger part and because of the way sql works we likely could have done it before.
That's a really good point.
But it's actually true because SQL 92 was from one perspective, it was quite complete.
So it was completing the relational idea.
And of course you can solve almost everything,
asterisk recursion,
almost everything with the old SQL.
And that also makes it difficult to learn new stuff because I can solve it already
so why do I need to learn something new
that worked for me like decades
it's still working
so what is it about
and that's of course a little bit more difficult to explain people
I love how simply you did it
so clarity of what's happening
when you look at the sql and performance what can they what what choices does the planner have when
executing that yeah really cool and also i guess edge cases as well like these features can be
designed to be more well we've done whole episodes on handling nulls
and they can have thought about those more than,
or if you have to roll your own versions of these features,
you have to handle nulls yourself,
which can be painful to say the least.
So one of my favorite features in this regard
is the is not distinct from operator.
It's the knife safe operator. So that if you have two knives on both sides that you still get through yeah you could
say it's a modern sql feature according to my definition it was added after sql 92 and if you
look at on my website modern sql you can check out which systems support that operator and
yeah unfortunately not so many.
Although I consider it something quite essential.
But still, also the vendors didn't get the memo
that there is something new they need to catch up with.
Dare I ask, I actually haven't used it.
Is it in PostgreSQL? Do you remember?
It is there.
It does work syntactically.
You get the right results.
Okay.
But there are some gotchas when it comes to optimizations.
Interesting performance.
There we go.
So I have had just like a month ago, I've had a case where I would like using it, but it didn't work out for performance reasons.
That's so good to know.
That's very very very interesting
i love those charts on modern sql on when when you look up at one of these newer features and
you have the charts of one row per database per dbms and then a timeline of it whether it supported
it at which point it supported it,
whether there are any caveats.
And every time I look at that,
I think, wow, this is impressive.
But also this must be so much work to maintain.
I was going to ask, how do you do it?
Yeah, I'm happy you asked that
because this is one of the untold stories.
So I'm just a one-man show
and doing this website as a side project actually i have kind
of a day job as well yeah so the way i make it nowadays is that everything is covered by tests
by tests that i can run automatically i test very deeply like i always check for the night
case i always check for collations i always check for this and that and a lot of down to the SQL state.
So quite a lot of tests.
So right now it's,
it's more than a hundred thousand test cases I have in my system.
Wow.
But the good thing is I running then through like 10 databases.
Yeah.
And yeah,
well then I get the result.
And if everything is going fine,
then,
then updating the website for a new version takes
me like maybe an hour of work it really depends on the system like i'm also featuring google big
query which is a cloud only system and you can imagine that i have higher latencies compared to
the vm on my box yeah so for that it takes a little bit longer to run the tests.
But after that, it's basically just saying make.
And everything is rendered again and updated again.
And yeah, well, that kind of works.
And now I'm working on getting actually more features in there.
Oh, nice.
You mean, so what's coming up?
Yeah, so I try to catch up.
Of course, I like to focus on the modern SQL features.
And I also like to focus on features that are already implemented in some systems,
because this is more valuable, of course.
And I also have always an eye on the new features added by some systems.
Like if we talk about Postgres, I think it was even in 15 that we got the
functionality of the unique constraints where we can choose how it should treat the null values
so this is whether you can determine that you are only allowed to have a single null value
exactly so it's the nice distinct let me just check out my own. Oh, I like it.
NullsDistinct. So yeah, it was introduced with
15. So now you can
choose for a unique
constraint if it has a nullable column
whether one null should
rule out other nulls or not.
And that was now added to the new
standard of SQL we have in the
year 2023, so last year.
We got a new standard and it's in there.
But even a year before it was put into the standard,
it arrived in Postgres.
Oh, nice.
So this is kind of the features I tried to catch up
with the new stuff introduced in the standard
as well as in newer releases of the systems.
Awesome.
I watched a talk of yours,
I think it was about five years ago now, though.
So I've only heard your opinions
of like the previous SQL standards,
not the latest one.
What were your main thoughts on 2023?
So 2023, it took quite a long while.
The one before was 2016.
So that's longer than intended.
So there's a target of like four to five years.
Oh, really?
Good.
The big thing in the latest release of the standard is the query language for querying graphs in a more native syntax.
So if you know the Cypher query language, then you get the idea.
It's basically about drawing with S-IARDS a path through a graph
that you want to match on. And there is now a standard of its own next to SQL, which is basically
independent of SQL. So it's a new language, a new international standard that defines the
QV language or generally the language to work with such
databases, like how to insert something into a graph and how to query the graph.
So this is kind of independent of SQL, except that the same people who do the SQL standard
also do this new standard.
And obviously, we would like to have that capability also in SQL. So additionally,
besides that entirely new standard, there is now a new part of the SQL standard because the SQL
standard is actually made up of several parts. And now there's a new part so that we can use the
query functionality part of the new standard inside SQL, in our queries.
So that basically in the from clause, so we have match operators,
and we can use that SGART kind of querying directly in SQL.
And that's most certainly the killer feature of 2023.
But of course, it may take a while until vendors catch up.
There are some commercial implementations are available, but they predate the standard. Therefore, they don't implement
the standard down to the last sentence. So I'm curious how this evolves over time and if the
standard can still have the power to unify these languages that's something that we have to watch
out for but besides that one mega topic yeah and that's probably the reason why it took so long
yeah besides that we have got many small things like the null handling that i was mentioning
one really nice thing is that we can now into literal numeric values. So if you write a number like 1000, you can now put in the
underscore as a separator for let's say 1000 digits so that you can easily spot when it's a
million or a billion what it is, so that you can make it more readable. You see, it doesn't change
anything what you can do with the language, it just makes it nicer and more maintainable.
And there are also, in respect to JSON,
there are also some additions,
actually quite fundamental additions.
Now the standard has an actual JSON type.
It didn't have one before. The 2026 standard introduced a lot of JSON functionality.
Most importantly, the JSON query language. That's, most importantly, the JSON query language.
That's a query language, the JSON path language.
But it didn't introduce a type for JSON.
And now with 2023, they have added the type for JSON, along with some other simplifications and extensions.
So there are quite a few small enhancements that just make everyday life easier.
But the big thing is, I think, the property graph language.
Yeah, makes sense.
I imagine that's going to be a heck of a lot of work for folks to implement.
But, well, if it's going to implement it at all.
True.
Because the SQL standard, it has the core functionality,
which every product is supposed to implement.
And by testing it, I know it's not working out.
And on top of that, it has optional features.
And all the advanced features are basically optional features.
So, Windows can choose to either implement it or not.
But if they implement it, then they should follow the standard.
Got it. Okay. it or not but if they implement it then they should follow the standard got it okay and i think i read somewhere but maybe this is misguided or outdated that vendors can claim that they
support sql if they support sql 92 but do they how does it work for the non-optional features for the
for the core standards yeah um true the short answer is not at all okay yeah and that's part
of the story behind modern sql and the charts you mentioned yeah yeah i think a standard is nice
okay but if nobody cares about it then then it doesn't have any any power to unify the language
but as soon as you see the differences and the mistakes the products have,
then there is a motivation,
then there is an outside pressure to actually comply with the standard.
And I can see it.
I've actually noticed it today that when I publish something on my website,
the vendors are looking there and they are questioning when I say,
okay, it doesn't work in that product. Then they ask,
hey, I thought this is supposed to work. Why
not? And then I can show them
the test case I have
and that it doesn't work.
And some of them
fix those issues quite
quickly, like in two weeks.
Brilliant.
So sometimes it just happens.
I publish a new article mentioning 10 systems, and one of them is behaving differently, maybe
wrong.
And just coincidentally, two weeks later, there's a new release by that system that
behaves correctly.
So that happened.
Yeah, that happens.
Would you like to give any kudos or praise to the folks that are managing to turn those
around so quickly?
In that case, it was the H2 database.
Oh, really?
The H2 database.
On the other hand, of course, there are systems that have longer cycles and then it takes
longer or maybe happens never.
But on the other hand, if I look at Oracle, if I may talk about Oracle briefly.
Yeah, please do.
The latest release, the 23 release, finally fills the gaps that are there for decades.
Like they finally introduced a Boolean type.
They have ignored it for so many decades and have just said, okay, make it a car one.
Why?
No.
And that's fine.
And now ultimately, there's movement coming in there.
Even from Oracle.
Even from Oracle, yeah. The whole industry has actually much more movement than like 10 years before.
Like that MySQL is getting Windows function support and recursion support and all of that was something I didn't expect, actually.
When I was starting with modern SQL in 2015, it was rather static.
That there were a few products doing the modern stuff and there were
a few products not doing the modern stuff.
These were the two things.
I did not expect
the non-modern
stuff to catch up.
It was more like, yeah, well, they will stick
there forever. But then, ultimately,
they started to
get a lot of dynamics into
the language support that they offer.
Nice. And we all benefit, right? And thanks for bringing up Oracle. I think they're a great case.
I think they often are one of the first to support some of these newer things, even if they had in
the past some of these gaps. It definitely gets brought up quite a lot to me when people choose,
like when people mention why they chose PostgreSQL standard
compliance and things working as expected which aren't exactly the same thing but are quite close
to the same thing gets put up quite often as one of the reasons they trust PostgreSQL over other
at the time open source databases but even yeah commercial databases so it is important, and we all benefit ultimately
if everybody can work together on these things,
if it works as expected.
That's cool.
Other than your site, of course, being very important in this shift,
what else do you think might have helped with this change?
I think the MySQL move is actually pretty important because now the most commonly used database actually in the world, besides SQLite, of course, but even SQLite has moved.
Those systems supporting these features means these are not exotic features of a few vendors. I think as soon as this is there in the MySQL documentation,
then it becomes more official being SQL
rather than just being Postgres or Oracle or whatever.
I think that fact that MySQL 8
got all of some of these modern features
makes quite a big difference
how people look at these features.
Yeah, that was a huge huge release i remember my
sql it was one of the eight releases that included and this is not related to sql well not related to
sql standard at all but they got explained analyzed as well which was a big step forward in the things
i look at so yeah huge huge release and and we're going to get hate mail for this but perhaps oracle deserves some credit
for that now being the stewards of the mysql code base and project yeah i was not expecting that
when mysql ended up at oracle i was rather expecting it like being just yeah come over to
the real database like like this this kind of of thing but that didn't happen. And if we now look at MariaDB,
which is basically the continuation of the old team at MySQL, then we can see, well, they diverge quite a lot.
And while MySQL was rather focusing on getting the basics right,
MariaDB is more focusing on getting features in.
Yeah.
And on the long run, I think the basics are worth it.
And I think we can see that with Postgres
because in Postgres, the basics were right from the beginning,
I would say, at least most of them.
And that pays off on the long run.
So I'm also happy to see MariahDB being still there.
Yeah.
It's another competitor, yeah,
especially against MySQL in the open source world.
But on the long run,
let's see how long they can survive.
Yeah, and as users,
we benefit from the competition, right?
Like as obviously as community members,
like I'm definitely very biased,
but I do like,
I want there to be competitors.
It would be, I don't think Postgres as a project would benefit from being the only relational database standing in 20 years time.
Historically, I don't think that's been the place where most innovation happens or the most like customer benefit happens.
So, yeah, I love to see it and appreciate your work on this as well.
So I didn't want to minimize that. I do think you have had an impact, but it's cool your work on this as well. So I didn't want to minimize that.
I do think you have had an impact,
but it's cool to see other reasons as well.
Of course, there are many reasons for that.
You also see the NoSQL movement, remember?
Yeah.
In the meanwhile, most of the NoSQL vendors that are still there
have some QL, whether they claim it is SQL
or it is just some similar QL, whether they claim it is SQL or it is just some similar QL.
Then we have new vendors that start up with new SQL dialects,
like DuckDB and so on, that also bring fresh ideas into the SQL dialect.
I think some of the ideas they are bringing up into their own dialect
are actually worth thinking about bringing into the standard
and maybe other products in in either way so that
as i said there's a lot of dynamics at the moment let's go back to what you would like to see let's
say if somebody's like well you know what i i do mostly just use the sql i learned when i first
learned yeah okay maybe i do a bit of json stuff here and there i've learned what i've had to use
a lateral join or window function once but
you know that's about it what what should they be doing how could they go about learning more
well this this is pretty much the same for the last not 20 but 10 years yeah the most undervalued
feature of modern sql is and always was window functions. Interesting.
If you don't master window functions yet, then stop listening now.
We will wait for you and learn window functions.
And then continue listening.
So this is really the single most important and powerful feature.
That's definitely my recommendation if you are not into that as well. Other than that, let me give you one problem I see more often nowadays to change the mind about how to use new features because you were mentioning Chasen as well.
Yeah.
I see Chasen used in a way that I'm not sure it's the best way.
Okay, great. It's quite often.
And the way I'm explaining it is that data types like JSON or JSONB
or even like Boolean, they have several use cases,
actually three use cases.
A data type can be used in the persistent schema.
It's a great table, you can see a data type can be used in the persistent schema. It's a great table, you can say. A data type.
But also important, a data type can be used during the transformation.
And I'll give you an example there.
Like the Boolean type, we use most of the time just during the transformation.
Because in the where clause, in the on clause, in the when clause of case, and so on, and so on, and so on,
we have Boolean expressions,
and we're using Boolean values just during the transformation.
It's not coming from the base table from the disk.
It's not going up to the clients through the wire
just for the transformation.
That's the second use case for data types.
And the third use case is the API layer.
So meaning how do
we communicate with the clients?
And of course, we think about
tabular representation. Yeah, of course,
of course, of course. And maybe a normalized
one. But here, I say
no. The normalization
is fine for the disk,
but it's not so much fine for the
API layer.
And here, I love JSON.
Yeah.
Because if you think of a tag system,
you have some entities and you can add tags on them,
then obviously the normalized way to do it is to have two tables.
And I still think that's a good approach in many, many cases.
Yeah.
But on the API layer, if you join them,
you will get multiples of the
entity data, which is meaningless. That doesn't
make sense. But here comes
basically the JSON array
and you aggregate the tags
into a JSON field that you
then give to the client
API. And then you just
keep the one row of the entity and have
the list, a list,
yeah, of
tags there. And here you see,
this is where I personally think that that's the most important use case for JSON.
It's not about storing JSON on disk. Yeah, that can make sense sometimes as well, no question.
Yeah.
But the main use case I see for JSON is on the API side.
Yeah. Or for what it was originally intended right
yeah maybe but what i see the people when they say yeah we're using chasen what they mean is
they store it on disk yes yes good point awesome well what is there anything you'd point people at
have you given a talk on anything like that before any good materials for people not so recently you know there was this this
corona break so no conferences i'm still not talking a lot in public like like there was
another podcast recorded the german one like a few months back but that's it so follow my blog
yeah modern sql.com and also use the index loop.com. This is where I publish my updates. And what you can expect is basically these matrices, who can do what, use cases for the features that I test.
That's, of course, also important.
What is the benefit of knowing, okay, that feature does work in that system, but I don't know what to use that feature for.
Yeah, well, that's what you can expect on a more or less regular basis, like once a month or twice a month, something like that.
Nice. And I think I just might have even been this morning was reading your half year update as well, which was cool.
So, yeah, I highly recommend those resources.
We've shouted them out a few times on the podcast in previous episodes.
And your book as well, SQL Performance Explained, was fantastic.
So thank you for that.
Actually, on that note, when's Modern SQL, the book, coming?
Yeah, well, I started in 2015 with the intention to make it a book.
Oh, really?
Yes, that was the original intention.
So when I did the other book, the SQL Performance Explained, it also started as a blog and then
was later on put into a book. And the
blog is still available for free on
use-the-index-look.com. And
the idea was to do the same thing in
modern SQL, but then happened
what I've already told. Something which I
thought it was static became very dynamic.
Yeah. And now
it's hard to catch up.
But there is progress and there
will be a book.
I'm still working on it.
I'm using the draft as material in my trainings.
So it is getting better.
But there is no definite timeline for it.
As should always be the way with things in the future.
Well, good luck with that.
Thank you so much for coming on.
It's been a pleasure having you.
Best of luck.
Yeah, thank you.
And thanks for having me.
Bye-bye.
Bye.