Postgres FM - jOOQ
Episode Date: December 13, 2024

Michael and Nikolay are joined by Lukas Eder, the creator of jOOQ, to discuss what it is, some nice developer experience features it has, and some fun things he's come across from a Postgres perspective.

Here are some links to things they mentioned:

Lukas Eder https://postgres.fm/people/lukas-eder
jOOQ https://www.jooq.org/
DSL https://en.wikipedia.org/wiki/Domain-specific_language
SQL Dialects https://www.jooq.org/javadoc/latest/org.jooq/org/jooq/SQLDialect.html
MERGE https://www.postgresql.org/docs/current/sql-merge.html
match_recognize https://modern-sql.com/feature/match_recognize
JOOQ, joy of SQL (talk by Kevin Davin) https://www.youtube.com/watch?v=8Ej47GZX9D8
BUFFERS enabled for EXPLAIN ANALYZE by default (commit for Postgres 18) https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c2a4078ebad71999dd451ae7d4358be3c9290b07
PostGIS https://postgis.net/
10 SQL Tricks That You Didn't Think Were Possible (blog post by Lukas) https://blog.jooq.org/10-sql-tricks-that-you-didnt-think-were-possible/
jOOQ questions on Stack Overflow https://stackoverflow.com/questions/tagged/jooq
Our episode on NULLs https://postgres.fm/episodes/nulls-the-good-the-bad-the-ugly-and-the-unknown

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With special thanks to:
Jessie Draws for the elephant artwork
Transcript
Hello, hello, and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I am Michael, founder of PgMustard.
I'm joined, as usual, by Nikolay, founder of Postgres AI.
Hey, Nikolay.
Hey, Michael.
How are you?
No, not falling for that.
And we are joined by a wonderful guest, Lukas Eder, founder of Data Geekery and the creator
of jOOQ, a very popular framework.
Thank you so much for joining us, Lukas.
Thank you for having me, Michael.
All right. It's our pleasure. So how do you describe jOOQ and what's it helping folks with?
So jOOQ is an internal domain-specific language written in Java that models the SQL language
directly as a Java API. This helps Java developers to write SQL queries more efficiently and more
correctly within their Java applications.
There's a code generator that generates the entire schema as Java objects,
and you can interact with those objects and create type-safe queries.
So that's the base of it.
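As a rough illustration (not from the episode), a jOOQ query against generated classes might look like the following sketch. It assumes jOOQ on the classpath and a hypothetical BOOK table whose generated class the code generator would produce; all names here are illustrative.

```java
// Sketch only: assumes jOOQ is on the classpath and that the code generator
// has produced a class for a hypothetical BOOK table.
import static com.example.generated.Tables.BOOK;

import org.jooq.DSLContext;
import org.jooq.Result;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;

DSLContext ctx = DSL.using(connection, SQLDialect.POSTGRES);

// Reads almost like the SQL it renders, and the compiler checks the schema:
Result<?> books = ctx.select(BOOK.ID, BOOK.TITLE)
                     .from(BOOK)
                     .where(BOOK.PUBLISHED_IN.gt(2020))
                     .orderBy(BOOK.TITLE)
                     .fetch();
```

The point being made in the episode: the method chain mirrors SQL clause order, so knowing SQL is enough to discover the API via autocomplete.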
Yeah, super cool.
But I'm super impressed with how much you focus on developer experience in general.
I hear quite a few people say they really love Duke,
and they give about 15
different examples of things they love about it. But what are the main things you think that people
particularly like about it? Well, the first thing is, and I've studied the market extensively before
making jOOQ, it really looks as much like SQL as is possible when
you model an internal DSL API in Java.
So of course there's some limitations,
but when you have in mind there's a GROUP BY clause,
you just start typing groupBy where you expect it and then it just starts compiling.
So there's no surprises.
There's almost no surprises in terms of SQL syntax.
So if you know SQL, then you immediately know the jOOQ API.
You don't really have to learn it.
I guess that's one of the most popular things.
But other than that, it's also very nicely integrated into the Java language or Kotlin
and Scala. That works as well. And if you've ever coded PL/SQL in Oracle or PL/pgSQL in
Postgres, you kind of like the way how the SQL language is embedded in the procedural language.
So if you loop, if you iterate over a result set in those procedural languages,
you have the same kind of type safety and you have the possibility, for instance,
to insert bind variables into your statements at the right spot
and you don't have to worry about the infrastructure and the logistics of connecting to the database.
And it just feels embedded.
And I think that's what a lot of people really want to do,
even more than having the type safety,
this feeling of embeddedness into the target language.
And when you map your result sets to Java objects,
that also feels very native.
So it feels like the database is part of the Java application, of the JVM.
Yeah, even to the point of getting, like,
autocomplete suggestions in IDEs.
That is super nice.
Yeah, there are little details like this.
So when you say autocomplete,
there's also the possibility to comment
on your database tables and columns,
and that translates to JavaDoc.
So whatever you comment directly in the database,
it's translated to documentation
inside of your Java program.
You hardly ever have to go to the database to see what's there.
Nice.
Yeah.
In terms of the history, I did look it up in, well, at least the GitHub history.
How long have you been working on it and how does that feel?
I think the first public version started in 2009.
There were prototypes before that in 2008, I think, but 2009 was the first public version. Not on GitHub then. I don't think GitHub was very popular at the time; it was Subversion. The idea goes back to my studies at EPFL in Lausanne.
There was a professor who said you could model any kind of programming language as an internal DSL.
I'm not sure if he used that term yet.
I think the term DSL was coined more recently. But he said you could model any language in terms of an API if you just have Java interfaces that return interfaces that return interfaces.
And this kind of idea stuck with me for all this time,
but I didn't have a use case back then yet.
And when I started working in the industry in 2006,
it was my first Java job.
I did some PHP work before, but Java started in 2006.
I've seen that all the companies that I've applied to
and even the one that I worked at,
they implemented dynamic SQL libraries because this was something that everyone needs, right?
So everyone has dynamic SQL queries when you have a UI with various input boxes and they're all
optional. So you have to compose your SQL queries in a dynamic way. And everyone wants to avoid
string concatenation because it's not safe and it's boring.
So they did that for both.
They had one for SQL
and they had one for the Hibernate query language,
which is essentially the same thing,
just more limited, but the same concept.
And no one actually thought about making this a true DSL.
It was always just a query builder with some weird API.
So you had Java-style methods
where you add clauses to the query object, and it didn't feel like SQL. It felt like a tool to
solve exactly this dynamic SQL problem, and you didn't enjoy using this thing.
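The kind of in-house query builder being described can be sketched in a few lines of plain Java. This is a hypothetical, simplified example (the table and column names are made up): optional UI filters become WHERE predicates with bind placeholders, so user input never gets concatenated into the SQL string.

```java
import java.util.ArrayList;
import java.util.List;

public class DynamicWhere {
    // Compose a query from whichever predicates are active; values stay
    // out of the SQL string and would be bound as parameters later.
    public static String build(String table, List<String> predicates) {
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        if (!predicates.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", predicates));
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        String author = "Eder"; // filled-in search box
        String title = null;    // this search box was left empty
        List<String> predicates = new ArrayList<>();
        if (author != null) predicates.add("author = ?");
        if (title != null) predicates.add("title = ?");
        System.out.println(build("book", predicates));
        // prints: SELECT * FROM book WHERE author = ?
    }
}
```

Every shop ends up writing some variant of this, which is exactly the gap a proper SQL DSL fills.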
I think JPA still has this thing with their criteria API, which is still very useful if you want to do dynamic JPQL.
But I've never heard anyone enjoy using that API
because you have to learn it.
So you have to learn one more thing.
And it only serves this one purpose,
or at least that's what it feels like.
It serves this purpose of dynamic JPQL.
And that's when I started prototyping it.
And in the very beginning, I had to first implement the infrastructure.
It wasn't a very DSL-style API either.
So I created a query builder that could do some select from where,
very simple conditions.
But then I started to really implement this idea
because I remembered, okay, this professor, he had this idea,
and I'm going to try this. Is this really possible?
And it turns out it was, and it's really crazy
how many SQL statements today we have in jOOQ
that are all using this kind of approach
where you just start with the same object,
and then you start typing, and you autocomplete your SQL syntax,
and it's all one huge graph of methods that helps you construct your SQL queries. So jOOQ has come
a long way since 2009, I'd say. Yeah, right. I was looking in your documentation about how many
different, you call them families of SQL dialects, that you support. So
how many different databases, but also the different versions of different databases
that may or may not support different syntax and SQL features.
How do you maintain that?
Well, first off, luckily, the database products are all very, very backwards compatible.
If we take one or two of them, we don't really care as much.
But most of them really
value backwards compatibility a lot. So supporting new versions is just looking at the new features
and adding support for them, but the old stuff still works. So at least that's already covered
there, more or less. But other than that, I mean, supporting so many dialects per se is, yeah,
it's a lot of work and a lot of testing, of course, a lot of automated regression testing to make sure nothing breaks.
And also, jOOQ really values backwards compatibility a lot,
just like the database products.
So when you embed a jOOQ query into your database application,
you don't want stuff to break.
So if I'm not allowed to break stuff, I'm probably not breaking stuff.
So this kind of simplifies maintenance as well,
because stuff that once works is probably not going to break just like that. I mean,
there's some internal refactorings, but there are no really major, how to put it,
major paradigm shifts where everything works in a different way now, and I was breaking everything. So with automated regression tests,
it's hard, but it's not that hard, I think.
The hard part is just to find out some weird syntax
that implements a standard feature on this particular dialect,
and that's quite creative at times,
especially with the newer dialects
that have just been released recently
that are not that mature yet.
Or some dialects are very opinionated.
So you kind of have to think in terms of why did they make those decisions and how can
I map standard SQL onto this database?
But I think that's one of the things that people really like when they work, especially
when they work with different database products.
So if you have a company that works with maybe Postgres and SQL Server, you don't actually have to think about the differences all the time.
So you can just write substring in the jOOQ API and you don't care if it's substr or instr or whatever the database product calls it.
It's just substring and jOOQ will translate it for you.
But there are differences as well, right?
And JSON is a big topic, right?
Huge.
JSON?
Yeah.
Yeah.
Functions and, yeah.
It's crazy.
I mean, Postgres was one of the first to actually support JSON,
but it didn't actually...
Before the standard.
...standardize it.
So someone had to standardize it, Oracle did, a couple of years ago,
and now Postgres started implementing those standards.
From what I've seen, I'm still not engaged with that part yet,
but you'd think now that there's a standard, things have stabilized,
but still every database product does it differently.
It's crazy. The devil is in the details.
Yeah, and have you
like, do you need sometimes to check the
standard or like you don't need it at all?
I do it all the time, yeah.
If it's not about a vendor-specific feature,
the jOOQ API really tries to follow
the SQL standard in terms of syntax.
So a vendor-specific feature
would be Postgres's ON CONFLICT clause,
which I think was a mistake.
They should have done merge from the beginning
and not invent something new.
But now we have both.
And in that case,
jOOQ also models the ON CONFLICT clause
like Postgres did.
And then SQLite copied it
and a couple of others, I think.
But for instance, merge is a standard SQL feature
and many database products have some extensions,
which I then study, but I always first look at the standard.
What does the standard do?
Because that's a thing that's going to still be there in 50 years.
And different implementations might have their quirks,
which sometimes I hope they deprecate eventually
and move towards the standard. Because usually I think the standard is quite nicely...
Well, they behave differently. So many people got used to ON CONFLICT.
Yeah, I get the point of doing ON CONFLICT. I mean, it's simpler for
80% of the use cases, obviously.
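For readers who have not compared the two, here is a sketch of the same upsert in both styles, against a hypothetical account table. Note they are not fully equivalent: ON CONFLICT relies on a unique constraint and handles concurrent inserts differently than MERGE does.

```sql
-- Postgres-specific (later also adopted by SQLite and others):
INSERT INTO account (id, balance)
VALUES (1, 100)
ON CONFLICT (id) DO UPDATE SET balance = EXCLUDED.balance;

-- Standard SQL MERGE (supported by Postgres since version 15):
MERGE INTO account AS a
USING (VALUES (1, 100)) AS v (id, balance) ON a.id = v.id
WHEN MATCHED THEN UPDATE SET balance = v.balance
WHEN NOT MATCHED THEN INSERT (id, balance) VALUES (v.id, v.balance);
```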
Yeah. Also, MERGE only recently received support for the RETURNING
clause, which is not standard at all, right? As I remember.
The SQL standard has the data change delta
table, which was implemented by DB2. And then
some minor database products like H2 implemented this as well. And it's
not as powerful as returning, especially
because it only allows you to
fetch either the data
before or after
the updates. So for update statements,
which is also part of merge,
you maybe want to return both
versions, the version before and after
the update. With
returning, in principle, that's
feasible. so Oracle implemented
this recently.
Did Postgres
allow for accessing both
versions of the row?
I don't remember as well.
I'm only returning...
Well, if you delete, definitely
you have ability to return
old thing you deleted.
Well, there's only the old thing.
I mean, everything is deleted.
Let me quickly check.
Why not?
But with updates, I mean, SQL Server always supported both versions,
so before and after.
And DB2 with the standard syntax supports only,
you have to choose which one, but you can't return both.
And Oracle 23.ai now supports both as well, like SQL Server.
Oracle uses the Postgres syntax,
the non-standard syntax, also with the returning
keyword, so
maybe you can steal it from there,
if you don't do it yet
in Postgres. I think in Postgres
we cannot, for updates, we cannot
return old
data. Yeah, we have access to it
in triggers, but it's different, right?
But, yeah, so
only the
new value can be seen there.
But for deletes, it's... Probably more useful.
Well, I can imagine some cases
when we want to report, for example,
what happened.
Yeah, when you do audit, you want to
have both. Yeah, maybe
let's add it, right?
The standard supports this
data change delta table, and it accepts a MERGE statement as well. So in principle, in the standard,
you could have RETURNING with a MERGE. But it feels like more of a heavyweight approach in terms
of manipulation. RETURNING is just one line, right? Right. RETURNING *, that's it. Yes, yes.
Super easy.
It seems easier, yeah.
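To make the comparison concrete, a sketch of both forms against a hypothetical account table (the first uses the standard's data change delta tables, as implemented by DB2):

```sql
-- Standard SQL / DB2 style: wrap the DML in a delta table; you choose
-- OLD TABLE or FINAL TABLE, but not both.
SELECT id, balance
FROM FINAL TABLE (
    UPDATE account SET balance = balance + 10 WHERE id = 1
);

-- Postgres style: non-standard, but one line.
UPDATE account SET balance = balance + 10 WHERE id = 1
RETURNING id, balance;
```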
Also, there are tons of limitations.
I mean, you put your data change delta table inside of a select query,
but then you can hardly do anything with that select query.
For instance, I'm not sure exactly what is forbidden,
but unions, I think, are forbidden and joins are forbidden.
Maybe even aggregations, I'm not sure anymore. But you're very limited with what you can do. So I'm
not sure if there's any benefit of allowing that in the select statement as opposed to just returning
but ultimately it's kind of the same thing as what Postgres does when you put DML statements in
WITH. So that kind of behaves the same way,
at least from a user perspective.
Speaking of WITH, I know jOOQ supports CTEs.
Does it support recursive CTEs?
Yeah, yeah.
Good. Okay.
So everything, lateral join and so on, everything.
Yeah, yeah.
Cool.
There's a high chance that if SQL supports something
and it's not really very funky like the MATCH_RECOGNIZE clause, then
jOOQ will support it as well. So MATCH_RECOGNIZE
could be supported as well
in jOOQ, but it's such an edge case
and still only supported in Oracle
and some very esoteric
database products that jOOQ doesn't support yet.
So I'm skipping
this for now. But WITH,
everyone uses WITH, so jOOQ
supports it as well.
Including recursive WITH. I mean, there are some table-valued functions, like generate_series in Postgres,
which have to be emulated elsewhere. So if you ever want to (probably hardly anyone does that)
migrate from Postgres to whatever, and you want to translate your old generate_series queries, then jOOQ will translate that to a recursive WITH.
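As an illustration of that kind of translation, here is generate_series next to one possible recursive-WITH emulation of it (a sketch of what a translator could produce, not necessarily jOOQ's exact output):

```sql
-- Postgres:
SELECT i FROM generate_series(1, 5) AS t (i);

-- A standard-SQL emulation for dialects without generate_series:
WITH RECURSIVE t (i) AS (
    SELECT 1
    UNION ALL
    SELECT i + 1 FROM t WHERE i < 5
)
SELECT i FROM t;
```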
Cool. Nice, nice.
In fact, I feel like you skipped over that a little bit,
but that's such a cool feature that basically...
Translation?
Yeah, exactly.
We can get access to standard features that Postgres hasn't implemented,
or whatever database we're using hasn't implemented yet
because you do the transformation, I guess.
How do you, translation, did you say?
Yeah, I call it translation.
So there's even, you can use jOOQ as a pure translation product.
Some customers actually do that.
Mostly when they migrate from Oracle to Postgres,
you can just embed jOOQ as a JDBC driver into your Java application,
and you don't actually have to use jOOQ directly.
So let's say this is a legacy application,
and you use JDBC or something JDBC-based like Hibernate or MyBatis,
and you just put jOOQ in the middle as a translating layer,
and probably 80% of your queries will work directly on Postgres.
Of course, there's always edge cases, and you have to do testing.
But at least your migration project
will go much, much faster, because you can directly
use Oracle queries on those other database products.
Like for instance, Oracle has connect by,
and that can be translated to with queries, with recursive.
That's very hard, so some cases don't work.
But maybe many cases do, and you don't have to actually look at those anymore.
Oracle's special outer join syntax can be translated to left joins in standard SQL,
so your migration just goes much faster.
You can also use it not as an embedded library, but just on the website, which is free,
or you use it locally as a CLI tool, where you just have input files,
and jOOQ translates those directly,
and you can manually check if it's correct.
So translation is a, I'd say, minor use case.
A lot of companies work with only one database product,
and maybe in 10 years they might reconsider their choices and migrate,
but even then, most companies stick with
the database product they started with.
And so, many
jOOQ users don't actually use this feature.
But some do.
May I ask a slightly off-topic
question about JDBC, since you
spent a lot of time with Java, obviously?
There is such a thing as
extra_float_digits, right?
Extra what?
extra_float_digits.
Okay, there is such a setting,
and by default
it's zero in Postgres,
or one...
yeah,
no, one.
It's one.
I've never heard of it.
okay
but
yeah
I remember some
bugs
in my code
and I usually
try to use
pure SQL and psql, if I can.
I remember that Java developers implemented
the very same thing, same queries, and the behavior was very different.
And then I noticed that DBeaver, who else,
and other Java-based IDEs also behaved the same way
the Java application did.
And it was not the same as I saw in psql.
It happened with this extra_float_digits.
Okay, if you didn't notice this, this is maybe like some very edge corner case.
But I wonder, like, why JDBC keeps a different default?
There's, like, layers of some defaults.
This can be very annoying.
But, okay.
Well, the main thing that you always see,
the main difference when you work with JDBC
as opposed to psql is,
in psql you usually don't work with bind variables,
so you don't have the whole kind of effect
that you get when you have bind variables,
although in Postgres this might not be that big of a difference. But when you work with JDBC, you always have prepared statements and bind variables. And let's say my background is mostly Oracle. So
with bind variables, you have an execution plan cache, and the cache will store some assumptions
about your bind variables, like a normalized distribution and expectations regarding cardinalities, etc. So your actual bind variable,
if your data set is skewed, may be completely different from
what Oracle expects, and then the execution plan will be bad. But from what I take, Postgres
doesn't have this problem because it doesn't have this feature. So this would be one of the main differences.
But you were referring to some logical error, not...
Yeah, the default and the values returned are different from what I see in psql.
Yeah, but I've never heard of this.
Well, okay.
Maybe it's already fixed, by the way.
It was like five years ago or so I saw it.
So, yeah.
Good. Well, on the topic of JDBC, do you have any issues with it? It seems like a remarkable
piece of software that's just chugging along, but equally it doesn't get
much love. Like, I generally only hear negative things about it, but it must be amazing
given how much it's used. I think it's the best API that Java has,
even better than collections and everything else.
It's really, I mean, okay,
maybe they stole it from Microsoft from ODBC.
It's kind of the same thing.
But it's really the best thing that Java has
because it's such a good standard.
I mean, everything can build on top of it.
And you don't even have to write SQL queries.
So there's, for instance, Neo4j,
which is a NoSQL database with their own query language, and you can use JDBC with that database as well, as long as you want to have some tabular results. And it's a very good abstraction
of the network protocol layer. And the reason why people hate it, it's not targeted at developers,
so you can use it directly if you want. But it's a bit
hard to use. You have a lot of objects. You have the connection object, and you have to create a
statement object, and you have to remember to close it. And you have to remember to create a
result set object and close that as well. And it's not integrated with the rest of the JDK libraries.
Like, for instance, when you iterate the result set, you have your own API methods instead of
just a list or iterable. So you can't use the for-each loop on a result set, for
instance. So the ergonomics of JDBC is not up to date. I once tried to convince the spec expert
group, it's a mailing list, to update their API. There's a couple of things, I think at the
time it was Java 8, but even now, that you could
improve without changing anything fundamental in JDBC. For instance,
you can skip the statement part. A lot of times you don't actually have to think about preparing
the statement explicitly. You have to do it when you want to reuse the prepared statement.
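The object juggling described above looks roughly like this in plain JDBC. This is a sketch assuming a configured DataSource and a hypothetical book table; it uses try-with-resources so the three objects get closed, which is exactly the ceremony being criticized.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

// Connection, statement, and result set each need explicit lifecycle
// management, and ResultSet has its own cursor API rather than
// implementing Iterable, so no for-each loop is possible.
void printTitles(DataSource ds) throws Exception {
    try (Connection con = ds.getConnection();
         PreparedStatement stmt = con.prepareStatement(
                 "SELECT title FROM book WHERE published_in > ?")) {
        stmt.setInt(1, 2020);
        try (ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getString("title"));
            }
        }
    }
}
```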
So you can save some time in case there's a resource or some cache connection
directly. But a lot of times that's not the case in some database products. And a lot of times it
doesn't even matter at all. So you could just have a connection and run the query. So that's a minor
improvement that would just reduce one step for each statement you write. Yeah, but Oracle couldn't allocate any resources to such a project, even if the people on the
list agreed that many ideas were reasonable.
So I guess that's it.
It's an amazing piece of software for integration products like jOOQ or Hibernate or MyBatis
and everything else to build on top of, and all the drivers, they just work.
So a database company can just publish a driver,
and it will work in all kinds of software.
For instance, if you work with dBeaver,
it just works with all the database products, right?
So you don't have to think about,
how do I connect to this new database thing,
like ClickHouse or DuckDB or whatever?
It just works.
All these database products, they just work
because they all use the same API in Java. So it's really a wonderful piece of technology, I think. So I totally agree.
Yeah, this is exactly where I saw this problem, comparing... I mean, in DBeaver, comparing to
Java, and I realized, okay, it's also a Java application. So yeah, good.
Yeah, but that was probably
just the driver bug.
I don't think it has to do
with JDBC as an API.
I mean, obviously,
the drivers can have bugs.
Right, right.
So you see bugs often, right?
I mean, in these...
Yeah, not necessarily
in the drivers.
I mean, the drivers,
they really solve
a very basic problem.
So it's mostly about
network connections and these kind of things that no one actually cares about unless you
have to optimize something like multiplexing or whatever. But I think this stuff kind of just
works. And when I find bugs, it's more SQL related. So it's inside of the database. And
I do find a ton of them. Very rarely in Postgres, I have to say. I think
in my career I found two or three bugs in Postgres only.
Good.
As opposed to hundreds and others.
We didn't pay you to say that.
No. It's really very good. It's surprisingly good. I mean, also absolutely
no regressions at all. So if there's a bug, it's an edge case that no one thought about,
and it's really a very weird kind of thing.
By the way, how do you check regression?
Just conversion from...
In the database products?
Yeah, for jOOQ, for example, some code is written,
and then you just check the conversion to SQL,
or you check results as well?
No, you can't just assert the SQL statement.
I mean, jOOQ's integration tests have a standard database, which has about five tables.
It's very simple.
So there's a bookstore with books and authors and it's not a big database.
So it has four books and two authors, four languages, this kind of thing.
But then I just know there are four books and they have these titles and every kind of query has to return exactly the same result.
So there are so many queries being run in each database product, maybe about 10,000 queries, I think, and also update statements and all kinds of statements that make the assumption that if you have this kind of input database,
then this must be the output.
Or also, if I create a table with jOOQ,
then it has to have these properties,
and throughout the API it has to be the same thing
for every database product.
Right. I just wonder, earlier you mentioned that
usually database systems maintain backwards compatibility,
but sometimes it's broken.
For example, in Postgres 12, CTEs. Before, they were materialized, like optimization fences,
so every step is materialized before 12.
But in 12, default behavior changed.
So is it something that jOOQ should care about or not?
I don't think that's a logical regression. I mean, it was a conscious decision
and it only affects the performance.
There's no logical difference of the result.
Databases are allowed to do these kind of things.
I recall there was once a logical change
in the update statement in Postgres,
which was an incompatible change
when you update a row.
So you use the row syntax,
you use parentheses,
and you update two columns at once.
And suddenly Postgres required the ROW keyword,
but only there.
This was a very weird thing.
There was one more recently,
one more change related to subqueries.
I don't remember exactly.
It was a demonstration that included
generate_series, I remember.
And yeah, maybe it was like up to five years ago,
but there was some change which was logical as well.
But I understand, you're like the SQL standard:
don't care about indexes,
care only about the result.
Yeah.
Right. I see, I see.
No, I mean, jOOQ doesn't make any guarantees at all with respect to performance either. So
from a jOOQ perspective, jOOQ doesn't try to fix your SQL. If you write a really crap query with,
I mean, crap, what does it even mean? So if you write a bad query with a lot of nested subqueries,
et cetera, et cetera, in principle there's
nothing wrong with it, right? You will find hundreds of blog posts, especially from
old times when optimizers were really bad still, where people advised against using derived
tables, then they advised against using correlated subqueries. But there's nothing logically
wrong with these things, right? So you can nest as many levels as you want, as long as it's correct.
It's correct, and the optimizer should figure it out.
And who is jOOQ to judge you for your query style?
So jOOQ doesn't fix these things or even assert anything,
and it would be completely wrong for jOOQ to do these kind of things
because you want to write exactly this SQL,
and jOOQ should render exactly this SQL.
You have your reasons, right?
I understand. And my question is maybe like, I'm curious how do people usually approach performance?
For example, if some query is written using jOOQ,
with, like, a chain of select, from, blah blah, and
then it turns out
it's slow,
and then we find a fix
and we need to adjust something in the query.
If it was raw SQL...
Is it usually simple to adjust
the jOOQ version of a query?
Yeah, I mean there's no difference
I mean
you're probably going to use
I'm not sure anymore what the Postgres version is called,
but you have your statistics views in your database, your performance schema, and you query
those to see what went wrong. And then you make assumptions, you rewrite the query, and you
test again, maybe even using a benchmark. Of course, then the benchmark is written in Java, but
I don't see a difference when you write a jOOQ query or a native SQL query.
It's the same process.
I was watching a really good video by Kevin Davin, who did a talk on jOOQ,
but one of the things that he really liked about the product
was that it logs out to the console
the actual SQL statement that it ran.
When you use debug-level logging.
Yeah, which is super helpful for performance.
So if that was slow,
you can then run it in your favorite editor
in psql, wherever, with EXPLAIN ANALYZE, etc., etc.
Buffers.
Yeah, always buffers.
It's a hot topic this week.
But yeah, so that seems like a really developer-friendly way
of getting back to the SQL
and then diagnosing the performance issue.
This was a feature from day one.
My assumption was that when you develop, you have debug mode logging.
And when you go to production, you switch that to info or even warning.
So you're not going to have the performance penalty of re-rendering the query.
It's even formatted.
So you can actually see the query.
It's not just one line. When you execute the query, it's just one line of SQL
string, to have it more compact. But when you debug log it, the version contains the bind
variables. So you can just copy paste the whole query with the bind variables in a formatted way
and study it. I thought that was what everyone wanted. And also you get the first five records
of the result set also in the debug log, because that's probably also what you want while you develop, at least.
Yeah, in a pretty little table.
So you can immediately see what's going on.
And even then, I mean, one of the reasons why I implemented it this way
is when you have a huge result set with a lot of columns,
that bothers you while developing.
So you kind of start thinking, do I really actually need all these columns?
Because if you don't, then you have a better debug log.
So I kind of think you have to punish developers
as early as possible for their performance problems.
But I mean, jOOQ doesn't judge you.
jOOQ doesn't judge you.
It just gives you some tools.
It just gives you some convenience when looking at logs.
Okay.
Yeah, that's interesting.
So reduce number of columns.
Good.
Changing tack a little bit,
I think you've got
a really interesting perspective
on Postgres from a broader landscape
of SQL dialects
and databases in general.
What kind of things do you see
that are relatively unique to Postgres
or do you see people in the Postgres world particularly using or liking?
I mean, the one thing that is very unique to Postgres... well, two things, from my perspective.
The first one is it's very developer-centric.
So you can see that with other database products like Oracle, DB2 or SQL Server,
they're very production-centric, very operation-centric.
They used to base their entire sales on these kind of things.
So this means you have a lot of production tools as well,
which are great, actually.
So I'm still waiting for the likes of Oracle Enterprise Manager,
if you know it, for Postgres,
where you can analyze production workloads
and query what kind of workload you had five months ago,
and it's still there.
I mean, you have to pay for these kinds of extras,
but it's still there.
You can still analyze anomalies in terms of performance,
what happened five months ago.
So these kind of tools are a bit lacking,
but on the other hand, because it's so developer-focused,
you have this whole extension system,
which is unprecedented, and I haven't seen it since either.
So anyone can extend anything in any layer of the database product.
So you have these wonderful extensions like PostGIS, for instance,
which is not really an extension because it's also part of the SQL standard,
but it has been implemented as an extension just, I guess, to show what is possible.
And such a huge additional product can be made as an extension.
And you have indexes and data types and all kinds of things
that other vendors have to either offer out of the box or they don't have it.
You can't extend Oracle, really.
So that's really something very, very amazing.
But as I said, from the other perspective,
from an operations perspective,
this is something that personally,
I think Postgres is still very much behind Oracle,
for instance.
So I'm an Oracle guy.
I can't really comment on SQL Server or others.
But the Oracle database, where I worked
before I made jOOQ,
it could easily handle very, very complex queries
that produced 500 lines of execution plan
on billions of rows,
and it would run in milliseconds.
And I don't think you can do that with Postgres.
I wouldn't risk it.
So we all remember the days
when there was a hard limit
on the number of joins in Postgres,
beyond which you didn't have any smart optimizer anymore
to reorder the joins in the right order.
I forgot what the limit was,
but I think you had to be very careful
if you have more than 10 joins or something like that,
because then you would just join from left to right syntactically,
and this is horrible.
join_collapse_limit, it should be eight.
Yeah, that's the join collapse limit.
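For reference, the planner setting being discussed can be sketched like this (the values shown are the documented defaults in recent Postgres versions; check your server):

```sql
-- join_collapse_limit caps how many FROM items the planner will
-- freely reorder; beyond it, joins are planned roughly in the
-- order they are written.
SHOW join_collapse_limit;    -- 8 by default
SHOW from_collapse_limit;    -- 8 by default

-- Raising it lets the optimizer consider more join orders for a
-- many-join query, at the cost of longer planning time:
SET join_collapse_limit = 12;
```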
So once you know these kinds of things,
you obviously make the right choices,
and you probably won't run into big issues.
But it's a very unique focus that I often missed in the past.
Oracle, for instance, completely ignored developer experience for decades.
You don't even have a Boolean type, right? This is hard for me to understand.
It's such a simple thing, you'd think, and so very, very useful, but Oracle wouldn't implement it until recently.
So this is a focus that Postgres has, and few of the others had. Even MySQL wasn't very developer-friendly from that perspective. So this is very unique.
If developers were the only ones to choose the database product, it would always be Postgres, because, I mean, you can just implement everything with Postgres. You can write stored procedures, you can have your own indexes, you can extend it yourself. Probably you don't have to, because someone already wrote an extension for whatever you need.
Oh, and the second thing is, it's amazing how standards-compliant it is. So a lot of times, Postgres really waits until something is standardized
before they implement it, instead of innovating.
Right now, we are waiting on UUID version 7 because of standards.
It didn't get into version 17 of Postgres because the RFC was not finalized.
Every library already started to support it, but Postgres decided to wait for the RFC,
which is a very conservative decision, right? Most of the times, yes. So I think that's a
good thing in the end, because the standard, in my opinion, in most parts is well written.
It's a very complicated document, of course, but I think in terms of syntax,
everything feels like SQL.
So some database products think they have to invent some syntax,
and it just doesn't look like SQL anymore.
It's just a very weird, funky thing that doesn't fit the rest of the language.
And the standard is very, very consistent, I think, in most parts.
And to wait for the standard to appear is a good thing.
So sometimes there's an exception to the rule; as I said, ON CONFLICT was one of these exceptions.
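For context, ON CONFLICT is the Postgres-specific upsert that predated the standard, while MERGE is the SQL-standard statement Postgres added later, in version 15. A rough sketch, using a made-up accounts table:

```sql
-- Postgres-specific upsert (available since 9.5):
INSERT INTO accounts (id, balance)
VALUES (1, 100)
ON CONFLICT (id) DO UPDATE SET balance = EXCLUDED.balance;

-- SQL-standard MERGE (in Postgres since version 15):
MERGE INTO accounts a
USING (VALUES (1, 100)) AS s (id, balance)
  ON a.id = s.id
WHEN MATCHED THEN
  UPDATE SET balance = s.balance
WHEN NOT MATCHED THEN
  INSERT (id, balance) VALUES (s.id, s.balance);
```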
JSON as well, which I think was something everyone needed while the standard was severely lacking.
And Postgres just did their own thing.
So I kind of think a lot of people kind of regret the heterogeneity of the JSON API.
So you have to always consult the docs to see if something uses a JSON path, which is a standard thing in the JSON world, or if it's just an array of path elements, or if this function returns a table or a set of something or whatever data structure. So I think now with the JSON standard,
this is going to be much cleaner
once it's completely rolled out.
So I'm not up to date
on where Postgres stands.
I'm still not supporting that syntax.
But from what I've seen,
it's going into the right direction.
Yeah, some parts support it.
Some parts, I think, are still pending.
Yeah.
Yeah, because I think a lot went in in 17.
Right, yeah.
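As one example of the standard syntax landing in Postgres 17, JSON_TABLE sits alongside the older Postgres-specific functions; a sketch with a made-up orders table and jsonb payload column:

```sql
-- Postgres-specific style:
SELECT elem->>'name'
FROM orders,
     jsonb_array_elements(payload->'items') AS elem;

-- SQL/JSON standard style (Postgres 17+):
SELECT t.name
FROM orders,
     JSON_TABLE(
       payload, '$.items[*]'
       COLUMNS (name text PATH '$.name')
     ) AS t;
```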
And speaking of this, I'm going to be really curious.
I saw a commit recently regarding temporal primary keys.
Is that correct?
And it is absolutely not the standard.
So this is very curious.
I'm very curious about this.
Are you aware of this commit?
I'm not.
It's still on a branch, I think.
It's not been merged yet.
But, I mean, Postgres has these range types.
Right.
And they're useful, obviously.
But the SQL standard since 2011
defined how you should write temporal databases,
both bitemporal, so you have system versioning
and then logical versioning.
And this is something that is still lacking in Postgres.
And I think a lot of people would like to see that.
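For reference, the SQL:2011 system-versioning syntax being discussed looks roughly like the following. This is a sketch of the standard's shape (implemented with variations by, e.g., Db2 and MariaDB), not something Postgres accepts:

```sql
CREATE TABLE employees (
  id        INTEGER PRIMARY KEY,
  salary    DECIMAL(10, 2),
  sys_start TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
  sys_end   TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
  PERIOD FOR SYSTEM_TIME (sys_start, sys_end)
) WITH SYSTEM VERSIONING;

-- "Time travel" against the retained history:
SELECT *
FROM employees
FOR SYSTEM_TIME AS OF TIMESTAMP '2024-01-01 00:00:00';
```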
Yeah.
In Postgres, there's an opinion that it's in the past, right?
Because it was there in the past, and it was removed.
Right.
So that's a nice temporal joke.
An extension, I think?
It was in the original Postgres, the original Berkeley version.
Oh, really?
This is how MVCC inherited from these
versions of tuples.
So your time travel
basically was supported very, very
long ago, but then it was removed.
So, like, interesting idea.
I think for performance reasons?
Yeah, so I think so.
But yeah, this is a super interesting specific topic. Sometimes we
do need to implement something with triggers and so on, which is very... yeah, it's very hard
to do manually. I think the standard helps a lot with... I mean, you can write an update
statement and it's automatically transformed into a delete and an insert. So if you have to split the
record logically, this is really a pain to do manually, right?
It's not a super pain, because you can have,
for example, Postgres, you know,
I remember your very old post,
everything is a table.
Or maybe it was a post of like 10 tips or something
and one of the tips was like,
consider everything as a table, right?
And in Postgres, you can collapse all columns
into a single record type and insert.
Like, for example, in JSON form. Right now I would insert it, and I would have a shadow table. I implemented this several
times: you have a shadow table which serves as an archive for all tables, which has a trigger
to track changes, right? And then you just track the timestamp and the action, like whether it was a delete or an update, who did it,
and then the whole row you pack into a single column, breaking first normal form.
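A minimal sketch of the shadow-table approach described here (all names made up; an untested illustration), packing the whole row into one jsonb column:

```sql
CREATE TABLE audit_log (
  logged_at  timestamptz NOT NULL DEFAULT now(),
  table_name text        NOT NULL,
  action     text        NOT NULL,  -- 'INSERT' / 'UPDATE' / 'DELETE'
  actor      text        NOT NULL DEFAULT current_user,
  row_data   jsonb       NOT NULL   -- the whole row, breaking 1NF
);

CREATE OR REPLACE FUNCTION audit_row() RETURNS trigger AS $$
BEGIN
  -- NEW is NULL for DELETE, OLD is NULL for INSERT
  INSERT INTO audit_log (table_name, action, row_data)
  VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(COALESCE(NEW, OLD)));
  RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER my_table_audit
AFTER INSERT OR UPDATE OR DELETE ON my_table
FOR EACH ROW EXECUTE FUNCTION audit_row();
```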
Okay, but then you normalize in a very bad way.
And secondly, this sounds like a lot of work.
It's not a lot of work.
It's like three, five lines of code, that's it.
But it's not possible in the jOOQ free version, because triggers are not supported, right?
Triggers are supported only in enterprise
version. I mean, if you want
to write the trigger with jOOQ,
yes, then you have to use the
commercial editions, but
hardly anyone writes triggers with
jOOQ. I mean, this is only useful
when you support five
database products, and you
want to write the trigger only once; then you write it
with jOOQ. But I mean, jOOQ
doesn't care if there's a trigger on the database table,
right? So if there's one,
it's transparent to jOOQ.
Yeah, okay.
You can't really say jOOQ doesn't support them.
It's just you can't create them
with jOOQ. Great answer.
Free version, you mean?
Yeah.
I'm very curious. I'm not against this.
It's good. I mean, it's great
that you build both
open source product on Apache 2
license and business.
I was just curious,
have you ever had
thoughts about going
full closed source or
fully open source?
Well, it was fully open source in the beginning.
So when I started this, I had no idea this could be a business.
I mean, I just had to try it and then put it out there for people to see
and give me feedback.
Is this a good idea?
Should I do it like that?
And there was a very active early community of people who worked with the
all-open-source jOOQ.
And then I started seeing even more and more banks and insurance companies using it, but never contributing; you know the story.
And I thought it was my fault.
Oracle users.
Let's generalize.
Oracle users.
Well, that was the choice I made, but it doesn't matter, you know?
It doesn't matter what they're using in the end.
I just never got any...
I mean, I got contributions, but I didn't care about code.
I cared about money much more than code.
So to make a sustainable business, in the end,
I thought there are models where you pay for support.
But in my opinion, if you want to make money with support,
then you have to earn your money by working for it.
So I earn money when I work. And if I have a license model, then I earn money when customers work, right? So I thought this was going to scale much better.
I like that a lot. I also think if you charge for support, you're incentivized to hurt the experience.
I wasn't going to say that, but that's how I think.
But I think it's not surprising that somebody who focuses so much on developer experience, and wants to keep pushing that forward and making that good, doesn't want to be incentivized to make it bad.
That makes perfect sense to me.
Yeah, I mean, that's my thought as well. I don't want to name companies or names, but I've heard some open source products
are very, very unusable
and you really need the support
and not all of them, of course.
I mean, Postgres and Linux are examples
where this isn't the case.
But there are some products that are super complicated
and they're just pushed out there
and you're free to use it,
but you won't figure
it out. So you have to call someone even to install it. And that's what I'm thinking. So
in those cases, you need a sales representative that pushes the product down someone's throat
and they make a lot of money with the support contract. And I think it's less honest than
licensing, which people think about, okay, is this a good product?
Should I really spend money on it?
Because I have to pay it up front, right?
From the day one, I have to pay money to use this product.
And they're going to test it really, really, really well.
And there's a lot of competition.
So, I mean, people could use Hibernate or whatever.
But people choose jOOQ despite the cost, which in the end is very cheap for a company.
So, I think cost is never the issue.
But they're still going to evaluate it because they have to still go through purchasing and make it a business case for the purchase.
So I really want to have a very, very good experience.
And one part of this is I also always answered all the questions on Stack Overflow within a short period of time, because if I answer it, it's going to be authoritative and I'm going to answer with all the details and all the caveats and all the edge cases.
So anyone else who finds the question will find this question and the answer and they're not going to ask me again, right?
So Google is my friend, and now ChatGPT is my friend as well, because ChatGPT can answer jOOQ questions very well
because of all the answers I've given on Stack Overflow.
Yeah, open knowledge, right?
Yeah.
I don't want to earn money with knowledge.
I mean, I could sell books and all these kind of things,
but it's just the wrong kind of effort, I think.
Distraction, right?
Yeah.
From the core.
I wonder, for the enterprise,
like, paid versions,
is the source available,
or is it kind of...
Yeah, you get the source code.
Oh, that's great.
And the right to modify,
so you can,
if you have a bug and...
You can fix it yourself
and propose a fix.
No,
or maybe it's not even a bug,
but you think it's a bug,
but it's actually not.
So,
if you disagree, you can fix it, and some companies do that.
This is what we like in commercial software: full transparency, and you can see how it works inside.
Yeah. I mean, most people don't actually fix stuff, but maybe they want to debug stuff to better understand it.
And if you've ever worked with a commercial JDBC driver like I do, with the Oracle driver, it's always like,
oh my God, there's a null pointer exception in the middle of the driver,
and I have no idea what I'm doing wrong from an API perspective.
So I have to reverse-engineer it, not with the code,
but just with trial and error with API calls, until I find out why the exception,
why the bug inside of the driver, happens.
And with source code, it would be much easier, right? So bugs happen, but if you know exactly why the bug happens, then you have a
much easier life than if you have to guess. Yeah. Last question from me. What do you think about
nulls in general? Nulls? Nulls, yeah. Are they a blessing or a curse?
I mean, I understand why they have been introduced, because it was easy to do, right?
And, I mean, there are two takes here.
So nulls in most languages are mostly a pain, because they lead to exceptions.
And null in SQL is mostly a pain because it doesn't lead to exceptions;
it's just a value like many others, but it's a special value, and it just propagates.
It's these kind of philosophies
and both are right and wrong.
But what else would you want to do, right?
I mean, you kind of need an absent value, right?
I'm not sure if the idea of the unknown value is so useful,
but an absent value is very useful.
I mean, sometimes you just, you can't normalize everything to sixth normal form to avoid nulls,
right?
So it's very pragmatic.
It's very pragmatic indeed.
Yeah.
But in SQL, we have so many places where people make mistakes all the time.
Yeah.
SQL is weird about it.
I mean, this unknown value... I used to do a SQL training, and about 30 minutes of it was just nulls
and how they behave differently here and there.
For instance, two nulls are not distinct, right?
Yes.
But they're not the same either.
So there are different terms in the SQL standard,
same and distinctness.
And then even operators are inconsistent.
So personally, I think Oracle did the pragmatic thing with concatenation:
when you concatenate a null value to a string, you probably don't want the
result to be null, I think.
You would just want to ignore the concatenation.
This is pragmatic, but it's very illogical and leads to a ton of side
effects, which probably means that it was a bad idea to start with. And there are nulls in other languages, I mean, in regular application languages.
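A quick illustration of the concatenation difference (Postgres shown; the Oracle behavior is noted in comments):

```sql
-- Standard semantics (what Postgres does): null propagates.
SELECT 'Hello, ' || NULL;        -- NULL

-- The concat() function ignores nulls instead:
SELECT concat('Hello, ', NULL);  -- 'Hello, '

-- Oracle treats NULL as the empty string in ||, so there
-- 'Hello, ' || NULL yields 'Hello, '.
```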
And I remember how I struggled trying to explain this three-valued logic and
observing what people do in code, in regular application code, with nulls. They do things I
would never do in SQL, right? So different philosophies, right?
Yeah, there are different philosophies.
But I'm not sure if the SQL philosophy is really useful.
I mean, I'm trying to explain how it works by explaining NOT IN, the NOT IN predicate.
Oh, yeah.
When you have a null and not in, everything falls apart.
I mean, it's very logical and consistent.
You can explain it with three-valued logic, but it's never useful, right?
This kind of invalidates the whole predicate.
Yeah, it's a bomb.
It's a bomb.
Yeah.
And, I mean, usually people don't put a null value
in a not-in list, but what about a subquery?
So not-in select something,
and then you have a null value in there,
and you don't get results.
Why?
It's horrible.
Horrible.
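The trap, sketched with made-up tables t and u:

```sql
-- With a NULL in the list, NOT IN can never be TRUE:
SELECT * FROM t WHERE t.id NOT IN (1, 2, NULL);
-- t.id NOT IN (1, 2, NULL)
--   =  t.id <> 1 AND t.id <> 2 AND t.id <> NULL
--   =  ... AND UNKNOWN
-- so no row ever qualifies, whatever t.id is.

-- The same happens when a subquery returns a NULL;
-- NOT EXISTS does not have this problem:
SELECT *
FROM t
WHERE NOT EXISTS (SELECT 1 FROM u WHERE u.id = t.id);
```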
But it's a fun anecdote.
We had a whole episode about nulls with Michael long ago.
This topic is definitely
it's not going to
go away and I'm quite
sure I will make more mistakes with
nulls in my life. But maybe the
one thing I really think SQL should have done
differently, and there are some database products
that actually do it differently,
is the default data type should be not null, right?
So when you create a table, when you don't write anything at all,
then it should be not null.
The fact that the default is nullable is wrong. I like that.
This forces you to define default.
Or, well, no, no, no.
The default word is confusing.
You mean if I create
id int8,
I shouldn't have to say
not null? Yes.
You should have to say, I want
this to be nullable,
because then it's a conscious decision, and
you know what you're doing. But the fact that
maybe by accident you have something
nullable, that's just
wrong.
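A sketch of what Lukas is objecting to in SQL as it stands; the second table is how he'd prefer declarations to read, with nullability as an explicit opt-in:

```sql
-- What you write:
CREATE TABLE t1 (id int8);
-- What you get: id is NULLABLE, silently.

-- The argument: you should have to opt in to nullability.
CREATE TABLE t2 (
  id   int8 NOT NULL,  -- what the default arguably should be
  note text NULL       -- a conscious decision to allow absence
);
```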
And this isn't standard, right?
Or if you remember.
I don't remember.
I think maybe the standard says it's implementation specific.
We definitely can't
change it now though.
No, you can't.
But I remember Sybase is one of the
database products that does it differently
and ClickHouse as well.
Interesting.
Another one I forgot.
So in ClickHouse, every column is by default not null?
Yes.
That's cool. Okay.
Although they messed it up as well,
so you can still insert null in a not null column.
You just get a zero.
What?
It makes it worse.
Okay.
That's terrible.
Amazing.
I think this is
a performance idea.
You actually,
if you just ignore
the check,
you kind of don't
check for nulls,
then you're faster,
but whatever.
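A sketch of the ClickHouse behavior being described; the exact outcome depends on settings such as input_format_null_as_default, so this is an untested illustration:

```sql
-- In ClickHouse, columns are non-nullable unless declared Nullable(...):
CREATE TABLE t (x Int32) ENGINE = MergeTree ORDER BY x;

INSERT INTO t VALUES (NULL);  -- may be coerced to the default, 0
SELECT * FROM t;              -- 0, not an error, depending on settings
```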
Lukas,
I'm super conscious
of your time.
Is there any last
things you wanted to say?
I hope there are
many, many more
exciting Postgres releases in the near future.
I'm looking forward to all the good stuff that's coming,
especially if it's ever coming, temporal tables from SQL 2011.
That would be very exciting.
Nice. Okay, cool.
Thank you for coming. I enjoyed it.
Thank you for having me.
Yeah, very much so.
And catch you next week, Nikolai.
Bye-bye.
Have a great week.
Bye.
Bye.