Postgres FM - jOOQ

Episode Date: December 13, 2024

Michael and Nikolay are joined by Lukas Eder, the creator of jOOQ, to discuss what it is, some nice developer experience features it has, and some fun things he's come across from a Postgres perspective. Here are some links to things they mentioned:

- Lukas Eder https://postgres.fm/people/lukas-eder
- jOOQ https://www.jooq.org/
- DSL https://en.wikipedia.org/wiki/Domain-specific_language
- SQL Dialects https://www.jooq.org/javadoc/latest/org.jooq/org/jooq/SQLDialect.html
- MERGE https://www.postgresql.org/docs/current/sql-merge.html
- match_recognize https://modern-sql.com/feature/match_recognize
- jOOQ, joy of SQL (talk by Kevin Davin) https://www.youtube.com/watch?v=8Ej47GZX9D8
- BUFFERS enabled for EXPLAIN ANALYZE by default (commit for Postgres 18) https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c2a4078ebad71999dd451ae7d4358be3c9290b07
- PostGIS https://postgis.net/
- 10 SQL Tricks That You Didn't Think Were Possible (blog post by Lukas) https://blog.jooq.org/10-sql-tricks-that-you-didnt-think-were-possible/
- jOOQ questions on Stack Overflow https://stackoverflow.com/questions/tagged/jooq
- Our episode on NULLs https://postgres.fm/episodes/nulls-the-good-the-bad-the-ugly-and-the-unknown

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is produced by:
- Michael Christofides, founder of pgMustard
- Nikolay Samokhvalov, founder of Postgres.ai

With special thanks to:
- Jessie Draws for the elephant artwork

Transcript
Starting point is 00:00:00 Hello, hello, and welcome to Postgres FM, a weekly show about all things PostgreSQL. I am Michael, founder of pgMustard. I'm joined, as usual, by Nikolay, founder of Postgres AI. Hey, Nikolay. Hey, Michael. How are you? No, not falling for that. And we are joined by a wonderful guest, Lukas Eder, founder of Data Geekery and the creator
Starting point is 00:00:18 of jOOQ, a very popular framework. Thank you so much for joining us, Lukas. Thank you for having me, Michael. All right. It's our pleasure. So how do you describe jOOQ and what's it helping folks with? So jOOQ is an internal domain-specific language written in Java where it models the SQL language directly as a Java API. This helps Java developers to write SQL queries more efficiently and more correctly within their Java applications. There's a code generator that generates the entire schema as Java objects,
Starting point is 00:00:49 and you can interact with those objects and create type-safe queries. So that's the base of it. Yeah, super cool. But I'm super impressed with how much you focus on developer experience in general. I hear quite a few people say they really love Duke, and they give about 15 different examples of things they love about it. But what are the main things you think that people particularly like about it? Well, the first thing is, and I've studied the market extensively before
Starting point is 00:01:15 making jOOQ, and the first thing is, it really looks as much like SQL as it's possible to do when you model an internal DSL API in Java. So of course there's some limitations, but when you have in mind there's a GROUP BY clause, you just start typing group by where you expect it and then it just starts compiling. So there's almost no surprises in terms of SQL syntax. So if you know SQL, then you immediately know the jOOQ API.
Starting point is 00:01:41 You don't really have to learn it. I guess that's one of the most popular things. But other than that, it's also very nicely integrated into the Java language or Kotlin and Scala. That works as well. And if you've ever coded a PL SQL in Oracle or a PGPL SQL in Postgres, you kind of like the way how the SQL language is embedded in the procedural language. So if you loop, if you iterate over a result set in those procedural languages, you have the same kind of type safety and you have the possibility, for instance, to insert bind variables into your statements at the right spot
Starting point is 00:02:17 and you don't have to worry about the infrastructure and the logistics of connecting to the database. And it just feels embedded. And I think that's what a lot of people really want to do, even more than having the type safeties, this feeling of embeddedness into the target language. And when you map your result sets to Java objects, that also feels very native. So it feels like the database is part of the Java, of the JVM.
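The idea described above, typing each clause where you would expect it in SQL and having only valid clause orders compile, can be sketched minimally in plain Java. This is a toy illustration of the "internal DSL" style, not the actual jOOQ API; all names here are made up.

```java
// Tiny sketch of the "internal DSL" idea: interfaces that return interfaces,
// so the compiler only accepts clauses in SQL order
// (select ... from ... where ... group by). Illustrative only.
public class FluentSql {
    public interface AfterSelect { AfterFrom from(String table); }
    public interface AfterFrom { AfterWhere where(String condition); String sql(); }
    public interface AfterWhere { String groupBy(String column); String sql(); }

    // One builder object implements every step; the interface returned at
    // each step restricts which clause may legally come next.
    private static class Builder implements AfterSelect, AfterFrom, AfterWhere {
        private final StringBuilder sb;
        Builder(StringBuilder sb) { this.sb = sb; }
        public AfterFrom from(String t) { sb.append(" from ").append(t); return this; }
        public AfterWhere where(String c) { sb.append(" where ").append(c); return this; }
        public String groupBy(String col) {
            return sb.append(" group by ").append(col).toString();
        }
        public String sql() { return sb.toString(); }
    }

    public static Builder select(String... columns) {
        return new Builder(new StringBuilder("select " + String.join(", ", columns)));
    }

    public static void main(String[] args) {
        String sql = FluentSql.select("author_id", "count(*)")
                              .from("book")
                              .where("language_id = 1")
                              .groupBy("author_id");
        System.out.println(sql);
        // Note: select("x").groupBy("y") would not even compile, because
        // groupBy is only reachable after from, just like in SQL.
    }
}
```

The point of the sketch is the return types, not the string building: each method hands back a narrower interface, which is what makes autocomplete suggest only the clauses that are valid next.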
Starting point is 00:02:42 Yeah, even to the point of getting, like, autocomplete suggestions in IDEs. That is super nice. Yeah, there are little details like this. So when you say autocomplete, there's also the possibility to comment on your database tables and comments, and that translates to JavaDoc.
Starting point is 00:02:59 So whatever you comment directly in the database, it's translated to documentation inside of your Java program. You hardly ever have to go to the database to see what's there. Nice. Yeah. In terms of the history, I did look it up in, well, at least the GitHub history. How long have you been working on it and how does that feel?
Starting point is 00:03:20 I think the first public version started in 2009. There were prototypes before that in 2008, I think, but 2009 was the first public version. Not on GitHub then, I don't think GitHub was very popular at the time, so it was on Subversion. The idea goes back to my studies at EPFL in Lausanne. There was a professor who said you could model any kind of programming language as an internal DSL. I'm not sure if he used that term yet. I think the term DSL was coined more recently. But he said you could model any language in terms of an API if you just have Java interfaces that return interfaces that return interfaces. And this kind of idea stuck with me for all this time, but I didn't have a use case back then yet. And when I started working in the industry in 2006,
Starting point is 00:04:15 it was my first Java job. I did some PHP work before, but Java started in 2006. I've seen that all the companies that I've applied to and even the one that I worked at, they implemented dynamic SQL libraries because this was something that everyone needs, right? So everyone has dynamic SQL queries when you have a UI with various input boxes and they're all optional. So you have to compose your SQL queries in a dynamic way. And everyone wants to avoid string concatenation because it's not safe and it's boring.
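The dynamic-SQL problem Lukas describes, optional UI inputs composed into one query without string concatenation of user values, can be sketched in plain Java. The class and method names here are illustrative helpers, not the jOOQ API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "dynamic SQL" problem: optional filters from a UI are
// composed into one query, with user values kept as bind variables rather
// than concatenated into the SQL string. (Hypothetical helper, not jOOQ.)
public class DynamicQuery {
    private final List<String> conditions = new ArrayList<>();
    private final List<Object> bindValues = new ArrayList<>();

    // Add a condition only when the UI actually supplied a value.
    public DynamicQuery filter(String condition, Object value) {
        if (value != null) {
            conditions.add(condition);   // e.g. "title = ?"
            bindValues.add(value);       // value travels separately, as a bind
        }
        return this;
    }

    public String toSql(String table) {
        String where = conditions.isEmpty()
            ? ""
            : " where " + String.join(" and ", conditions);
        return "select * from " + table + where;
    }

    public List<Object> binds() {
        return bindValues;
    }

    public static void main(String[] args) {
        DynamicQuery q = new DynamicQuery()
            .filter("title = ?", "SQL Basics")
            .filter("author_id = ?", null);   // absent filter: clause skipped
        System.out.println(q.toSql("book"));
        System.out.println(q.binds());
    }
}
```

Every in-house query builder Lukas mentions solves roughly this problem; the difference with a true DSL is that the API also mirrors SQL's own shape.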
Starting point is 00:04:46 So they did that for both. They had one for SQL and they had one for Ibernit query language, which is essentially the same thing, just more limited, but the same concept. And no one actually thought about making this a true DSL. It was always just a query builder with some weird API. So you had Java-style methods
Starting point is 00:05:06 where you add clauses to the query object, and it didn't feel like SQL. It felt like a tool to solve exactly this dynamic SQL problem, but it didn't feel like you didn't enjoy using this thing. I think JPA still has this thing with their criteria API, which is still very useful if you want to do dynamic JPQL. But I've never heard anyone enjoy using that API because you have to learn it. So you have to learn one more thing. And it only serves this one purpose, or at least that's what it feels like.
Starting point is 00:05:39 It serves this purpose of dynamic JPQL. And that's when I started prototyping it. And in the very beginning, I had to first implement the infrastructure. It wasn't a very diesel-style API either. So I created a query builder that could do some select from where, very simple conditions. But then I started to really implement this idea because I remembered, okay, this professor, he had this idea,
Starting point is 00:06:06 and I'm going to try this. Is this really possible? And it turns out it was, and it's really crazy how many SQL statements today we have in jOOQ that are all using this kind of approach, where you just start with the same object, and then you start typing, and you autocomplete your SQL syntax. And it's all one huge graph of methods that helps you construct your SQL queries. So jOOQ has come
Starting point is 00:06:31 a long way since 2009, I'd say. Yeah, right. I was looking in your documentation about how many different, how many different, you call them families of SQL dialects that you support. So how many different databases, but also the different versions of different databases that may or may not support different syntax and SQL features. How do you maintain that? Well, first off, luckily, the database products are all very, very backwards compatible. If we take one or two of them, we don't really care as much. But most of them really
Starting point is 00:07:05 value backwards compatibility a lot. So supporting new versions is just looking at the new features and adding support for them, but the old stuff still works. So at least that's already covered there, more or less. But other than that, I mean, supporting so many dialects per se is, yeah, a lot of work and a lot of testing, of course, a lot of automated regression testing to make sure nothing breaks. And also, jOOQ really values backwards compatibility a lot, just like the database products. So when you embed a jOOQ query into your database application, you don't want stuff to break.
Starting point is 00:07:38 So if I'm not allowed to break stuff, I'm probably not breaking stuff. So this kind of simplifies maintenance as well, because stuff that once works is probably not going to break just like that. I mean, there's some internal refactorings, but there are no really major, how to put it, major paradigm shifts where everything works in a different way now, and I was breaking everything. So with automated regression tests, it's hard, but it's not that hard, I think. The hard part is just to find out some weird syntax that implements a standard feature on this particular dialect,
Starting point is 00:08:18 and that's quite creative at times, especially with the newer dialects that have just been released recently that are not that mature yet. Or some dialects are very opinionated. So you kind of have to think in terms of why did they make those decisions and how can I map standard SQL onto this database? But I think that's one of the things that people really like when they work, especially
Starting point is 00:08:44 when they work with different database products. So if you have a company that works with maybe Postgres and SQL Server, you don't actually have to think about the differences all the time. So you can just write substring in the jOOQ API and you don't care if it's substr or instr or whatever the database product calls it. It's just substring and jOOQ will translate it for you. But there are differences as well, right? And JSON is a big topic, right? Huge. JSON?
Starting point is 00:09:14 Yeah. Yeah. Functions and, yeah. It's crazy. I mean, Postgres was one of the first to actually support JSON, but it didn't actually... Before standard. ...standardizing stuff.
Starting point is 00:09:25 So someone had standardized it, Oracle did, a couple of years ago, and now Postgres started implementing those standards. From what I've seen, I'm still not engaged with that part yet, but you'd think now that there's a standard, things have stabilized, but still every database product does it differently. It's crazy. The devil is in the details. Yeah, and have you like, do you need sometimes to check the
Starting point is 00:09:49 standard, or you don't need it at all? I do it all the time, yeah. The jOOQ API, if it's not about a vendor-specific feature, really tries to follow the SQL standard in terms of syntax. So a vendor-specific feature would be Postgres' ON CONFLICT clause,
Starting point is 00:10:06 which I think was a mistake. They should have done MERGE from the beginning and not invented something new. But now we have both. And in that case, jOOQ also models the ON CONFLICT clause like Postgres did. And then SQLite copied it
Starting point is 00:10:21 and a couple of others, I think. But, for instance, MERGE is a standard SQL feature, and many database products have some extensions, which I then study, but I always first look at the standard. What does the standard do? Because that's the thing that's going to still be there in 50 years. And different implementations might have their quirks, which sometimes I hope they deprecate eventually
Starting point is 00:10:45 and move towards the standard. Because usually I think the standard is quite nicely... Well, they behave differently. So many people got used to ON CONFLICT. Yeah, I get the point of doing ON CONFLICT. I mean, it's simpler for 80% of the use cases, obviously. Yeah. Also, MERGE only recently received support for a RETURNING clause, which is not standard at all, right? As I remember. The SQL standard has the data change delta table, which was implemented by DB2. And then
Starting point is 00:11:19 some minor database products like H2 implemented this as well. And it's not as powerful as returning, especially because it only allows you to fetch either the data before or after the updates. So for update statements, which is also part of merge, you maybe want to return both
Starting point is 00:11:38 versions, the version before and after the update. With returning, in principle, that's feasible, so Oracle implemented this recently. Did Postgres allow for accessing both versions of the row?
Starting point is 00:11:53 I don't remember as well. I'm only returning... Well, if you delete, definitely you have ability to return old thing you deleted. Well, there's only the old thing. I mean, everything is deleted. Let me quickly check.
Starting point is 00:12:07 Why not? But with updates, I mean, SQL Server always supported both versions, so before and after. And DB2 with the standard syntax supports only, you have to choose which one, but you can't return both. And Oracle 23ai now supports both as well, like SQL Server. Oracle uses the Postgres syntax, the non-standard syntax, also with the returning
Starting point is 00:12:30 keyword, so maybe we'll stop it from there. If you don't do it yet in Postgres. I think in Postgres we cannot, for updates, we cannot return old data. Yeah, we have access to it in triggers, but it's different, right?
Starting point is 00:12:46 But, yeah, so only the new value can be seen there. But for deletes, it's... Probably more useful. Well, I can imagine some cases when we want to report, for example, what happened. Yeah, when you do audit, you want to
Starting point is 00:13:02 have both. Yeah, maybe let's add it. Right, the standard supports this data change delta table, and it accepts a merge statement as well. So in principle, in the standard, you could have returning with a difference, but it feels like a heavier approach in terms of manipulation. Returning is just one line, right? Right? Return a star, that's it. Yes, yes. Super easy. It seems easier, yeah. Also, there are tons of limitations.
Starting point is 00:13:31 I mean, you put your data change delta table inside of a select query, but then you can hardly do anything with that select query. For instance, I'm not sure exactly what is forbidden, but unions, I think, are forbidden and joins are forbidden. So maybe even aggregations, I'm not sure anymore, what exactly what is forbidden but unions i think are forbidden and joins are forbidden so maybe even aggregations i'm not sure anymore but you're very limited with what you can do so i'm not sure if there's any benefit of allowing that in the select statement as opposed to just returning but ultimately it's kind of the same thing as what postgres does when you put dml statements in in
Starting point is 00:14:01 with so that that kind of behaves the same way, at least from a user perspective. Speaking of with, I know Juke supports CTEs. Does it support recursive CTEs? Yeah, yeah. Good. Okay. So everything, lateral join and so on, everything. Yeah, yeah.
Starting point is 00:14:18 Cool. There's a high chance if SQL supports something and it's not really very funky like the match-recognize class, then Juke will support it as well. So match-recognize could be supported as well in Juke, but it's such an edge case and still only supported in Oracle and some very esoteric
Starting point is 00:14:36 database products that jOOQ doesn't support yet. So I'm skipping this for now. But WITH, everyone uses WITH, so jOOQ can support it as well, including recursive WITH. I mean, there are some table-valued functions like generate_series in Postgres, yeah, which have to be emulated elsewhere. So if you ever want to, probably hardly anyone does that, migrate from Postgres to whatever, and you want to translate your old generate_series queries, then jOOQ will translate that to a recursive WITH.
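The kind of rewrite mentioned here, emulating Postgres' generate_series as a standard recursive common table expression, can be sketched as a small function. jOOQ's actual emulation logic is more involved; this only shows the shape of the rewritten SQL.

```java
// Sketch of emulating Postgres' generate_series(from, to) as a standard
// recursive CTE. Illustrative of the idea only, not jOOQ's actual output.
public class GenerateSeriesEmulation {
    static String emulate(int from, int to) {
        return "with recursive series(n) as (\n"
             + "  select " + from + "\n"
             + "  union all\n"
             + "  select n + 1 from series where n < " + to + "\n"
             + ")\n"
             + "select n from series";
    }

    public static void main(String[] args) {
        // Roughly equivalent to: select * from generate_series(1, 10)
        System.out.println(emulate(1, 10));
    }
}
```

The recursive branch keeps adding one row at a time until the upper bound is reached, which is why this form works on any dialect that supports recursive CTEs.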
Starting point is 00:15:06 Cool. Nice, nice. In fact, I feel like you skipped over that a little bit, but that's such a cool feature that basically... Translation? Yeah, exactly. We can get access to standard features that Postgres hasn't implemented, or whatever database we're using hasn't implemented yet because you do the transformation, I guess.
Starting point is 00:15:29 How do you, translation, did you say? Yeah, I call it translation. So there's even, you can use Juke as a purely translate product. Some customers actually do that. Mostly when they migrate from Oracle to Postgres, you can just embed Juke as a JDBC driver into your Java application, and you don't actually have to use Duke directly. So let's say this is a legacy application,
Starting point is 00:15:50 and you use JDBC or something JDBC-based like Hibernate or MyBatis, and you just put jOOQ in the middle as a translating layer, and probably 80% of your queries will work directly on Postgres. Of course, there's always edge cases, and you have to do testing. But at least your migration project will go much, much faster, because you can directly use Oracle queries on those other database products. Like, for instance, Oracle has CONNECT BY,
Starting point is 00:16:16 and that can be translated to WITH queries, with recursive. That's very hard, so some cases don't work. But maybe many cases do, and you don't have to actually look at those anymore. Oracle's special outer join syntax can be translated to left joins in standard SQL, so your migration just goes much faster. You can also use it not as an embedded library; the website is free, or you use it locally as a CLI tool, where you just have input files, and jOOQ translates those directly,
Starting point is 00:16:49 and you can manually check if it's correct. So translation is a, I'd say, minor use case. A lot of companies work with only one database product, and maybe in 10 years they might reconsider their choices and migrate, but even then, most companies stick with the Davis product they started with. And so, many Juke users don't actually use this feature.
Starting point is 00:17:12 But some do. May I ask a slightly off-topic question about JDBC, since you spent a lot of time with Java, obviously? There is such thing as extra float digits, right? Extra what? Extra float digits right extra what? extra float digits okay
Starting point is 00:17:27 there is such setting and by default it's zero in Postgres or one yeah no one it's one I've never heard of it
Starting point is 00:17:35 okay but yeah I remember some bugs in my code and I usually try to use
Starting point is 00:17:44 pure SQL and PSQL, if I can. I remember that Java developers implemented very same thing, same queries, behavior was very different. And then I noticed that DB were, who else, other Java-based IDEs also behaved the same way Java application did. And it was not the same as I saw in PSQL. It happened with these extra flow digits.
Starting point is 00:18:12 Okay, if you didn't notice this, this is maybe like some very edge corner case. But I wonder, like, why GDBC keeps different default? There's, like, layers of some defaults. This can be very annoying. But, okay. Well, the main thing that you always see, the main difference when you work with JDBC as opposed to P-SQL is,
Starting point is 00:18:36 in psql you usually don't work with bind variables, so you don't have the whole kind of effect that you get when you have bind variables, although in Postgres this might not be that big of a difference. But when you work with JDBC, you always have prepared statements and bind variables. And let's say my background is mostly Oracle. So with bind variables, you have an execution plan cache, and the cache will store some assumptions about your bind variables, like a normalized distribution and expectations regarding cardinalities, etc. So if your data set is skewed, your actual bind variable may be completely different from
Starting point is 00:19:15 what oracle expects and and then the execution plan will be bad but from what i take uh postgres doesn't have this problem because it doesn't have this feature. So this would be one of the main differences. But you were referring to some logical error, not... Yeah, default and values returned are different from what I see in PC. Yeah, but I've never heard of this. Well, okay. Maybe it's already fixed, by the way. It was like five years ago or so I saw it.
Starting point is 00:19:43 So, yeah. Good. already fixed by the way it was like five years ago or so i i saw it so yeah good well on the on the topic of jdbc i mean are you do you have any issues with it it seems like it's a remarkable piece of software that's just chugging along but equally it doesn't get much love it doesn't get much love like i only i generally only hear negative things about it but it must be amazing given how how much it's used it i think it's the best API that Java has, even better than collections and everything else. It's really, I mean, okay, maybe they stole it from Microsoft from ODBC.
Starting point is 00:20:13 It's kind of the same thing. But it's really the best thing that Java has because it's such a good standard. I mean, everything can build on top of it. And you don't even have to write SQL queries. So there's, for instance, Neo4j, which is a NoSQL database with their own query language, and you can use JDBC with that database as well, as long as you want to have some tabular results. And it's a very good abstraction of the network protocol layer. And the reason why people hate it, it's not targeted at developers,
Starting point is 00:20:43 so you can use it directly if you want. But it's a bit hard to use. You have a lot of objects. You have the connection object, and you have to create a statement object, and you have to remember to close it. And you have to remember to create a result set object and close that as well. And it's not integrated with the rest of the JDK libraries. Like, for instance, when you iterate the result set, you have your own API methods instead of just a list or iterable. So you can't use the for each loop on a result set, for instance. So the ergonomics of JDBC is not up to date. I once tried to convince the spec expert group, it's a mailing list, to update their API. There's a couple of things that I think at the
Starting point is 00:21:24 time it was Java 8, but even now you could improve a couple of things that I think at the time it was Java 8, but even now you could improve a couple of things without changing anything fundamental in JDBC. For instance, you can skip the statement part. A lot of times you don't actually have to think about preparing the statement explicitly. You have to do it when you want to reuse the prepared statement. So you can save some time in case there's a resource or some cache connection directly. But a lot of times that's not the case in some database products. And a lot of times it doesn't even matter at all. So you could just have a connection and run the query. So that's a minor improvement that would just reduce one step for each statement you write. Yeah, but Oracle couldn't allocate any resources to such a project, even if the people on the
Starting point is 00:22:08 list agreed that many ideas were reasonable. So I guess that's it. It's an amazing piece of software for integration products like Juke or Hibernate or MyBatis and everything else to build on top of, and all the drivers, they just work. So a database company can just publish a driver, and it will work in all kinds of software. For instance, if you work with dBeaver, it just works with all the database products, right?
Starting point is 00:22:33 So you don't have to think about, how do I connect to this new database thing, like ClickHouse or DuckDB or whatever? It just works. All these database products, they just work because they all use the same API in Java. So it's really a wonderful piece of technology, I think. So I totally agree. Yeah, this is exactly where I saw this problem comparing to, I mean, in DB we're comparing to Java and I realized, okay, it's also Java application. So yeah, good.
Starting point is 00:23:04 Yeah, but that was probably just the driver bug. I don't think it has to do with JVC as an API. I mean, obviously, the drivers can have bugs. Right, right. So you see bugs often, right?
Starting point is 00:23:17 I mean, in these... Yeah, not necessarily in the drivers. I mean, the drivers, they really solve a very basic problem. So it's mostly about network connections and these kind of things that no one actually cares about unless you
Starting point is 00:23:29 have to optimize something like multiplexing or whatever. But I think this stuff kind of just works. And when I find bugs, it's more SQL related. So it's inside of the database. And I do find a ton of them. Very rarely in Postgres, I have to say. I think in my career I found two or three bugs in Postgres only. Good. As opposed to hundreds and others. We didn't pay you to say that. No. It's really very good. It's surprisingly good. I mean, also absolutely
Starting point is 00:24:02 no regressions at all. So if there's a bug, it's an edge case that no one thought about, and it's really a very weird kind of thing. By the way, how do you check regression? Just conversion from... In the database products? Yeah, for Juke, for example, some code is written, and then you just check the conversion to SQL, or you check results as well?
Starting point is 00:24:26 No, you can't just assert the SQL statement. I mean, Duke's integration tests have a standard database, which has about five tables. It's very simple. So there's a bookstore with books and authors and it's not a big database. So it has four books and two authors, four languages, this kind of thing. But then I just know there are four books and they have these titles and every kind of query has to return exactly the same result. So there are so many queries being run in each database product, maybe about 10,000 queries, I think, and also update statements and all kinds of statements that make the assumption that if you have this kind of input database, then this must be the output.
Starting point is 00:25:10 Or also, if I create a table with Juke, then it has to have these properties, and throughout the API it has to be the same thing for every database product. Right. I just wonder, earlier you mentioned that usually database systems maintain reverse compatibility, but sometimes it's broken. For example, in Postgres 12, CTEs, before they were materialized, like optimization fans,
Starting point is 00:25:35 so every step is materialized before 12. But in 12, default behavior changed. So is it something that Druk should care about or not? I don't think that's a logical regression. I mean, it was a conscious decision and it only affects the performance. There's no logical difference of the result. Databases are allowed to do these kind of things. I recall there was once a logical change
Starting point is 00:26:05 in the update statement in Postgres, which was an incompatible change when you update a row. So you use the row syntax, you use parentheses, and you update two columns at once. And suddenly PostGrid required the row keyword, but only there.
Starting point is 00:26:21 This was a very weird thing. There was one more recently, one more change related to subqueries. I don't remember exactly. It was a demonstration that included generated series, I remember. And yeah, maybe it was like up to five years ago, but there was some change which was logical as well.
Starting point is 00:26:40 But I understand, you're like SQL standard, don't care about indexes, care only about result. Yeah. Right, I understand, you're like SQL standard, don't care about indexes, care only about result. Yeah. Right. I see, I see. No, I mean, a juk doesn't make any guarantees at all with respect to performance either. So from a juk perspective, juk doesn't try to fix your SQL. If you write a really crap query with, I mean, crap, what does it even mean? So if you write a bad query with a lot of nested subqueries,
Starting point is 00:27:04 et cetera, et cetera etc etc in principle there's nothing wrong with it right so there's you will find hundreds of blog posts that especially from from old times when optimizers were really bad still where people advised against using derived tables then they advised against using correlated sub queries and then but there's nothing logically wrong with these things right so you can nest as many levels as you want, as long as it's correct. It's correct, and the optimizer should figure it out. And who is Juke to judge you for your query style? So Juke doesn't fix these things or even assert anything,
Starting point is 00:27:37 and it would be completely wrong for Juke to do these kind of things because you want to write exactly this SQL, and Juke should render exactly this SQL. You have your reasons, right? I understand. And my question is maybe like, I'm curious how do people usually approach performance? For example, if some query is written and using jukes with like chain of select from blah blah and then it turns out
Starting point is 00:28:05 it's slow and then we find a fix and we need to adjust something and we query if it was rock SQL is it usually simple to adjust a juked version of query? Yeah, I mean there's no difference I mean
Starting point is 00:28:19 you're probably going to use I'm not sure anymore what the postgres version is called but you have your statistics views in your database your performance schema and you query those to see what's what went wrong and then you make assumptions you rewrite the query and you test again maybe even even using a benchmark of course then the benchmark is written in java but i don't see a difference when you write a Duke query or a native SQL query. It's the same process. I was watching a really good video by Kevin Devan who did a talk on Duke,
Starting point is 00:28:51 but one of the things that he really liked about the product was it logs out to the console, the actual SQL statement as well that it ran. When you do a debug level. Yeah, which is super helpful for performance. So if that was slow, you can then run it in your favorite editor in P-SQL wherever with explain, analyze, etc., etc.
Starting point is 00:29:11 Buffers. Yeah, always buffers. It's a hot topic this week. But yeah, so that seems like a really developer-friendly way of getting back to the SQL and then diagnosing the performance issue. This was a feature from day one. My assumption was that when you develop, you have debug mode logging.
Starting point is 00:29:30 And when you go to production, you switch that to info or even warning. So you're not going to have the performance penalty of re-rendering the query. It's even formatted, so you can actually see the query. It's not just one line; when you execute the query, it's rendered as one line of SQL string to keep it more compact. But when you debug log it, the version contains the bind variables. So you can just copy-paste the whole query with the bind variables, in a formatted way, and study it. I thought that was what everyone wanted. And also you get the first five records
Starting point is 00:30:01 of the result set also in the debug log, because that's probably also what you want while you develop, at least. Yeah, in a pretty little table. So you can immediately see what's going on. And even then, I mean, one of the reasons why I implemented it this way is when you have a huge result set with a lot of columns, that bothers you while developing. So you kind of start thinking, do I really actually need all these columns? Because if you don't, then you have a better debug log.
Starting point is 00:30:28 So I kind of think you have to punish developers as early as possible for their performance problems. But I mean, jOOQ doesn't judge you. It just gives you some tools. Just giving you some convenience looking at logs. Okay. Yeah, that's interesting.
Starting point is 00:30:48 So reduce the number of columns. Good. Changing tack a little bit, I think you've got a really interesting perspective on Postgres from a broader landscape of SQL dialects and databases in general.
Starting point is 00:31:02 What kind of things do you see that are relatively unique to Postgres or do you see people in the Postgres world particularly using or liking? I mean, the one thing that is very unique to Postgres, two things from my perspective. The first one is it's very developer-centric. So you can see that with other database products like Oracle, DB2 or SQL Server, they're very production-centric, very operation-centric. They used to base their entire sales on these kind of things.
Starting point is 00:31:29 So I'm still waiting for the likes of Oracle Enterprise Manager, if you know it, for Postgres, where you can analyze production workloads and query what kind of workload you had five months ago, and it's still there. I mean, you have to pay for these kinds of extras,
Starting point is 00:31:48 but it's still there. You can still analyze anomalies in terms of performance, what happened five months ago. So these kind of tools are a bit lacking, but on the other hand, because it's so developer-focused, you have this whole extension system, which is unprecedented, and I haven't seen it since either. So anyone can extend anything in any layer of the database product.
Starting point is 00:32:11 So you have these wonderful extensions like PostGIS, for instance, which is not really an extension because it's also part of the SQL standard, but it has been implemented as an extension just, I guess, to show what is possible, and that such a huge additional product can be made as an extension. And you have indexes and data types and all kinds of things that other vendors have to either offer out of the box or they don't have at all. You can't extend Oracle, really. So that's really something very, very amazing.
Starting point is 00:32:43 But as I said, from the other perspective, from an operations perspective, this is something that personally, I think Postgres is still very much behind Oracle, for instance. So I'm an Oracle guy. I can't really comment on SQL Server or others. But the Oracle database where I worked at
Starting point is 00:32:58 before I made jOOQ, it could easily handle very, very complex queries that produced 500 lines of execution plan on billions of rows, and it would run in milliseconds. And I don't think you can do that with Postgres. I wouldn't risk it. So we all remember the days
Starting point is 00:33:15 when there was a hard limit on the number of joins in Postgres, beyond which you didn't have any smart optimizer anymore to reorder the joins into the right order. I forgot what the limit was, but I think you had to be very careful if you had more than 10 joins or something like that,
Starting point is 00:33:33 because then it would just join from left to right syntactically, and this is horrible. join_collapse_limit. It should be eight. Yeah, that's the join collapse limit. So once you know these kinds of things, you obviously make the right choices, and you probably won't run into big issues. But it's just a very unique focus that I often missed in the past
Starting point is 00:33:52 from Oracle, for instance, which completely ignored developer experience for decades. It was like, you don't even have a Boolean type, right? This is hard for me to understand. It's such a simple thing, you'd think, and so very, very useful, but Oracle wouldn't implement it until recently. So this is a focus that Postgres has, and few of the others had. Even MySQL wasn't very developer friendly from that perspective. So this is very unique. If developers were the only ones to choose the database product, it's always Postgres, because, I mean, you're just going to implement everything with Postgres. You can write stored procedures, you can have your own indexes, you can extend it yourself. Probably you don't
Starting point is 00:34:36 have to, because someone already wrote an extension for whatever you need. Oh, the second thing is, it's amazing how standards compliant it is. So a lot of times Postgres really waits until something is standardized before they implement it, instead of innovating. Right now, we are waiting on UUID version 7 because of standards. It didn't get into version 17 of Postgres because the RFC was not finalized. Every library had already started to support it, but Postgres decided to wait on the RFC, which is a very conservative decision, right? Most of the time, yes. So I think that's a good thing in the end, because the standard, in my opinion, in most parts is well written.
Starting point is 00:35:17 It's a very complicated document, of course, but I think in terms of syntax, everything feels like SQL. So some database products think they have to invent some syntax, and it just doesn't look like SQL anymore. It's just a very weird, funky thing that doesn't fit the rest of the language. And the standard is very, very consistent, I think, in most parts. And waiting for the standard to appear is a good thing. So sometimes there's an exception to the rule; as I said, ON CONFLICT was one of these exceptions.
Starting point is 00:35:50 JSON as well, which I think was something everyone needed, and the standard was severely lacking, so Postgres just did their own thing. And I kind of think a lot of people regret the heterogeneity of the JSON API. So you always have to consult the docs to see if something uses a JSON path, which is a standard thing in the JSON world, or if it's just an array of path elements, or if this function returns a table or a set of something or whatever data structure. So I think now with the JSON standard, this is going to be much cleaner once it's completely rolled out. So I'm not up to date on where Postgres stands.
Starting point is 00:36:34 I'm still not supporting that syntax. But from what I've seen, it's going in the right direction. Yeah, some parts support it. Some parts, I think, are still pending. Yeah. Yeah, because I think a lot went in in 17. Right, yeah.
Starting point is 00:36:48 And speaking of this, I'm going to be really curious. I saw a commit recently regarding temporal primary keys. Is that correct? And it is absolutely not the standard. So this is very curious. I'm very curious about this. Are you aware of this commit? I'm not.
Starting point is 00:37:04 It's still on a branch, I think. It's not going to be merged. But, I mean, Postgres has these range types. Right. And they're useful, obviously. But the SQL standard since 2011 defined how you should write temporal databases, both bitemporal, so you have system versioning
Starting point is 00:37:21 and then logical versioning. And this is something that is still lacking in Postgres. And I think a lot of people would like to see that. Yeah. In Postgres, there's an opinion that it's in the past, right? Because it was there in the past, and then it was removed. Right. So that's a nice temporal joke.
Starting point is 00:37:39 An extension, I think? It was in the original Postgres, the original Berkeley version. Oh, really? This is where MVCC was inherited from, these versions of tuples. So time travel basically was supported very, very long ago, but then it was removed.
Starting point is 00:37:57 So, like, an interesting idea. I think for performance reasons? Yeah, I think so. But yeah, this is a super interesting specific topic. Sometimes we do need to implement something with triggers and so on, which is very... Yeah, it's very hard to do manually. I think the standard helps a lot here. I mean, you can write an update statement and it's automatically transformed into a delete and an insert. So if you have to split the record logically, this is really a pain to do manually, right?
Starting point is 00:38:26 It's not a super pain, because you can... For example, in Postgres, you know, I remember your very old post, everything is a table. Or maybe it was a post of like 10 tips or something, and one of the tips was, consider everything as a table, right? And in Postgres, you can collapse all columns
Starting point is 00:38:42 into a single record type and insert it. Like, for example, in JSON form. Right now, I would insert it, and I would have a shadow table. I implemented this several times. You have a shadow table which serves as an archive for all tables, which has a trigger to track changes, right? And then you just track timestamp, action, like whether it was a delete or an update, who did it, and then the whole row you pack into a single column, breaking first normal form. Okay, but then you denormalize in a very bad way. And secondly, this sounds like a lot of work. It's not a lot of work. It's like three to five lines of code, that's it.
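The shadow-table pattern Nikolay describes can be sketched in a few lines. This is not from the episode: it uses SQLite through Python's sqlite3 purely for illustration (in Postgres the trigger would be PL/pgSQL, typically packing the old row with to_jsonb(OLD)), and the table and column names are made up.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);

-- One shadow table archives changes from any audited table.
CREATE TABLE audit_log (
    ts       TEXT,
    action   TEXT,
    row_data TEXT   -- the whole old row, packed into a single JSON column
);

-- A few lines of trigger code per audited table, as described.
CREATE TRIGGER account_audit AFTER UPDATE ON account
BEGIN
    INSERT INTO audit_log
    VALUES (datetime('now'), 'update',
            json_object('id', OLD.id, 'balance', OLD.balance));
END;
""")

conn.execute("INSERT INTO account VALUES (1, 100)")
conn.execute("UPDATE account SET balance = 50 WHERE id = 1")

# The pre-update row was archived automatically by the trigger.
archived = conn.execute("SELECT action, row_data FROM audit_log").fetchall()
old_row = json.loads(archived[0][1])
```

The single JSON column is exactly the "breaking first normal form" trade-off mentioned above: one archive table works for every source table, at the cost of losing typed columns in the history.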
Starting point is 00:39:17 But it's not possible in the jOOQ free version, because triggers are not supported, right? Triggers are supported only in the enterprise version. I mean, if you want to write the trigger with jOOQ, yes, then you have to use the commercial editions. But hardly anyone writes triggers with
Starting point is 00:39:37 jOOQ. I mean, this is only useful when you support five database products and you want to write the trigger only once; then you write it with jOOQ. But I mean, jOOQ doesn't care if there's a trigger on the database table, right? So if there's one, it's transparent to jOOQ.
Starting point is 00:39:55 Yeah, okay. You can't really say jOOQ doesn't support them. It's just that you can't create them with jOOQ. Great answer. In the free version, you mean? Yeah. I'm very curious. I'm not against this. It's good. I mean, it's great
Starting point is 00:40:10 that you build both an open source product on the Apache 2 license and a business. I was just curious, have you ever had thoughts about going fully closed source or fully open source?
Starting point is 00:40:26 Well, it was fully open source in the beginning. So when I started this, I had no idea this could be a business. I mean, I just had to try it and then put it out there for people to see and give me feedback. Is this a good idea? Should I do it like that? And there was a very active early community of people who worked with the all-open-source jOOQ.
Starting point is 00:40:49 And then I started seeing even more and more banks and insurance companies using it, but never contributing, as you know the story. And I thought it was my fault. Oracle users. Let's generalize. Oracle users. Well, that was the choice I made, but it doesn't matter, you know? It doesn't matter what they're using in the end. I just never got any...
Starting point is 00:41:04 I mean, I got contributions, but I didn't care about code. I cared about money much more than code. So to make a sustainable business, in the end, I thought there are models where you pay for support. But in my opinion, if you want to make money with support, then you have to earn your money by working for it. So I earn money when I work. And if I have a license model, then I earn money when customers work, right? So I thought this was going to scale much better. I like that a lot. I also think if you charge for support, you're
Starting point is 00:41:37 incentivized to hurt the experience. I wasn't going to say that, but that's how I think. But I think it's not surprising that, as somebody who focuses so much on developer experience and wants to keep pushing that forward and making it good, you don't want to be incentivized to make it bad. That makes perfect sense to me. Yeah, I mean, that's my thought as well. I don't want to name companies or names, but I've heard some open source products are very, very unusable, and you really need the support. Not all of them, of course. I mean, Postgres and Linux are examples
Starting point is 00:42:15 where this isn't the case. But there are some products that are super complicated, and they're just pushed out there, and you're free to use them, but you won't figure it out. So you have to call someone even to install it. And that's what I'm thinking. So in those cases, you need a sales representative that pushes the product down someone's throat, and they make a lot of money with the support contract. And I think it's less honest than
Starting point is 00:42:41 licensing, where people think about, okay, is this a good product? Should I really spend money on it? Because I have to pay it up front, right? From day one, I have to pay money to use this product. And they're going to test it really, really, really well. And there's a lot of competition. So, I mean, people could use Hibernate or whatever. But people choose jOOQ despite the cost, which in the end is very cheap for a company.
Starting point is 00:43:04 So I think cost is never the issue. But they're still going to evaluate it, because they still have to go through purchasing and make a business case for the purchase. So I really want to have a very, very good experience. And one part of this is that I also always answered all the questions on Stack Overflow within a short period of time, because if I answer, it's going to be authoritative, and I'm going to answer with all the details and all the caveats and all the edge cases. So anyone else who finds the question will find this question and the answer, and they're not going to ask me again, right? So Google is my friend, and now ChatGPT is my friend as well, because ChatGPT can answer jOOQ questions very well because of all the answers I've given on Stack Overflow. Yeah, open knowledge, right?
Starting point is 00:43:51 Yeah. I don't want to earn money with knowledge. I mean, I could sell books and all these kinds of things, but it's just the wrong kind of effort, I think. A distraction, right? Yeah. From the core. I wonder,
Starting point is 00:44:05 for enterprise, like, paid versions, is the source available, or is it kind of... Yeah, you get the source code. Oh, that's great. And the right to modify,
Starting point is 00:44:15 so you can, if you have a bug and... You can fix it yourself and propose a fix. No, or maybe it's not even a bug, but you think it's a bug, but it's actually not.
Starting point is 00:44:24 So if you disagree, you can fix it, and some companies do that. This is what we like in commercial software, full transparency, and seeing how it works inside. Yeah. I mean, most people don't actually fix stuff, but maybe they want to debug stuff to better understand it. And if you've ever worked with a commercial JDBC driver like I do, with the Oracle driver, it's always like, oh my God, there's a null pointer exception in the middle of the driver, and I have no idea what I'm doing wrong from an API perspective. So I have to reverse engineer it, not with the code, but just trial and error with API calls until I find out why the exception,
Starting point is 00:45:01 why the bug inside of the driver happens. And with source code, it would be much easier, right? So bugs happen, but if you know exactly why the bug happens, then you have a much easier life than if you have to guess. Yeah. Last question from me. What do you think about nulls in general? Nulls? Nulls, yeah. I mean, I understand why they were introduced, because it was easy to do, right? And I mean, there are two takes here. So nulls in most languages
Starting point is 00:45:33 are mostly a pain because they lead to exceptions, and null in SQL is mostly a pain because it doesn't lead to exceptions. It's just a value like many others, but it's a special value, and it just propagates. It's these kinds of philosophies, and both are right and wrong.
Starting point is 00:45:51 But what else would you want to do, right? I mean, you kind of need an absent value, right? I'm not sure if the idea of the unknown value is so useful, but an absent value is very useful. I mean, sometimes you just, you can't normalize everything to sixth normal form to avoid nulls, right? So it's very pragmatic. It's very pragmatic indeed.
Starting point is 00:46:14 Yeah. But in SQL, we have so many places where people make mistakes all the time. Yeah. SQL is weird about it. I mean, this unknown value, I used to do a SQL training, and about 30 minutes of it was just nulls and how they're different here from there. For instance, two nulls are not distinct, right? Yes.
Starting point is 00:46:35 But they're not the same either. So there are different terms in the SQL standard, sameness and distinctness. And then even the operators are inconsistent. So personally, I think Oracle did the pragmatic thing: when you concatenate a null value to a string, you probably don't want the result to be null, I think. You would just want to ignore the concatenation.
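The standard behavior Lukas contrasts with Oracle's here is easy to check. A small sketch, not from the episode, using SQLite through Python's sqlite3, which follows the standard on these points (note that SQLite's IS operator plays roughly the role of Postgres's IS NOT DISTINCT FROM):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Standard behavior: NULL propagates through concatenation.
# Oracle would instead return 'abc', ignoring the NULL operand.
concat = conn.execute("SELECT 'abc' || NULL").fetchone()[0]

# Equality with NULL yields unknown (surfaced to Python as None), not true.
equal = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Distinctness: two NULLs are "not distinct", even though they are
# not equal. SQLite's IS treats them as not distinct, like
# IS NOT DISTINCT FROM in Postgres.
not_distinct = conn.execute("SELECT NULL IS NULL").fetchone()[0]
```

So the same pair of NULLs is simultaneously "not equal" (unknown) and "not distinct" (true), which is exactly the two-term split in the standard mentioned above.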
Starting point is 00:46:56 This is pragmatic, but it's very illogical and leads to a ton of side effects, which probably means that it was a bad idea to start with. And there are nulls in other languages, I mean, in regular application languages. And I remember how I struggled trying to explain this three-valued logic and observing what people do in regular application code with nulls. They do things I would never do in SQL, right? So different philosophies, right? Yeah, there are different philosophies. But I'm not sure if the SQL philosophy is really useful. I mean, I'm trying to explain how it works by explaining the NOT IN predicate.
Starting point is 00:47:36 Oh, yeah. When you have a null and NOT IN, everything falls apart. I mean, it's very logical and consistent. You can explain it with three-valued logic, but it's never useful, right? It kind of invalidates the whole predicate. Yeah, it's a bomb. It's a bomb. Yeah.
Starting point is 00:47:52 And, I mean, usually people don't put a null value in a NOT IN list, but what about a subquery? So NOT IN (SELECT something), and then you have a null value in there, and you don't get results. Why? It's horrible. Horrible.
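The NOT IN "bomb" described above can be reproduced in a few lines. A sketch, not from the episode, using SQLite through Python's sqlite3 (Postgres behaves the same way, since this is standard three-valued logic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (v INTEGER);
INSERT INTO t VALUES (2), (3), (NULL);  -- one NULL hiding in the subquery
""")

# 1 NOT IN (2, 3, NULL) evaluates to unknown, not true, so the row
# is filtered out and the query returns nothing at all.
with_null = conn.execute(
    "SELECT 'found' WHERE 1 NOT IN (SELECT v FROM t)").fetchall()

# Remove the NULL and the predicate behaves as expected again.
conn.execute("DELETE FROM t WHERE v IS NULL")
without_null = conn.execute(
    "SELECT 'found' WHERE 1 NOT IN (SELECT v FROM t)").fetchall()
```

This is why a single NULL produced by the subquery "invalidates the whole predicate": NOT IN is a conjunction of inequalities, and one unknown comparison makes the whole conjunction unknown.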
Starting point is 00:48:04 But it's a fun anecdote. We had a whole episode about nulls with Michael long ago. Yeah. This topic is definitely not going to go away, and I'm quite sure I will make more mistakes with nulls in my life. But maybe the one thing I really think SQL should have done
Starting point is 00:48:22 differently, and there are some database products that actually do it differently, is the default data type should be not null, right? So when you create a table, when you don't write anything at all, then it should be not null. The fact that the default is nullable is wrong. I like that. This forces you to define default. Or, well, no, no, no.
Starting point is 00:48:41 The word default is confusing. You mean if I create id int8, I shouldn't have to say not null? Yes. You should have to say, I want this to be nullable, because then it's a conscious decision, and
Starting point is 00:48:58 you know what you're doing. But the fact that maybe by accident you have something nullable, that's just wrong. And this isn't standard, right? Or if you remember. I don't remember. I think maybe the standard says it's implementation specific.
Starting point is 00:49:13 We definitely can't change it now, though. No, you can't. But I remember Sybase is one of the database products that does it differently, and ClickHouse as well. Interesting. Another one I forgot.
Starting point is 00:49:29 So in ClickHouse, every column is by default not null? Yes. That's cool. Okay. Although they messed it up as well, so you can still insert null in a not null column. You just get a zero. What? It makes it worse.
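The nullable-by-default behavior Lukas objects to is easy to demonstrate. A sketch, not from the episode, using SQLite through Python's sqlite3 (Postgres behaves the same way: columns are nullable unless you opt in to NOT NULL; the ClickHouse zero-on-insert behavior is not reproduced here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No constraint written at all: both columns are silently nullable.
conn.execute("CREATE TABLE a (id INTEGER, name TEXT)")
conn.execute("INSERT INTO a VALUES (NULL, NULL)")  # accepted without complaint
nullable_rows = conn.execute("SELECT * FROM a").fetchall()

# Opting in to NOT NULL makes the same insert fail, as it should.
conn.execute("CREATE TABLE b (id INTEGER NOT NULL, name TEXT NOT NULL)")
try:
    conn.execute("INSERT INTO b VALUES (NULL, NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

In other words, the accidental nullability has to be ruled out explicitly on every column, which is the inverse of the conscious-decision default argued for above.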
Starting point is 00:49:43 Okay. That's terrible. Amazing. I think this is a performance idea. You actually, if you just ignore the check,
Starting point is 00:49:52 you kind of don't check for nulls, then you're faster, but whatever. Lukas, I'm super conscious of your time. Is there any last
Starting point is 00:50:01 things you wanted to say? I hope there are many, many more exciting Postgres releases in the near future. I'm looking forward to all the good stuff that's coming, especially if it's ever coming, temporal tables from SQL 2011. That would be very exciting. Nice. Okay, cool.
Starting point is 00:50:19 Thank you for coming. I enjoyed it. Thank you for having me. Yeah, very much so. And catch you next week, Nikolai. Bye-bye. Have a great week. Bye. Bye.
