Postgres FM - SQL vs NoSQL

Starting point is 00:00:00 Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL. I am Michael, founder of pgMustard, and as usual, I'm joined by Nikolay, founder of Postgres.AI. Hey, Nikolay. Hi, Michael. And we have a special guest today,

Starting point is 00:00:12 Franck Pachot, who is a developer advocate now at MongoDB, formerly at YugabyteDB, which is a distributed PostgreSQL database, also an AWS Data Hero and Oracle Certified Master. So welcome, Franck. Hi. Thanks. Thanks for having me here.

Starting point is 00:00:30 And former PostgreSQL blogger. Yeah. No? Or yes? Oh, yeah, yeah. I will continue to blog about all databases. It just depends on the time I have. Sounds good.

Starting point is 00:00:43 So I saw you are going to give a talk at some Postgres conference in India, right? PGConf India? Yeah. So still planning to do it, right? Yeah, yeah, yeah. And also Germany, I just got the acceptance whenever I go. I'm very curious, like, during the day at your work you are using JSON and these weird queries, right, chains of something, and then at the weekend or something you present SQL talks. How is it going to play out in your mind? I'm very curious. It's all about databases.

Starting point is 00:01:24 I mean, it's all the same. All the same? Yeah, you can do data modeling, document data modeling on Postgres. You can do it on Oracle. You can do it on MongoDB. You can normalize your data on SQL databases or on NoSQL databases, the concepts are all the same. Of course, there are little differences, like how NULLs are handled, for example, or how you join or you don't join, but yeah. NULLs, let's postpone. It's a special topic.

Starting point is 00:01:57 It's not for the start. Okay. I remember a series of blog posts from Michael Stonebraker criticizing document databases for lack of normalization and so on. So you are saying now that it's totally possible to apply normalization in a document database. Is this what you're trying to say? Maybe I'm getting it wrong? I've also changed my mind, probably for two reasons. First, the applications have changed. I think the normalized model was really

Starting point is 00:02:36 good for those monolithic databases where the whole enterprise information system was in one database, running all use cases. And then you need a normalized way to structure the data that is shared by the whole company and all kinds of users. Today, it's a bit different. You have multiple services, multiple microservices. They might have different databases. And then the concern of normalization may be different.

Starting point is 00:03:10 For example, if you consume data only to read it and not update it, you can denormalize a bit more. So that's one reason. And I think the main reason is also that the applications have changed. Today, in application programming languages, you use nested structures, objects, object graphs, which look more like documents, so it's easier to move it to applications. But I don't get it, because we have had documents forever. For example, Codd designed the relational model originally

Starting point is 00:03:53 dealing with banking systems, right? In the 60s, 70s, and it was not convenient to have nesting at that point. Before the relational model we know there were, what's the name, like network and, I forgot the names, but basically closer to... Hierarchical models and network models? Yeah, yeah, exactly, and the idea was it's really inconvenient when we keep a document as a whole. And we need to split it into pieces and basically divide and conquer, right?

Starting point is 00:04:32 We split into pieces and that's how we get flexibility and start working. And we had documents at that time as well, like invoices or transactions between financial institutions and so on. So I don't see the big change, just the amount of data and so on, right? And I don't fully understand why the idea of microservices or something, as I understand you are bringing up, when we have many, many databases, many services, why is it changing this? Because in my head, it's vice versa. If we have many services, we do need to structure and split our data into more atomic pieces.

Starting point is 00:05:17 And the article I mentioned, it's called Schema Later Considered Harmful. After that post, actually, this is why I named my sub-transactions blog post also "considered harmful". And some folks on Hacker News mentioned that there is an article, "Considered Harmful" Considered Harmful. "Considered harmful" titles considered harmful.

Starting point is 00:05:45 So it's basically not a good way to name articles. But the blog post is quite good. Schema design, normalization still makes sense. If you don't do it, you deal with bad consequences later. So please let me understand. It depends on your use case. And also something. I've been working on relational databases where you normalised, but basically when I

Starting point is 00:06:12 learnt databases at university, it was all about normalisation, and then when you start to work, you hear people talking about denormalising everything. And of course you just need to think about the access patterns. Let me just add this, sorry for interrupting, but let me just add, I totally agree. If we over-normalize, then we deal with the very simple fact that you cannot create one index on two tables, right? And you want one because, for example, filtering on one table, filtering on another table, you want a single index scan. Definitely, this is what we do also.
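The remark about one index not being able to span two tables can be illustrated with a small sketch. This is an assumption-laden toy, not anything shown in the episode: it uses Python's built-in sqlite3 module rather than Postgres, and the customers/orders tables, the copied customer_country column, and the index name are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT NOT NULL);
    -- Hypothetical denormalization: customer_country duplicates customers.country
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        status TEXT NOT NULL,
        customer_country TEXT NOT NULL
    );
    -- One composite index can now serve a filter that used to span two tables
    CREATE INDEX orders_country_status ON orders (customer_country, status);
""")
conn.execute("INSERT INTO customers VALUES (1, 'FR'), (2, 'US')")
conn.execute("""
    INSERT INTO orders VALUES
        (10, 1, 'open', 'FR'), (11, 1, 'shipped', 'FR'), (12, 2, 'open', 'US')
""")

# Normalized form: the two filters live on different tables, so a join is needed
joined = conn.execute("""
    SELECT o.id FROM orders o JOIN customers c ON c.id = o.customer_id
    WHERE c.country = 'FR' AND o.status = 'open' ORDER BY o.id
""").fetchall()

# Denormalized form: both filters hit the same table and the same index
single = conn.execute("""
    SELECT id FROM orders
    WHERE customer_country = 'FR' AND status = 'open' ORDER BY id
""").fetchall()
```

Both queries return the same rows; the trade-off is that the copied column has to be kept in sync if a customer's country ever changes.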

Starting point is 00:06:55 My team and I, in our consulting practice, we say, okay, here we do need to denormalize. But my point is, if you take Mongo, other document databases, they just provoke you to avoid normalization at all. In a relational database system, we can ... Okay, am I wrong? Yeah, for me, you are wrong. And I think that's also one reason MongoDB was interested in having a developer advocate coming from SQL databases: users tend to think that they have to denormalize everything and to put everything

Starting point is 00:07:37 in one document, which is wrong. The idea in MongoDB is to put together what you insert together or what you query together, but in different documents if you query differently. Just to take an example, an order entry system: you don't want to put together the customers and the orders, because you don't want one document per customer where you just add orders, which can be a lot every year. But the orders themselves, the orders and the order items, which we usually put in two tables in SQL databases just because they have different

Starting point is 00:08:22 cardinalities, that's something you can put in a single document, because you insert an order with all the items. Nobody will update just one item of the order, and you query them together. Of course, it depends on the system. If you're in a system that analyzes the order lines for marketing purposes by product and you don't care about the customer or the order, then maybe the modeling is different, and this is where different use

Starting point is 00:08:53 cases are. But it's not about putting everything in one document. And that's also why it's good to do some design reviews, because it's easy for a developer to start and put everything in one document, just moving what they have in Java to the database, but it still needs design, and you still need to think about what you embed, like denormalize, or what you reference, like you would reference

Starting point is 00:09:22 with foreign keys in a SQL database. Okay, I hear you, I think I understand you, but still, you say users have this tendency to think that way, and, for example, user Michael Stonebraker says that he noticed that maybe it's possible to normalize, of course, but in relational databases there is a big tendency to normalize first and then denormalize when needed. In document store databases, we have the opposite tendency, to avoid normalization first and then normalize when we have pain. And the whole article is called Schema Later Considered Harmful. As I understand this article, it's about the relational approach, that direction of movement, being more beneficial in the general case than the opposite. What do you think? Remember that relational databases were made at a time when we were designing the data before looking at the use cases. Normalization and the data model don't care about the use cases. You just model the data. You have orders and multiple order items, and an order belongs to a customer; you do a static model of your data. And then you bring the application use cases

Starting point is 00:10:51 and you can optimize them with indexes, but you don't change the data model for the use cases. But this is not really how applications are developed today. Today, applications come with a main use case and want fast access for this use case. And for another use case, they just check if they can do it on the same database or maybe do some event streaming,

Starting point is 00:11:19 put that in another database and do it elsewhere. That really has changed. Today, even for applications that run on SQL databases, I see people starting a data model knowing the access patterns. And then maybe you can denormalize. For example, it's okay to denormalize something that is not updated.

Starting point is 00:11:44 The big danger of denormalizing something that may be updated is that you have to update it in multiple places, which is a risk of inconsistency if you forget one, and which is also a performance issue, especially when you distribute, because then you have distributed transactions across multiple places. But data that you do not update, and there is a lot of data that we don't update, we just add a new version of it. For example, when a customer is creating a new order, you will not update the order. If he adds a new item, that will be a new order, but the existing order has been validated. You don't update this data later.
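The modeling rule discussed here (embed what is inserted and read together, reference what grows independently, copy only data that is not updated, and add a new version instead of updating) can be sketched with plain Python dictionaries standing in for documents. Every field name and helper below is hypothetical and chosen for illustration; no database driver is involved.

```python
import copy

# Hypothetical documents: plain dicts standing in for database documents
customer = {"_id": "c1", "name": "Alice"}

order = {
    "_id": "o1",
    "customer_id": customer["_id"],     # reference: customers are stored separately
    "customer_name": customer["name"],  # denormalized copy, fine if orders are never updated
    "created_at": "2025-01-15T10:00:00Z",
    "items": [                           # embedded: inserted and read with the order
        {"product": "keyboard", "qty": 1, "price": 50},
        {"product": "mouse", "qty": 2, "price": 20},
    ],
}


def order_total(doc):
    """Sum the embedded items; no join is needed because they travel with the order."""
    return sum(item["qty"] * item["price"] for item in doc["items"])


def new_version(doc, new_id, created_at, extra_item):
    """Return a new order document instead of mutating the validated one."""
    updated = copy.deepcopy(doc)
    updated["_id"] = new_id
    updated["created_at"] = created_at
    updated["items"].append(extra_item)
    return updated


order_v2 = new_version(order, "o2", "2025-01-16T09:00:00Z",
                       {"product": "cable", "qty": 1, "price": 5})
```

The original order keeps its two items after `new_version` runs, which is the append-only behavior described in the conversation.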

Starting point is 00:12:29 Usually you have a timestamp, and even if you change something, then you just add the new version. So the applications have changed, and I'm not saying that one is better than the other, but when we listen to the developers, we see that they don't want to build this ER diagram. That was never true. So that's also something. Nobody does that anymore, building ER diagrams. Or only our AI assistant does, but it's just a side function for it.

Starting point is 00:13:02 But I don't understand why we cannot do it on relational databases and still have all the good stuff, because we have JSON, let's just put it there and so on. And it's very good to mix both. I've seen a lot of applications on Oracle, on Postgres, on YugabyteDB where it's a mix, where you have tables with columns because they are updated, because you want indexes on them, and you have a bunch of metadata information that you put in a JSON, and that's also perfectly valid. So what does Mongo bring here, if we have it already?
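The mixed design mentioned here, regular columns for what is updated and indexed plus a JSON blob for loosely structured metadata, might look roughly like the sketch below. It is an illustration under assumptions: it uses Python's sqlite3 with JSON stored as text and decoded in the application, whereas on Postgres a jsonb column would be the more natural choice. The products table and both helper functions are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        sku TEXT NOT NULL,             -- regular column: looked up through an index
        price_cents INTEGER NOT NULL,  -- regular column: updated frequently
        metadata TEXT NOT NULL         -- JSON text: loosely structured extra attributes
    );
    CREATE UNIQUE INDEX products_sku ON products (sku);
""")


def add_product(sku, price_cents, **metadata):
    """Insert a row, serializing the free-form attributes as JSON."""
    conn.execute(
        "INSERT INTO products (sku, price_cents, metadata) VALUES (?, ?, ?)",
        (sku, price_cents, json.dumps(metadata)),
    )


def get_product(sku):
    """Fetch by the indexed column and decode the JSON part in the application."""
    row = conn.execute(
        "SELECT sku, price_cents, metadata FROM products WHERE sku = ?", (sku,)
    ).fetchone()
    return {"sku": row[0], "price_cents": row[1], "metadata": json.loads(row[2])}


add_product("KB-1", 5000, color="black", layout="ISO")
add_product("MS-1", 2000, wireless=True)
conn.execute("UPDATE products SET price_cents = 4500 WHERE sku = 'KB-1'")
keyboard = get_product("KB-1")
```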

Starting point is 00:13:45 I think the API is very different. Of course. Yeah. With MongoDB, you can really... You have your object graph in the application, in JavaScript it's even easier, but in Java, in whatever, in Python, and you just exchange those documents with the database, and they are stored as documents. The big problem with SQL databases is also something that has changed when applications have changed. At the time when everything was done in the database, in stored procedures or pre-compiled procedures or whatever, then that was okay.

Starting point is 00:14:26 But with object-oriented programming, you had this mismatch, and you need an object-relational mapping to map from one to the other if you don't want to write a bunch of queries as text strings sent through JDBC. So what MongoDB brings at that point is an API that really fits with the programming language. And then it stores it as documents

Starting point is 00:14:58 rather than mapping that to relational tables. Yes, this is what Hibernate and HQL are trying to solve. They try to reinvent SQL to have what you describe. Yeah, but you mentioned OOP and ORM. I think this is in the past already, both. No, I'm joking, I'm joking. So for me it's like... What do you use today if you... I mean, applications are built with objects? Well, I personally am a big fan of what guys like Hasura, Supabase, and others do with a thin layer providing an API right away, without the need to write this middleware, right? And it's great, like, this

Starting point is 00:15:52 is so much better than object-relational mapping. People do object-relational mapping, but at the same time I doubt a lot of guys who create projects do actual OOP with patterns and so on. It's kind of somehow not cool anymore. It's my perception. I'm far from actual application programming lately. But it's also this old debate: where do you put your business logic? Of course. Ideally, in a SQL database you put it in the database, because data is processed there, but then you are constrained to specific languages. Well, right, but if you put it in the application

Starting point is 00:16:34 you also have a dependency on the language you chose. To me, the question about where to put logic became much easier to understand since like 10 years ago, when Angular and React gained popularity, and a lot of logic... And actually, Web 2.0 was how many years ago already, like 20 years ago, right? All this shifted a lot of client-oriented logic to clients, to the front end, right? And this gave space to have logic closer to data, like constraints and what we usually do with triggers, some dependencies, propagation of changes or something. It gave the opportunity to keep it in the database, where it should be, because otherwise, if you don't do it closer to the database,

Starting point is 00:17:26 at some point, when the company grows and the project grows, you add some other tools or application layers or some other code, and you need to reimplement the same logic in different places, and there is no strong guarantee that it will be well maintained. Yeah, I totally agree, and there are very successful database-centric applications, but the problem is just that developers want to use Java, not PL/pgSQL and not PL/SQL. And just try to hire a PL/SQL developer or a PL/pgSQL developer,

Starting point is 00:18:08 that will be more difficult than hiring a team of Java developers. Right, right. Michael wanted to ask something. I don't know if this is a change of topic, but I think it's on the same path, which is around developer experience. And I know it's a subjective term, but I do think, at least when Mongo came into the market, but I think NoSQL databases in general, they promised a few things. One was a really good getting-started experience: very quick, easy, you don't have to think much, you know, no schema to worry about, just get started. And that's good for some things and not so good in other ways. But it also promised a couple of other things, and I think we can learn a lot from these things

Starting point is 00:18:48 in terms of why it was popular. Like, why did Mongo take off? Why was NoSQL so popular for so long? It also promised kind of infinite, or at least horizontal, scalability, and that's something we've historically struggled with in the SQL world. I know you worked on distributed SQL. And then, yeah, I think that combination of things seemed really interesting to me. And I wondered if you had opinions on

Starting point is 00:19:18 what is it about that developer experience that really resonated with people? For me, developer experience is really where MongoDB was successful. The scalability, I don't really know, because I didn't use MongoDB at that time, and then I've seen scalability in SQL databases. The scalability comes from the data model

Starting point is 00:19:43 where you can have an easy sharding key. Yeah. As soon as you have an easy sharding key, you can distribute that on almost all databases today. On Postgres, you have multiple options like Citus, like Aurora Limitless, where if you have a sharding key, you can distribute. So I don't think it's really the point today. The point is really developer experience. As you say, it's easy to start and it's easy to integrate with your programming language also, not having something else to learn, a different language,

Starting point is 00:20:20 but also a different way of thinking: thinking about what you need to lock, thinking about foreign keys, thinking about performance when you read from multiple tables. But the easy start is also a problem. And basically, I'm working in the DevRel team, where most of the job is helping users, developers, to do some proper data modeling design, because it's easy to start, which is good when you start a proof of concept, but at some point, like in any database, you need to do some design. And the easier it is to start, the more difficult it is to realize that, okay, we are not in a proof of concept anymore.

Starting point is 00:21:09 We'll put that in production. It's an application that will evolve in the coming years. Then we need to look at the design. This is one of the major activities in the DevRel team. It's not like at Yugabyte, where being a developer advocate was really about awareness, because it's a new database, so you just need to let people know about it. MongoDB, people know it. You just need to make them successful with maybe a bit more complex use cases and do some data modeling. So I have a question about your personal experience and this decision you made, obviously, recently.

Starting point is 00:21:50 It feels like you switched teams, like in soccer or football. So my question was, any transfer cost? That's a very good question. So let me explain how it was. I was really happy at Yugabyte, about the team, about the colleagues, about the product. I was really not looking for another job. And when other companies contacted me, I was like, oh, sorry, I'm happy where I am. And when MongoDB contacted me, it was more out of curiosity, like, why are NoSQL databases interested in my experience?

Starting point is 00:22:34 And this is why I started discussions, out of curiosity. And then this is where I realized that it was really an interesting approach: helping users on document databases with the knowledge of SQL databases, being able to discuss with those who use Postgres, who use MongoDB, who have a new use case and want to know if they can do it on both, or if one is better than the other. That was interesting. And I was like, okay, I should think about that. And then of course, there is an offer that was interesting enough to say, okay, why wait,

Starting point is 00:23:13 just go there, but I could have had the same offer from Yugabyte. So it's not really what made the decision. Maybe it just pushes you to say, why not now rather than waiting six months or one year. But no, the other point was really learning something new. I really like learning something new, and all the content I create is about me learning. Yeah, well. My first reaction was, of course, I became very upset, and I started to think, is it a sudden change of your views, or maybe you slowly became more unsatisfied with the state of the relational and SQL world and so on. So I asked our AI assistant, and as you know, we have all your blog posts. So I asked it to research among the blog posts where you talked about NoSQL

Starting point is 00:24:13 and SQL, and to my surprise it said you had such posts in the past and it's not a sudden change of views. So the verdict from AI was, it's not a sudden change of views. But when I asked it to dig deeper, it was obvious that maybe the key reason was NULLs. In your past blog posts, the key criticism point was how NULLs behave. And I did it during the weekend, and I was going to raise this and discuss it, but as I already tweeted, or X-ed, I don't know how to say it, yesterday in the morning my team made a mistake, and actually, I looked at that merge request myself, so it was not a safe operation, leading to a nasty bug which led to multiple companies

Starting point is 00:25:06 receiving a few emails from us with wrong data. It was because of just a comparison not accounting for three-valued logic. I was bitten by this so many times. I had a startup where I was stuck, my own startup, I was stuck seven months without growth. Although I knew there should be growth, there was no growth. And then I almost gave up. And then I dug deeper into the code and found this bug. Again, a comparison that was not NULL-safe. We fixed it, and in a few weeks we had 80,000 registrations per day. I almost gave up on that startup.

Starting point is 00:25:51 This was like an all-or-nothing kind of thing, you know. It's just IS DISTINCT FROM, or COALESCE, you can fix it in multiple ways, but if you overlook it, it's just a single line of a problem which can cost you a lot of money and time. And maybe a whole startup can depend on it, as in my story. So I'm definitely with you in the criticism of NULL, in dealing with NULL values, right? I'm not really criticizing it, because I love the three-valued logic. I love NULLs because I think I understand them. I also think I understand.
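The kind of bug described in this story, a plain comparison that silently skips rows when one side is NULL, can be reproduced in a few lines. The sketch below is an illustration, not the actual code from either incident. It runs on SQLite through Python's sqlite3 module, where the `IS NOT` operator plays the role that `IS DISTINCT FROM` plays in Postgres; the profile_changes table and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile_changes (user_id INTEGER, old_email TEXT, new_email TEXT)"
)
conn.executemany(
    "INSERT INTO profile_changes VALUES (?, ?, ?)",
    [
        (1, "a@example.com", "a@example.com"),   # unchanged
        (2, "b@example.com", "b2@example.com"),  # changed
        (3, None, "c@example.com"),              # changed: was NULL, now set
        (4, None, None),                         # unchanged: both NULL
    ],
)

# Plain <> evaluates to NULL (not true) when either side is NULL,
# so user 3 is silently dropped from the result
naive = [r[0] for r in conn.execute(
    "SELECT user_id FROM profile_changes "
    "WHERE old_email <> new_email ORDER BY user_id")]

# NULL-safe comparison: SQLite's IS NOT treats NULL as a comparable value
null_safe = [r[0] for r in conn.execute(
    "SELECT user_id FROM profile_changes "
    "WHERE old_email IS NOT new_email ORDER BY user_id")]

# COALESCE variant: works here, but it treats NULL and '' as the same thing
coalesced = [r[0] for r in conn.execute(
    "SELECT user_id FROM profile_changes "
    "WHERE COALESCE(old_email, '') <> COALESCE(new_email, '') ORDER BY user_id")]
```

The naive query finds only user 2, while both NULL-safe variants also find user 3, which is the single-line difference the speakers are talking about.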

Starting point is 00:26:34 I also love, exactly. Yeah, yeah, yeah. But it took me 20 years to understand it. And then I can understand that a developer who already has a lot of things to learn does not want to spend time on something that looks like mathematics. It's good, it's kind of like, NULL kind of came from academia, right? And I learned it quickly during my university time because I had a very good professor, a big specialist in databases, and I quickly learned it, but it took me 20 years to stop liking it, because I see

Starting point is 00:27:11 reality says no. Like, everyone steps on this rake all the time, including myself. So yeah, you need to be pragmatic. But you can also solve all these problems in SQL databases. Just don't use NULL. Just set all columns NOT NULL. And that works. And you were talking about normalization. Just normalize a bit more. If you are tempted to put a NULL in a column,

Starting point is 00:27:40 then it's probably because this column belongs to another table. And then it will not be a NULL, it will be the absence of a row in another table. Just go for full normalization and do not allow any NULL, and that will work. I mean, it will work, and you will not have those errors. Maybe you will have some performance issues.
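The suggestion to replace a nullable column with a separate table, so that "unknown" becomes the absence of a row, can be sketched as follows. This is a minimal illustration on SQLite via Python's sqlite3 module; the employees and salaries tables are hypothetical names, loosely borrowing the salary example used elsewhere in the conversation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- A salary row exists only when the salary is actually known
    CREATE TABLE salaries (
        employee_id INTEGER PRIMARY KEY REFERENCES employees(id),
        amount INTEGER NOT NULL
    );
    INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Top manager');
    INSERT INTO salaries VALUES (1, 100), (2, 120);
""")

# Inner join: only employees with a known salary, no NULL to reason about
known = conn.execute("""
    SELECT e.name, s.amount FROM employees e
    JOIN salaries s ON s.employee_id = e.id ORDER BY e.id
""").fetchall()

# "Unknown salary" becomes an explicit question about a missing row
unknown = [r[0] for r in conn.execute("""
    SELECT e.name FROM employees e
    WHERE NOT EXISTS (SELECT 1 FROM salaries s WHERE s.employee_id = e.id)
""")]
```

The extra join is where the performance cost mentioned in the conversation comes from.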

Starting point is 00:28:05 Exactly. Performance issues will be inevitable. I see NULL like denormalization. It's a shortcut that is easy. It's so easy just to say, okay, let's put a NULL because it doesn't have a value. If it doesn't have a value, it should not have a row in the table. Yeah. I also remember, imagine you have a CTO or some leader who understands NULLs. Imagine all those poor application developers who write Java, JavaScript, it doesn't matter, PHP code, Ruby code,

Starting point is 00:28:47 and this CTO with this understanding of NULLs in SQL constantly putting pressure, like, again you used it wrong in your code. And I was this person. And right now I think NULL is a good concept, but the world says, please, no. It just doesn't work well. So that's why I say I don't like them. I will take another analogy. I think the best editor is vi.

Starting point is 00:29:17 I also agree. Because I had to learn it. Inside tmux. It was hard to learn it. We had to learn it, but when you know it, you are very efficient with it. But I can understand that a junior today does not want to learn all those vi commands. Same for NULL. I mean, if you learn it and if you spend all your life doing SQL, then yeah, it's good, but that's not the reality.

Starting point is 00:29:47 Yeah, so back to Mongo, and let's talk a little bit about the alternative. If we go out of the SQL world but stay inside databases, what's happening to NULLs and empty values, unknown values and so on, zeros, empty strings? Should it all be considered the same or not? For me, in SQL it's easy. A NULL is a value that exists, but you just don't know the value. Right. Your top manager has a salary, but you don't know it. So if you have to put all salaries in a database, then it will have a NULL, and maybe you will fill it in one day, just because you don't know it yet at the time when you insert. The problem is that NULL is

Starting point is 00:30:41 used for other things, for something that doesn't exist. You know, when in Excel we say N/A, not applicable. And if you use this "doesn't apply" meaning in JavaScript, you're just trying to store it and have the same logic when you query the database. So MongoDB does that. It's very similar to "does not exist". You have those documents where you can declare an attribute or not. And in most cases, if it's not there, it's similar

Starting point is 00:31:13 to null. And if you want to say explicitly that it exists but you don't know the value, then add something else, like a Boolean that says, okay, we don't know it. Yeah, by the way, you mentioned you like it, it's a good concept, but I'm thinking there are so many caveats. For example, if you take a NULL value and do plus one, it will also be NULL, like, unknown remains unknown because we don't know what we're using. If you don't know a value, then you can add one and you still don't know the value. But at the same time, if you use the aggregate SUM, it's not like that. It uses zero instead of NULL, right? Yeah, because SUM is defined as summing the known values.

Starting point is 00:32:03 You cannot explain this. It's not logical. It's just this. Because a sum is just one argument plus a different one. Just a sequence of plus operations, right? But... It depends on how you define the aggregation, if it's the sum of the known values. If we have

Starting point is 00:32:27 three rows of salary, like one dollar, two dollars, and NULL dollars, right? Yeah. If we just perform explicit summation, the result will be NULL. But if we use SUM, which should give the same result, it will not be the same. It will be three. It depends on how you define it, but SQL defines that as the sum of the values that you know. Yeah. I apologize. It gives you an idea. For example, and it makes sense.
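The one dollar, two dollars, NULL example is easy to check. The sketch below uses Python's built-in sqlite3 module; the salaries table is hypothetical, and the behavior shown is SQLite's, which follows standard SQL on these points. It also shows a detail the conversation glosses over: SUM skips NULL inputs rather than counting them as zero, and returns NULL when there is no known value at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (amount INTEGER)")
conn.executemany("INSERT INTO salaries VALUES (?)", [(1,), (2,), (None,)])

# Scalar arithmetic: anything plus an unknown value is unknown
null_plus_one = conn.execute("SELECT NULL + 1").fetchone()[0]
explicit_sum = conn.execute("SELECT 1 + 2 + NULL").fetchone()[0]

# Aggregate: SUM skips NULL inputs and adds up the known values
aggregate_sum = conn.execute("SELECT SUM(amount) FROM salaries").fetchone()[0]

# With no known value at all, SUM itself is NULL, not 0
all_unknown = conn.execute(
    "SELECT SUM(amount) FROM salaries WHERE amount IS NULL").fetchone()[0]

# COUNT(*) counts rows, COUNT(amount) counts known values only
counts = conn.execute("SELECT COUNT(*), COUNT(amount) FROM salaries").fetchone()
```

Python's `None` is how sqlite3 reports SQL NULL, so the first two results come back as `None` while the aggregate comes back as 3.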

Starting point is 00:32:58 I mean, if you have one million rows and you ask for a sum, you probably don't want an unknown just because one is not known. At least you know the sum of the existing ones. But let me apologize and explain what's happening here. I just flipped the board and made you defend the SQL world, which is interesting because it shows that you have the courage to become a specialist in both worlds. This is interesting. For me, I changed company and I help different users, but I did not change what I think about databases. I mean, I've been working a lot with Oracle. I still think it's a very good database, but I can understand that people want to move out of it, and it's probably not because of the features. I like Postgres, but I also think that there is something else

Starting point is 00:33:51 to do in the storage and to distribute it. I like YugabyteDB, but I also understand that some people may want to use something else. And same for MongoDB. That's interesting. I just want to help users when I can help them. And also something, especially on Twitter, we see a lot of people comparing databases, like MySQL is better than Postgres or Postgres is better than MySQL or whatever. And what

Starting point is 00:34:22 I always say is that the best database is the one that you know. If you know better how to administer SQL Server on Windows, then that's probably the best database for you. It's not for me. And if you are more successful with the null behavior in document databases, then probably you should use document databases. So my goal is just to have people be successful and use the right database depending on what they know. The worst that you can do is work with MongoDB and do the same design as you did on a SQL database, or the opposite, putting everything

Starting point is 00:35:08 in documents in Postgres just because you have learned MongoDB first; that will probably not be good. You need to understand how it works, read an execution plan in both cases, understand how the indexes are used. I kind of agree for products where you're the only user. Like, if I'm choosing between iOS and Android, or we were talking before the call about macOS or Windows, if I'm the only person affected, I understand choosing what I know best. But I feel like with databases we're often choosing for a team, for an

Starting point is 00:35:42 organization for a company and it's not just what I know best, even if I'm the tech lead or even if I am the decision maker, I need to factor in what do my team know best, what can we hire most easily, what's easiest to operate, or how long will this project last? Is it a proof of concept project or is it our main system? There's a bunch of other factors I think are really important. And do you think, you brought up use cases at the beginning. I think that's like super important because we often do know the use cases.

Starting point is 00:36:12 We often do know the access patterns. So picking the one that is best for that makes more sense to me than which one I know best personally. But I do take your point that if you take that as an organization, like, which one do you operate best as an organization, it does still fit, but I do think there's some subtle difference there. What do you think? I think that there are a lot of use cases that can be successful on many databases. Yeah. Of course, there are some special cases

Starting point is 00:36:45 that are really pushed to the maximum throughput, where you really have to find the right technology for it. But let's say you have time series. Time series coming from IoT, and you have queries on them. Of course, you can use a time series database, but you can also do it on Postgres, with a time series extension or not. And you can also do it on a document database. If you do it correctly, I think you have a lot of choices for many use cases. And finally, the enterprises that need a specific database because of the very high scale of it,

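Starting point is 00:37:34 they finally build their own database. Or they tweak one database to use it really like their own database. But I think you really have the choice. For many use cases, you can do that on Postgres, you can do that on YugabyteDB, you can do that on Oracle, you can do that on MongoDB, you can do that on DynamoDB. But if you do it on a database where you don't know exactly how NULL works, or the isolation levels, the ACID properties, how the locks are working, then... You can be successful on any database for many use cases, but you can also be very bad on any database if you don't care. So it's more about the people. I totally agree. Not your personal choice, about the people. And I remember discussions when I was doing consulting. I remember discussing with a customer something where it would have made sense to use stored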

Starting point is 00:38:16 but you can also be very bad on any database if you don't care. So it's more about the people. I totally agree. Not your personal choice, it's about the people. And I remember discussions when I was doing consulting. I remember discussing with a customer for something where it would have made sense to use stored procedures. And they were going all microservices, Java, all that. And they just told me, yeah, but if we do it in SQL, PL/SQL, it was on Oracle at that time, we are four in the team who can do that and maintain that.

Starting point is 00:38:58 And then if any problem is there, we are four to be on call. If we do it in Java, we have 200 developers in India, we have 200 developers in US. If there is a problem during the night, they will manage it. So the good choice, even if it's not the best for performance, for design, for whatever, the good choice is also something where you can sleep and have a team that can manage it. Well, right now AI can help you fix bugs, tests and so on. Oh, yeah.

Starting point is 00:39:30 It's easier, right? I have a couple of questions from friends and I think you know them, but I'm not going to reveal names. First question, is MongoDB adding SQL to the product? I don't think this is in the roadmap at all. And I don't think people are asking for that. Let's look at another NoSQL database, DynamoDB. When DynamoDB added the SQL syntax on top of it using PartiQL,

Starting point is 00:40:02 it was never used. And the main reason was that users were afraid of it because with the document API, they know what happens. The big difference, I mentioned the API, but there is a big difference between NoSQL and SQL. In SQL, you have a declarative language where you don't know how the data is accessed until you read the execution plan, which is good because you have an application that is independent of the physical data model, but it's also more difficult because the developer has no idea how it works in production before looking at the execution plan. And when looking at the execution plan, the developer may have to work a long time to understand why the bad execution plan is chosen.

Starting point is 00:41:00 Is it because of statistics, not good index, whatever. It's complex. With the NoSQL APIs, you code the data access. It depends on the database. For example, in DynamoDB, if you want to use an index, you have to query the index. In MongoDB, you have this data independence where you query on the collection, and if the index can be used, it can be used. But you control the data that is accessed. For example, when you design your documents,

Starting point is 00:41:30 you design something that is joined when you insert it, not at runtime, where a query planner will decide if it starts with one table or another table. And it has some good and bad. I remember in consulting, spending a long time with developers, looking at the execution plan, and they know their data, and they know their access pattern, and they immediately tell me, of course, that's not the right execution plan.

Starting point is 00:42:01 We must start with this table and then look up into this one. Okay, perfect. I can use a hint, pg_hint_plan, for example, in Postgres to validate that it's a better execution plan. And then the developer is happy. Yeah, perfect. I want that. And then they're like, okay, but it's not finished. Now, we need to figure out how to get the right execution plan without the hint. And with consulting, people were paying by the day just to get the right execution plan that they knew initially was the best one. With a NoSQL API, you are closer to what happens physically, and then you have more control on that, and some developers prefer that.
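
Editor's illustration: the contrast described above, between a join decided at read time and a document that is "joined when you insert it", can be sketched in plain Python. This is a minimal sketch, not anything shown in the episode. The customers/orders data and all function names are hypothetical, and in-memory dictionaries stand in for a real SQL or document database.

```python
# Hypothetical in-memory data standing in for two normalized tables.
customers = {1: {"name": "Ada"}}
orders = [{"order_id": 10, "customer_id": 1, "total": 25.0}]


def read_order_normalized(order_id):
    """Join at read time: look up the order, then its customer.

    In a SQL database the query planner would choose this access path.
    """
    order = next(o for o in orders if o["order_id"] == order_id)
    customer = customers[order["customer_id"]]
    return {
        "order_id": order["order_id"],
        "total": order["total"],
        "customer_name": customer["name"],
    }


def build_order_document(order_id, customer_id, total):
    """Join at insert time: embed the customer data in the document."""
    return {
        "order_id": order_id,
        "total": total,
        "customer_name": customers[customer_id]["name"],
    }


# Hypothetical document collection keyed by order id.
documents = {10: build_order_document(10, 1, 25.0)}


def read_order_document(order_id):
    """Read path is a single key lookup; no join is decided at runtime."""
    return documents[order_id]
```

Both read functions return the same shape here. The difference is where the join work happens, and the embedded copy of the customer name must be kept up to date by the application if it changes.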

Starting point is 00:42:52 Next question was what do you think Postgres can or should learn from MongoDB? Maybe this, right? Is it possible to... I have one more. I think they do major upgrades really well. Oh yes. Well. But we can learn that from a lot of databases. Yeah, previous question was because I had, like, maybe outdated knowledge that many NoSQL systems implemented some dialect of SQL, for example Cassandra with CQL, right? Yeah, but they are not used to it. It's only syntax, it's not SQL, it's not a declarative language. It's just syntax.

Starting point is 00:43:29 And I don't see the point. If you have an API that is integrated with your programming language, why do you want to write a string in Java that you send to the database if you don't have to? In SQL, you have to do that because you have this data independence and a very different language. But I don't really see the point. But I forgot what you mentioned. Yeah, I said the vice versa: what Postgres could learn from MongoDB. Michael answered upgrades. I concur with you, definitely. But that is related. In SQL databases, in relational

Starting point is 00:44:09 databases, to have this data independence, logical and physical data independence, where you query, in SQL databases, you query a logical model. We were talking about normalization. This is the logical model. Maybe physically everything is stored in one table. You don't really care from the relational SQL point of view. But then to map the logical model to the physical model, you need a catalog, a dictionary. And this is what is difficult during upgrades because you need to change the catalog and the catalog

Starting point is 00:44:46 is shared. You can shard the data, you can distribute the data, but the catalog must be shared because they must use the same dictionary. And that's easier with NoSQL databases because you have much less to share about the metadata because the catalog is in the application. The schema, we were talking about schema-less or schema-on-read or on-write. The big difference is that most of the schema is in the application. And then if you upgrade the application,

Starting point is 00:45:18 you have a new version of the application, it knows the new schema. And the two versions can work together if you take care that when you read a document, you know how to read it. Great answer. Yeah, last question. What do you think about systems which are built on top of Postgres? Like FerretDB and DocumentDB recently released by Microsoft.
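
Editor's illustration: the earlier point in this segment, that two application versions can coexist "if you take care that when you read a document, you know how to read it", is often handled by tagging documents with a version and normalizing on read. The following is a minimal sketch under assumptions that are not from the episode: the `schema_version` field, the user document shapes, and the `read_user` function are all hypothetical.

```python
def read_user(doc):
    """Return a user in the current (version 2) shape, whatever version is stored.

    Assumption: documents without a schema_version field are treated as version 1.
    """
    version = doc.get("schema_version", 1)
    if version == 1:
        # Hypothetical old shape: a single "name" string.
        first, _, last = doc["name"].partition(" ")
        return {"first_name": first, "last_name": last}
    if version == 2:
        # Hypothetical new shape: separate first and last name fields.
        return {"first_name": doc["first_name"], "last_name": doc["last_name"]}
    raise ValueError(f"unknown schema_version: {version}")
```

With this kind of reader, documents written by the old application version stay readable after the new version is deployed, without a shared catalog change in the database.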

Starting point is 00:45:42 Oh, yes, yes, that's a good point. So beyond the funny thing that DocumentDB is an AWS database, but the name belongs to Microsoft because before putting a MongoDB-like API on Cosmos DB, it was called DocumentDB. So Microsoft did that multiple times, put it in Cosmos DB to see if it will be more popular. So first it's kind of a mess. Different API, similar, you don't know the name where it comes from, but I really like what the FerretDB people are doing.

Starting point is 00:46:26 And for me, as a developer advocate, I really like that there is a MongoDB API on multiple databases. In Oracle, you can also have a MongoDB API. The more you make it popular, the more you help users to use another API without changing the database, that's perfect. From a marketing point of view, I don't think it's a big problem either because it's not only about the API. What I think that the big customers of MongoDB like with MongoDB

Starting point is 00:47:03 is that they have in front a company that is doing only one thing. The company is doing only MongoDB. It's not like a vendor that has a database, but also another database and cloud and managed service and software. MongoDB is doing only MongoDB. So if they use MongoDB on MongoDB, they have hundreds of people doing support on it. I cannot agree here because I remember MongoDB company, or it's called MongoDB, sorry if I'm wrong. So I remember they also did some Postgres when they first released BI Connector. Remember this story? They used Postgres to be able to use Tableau and other systems for data analysis, BI and

Starting point is 00:47:54 so on. They needed to make some bridge to the SQL world and they used Postgres for that. It was a very interesting story. I have no idea. My point was more like you can do some MongoDB on Percona, you can do some MongoDB on AWS, you can do some MongoDB on Azure. That can work, but if you are a big customer and you want support, you probably want support from the original one. I hear you speaking as a member of this team, new member of this team, but I also have like

Starting point is 00:48:34 a must-have note that MongoDB is not pure open source. It is not pure open source. While FerretDB is Apache 2.0, which is pure open source. So this is one of the most... Yeah, yeah, yeah. Of course, I'm a big fan of open source. I would prefer that it is open source, but I can also understand. You know why they had to change the license? Because AWS was taking everything.

Starting point is 00:49:02 And finally, today, AWS is a major partner. So it was probably a good move. Probably today it could be open source. But I can understand given the history that they wanted to protect the managed service. Open source is eating commercial software, clouds are eating open source software. You remember this sequence of fish picture, right? Yeah, okay. I think no more questions from me. It was super interesting and enjoyable. Thank you for coming. Thank you very much.

Starting point is 00:49:37 I really like also what you do, how you can come up with so many different topics every week. I think you never missed a week. So yeah, that's really nice. Great. That's really kind of you, Frank. Thank you for joining. Thank you. Have a great week. Bye bye.
