The Good Tech Companies - Why PostgreSQL Is the Bedrock for the Future of Data

Episode Date: April 26, 2024

This story was originally published on HackerNoon at: https://hackernoon.com/why-postgresql-is-the-bedrock-for-the-future-of-data. Explore the rise of PostgreSQL as the de facto database standard, its impact on software development, and the key trends driving its widespread adoption. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-management, #postgresql, #timescale, #future-of-data-science, #software-development, #database-complexity, #timescaledb, #good-company, and more. This story was written by: @timescale. Learn more about this writer by checking @timescale's about page, and for more stories, please visit hackernoon.com. PostgreSQL's ascendancy as the go-to database standard is rooted in its adaptability, reliability, and extensive ecosystem. This article delves into the reasons behind its dominance, from tackling database complexity to empowering developers to build the future with confidence. Discover how PostgreSQL is revolutionizing software development and data management practices.

Transcript
Starting point is 00:00:00 This audio is presented by Hacker Noon, where anyone can learn anything about any technology. Why PostgreSQL is the bedrock for the future of data, by Timescale. One of the biggest trends in software development today is the emergence of PostgreSQL as the de facto database standard. There have been a few blog posts on how to use PostgreSQL for everything, but none yet on why this is happening and, more importantly, why this matters. Until now. Table of contents: 01 PostgreSQL is becoming the de facto database standard. 02 Everything is becoming a computer. 03 The return of PostgreSQL. 04 Free yourself, build the future, embrace PostgreSQL. 05 Timescale started as PostgreSQL for time series. 06 Timescale expanded beyond time series. 07 Timescale
Starting point is 00:00:46 is now PostgreSQL made powerful. 08 Coda: Yoda. PostgreSQL is becoming the de facto database standard. "Over the past several months, PostgreSQL for everything has become a growing war cry among developers." "PostgreSQL isn't just a simple relational database; it's a data management framework with the potential to engulf the entire database realm. The trend of using Postgres for everything is no longer limited to a few elite teams but is becoming a mainstream best practice." Source. "One way to simplify your stack and reduce the moving parts, speed up development, lower the risk and deliver more features in your startup is: use
Starting point is 00:01:31 Postgres for everything. Postgres can replace, up to millions of users, many backend technologies, Kafka, RabbitMQ, Mongo and Redis among them." Source. "When I first heard about Postgres, at a time when MySQL absolutely dominated, it was described to me as that database made by those math nerds, and then it occurred to me: yeah, those are exactly the people you want making your database." Source. "It has made a remarkable comeback. Now that NoSQL is dead and Oracle owns MySQL, what else is
Starting point is 00:02:22 there?" Source. "Postgres is not just a relational DB. It's a way of life." Source. Thanks to its rock-solid foundation, plus its versatility through native features and extensions, developers can now use PostgreSQL for everything, replacing complex, brittle data architectures with straightforward simplicity. This might help explain why PostgreSQL last year took the top spot from MySQL in the rankings for most popular database among professional developers (60,369 respondents): "Which database environments have you done extensive development work in over the past year, and which do you want to work in over the next year?" More than 49% of respondents answered
Starting point is 00:03:05 PostgreSQL. Source. Those results are from the 2023 Stack Overflow Developer Survey. If you look across time, you can see the steady increase in PostgreSQL adoption over the past few years. While PostgreSQL was the second favorite database of Stack Overflow's Developer Survey respondents between 2020 and 2022, its usage has consistently increased. Source: 2020, 2021, 2022. This is not just a trend among small startups and hobbyists. In fact, PostgreSQL usage is increasing across organizations of all sizes. The percentage of PostgreSQL usage by company size. Source. At Timescale, this trend is not new to us. We have been PostgreSQL believers for nearly a decade. That's why we built our
Starting point is 00:03:51 business on PostgreSQL, why we are one of the top contributors to PostgreSQL, why we run the annual state of PostgreSQL survey, referenced above, and why we support PostgreSQL meetups and conferences. Personally, I have been using PostgreSQL for over 13 years, when I switched over from MySQL. There have been a few blog posts on how to use PostgreSQL for everything, but none yet on why this is happening, and, more importantly, why this matters. Until now. But to understand why this is happening, we have to understand an even more foundational trend and how that trend is changing the fundamental nature of human reality. Everything is becoming a computer. Everything, our cars, our homes, our cities, our farms,
Starting point is 00:04:34 our factories, our currencies, our things, is becoming a computer. We, too, are becoming digital. Every year, we digitize more of our own identity and actions. How we buy things, how we entertain ourselves, how we collect art, how we find answers to our questions, how we communicate and connect, how we express who we are. 22 years ago, this idea of ubiquitous computing seemed audacious. Back then, I was a graduate student at the MIT AI Lab, working on my thesis on intelligent environments. My research was supported by MIT Project Oxygen, which had a noble, bold goal to make computing as pervasive as the air we breathe. To put that time in perspective,
Starting point is 00:05:16 we had our server rack in a closet. A lot has changed since then. Computing is now ubiquitous on our desks, in our pockets, in our things, and in our cloud. That much we predicted. But the second-order effects of those changes were not what most of us expected. Ubiquitous computing has led to ubiquitous data. With each new computing device, we collect more information about our reality: human data, machine data, business data, environmental data, and synthetic data. This data is flooding our world. The data flood has led to a Cambrian explosion of databases. All these new sources of data have required new places to store them. 20 years ago, there were maybe five viable database options. Today there are several
Starting point is 00:06:01 hundred, most of them specialized for specific use cases or data, with new ones emerging each month. More data and more databases have led to more software complexity. Choosing the right database for your software workload is no longer easy. Instead, developers are forced to cobble together complex architectures that might include a relational database for its reliability, a non-relational database for its scalability, a data warehouse for its ability to serve analysis, and an object store for its ability to cheaply archive old data. This architecture might even have more specialized components like a time series or vector database. More complexity means less time to build. Complex architectures are
Starting point is 00:06:46 more brittle, require more complex application logic, offer less time for development, and slow down development. Complexity is not a benefit but a real cost. As computing has become more ubiquitous, our reality has become more entwined with computing. We have brought computing into our world and ourselves into its world. We are no longer just our offline identities but a hybrid of what we do offline and online. Software developers are humanity's vanguard in this new reality. We are the ones building the software that shapes this new reality. But developers are now flooded with data and drowning in database complexity. This means that developers, instead of shaping the future, are spending more and more of their time managing the plumbing. How did we get here?
Starting point is 00:07:30 Part 1. Cascading Computing Waves. Ubiquitous computing has led to ubiquitous data. This did not happen overnight but in cascading waves over several decades: mainframes (1950s+), personal computers (1970s+), the internet (1990s+), mobile (2000s+), cloud computing (2000s+), the Internet of Things (2010s+). With each wave, computers have become smaller, more powerful, and more ubiquitous. Each wave also built on the previous one. Personal computers are smaller mainframes. The internet is a network of connected computers. Smartphones are even smaller computers connected to the internet. Cloud computing democratized access to computing resources. The Internet of Things is smartphone components reconstructed as part of other physical things connected to the
Starting point is 00:08:20 cloud. But in the past two decades, computing advances have not just occurred in the physical world but also in the digital one, reflecting our hybrid reality: social networks (2000s+), blockchains (2010s+), generative AI (2020s+). With each new wave of computing, we get new sources of information about our hybrid reality: human digital exhaust, machine data, business data, and synthetic data. Future waves will create even more data. All this data fuels new waves, the latest of which is generative AI, which in turn further shapes our reality. Computing waves are not siloed but cascade like dominoes. What started as a data trickle soon became a data flood, and the data flood has led to the creation of
Starting point is 00:09:05 more and more databases. Part 2. Incremental database growth. All these new sources of data have required new places to store them, or databases. Mainframes started with the Integrated Data Store (1964) and later System R (1974), the first SQL database. Personal computers fostered the rise of the first commercial databases: Oracle (1977), inspired by System R; DB2 (1983); and SQL Server (1989), Microsoft's response to Oracle. The collaborative power of the internet enabled the rise of open-source software, including the first open-source databases: MySQL (1995) and PostgreSQL (1996). Smartphones led to the proliferation of SQLite, initially created in 2000. The internet also created a massive amount of data, which led to the first non-relational, or NoSQL, databases: Hadoop (2006), Cassandra (2008), MongoDB (2009). Some called this the era of Big Data. Part 3. Explosive database growth. Around
Starting point is 00:10:16 2010, we started to hit a breaking point. Up until that point, software applications would primarily rely on a single database, e.g., Oracle, MySQL, or PostgreSQL, and the choice was relatively easy. But big data kept getting bigger. The Internet of Things led to the rise of machine data. Smartphone usage started growing exponentially thanks to the iPhone and Android, leading to even more human digital exhaust. Cloud computing democratized access to compute and storage, amplifying these trends. Generative AI very recently made this problem worse with the creation of vector data. As the volume of data collected grew, we saw the rise of specialized databases: Neo4j for graph data (2007), Redis for a basic key-value store (2009), InfluxDB for time series data (2013), ClickHouse for high-scale analytics (2016), Pinecone for vector data (2019), and many,
Starting point is 00:11:16 many more. 20 years ago, there were maybe five viable database options. Today, there are several hundred, most of them specialized for specific use cases, with new ones emerging each month. While the earlier databases promised general versatility, these specialized ones offer specific trade-offs, which may or may not make sense depending on your use case. Part 4. More databases, more problems. Faced with this flood, and with specialized databases offering a variety of trade-offs, developers had no choice but to cobble together complex architectures. These architectures typically include a relational database, for reliability,
Starting point is 00:12:00 a non-relational database, for scalability, a data warehouse, for data analysis, an object store, for cheap archiving, and even more specialized components like a time series or vector database for those use cases. But more complexity means less time to build. Complex architectures are more brittle, require more complex application logic, offer less time for development, and slow down development. This means that instead of building the future, software developers find themselves spending far too much time maintaining the plumbing. This is where we are today. There is a better way. The return of PostgreSQL. This is where our story takes a twist. Our hero, instead of being a shiny new database, is an old stalwart, with a name only a mother, or a core developer, could love: PostgreSQL. At first, PostgreSQL was a distant number two behind MySQL. MySQL was easier to use, had a company behind it, and a name that anyone could easily pronounce.
Starting point is 00:12:51 But then MySQL was acquired by Sun Microsystems (2008), which was then acquired by Oracle (2009). And software developers, who saw MySQL as the free savior from the expensive Oracle dictatorship, started to reconsider what to use. At that same time, a distributed community of developers, sponsored by a handful of small independent companies, was slowly making PostgreSQL better and better. They quietly added powerful features, like full-text search (2008), window functions (2009), and JSON support (2012). They also made the database more rock-solid, through capabilities like streaming replication, hot standby, and in-place upgrade (2010), logical replication (2017), and by diligently fixing bugs and smoothing rough edges.
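To make those native features concrete, here is a minimal illustrative sketch in plain PostgreSQL; the articles table and its columns are hypothetical, invented just for this example.

-- Hypothetical table touching several of the features mentioned above
CREATE TABLE articles (
    id        bigserial PRIMARY KEY,
    author    text NOT NULL,
    body      text NOT NULL,
    metadata  jsonb,               -- JSON support (the binary JSONB type arrived later, in 9.4)
    published timestamptz NOT NULL
);

-- Full-text search: articles whose body mentions "replication"
SELECT id, author
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'replication');

-- Window function: rank each author's articles by recency
SELECT id, author,
       row_number() OVER (PARTITION BY author ORDER BY published DESC) AS recency_rank
FROM articles;

-- JSONB: filter on a field inside the metadata document
SELECT id FROM articles WHERE metadata ->> 'category' = 'databases';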
Starting point is 00:13:40 PostgreSQL is now a platform. One of the most impactful capabilities added to PostgreSQL during this time was the ability to support extensions: software modules that add functionality to PostgreSQL (2011). Extensions enabled even more developers to add functionality to PostgreSQL independently, quickly, and with minimal coordination. Thanks to extensions, PostgreSQL started to become more than just a great relational database. Thanks to PostGIS, it became a great geospatial database. Thanks to TimescaleDB, it became a great time series database. hstore, a key-value store. AGE, a graph database. pgvector, a vector database. PostgreSQL became a platform.
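As a rough sketch of what that extensibility looks like in practice, installing the extensions named above is a one-line statement each, provided the packages are present on the server (TimescaleDB, for example, also needs to be listed in shared_preload_libraries):

-- Each extension teaches PostgreSQL a new specialty
CREATE EXTENSION IF NOT EXISTS postgis;      -- geospatial types, functions, and indexes
CREATE EXTENSION IF NOT EXISTS timescaledb;  -- time-series hypertables
CREATE EXTENSION IF NOT EXISTS hstore;       -- simple key-value column type
CREATE EXTENSION IF NOT EXISTS age;          -- graph queries (Apache AGE)
CREATE EXTENSION IF NOT EXISTS vector;       -- pgvector: vector storage and similarity search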
Starting point is 00:14:27 Now, developers can use PostgreSQL for its reliability, scalability (replacing non-relational databases), data analysis (replacing data warehouses), and more. What about big data? At this point, the smart reader should ask, what about big data? That's a fair question. Historically, big data, e.g., hundreds of terabytes or even petabytes, and the related analytics queries, has been a bad fit for a database like PostgreSQL that doesn't scale horizontally on its own. That, too, is changing. Last November, we launched tiered storage, which automatically tiers your data between disk and object storage (S3), effectively creating the ability to have an infinite table. So while big data has historically been an area of weakness for PostgreSQL, soon no workload will be too big. PostgreSQL is the answer. PostgreSQL is how we
Starting point is 00:15:18 free ourselves and build the future. Free yourself, build the future, embrace PostgreSQL. Instead of futzing with several different database systems, each with its own quirks and query languages, we can rely on the world's most versatile and, possibly, most reliable database: PostgreSQL. We can spend less time on the plumbing and more time on building the future. And PostgreSQL keeps getting better. The PostgreSQL community continues to make the core better. There are many more companies contributing to PostgreSQL today, including the hyperscalers. Today's PostgreSQL ecosystem. Source. There are also more innovative,
Starting point is 00:15:56 independent companies building around the core to make the PostgreSQL experience better. Supabase (2020) is making PostgreSQL into a viable Firebase alternative for web and mobile developers. Neon (2021) and Xata (2022) are both making PostgreSQL scale to zero for intermittent serverless workloads. Tembo (2022) is providing out-of-the-box stacks for various use cases. Nile (2023) is making PostgreSQL easier for SaaS applications. And many more. And, of course, there's this: Timescale (2017). Timescale started as PostgreSQL for time series. The Timescale story will probably sound a little familiar. We were solving some hard sensor data problems for IoT customers, and we were drowning in data.
Starting point is 00:16:45 To keep up, we built a complex stack that included at least two different database systems, one of which was a time series database. One day, we reached our breaking point. In our UI, we wanted to filter devices by both device_type and uptime. This should have been a simple SQL join. But because we were using two different databases, it instead required writing glue code in our application between our two databases. It was going to take us weeks and an entire engineering sprint to make the change.
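With both datasets in a single PostgreSQL database, that filter really is just a join. A minimal sketch, with hypothetical table and column names (devices for metadata, device_metrics for the readings):

-- Filter devices by metadata (device_type) and time-series state (uptime) in one query
SELECT d.device_id, d.device_type, m.uptime
FROM devices AS d
JOIN device_metrics AS m ON m.device_id = d.device_id
WHERE d.device_type = 'sensor'
  AND m.uptime > interval '30 days';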
Starting point is 00:17:15 Then, one of our engineers had a crazy idea. Why don't we just build a time series database right in PostgreSQL? That way, we would just have one database for all our data and would be free to ship software faster. Then we built it, and it made our lives so much easier. Then we told our friends about it, and they wanted to try it. And we realized that this was something that we needed to share with the world, so we open sourced our time series extension, TimescaleDB, and announced it to the world on April 4, 2017. Back then, PostgreSQL-based startups were quite rare. We were one of the first. In the seven years since, we've heavily invested in both the extension and in our PostgreSQL cloud service, offering a better and better PostgreSQL developer experience for time
Starting point is 00:18:01 series and analytics: 350x faster queries and 44% higher inserts via hypertables (auto-partitioning tables), millisecond response times for common queries via continuous aggregates (real-time materialized views), 90%-plus storage cost savings via native columnar compression, infinite, low-cost object storage via tiered storage, and more. Timescale expanded beyond time series. That's where we started, in time series data, and also what we are most known for. But last year we started to expand. Timescale Vector. We launched Timescale Vector, PostgreSQL++ for AI applications, which makes PostgreSQL an even better vector database. Timescale Vector scales to over 100 million vectors, building on pgvector with even better performance.
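A minimal sketch of that kind of vector workload using the open-source pgvector extension alone (table name, dimensionality, and index choice are illustrative; Timescale Vector's own index types aren't shown here):

-- Store embeddings next to ordinary relational columns
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    contents  text,
    embedding vector(3)   -- use your embedding model's real dimensionality, e.g. 1536
);

-- Approximate nearest-neighbor index using cosine distance (pgvector 0.5+)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- The 5 documents closest to a query embedding
SELECT id, contents
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'   -- the query embedding
LIMIT 5;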
Starting point is 00:18:51 Innovative companies and teams are already using Timescale Vector in production at a massive scale, including OpenSauced, a GitHub events insights platform, at 100-plus million vectors, ViRally, a social virality prediction platform, at 100-plus million vectors, and MarketReader, a financial insights platform, at 30-plus million vectors. PopSQL. Recently, we also acquired PopSQL to build and offer the best PostgreSQL UI. PopSQL is the SQL editor for team collaboration, with autocomplete, schema exploration, versioning, and visualization. Hundreds of thousands of developers and data analysts have used PopSQL to work with their
Starting point is 00:19:32 data, whether on PostgreSQL, Timescale, or other data sources like Redshift, Snowflake, BigQuery, MySQL, SQL Server, and more. PopSQL is the SQL editor for team collaboration. Insights. We also launched Insights, the largest dogfooding effort we've ever undertaken, which tracks every database query to help developers monitor and optimize database performance. Insights overcomes several limitations of pg_stat_statements, the official extension to see statistics from your database.
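For reference, pg_stat_statements itself is queryable like any other view; a minimal sketch of the kind of statistics it exposes (the extension must be listed in shared_preload_libraries before it can be created):

-- Inspect the most expensive normalized queries
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query,              -- normalized: constants replaced by placeholders
       calls,
       total_exec_time,    -- total time in milliseconds (PostgreSQL 13+ column names)
       mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;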
Starting point is 00:20:17 The scale has been massive and is a testament to our product and team's capability: over 1 trillion normalized queries, i.e., queries whose parameter values have been replaced by placeholders, have been collected, stored, and analyzed, with over 10 billion new queries ingested every day. Timescale is now PostgreSQL made powerful. Today, Timescale is PostgreSQL made powerful at any scale. We now solve hard data problems that no one else does, not just in time series but in AI, energy, gaming, machine data, electric vehicles, space, finance, video, audio, Web3, and much more. We believe that developers should be using PostgreSQL for everything, and we are improving PostgreSQL so that they can. Customers use
Starting point is 00:20:52 Timescale not just for their time series data but also for their vector data and general relational data. They use Timescale so that they can use PostgreSQL for everything. You can too. Get started here for free. Coda: Yoda. Our human reality, both physical and virtual, offline and online, is filled with data. As Yoda might say, data surrounds us, binds us. This reality is increasingly governed by software, written by software developers, by us. It's worth appreciating how remarkable that is. Not that long ago, in 2002, when I was an MIT grad student, the world had lost faith in software. We were recovering from the dot-com bubble collapse. Leading business publications proclaimed that "IT doesn't matter." Back then, it was easier for a software developer to get a good job in finance than in tech.
Starting point is 00:21:42 Which is what many of my MIT classmates did, myself included. But today, especially now in this world of generative AI, we are the ones shaping the future. We are the future builders. We should be pinching ourselves. Everything is becoming a computer. This has largely been a good thing. Our cars are safer, our homes are more comfortable, and our factories and farms are more productive. We have instant access to more information than ever before. We are more connected with each other. At times, it has made us healthier and happier. But not always. Like the Force, computing has both a light and a dark side. There has been growing evidence that mobile phones and social media are directly contributing to a global epidemic of teen mental illness. We are still grappling with the implications of AI and synthetic biology.
Starting point is 00:22:30 As we embrace our greater power, we should recognize that it comes with responsibility. We have become the stewards of two valuable resources that affect how the future is built: our time and our energy. We can either choose to spend those resources on managing the plumbing or embrace PostgreSQL for everything and build the right future. I think you know where we stand. Thanks for reading. #Postgres4Life. Source. This post was written by Ajay Kulkarni.
Starting point is 00:22:57 Thank you for listening to this Hackernoon story, read by Artificial Intelligence. Visit Hackernoon.com to read, write, learn and publish.
