The Good Tech Companies - Why PostgreSQL Is the Bedrock for the Future of Data
Episode Date: April 26, 2024
This story was originally published on HackerNoon at: https://hackernoon.com/why-postgresql-is-the-bedrock-for-the-future-of-data. Explore the rise of PostgreSQL as the de facto database standard, its impact on software development, and the key trends driving its widespread adoption. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-management, #postgresql, #timescale, #future-of-data-science, #software-development, #database-complexity, #timescaledb, #good-company, and more. This story was written by: @timescale. Learn more about this writer by checking @timescale's about page, and for more stories, please visit hackernoon.com. PostgreSQL's ascendancy as the go-to database standard is rooted in its adaptability, reliability, and extensive ecosystem. This article delves into the reasons behind its dominance, from tackling database complexity to empowering developers to build the future with confidence. Discover how PostgreSQL is revolutionizing software development and data management practices.
Transcript
This audio is presented by Hacker Noon, where anyone can learn anything about any technology.
Why PostgreSQL Is the Bedrock for the Future of Data, by Timescale.
One of the biggest trends in software development today is the emergence of PostgreSQL as the de facto database standard. There have been a few blog posts on how to use PostgreSQL for everything, but none yet on why this is happening and, more importantly, why this matters. Until now.
Table of contents:
01 PostgreSQL is becoming the de facto database standard
02 Everything is becoming a computer
03 The return of PostgreSQL
04 Free yourself, build the future, embrace PostgreSQL
05 Timescale started as "PostgreSQL for time series"
06 Timescale expanded beyond time series
07 Timescale is now "PostgreSQL made powerful"
08 Coda: Yoda
PostgreSQL is becoming the de facto database standard.
Over the past several months, "PostgreSQL for everything" has become a growing war cry among developers:
"PostgreSQL isn't just a simple relational database; it's a data management framework with the potential to engulf the entire database realm. The trend of 'using Postgres for everything' is no longer limited to a few elite teams but is becoming a mainstream best practice." (Source)
"One way to simplify your stack and reduce the moving parts, speed up development, lower the risk, and deliver more features in your startup is 'Use Postgres for everything.' Postgres can replace (up to millions of users) many backend technologies, Kafka, RabbitMQ, Mongo, and Redis among them." (Source)
"When I first heard about Postgres (at a time when MySQL absolutely dominated), it was described to me as 'that database made by those math nerds,' and then it occurred to me: yeah, those are exactly the people you want making your database." (Source)
"It has made a remarkable comeback. Now that NoSQL is dead and Oracle owns MySQL, what else is there?" (Source)
"Postgres is not just a relational DB. It's a way of life." (Source)
Thanks to its rock-solid foundation, plus its versatility through native features and extensions, developers can now use PostgreSQL for everything, replacing complex, brittle data architectures with straightforward simplicity.
This might help explain why PostgreSQL last year took the top spot from MySQL in the rankings for most popular database among professional developers (60,369 respondents). Asked "Which database environments have you done extensive development work in over the past year, and which do you want to work in over the next year?", more than 49% of respondents answered PostgreSQL. (Source) Those results are from the 2023 Stack Overflow Developer Survey.
If you look across time, you can see the steady increase in PostgreSQL adoption over the past few years. While PostgreSQL was the second favorite database of Stack Overflow's Developer Survey respondents between 2020 and 2022, its usage has consistently increased. (Source: 2020, 2021, 2022)
This is not just a trend among small startups and hobbyists. In fact, PostgreSQL usage is increasing across organizations of all sizes. (Figure: the percentage of PostgreSQL usage by company size. Source)
At Timescale, this trend is not new to us. We have been PostgreSQL believers for nearly a decade. That's why we built our business on PostgreSQL, why we are one of the top contributors to PostgreSQL, why we run the annual State of PostgreSQL survey (referenced above), and why we support PostgreSQL meetups and conferences. Personally, I have been using PostgreSQL for over 13 years, since I switched over from MySQL.
There have been a few blog posts on how to use PostgreSQL for everything,
but none yet on why this is happening, and, more importantly, why this matters.
Until now. But to understand why this is happening, we have to understand an even
more foundational trend and how that trend is changing the fundamental nature of human reality.
Everything is becoming a computer.
Everything (our cars, our homes, our cities, our farms, our factories, our currencies, our things) is becoming a computer. We, too, are becoming digital. Every year, we digitize more of our own identity and actions: how we buy things, how we entertain ourselves, how we collect art, how we find answers to our questions, how we communicate and connect, how we express who we are.
22 years ago, this idea of ubiquitous computing seemed audacious. Back then, I was a graduate student at the MIT AI Lab, working on my thesis on intelligent environments. My research was supported by MIT Project Oxygen, which had a noble, bold goal: to make computing as pervasive as the air we breathe. To put that time in perspective, we had our server rack in a closet.
A lot has changed since then. Computing is now ubiquitous: on our desks, in our pockets, in our things, and in our cloud. That much we predicted. But the second-order effects of those changes were not what most of us expected:
Ubiquitous computing has led to ubiquitous data. With each new computing device, we collect more information about our reality: human data, machine data, business data, environmental data, and synthetic data. This data is flooding our world.
The data flood has led to a Cambrian explosion of databases. All these new sources of data have required new places to store them. 20 years ago, there were maybe five viable database options. Today there are several hundred, most of them specialized for specific use cases or data, with new ones emerging each month.
More data and more databases have led to more software complexity. Choosing the right database for your software workload is no longer easy. Instead, developers are forced to cobble together complex architectures that might include a relational database (for its reliability), a non-relational database (for its scalability), a data warehouse (for its ability to serve analysis), and an object store (for its ability to cheaply archive old data). This architecture might even have more specialized components, like a time series or vector database.
More complexity means less time to build. Complex architectures are more brittle, require more complex application logic, leave less time for development, and slow down development. Complexity is not a benefit but a real cost.
As computing has become more ubiquitous,
our reality has become more entwined with computing. We have brought computing into our
world and ourselves into its world.
We are no longer just our offline identities but a hybrid of what we do offline and online.
Software developers are humanity's vanguard in this new reality. We are the ones building the software that shapes this new reality. But developers are now flooded with data and
drowning in database complexity. This means that developers, instead of shaping
the future, are spending more and more of their time managing the plumbing. How did we get here?
Part 1. Cascading computing waves
Ubiquitous computing has led to ubiquitous data. This did not happen overnight but in cascading waves over several decades: mainframes (1950s+), personal computers (1970s+), the internet (1990s+), mobile (2000s+), cloud computing (2000s+), and the internet of things (2010s+).
With each wave, computers have become smaller, more powerful, and more ubiquitous. Each wave also built on the previous one. Personal computers are smaller mainframes. The internet is a network of connected computers. Smartphones are even smaller computers connected to the internet. Cloud computing democratized access to computing resources. The internet of things is smartphone components reconstructed as part of other physical things connected to the cloud.
But in the past two decades, computing advances have not just occurred in the physical world but also in the digital one, reflecting our hybrid reality: social networks (2000s+), blockchains (2010s+), and generative AI (2020s+).
With each new wave of computing, we get new sources of information about our hybrid reality: human digital exhaust, machine data, business data, and synthetic data. Future waves will create even more data. All this data fuels new waves, the latest of which is generative AI, which in turn further shapes our reality.
Computing waves are not siloed but cascade like dominoes. What started as a data trickle soon became a data flood, and then the data flood led to the creation of more and more databases.
Part 2. Incremental database growth
All these new sources of data have required new places to store them, or databases.
Mainframes started with the Integrated Data Store (1964) and later System R (1974), the first SQL database. Personal computers fostered the rise of the first commercial databases: Oracle (1977), inspired by System R; DB2 (1983); and SQL Server (1989), Microsoft's response to Oracle.
The collaborative power of the internet enabled the rise of open-source software, including the first open-source databases: MySQL (1995) and PostgreSQL (1996). Smartphones led to the proliferation of SQLite (initially created in 2000).
The internet also created a massive amount of data, which led to the first non-relational, or NoSQL, databases: Hadoop (2006), Cassandra (2008), and MongoDB (2009). Some called this the era of "Big Data."
Part 3. Explosive database growth
Around 2010, we started to hit a breaking point. Up until that point, software applications would primarily rely on a single database (e.g., Oracle, MySQL, PostgreSQL), and the choice was relatively easy.
But "Big Data" kept getting bigger. The internet of things led to the rise of machine data. Smartphone usage started growing exponentially thanks to the iPhone and Android, leading to even more human digital exhaust. Cloud computing democratized access to compute and storage, amplifying these trends. Generative AI very recently made this problem
worse with the creation of vector data.
As the volume of data collected grew, we saw the rise of specialized databases: Neo4j for graph data (2007), Redis for a basic key-value store (2009), InfluxDB for time series data (2013), ClickHouse for high-scale analytics (2016), Pinecone for vector data (2019), and many, many more.
20 years ago, there were maybe five viable database options. Today, there are several hundred, most of them specialized for specific use cases, with new ones emerging each month. While the earlier databases promise general versatility, these specialized ones offer specific trade-offs, which may or may not make sense depending on your use case.
Part 4. More databases, more problems
Faced with this flood and with
specialized databases with a variety of trade-offs, developers had no choice but to cobble together complex architectures. These architectures typically include a relational database (for reliability), a non-relational database (for scalability), a data warehouse (for data analysis), an object store (for cheap archiving), and even more specialized components like a time series or vector database for those use cases.
But more complexity means less time to build. Complex architectures are more brittle, require more complex application logic, leave less time for development, and slow down development. This means that instead of building the future, software developers find themselves spending far too much time maintaining the plumbing. This is where we are today.
There is a better way.
The return of PostgreSQL.
This is where our story takes a twist. Our hero, instead of being a shiny new database, is an old stalwart, with a name only a core developer could love: PostgreSQL.
At first, PostgreSQL was a distant number two behind MySQL. MySQL was easier to use, had a company behind it, and a name that anyone could easily pronounce.
But then MySQL was acquired by Sun Microsystems (2008), which was then acquired by Oracle (2009). And software developers, who saw MySQL as the free savior from the expensive Oracle dictatorship, started to reconsider what to use.
At that same time, a distributed community of developers, sponsored by a handful of small independent companies, was slowly making PostgreSQL better and better. They quietly added powerful features, like full-text search (2008), window functions (2009), and JSON support (2012). They also made the database more rock-solid, through capabilities like streaming replication, hot standby, in-place upgrade (2010), and logical replication (2017), and by diligently fixing bugs and smoothing rough edges.
PostgreSQL is now a platform.
One of the most impactful capabilities added to PostgreSQL during this time was the ability to support extensions: software modules that add functionality to PostgreSQL (2011). Extensions enabled even more developers to add functionality to PostgreSQL independently, quickly, and with minimal coordination.
Thanks to extensions, PostgreSQL started to become more than just a great relational database. Thanks to PostGIS, it became a great geospatial database. Thanks to TimescaleDB, it became a great time series database. With hstore, a key-value store. With AGE, a graph database. With pgvector, a vector database. PostgreSQL became a platform.
Now, developers can use PostgreSQL for its reliability, scalability (replacing non-relational databases), data analysis (replacing data warehouses), and more.
What about big data?
At this point, the smart reader should
ask, "What about big data?" That's a fair question. Historically, big data (e.g., hundreds of terabytes or even petabytes) and the related analytics queries have been a bad fit for a database like PostgreSQL that doesn't scale horizontally on its own.
That, too, is changing. Last November, we launched tiered storage, which automatically tiers your data between disk and object storage (S3), effectively creating the ability to have an infinite table. So while big data has historically been an area of weakness for PostgreSQL, soon no workload will be too big.
PostgreSQL is the answer. PostgreSQL is how we free ourselves and build the future.
Free yourself, build the future, embrace PostgreSQL.
Instead of futzing with several different database systems, each with its own quirks and query languages, we can rely on the world's most versatile and, possibly, most reliable database: PostgreSQL. We can spend less time on the plumbing and more time on building the future.
And PostgreSQL keeps getting better.
The PostgreSQL community continues to make the core better.
There are many more companies contributing to PostgreSQL today, including the hyperscalers.
(Figure: today's PostgreSQL ecosystem. Source)
There are also more innovative, independent companies building around the core to make the PostgreSQL experience better. Supabase (2020) is making PostgreSQL into a viable Firebase alternative for web and mobile developers. Neon (2021) and Xata (2022) are both making PostgreSQL "scale to zero" for intermittent serverless workloads. Tembo (2022) is providing out-of-the-box stacks for various use cases. Nile (2023) is making PostgreSQL easier for SaaS applications. And there are many more.
And, of course, there's us, Timescale (2017).
Timescale started as "PostgreSQL for time series."
The Timescale story will probably sound a little
familiar. We were solving some hard sensor data problems for IoT customers, and we were drowning in data.
To keep up, we built a complex stack that included at least two different database systems,
one of which was a time series database.
One day, we reached our breaking point.
In our UI, we wanted to filter devices by both device_type and uptime.
This should have been a simple SQL join.
But because we were using two different
databases, it instead required writing glue code in our application between our two databases.
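To make that contrast concrete, here is a minimal sketch of the same filter written both ways. It uses Python's built-in sqlite3 module purely as a stand-in for PostgreSQL so that it runs anywhere; the table names (devices, uptime_metrics), the columns, and the sample rows are hypothetical and are not taken from Timescale's actual stack.

```python
import sqlite3

# Hypothetical sample data. In a two-database setup, these two lists
# would live in separate systems (say, a relational store and a metrics store).
DEVICES = [(1, "thermostat"), (2, "camera"), (3, "thermostat")]
UPTIME = [(1, 99.5), (2, 97.0), (3, 80.2)]  # (device_id, uptime_pct)


def filter_with_glue_code(device_type, min_uptime):
    """Two stores: fetch from each one, then join by hand in application code."""
    wanted = {dev_id for dev_id, dev_type in DEVICES if dev_type == device_type}
    return sorted(
        dev_id for dev_id, pct in UPTIME if dev_id in wanted and pct >= min_uptime
    )


def make_db():
    """One store: load both (hypothetical) tables into a single database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE devices (id INTEGER PRIMARY KEY, device_type TEXT)")
    conn.execute("CREATE TABLE uptime_metrics (device_id INTEGER, uptime_pct REAL)")
    conn.executemany("INSERT INTO devices VALUES (?, ?)", DEVICES)
    conn.executemany("INSERT INTO uptime_metrics VALUES (?, ?)", UPTIME)
    return conn


def filter_with_join(conn, device_type, min_uptime):
    """One store: the same filter becomes a single SQL join."""
    rows = conn.execute(
        """
        SELECT d.id
        FROM devices AS d
        JOIN uptime_metrics AS u ON u.device_id = d.id
        WHERE d.device_type = ? AND u.uptime_pct >= ?
        ORDER BY d.id
        """,
        (device_type, min_uptime),
    ).fetchall()
    return [row[0] for row in rows]
```

Both functions return the same device IDs for the sample data. The difference is that the hand-written join has to be rebuilt and retested for every new filter, while the SQL version only needs a different WHERE clause.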
It was going to take us weeks and an entire engineering sprint to make the change.
Then, one of our engineers had a crazy idea. Why don't we just build a time series database right
in PostgreSQL? That way, we would just have one database for all our data and would be free to ship software faster. Then we built it, and it made our lives so much easier. Then we told our friends about it, and they wanted to try it. And we realized that this was something that we needed to share with the world.
So we open-sourced our time series extension, TimescaleDB, and announced it to the world on April 4, 2017. Back then, PostgreSQL-based startups were quite rare. We were one of the first.
In the seven years since, we've heavily invested in both the extension and in our PostgreSQL cloud service, offering a better and better PostgreSQL developer experience for time series and analytics: 350x faster queries and 44% higher inserts via hypertables (auto-partitioning tables); millisecond response times for common queries via continuous aggregates (real-time materialized views); 90%+ storage cost savings via native columnar compression; infinite, low-cost object storage via tiered storage; and more.
Timescale expanded beyond time series.
That's where we started, in time series data, and also what we are most known for. But last year we started to expand.
Timescale Vector. We launched Timescale Vector ("PostgreSQL++ for AI applications"), which makes PostgreSQL an even better vector database. Timescale Vector scales to over 100 million vectors, building on pgvector with even better performance. Innovative companies and teams are already using Timescale Vector in production at a massive scale, including OpenSauced, a GitHub events insights platform, at 100+ million vectors; VieRally, a social virality prediction platform, at 100+ million vectors; and MarketReader, a financial insights platform, at 30+ million vectors.
PopSQL. Recently, we also acquired PopSQL to build and offer the best PostgreSQL UI.
PopSQL is the SQL editor for team collaboration, with autocomplete, schema exploration, versioning,
and visualization.
Hundreds of thousands of developers and data analysts have used PopSQL to work with their
data, whether on PostgreSQL, Timescale, or other data sources like Redshift, Snowflake,
BigQuery, MySQL, SQL Server, and more.
Insights. We also launched Insights, the largest dogfooding effort we've ever undertaken, which tracks every database query to help developers monitor and optimize database performance. Insights overcomes several limitations of pg_stat_statements, the official extension to see statistics from your database. The scale has been massive and is a testament to our product's and team's capability. Over 1 trillion normalized queries (i.e., queries whose parameter values have been replaced by placeholders) have been collected, stored, and analyzed, with over 10 billion new queries ingested every day.
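As a rough illustration of what "normalized" means here, the sketch below replaces literal values in a SQL string with numbered placeholders so that queries differing only in their parameters collapse to the same text. This is a regex approximation written for this example; it is an assumption, not how pg_stat_statements or Insights actually normalizes queries, and the function name normalize_query is hypothetical.

```python
import re

# Approximation only: matches single-quoted strings (with '' escapes) and
# standalone integer or decimal numbers. A real implementation would work
# from the parsed query, not from regular expressions.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def normalize_query(sql):
    """Replace literal values with $1, $2, ... and collapse whitespace."""
    counter = 0

    def placeholder(_match):
        nonlocal counter
        counter += 1
        return f"${counter}"

    normalized = _LITERAL.sub(placeholder, sql)
    return " ".join(normalized.split())
```

For example, `normalize_query("SELECT * FROM devices WHERE id = 42 AND device_type = 'camera'")` gives `SELECT * FROM devices WHERE id = $1 AND device_type = $2`, and the same query with different values or different whitespace gives the same string. Grouping by that string is what lets per-query statistics be aggregated across calls.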
Timescale is now PostgreSQL made powerful.
Today, Timescale is PostgreSQL made powerful at any scale.
We now solve hard data problems that no one else does,
not just in time series but in AI, energy, gaming, machine data,
electric vehicles, space, finance, video, audio, Web3, and much more.
We believe that developers should be using
PostgreSQL for everything, and we are improving PostgreSQL so that they can. Customers use
Timescale not just for their time series data but also for their vector data and general relational
data. They use Timescale so that they can use PostgreSQL for everything. You can too. Get started here for free.
Coda: Yoda.
Our human reality, both physical and virtual, offline and online, is filled with data. As Yoda might say, data surrounds us, binds us. This reality is increasingly governed by software,
written by software developers, by us. It's worth appreciating how remarkable that is. Not that long ago, in 2002, when I was
an MIT grad student, the world had lost faith in software. We were recovering from the dot-com
bubble collapse. Leading business publications proclaimed that "IT doesn't matter." Back then,
it was easier for a software developer to get a good job in finance than in tech,
which is what many of my MIT classmates did,
myself included. But today, especially now in this world of generative AI,
we are the ones shaping the future. We are the future builders. We should be pinching ourselves.
Everything is becoming a computer. This has largely been a good thing. Our cars are safer,
our homes are more comfortable, and our factories and farms are more productive. We have instant access to more information than ever before. We are more connected with each other.
At times, it has made us healthier and happier. But not always. Like the Force,
computing has both a light and dark side. There has been growing evidence that mobile phones and
social media are directly contributing to a global epidemic of teen mental illness. We are still grappling with the implications of AI and synthetic biology.
As we embrace our greater power, we should recognize that it comes with responsibility.
We have become the stewards of two valuable resources that affect how the future is built,
our time and our energy. We can either choose to spend those resources on managing the plumbing
or embrace PostgreSQL for everything and build the right future. I think you know where we stand.
Thanks for reading.
#Postgres4Life
Source
This post was written by Ajay Kulkarni.
Thank you for listening to this Hackernoon story, read by Artificial Intelligence.
Visit Hackernoon.com to read, write, learn and publish.