The Good Tech Companies - Postgres and the Lakehouse Are Becoming One System — Here’s What Comes Next

Episode Date: June 13, 2025

This story was originally published on HackerNoon at: https://hackernoon.com/postgres-and-the-lakehouse-are-becoming-one-system-heres-what-comes-next. The OLTP/OLAP split no longer fits how developers build today. Postgres and the lakehouse are now used side-by-side – but stitched together with brittle pipelines. We think they belong in a single, modular system: open formats, bidirectional sync, and real-time performance by default. Check more stories related to programming at: https://hackernoon.com/c/programming. You can also check exclusive content about #postgresql, #data-lakehouse, #timescale, #postgres-lakehouse-integration, #unified-data-architecture, #operational-medallion-model, #postgres-real-time-analytics, #good-company, and more. This story was written by: @timescale. Learn more about this writer by checking @timescale's about page, and for more stories, please visit hackernoon.com.

Transcript
Starting point is 00:00:00 This audio is presented by Hacker Noon, where anyone can learn anything about any technology. Postgres and the lakehouse are becoming one system, here's what comes next. By Timescale. The architecture of modern data systems is undergoing a fundamental shift. Ask a developer how they build data systems today, and the answer increasingly looks like this: Postgres for the application, a lakehouse for the analytics and data science. Postgres, long favored for transactional workloads, has evolved into a general-purpose operational database. It's trusted, flexible, and deeply extensible, powering everything from customer transactions
Starting point is 00:00:37 in CRUD apps to real-time dashboards and AI-backed product features. Its ecosystem has grown to support real-time analytics (TimescaleDB), geospatial data (PostGIS), vector and full-text search (pgvector and pgvectorscale), and more. At the same time, the rise of open lakehouse technologies has redefined how organizations manage and analyze data at scale. Disaggregated storage, open table formats like Iceberg, structured data catalogs, and composable query engines have made it possible to analyze petabyte-scale data with precision and control. This architecture can offer governance, avoid vendor lock-in, and still
Starting point is 00:01:16 provide data teams flexibility in their choice of tools. What's striking isn't just the success of these technologies individually, but how often they're now being deployed together. Organizations increasingly need to support both operational workloads, powered by databases, and non-operational workloads, powered by lakehouses, often using data from the same sources: people, machines, digital systems, or agents. Yet these systems are still treated in isolation, often owned by different teams, with too much friction in making them work together seamlessly. We believe that friction should not exist. In fact, we think a new, more coherent architecture is emerging: one that treats Postgres and the lakehouse not as separate worlds, but as distinct layers of a
Starting point is 00:02:00 single, modular system, designed to meet the full spectrum of operational and analytical needs. The limits of the OLTP vs OLAP dichotomy. The old way of thinking about databases was simple: OLTP for transactions, OLAP for analysis. You used Postgres to power your app, and sent nightly ETL jobs to a data warehouse for internal reports and dashboards. This traditional distinction served us well when applications were simpler, and internal reporting could live on a much slower cadence. But that's no longer the case.
Starting point is 00:02:33 Modern applications are data-heavy, customer-facing, and real-time by design. They blur the lines between transactional and analytical. A financial app might run a trading engine that needs millisecond access to customer portfolios while simultaneously feeding real-time risk reports and internal dashboards. A SaaS app isn't just storing clicks; it's calculating usage metrics, triggering alerts, and serving personalized models. An industrial monitoring system might ingest tens of millions of sensor readings per hour, drive anomaly detection and alerting logic, and archive years of telemetry for long-term analytics and AI model training.
Starting point is 00:03:10 These use cases are not outliers; they are quickly becoming the norm. We increasingly see a more useful split: operational databases that power products, and lakehouses that power organizations. Yet even though ownership of these systems is split, with product engineering teams responsible for the operational systems powering their products and data teams responsible for managing lakehouse systems as organizational services,
Starting point is 00:03:34 the two still need to talk to each other. They need to work on the same data and often share underlying schemas. The better they integrate and remain in sync, the more resilient and capable the system becomes. An operational medallion architecture. One pattern we see gaining traction is what we call an operational medallion architecture. Inspired by the medallion models popularized
Starting point is 00:03:56 in the data engineering world, this pattern also incorporates bronze, silver, and gold layers, not just for internal analytics, but for powering real-time, user-facing systems as well. Here's what that looks like. Bronze layer: raw data lives in Parquet or Iceberg files on AWS S3 or similar low-cost, bottomless storage systems. This data is typically immutable, append-only, and queryable by anything:
Starting point is 00:04:21 query engines like AWS Athena, DuckDB, Trino, ClickHouse, or Polars, or even directly from an operational database like Postgres. Operational silver layer: cleaned, filtered, validated, and deduplicated data is written into Postgres to power real-time analytics, dashboards, or application logic of user-facing products. Operational gold layer: pre-aggregated data over silver data, like Postgres materialized views or TimescaleDB's continuous aggregates, serves low-latency, high-concurrency product experiences.
Starting point is 00:04:55 These are typically maintained within the database to ensure consistency between silver and gold layers. Crucially, each layer is queryable, and the movement of data is bidirectional. You can pull raw or transformed data from S3 directly into Postgres, akin to tightly integrated reverse ETL. You can roll up aggregates from Iceberg into Postgres tables via one-off or standing queries against Iceberg files from Postgres. You can continuously sync a full schema or a single table from the database to the lakehouse.
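As a concrete illustration of the gold layer described above, here is a minimal sketch that assembles the DDL for a TimescaleDB continuous aggregate over a silver table. The table, view, and column names (`sensor_readings`, `sensor_hourly`, `device_id`, `ts`, `value`) are hypothetical, not from the article; the helper only composes SQL text, which a real deployment would run against a TimescaleDB hypertable:

```python
# Sketch: build the DDL for a gold-layer continuous aggregate in TimescaleDB.
# All table, view, and column names are illustrative assumptions.

def gold_rollup_ddl(silver_table: str, view_name: str, bucket: str = "1 hour") -> str:
    """Return CREATE MATERIALIZED VIEW SQL for a time-bucketed rollup."""
    return f"""
CREATE MATERIALIZED VIEW {view_name}
WITH (timescaledb.continuous) AS
SELECT
    time_bucket(INTERVAL '{bucket}', ts) AS bucket,
    device_id,
    avg(value) AS avg_value,
    max(value) AS max_value
FROM {silver_table}
GROUP BY bucket, device_id;
""".strip()

ddl = gold_rollup_ddl("sensor_readings", "sensor_hourly")
print(ddl)
```

Because the aggregate is defined over the silver table inside the same database, the engine itself keeps gold consistent with silver as new rows arrive, which is the consistency property the pattern relies on.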
Starting point is 00:05:25 Much as bronze are transformed, data can be read from the lake house storage layer on S3 into the database, silver and gold in the database can be written to these lake house storage formats. This avoids needing to re-implement identical pipelines in both systems, which both adds complexity and risks consistency. One common pattern we've observed in applications requiring fresh data IS writing from an upstream streaming system like Kafka or Kinesis in parallel Tobit S3, for row, unmodified bronze data and Postgres, relying on database ischemas and constraints for data validation. Then these silver tables and subsequent gold aggregates in the database are exported out to S3 again, so data teams now have access to the ground truth data that had been served to customers.
Starting point is 00:06:11 Now, each system maintains its separation of concerns. The operational database can run locked down, both to users and unfriendly queries, while data is still made available as part of the open lakehouse wherever it's needed in the org. Why now? Technical forces driving the shift. Several developments are powering this shift from siloed operational databases and lakehouses to integrated ones. First, Iceberg has matured into a stable and flexible table format that supports schema evolution, ACID transactions, and efficient compaction. It enables multiple compute engines to read from and write to the same datasets, with catalog layers that track metadata and enforce governance across the stack. Much as databases have had catalogs at their core, so now do lakehouses.
Starting point is 00:06:57 Second, Postgres has continued to evolve as a platform, with extensions for columnar storage, time-series data, and vector and hybrid search (what we've been building at Timescale for years). Postgres now serves many products that incorporate real-time analytics and agentic workflows directly. And with emerging support for querying S3 and Iceberg data directly from within Postgres, it is increasingly possible to incorporate S3-hosted data directly. So Postgres is no longer just for transactional data with one-way ETL/CDC to the lakehouse, but now acts as the serving layer for products incorporating both transactional and analytical data. This isn't just a caching layer for pre-computed data,
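To make the "standing queries against Iceberg from Postgres" idea concrete, here is a sketch that composes one such rollup as an INSERT ... SELECT. Support for querying Iceberg from within Postgres is still emerging and varies by extension, so this assumes some tool has already exposed an Iceberg table as a Postgres-visible relation; the relation and column names (`events_iceberg`, `daily_counts`, `device_id`, `ts`) are hypothetical:

```python
# Sketch: compose a rollup query that reads from an Iceberg-backed relation
# (hypothetical, assumed to be exposed inside Postgres by some extension)
# and upserts daily aggregates into a native Postgres table.

def rollup_from_iceberg(foreign_table: str, target_table: str) -> str:
    """Return SQL that rolls bronze data up from Iceberg into Postgres."""
    return (
        f"INSERT INTO {target_table} (day, device_id, reading_count)\n"
        f"SELECT date_trunc('day', ts) AS day, device_id, count(*)\n"
        f"FROM {foreign_table}\n"
        f"GROUP BY 1, 2\n"
        f"ON CONFLICT (day, device_id) DO UPDATE\n"
        f"  SET reading_count = EXCLUDED.reading_count;"
    )

sql = rollup_from_iceberg("events_iceberg", "daily_counts")
print(sql)
```

Run once, this is a one-off pull; scheduled, it becomes a standing query, with the ON CONFLICT upsert keeping the Postgres-side aggregate idempotent across reruns.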
Starting point is 00:07:39 but a full-fledged SQL database for further aggregations, enrichment, or joins at query time. Third, developers expect composability. Some organizations may be stuck with their legacy monolithic data platforms, but most developers and data scientists want flexibility to compose their own stacks, integrating familiar tools in ways that reflect their application's needs. The shift toward open formats and disaggregated storage fits this mindset. So does the desire for control, particularly in regulated industries or where data sovereignty matters. Put differently, the market is moving toward modular, open, developer-friendly architectures. What comes next? We believe the future of data infrastructure will be shaped by systems
Starting point is 00:08:22 that integrate operational and analytical layers more deeply, systems that treat Postgres and the lakehouse as two sides of the same coin. This won't happen through another monolith. It will come from careful interfaces: incremental sync, shared catalogs, unified query surfaces, and from an architectural philosophy that embraces heterogeneity rather than fighting it. We're working on something new in this space: something that builds on the strengths of Postgres and Iceberg, tightly integrates with existing lakehouse systems, and makes it dramatically easier to build full-stack data systems with operational and analytical fidelity. This isn't about using ETL to move data from legacy systems to new systems; it's about building
Starting point is 00:09:02 a coherent modern data architecture that serves operational and non-operational use cases alike. Stay tuned! Thank you for listening to this Hacker Noon story, read by Artificial Intelligence. Visit hackernoon.com to read, write, learn and publish.
