The Data Stack Show - The PRQL: Data Migration Made Easy: Postgres, ClickHouse, and the Future of Analytics with Aaron Katz and Sai Krishna Srirampur

Episode Date: May 19, 2025

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show prequel. This is a short bonus episode where we preview the upcoming show. You'll get to meet our guests and hear about the topics we're going to cover. If they're interesting to you, you can catch the full-length show when it drops on Wednesday. Welcome back to the Data Stack Show. We are live in Oakland, California, recording at the Data Council Conference,
Starting point is 00:00:27 and we have Sai and Aaron from ClickHouse on the show today. Welcome, gentlemen. Thank you very much. Really excited to be here. All right, well, give us just a quick background. You've had a pretty incredible journey, so give us a quick background. Sure, I'm happy to start.
Starting point is 00:00:41 This is Aaron. We formed ClickHouse Inc., the company around the popular open source database ClickHouse, about four years ago. And it's a venture-backed startup headquartered in Silicon Valley, a Delaware corporation, and well capitalized. The business model is to take this very popular columnar open source database and offer it as a managed database service. It supports a variety of different use cases, which I suspect we'll get into. And we launched our managed service, which we call ClickHouse Cloud,
Starting point is 00:01:11 two years ago, and it's gone very well. There's a lot of market demand for this type of technology. So we've got over a thousand customers on our managed service, companies like Weights and Biases, LangChain, Vercel, Twilio, Roblox, Sony, Cisco, and many others, and they're driving great benefits in terms of cost savings and also extremely low latency analytical experiences for their customers. So the company's about 300 employees globally distributed.
Starting point is 00:01:42 Over half of our team members are outside of the United States, which also shows up in terms of our customer base and our revenue mix being highly international, with over 50% of both being outside of the Americas. Love to introduce Sai. We acquired Sai's company about 10 months ago, PeerDB, where he was the founder and CEO. And they developed a CDC protocol for moving data from Postgres into ClickHouse as Postgres emerged
Starting point is 00:02:09 as one of the most popular sources of data going into our analytical database. Awesome, very excited to be here and thanks, Aaron, for the great intro. So I'm Sai and I head up ClickPipes efforts in ClickHouse. So ClickPipes is a native ingestion service which gets data into ClickHouse Cloud. So at a high level, we make it very easy to stream
Starting point is 00:02:30 and like get data from various sources like object storage or streaming sources like Kafka and also databases, right? And prior to ClickHouse, I was the CEO and co-founder at PeerDB where we were building a data replication tool with laser focus on Postgres. So the goal was to provide the world's fastest and the easiest way to move data from Postgres to data warehouses, which included ClickHouse. And interestingly, ClickHouse was one of the most adopted and highest-traction connectors, which is why I think Aaron acquired PeerDB. And now at ClickHouse, we integrated PeerDB already into
Starting point is 00:03:05 ClickHouse Cloud. So you just click a button and like you can start streaming Postgres data into ClickHouse and use ClickHouse for blazing fast analytics, right? So it's all native. So you don't need to have any external ETL tool to do all of this. It's all in the ClickHouse Cloud experience.
Starting point is 00:03:20 And prior to PeerDB, my experience is all in Postgres, right? So I was working at this database startup called Citus Data, which built a distributed Postgres database. And that database got acquired by Microsoft. So I spent eight years there, helping customers implement Postgres. So I've seen all the pain points around Postgres for analytics, which is why I built this company where we're making it easy
Starting point is 00:03:43 to move data from Postgres to warehouses. And now I'm working on the other side, which is ClickHouse, which makes analytics like blazing fast. So I would love to talk about like Postgres, ClickHouse. So yeah. Yeah. So Sai and Aaron, I'm really excited about talking about this Postgres topic as well, because I think teams hit this wall and they're like, okay, this doesn't work anymore, what do I do? And the thing they don't want to do is have a bunch of different solutions for each thing, right? They want like as few solutions as possible. So I wanna talk about that.
Starting point is 00:04:13 Aaron, what's the topic that you wanna hit? Perhaps we can touch on the, just the diversity of use cases that we're seeing emerge around this type of technology and the convergence of a lot of these specialized databases, and we've seen this now for the last, let's call it five years, where you have transactional databases
Starting point is 00:04:33 like Postgres or MySQL or Mongo. You've got analytical databases like ClickHouse, Apache Druid or Pinot, many others. You've got relational databases, vector databases. And you can kind of see these technologies on a bit of a collision course. And just the overlap between them and what we're hearing from customers around,
Starting point is 00:04:51 the desire to simplify the database infrastructure to where they can have one or two databases satisfy a lot of these different requirements. What about you, Sai? I'd love to talk about Postgres and ClickHouse. And my experiences of what I have seen at Citus, because Citus did build a real-time analytical database. What were the challenges that we saw building stuff
Starting point is 00:05:15 within Postgres, and how we saw customers move to purpose-built databases like ClickHouse. We used to hear MemSQL also at that time. We used to hear Snowflake. I would love to share those experiences and yeah. Great. Awesome. Well, let's dig in. Tons to talk about. Yeah, let's do it. Alright, that's a wrap for the prequel.
Starting point is 00:05:32 The full-length episode will drop Wednesday morning. Subscribe now so you don't miss it.
