The Data Stack Show - The PRQL: The Shortcomings of Apache Kafka with David Yaffe and Johnny Graettinger of Estuary

Episode Date: November 6, 2023

In this bonus episode, Eric and Kostas preview their upcoming conversation with David Yaffe and Johnny Graettinger of Estuary. ...

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show prequel. This is a short bonus episode where we preview the upcoming show. You'll get to meet our guest and hear about the topics we're going to cover. If they're interesting to you, you can catch the full-length show when it drops on Wednesday. This week's recording is with Johnny and Dave from estuary.dev, and I think this is going to be a really fun conversation. It's a topic that we've actually covered quite a bit on the show, which is streaming, you know, in particular real-time streaming. But this is really in the context of, I think, what you use streaming for. And we really dig into sort of the Kafka side of the conversation, which we haven't covered in depth a ton.
Starting point is 00:00:56 But part of the Estuary story is really reacting to real-time streaming needs, evaluating Kafka, and seeing some pretty severe shortcomings, which is why they built Estuary. Now, what's really interesting to me is, in many ways, they don't talk about Estuary as a streaming service. They kind of talk about it almost as real-time ETL, which is fascinating.
Starting point is 00:01:19 There's some open-source technology under the hood, and this is really, I think, going to be an interesting conversation because streaming is obviously a hot topic and there are multiple technologies. So really interested to see what the Estuary team has built. Yeah, 100%. It was a very fascinating conversation, actually,
Starting point is 00:01:40 for many different reasons. First of all, it was pretty technical, and not solely in terms of talking about Estuary itself. Actually, we had a very deep dive into Kafka, how Kafka is built, and some of the issues there that actually Estuary is addressing, like from the perspective of the architecture of the system.
Starting point is 00:02:02 Like, for example, we were talking about how compute and storage in Kafka are, like, very tied together and how this has been, like, changed with using something like Estuary. And, like, what does this mean in terms of managing the system and what type of use cases it enables. So we did a very interesting architectural conversation around this type of system.
Starting point is 00:02:32 So anyone who is interested to understand better how Kafka and this type of streaming systems are working, definitely should listen to that. And then we talked a lot about also some important concepts like CDC, right? And why CDC is important, how we use it, and how they implemented it because the standard out there
Starting point is 00:02:54 is pretty much like using something like Debezium, but the folks at Estuary actually implemented everything from scratch. And they have some really good reasons why they did that, and they are talking, like, through these things. So amazing people, both Johnny and Dave, like
Starting point is 00:03:13 very deep expertise in this type of technology and we had an amazing conversation ranging from the technical side of things up to the business side of things. So I think everyone should listen to them and hopefully we're going to have them again in the future because I don't think one hour was enough to go through all the different topics when it comes to streaming. All right, that's a wrap for the prequel. The full-length episode will drop Wednesday morning.
Starting point is 00:03:41 Subscribe now so you don't miss it.
