The Data Stack Show - The PRQL: Who Needs a Stream Processing Engine?

Episode Date: November 7, 2022

In this bonus episode, Eric and Kostas preview their upcoming conversation with Zander Matheson of bytewax. ...

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show prequel. We just recorded a show with Zander from Bytewax. Bytewax is a super interesting technology. It's stream processing within the Python ecosystem. One question I have for you, Kostas, which we touched on a little bit in the show, but there are a lot of tools cropping up around stream processing, which we talked a little bit about on the show.
Starting point is 00:00:32 How many companies really need this, though? That's something interesting to me. There are a lot of technologies popping up, but it seems like they're primarily enterprise-level use cases. What do you think? Is it going to trickle down? Yeah, I think it will. You have to keep in mind that many times we see technology getting adopted by the enterprise primarily because the technology is not mature enough
Starting point is 00:00:59 to be adopted by a broader audience. And enterprises have the resources to maintain and make accessible to the whole organization this technology, right? Like getting something like Flink and setting it up out there and running it and doing that consistently, blah, blah, blah, all that stuff. It's not easy, right? Same also with Apache Spark, sorry, Apache Kafka, that's why you have Confluent out there, right? And you have the hosted solution around that.
Starting point is 00:01:30 So when it comes to data in general, data infrastructure, it's very natural to see the enterprises being, let's say, the pioneers, because it sounds like they have the resources and the need, because of the volume or whatever, to go and do things first. And I think as we see companies focusing more on the developer experience of things, we will see a much broader adoption. Now, is every shop out there going to need a stream processing engine? Probably not. I don't know. But we will see.
Starting point is 00:02:10 I think there are use cases out there that are important. I think anything that has to do with ML use cases, I mean, when you want to actually use ML there, like create the features and serve recommendations, all that stuff, that's where streaming is super important. So I think as ML and AI get more and more, let's say, adopted, tools like streaming become more and more important, together with the rest of the technologies that we have there for batch processing and more static data processing.
Starting point is 00:02:50 Yeah, I agree. I think the other thing, I love this. We're getting, I love it when we get into predictions because it's very dangerous territory. But also fun. Crystal ball. I think the other thing, Zander gave a really interesting example of pulling in logs from a web server and processing them for some sort of use case. I can't remember. The challenge now is that even though the individual components are accessible, for example, there's great CDC technology out there, right? Like, let's get logs. Okay, great. Like, you have the logs, right? Can you process those logs in a streaming format?
Starting point is 00:03:48 Okay, like, you have, you know, Bytewax, you know, in order to do that, right? And you have the stuff downstream of Bytewax. But even with modern tooling, it actually is still a lot of work, even though, like, individually,
Starting point is 00:04:03 those things have gotten easier. Like, it's still hard to consume an entire use case, right? But imagine if you could just literally hook your logs up to an end-to-end pipeline. And it's like, well, you get sessionization at the end, right? So I think as that, I think as the ecosystem evolves, and more of those use cases are available out of the box, adoption will go up as well. Because you may not need stream processing for everything. For example,
Starting point is 00:04:31 we don't necessarily need real-time reporting on certain things at RudderStack, right? But it is really nice if you can do it. Right? And if it's easy, then why not? You don't have to wait on batch jobs and all that sort of stuff. So anyways, it'll be really interesting to see how the ecosystem evolves. Great show with Zander.
Starting point is 00:04:52 Bytewax is super cool, so check out the repo. Subscribe if you haven't, and I will catch you on the next one. Thank you.
