The Data Stack Show - 60: Architecting a Boring Stream Processing Tool With Ashley Jeffs of Benthos
Episode Date: November 3, 2021
Highlights from this week's conversation include:
A brief overview of Ashley's background (2:47)
Benthos' creation and the problems it was meant to address (4:01)
Use cases for Benthos (18:25)
Key features of Benthos that make it stand out (22:23)
Adding windowing to Benthos for fun (29:23)
The highs and lows of maintaining an open source project for five years (32:17)
The architecture of Benthos (36:23)
The importance of ordering in stream processing (42:15)
Gaining traction with an open source project (53:21)
Benthos' blobfish mascot (58:03)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are
run at top companies.
The Data Stack Show is brought to you by RudderStack, the CDP for developers.
You can learn more at rudderstack.com.
Welcome back to the show.
Today, we're going to talk with Ashley Jeffs,
and he is the creator and maintainer of an open source project called Benthos,
and it is a stream processing service.
And it's a really, really cool tool,
and in many ways has a lot of alignment with what Kostas and I work on in our day jobs. It's a stream processing service written in Go and it does a bunch of interesting things. I actually, Kostas, have some technical questions, because after
reading through the documentation, he's made some decisions that are fascinating to me,
maybe because of my lack of knowledge. But I think I'm most interested to know what it's been like to maintain an open
source project for five years, especially dealing with something that's pretty complex. It's not a JavaScript plugin, not that those are immaterial. But when you talk about stream processing and
integrating with services like Kafka at very large companies, you're dealing with some pretty heavy duty technology. So
I'm sure that the emotional rollercoaster of doing that for a long time has been
interesting and many times we don't get to see that. So hopefully Ashley will share a little
bit about that with us, but what's on your mind? Yeah, two things, two topics actually.
One, of course, we have plenty of technical stuff to discuss. A stream processing system is not the easiest thing to engineer, so there are many trade-offs and many decisions that you have to make there. So yeah, I'm really looking forward to discussing the technical side of things. And then,
of course, I'd love to hear his experience of being a maintainer of an open source project
for five years. And from what I understand, he's the main contributor, and more than 98% of the contributions come from him. So he's very engaged with that. So it's going to be super interesting to hear from him how he does this and how he keeps
himself motivated and all those things.
Well, let's dive in and talk with Ashley.
Yeah, let's do it.
Ashley, welcome to the Data Stack Show.
There's no way we're going to have enough time to cover all the topics, so let's just
dive right in.
Give us just a brief background on you, brief overview of your career and then what you do
day to day. Hi everyone, my name is Ash. Thanks for having me on the show. So I'm the core
maintainer of a project called Benthos, which I've been doing for about five years. It's a data streaming service. It's declarative. And the idea is it's this operationally simple thing. And I started working on that around five years ago, after working in the sort of stream processing industry, which I've been doing for about eight years. So I didn't used to call myself a data engineer, because the term didn't really
exist, but obviously that's pretty much what I consider myself to have been that whole time.
And now that's pretty much my job is just working on this project kind of indirectly,
but yeah, that's my job basically.
Okay.
Well, I want to hear more about that.
One interesting side note.
I don't know if you've looked at the Google search trends for the term data engineer,
but it's crazy.
It's like a hockey stick over the last five years, which is really interesting.
You can see, like, okay, people were trying to figure out what to call this discipline. And then of course it's formalized now.
Well, tell us about Benthos. You started working on it five years ago. It's a really cool tool,
but tell us the details on what it is, what it does, and then especially why you ended up
creating it. So I kind of built it defensively. It's got two main focuses as a project. If you
kind of look at it on the website, you have a quick five second glance. It's basically
YAML programming, a stream processor. And the idea is that it's operationally simple.
What I mean by that is the whole premise of this project is that it's super correct in every possible way in terms of data
retention and back pressure and trying to be the least headachy item in your streaming platform.
And architecturally, that's quite difficult. And it's been a main focus of the project
pretty much since day one. And that's because I was kind of working in
a position where we were basically inventing the same product over and over again. We had this
entire platform of a service that reads from something, does something to it, that's usually
a single message transform, maybe some enrichments hitting third-party APIs, that kind of stuff.
And then it would write it out somewhere. And we were plagued with development effort put into migrating services
because they were all slightly different.
And these weird combinations of different activities that each one was
responsible for.
And we were just constantly rewriting these things to slightly change their
behavior, recompile it, redeploy it, go through all the testing, hassle, that kind of stuff. So I was in a position where I was
kind of desperate for something to just be dynamic, in that you can drive it through configuration, something declarative. Because these are usually just simple tasks. It's filtering, transformations, some enrichments, and a few little extra bits in between, maybe some custom logic that you can plug in and stuff. But for the most part it was just stuff that you could describe in a couple lines of config, but we just didn't have that tool. So I kind of went on this weekend warrior effort to build what I would consider to be a solution to that problem. Our perspective at the time
was that data was super important. It's basically our product. So delivery guarantees were very, very strict. And also we were using Kafka all over the place. So this was about eight years ago.
So Kafka was, I think it was like version 0.7 at that point. We were early adopting it and slowly
migrating it through this platform. And my take on it was if this thing is a disk persisted,
replicated service that we're putting all this effort into operationally running,
why would I have a service that has a disk buffer that is also operationally complex? Like, if you get disk corruption or some sort of failure, then it's a single point of failure that could introduce data loss in your system, potentially forcing you to do things like run backfills. So why don't we just have a service that doesn't need anything like that? It's always going to respect the at-least-once delivery guarantees without any need for extra state. It's just going to do that based on what offsets it's committing, basically what you would call a transaction, and what is effectively how the Kafka Streams API works. So what it's supposed to be doing is making sure that you never commit an offset where you haven't effectively dealt with the message and passed it on forward. So you don't need a disk buffer to have that delivery guarantee.
And then the other piece of that puzzle was making it simple to use. So the idea is that you can
slap a config together to create a pipeline. So this service is reading from Kafka,
performing some sort of filtering, and then maybe applying some sort of masking,
data scrubbing, whatever, and then it's writing out to NATS or 0MQ or something. I can then take that config,
commit it into a repo. And if somebody comes up to me and goes, oh my God, no, we need to stop
writing to NATS. We're going to change this to RabbitMQ now. Or this filter needs to change.
We need to change the logic for that. I can just say, here's the config, change that, it's two lines, I can review it and then go.
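For illustration, a pipeline like the one he describes might look something like this as a Benthos config (the topic names and mapping are invented, and exact field layouts vary between versions); swapping NATS for RabbitMQ really would just mean replacing the output block:

```yaml
input:
  kafka:
    addresses: [ localhost:9092 ]
    topics: [ user_events ]
    consumer_group: example_group

pipeline:
  processors:
    # Hypothetical masking step: drop a sensitive field before forwarding.
    - bloblang: |
        root = this
        root.email = deleted()

output:
  nats:
    urls: [ nats://localhost:4222 ]
    subject: masked_events
```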
And to me, that was my way of ensuring that I would get to work on more fun things, like
the actual stuff that I wanted to be doing in my day job.
And then obviously that naturally progressed to me only working on the boring stuff, because
now I'm the maintainer of the service that's doing the boring stuff.
And that's where I am now.
An attempt to journey into the exciting that ends with a continuation of the boring.
Inevitably going down the rabbit hole of boredom.
Yeah. Well, you know, one thing we've talked about on the show with several guests, and I loved this as I was digging into your documentation: you say in multiple places that a defining feature is that this is boring. And
we've had multiple guests who have built really large scale systems and we'll ask them about it and they'll say, it's kind of boring,
but it works really well and it's extremely reliable. And so that really resonates with me because that's something that we've heard a lot. One question for you, and I know we're going to
dig into this a little bit later, but there was certainly a point at which you made a decision
for the project to be open source. I mean, sometimes when you're
building this, especially to solve a problem that you're dealing with inside of a company,
it can sort of be IP that exists inside of that company. What motivated you to decide to go open
source with the project? Yeah. So everything that I did in my spare time was like a learning exercise. I would always make open source.
And that was just a habit of mine because I was,
thing is if I was planning to make something open source
and know that it was going to get attention,
I wouldn't do it because I was so shy or so nervous
about somebody actually looking at my code.
But what I did is because I was so cynical
about nobody's ever going to look at this,
nobody's ever going to know I put this on GitHub.
I would put all my little hobby projects on GitHub.
So the idea of making open source from the onset was obviously like a nervous exercise
for me, but it's also the excitement of maybe this is going to help somebody.
But the main reason why, so I mentioned that I kind of built this thing defensively for
the company I was working at.
There was just so much going on at the time.
So this was a company called Datasift.
And they were basically selling these firehoses of social media data, the biggest one being Twitter, and then filtering logic on top of that.
So it's like lots and lots of stream processing back when everybody else was talking about Hadoop as being big data.
We were basically processing the Twitter firehose constantly against hundreds of gigabytes of
customer filtering data. And it was this huge platform with all this stuff going on.
And we were having to work pretty defensively to keep this thing going because our requirements
were changing quite frequently. Because I don't know if you've realized this, but working with social media companies as a partner can sometimes
be a little bit turbulent and they can do things like cut you off randomly and force you to pivot.
Or change APIs or change data without warning.
Yeah. Or we just don't want to work with you anymore. Bye. Your business is kaput. Sorry. Oh, that's awkward. So yeah,
we were constantly having to churn what the platform was capable of and the teams were
amazing. The engineering staff at DataSift were fantastic, but it's still this huge effort and
you've got all this technical debt because you're constantly having to change all these services and
what they can do and all the capabilities and stuff. So there wasn't any
capacity really to work on something like this on company time. And in all honesty, I was working on
it in my spare time for two years before it was really viable. Because at the end of the day,
if you've got bespoke services that are built to do a specific task, to replace that with something generic is just going to be a challenge.
To build all the basic stuff needed to have a dynamic system and then to get it to perform in terms of stability and throughput, latency, that kind of stuff, is this massive effort.
I didn't know that when I started,
otherwise I wouldn't have started it, but then it just, it naturally progressed. It was,
it was a hobby project that nobody was really interested in. And then two years later,
I come back to the company. I'm like, Hey, this might be usable now. Can we use this please?
And it had already kind of got a bit of a life on GitHub at that point. So
it just kind of carried on that way. Sure. And did the company end up using it?
Oh, yeah.
So they used it a fair amount in a few places
where it was an immediate solution to a problem we had.
We didn't just like nuke all the other services on the platform.
It was a very careful effort of we'll slowly roll this out
in places where we were going to have to do some changes anyway.
And then what happened is the company got bought by Meltwater. And it was awesome, because we had this streaming platform, and the idea was we were going to sort of use that technology throughout. They're
a very data heavy organization and they have a load of different teams all working in completely different ways. Yeah, they're a big company and their products are pretty cool.
Yeah, the engineering teams there are fantastic.
And the thing is, they're geographically distributed.
So they all do things slightly differently.
They've got slightly different best practices of how they work with their data,
or they did at the time.
They're probably more consistent now.
But yeah, so I had an opportunity then to go to all these different teams and say,
hey, you're looking to interact with our streaming infrastructure, here's a tool. Rather than being blocked on us as a team enrolling you on this and getting you onboarded with all these infrastructure changes and things, why don't you just run this thing yourself, and you can do it in your own time. And we're not even in the loop. This service will allow you to interact with all of our stuff,
hit these enrichment services, all those things. And it took off. So again, it took a bit of time
because you come to people with this generic service. And I think because of the... I mean,
it's open source and it's a generic config driven service.
So immediately people start thinking, is this going to be like Logstash?
Is it going to take two minutes to start up?
Am I going to rip my face off over the config format, that kind of thing?
So people are quite skeptical.
So it takes a while to kind of demonstrate to people that you're going to get value out
of this.
You're going to like it.
And I kind of became like an internal evangelist for, you can use the service for this thing,
this thing, this thing. And when people had use cases, I immediately jumped on it
because that's the bread and butter of the project. It can't continue if I'm not constantly
seeing new use cases and new problems to solve. So I kind of tried to nibble on as many use cases as I could.
Do you think that part of that also, I mean, you have an interesting perspective in that you had very practical experience with streaming almost as it was coming of age, right? Because back when you were using Kafka,
the idea of streaming as you're talking about it is actually still pretty novel, right? In terms of
the technology. So do you think also to some extent, the adoption of streaming technologies
is a little bit hard, like evangelizing use cases in part just because streaming was still younger
to an extent?
Yeah, so it was kind of weird, because I started working with some teams and basically got Benthos to work in a batch mode, because there were use cases where it was like: we've got an S3 bucket and we just want to consume the entire bucket and then write it to Kafka, because all the other teams are using Kafka.
So it was one of these situations where I didn't really think about it at the time as like,
oh, they're using batch. This isn't a batch product. I can't do that. It was more just a
technical problem, and that's pretty easy to solve. Basically, it's an input, just like any other streaming input. Once you've finished, the bucket is exhausted, you shut down gracefully. It's not
massively complicated. So there was an aspect of you have to do stream at this company because
that's the data bus. That's the data infrastructure of this company. We cannot do what we want to do
in a batch way. The volumes are just too big. So this is how we're going to solve that problem.
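That bucket-drain pattern might look something like the sketch below, where Benthos consumes until the bucket is exhausted and then shuts down gracefully (the input name and fields follow the current docs as I read them; the bucket and topic are invented):

```yaml
input:
  aws_s3:
    bucket: example-bucket   # read every object, then exit gracefully

output:
  kafka:
    addresses: [ localhost:9092 ]
    topic: example_topic
```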
And I mean, nobody at that company that I interacted with was particularly intimidated by streaming.
They were all excited to play around with this new tech.
And then the thing is after that, so the project kind of grew externally and more organizations started adopting it.
I have never been in a position where I've had to convince anybody to use streaming because they're just coming to me.
They've already got this infrastructure and they're looking for something to solve
particular problems they have, and they're stumbling upon it. So if somebody asked me,
how do you convince a company to adopt stream? I've got no idea. I have absolutely no clue how
to do that. Well, that's a really helpful perspective.
And I think, especially in the context of social media data, and I think some of the
other components of things that Meltwater provides as sort of data products, I would
guess actually, now that I think about it, the demand for streaming was probably unbelievable
because when you're dealing with that nature of data, social media platforms,
streaming real time and getting updates as soon as you can to see trending is probably super
important. Well, I have been monopolizing this. I have a million more questions, but Kostas,
we talked about some really interesting technological questions. So jump in, I know you want to talk a bit.
Yeah, it's been a very interesting conversation. All right, let's start and try to dive a little bit deeper into the technical side of things.
And my first question is, can you give us a typical setup including Benthos, how it fits with, let's say, the pretty much standard data stacks that we see out there?
How do you see it deployed?
It's often used as a plumbing tool.
So you imagine you've got Kafka infrastructure.
It's very often Kafka that people are using it with.
There's also MQTT.
It seems to be a growing use case.
But it's normally a company that's already doing some stream work.
And what they've got is they've
got some services. They've either got other queue systems that other teams are using.
So we want to share data with some team from another company. It could be just another team
at their company. And they just do things differently. They've got a different schema.
They've got a different stream technology, whatever. And they just want some simple tool that they can just
deploy. They don't want to invest too much time into this partnership. Maybe it's a temporary one,
maybe it's going to change over time. So they just want something now, it's going to solve
that problem. They don't have to think about it. It's automatically going to have metrics and
logging and that kind of stuff. It's low, low effort basically. And then what tends to happen
is you start using it that way sort of defensively. And then you realize, oh, hang on a minute.
We've got this other service that's just reading a topic and then doing some HTTP enrichment,
or maybe it's calling some Python script or something. And all it's doing is taking a payload,
modifying it slightly, and then sending it on somewhere else. We could just do that with this Benthos instance.
So why don't we do that? And then it just kind of slowly grows from that point where you delete
a project that you had to maintain and you've replaced it now with a couple lines of config
and it all fits in this one service that's kind of neat.
You can deploy as many of them as you want because it's stateless.
It's just low effort.
So it tends to be, to begin with, just a silly plumbing mechanism from one thing to another.
Maybe it's just a bit of filtering or something that somebody wants and then they slowly grow.
Maybe eventually people branch them out into different deployments with different configs and stuff, but they'll be doing the same kinds of things. I tend to call it
plumbing. I don't really know if we've got like a good term for it in data engineering, but it's
not a clever task normally. It's usually single message transforms and integrating with different
services. So you might be hitting like Redis cache or something to get some hydration data based on
like a document ID or something.
Or maybe you're hitting a language detection service on some of the content of a message,
that kind of thing.
And then enriching the data with that, that kind of stuff.
But it's stuff that can sometimes be considered to be quite complex problems.
And the reality is it's not.
It's just an integration problem. And you can put that in a nice config.
And then when things change, when somebody says, hey, our service is going to change,
we no longer support that field or that thing, or here's the new schema, then you just do
a quick change, commit that.
And it's simple to test.
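As a flavor of that enrichment pattern, here is a sketch using the branch and http processors (the endpoint and mappings are made up, and some versions nest the HTTP fields slightly differently):

```yaml
pipeline:
  processors:
    # Send part of the message to a hypothetical language detection
    # service, then merge the result back into the original payload.
    - branch:
        request_map: 'root.text = this.content'
        processors:
          - http:
              url: http://localhost:8080/detect-language
              verb: POST
        result_map: 'root.language = this.language'
```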
That's very interesting.
So, you said that it's very common to see it working together with Kafka, right? And around Kafka, okay, there's a whole ecosystem of tools, right? How is it used together with stuff like Kafka Connect, for example, which has similar purposes in a way, like to connect Kafka with other services or with other streams? Then you can technically deploy some, let's say, processing logic on top of Kafka, so you can process the data. It sounds like you can do everything inside Kafka at the end, right? Or at least that's what Confluent wants to happen.
So why do you think that someone who already has invested in the Kafka infrastructure,
they would also use Benthos?
I understand after you start using it, why you keep using it and increase the use cases
that you cover.
But what's the first thing that will convince someone to start using Benthos?
Does it make sense what I'm asking?
Yeah, I think so. So I think the main selling point, if somebody's got JVM components, and maybe they've got Kafka, maybe they're using Apache Camel or something,
and then they've got some other logic on top. I think what tends to happen when people
pick Benthos, I mean, it's kind of difficult to summarize because I don't get an awful lot
of feedback from the community often, but it's normally an engineer that's making the decision.
It's like a data engineer in this context.
And I think their main frustration is they don't like building stuff.
They don't like having a build system for these transformations they've got, especially if it's really simple stuff and especially so if they
have to change it often. And they don't like the weight of some of these components. They're a
little bit clunky. They're a little bit awkward to use. They want something that is more friendly
to an ops person. So if you're on call and you're waking up at 3 a.m. and something has happened,
maybe a server has crashed or something, and it's part of your infrastructure. And you can see in your graphs that you've had
some sort of outage, like the horror stories of some of these components and waking up and
thinking, oh God, now I've got to recover all of these different things. So what the problem is,
when they see this product, it's just a single static binary. It doesn't have any state. You
can restart it on a whim.
In fact, you can restart it constantly if you want.
There'll be no data loss. When they see an outage, it's a simpler problem because you don't have to coordinate a backfill.
You don't have to coordinate all these components slowly coming on over time.
They probably already restarted if your infrastructure is set up for that.
And you can just check on the graphs, the metrics and things that it's worked.
If there's a problem,
you've deployed something that is broken,
then it's just a config change.
So anybody can look at that
and get some idea as to what's going on.
They're not reading code.
They're not looking at something
that got committed to some CI system
and it was a full build that got deployed.
They're just looking at a config change
that got deployed.
So maybe there's like some mapping or something
and they can just roll that back if it looks wrong,
that kind of thing.
So I think it tends to be engineers
that are making the decision.
And obviously a lot of Go developers,
I didn't mention that, it's written in Go.
So a lot of people who are already writing Go services,
it's a natural win for them
because they can write their plugins in Go
rather than Java.
But in terms of feature set, there's a lot of overlap with a lot of products that already exist in the Java ecosystem and are more popular. They're more widespread. So I've never gone
after people making those deployments. I would never tell somebody if you've got a happy system
that you're using and it's using all these products, I would never tell them you should ditch all that and use this thing. And if you've got a bespoke
service that you're happy with and it's doing all this stuff and it's your code and you're building
it, keep it. If you're happy with it and it's solving the problem, then you should definitely
keep that thing rather than replacing it with this weird thing that you've never seen before.
Yeah. But it's more, it's this trade-off between deciding what you want to work on
and what are your priorities as a team.
The declarative side of things is also quite important.
Like I think it fits much more naturally in the workflows that like engineers have.
You mentioned like quite a few times, you can write like a config and I can review it, right?
This thing that I can review it and then we can move fast and we deploy things and we
can change things faster.
That has a crazy value when you're talking about an environment that needs to be alive
all the time and at the same time you have to create the new logic that you need because
things are changing constantly and all that stuff.
I think that's also a very interesting part of the data engineering as an engineering discipline,
because it's this kind of crossover between software engineering,
but ops at the same time. You have all these different facets that you have to handle at the same time. And you really have to pick the best from each one and try to create tools that combine the best practices from both.
So I think that having this declarative way of describing what should happen there, it has amazing value. So I can understand
that, especially having worked with a JVM-based infrastructure. So how would Benthos compare to
other stream processing platforms? What are the differences and the similarities between them?
So Benthos is much more focused on single message transforms.
So you get a single payload and you're doing something.
You might have a batch.
You can do batch processing.
So, say, consume a window of 100 messages and aggregate them. But its bread and butter really is single message things.
And the reason why I've focused
on that is because at the time, that was the problem that I had was just single message stuff.
And there wasn't really an awful lot of attention on that in the product space. We already had Spark
at that point, which was already solving the problem pretty well from what I could tell.
I hadn't used it, but it seemed like, okay, windowed aggregations, that's a solved problem, we have a tool for that. But what's the nice thing for masking, filtering, transforms, enrichments, hydration, that kind of thing? So I think if I was
going to compare it to these products, I would say it's probably more similar to Apache Camel.
And obviously Kafka Connect as well, to an extent.
And then the main difference is that it's kind of declarative from the onset.
People like saying cloud native nowadays.
But basically it can be deployed in Kubernetes essentially without much hassle and that kind
of thing.
But then Camel's got CamelK now. So I mean, those services are becoming
nicer to deploy, but not like the kind of things that you could do with the Bentos config.
With the way that the config is structured, you could do crazy things. You can have multiple
inputs fed into a single pipeline with their own processors, and then have joined processors. You can have multiplexed outputs switched on the contents of messages. You can have fan out, all these different brokering patterns, round robin. You can have dead letter queues for processing errors, and also for when outputs
come offline and all that kind of stuff. So it's much more centered on plumbing, which is why I
kind of put it in the sort of Camel category, even though it is a stream processor, it does stream processing. So, you know, it tends to get compared a little bit more with, like, Flink and stuff. It can do windowed processing, but that's not really what it's for. It doesn't have state necessarily. It does window processing just by keeping it in memory and only committing offsets when that window is flushed.
I haven't done any performance comparisons in that place because it's kind of experimental
at this point, but it can do it.
I wouldn't sell that feature of Benthos at this point.
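Those brokering patterns look roughly like this in config form (a sketch: the switch, broker, and output names follow the docs as I read them, while the routing logic and destinations are invented):

```yaml
output:
  switch:
    cases:
      # Fan audit events out to two destinations at once.
      - check: 'this.type == "audit"'
        output:
          broker:
            pattern: fan_out
            outputs:
              - kafka:
                  addresses: [ localhost:9092 ]
                  topic: audit
              - aws_s3:
                  bucket: audit-archive
                  path: '${! uuid_v4() }.json'
      # Everything else goes to NATS.
      - output:
          nats:
            urls: [ nats://localhost:4222 ]
            subject: everything_else
```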
Yeah.
All right.
And then why did you decide to implement windowing on the platform?
Same reason I did most of the stuff.
I just thought it'd be fun. There's a lot of stuff in Benthos that, because it's called a
stream processor and people will look at it. And what I reveal on the front page is a stream processor: reading from a streaming system, doing some stuff, writing it somewhere. But there's a lot of
stuff in there that does not fit the stream processing category. You can use it as an HTTP gateway if you wanted to. It supports request response.
I had to put that in because of NATS and also ZeroMQ, stuff like that. So it's always
had the ability to do responses to inputs. So you can just hook it up as an API gateway.
It has an API for dynamically mutating streams and having multiple streams.
You can use Benthos to drive
itself. There's like loads of stuff in there that doesn't really fit the category. So I thought,
well, I might as well put windowing in there as well. It's really fun to just hear in the world
of technology and data technology, especially when you think about sort of like San Francisco
based companies that are, you know, trying to become really big. There's a lot of talk about product strategy and all this sort of stuff. And it's so wonderful to hear,
like, I did that because it'd be fun. And that just brings me great joy, Ashley.
It is a survival mechanism to an extent, because you're doing a lot of this stuff on your own steam. I'm maintaining this project just on my own will.
So in order to do that, you have to have fun.
There is no way of maintaining an open source project, especially in the early years.
It's just not possible if you don't enjoy doing it to an extent.
Or at least I wouldn't want anybody to suffer that experience if they didn't enjoy it, because
there's no guarantee of anything with, with, especially with open source, but also any
business running a business is the same thing. There's, there's no guarantee that it's going to
end up anywhere where anybody's going to use it. It could fizzle out. It could disappear.
You could just get burnt out and not want to do it anymore. So if you don't enjoy it,
then what's the point? Like there's no,
there's no point in it. You're just punishing yourself. Sure. One question there, which I'd
love to just... I think, looking in from the outside, sometimes it can be hard to tell what the actual experience of building and maintaining an open source project like Benthos is like. But could you just tell us about some of the highs and lows over the past five years? And sort of, you're basically working with and on and consulting around Benthos full-time now, but what are some of the highs and lows that you've been through as you've maintained the project? Which, by the way,
I think also congratulations are in order because that's a long time for a project that is still
being used at large companies. So congratulations, because that's a huge accomplishment.
Thank you. I appreciate that. So the highs are hearing that it helped somebody that's like,
when somebody gets excited about the fact that it solved this
issue for them, I get a deep satisfaction out of that. And you don't get it an awful lot with
open source because at the end of the day, most people are going to silently download it, use it,
and you'll never hear from them again, especially if they're happy. The happier they are, the less
you'll hear back from them. And I'm not judging
anybody for that. I do the exact same thing. I can't complain because I use loads of open source
projects and I'm not emailing the maintainer going, oh, I really enjoyed your fun today.
What an unvirtuous cycle.
So those are the highs when somebody actually bothers to say, Hey,
this really helped.
We can now focus on this thing that we want to be doing. It got rid of all these issues for us. Thank you for making this thing.
Or if somebody asks for a feature and I get it out to them quick and they're
so thankful for it. Oh my God, that's amazing. Thank you so much.
Especially if it was low effort, if it took me like five minutes and they're like,
oh my God, that's amazing. You're incredible. I get a lot of satisfaction out of that.
The lows are obviously bugs. If somebody has had a bug and they've had some sort of suffering,
the behavior hasn't been quite what they expected or something's broken or whatever. I think, so I have a thing. I can't just leave a bug.
I'll tag it, I'll label it on GitHub as a bug and it gets closed that day. I can't deal with bugs
being known and not dealt with. And that's mostly just, I just can't handle it. I won't be able to
sleep, which is, it's great because it means that I deal with them.
I don't have a backlog of bugs that are constantly getting worse or interacting with each other,
that kind of thing. But obviously that has a toll. Sometimes I just want to enjoy my evening
and a bug arrives and now that's my evening. There's nothing else I can do about that.
But they don't, to be honest, when you deal with bugs really quick, it does have an effect.
I think there's obviously lots of blogs out there about dealing with bugs as a team and stuff and how you should prioritize them and all that stuff.
And I think that obviously I wouldn't say to everybody deal with bugs as soon as they're known because that's just not practical.
But it definitely has had a positive impact on the
project. The other thing is whenever anybody has a question, if it's a question that isn't already
answered in the documentation somewhere, I consider that a bug and I will try and make an effort to
fix that either with a guide or fleshing out the component docs or something, making some example
or whatever. And that has been positive because obviously as a solo maintainer,
you only have so much time.
So you can't be answering questions constantly.
So it's a defensive move, in a way, to always treat questions as a bug
and just deal with them quick.
But those are the lows because I have to deal with it.
And it's me.
It's a personal issue with me. I could get a therapist and I could deal with that. I've chosen not to at this point because it's not that big a problem. It's not as if it's every evening. I get like a bug a month or something.
I guess if you were constantly missing dinner, it would become an issue. And then maybe, maybe you would call the therapist.
My wife would not have that. She would not have me missing dinner.
What I would do is I would go and eat begrudgingly and then I would come back.
Doesn't get in the way of family functions.
Yeah. Yeah. That's great.
That's great. So, okay, let's go back to the technical questions again.
And then we can come back to open source because
we have quite a few questions to ask there. Let's discuss a little bit about the architecture of
Benthos, how you architected Benthos and what are the main components and give us a little bit of
insight of the choices that you've made in the trade-offs there and why.
Cool. Okay. The main premise of Benthos as an architecture is, I kind of call it a transactional model. Transaction means a lot of different things to a lot of different people now, unfortunately, because I used it as a very general term at the time. But basically, all inputs in Benthos, and obviously there's lots of them, they all have different paradigms for how to deal with acknowledgements and things.
And obviously Kafka being the one that's most different to all the others in that it's just a numerical commit.
But basically every input within Benthos gets wrapped in a mechanism for propagating an acknowledgement from anywhere else in the service back to that input, where it knows how to deal with it. And then it pushes it down a pipeline, which is Go channels. I could talk about Go for hours, but basically Golang channels are used heavily as a way of essentially plumbing different layers of the service.
Because it's dynamic, there could be any number of processing threads to suit vertical scaling.
There could be any number of different inputs feeding into one or more outputs.
So what happens is the message gets wrapped in a transaction.
It gets sent down a channel, which is also the mechanism for back pressure.
If there's nothing ready to deal with that message, it can't go anywhere. And then essentially that makes its way downstream. So it goes through a processing layer.
They receive transactions of messages. They actually receive a message batch,
but usually if you're reading a non-batched source, then it's a batch of size one.
But all the processors can do whatever they want on it. If they filter it intentionally, so it gets removed, they call the acknowledgement. And then the input will do things like send that acknowledgement directly back: if it's Google Pub/Sub, then it will ack that. If it's RabbitMQ, it'll ack it. If it's Kafka, it'll mark the offset as ready to commit. The important thing with Kafka, I'll go back on that
because there's a whole topic around how the Kafka input works. But basically, it eventually makes its way to the output layer. The output layer could be brokered. You could have multiple outputs. They can be composed. So they're generic components in themselves.
You can compose brokers on brokers on brokers if you want to, but they are responsible for essentially enacting the behavior that a user would expect by default. So if it's
a switch multiplexer, you've got five outputs, a message gets routed to three.
The message is not acknowledged until those three outputs have confirmed receipt. Obviously,
some outputs are better at that than others.
And obviously, you can tune them to an extent.
So with Kafka, you can tune whether or not it's reporting all the replicas were written
to or not.
But basically, you have some way of knowing that the message is successfully written somewhere,
then it gets acknowledged.
And then it's up to the input to do whatever.
So most inputs, so for the easy queue systems like NATS and GCP PubSub and stuff,
where ordering isn't as important, people don't really consider that when they're processing
messages from those. You can just keep pushing messages down the pipeline. And if there's
capacity, then it'll get processed. If there's back pressure on the output, naturally it makes
its way up to the input pretty quickly. And then when it's freed,
the components gracefully resume. With Kafka, by default, topic partitions are processed in
parallel. So if you've got 10 processing threads and you've got 10 topic partitions, you've
potentially got 10 threads saturated. Not necessarily if they're not balanced well, but in theory, you've got 10.
But messages of a partition are processed in order.
So your options there are you can batch them
and process multiple messages of a topic that way.
Or what you can do is you can increase,
I call it like a checkpoint limit.
But basically, how many messages are you willing to process out of order?
And what I do there is I keep track. So if you say like, we want to be able to process a hundred messages
async, whatever order, we don't really care about that. We just want to process them fast.
I limit the number of messages and I track which offsets we've actually acknowledged.
And I will only commit up to the point where all the messages
from that commit number down have already been acknowledged. So there's potential there for
duplicates. So say you process 100 messages, the first one that went through the pipeline,
for whatever reason, hasn't been acknowledged yet because it's blocked somewhere. All the others
have, well, guess what? None of them have been committed yet until that final one has gone.
And that ensures that when you restart the service, you don't get data loss.
But then the trade-off there is that you could potentially get duplicates next time you restart
it.
So it's like the difficulty with a service like this is finding the common mechanism
that's going to satisfy all these different input types.
They've all got different ways of handling acknowledgements, and what they're typically used for differs as well. Because obviously some people might want to do
ordered processing with a key system like NATS, but then most people don't really care. So
you can kind of enable it, but by default, you're just going for throughput and vertical scaling.
Whereas Kafka, typically people care about the ordering and they want to do batched processing of some type. So you kind of manage it that way. But essentially what I've got
now, I've had to refactor the components multiple times to make sure that I could do all this stuff.
But basically they all kind of fit their own paradigm now. And yeah, I think I probably missed a million things there.
It's fine, it's fine.
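For reference, the layering he walks through maps directly onto the top-level sections of a config, with a threads field covering the vertical scaling he mentions (a minimal sketch; the names are illustrative):

```yaml
input:
  kafka:
    addresses: [ localhost:9092 ]
    topics: [ raw_events ]
    consumer_group: example_group

pipeline:
  threads: 4                  # parallel processing threads
  processors:
    - bloblang: 'root = this' # placeholder transform

output:
  nats:
    urls: [ nats://localhost:4222 ]
    subject: processed_events
```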
But I have a question.
How important is ordering
based on your experience
with stream processing?
That's a good question.
So I do, so for me personally,
it's never been an issue
because I've never worked on a system
that actually cared.
In event sourcing land,
then it's super important, I would imagine. I've had obviously people come to me and have a discussion
about how can we guarantee ordering? What about in the event of failures and stuff? If we're
retrying messages, how do we guarantee we're getting the right ordering and stuff like that?
And I mean, it's a complex problem to make sure in all cases, every single edge case, you've definitely got the correct ordering.
But I think it is possible, just like the perfectly secure system is possible.
But yeah, I think it is doable.
But I think mostly I would attribute that to event sourcing.
So you're processing a stream of actions and you need to make sure that they're done in the right order, because it has obviously an effect on the outcome. But yeah, to be honest, I would have traditionally described it as a system where it probably doesn't matter, because you're doing single message transforms
anyway, enrichments and stuff. But then obviously if you're using it to bridge between services and
something downstream does care about ordering, then obviously it also has to respect ordering. So I did opt for that, whereas I think some services have gone down the path of not really caring about ordering too much.
And maybe there's a way of dealing with it. I am tempted in the next major version bump to
reconsider whether or not I make it the default, because obviously it does make scaling easier for people if, just by default, it doesn't really care and it's letting you use however many processor threads you've got.
But for now, it's strict on ordering until you give it the explicit instruction to allow it to process the Kafka input out of order.
And based on your experience, again, what's the main trade-off that you have to make in order to have ordering, right?
Is it just performance? Is there something else?
You mentioned something about duplicates.
So there are differences there with delivery semantics also.
So what are the main trade-offs that an engineer needs to have in their mind when they opt for having strict ordering? If you don't care about delivery guarantees,
then the main problem is just throughput,
is how easy is it to do vertical scaling?
If you're forcing order processing
and you've got a limited number of topic partitions,
because that's tied to your Kafka deployment,
like the number of partitions is something
that somebody else has probably made the decision of.
You might not even have control over it. So you on the processing side,
oh, I've got 24 CPU cores. Lucky me. If there's only three partitions and you're doing ordered processing, then you're stuck. You've got three CPUs. Unless you can vertically scale the individual
message processor, then you're kind of out of luck on that. But if you care about delivery guarantees, the forced ordering, in terms of Benthos, to a Benthos user, it just means you've got to configure one extra field, essentially, to kind of manually determine how much parallelism you're willing to go for.
So because messages aren't persisted by the service,
what it's doing is it's making sure that it's never committing an offset
that would result in one of the messages that hasn't been finished yet
being lost forever.
So the reason why you can potentially get duplicates there is because
if you choose to process messages out of order with Kafka,
then obviously that means that messages that came after a particular offset
could be finished and dealt with.
The next service has already got them in there.
They've got a new life in the suburbs,
whereas some messages are hung up or whatever.
They haven't been dealt with for whatever reason.
You cannot commit that offset, because if you commit that offset, or you do anything else with it, then the next time the service is restarted, you're not going to consume those messages again.
So, like, basically with Benthos, you have to be strict, because I'm not maintaining a disk-persisted buffer or anything like that.
So those messages don't exist anywhere else.
I'm using Kafka's disk persistence for that.
So yeah, it's one of those things where my role
is to basically document what's the symptom of doing that.
Like if you want to get better CPU scaling,
what is the solution to that thing?
Because right now there's a guarantee that you might not want,
you might not care about.
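That one extra field is the checkpoint limit he described; on the kafka input it looks something like this (a sketch based on the documented field; check your version's docs for the default):

```yaml
input:
  kafka:
    addresses: [ localhost:9092 ]
    topics: [ events ]
    consumer_group: example_group
    # Allow up to 100 messages to be in flight out of order before
    # offset commits have to wait; 1 preserves strict ordering.
    checkpoint_limit: 100
```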
So do you have any plans, or are you considering, to add some kind of state that would help with these kinds of situations? Or have you absolutely decided that it's going to be stateless, that Benthos is going to be stateless?
I did actually have, so before I went to version one, so for like three years or something, I did have a disk buffer as an optional component.
The reason why that's particularly useful is if you've got a chain of lots of services that are synchronous.
Imagine you've got HTTP to HTTP to HTTP to 0MQ or something.
Because of the acknowledgement system, there's no disk buffer in any of those individual components.
It means the acknowledgement has to propagate all the way up. So it's the same problem that people get with massive
microservice architectures where the service that begins the request chain has to wait
forever and any disconnects cause a duplicate. So I did have, well, a memory buffer is still in,
and I had a disk buffer as well. I got rid of it because I thought, well, I'm not sure anybody needs it.
I just want to see if I can get away with not having it.
And nobody asked for it back.
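The memory buffer he mentions gets its own top-level config section; a minimal sketch, assuming v3-era field names:

```yaml
# Buffers messages in memory between the input and processing layers.
# This weakens the end-to-end acknowledgement: anything sitting in the
# buffer is lost if the process dies.
buffer:
  memory:
    limit: 524288000   # maximum buffer size in bytes
```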
So I just never... the disk buffer is actually still there in the code base, because I wasn't sure if somebody was using it as, like, a library or something. So I've left it there just in case it's being used in somebody else's project. But it's not documented anymore. And to be honest, I think I like the idea of having to solve,
essentially, in order to not have state, in order to not have this operational complexity of
something that a person running the service has to know about, we're using the disk for this thing,
so don't delete that. And if the disk is corrupt, you're going to have to follow this step, this step,
this step. And if the server crashes, you're going to have to do a backfill. And we don't
know for how long. In order to avoid that, the burden's on me to make a stateless version of
that same feature functional. And it normally ends up just being, I've got to be more considerate
with how I do things. So in the basic stream processing world, where it's just about acknowledgements, the burden is on me to solve having a transactional acknowledgement system and also being able to vertically scale and also being able to do things like batch sends and all this other stuff.
Because when you've got a disk buffer, that stuff is easy.
You write it to the disk buffer and when you're done with it, you delete it from the buffer.
It's more difficult in my world because in order to do things like a nice back pressure
and shutting down gracefully, all that stuff, I have to be super strict about
when are we going to allow things to close and what happens if messages haven't been acknowledged when we're
shutting down? How are we going to read N messages from this queue system without necessarily
acknowledging them immediately? What are the difficulties there for each of the individual
queue systems? But I feel like that's my role as somebody building a generic service. That's my problem because I've accepted
that problem. I've accepted the role of giving you this generic tool. And therefore, if I didn't try
my hardest to make this thing stateless and easy to deploy, I haven't really done my bit. I've not
fulfilled my role. If I just give you a service that's as complicated as something that you could have easily made yourself, and the config system is just as complicated as your code would have been, just use your code. Why would you
involve me in the equation at all? I'm not doing anything for you. I'm not fulfilling
any purpose here. So why do I exist? I ask myself that every day.
Well, that's a whole other podcast episode.
But usually when you encounter something boring,
it's because there's a lack of opinion.
And so this is an ironic situation where the characteristics of being boring are actually because of, like, extremely strong opinions that you have to have about the architecture, which is really interesting.
It's more, it's super strict on the most difficult mode of operation, because I've got a lot of people who use it for logging. They just
use it for moving logs around from their services where they don't care about data loss. If I told
them we're dropping 50% of your messages, they probably don't even care. It's just logging.
Who cares? And I don't even think they know that it's got these strong delivery guarantees
because they don't need to know. Because it's one of those things where
I've basically made a really strict decision
to be super opinionated about something.
But the important thing is that the opinion is,
it's not really burdensome for anybody.
It's not really a problem.
And I think that's kind of where the trick is
in these generic services is to have the opinion
that is least hands-on for people when they're coming in.
Because if it was lossy, right, and somebody wants to deploy this
and it does have a mode of being not lossy,
but you've got to read a manual to do that, it's a nightmare.
The burden is on you as a user to make sure that you've plugged
all these gaps that the service naturally has to make sure that data is actually going to be delivered somewhere and that you're not just going to lose it on an outage that you hadn't foreseen.
Whereas on the end that I'm on, where everything's super strict and it's locked down, but you do get the vertical scaling and all that stuff.
People just don't realize.
People are accidentally building these really resilient pipelines, unbeknownst to them. Maybe they're angry about it, I don't know.
I have a ton more technical questions, but we also have to respect the time here, and we really need to discuss a little bit about open source. So I have a question that I want to ask you about that.
You described how you decided to make this project open source, right? And it's been like
five years now that the project is out there. So it's been out for a while. Can you describe a
little bit how the traction happened with the project, or how you perceived that the project started getting traction? Was it something that you tried to do deliberately, or something that just happened because people were, I don't know, organically finding out about it? How did you end up having such a popular project today?
So I was really lucky, primarily. I had successful open source projects before this, in the throwing-a-library-over-the-wall sense, and it got some stars on GitHub,
and people used it for stuff, very hands-off projects. And my method was just write something
that I want to use, and I think is interesting, post it on the Golang subreddit, shout out to the Golang subreddit, and then it might get picked up in some newsletters and that sort of stuff. And then I would leave it, because once it has enough eyes, and it only needs a few, it will just pass by word of mouth. That was my experience, and I wasn't going to challenge that experience, because I hate sharing my stuff. It seems ironic because of all the content I put out, but I hate sharing my own stuff
because I feel really guilty about it.
I feel like I'm spamming everybody
and going out of my way to force myself onto their screens.
So this podcast is ironic,
but that was my experience up until this point.
We also have, sorry for interrupting,
we also have a marketeer here
whose job is like to spam people out there, right?
That's a very elegant way of describing my...
Your job would give me so much anxiety.
But yeah, so I had a project that I liked.
I liked Benthos after two years.
I wanted to use it, but I wasn't convinced other people would feel the same,
so I was kind of reluctant to really do much with it.
I think I posted it on some forums and things, but I was really lucky because being at Meltwater,
they were such a welcoming engineering community that I was kind of forced out of my shell
a little bit.
I was pushed and encouraged a lot: this is cool, you should share it with more people,
people should see this thing. So it encouraged me to come out of my shell a little bit and start evangelizing it. That was mostly
internal. Then I struggled, because I hate writing blog posts, especially marketing ones, so I just didn't have the energy to go any further
than that. It had organic use in the company. The great thing about
engineers and word-of-mouth marketing is that engineers churn at such a high rate that
you can go to one organization and evangelize this product, and within two years
half their engineering team have spread to other places. It's like a virus: they're going to
introduce it to all their engineering friends. So word of mouth, I think, is the main
driver of Benthos. But there was one fateful day. I made a video outlining the
rough architecture, specifically what I'd done wrong,
put that on YouTube, and posted it on the Golang subreddit.
And it got picked up by a couple of newsletters or something,
and I got a bit of attention that way.
And then I tried posting on Hacker News a bunch of times, no success, no interest whatsoever.
And then one day I wake up in the morning and it's on the front page and some random
stranger had stolen my karma.
And it was right up there and got a load of attention.
And I think that was the first time where the attention was enough that after that point,
I had a constant feed of new people coming in.
Because obviously the word of mouth is a constant, steady growth, but you need something to boost you to the point where enough people are seeing it that you actually have enough attention.
Because I did have people using it up until that point, but it didn't feel like it was enough to justify investing a lot of energy into this thing.
It was a fun hobby project when I felt like it, but I wasn't going to double down on the idea that this is definitely something people want until I saw that. I think that was a turning point where I put more effort into growing it
and trying to build out the community and things.
But I would still say the majority of the growth of the project is just word of mouth.
I'm not paying for sponsorships.
I'm not doing particularly well on blog posts still. So it's just
stuff like this, I guess, and people
telling other people about it and growing the community.
I think a lot of people see the graphics and then they want to show their
friends and they want to get the stickers.
And so that helps spread it a little bit.
Well, I need some background on this.
So the blobfish, right? That's what it's called, a blobfish?
Yeah.
Okay, give us the backstory. I love it. I mean, I kept smiling as I was going through the site and the
docs, because I would meet a new version of the blobfish every time, and it's so great.
So all the libraries I used to make, the things that
I did before Benthos and probably the stuff I'll do after it as well, are always accompanied by
some dumb logo, because you've got to have a logo for your project, right? Otherwise nobody's
going to take it seriously. And I was obsessed with the idea of having the most unpalatable logo for something, because it will be included when people vendor their dependencies.
So the idea of companies that are serious and actually have a purpose on this planet, doing something important,
having these dumb graphics somewhere on their servers:
I just loved it.
I love the idea. One of the libraries I've got is a turkey just
looking glam, and it's a library called Gabs. I love the idea of
professional people in a professional environment relying on this thing and seeing that graphic
once a week or something.
You know what? You're probably way closer to being a great marketer than you even realize.
I have to say, the more fun I have doing the documentation stuff,
the better it does generally. I think it comes off well: people love documentation that's just not very serious, that's laden with dumb humor and silly
quips. None of my examples are serious in the slightest; they're all the goofiest, dumbest
examples I could possibly muster. But the graphic for Benthos being a blobfish was just me
finding an ugly animal, or a traditionally ugly animal. My logo is obviously a real fish.
Oh, this is a controversial topic here.
So it is a real animal, and it's got a proper name,
which I don't know. Lots of people are going to be upset about that, but I don't know the real name of
this particular fish. It's a deep-sea fish, so when you're looking at a
picture of a blobfish, it looks that way because it's been depressurized, because obviously
it's in the normal atmosphere. So it's not in a particularly happy way. So really my
graphic is a dead fish. But I've shied away from calling it a dead blobfish, and I just call it a blob nowadays. It's
just a blob with a face.
Yeah, but that's the brilliant thing about that particular logo:
because it's a blob, you can put it into all kinds of different form factors and different
designs, different shapes. It's perfect for marketing materials and swag. Now, who designs the
different variants? Because there are a lot of variations of the
blobfish. Who's the mastermind?
I do the bulk of them. In fact, I'm the brain behind all of the different variants and their particular
equipment. It's normally topical, normally for a particular example. And
then my wife has graciously helped me out with a couple of them. She is a graphic designer, and she
does that begrudgingly, because she doesn't like the blob. She thinks
it's a mockery of her career.
Well, there's no need to dwell on that. It sounds like she's very supportive.
She's supportive, but she's not happy about it.
Yeah.
Well, this has been so great.
We're at time here,
but this has been a wonderful conversation.
Really quickly, if someone wants to check out the project,
where should they go?
benthos.dev.
And if you want to hang out, there is a Discord.
There's a link at the top, Community; click that, and it'll take you to a bunch of links. You can either
join the Gophers Slack, where I've got a channel, or you can join the Discord server,
which is all ours. That's where you can find BlobBot, the famous Discord bot, and me as well,
and the fabulous Benthos community.
Great. And if someone were really motivated to get Blobfish stickers,
how do they do that? Do you have to make a commitment?
There are ways. There are ways of getting Blobfish stickers.
If you do a blog post and let me know, then you'll definitely get some.
It doesn't have to be related to Benthos.
You just do a shout-out at the bottom of your blog post:
hey, by the way, benthos.dev. I'll give you some stickers.
I'm good with that. But you have to give me your address. I don't know if people are going to trust me with their address.
Well, you're open source and your logo is a blobfish, so that seems innocuous.
I'm on the internet.
Yeah, I'd much more readily give you my address than maybe a marketer or someone.
Yeah. From my perspective, I think that's the wrong call,
but we'll let people make their own minds up.
Awesome. Well, Ashley, this has been a really wonderful show.
Amazing, amazing project.
And best of luck as you continue to build out.
Thank you very much. Thank you for having me.
It's been fun.
There are so many things from that episode that stick out.
But as I rolled it around in my mind, I think the thing that stuck out, which we didn't
talk about explicitly, is that the world of data internationally is so big.
I hadn't heard of Benthos before we started prepping for the episode, which isn't a huge surprise because I'm not necessarily the target audience, but there are just so many teams
working on so many different data products at so many different companies. And you have a tool like
Benthos that's being used at large organizations, solving pretty critical problems.
It was just a good reminder for me of the breadth of the entire market and
how important data has become at every type of company. It just made me step back
and appreciate that, because a lot of times you see the usual suspects in terms of
names around data processing. Kafka is talked about a ton, and all of these different tools.
And to see a project like Benthos having an impact, it's like, man, it is really a big world, and
there are so many different cool products out there. And I loved learning about
the specific problems that Benthos solves.
Absolutely. And it's especially interesting
with Ashley today, because, if you remember, at some point he mentioned that when he started working on this project, his title wasn't data engineer, because data engineer was not a thing back then, right?
While today, everyone is talking about data engineers.
So yeah, it's very interesting. There are many tools that exist
because someone inside a company
had the need
to automate their job
and get more time
to work on more interesting things.
Exactly what Ashley was talking about, right?
And that's, I think,
part of, let's say,
the charm of engineering,
of software engineering in general.
I don't know.
I really enjoyed the conversation today.
I think Ashley is an amazing person.
He's a much better marketeer than he thinks, by the way.
I think with...
Totally agree.
Totally agree.
I mean, the work he has done with the logo and all the content that he has created and
everything, it's amazing.
I would encourage everyone to go and
check out the website, benthos.dev.
A lot of cool stuff, technical
stuff, but also
overall, it's a great
experience.
Even if you don't need a tool
like Benthos, go and check it out.
It's amazing, and I hope we are going to have more time to spend with him, because he's a treasure trove
of knowledge around these kinds of very complex systems.
And we have many more technical discussions to have with him.
So I'm really looking forward to chatting with him again in the future.
Absolutely.
That's the show for today.
Give us feedback at eric@datastackshow.com.
We'd love to hear your thoughts and any questions that you have about any of the episodes.
And we'll catch you on the next one.
We hope you enjoyed this episode of The Data Stack Show.
Be sure to subscribe on your favorite podcast app to get notified about new episodes every week.
We'd also love your
feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That's E-R-I-C at datastackshow.com.
The show is brought to you by Rudderstack, the CDP for developers.
Learn how to build a CDP on your data warehouse at rudderstack.com.