The Data Stack Show - 60: Architecting a Boring Stream Processing Tool With Ashley Jeffs of Benthos
Episode Date: November 3, 2021
Highlights from this week's conversation include:
A brief overview of Ashley's background (2:47)
Benthos' creation and the problems it was meant to address (4:01)
Use cases for Benthos (18:25)
Key features of Benthos that make it stand out (22:23)
Adding windowing to Benthos for fun (29:23)
The highs and lows of maintaining an open source project for five years (32:17)
The architecture of Benthos (36:23)
The importance of ordering in stream processing (42:15)
Gaining traction with an open source project (53:21)
Benthos' blobfish mascot (58:03)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are
run at top companies.
The Data Stack Show is brought to you by RudderStack, the CDP for developers.
You can learn more at rudderstack.com.
Welcome back to the show.
Today, we're going to talk with Ashley Jeffs,
and he is the creator and maintainer of an open source project called Benthos,
and it is a stream processing service.
And it's a really, really cool tool,
and in many ways has a lot of alignment with what Kostas and I work on in our day jobs. It's a stream processing service written in Go and it does a bunch of interesting things. I actually, Kostas, have some technical questions, because after
reading through the documentation, he's made some decisions that are fascinating to me,
maybe because of my lack of knowledge. But I think I'm most interested to know what it's been like to maintain an open
source project for five years, especially dealing with something that's pretty complex. It's not a JavaScript plugin, not that those are immaterial. But when you talk about stream processing and
integrating with services like Kafka at very large companies, you're dealing with some pretty heavy duty technology. So
I'm sure that the emotional rollercoaster of doing that for a long time has been
interesting and many times we don't get to see that. So hopefully Ashley will share a little
bit about that with us, but what's on your mind? Yeah, two things, two topics actually.
One, of course, we have plenty of technical stuff to discuss. A stream processing system is not the easiest thing to engineer, so there are many trade-offs and many decisions that you have to make there. So yeah, I'm really looking forward to discussing the technical side of things. And then,
of course, I'd love to hear his experience of being a maintainer of an open source project
for five years. And from what I understand, he's the main contributor, and more than 98% of the contributions come from him. So he's very engaged with that. So it's going to be super interesting to hear from him how he does this and how he keeps
himself motivated and all those things.
Well, let's dive in and talk with Ashley.
Yeah, let's do it.
Ashley, welcome to the Data Stack Show.
There's no way we're going to have enough time to cover all the topics, so let's just
dive right in.
Give us just a brief background on you, brief overview of your career and then what you do
day to day. Hi everyone, my name is Ash. Thanks for having me on the show. So I'm the core
maintainer of a project called Benthos, which I've been doing for about five years. It's a data streaming service. It's declarative. And the idea is it's this operationally simple thing. And I started working on that around five years ago, after working in the sort of stream processing industry, which I've been doing for about eight years. So I didn't used to call myself a data engineer, because the term didn't really
exist, but obviously that's pretty much what I consider myself to have been that whole time.
And now that's pretty much my job is just working on this project kind of indirectly,
but yeah, that's my job basically.
Okay.
Well, I want to hear more about that.
One interesting side note.
I don't know if you've looked at the Google search trends for the term data engineer,
but it's crazy.
It's like a hockey stick over the last five years, which is really interesting.
You can see, like, okay, people were trying to figure out what to call this discipline. And then of course it's formalized now.
Well, tell us about Benthos. You started working on it five years ago. It's a really cool tool,
but tell us the details on what it is, what it does, and then especially why you ended up
creating it. So I kind of built it defensively. It's got two main focuses as a project. If you
kind of look at it on the website, you have a quick five second glance. It's basically
YAML programming, a stream processor. And the idea is that it's operationally simple.
What I mean by that is the whole premise of this project is that it's super correct in every possible way in terms of data
retention and back pressure and trying to be the least headachy item in your streaming platform.
And architecturally, that's quite difficult. And it's been a main focus of the project
pretty much since day one. And that's because I was kind of working in
a position where we were basically inventing the same product over and over again. We had this
entire platform of a service that reads from something, does something to it, that's usually
a single message transform, maybe some enrichments hitting third-party APIs, that kind of stuff.
And then it would write it out somewhere. And we were plagued with development effort put into migrating services
because they were all slightly different.
And these weird combinations of different activities that each one was
responsible for.
And we were just constantly rewriting these things to slightly change their
behavior, recompile it, redeploy it, go through all the testing, hassle, that kind of stuff. So I was in a position where I was
kind of desperate for something to just be dynamic, in that you can drive it through configuration, something declarative. Because these are usually just simple tasks. It's filtering, transformations, some enrichments, and a few little extra bits in between, maybe some custom logic that you can plug in and stuff. But for the most part it was just stuff that you could describe in a couple lines of config, but we just didn't have that tool. So I kind of went on this weekend warrior effort to build what I would consider to be a solution to that problem. Our perspective at the time
was that data was super important. It's basically our product. So delivery guarantees were very, very strict. And also we were using Kafka all over the place. So this was about eight years ago.
So Kafka was, I think it was like version 0.7 at that point. We were early adopting it and slowly
migrating it through this platform. And my take on it was if this thing is a disk persisted,
replicated service that we're putting all this effort into operationally running,
why would I have a service that has a disk buffer that is also operationally complex? Like, if you get disk corruption or some sort of failure, then it's a single point of failure that could introduce data loss in your system, potentially forcing you to do things like run backfills. So why don't we just have a service that doesn't need anything like that? It's always going to respect the at-least-once delivery guarantees without any need for extra state. It's just going to do that based on what offsets it's committing, basically what you would call a transaction, and what is effectively how the Kafka Streams API works. So what it's supposed to be doing is making sure that you never commit an offset where you haven't effectively dealt with the message and passed it on forward. So you don't need a disk buffer to have that delivery guarantee.
And then the other piece of that puzzle was making it simple to use. So the idea is that you can
slap a config together to create a pipeline. So this service is reading from Kafka,
performing some sort of filtering, and then maybe applying some sort of masking,
data scrubbing, whatever, and then it's writing out to NATS or 0MQ or something. I can then take that config,
commit it into a repo. And if somebody comes up to me and goes, oh my God, no, we need to stop
writing to NATS. We're going to change this to RabbitMQ now. Or this filter needs to change.
We need to change the logic for that. I can just say, here's the config, change that, it's two lines, I can review it and then go.
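For illustration, a pipeline like the one he describes might look something like this as a Benthos config (the topic names and mapping are invented, and exact field layouts vary between versions); swapping NATS for RabbitMQ really would just mean replacing the output block:

```yaml
input:
  kafka:
    addresses: [ localhost:9092 ]
    topics: [ user_events ]
    consumer_group: example_group

pipeline:
  processors:
    # Hypothetical masking step: drop a sensitive field before forwarding.
    - bloblang: |
        root = this
        root.email = deleted()

output:
  nats:
    urls: [ nats://localhost:4222 ]
    subject: masked_events
```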
And to me, that was my way of ensuring that I would get to work on more fun things, like
the actual stuff that I wanted to be doing in my day job.
And then obviously that naturally progressed to me only working on the boring stuff, because
now I'm the maintainer of the service that's doing the boring stuff.
And that's where I am now.
An attempt to journey into the exciting that ends with a continuation of the boring.
Inevitably going down the rabbit hole of boredom.
Yeah. Well, you know, one thing we've talked about on the show with several guests, and I loved this as I was digging into your documentation: you say in multiple places that a defining feature is that this is boring. And
we've had multiple guests who have built really large scale systems and we'll ask them about it and they'll say, it's kind of boring,
but it works really well and it's extremely reliable. And so that really resonates with me because that's something that we've heard a lot. One question for you, and I know we're going to
dig into this a little bit later, but there was certainly a point at which you made a decision
for the project to be open source. I mean, sometimes when you're
building this, especially to solve a problem that you're dealing with inside of a company,
it can sort of be IP that exists inside of that company. What motivated you to decide to go open
source with the project? Yeah. So everything that I did in my spare time was like a learning exercise. I would always make open source.
And that was just a habit of mine because I was,
thing is if I was planning to make something open source
and know that it was going to get attention,
I wouldn't do it because I was so shy or so nervous
about somebody actually looking at my code.
But what I did is because I was so cynical
about nobody's ever going to look at this,
nobody's ever going to know I put this on GitHub.
I would put all my little hobby projects on GitHub.
So the idea of making open source from the onset was obviously like a nervous exercise
for me, but it's also the excitement of maybe this is going to help somebody.
But the main reason why, so I mentioned that I kind of built this thing defensively for
the company I was working at.
There was just so much going on at the time.
So this was a company called Datasift.
And they were basically selling these firehoses of social media data, the biggest one being Twitter, and then filtering logic on top of that.
So it's like lots and lots of stream processing back when everybody else was talking about Hadoop as being big data.
We were basically processing the Twitter firehose constantly against hundreds of gigabytes of
customer filtering data. And it was this huge platform with all this stuff going on.
And we were having to work pretty defensively to keep this thing going because our requirements
were changing quite frequently. Because I don't know if you've realized this, but working with social media companies as a partner can sometimes
be a little bit turbulent and they can do things like cut you off randomly and force you to pivot.
Or change APIs or change data without warning.
Yeah. Or we just don't want to work with you anymore. Bye. Your business is kaput. Sorry. Oh, that's awkward. So yeah,
we were constantly having to churn what the platform was capable of and the teams were
amazing. The engineering staff at DataSift were fantastic, but it's still this huge effort and
you've got all this technical debt because you're constantly having to change all these services and
what they can do and all the capabilities and stuff. So there wasn't any
capacity really to work on something like this on company time. And in all honesty, I was working on
it in my spare time for two years before it was really viable. Because at the end of the day,
if you've got bespoke services that are built to do a specific task, to replace that with something generic is just going to be a challenge.
To build all the basic stuff needed to have a dynamic system and then to get it to perform in terms of stability and throughput, latency, that kind of stuff, is this massive effort.
I didn't know that when I started,
otherwise I wouldn't have started it, but then it just, it naturally progressed. It was,
it was a hobby project that nobody was really interested in. And then two years later,
I come back to the company. I'm like, Hey, this might be usable now. Can we use this please?
And it had already kind of got a bit of a life on GitHub at that point. So
it just kind of carried on that way. Sure. And did the company end up using it?
Oh, yeah.
So they used it a fair amount in a few places
where it was an immediate solution to a problem we had.
We didn't just like nuke all the other services on the platform.
It was a very careful effort of we'll slowly roll this out
in places where we were going to have to do some changes anyway.
And then what happened is the company got bought by Meltwater. And it was awesome, because we had this streaming platform, and the idea was we were going to sort of use that technology throughout. They're
a very data heavy organization and they have a load of different teams all working in completely different ways. Yeah, they're a big company and their products are pretty cool.
Yeah, the engineering teams there are fantastic.
And the thing is, they're geographically distributed.
So they all do things slightly differently.
They've got slightly different best practices of how they work with their data,
or they did at the time.
They're probably more consistent now.
But yeah, so I had an opportunity then to go to all these different teams and say,
hey, you're looking to interact with our streaming infrastructure, here's a tool. Rather than being blocked on us as a team enrolling you on this and getting you onboarded with all these infrastructure changes and things, why don't you just run this thing yourself, and you can do it in your own time. And we're not even in the loop. This service will allow you to interact with all of our stuff,
hit these enrichment services, all those things. And it took off. So again, it took a bit of time
because you come to people with this generic service. And I think because of the... I mean,
it's open source and it's a generic config driven service.
So immediately people start thinking, is this going to be like Logstash?
Is it going to take two minutes to start up?
Am I going to rip my face off over the config format, that kind of thing?
So people are quite skeptical.
So it takes a while to kind of demonstrate to people that you're going to get value out
of this.
You're going to like it.
And I kind of became like an internal evangelist for, you can use the service for this thing,
this thing, this thing. And when people had use cases, I immediately jumped on it
because that's the bread and butter of the project. It can't continue if I'm not constantly
seeing new use cases and new problems to solve. So I kind of tried to nibble on as many use cases as I could.
Do you think that part of that also, I mean, you have an interesting perspective in that you had very practical experience with streaming almost as it was coming of age, right? Because back when you were using Kafka,
the idea of streaming as you're talking about it is actually still pretty novel, right? In terms of
the technology. So do you think also to some extent, the adoption of streaming technologies
is a little bit hard, like evangelizing use cases in part just because streaming was still younger
to an extent?
Yeah, so it was kind of weird, because I started working with some teams and basically got Benthos to work in a batch mode, because there were use cases where it was like: we've got an S3 bucket and we just want to consume the entire bucket and then write it to Kafka, because all the other teams are using Kafka.
So it was one of these situations where I didn't really think about it at the time as like,
oh, they're using batch. This isn't a batch product. I can't do that. It was more just a
technical problem, and that's pretty easy to solve. Basically, it's an input, just like any other streaming input. Once you've finished, the bucket is exhausted, you shut down gracefully. It's not
massively complicated. So there was an aspect of you have to do stream at this company because
that's the data bus. That's the data infrastructure of this company. We cannot do what we want to do
in a batch way. The volumes are just too big. So this is how we're going to solve that problem.
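That bucket-drain pattern might look something like the sketch below, where Benthos consumes until the bucket is exhausted and then shuts down gracefully (the input name and fields follow the current docs as I read them; the bucket and topic are invented):

```yaml
input:
  aws_s3:
    bucket: example-bucket   # read every object, then exit gracefully

output:
  kafka:
    addresses: [ localhost:9092 ]
    topic: example_topic
```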
And I mean, nobody at that company that I interacted with was particularly intimidated by streaming.
They were all excited to play around with this new tech.
And then the thing is after that, so the project kind of grew externally and more organizations started adopting it.
I have never been in a position where I've had to convince anybody to use streaming because they're just coming to me.
They've already got this infrastructure and they're looking for something to solve
particular problems they have, and they're stumbling upon it. So if somebody asked me,
how do you convince a company to adopt stream? I've got no idea. I have absolutely no clue how
to do that. Well, that's a really helpful perspective.
And I think, especially in the context of social media data, and I think some of the
other components of things that Meltwater provides as sort of data products, I would
guess actually, now that I think about it, the demand for streaming was probably unbelievable
because when you're dealing with that nature of data, social media platforms,
streaming real time and getting updates as soon as you can to see trending is probably super
important. Well, I have been monopolizing this. I have a million more questions, but Kostas,
we talked about some really interesting technological questions. So jump in, I know you want to talk a bit.
Yeah, it's been a very interesting conversation. All right, let's start and try to dive a little bit deeper into the technical side of things.
And my first question is, can you give us a typical setup including Benthos, how it fits with, let's say, the pretty much standard data stacks that we see out there?
How do you see it deployed?
It's often used as a plumbing tool.
So you imagine you've got Kafka infrastructure.
It's very often Kafka that people are using it with.
There's also MQTT.
It seems to be a growing use case.
But it's normally a company that's already doing some stream work.
And what they've got is they've
got some services. They've either got other queue systems that other teams are using.
So we want to share data with some team from another company. It could be just another team
at their company. And they just do things differently. They've got a different schema.
They've got a different stream technology, whatever. And they just want some simple tool that they can just
deploy. They don't want to invest too much time into this partnership. Maybe it's a temporary one,
maybe it's going to change over time. So they just want something now, it's going to solve
that problem. They don't have to think about it. It's automatically going to have metrics and
logging and that kind of stuff. It's low, low effort basically. And then what tends to happen
is you start using it that way sort of defensively. And then you realize, oh, hang on a minute.
We've got this other service that's just reading a topic and then doing some HTTP enrichment,
or maybe it's calling some Python script or something. And all it's doing is taking a payload,
modifying it slightly, and then sending it on somewhere else. We could just do that with this Benthos instance.
So why don't we do that? And then it just kind of slowly grows from that point where you delete
a project that you had to maintain and you've replaced it now with a couple lines of config
and it all fits in this one service that's kind of neat.
You can deploy as many of them as you want because it's stateless.
It's just low effort.
So it tends to be, to begin with, just a silly plumbing mechanism from one thing to another.
Maybe it's just a bit of filtering or something that somebody wants and then they slowly grow.
Maybe eventually people branch them out into different deployments with different configs and stuff, but they'll be doing the same kinds of things. I tend to call it
plumbing. I don't really know if we've got like a good term for it in data engineering, but it's
not a clever task normally. It's usually single message transforms and integrating with different
services. So you might be hitting like Redis cache or something to get some hydration data based on
like a document ID or something.
Or maybe you're hitting a language detection service on some of the content of a message,
that kind of thing.
And then enriching the data with that, that kind of stuff.
But it's stuff that can sometimes be considered to be quite complex problems.
And the reality is it's not.
It's just an integration problem. And you can put that in a nice config.
And then when things change, when somebody says, hey, our service is going to change,
we no longer support that field or that thing, or here's the new schema, then you just do
a quick change, commit that.
And it's simple to test.
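As a flavor of that enrichment pattern, here is a sketch using the branch and http processors (the endpoint and mappings are made up, and some versions nest the HTTP fields slightly differently):

```yaml
pipeline:
  processors:
    # Send part of the message to a hypothetical language detection
    # service, then merge the result back into the original payload.
    - branch:
        request_map: 'root.text = this.content'
        processors:
          - http:
              url: http://localhost:8080/detect-language
              verb: POST
        result_map: 'root.language = this.language'
```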
That's very interesting.
So, you said that it's very common to see it working together with Kafka, right? And around Kafka, okay, there's a whole ecosystem of tools, right? How is it used together with stuff like Kafka Connect, for example, which has similar purposes in a way, like to connect Kafka with other services or with other streams? Then you can technically deploy some, let's say, processing logic on top of Kafka, so you can process the data. It sounds like you can do everything inside Kafka at the end, right? Or at least that's what Confluent wants to happen.
So why do you think that someone who already has invested in the Kafka infrastructure,
they would also use Benthos?
I understand after you start using it, why you keep using it and increase the use cases
that you cover.
But what's the first thing that will convince someone to start using Benthos?
Does it make sense what I'm asking?
Yeah, I think so. So I think the main selling point, if somebody's got JVM components, and maybe they've got Kafka, maybe they're using Apache Camel or something,
and then they've got some other logic on top. I think what tends to happen when people
pick Benthos, I mean, it's kind of difficult to summarize because I don't get an awful lot
of feedback from the community often, but it's normally an engineer that's making the decision.
It's like a data engineer in this context.
And I think their main frustration is they don't like building stuff.
They don't like having a build system for these transformations they've got, especially if it's really simple stuff and especially so if they
have to change it often. And they don't like the weight of some of these components. They're a
little bit clunky. They're a little bit awkward to use. They want something that is more friendly
to an ops person. So if you're on call and you're waking up at 3 a.m. and something has happened,
maybe a server has crashed or something, and it's part of your infrastructure. And you can see in your graphs that you've had
some sort of outage, like the horror stories of some of these components and waking up and
thinking, oh God, now I've got to recover all of these different things. So what the problem is,
when they see this product, it's just a single static binary. It doesn't have any state. You
can restart it on a whim.
In fact, you can restart it constantly if you want.
There'll be no data loss. When they see an outage, it's a simpler problem because you don't have to coordinate a backfill.
You don't have to coordinate all these components slowly coming on over time.
They probably already restarted if your infrastructure is set up for that.
And you can just check on the graphs, the metrics and things that it's worked.
If there's a problem,
you've deployed something that is broken,
then it's just a config change.
So anybody can look at that
and get some idea as to what's going on.
They're not reading code.
They're not looking at something
that got committed to some CI system
and it was a full build that got deployed.
They're just looking at a config change
that got deployed.
So maybe there's like some mapping or something
and they can just roll that back if it looks wrong,
that kind of thing.
So I think it tends to be engineers
that are making the decision.
And obviously a lot of Go developers,
I didn't mention that, it's written in Go.
So a lot of people who are already writing Go services,
it's a natural win for them
because they can write their plugins in Go
rather than Java.
But in terms of feature set, there's a lot of overlap with a lot of products that already exist in the Java ecosystem and are more popular. They're more widespread. So I've never gone
after people making those deployments. I would never tell somebody if you've got a happy system
that you're using and it's using all these products, I would never tell them you should ditch all that and use this thing. And if you've got a bespoke
service that you're happy with and it's doing all this stuff and it's your code and you're building
it, keep it. If you're happy with it and it's solving the problem, then you should definitely
keep that thing rather than replacing it with this weird thing that you've never seen before.
Yeah. But it's more, it's this trade-off between deciding what you want to work on
and what are your priorities as a team.
The declarative side of things is also quite important.
Like I think it fits much more naturally in the workflows that like engineers have.
You mentioned like quite a few times, you can write like a config and I can review it, right?
This thing that I can review it and then we can move fast and we deploy things and we
can change things faster.
That has a crazy value when you're talking about an environment that needs to be alive
all the time and at the same time you have to create the new logic that you need because
things are changing constantly and all that stuff.
I think that's also a very interesting part of the data engineering as an engineering discipline,
because it's this kind of crossover between software engineering,
but ops at the same time. You have all these different facets that you have to handle at the same time. And you really have to pick the best from each one and try to create tools that combine the best practices from both.
So I think that having this declarative way of describing what should happen there, it has amazing value. So I can understand
that, especially having worked with a JVM-based infrastructure. So how would Benthos compare to
other stream processing platforms? What are the differences and the similarities between them?
So Benthos is much more focused on single message transforms.
So you get a single payload and you're doing something.
You might have a batch.
You can do batch processing.
So, say, consume a window of 100 messages and aggregate them. But its bread and butter really is single message things.
And the reason why I've focused
on that is because at the time, that was the problem that I had was just single message stuff.
And there wasn't really an awful lot of attention on that in the product space. We already had Spark
at that point, which was already solving the problem pretty well from what I could tell.
I hadn't used it, but it seemed like, okay, windowed aggregations, that's a solved problem, we have a tool for that. But what's the nice thing for masking, filtering, transforms, enrichments, hydration, that kind of thing? So I think if I was
going to compare it to these products, I would say it's probably more similar to Apache Camel.
And obviously Kafka Connect as well, to an extent.
And then the main difference is that it's kind of declarative from the onset.
People like saying cloud native nowadays.
But basically it can be deployed in Kubernetes essentially without much hassle and that kind
of thing.
But then Camel's got CamelK now. So I mean, those services are becoming
nicer to deploy, but not like the kind of things that you could do with the Bentos config.
With the way that the config is structured, you could do crazy things. You can have multiple
inputs fed into a single pipeline with their own processors, and then have joined processors. You can have multiplexed outputs switched on the contents of messages. You can have fan out, all these different brokering patterns, round robin. You can have dead letter queues for processing errors, and also for when outputs
come offline and all that kind of stuff. So it's much more centered on plumbing, which is why I
kind of put it in the sort of Camel category, even though it is a stream processor, it does stream processing. So, you know, it tends to get compared a little bit more with, like, Flink and stuff. It can do windowed processing, but that's not really what it's for. It doesn't have state necessarily. It does window processing just by keeping it in memory and only committing offsets when that window is flushed.
I haven't done any performance comparisons in that place because it's kind of experimental
at this point, but it can do it.
I wouldn't sell that feature of Benthos at this point.
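Those brokering patterns look roughly like this in config form (a sketch: the switch, broker, and output names follow the docs as I read them, while the routing logic and destinations are invented):

```yaml
output:
  switch:
    cases:
      # Fan audit events out to two destinations at once.
      - check: 'this.type == "audit"'
        output:
          broker:
            pattern: fan_out
            outputs:
              - kafka:
                  addresses: [ localhost:9092 ]
                  topic: audit
              - aws_s3:
                  bucket: audit-archive
                  path: '${! uuid_v4() }.json'
      # Everything else goes to NATS.
      - output:
          nats:
            urls: [ nats://localhost:4222 ]
            subject: everything_else
```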
Yeah.
All right.
And then why did you decide to implement windowing on the platform?
Same reason I did most of the stuff.
I just thought it'd be fun. There's a lot of stuff in Benthos that, because it's called a
stream processor and people will look at it. And what I reveal on the front page is a stream processor: reading from a streaming system, doing some stuff, writing it somewhere. But there's a lot of
stuff in there that does not fit the stream processing category. You can use it as an HTTP gateway if you wanted to. It supports request response.
I had to put that in because of NATS and also ZeroMQ, stuff like that. So it's always
had the ability to do responses to inputs. So you can just hook it up as an API gateway.
It has an API for dynamically mutating streams and having multiple streams.
You can use Benthos to drive
itself. There's like loads of stuff in there that doesn't really fit the category. So I thought,
well, I might as well put windowing in there as well. It's really fun to just hear in the world
of technology and data technology, especially when you think about sort of like San Francisco
based companies that are, you know, trying to become really big. There's a lot of talk about product strategy and all this sort of stuff. And it's so wonderful to hear,
like, I did that because it'd be fun. And that just brings me great joy, Ashley.
It is a survival mechanism to an extent, because you're doing a lot of this stuff on your own steam. I'm maintaining this project just on my own will.
So in order to do that, you have to have fun.
There is no way of maintaining an open source project, especially in the early years.
It's just not possible if you don't enjoy doing it to an extent.
Or at least I wouldn't want anybody to suffer that experience if they didn't enjoy it, because
there's no guarantee of anything with, with, especially with open source, but also any
business running a business is the same thing. There's, there's no guarantee that it's going to
end up anywhere where anybody's going to use it. It could fizzle out. It could disappear.
You could just get burnt out and not want to do it anymore. So if you don't enjoy it,
then what's the point? Like there's no,
there's no point in it. You're just punishing yourself. Sure. One question there, which I'd
love to just... I think, looking in from the outside, sometimes it can be hard to tell what the actual experience of building and maintaining an open source project like Benthos is like. But could you just tell us about some of the highs and lows over the past five years? And sort of, you're basically working with and on and consulting around Benthos full-time now, but what are some of the highs and lows that you've been through as you've maintained the project? Which, by the way,
I think also congratulations are in order because that's a long time for a project that is still
being used at large companies. So congratulations, because that's a huge accomplishment.
Thank you. I appreciate that. So the highs are hearing that it helped somebody that's like,
when somebody gets excited about the fact that it solved this
issue for them, I get a deep satisfaction out of that. And you don't get it an awful lot with
open source because at the end of the day, most people are going to silently download it, use it,
and you'll never hear from them again, especially if they're happy. The happier they are, the less
you'll hear back from them. And I'm not judging
anybody for that. I do the exact same thing. I can't complain because I use loads of open source
projects and I'm not emailing the maintainer going, oh, I really enjoyed your fun today.
What an unvirtuous cycle.
So those are the highs when somebody actually bothers to say, Hey,
this really helped.
We can now focus on this thing that we want to be doing. It got rid of all these issues for us. Thank you for making this thing.
Or if somebody asks for a feature and I get it out to them quick and they're
so thankful for it. Oh my God, that's amazing. Thank you so much.
Especially if it was low effort, if it took me like five minutes and they're like,
oh my God, that's amazing. You're incredible. I get a lot of satisfaction out of that.
The lows are obviously bugs. If somebody has had a bug and they've had some sort of suffering,
the behavior hasn't been quite what they expected or something's broken or whatever. I think, so I have a thing. I can't just leave a bug.
I'll tag it, I'll label it on GitHub as a bug and it gets closed that day. I can't deal with bugs
being known and not dealt with. And that's mostly just, I just can't handle it. I won't be able to
sleep, which is, it's great because it means that I deal with them.
I don't have a backlog of bugs that are constantly getting worse or interacting with each other,
that kind of thing. But obviously that has a toll. Sometimes I just want to enjoy my evening
and a bug arrives and now that's my evening. There's nothing else I can do about that.
But they don't, to be honest, when you deal with bugs really quick, it does have an effect.
I think there's obviously lots of blogs out there about dealing with bugs as a team and stuff and how you should prioritize them and all that stuff.
And I think that obviously I wouldn't say to everybody deal with bugs as soon as they're known because that's just not practical.
But it definitely has had a positive impact on the
project. The other thing is whenever anybody has a question, if it's a question that isn't already
answered in the documentation somewhere, I consider that a bug and I will try and make an effort to
fix that either with a guide or fleshing out the component docs or something, making some example
or whatever. And that has been positive because obviously as a solo maintainer,
you only have so much time.
So you can't be answering questions constantly.
So it's a defensive move, in a way, to always treat questions as a bug
and just deal with them quick.
But those are the lows because I have to deal with it.
And it's me.
It's a personal issue with me. I could get a therapist and I could deal with that. I've chosen not to at this point because it's not that big a problem. It's not as if it's every evening. I get like a bug a month or something.
I guess if you were constantly missing dinner, it would become an issue. And then maybe, maybe you would call the therapist.
My wife would not have that. She would not have me missing dinner.
What I would do is I would go and eat begrudgingly and then I would come back.
Doesn't get in the way of family functions.
Yeah. Yeah. That's great.
That's great. So, okay, let's go back to the technical questions again.
And then we can come back to open source because
we have quite a few questions to ask there. Let's discuss a little bit about the architecture of
Benthos, how you architected Benthos and what are the main components and give us a little bit of
insight of the choices that you've made in the trade-offs there and why.
Cool. Okay. The main premise of Benthos as an architecture is, I kind of call it a transactional model. Transaction means a lot of different things to a lot of different people now, unfortunately, because I used it as a very general term at the time. But basically, all inputs in Benthos, and obviously there's lots of them, they all have different paradigms for how to deal with acknowledgements and things.
And obviously Kafka being the one that's most different to all the others in that it's just a numerical commit.
But basically every input within Benthos gets wrapped in a mechanism for propagating an acknowledgement from anywhere else in the service back to that input, where it knows how to deal with it. And then it pushes it down a pipeline, which is Go channels. I could talk about Go for hours, but basically Golang channels are used heavily as a way of essentially plumbing different layers of the service.
Because it's dynamic, there could be any number of processing threads to suit vertical scaling.
There could be any number of different inputs feeding into one or more outputs.
So what happens is the message gets wrapped in a transaction.
It gets sent down a channel, which is also the mechanism for back pressure.
If there's nothing ready to deal with that message, it can't go anywhere. And then essentially that makes its way downstream. So it goes through a processing layer.
They receive transactions of messages. They actually receive a message batch,
but usually if you're reading a non-batched source, then it's a batch of size one.
But all the processors can do whatever they want on it. If they filter it intentionally, so it gets removed, they call the acknowledgement. And then the input will do things like send that acknowledgement directly back: if it's Google Pub/Sub, then it will ack that. If it's RabbitMQ, it'll ack it. If it's Kafka, it'll mark the offset as ready to commit. The important thing with Kafka, I'll go back on that
because there's a whole topic around how the Kafka input works. But basically, it eventually makes its way to the output layer. The output layer could be brokered. You could have multiple outputs. They can be composed. So they're generic components in themselves.
You can compose brokers on brokers on brokers if you want to, but they are responsible for essentially enacting the behavior that a user would expect by default. So if it's
a switch multiplexer, you've got five outputs, a message gets routed to three.
The message is not acknowledged until those three outputs have confirmed receipt. Obviously,
some outputs are better at that than others.
And obviously, you can tune them to an extent.
So with Kafka, you can tune whether or not it's reporting all the replicas were written
to or not.
But basically, you have some way of knowing that the message is successfully written somewhere,
then it gets acknowledged.
And then it's up to the input to do whatever.
So most inputs, so for the easy queue systems like NATS and GCP PubSub and stuff,
where ordering isn't as important, people don't really consider that when they're processing
messages from those. You can just keep pushing messages down the pipeline. And if there's
capacity, then it'll get processed. If there's back pressure on the output, naturally it makes
its way up to the input pretty quickly. And then when it's freed,
the components gracefully resume. With Kafka, by default, topic partitions are processed in
parallel. So if you've got 10 processing threads and you've got 10 topic partitions, you've
potentially got 10 threads saturated. Not necessarily if they're not balanced well, but in theory, you've got 10.
But messages of a partition are processed in order.
So your options there are you can batch them
and process multiple messages of a topic that way.
Or what you can do is you can increase,
I call it like a checkpoint limit.
But basically, how many messages are you willing to process out of order?
And what I do there is I keep track. So if you say like, we want to be able to process a hundred messages
async, whatever order, we don't really care about that. We just want to process them fast.
I limit the number of messages and I track which offsets we've actually acknowledged.
And I will only commit up to the point where all the messages
from that commit number down have already been acknowledged. So there's potential there for
duplicates. So say you process 100 messages, the first one that went through the pipeline,
for whatever reason, hasn't been acknowledged yet because it's blocked somewhere. All the others
have, well, guess what? None of them have been committed yet until that final one has gone.
And that ensures that when you restart the service, you don't get data loss.
But then the trade-off there is that you could potentially get duplicates next time you restart
it.
So it's like the difficulty with a service like this is finding the common mechanism
that's going to satisfy all these different input types.
They've all got different ways of handling acknowledgements, and what they're typically used for differs as well. Because obviously some people might want to do
ordered processing with a key system like NATS, but then most people don't really care. So
you can kind of enable it, but by default, you're just going for throughput and vertical scaling.
Whereas Kafka, typically people care about the ordering and they want to do batched processing of some type. So you kind of manage it that way. But essentially what I've got
now, I've had to refactor the components multiple times to make sure that I could do all this stuff.
But basically they all kind of fit their own paradigm now. And yeah, I think I probably missed a million things there.
It's fine, it's fine.
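For reference, the layering he walks through maps directly onto the top-level sections of a config, with a threads field covering the vertical scaling he mentions (a minimal sketch; the names are illustrative):

```yaml
input:
  kafka:
    addresses: [ localhost:9092 ]
    topics: [ raw_events ]
    consumer_group: example_group

pipeline:
  threads: 4                  # parallel processing threads
  processors:
    - bloblang: 'root = this' # placeholder transform

output:
  nats:
    urls: [ nats://localhost:4222 ]
    subject: processed_events
```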
But I have a question.
How important is ordering
based on your experience
with stream processing?
That's a good question.
So I do, so for me personally,
it's never been an issue
because I've never worked on a system
that actually cared.
In event sourcing land,
then it's super important, I would imagine. I've had obviously people come to me and have a discussion
about how can we guarantee ordering? What about in the event of failures and stuff? If we're
retrying messages, how do we guarantee we're getting the right ordering and stuff like that?
And I mean, it's a complex problem to make sure in all cases, every single edge case, you've definitely got the correct ordering.
But I think it is possible, just like the perfectly secure system is possible.
But yeah, I think it is doable.
But I think mostly I would attribute that to event sourcing.
So you're processing a stream of actions and you need to make sure that they're done in the right order, because it has obviously an effect on the outcome. But yeah, to be honest, I would have traditionally described it as a system where it probably doesn't matter, because you're doing single message transforms
anyway, enrichments and stuff. But then obviously if you're using it to bridge between services and
something downstream does care about ordering, then obviously it also has to respect ordering. So I did opt for that, whereas I think some services have gone down the path of not really caring about ordering too much.
And maybe there's a way of dealing with it. I am tempted in the next major version bump to
reconsider whether or not I make it the default, because obviously it does make scaling easier for people if, just by default, it doesn't really care and it's letting you use however many processor threads you've got.
But for now, it's strict on ordering until you give it the explicit instruction to allow it to process the Kafka input out of order.
And based on your experience, again, what's the main trade-off that you have to make in order to have ordering, right?
Is it just performance? Is there something else?
You mentioned something about duplicates.
So there are differences there with delivery semantics also.
So what are the main trade-offs that an engineer needs to have in their mind when they opt for having strict ordering? If you don't care about delivery guarantees,
then the main problem is just throughput,
is how easy is it to do vertical scaling?
If you're forcing order processing
and you've got a limited number of topic partitions,
because that's tied to your Kafka deployment,
like the number of partitions is something
that somebody else has probably made the decision of.
You might not even have control over it. So you on the processing side,
oh, I've got 24 CPU cores. Lucky me. If there's only three partitions and you're doing ordered processing, then you're stuck. You've got three CPUs. Unless you can vertically scale the individual
message processor, then you're kind of out of luck on that. But if you care about delivery guarantees, the forced ordering, in terms of Benthos, to a Benthos user, it just means you've got to configure one extra field, essentially, to kind of manually determine how much parallelism you're willing to go for.
So because messages aren't persisted by the service,
what it's doing is it's making sure that it's never committing an offset
that would result in one of the messages that hasn't been finished yet
being lost forever.
So the reason why you can potentially get duplicates there is because
if you choose to process messages out of order with Kafka,
then obviously that means that messages that came after a particular offset
could be finished and dealt with.
The next service has already got them in there.
They've got a new life in the suburbs,
whereas some messages are hung up or whatever.
They haven't been dealt with for whatever reason.
You cannot commit that offset, because if you commit that offset, or you do anything else with it, then the next time the service is restarted, you're not going to consume those messages again.
So, like, basically with Benthos, you have to be strict, because I'm not maintaining a disk-persisted buffer or anything like that.
So those messages don't exist anywhere else.
I'm using Kafka's disk persistence for that.
So yeah, it's one of those things where my role
is to basically document what's the symptom of doing that.
Like if you want to get better CPU scaling,
what is the solution to that thing?
Because right now there's a guarantee that you might not want,
you might not care about.
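That one extra field is the checkpoint limit he described; on the kafka input it looks something like this (a sketch based on the documented field; check your version's docs for the default):

```yaml
input:
  kafka:
    addresses: [ localhost:9092 ]
    topics: [ events ]
    consumer_group: example_group
    # Allow up to 100 messages to be in flight out of order before
    # offset commits have to wait; 1 preserves strict ordering.
    checkpoint_limit: 100
```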
So do you have any plans, or are you considering, to add some kind of state that would help with these kinds of situations? Or have you absolutely decided that it's going to be stateless, that Benthos is going to be stateless?
I did actually have, so before I went to version one, so for like three years or something, I did have a disk buffer as an optional component.
The reason why that's particularly useful is if you've got a chain of lots of services that are synchronous.
Imagine you've got HTTP to HTTP to HTTP to 0MQ or something.
Because of the acknowledgement system, there's no disk buffer in any of those individual components.
It means the acknowledgement has to propagate all the way up. So it's the same problem that people get with massive
microservice architectures where the service that begins the request chain has to wait
forever and any disconnects cause a duplicate. So I did have, well, a memory buffer is still in,
and I had a disk buffer as well. I got rid of it because I thought, well, I'm not sure anybody needs it.
I just want to see if I can get away with not having it.
And nobody asked for it back.
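The memory buffer he mentions gets its own top-level config section; a minimal sketch, assuming v3-era field names:

```yaml
# Buffers messages in memory between the input and processing layers.
# This weakens the end-to-end acknowledgement: anything sitting in the
# buffer is lost if the process dies.
buffer:
  memory:
    limit: 524288000   # maximum buffer size in bytes
```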
So I just never... the disk buffer is actually still there in the code base, because I wasn't sure if somebody was using it as, like, a library or something. So I've left it there just in case it's being used in somebody else's project. But it's not documented anymore. And to be honest, I think I like the idea of having to solve,
essentially, in order to not have state, in order to not have this operational complexity of
something that a person running the service has to know about, we're using the disk for this thing,
so don't delete that. And if the disk is corrupt, you're going to have to follow this step, this step,
this step. And if the server crashes, you're going to have to do a backfill. And we don't
know for how long. In order to avoid that, the burden's on me to make a stateless version of
that same feature functional. And it normally ends up just being, I've got to be more considerate
with how I do things. So in the basic stream processing world, where it's just about acknowledgements, the burden is on me to solve having a transactional acknowledgement system and also being able to vertically scale and also being able to do things like batch sends and all this other stuff.
Because when you've got a disk buffer, that stuff is easy.
You write it to the disk buffer and when you're done with it, you delete it from the buffer.
It's more difficult in my world because in order to do things like a nice back pressure
and shutting down gracefully, all that stuff, I have to be super strict about
when are we going to allow things to close and what happens if messages haven't been acknowledged when we're
shutting down? How are we going to read N messages from this queue system without necessarily
acknowledging them immediately? What are the difficulties there for each of the individual
queue systems? But I feel like that's my role as somebody building a generic service. That's my problem because I've accepted
that problem. I've accepted the role of giving you this generic tool. And therefore, if I didn't try
my hardest to make this thing stateless and easy to deploy, I haven't really done my bit. I've not
fulfilled my role. If I just give you a service that's as complicated as something that you could have easily made yourself, and the config system is just as complicated as your code would have been, just use your code. Why would you
involve me in the equation at all? I'm not doing anything for you. I'm not fulfilling
any purpose here. So why do I exist? I ask myself that every day.
Well, that's a whole other podcast episode.
But usually when you encounter something boring,
it's because there's a lack of opinion.
And so this is an ironic situation where the characteristics of being boring are actually because of, like, extremely strong opinions that you have to have about the architecture, which is really interesting.
It's more, it's super strict on the most difficult mode of operation, because I've got a lot of people who use it for logging. They just
use it for moving logs around from their services where they don't care about data loss. If I told
them we're dropping 50% of your messages, they probably don't even care. It's just logging.
Who cares? And I don't even think they know that it's got these strong delivery guarantees
because they don't need to know. Because it's one of those things where
I've basically made a really strict decision
to be super opinionated about something.
But the important thing is that the opinion is,
it's not really burdensome for anybody.
It's not really a problem.
And I think that's kind of where the trick is
in these generic services is to have the opinion
that is least hands-on for people when they're coming in.
Because if it was lossy, right, and somebody wants to deploy this
and it does have a mode of being not lossy,
but you've got to read a manual to do that, it's a nightmare.
The burden is on you as a user to make sure that you've plugged
all these gaps that the service naturally has to make sure that data is actually going to be delivered somewhere and that you're not just going to lose it on an outage that you hadn't foreseen.
Whereas on the end that I'm on, where everything's super strict and it's locked down, but you do get the vertical scaling and all that stuff.
People just don't realize.
People are accidentally building these really resilient pipelines, unbeknownst to them. Maybe they're angry about it, I don't know.
I have a ton more technical questions, but we also have to respect the time here, and we really need to discuss a little bit about open source. So I have a question that I want to ask you about that.
You described how you decided to make this project open source, right? And it's been like
five years now that the project is out there. So it's been out for a while. Can you describe a
little bit how the traction happened with the project, or how you perceived that the project started getting traction? Was it something that you tried to do deliberately, or something that just happened because people were, I don't know, organically finding out about it? How did you end up having such a popular project today?
So I was really lucky, primarily. I had successful open source projects before this, in the throwing-a-library-over-the-wall sense, and it got some stars on GitHub,
and people used it for stuff, very hands-off projects. And my method was just write something
that I want to use, and I think is interesting, post it on the Golang subreddit, shout out to the Golang subreddit, and then it might get picked up in some newsletters and that sort of stuff. And then I would leave it, because once it has enough eyes, and it only needs a few, it will just pass by word of mouth. That was my experience, and I wasn't going to challenge that experience, because I hate sharing my stuff. It seems ironic because of all the content I put out, but I hate sharing my own stuff
because I feel really guilty about it.
I feel like I'm spamming everybody
and going out of my way to force myself onto their screens.
So this podcast is ironic,
but that was my experience up until this point.
We also have, sorry for interrupting,
we also have a marketeer here
whose job is like to spam people out there, right?
That's a very elegant way of describing my...
Your job would give me so much anxiety.
But yeah, so I had a project that I liked.
I liked Benthos after two years.
I wanted to use it, but I wasn't convinced other people would feel the same,
so I was kind of reluctant to really do much with it.
I think I posted it on some forums and things, but I was really lucky because being at Meltwater,
they were such a welcoming engineering community that I was kind of forced out of my shell
a little bit.
I was pushed and encouraged a lot: this is cool, you should share it with more people,
people should see this thing. So it encouraged me to come out of my shell a little bit and start evangelizing it. That was mostly
internal. Then I struggled, because I hate writing blog posts, especially marketing ones, so I just didn't have the energy to go any further
than that. It had organic use in the company. The great thing about
engineers and word-of-mouth marketing is that engineers churn at such a high rate that
you can go to one organization and evangelize this product, and within two years
half their engineering team have spread to other places. It's like a virus: they're going to
introduce it to all their engineering friends. So word of mouth, I think, is the main
driver of Benthos. But there was one fateful day. I made a video outlining the
rough architecture, specifically what I'd done wrong,
put that on YouTube, and posted it on the Golang subreddit.
And it got picked up by a couple of newsletters or something,
and I got a bit of attention that way.
And then I tried posting on Hacker News a bunch of times, no success, no interest whatsoever.
And then one day I wake up in the morning and it's on the front page and some random
stranger had stolen my karma.
And it was right up there and got a load of attention.
And I think that was the first time where the attention was enough that after that point,
I had a constant feed of new people coming in.
Because obviously the word of mouth is a constant, steady growth, but you need something to boost you to the point where enough people are seeing it that you actually have enough attention.
Because I did have people using it up until that point, but it didn't feel like it was enough to justify investing a lot of energy into this thing.
It was a fun hobby project when I felt like it, but I wasn't going to double down on the idea that this is definitely something people want until I saw that. I think that was a turning point where I put more effort into growing it
and trying to build out the community and things.
But I would still say the majority of the growth of the project is just word of mouth.
I'm not paying for sponsorships.
I'm not doing particularly well on blog posts still. So it's just
stuff like this, I guess, and people
telling other people about it and growing the community.
I think a lot of people see the graphics and then they want to show their
friends and they want to get the stickers.
And so that helps spread it a little bit.
Well, I need some background on this.
So the blobfish, right? That's what it's called, a blobfish?
Yeah.
Okay, give us the backstory. I love it. I mean, I kept smiling as I was going through the site and the
docs, because I would meet a new version of the blobfish every time, and it's so great.
So all the libraries I used to make, the things that
I did before Benthos and probably the stuff I'll do after it as well, are always accompanied by
some dumb logo, because you've got to have a logo for your project, right? Otherwise nobody's
going to take it seriously. And I was obsessed with the idea of having the most unpalatable logo for something, because it will be included when people vendor their dependencies.
So the idea of companies that are serious and actually have a purpose on this planet, doing something important,
having these dumb graphics somewhere on their servers:
I just loved it.
I love the idea. One of the libraries I've got is a turkey just
looking glam, and it's a library called Gabs. I love the idea of
professional people in a professional environment relying on this thing and seeing that graphic
once a week or something.
You know what? You're probably way closer to being a great marketer than you even realize.
I have to say, the more fun I have doing the documentation stuff,
the better it does generally. I think it comes off well: people love documentation that's just not very serious, that's laden with dumb humor and silly
quips. None of my examples are serious in the slightest; they're all the goofiest, dumbest
examples I could possibly muster. But the graphic for Benthos being a blobfish was just me
finding an ugly animal, or a traditionally ugly animal. My logo is obviously a real fish.
Oh, this is a controversial topic here.
So it is a real animal, and it's got a proper name,
which I don't know. Lots of people are going to be upset about that, but I don't know the real name of
this particular fish. It's a deep-sea fish, so when you're looking at a
picture of a blobfish, it looks that way because it's been depressurized, because obviously
it's in the normal atmosphere. So it's not in a particularly happy way. So really my
graphic is a dead fish. But I've shied away from calling it a dead blobfish, and I just call it a blob nowadays. It's
just a blob with a face.
Yeah, but that's the brilliant thing about that particular logo:
because it's a blob, you can put it into all kinds of different form factors and different
designs, different shapes. It's perfect for marketing materials and swag. Now, who designs the
different variants? Because there are a lot of variations of the
blobfish. Who's the mastermind?
I do the bulk of them. In fact, I'm the brain behind all of the different variants and their particular
equipment. It's normally topical, normally for a particular example. And
then my wife has graciously helped me out with a couple of them. She is a graphic designer, and she
does that begrudgingly, because she doesn't like the blob. She thinks
it's a mockery of her career.
Well, there's no need to dwell on that. It sounds like she's very supportive.
She's supportive, but she's not happy about it.
Yeah.
Well, this has been so great.
We're at time here,
but this has been a wonderful conversation.
Really quickly, if someone wants to check out the project,
where should they go?
benthos.dev.
And if you want to hang out, there is a Discord.
There's a link at the top, Community; click that, and it'll take you to a bunch of links. You can either
join the Gophers Slack, where I've got a channel, or you can join the Discord server,
which is all ours. That's where you can find BlobBot, the famous Discord bot, and me as well,
and the fabulous Benthos community.
Great. And if someone were really motivated to get Blobfish stickers,
how do they do that? Do you have to make a commitment?
There are ways. There are ways of getting Blobfish stickers.
If you do a blog post and let me know, then you'll definitely get some.
It doesn't have to be related to Benthos.
You just do a shout-out at the bottom of your blog post:
hey, by the way, benthos.dev. I'll give you some stickers.
I'm good with that. But you have to give me your address. I don't know if people are going to trust me with their address.
Well, you're open source and your logo is a blobfish, so that seems innocuous.
I'm on the internet.
Yeah, I'd much more readily give you my address than maybe a marketer or someone.
Yeah. From my perspective, I think that's the wrong call,
but we'll let people make their own minds up.
Awesome. Well, Ashley, this has been a really wonderful show.
Amazing, amazing project.
And best of luck as you continue to build out.
Thank you very much. Thank you for having me.
It's been fun.
There are so many things from that episode that stick out.
But as I rolled it around in my mind, I think the thing that stuck out, which we didn't
talk about explicitly, is that the world of data internationally is so big.
I hadn't heard of Benthos before we started prepping for the episode, which isn't a huge surprise because I'm not necessarily the target audience, but there are just so many teams
working on so many different data products at so many different companies. And you have a tool like
Benthos that's being used at large organizations, solving pretty critical problems.
It was just a good reminder for me of the breadth of the entire market and
how important data has become at every type of company. It just made me step back
and appreciate that, because a lot of times you see the usual suspects in terms of
names around data processing. Kafka is talked about a ton, and all of these different tools.
And to see a project like Benthos having an impact, it's like, man, it is really a big world, and
there are so many different cool products out there. And I loved learning about
the specific problems that Benthos solves.
Absolutely. And it's especially interesting
with Ashley today, because, if you remember, at some point he mentioned that when he started working on this project, his title wasn't data engineer, because data engineer was not a thing back then, right?
While today, everyone is talking about data engineers.
So yeah, it's very interesting. There are many tools that exist
because someone inside a company
had the need
to automate their job
and get more time
to work on more interesting things.
Exactly what Ashley was talking about, right?
And that's, I think,
part of, let's say,
the charm of engineering,
of software engineering in general.
I don't know.
I really enjoyed the conversation today.
I think Ashley is an amazing person.
He's a much better marketeer than he thinks, by the way.
I think with...
Totally agree.
Totally agree.
I mean, the work he has done with the logo and all the content that he has created and
everything, it's amazing.
I would encourage everyone to go and
check out the website, benthos.dev.
A lot of cool stuff, technical
stuff, but also
overall, it's a great
experience.
Even if you don't need a tool
like Benthos, go and check it out.
It's amazing, and I hope we are going to have more time to spend with him, because he's a treasure trove
of knowledge around these kinds of very complex systems.
And we have many more technical discussions to have with him.
So I'm really looking forward to chatting with him again in the future.
Absolutely.
That's the show for today.
Give us feedback at eric@datastackshow.com.
We'd love to hear your thoughts and any questions that you have about any of the episodes.
And we'll catch you on the next one.
We hope you enjoyed this episode of The Data Stack Show.
Be sure to subscribe on your favorite podcast app to get notified about new episodes every week.
We'd also love your
feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That's E-R-I-C at datastackshow.com.
The show is brought to you by Rudderstack, the CDP for developers.
Learn how to build a CDP on your data warehouse at rudderstack.com.