The Data Stack Show - The PRQL: Is It Viable to Manage Integrations Open Source?
Episode Date: January 21, 2022
Eric and Kostas preview the upcoming show featuring Douwe Maan of Meltano. ...
Transcript
Welcome to the Data Stack Show prequel.
We just recorded an episode with Douwe, who is the CEO of Meltano, and it's a really interesting
platform.
I'm going to start off with a somewhat provocative question for you, Kostas.
So in the world of data, and especially
in the world of data integration, one of the problems is maintenance of integrations,
right? I mean, that's something you literally work on every single day when you're not with
me on the podcast, of course, which is the most important thing that you do, but it's a huge
problem, especially as the number of, you know, sort of business tools, infrastructure
tools, et cetera, grows.
The stack is growing in complexity.
We've talked about that on the show.
Do you think it's viable to manage those integrations as open source?
I mean, you have this tension between sort of closed source, where you have very tight
control over what's going on
and you can force prioritization, whereas if you have, you know, a couple hundred connectors,
like we talked about with Douwe, managed open source, there's inevitable neglect.
So do you think it's viable?
Yeah, I think it is. Actually, to be honest, I was very excited to see the birth of Singer because it looked
like, let's say, a way to solve this problem of maintaining an open set of integrations
out there.
At the beginning, it didn't go that well, mainly because of external reasons, which
is that the company that invented it got acquired,
and priorities changed, and all that stuff.
But we see a kind of renaissance right now of Singer,
with companies like Meltano trying to create
some kind of governance around it.
If they succeed, I think it's going to be,
I wouldn't say unstoppable, but I think it's going to be a very unique
moat out there compared to the companies who are trying to maintain everything
in closed source.
Right.
Yeah. So now, it's not easy, mainly because the problem with
maintaining integrations is that it's integration. Like it's a little bit like a fit zone project or product at the end, right?
So figuring out exactly how you can govern this
and maintain quality and all that stuff,
I don't think it's solved as a problem yet,
but I think we are getting closer to that.
So yeah, I'm super excited to see
what's happening in this space.
And of course, with Meltano
and all the people coming from GitLab, they have a ton of experience with open source. I think they're the
right people to try and tackle this problem. So I'm super excited to see how this is going to
evolve.
I agree. I think, and I want your opinion on this. We're close to time here, but
when we think about the data stack increasing in complexity, and we hear,
you know, and kind of laugh about abstract concepts like the data mesh, you know,
and other ways that people are trying to sort of, you know, frame the way that we think about
the challenge of all of these different components and changing and, you know, all that sort of stuff,
the package management type paradigm for the data stack that Douwe mentioned
is, I think, one of the most
compelling tactical answers to sort of the challenge
of trying to have a framework to think about the complexity
as it relates to, I'm trying to actually make all this stuff work
together in my day-to-day job. What do you think?
Okay. I'm a bit biased, mainly because I'm coming
from an engineering background. And I'm also a
big, let's say, proponent of not trying to reinvent the wheel. Right. So
let's say, in order to solve a problem, I have two options. One is to use a metaphor; the other is to try to invent a new word. I prefer the metaphor, right? For me, it's much more valuable and much more productive to be like, okay, there are disciplines out there that have been dealing with the same problems for decades.
Why are we trying to create new paradigms instead of trying to take the paradigms that
we have seen working and adapt them to what we are doing
here?
And that also helps with communication because what we forget is that when we create new
products, it's not just like the software that we're building, right?
We also need to build the language around it, educate
the people, help the people understand
like, what we are doing.
And if you're talking about
data engineers, yeah, package
management makes much more sense.
It's much easier to understand what this thing is
about than a data mesh. Like, what is data mesh?
I don't know. Like,
it's a mesh of data. Yeah, and
I don't want to say that data mesh is not something meaningful, right?
Sure, sure.
It is.
But what I'm trying to emphasize here is how much more difficult it is to communicate what
the data mesh is compared to something like package management, right?
I'm going to interpret that as you agreeing with me, which makes me feel great.
I always agree with you. Always.
All right. Well, if you want to hear someone far more
qualified to tackle these subjects, definitely keep an eye out for the next episode that we
just recorded with Douwe from Meltano. We'll also get to hear a lot about how Meltano was born inside of GitLab,
which is really fun and super interesting. So we'll catch you on the next show.