The Data Stack Show - 117: DX for Data Tooling with Taylor Murphy of Meltano
Episode Date: December 14, 2022
Highlights from this week’s conversation include:
Taylor’s journey into data (3:09)
What’s been going on at Meltano recently? (7:28)
Addressing basic problems in data even with advancements in technology (12:23)
What makes Meltano unique in the space (16:53)
Why the CLI experience is important (25:37)
Quality vs quantity in supporting connectors (35:51)
What does data ops look like for Meltano (46:44)
Takeaways and closing thoughts (52:56)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by RudderStack, the CDP for developers.
You can learn more at rudderstack.com.
Welcome to the Data Stack Show, Costas. We talk all the time about how
we want to have guests back on the show to catch up with them. And we were able to do that. We
tracked down Taylor from Meltano. We had Douwe on a while ago. I can't remember how long ago,
but it was a while ago. And it was a fascinating conversation.
They're building some super interesting things. And so we're going to catch up with Taylor
who leads product. And I think I just want to hear how things are going. I mean,
they were kind of building this almost command line interface, you know, sort of
configuration layer for the data stack in general across
pipelines, orchestration, et cetera, which is very compelling for a number of reasons.
And so I want to hear how that's going.
And of course, they, you know, are big investors in the Singer system and all those protocols
and that entire community.
So yeah, I'm just excited to see, to hear how things have gone.
How about you?
Yeah.
I'm also very, like, curious to see, like, where Meltano is today.
Meltano is one of these products or companies, both, that's like, when you
see like their, you know, like their, what they have done,
how they have started, how long they've been around and how hard they are trying
like to build a business around that, like really makes you like appreciate
like what it means, like how important perseverance is like for building a
business and like that's something that I have to recognize them and something that they
should also like be very proud of.
Right.
So these folks just don't give up.
And so that's what amazed me.
So I want to see like where they are today.
And one of the things that I definitely want to discuss with Taylor is about
like developer experience, like that's how they differentiate like the product, compared
like to the competition out there.
So yeah, I think we are going to have like a very interesting discussion
about how you can approach like the problem of data pipelines in a different way.
Well, let's dig in with Taylor and talk about it.
Taylor, welcome to the Data Stack Show.
We are so excited to talk about Meltano again.
We had Douwe on the show before, and we always say that one of the best parts is actually
recording a show and then checking back in later.
So we're super excited to hear about what's been happening at Meltano.
Yeah, thanks for having me. Really, really excited to have the conversation.
Okay, so how did you get into data? Give us the backstory.
Yeah, so my background is in chemical engineering. And coming out of grad school, I decided I didn't
want anything to do with that and kind of looked for a way to use the skills that I gained in grad school in an interesting way.
And kind of the data side really caught my attention.
I joined a startup in Nashville that was focused on genetic testing in the healthcare space.
And really, that's where I grew a lot of my data chops.
Prior to that, I used MATLAB and Excel and was doing some relatively simple data modeling.
But it was there that we had real business needs.
That's where I fell
in love with regular expressions and built my Python and SQL skills. I was there for four and
a half years and then moved over to GitLab, where I started as a data engineer and was able to lead
the team as the company grew from 200 people up to well over 1,000 people as it made its way
to its IPO. And that was huge for my career because we were able to be very open about
everything we were doing. That's also where we started the Meltano project, which is where I'm
at now. I was able to join that team in 2021 as the head of product and data. And I've been there
for coming up on a year and a half now as we've grown the community, grown the company, and are
really trying to make a really fantastic ELT tool.
Awesome. Well, first thing I have to say is
I don't actually believe you
that you were doing simple things in Excel
because anyone I know who's fallen in love
with regular expressions who started in Excel,
my experience is that they were essentially
building software in Microsoft Excel before actually discovering notebooks.
And then that sort of is great freedom.
Basically, yeah.
It was doing things you probably shouldn't do with these tools because you're unaware of software development and the way this other industry had evolved.
We were using, I think, Subversion for some of our code practices. And we literally had like four computers that were running some of the
models we were doing. It was this whole world when I actually started working with actual software
engineers. I was like, oh, there's a better way to do this. Yeah, totally. No, I mean,
I literally remember working with someone who would like we had a computer and they would just
like run stuff in Excel overnight. And it's like, this is absolutely insane.
Whatever it takes.
I love it.
Okay, Costas is going to laugh
because I love when I get to ask this question.
So chemical engineering background
and now you work in data.
What lessons did you bring with you
from chemical engineering?
And do you still use any of those
in your day-to-day work with data?
I think so. I've talked to a lot of former chemical engineers, people who have
gone from chemical engineering to other disciplines, a lot of them programming, some
like went to law. The big things that I go back to from my engineering training are really about
understanding systems and understanding how these pieces fit together
and things move. One of the biggest skills I learned coming from grad school in particular
was really how to troubleshoot problems, how to take, you know, I'm having a bad outcome,
whatever it is, maybe this result doesn't look good or this equipment isn't running.
And to really have a disciplined approach to breaking problems down, subdividing them and
finding, okay, is the problem, you know, before or after this step? And it seems kind of simple, but it is a practice
until you kind of see it work a few times in the real world. It can be, you know, kind of foreign
to some folks when they're faced with a problem on their computer, they get a stack trace in their
code. How do you then go and subdivide the problem? And that's, I think is the biggest thing. But then
also just thinking, like,
systematically of understanding, like, mass balances and what are my inputs, what are my outputs?
Where can I see things happening?
And then how can I break the problem down even further?
And it's just, it's engineering.
It's problem solving.
It's taking, you know, what you know
and maybe learning some new things
to solve interesting problems.
Yeah, super interesting.
Yeah.
I'm always fascinated by that
because you think about,
and I am way outside of my expertise,
but free radicals,
and when you think about chemical stuff,
there's behavior that's extremely difficult to predict,
even in controlled environments.
It's like, oh, well,
actually a lot of those same attributes
are true to all sorts of data as well.
Super interesting.
Okay.
Well, tell us about Meltano.
So, I mean, the Singer ecosystem, you know, sort of owes a huge amount of its worth to the work that you've invested in it.
It's growing.
That's super exciting.
When we talked to you a while ago,
that was a huge focus.
You're also looking at sort of the ops layer as well.
So tell us, you know,
what's been going on in Meltano
over the last six months,
you know, from a product perspective.
And then why don't you just also tell our listeners
like the vision of the company?
Because it's been a while.
Yeah.
So Meltano really exists, I think, to bring a better way of working with data. When this project came in, a lot of the founding team was
from GitLab. And kind of the DevOps principle was built into how we think about things. And
Meltano really was, you know, a data team should build a data platform, or do their work,
modeled after software development. That meant, and particularly in the GitLab framing, like
one tool that can kind of do it all.
The big difference between GitLab and Meltano is GitLab is like all first party stuff and Meltano has a lot of third party software that you can integrate with it.
We've gone through a couple of refocusing moments in the company. When Douwe took over the project in 2020, we really focused on kind of the open source ELT side and saw a lot of
traction with that. As we spun it out, we wanted to focus on this larger vision of becoming the
foundation for your ideal data stack for any team's ideal data stack. And what that meant is
like, how do we work with the rest of the ecosystem? We're doing a really good job with
making the Singer ecosystem better, enabling you
to run taps and targets smoothly, orchestrate them well. But there's this whole other ecosystem of
tooling that it can be hard to fit into the different parts of your stack. And so when we
spun out, we started moving towards this larger vision of, okay, Meltano can be the foundation.
You can bring in Airflow. You can bring in different tools, Superset, Metabase, anything really that's open source or has either a container or is Python installable.
And we made specific product choices to make that happen.
We introduced a new command to allow you to run composable pipelines.
It's meltano run.
So you can chain together your tap, your target, dbt, Great Expectations, you know, some further downstream jobs.
We've also enhanced things around the Singer ecosystem.
So it's not just a tap and a target.
You can also intercept data in between.
It's called a stream map, and you can filter data, anonymize it, you know, drop data, do whatever
you need to do, and it kind of gives you that level of control.
And so we still very much believe in that larger vision.
But as we like would go to conferences and talk to people,
people get really excited about this idea.
Like, oh yeah, data ops, platform infrastructure.
It's exciting.
They understand eventually why people need it.
But also we recognize it wasn't meeting people
where they were today.
We were maybe a little bit further
than a lot of folks in the industry actually are.
And most people are like, yeah, this is really cool.
I would love to be able to do this.
I'm still struggling with my extract and load,
just like pure data movement problems.
So what we've been doing here in the past few months
really is just refocusing,
doubling, tripling down on the ELT side of the story
and beefing up the SDK for writing taps and targets,
enhancing functionality within Meltano,
specifically around ELT to be a fantastic solution for that. But all the pieces are there for this
larger story. And I'm excited for us to get to the point where we can earn the right to continue
investing in that because I think we as a company still believe very much in that mission.
Yeah, yeah. Super interesting. Costas and I were just talking about Coalesce.
Costas wasn't able to join us there,
but one of my big takeaways was,
as advanced as all the technology is,
and you walk around the vendor booths,
and there's some amazing stuff out there,
when you talk to the practitioners
who are doing this work on the ground,
a huge number are still trying to solve
the fundamental challenges. A huge number are. And so that really resonates because I think it's easy.
I mean, you work for a data vendor, you're building out product and all that sort of stuff.
And it's way easier for us to look into the future, because that's part of our job, than for our customers,
right, who, you know, certainly are doing that, but actually have a lot of pain points that they
need to solve as part of their job today. And a lot of those problems are basic. Okay, so
I have a question for you on that. Like, why do you think, with all of this advanced technology, the
problems are still basic for a huge proportion of the practitioners and companies out there?
Yeah, I love this question because I think it gets to, like, an industry-wide challenge, and
I think this will change over time as more data practitioners kind
of come up through the ranks of different organizations. My hypothesis and what I've
seen in several places and with folks I've talked to is like data isn't a strong consideration from
early in the company's life cycle or its overall genesis, or maybe it's a really old company and they've gone through a lot of change.
When data is kind of an afterthought
or seen as something of just like,
oh, we can pay for this,
we can invest X amount of dollars
and we're going to get some return with our data.
I think it really does a disservice
to the people on the teams
that have to implement this kind of work.
And for me, data has to be kind of foundational to how you think about running a more modern
business, particularly tech businesses.
But anything you're doing in a company is generating some form of data and you need
to have that data lens.
One of the reasons, not to get too highfalutin here, but one of the reasons I really fell
in love with data engineering and chose the infrastructure and the hardcore, like low-level
data pieces, I felt it was so foundational to functioning and to a lot of these problems that
we want to solve that one, it's like great career stability. Like people are always going to have
data problems, but two, I just saw like, you can't do all these fun data science-y things unless you
have a solid foundation of good data engineering best practices and workflows.
So part of it, I think, is just, you know, there are people who don't maybe understand what the current state of the art or capabilities are with data and how to use it to better operationalize all parts of their business.
But that's changing as people kind of come up through organizations and they get a little bit of power.
They're a head of data at a new company and they can effect that change. But people are just at
different stages of this journey of learning, Hey, I enjoy building charts, but now I need to learn a
bit more about software engineering and how some of this works. So it's a maturing practice with
professionals that are gaining more skills and gaining more influence across different industries
every day. But does that kind of answer?
Yeah, no, that's super helpful.
That's super helpful.
And the other thing, you got to the root of it. Whether it's a newer company or, you know, sort of like a legacy enterprise that's been around for a long time and they're trying to become more data driven, you know, those are sort of different sides of the same coin. It's about getting an entire company committed to something that you work really hard at and, early on, actually,
it doesn't bear a lot of day-to-day fruit, right? It just seems like extra work that you're investing
for the future. And that takes a huge amount of commitment and foresight from a company to be able
to do that. Yeah. And I think there's parallels in software engineering. Like, are you investing in a really good engineering culture that works well with your product team and can deliver, you know, bring back insights and have just a positive feedback loop?
It's not a one-time thing where you put in some resources and you get something out where it's really functional, both on engineering, both on data.
And there's just so many similarities, I think, between data teams and software engineering teams.
It's that investment, that kind of positive flywheel across the entire organization.
And I think early days for a lot of companies, it is a bit of a leap of faith if they haven't seen it in practice.
And I'm hopeful now we see we have more people that are true believers in a positive sense.
They're informed by data and their experience.
But you are able to articulate why it's valuable to invest in data in these processes and to build that flywheel.
Yep.
I love it.
All right, Costas.
I could keep going, but please, please jump in.
I know you have so many questions.
Yeah.
Yeah.
So first of all, I'm super excited that I have someone from the product side
because I can ask, like, you know, like, some really hard questions.
Like, for example, why someone should choose Meltano today
instead of, like, something like Fivetran or Airbyte or Stitch Data, right?
Yeah.
So yeah, why?
Like, what's so much better
about, like, Meltano compared to,
like, let's say,
the other solutions out there?
Yeah.
Our focus right now
is on a very particular persona.
So if you are a data engineer
or, you know,
very data engineer adjacent
who is comfortable
on the command line,
isn't afraid of a Python stack trace,
and wants that control over your software, that's when Meltano is going to be a really
good choice for you today. We kind of saw that gap in the market where there are good
point and click solutions for day one situations to move your data. When we've been talking to a
lot of users, and hopefully potential customers as we build out our managed offering,
the pain points that we're hearing are,
cost is rising and I don't have a good sense of why
or how I could even improve it.
And there are problems that crop up
that I can't fix and I'm stuck in some sort of support
hell as it were.
And what we're aiming to do is kind of give users control back over their
data platform, but in a way that
we are still able to help them solve
problems, but when something goes wrong, and something will go
wrong, I think that's something that
other companies don't necessarily like to admit, like,
oh, we've solved this problem, data's moved,
don't worry about it, point and click and you're good.
Something's going to change. Something about the
system outside of your control is going to change,
and you have to be able to adapt to it
and to respond to it.
So Meltano is going to be a good choice for you
when you want to understand the code
that's running in your system,
whether it's the tap or the target or even dbt,
and have that transparency.
We've also built in kind of the software development
best practices into the product.
So there are YAML files that define your configuration, the state of your system.
And if you've worked with software engineers, they're going to be begging for tools
like that because they understand the value of version control.
So that's a long-winded answer, but the day one experience of Meltano is continually improving,
but Meltano is going to really excel today for the day two problems that you're going
to encounter when something is changing and you need to adjust your system and you want to test it and move forward with confidence.
Yeah, that makes a lot of sense.
I'd love to discuss more later about the developer experience and why it's so different.
But why do you think a company like Fivetran or Airbyte or Stitch Data,
they didn't go after an experience that is, let's say,
more native to the data engineer?
Because at the end, it's not like Fivetran or Airbyte is used by someone else inside the organization.
You will end up...
The pipelines, that's the core of what, like, the data engineer is doing.
They have a lot to do with these tools, right?
So why they didn't do that?
Yeah, I'm curious about that as well.
And I think there's a couple of hypotheses I have around that.
One is that, you know, we have the advantage of coming into the market a bit later where these companies are a bit more established.
And previously, it had been data analysts that had been doing a lot of this work.
I think data engineering is still relatively a new title.
I don't think data engineer is ever going to be called the sexiest job of the 21st century. And as I do more product and have, like, these, you know, pseudo sales conversations and talk to users,
it's very easy to get pulled into the idea of, oh, okay, you're facing this problem, we'll just,
you know, build this UI for you and you can kind of point and click, and problems
will kind of be solved. But you're talking to the buyer, but not necessarily the user, all the time. The advantage that Meltano has had in the market is,
I think, for three, you know, almost four years now, it's been completely open source,
free to use, and has been able to organically kind of attract this audience of data engineers.
And as we talk to them, you know, they're the ones implementing these products. And yeah,
they want the convenience of not having to worry about things.
But when they do have to worry about it, they really need to solve some of these problems.
And so we talk to people who are paying customers, you know, of Fivetran or Stitch.
And they're like, yeah, it works for some of these things, but I would really like, you know, Meltano to come in and give them a lot of that control back and hopefully be a better experience that they can build kind of the foundation of their entire stack on.
Yeah, it makes a lot of sense.
But I mean, Meltano is still trying like to build like a SaaS business, right?
Like with, like, a self-serve solution that you host for your customers.
So you still have like to take care of, let's say, the infrastructure, the issues there.
You need to run the operations around the technology itself.
Obviously, someone can do it on their own.
They want, like, to use the open source version of it.
But at the end, someone who's going to pay Meltano, they're going to be paying
like for something that's hosted by you.
So, I mean, that's like what is like also the similarity with something like
Fivetran or like even Airbyte, because, and I'm saying like even Airbyte,
because Airbyte also have like an open source version of it, but at the end,
like that's how they also make money.
You go like to their hosted version and you pay for it.
Right.
So, and things will go wrong for you too.
Like Salesforce at some point will be like, no, we're not going like to reply
on your request, like what to do, you know, and like suddenly like the pipeline breaks.
Right.
So what is like different in the experience that needs to be made, let's say, for a cloud-hosted product
that makes it, let's say, much more convenient or native as an experience
for a developer compared to, let's say, a data analyst?
Yeah, so a couple of thoughts there.
We are doubling down on the command line interface
as the primary interface,
at least initially for a managed offering.
Who we're talking with are kind of our early alpha users.
And full transparency, we're in the process of building this.
We're pre-alpha, but we have some folks lined up
that are excited to use it.
They're comfortable using the command line interface
to interact with the product.
There will be an API as well
if they need to kind of orchestrate things themselves.
And the UI will come eventually at some point
because we're just going to need some form of UI
to check basic things
and not everybody always wants to go to the command line
to check things.
But in terms of getting like your work done,
it's going to come from the command line interface primarily. The other piece is transparency around what's
happening within the managed platform. Most likely, we will at least have, like, a source-available
version of what we're actually running on the managed platform. Like, the code itself will
be proprietary, but you can actually see, like, here's the code. A lot of this is informed, I think, by our GitLab history, where GitLab is,
you know, they have a free open source version of GitLab, and then everything else is their
enterprise edition. But you can see all the code, and you can actually make contributions if you
want. And I think that's a really exciting model, because it allows people that there are certain
groups of people that will be able to say, hey, I want you to go ahead and manage it. But I'm also like smart and I can figure these things out. If I can help you
quickly figure out a bug, it's going to help me get my support ticket figured out faster.
That's the second aspect. And then the third aspect is, hey, here's the actual code that's
running for your tap and your target. If for whatever reason you need to fork tap-snowflake or
target-postgres or whatever it happens to be, you can fork that, still run that fork on Meltano, and then we
can work with you to merge it back into the main branch of whatever connector Meltano or ourselves
are managing and allow people to quickly solve their own problems because there's a lot of
downstream components that rely on data engineering instead of saying, hey, there's a problem with Fivetran and it's out of my hands.
Some folks may want that because it does kind of shield them from whatever political pressure
they may feel inside. But for folks who are like, this is mission critical and I don't really care
to worry about the deployment of the stuff, but I do like to know what code
is actually running and if it's Python and if it's built on our SDK, it would be
pretty quick to change it.
So those are the kind of
the paths that we're threading
of what makes a better
developer experience
and invites people into
kind of how we're building
this product and business.
Okay.
That's super interesting.
So let's start with, like,
the CLI experience.
Why do you think, like,
CLI is, like, so important
for a developer?
And it's more, let's say, important than a graphical user interface?
Yeah, it definitely speaks to a different audience and definitely a different persona.
When you're on the command line, it's utilitarian.
I think there are fun things that you can do to make the user experience more enjoyable.
But there's nothing generally, if it's a well-designed command line, that's like getting in your way of getting the job done.
It speaks, I think it communicates hopefully to people that, like, we're here to get the job done and kind of get out of your way. And that's why I, like, fell in love with dbt as a product, because, you know, GitLab has never used, like,
dbt Cloud, it's only ever used dbt Core, used it from the command line. It was just a very
comfortable interface. And then it also works with all of these other tools that you have on the
command line in Bash, built off kind of the Unix philosophy of piping things together. And so I think it just,
it does speak to that audience. And it's also, you know, for me, as I've learned more and more over
my career about software engineering, it's like, oh, if you have a good, you know, kind of API back end,
you can build whatever UI you want, but you can also build this command line. It's quicker, you
can iterate faster, and if you want something, it's less work
than building this whole UI.
So it enables us to kind of move and iterate faster
and invites people in again to kind of contribute
if they have ideas.
Some of our features and flags and different commands
were contributed by the community because,
hey, I need to be able to add this to my project,
but I don't want to install it.
Cool, we took a PR for that to have a no install option.
And now it's available for everybody.
That's how I think about it.
Yeah.
That's super, super interesting.
And like, how do you, like from a product perspective, like, I mean, you know,
there has been like so much work done in like research and processes around like
user experience, how, like, to run A/B tests, to figure out what's the right color there.
All the stuff that we know about building, let's say, a very graphical experience for the user.
But what about the CLI?
How do you figure out what's a good experience?
How do you design a CLI?
How do you do that?
Yeah.
I think we're trying to figure that out.
I think there are definitely,
there's prior art that we can lean upon.
I'm, you know, for me personally,
I was a data engineer prior to this,
and now this is my first true product role.
So there's a bit of learning on the job.
But the benefits of the way I think
we're building Meltano is that it's,
it is in the open, it's open source.
We have this community and it's a great way to,
to get that feedback.
Talking to people is some of the best way that I've found to just figure this
stuff out.
Like my takeaway from being,
you know,
doing product and talking to other product managers is just like the more you
can talk to your users,
the better off the product will probably be because you're integrating all of
that information.
We also invite people in. Like, we'll usually have specs around, hey, this is what we're
thinking for this specific functionality, whether it's like a new command and like,
what are the sub commands?
What are the structure?
We also have fantastic engineers who bring their software engineering skills and say
like, hey, this is what I would recommend.
What do you think of this?
And me going, okay, yeah, the
problem we're trying to solve, it does this, you know, here's kind of the overall ergonomics. So
yeah, it's small iterations and then doing it in a way that it's not, you know, fully irreversible
if we needed to roll something back.
Yeah, I love that. Like, I hope, like, one day you write, like,
a blog or something, like, the experience of, like, building a CLI.
Like I truly believe that there's nodes.
I think there's like a lot of experience with people that they have built that
stuff out there, but I don't think that like from the perspective of like the
product discipline, we have codified this information in a way that like people
can go and like learn, right?
Like and find this information out there.
So I don't know, if you ever do it, please let me know. I'd love to read that. Yeah.
It's super interesting. It's something about like, I carry a lot of, so like,
I'm very like curious, like personally, like how we can
define, like, developer experience and how we can build, like, CLI tools in a
more structured way, in a more product way, you know.
Yeah, I'm starting to, you know, doing a relatively, you know, new job, I think you learn all the
things you don't actually know. So I literally, I just started reading The Design of Everyday Things.
I can't remember the author's name, but excited to dive more into design more broadly and just
kind of bring everything to bear, because a lot of
what I brought to the product job is, you know, at one point I was in the target persona,
and now I get to talk to a ton of people that are in our target persona, understand where, you know, my
experience is different from theirs. And that's what has made this really enjoyable. It's like, I
get to help build a product that is solving problems that, you know, I experienced personally
in the past and that I know a lot of people are experiencing today.
And yeah, that's the fun part of being in product.
There are also like fun parts that are not that fun, but we'll discuss that another time.
Today, let's stay positive, right?
All right.
So, okay.
I think we've got a good idea of how the experience of working with
Meltano is different.
One of the very interesting problems when it comes to ETL solutions
that has engineering, product, and business, let's say, consequences,
depending on what kind of strategies we're going to follow there,
is the connectors.
In the end, without the connectors, there's no ETL.
You need to pull data from somewhere and push the data somewhere else.
And there's a lot of discussion about this.
There's a long tail of connectors out there.
There are some very important connectors out there.
How do you deal with that at Meltano?
I see that.
For example, I was browsing the website, reading fast, and I saw the comparison with Fivetran and Airbyte.
You claim that you support 300-plus connectors, for example,
compared to, I don't know, 150 or 200-plus for the others.
What does this mean?
How do you operate in a situation where you have
300 connectors? What are these?
Why do we need all these connectors?
Yeah.
So that number comes from what we call Meltano Hub, where
we're listing all of these connectors.
And to be super clear, this is our understanding of the larger Singer ecosystem.
So when Meltano was started, Singer was already a project, initially supported by Stitch, now Talend.
And when we say there's 350 plus connectors for Meltano, there are at least 300 connectors that we found in the wider community that other people have made that conform to the Singer specification. And that's where the power
comes in, in these long tail connectors is you can write a connector and as long as it meets the
Singer spec in terms of the data that's being output from this tap, it can be accepted by any
target. We, for the longest time, really took a somewhat hands-off approach to the maintenance of the connectors themselves and said, okay, we're going to address some of these problems around transparency, around testing, around building new ones.
But we haven't taken on the burden and the challenge of maintaining these as first-party connectors.
That has actually shifted.
We've now taken on maintenance, starting with a lot of the
database taps and targets. But it really is a decentralized open source community
where people say, hey, I have this connector, I'm going to build this tap and it solves my problems.
Maybe it solves yours. And so you might need to fork the code. In an effort to
be more competitive with some of these other tools, we are, like I said, taking over the
maintenance of these database taps and targets.
But they are built on top of the Meltano Singer SDK, which is really a lot of people's first introduction to Meltano.
They're like, oh, I need to build this custom connector for whatever reason, whether it's some, you know, weird API or they just want to pull some data internally.
And then, for whatever reason, they couldn't find it.
People find us a lot through the SDK, and so we are investing heavily in improving the SDK.
We recently brought in a batch message type. One key part of the
Singer spec is that every record is output on standard out in a newline-delimited JSON format that says,
like, record, here's the data. That's good, especially when you're maybe coming from an API, but for database sources in particular, that can obviously be very slow.
So this batch message type is basically a pointer to a file where we'll say, hey, we're going to
extract all the data, write it down to a file. The batch message gets sent to the target and the
target knows where to go pick up that file. And we're seeing, you know, a 30x to 90x data flow improvement
using that method.
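The difference between per-record output and the batch message described here can be sketched in a few lines of Python. This is a rough illustration, not the SDK's implementation: the `RECORD` shape follows the Singer spec, while the `BATCH` field names (`encoding`, `manifest`) and the helper functions are assumptions made for the example.

```python
import gzip
import json
import os
import tempfile


def record_messages(stream, rows):
    """Per-record style: one JSON line per row, as a tap would write to stdout."""
    return [
        json.dumps({"type": "RECORD", "stream": stream, "record": row})
        for row in rows
    ]


def batch_message(stream, rows, directory):
    """Batch style: write every row to one file, then emit a single pointer."""
    path = os.path.join(directory, f"{stream}.jsonl.gz")
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row) + "\n")
    # Assumed message layout: the target only receives the file location.
    return json.dumps({
        "type": "BATCH",
        "stream": stream,
        "encoding": {"format": "jsonl", "compression": "gzip"},
        "manifest": ["file://" + path],
    })


def load_batch(message):
    """Target side: follow the pointer and read the rows back from the file."""
    parsed = json.loads(message)
    rows = []
    for uri in parsed["manifest"]:
        with gzip.open(uri[len("file://"):], "rt", encoding="utf-8") as handle:
            rows.extend(json.loads(line) for line in handle)
    return rows
```

With the per-record style, the number of messages grows with the number of rows; with the batch style, the tap and target exchange one small message and the bulk of the data moves through the file.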
Yes, so it basically means there's an active community.
I think that's one of the differences too.
If you look at Fivetran, they maintain everything, you can't see the code, and they're going to be limited in the long tail that they can support.
Airbyte is, you know, in a better place than Fivetran because they are open source. They are currently in a monorepo, and so everything kind of has to be in their main repo.
I don't want to completely misspeak, but I don't know that you can run forks of connectors within the main Airbyte platform.
Whereas we're just saying it's good to have a decentralized system, and that's where
Meltano Hub comes in, to show just how active the community really is.
But it can be really hard to tell for someone on the ground of like, is Singer dead?
I go into this Slack channel, but a lot of what you don't see is people just using it day to day, pushing gigabytes of data through these connectors because it's not as transparent.
And so that's what we've really tried to do with some of the features that we've brought into the market.
Okay.
That's super interesting.
Okay.
So how do you balance like quantity and quality of connectors, right?
Because I'm pretty sure that if you talked to Fivetran, they would tell you, yeah, we have everything closed,
but the quality of our connectors is super high.
When you allow everyone to go and contribute out there, which is the complete opposite of that, okay, anyone can do whatever they want with the code that they contribute. So how do you balance that?
How can Meltano, as, let's say,
a coordinator of this decentralized hub
for creating connectors, help
ensure the quality of these connectors?
Because at the end, it is important, right?
Like if I'm a new user and I see out there
like five different implementations
of like a connector for Salesforce.
Which one do I choose and why?
Right.
And what if something goes wrong?
Like, is it Meltano's problem or is it like the contributor's problem?
And if the contributor does not reply, you know, like you have all these open source, like standard issues, right?
That you have to deal with.
So how do you do that?
As Meltano, right?
Yeah.
I think, frankly, we're going to figure that out.
It's absolutely going to be based on the SDK.
And so what we're seeing with that
is we're getting a lot of good contributions
as people maybe discover weird quirks
about a particular API that they're working with.
They'll implement the fix in their connector
and that improvement comes into the SDK. And so likely like Meltano is not going to offer support for
connectors that weren't built on Meltano SDK. But as it makes sense to say like, hey, a lot of our
users are using Facebook or Google Ads, you know, a lot of the marketing ops type data sources.
If they're built on the SDK, I think we will absolutely start to take on the maintenance
of those.
Because that solid foundation, you know, one improvement for a particular connector can spread out across all of them.
I think the other balance is recognizing that people do have like different quality and stability needs. Some folks are fine with a community tap that maybe isn't fully tested,
but they can just try it out and see and see what happens. One of the things that I haven't mentioned about Meltano is that it has this native understanding and built-in feature around
environments. And so if you have a staging table, or if you want to write locally to DuckDB,
you can test out the quality and the capabilities of different tools,
particularly, you know, taps and targets, in a safe manner.
And then if you like what you see,
you can just run that in production and override certain configuration.
And Meltano makes that easy.
And that's kind of like the software development principle of having testing
and continuous integration and things defined in code is you can have the
safe space to test things.
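The environment idea described here can be pictured as a base configuration with per-environment overrides layered on top. The sketch below only illustrates that concept in Python and is not how Meltano implements it; the setting names, plugin names, and the `resolve_config` helper are all hypothetical.

```python
# Hypothetical settings: the same pipeline, with a different destination
# depending on the environment it runs in.
BASE = {"tap": "tap-postgres", "target": "target-duckdb", "schema": "staging"}

ENVIRONMENTS = {
    "dev": {},  # dev uses the base settings unchanged (local DuckDB)
    "prod": {"target": "target-snowflake", "schema": "analytics"},
}


def resolve_config(base, environments, name):
    """Return the base settings with the named environment's overrides applied."""
    if name not in environments:
        raise KeyError(f"unknown environment: {name}")
    resolved = dict(base)  # copy so the base settings are never mutated
    resolved.update(environments[name])
    return resolved
```

Calling `resolve_config(BASE, ENVIRONMENTS, "dev")` keeps the local target for safe experiments, while `"prod"` swaps in the production target without touching anything else.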
So I think for us, as we actually build out the managed offering and actually start to onboard customers,
we'll have these conversations around, well, what are the data sources that you want?
And we'll kind of go from there. But the thing that's interesting is a
lot of these connectors actually work really well for the majority of people's use cases.
And it's only when you start to like really push the boundaries hard on some of the data
volumes that it starts to maybe be challenging for some particular data teams.
And so I'm just, I'm excited to have those conversations and see what we need to do.
But like, it's absolutely going to be based on the SDK.
I actually have a question for both of you, because both of you have such deep experience in this world.
One interesting thing is, if you need something, let's say, modified or custom that isn't offered out of the box by a black-box SaaS provider, a la Fivetran
or whatever, one of the challenges I think a lot of companies run into is, okay, well, we'll
run these core pipelines in a Fivetran and use the interface and set it and
forget it. But then you go from there and you build something custom, or even use
open source technology to manage something custom, and so now you're managing the same basic data flow across two very
different ecosystems, but it's basically the same process. Orchestration becomes hard.
There are a number of challenges there. One thing that's interesting to me, just hearing you talk about that, Taylor, is that, okay, so you have, let's say, supported connectors,
or taps, that are core or whatever. But if I need to develop something custom,
I'm not actually going to a completely different ecosystem. That's fairly compelling.
Is that part of the thesis? And Costas, does that make sense to you,
having built similar technology?
So I would say it's absolutely part of the thesis: if you are quickly able to solve your own problem
and then fork the code and run it, as long as it conforms to the Singer spec,
and I'm sure we'll have some guardrails around that where we're validating that it outputs Singer data,
you should be able to run that with the managed Meltano platform,
because you could run it with self-hosted Meltano.
So with a managed platform, you should be able to run that.
And that way, you aren't forced to either go, I'm going to go buy another SaaS tool
that happens to randomly do this, or I'm just going to stand up some random Python script.
Yeah.
We can help you have those best practices while quickly solving your problems.
And then once it's up and running, you can, kind of behind the scenes,
incrementally bring it back into the fold of the well-maintained, mature
data process.
And you don't have to reach for these other tools.
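The kind of guardrail mentioned here, checking that a forked tap really outputs Singer data, could look something like the following Python sketch. It checks only a reduced subset of the Singer message shapes; the `check_singer_output` helper is hypothetical and is not a Meltano feature.

```python
import json

# Required keys per message type, following the core Singer message shapes.
REQUIRED_KEYS = {
    "SCHEMA": {"stream", "schema", "key_properties"},
    "RECORD": {"stream", "record"},
    "STATE": {"value"},
}


def check_singer_output(lines):
    """Return a list of problems found in a tap's line-by-line output."""
    errors = []
    for number, line in enumerate(lines, start=1):
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            errors.append(f"line {number}: not valid JSON")
            continue
        if not isinstance(message, dict):
            errors.append(f"line {number}: message is not a JSON object")
            continue
        kind = message.get("type")
        if kind not in REQUIRED_KEYS:
            errors.append(f"line {number}: unknown message type {kind!r}")
            continue
        missing = REQUIRED_KEYS[kind] - message.keys()
        if missing:
            errors.append(f"line {number}: {kind} is missing {sorted(missing)}")
    return errors
```

An empty result means every line parsed and carried the expected keys. A real check would also validate records against the declared schema and would need to accept extension message types such as the SDK's batch messages.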
Yeah.
For me, what is like very interesting with that, and just to add to what Taylor was saying about the developer experience, if you want to define developer experience, you have two very important interfaces.
One is the CLI, and the other one is the SDK.
And there is a reason that the developers need access to both of them.
Like, okay, we can chat a little about that.
But having access to an SDK that you can use to modify the behavior of the system
in a predictable and like safe way, it's super important when we are talking about
like something that it's consumed and it's used as a system by a developer.
Now, obviously a developer would prefer to have the connector there,
working, right, and not have to write it. Who wouldn't like that, right?
But that's why you're an engineer because there are edge cases, there are like
issues that you only care about, that's why you're in the company, and you might have to be able to extend the behavior of the system
that you are working with.
And that, I think, is a very big difference between developer experience
and user experience: user experience is heavily guardrailed,
right? What you can do in a user interface is defined by the visual
components that are there, with predefined behavior.
While when you're talking about developers, you need also to give them,
let's say, the tools to extend or change somehow the behavior of the system.
Right?
And yeah, it makes total sense when you're working with this persona.
Now we can debate if this persona is like the best persona for this problem,
which is moving the data around.
My opinion is that it is.
But someone else might have a different
opinion, and that's fair, right?
That's why we're competing out there.
But yeah, for me, it's a very interesting approach to solving the problem.
Because traditionally a big problem with these platforms was that, okay, this is an open set of connectors. How do you maintain that?
That's not scalable.
You cannot have an organization with an army of developers out there who are maintaining every silly connector for an API.
And by the way, it's super hard to find people who want to do that job.
Anyone who has tried to hire developers who are going to maintain connectors,
they know how hard it is to do that.
So building this developer experience, I think, is a response to how
we can build a scalable solution to the problem of moving data around. So, yeah.
I think the point that really stuck out to me in what you were saying
was the modularity and being able to extend it, and it's
definitely how we've built Meltano generally.
Recently we've made an effort, and this is moving away from the
Singer side a little bit. Out of the box with Meltano, you can run dbt, you can run Airflow,
and that's been pretty consistent for a while now. But now we've developed what
we're calling an EDK, an extension developer kit, basically solving the problem of,
if I wanted to change how Airflow or even dbt was integrated with Meltano, previously it took a
lot of effort to do that. You had to understand both the code in the Meltano code base and then
whatever other weird repos we might have had for how dbt gets installed or how Airflow
gets installed, and then also the Airflow DAG generator that we had. The EDK comes in to
basically have a single repo and a similar developer experience to the SDK, to make it easy to add new components that run well in Meltano.
So we've rebuilt those. They're in kind of preview mode and they probably won't be in GA for a while. There's one for Superset, and we have community contributions around Dagster, Elementary,
and a couple of other tools
that are built with the EDK,
which gives you basically the wrapper
around how a tool interfaces with Meltano.
And I'm really excited about it
because it paves the way for the future,
for this larger DataOps platform
that we've talked about and hinted at.
And with our managed offering,
you'll be able to run dbt in the cloud as well.
It's not just for the EL side of things,
even though that's what we're focused on.
So that's all in an effort to make
your data stack more composable,
with a really good developer experience.
That's super, super exciting.
Okay, I'm going to stop asking questions
about developer experience and
connectors because we can continue doing that like for days.
And I have one last question, and then I'll give the
mic to Eric.
So you mentioned a number of additional tools out there, outside of
ETL and ELT, the connectors.
So there is this new concept of DataOps, right?
And I would assume that the context of DataOps also includes
orchestration and quality and modeling and all that stuff.
So I want to ask you, what is DataOps for you, for Meltano, and how does it relate to Meltano itself
as a product?
Yeah. So DataOps, I think I really give a lot of credit to the folks from DataKitchen,
because they have their DataOps Manifesto, which I've looked at a number of times across my
career. And frankly, I think it does a fairly good job of describing the idea and the
philosophy of it. Of the items that are listed, and I think they
have like 18 or something like that, a lot recognize that the DataOps term is really about
processes around people. A small part of DataOps is a technological solution. But the problem that I think DataOps as a term addresses
is just about recognizing that a lot of data problems are people problems,
that there is a technological component to it,
and that there's a way of working that enables you to achieve the outcomes you want
faster, more stably, with a higher level of quality,
and frankly, in a way that's maybe more enjoyable. I think the reductive way of talking about
DataOps is, oh, it's just DevOps for data. That doesn't fully recognize that there are
stark differences in working with data, particularly around orchestration and managing state,
and that things like CI/CD are great but can be way more challenging
when you're talking about working with a Snowflake database
or working with multiple terabytes of data.
So for me, DataOps, I think simply is just a bit of a marketing term
talking about a way to work better as data professionals,
recognizing that building your data platform and building your data practice is a lot more akin to software engineering than it is to
maybe another discipline. For Meltano specifically, I think we really lean into that software
engineering side of things, of building your data platform like it was a software engineering
product. And I think that manifests in how
the features of the product
look and how people experience them
through the YAML files for the command line interface.
But yeah, I
think in a lot
of conversations I've had with folks, people like, they've
heard about DataOps and they get excited, but again
it comes back to like, what problems are you experiencing?
And for us, it's
there are better ways of working.
And we believe a lot of those are working more like software engineers
than working like another type of, you know, tech worker.
That's great.
I think, Eric, we should try to have an episode about DataOps and
just chat about that, get some people to... I think it would be awesome.
Yeah.
Yeah.
And you should be part of the panel there.
Like we should do that.
I think it's very interesting, like when we have like new terms entering like an
industry and being able like to, you know, like clarify, like make it more clear of
like what this thing is, right.
Because that's the, that's the problem you see, like, and that's, by the way, a problem
that is caused a lot by marketing because the terms themselves, like, okay,
they have their own meaning.
Like whenever like a new term arises, I think there is a reason for that.
But marketing is trying to like really aggressively capitalize on that and use
it as a way like to communicate something.
And many times like problems arise from that.
I've seen it a lot with concepts like data mesh, for example, right?
If you read, in the end, what data mesh is,
it makes sense, what you are reading there, right?
But you have such aggressive, and in some
cases also bad, marketing happening around it that it really destroys
the semantics that are communicated to people, and that
hurts the industry in the end, right? So I feel like if we can have discussions with people who
are experienced and have a very honest approach, and again, I'm not going against marketing here, right, but just
trying to describe reality, I think it's going to be very beneficial for the people who are
listening to the show too. I think we should do that.
Putting our product hat on, I think it's just
focusing on the problems that people are having, and that data mesh, DataOps, data contracts are tools that are trying to solve problems.
And just being honest that a tool is not going to magically solve your problem.
There is always going to be some sort of people aspect that you have to deal with.
But I do believe that technology can enable better ways of working.
And so I don't know that in that conversation
we would have the full definition of this is what DataOps is, forever and always.
But inviting people in to understand these are the problems we're trying to solve
and this is how this came about, I think would be very beneficial.
Yeah, let's do that.
Eric, all yours.
I love it. Well, we're at the buzzer.
So I have several more things to discuss,
but we're going to have to do it on another episode.
I will say right here at the end, though,
this episode has confirmed my theory, Costas,
which I opined to you about in a recent Shop Talk episode
about logic moving further and further down the stack.
And I think CLI is the best example of that, right?
It's going lower and lower. So it's been
very validating for me in terms of that theory about business logic being expressed as code.
So thank you, Taylor, for validating one of my wild theories. And congrats on all the work you've
done at Meltano. What an ecosystem.
I mean, amazing contributions and best of luck as you continue to build.
Thank you so much for having me on.
I really enjoyed the conversation and glad I could confirm your hypothesis around the industry.
What a fascinating product. And my big takeaway is that you don't hear this very often,
but Meltano as a company has a huge vision
for being a data ops layer for the stack.
But they really listened to their customers
and went back to the main pain point
that their customers had,
which is actually on the pipeline side of things.
And so I just think that takes a lot of courage
as a company to say,
we have this grand vision of what we set out to build,
but we're probably too early for that.
And so we're going to listen to our customers
and go back to those components of the product
and make them better
so that we can better serve those customers.
And I was just really impressed by that.
I think that's such a refreshing thing to hear.
It doesn't sound as cool as,
you know, we're breaking new ground
with a data ops layer,
which they actually are doing that.
But they're also just making a lot of things way better about their core
product and the core problem they solved and what they're hearing from customers.
And so I just really appreciated that.
Yeah, a hundred percent.
I think what you just described is, let's say, proof of the quality of the people who run
both the business and the product at the company, and that's not easy to achieve.
And I think we should congratulate them for that, right?
And I think you can also see how valuable it is to have someone
leading your product function who comes with a very deep knowledge and
understanding of the problem space, and it's awesome that this is happening here,
because Taylor was a practitioner.
He was dealing with this.
So he can empathize with the user, and he can build something and
iterate much faster, converging on the solution much, much
faster compared to other products out there.
So yeah, that was like super refreshing and super encouraging.
And like, it was like lovely to chat with him and hear all the opinions
and share the knowledge that he has
about how to build a product
that is going to be successful in the long term
and not just trying to capitalize on the hype today,
which is great.
Yep, I love it.
Well, if you enjoyed that,
many more great episodes and guests to come.
Subscribe if you haven't and we'll catch you on the next one.
Reach Eric Dodds at eric@datastackshow.com. That's E-R-I-C at datastackshow.com.
The show is brought to you by Rudderstack,
the CDP for developers.
Learn how to build a CDP on your data warehouse at rudderstack.com.