Drill to Detail - Drill to Detail Ep. 113 ‘Data Teams, Portable and Data Integration’s Long Tail’ featuring Special Guest Ethan Aaron
Episode Date: October 2, 2024
Mark Rittman is joined in this episode by Ethan Aaron, Founder & CEO of Portable, to talk about what you should do if you’re the first data hire at a company, when and when not to hire a consultant, the founding story of Portable.io and the long tail (and economic model) of the data integration connectors market.
Ethan Aaron LinkedIn Profile
Portable.io Homepage
“The biggest misconceptions about data integrations..” (LinkedIn)
“You join a 100 person company as the head of data. What should you do?” (LinkedIn)
“The stuff no one will tell you about running a data team” (LinkedIn)
Transcript
So welcome to this second episode in a new series of Drill to Detail, sponsored by Rittman
Analytics, and I'm your host Mark Rittman.
So I'm very pleased to be joined this episode by none other than Ethan Aaron.
So welcome to the show, Ethan.
Thanks so much for having me, Mark. Excited for the conversation today.
Thank you. So Ethan, for anyone who doesn't know you,
maybe just introduce who you are and what you're currently doing at Portable.
Totally. So I'm the founder and CEO of Portable.
Portable is an ELT tool. So we help data teams extract data from 1500 different applications and centralize it into
their data warehouses, Snowflake, BigQuery, Redshift, et cetera.
And my background before this, I was running the data team at LiveRamp.
I was the data person at a startup, and even today inside of Portable, I am the data person.
So a combination of both building integrations, helping data teams with ELT, as well as thinking
about best practices and strategies for managing internal analytics or external analytics,
depending on how your data team creates value.
Okay, fantastic.
So how did you get into, I mean, you mentioned that you were doing work at LiveRamp and so
on, but how did you get into, I suppose, the work you do now? Because I
think you were at Goldman Sachs first of all, where you were doing a different sort of work. Just
give us a bit of a story about how you got into this kind of world.
I love to just learn new stuff, is the answer on that. So undergrad, I studied mechanical engineering and business because I
couldn't decide if I liked technology more or I liked business more. Then I went to Goldman Sachs and I was doing real estate investing. So I was doing
real estate private equity. We were buying office buildings, multifamily units, et cetera.
And it seems like a finance job, which it was, but I found myself getting really excited about
the operational side, about, hey, we have all
these meetings, certain documents, can we centralize them into one place so we can
slice and dice across all the information we have? So I was getting really
excited about the operational side and less excited about the actual deals. So
a couple of years in, I was like, I need to go to a startup. Something about it just felt
right. So I joined a 12-person data startup called Arbor. And I knew nothing about data.
I knew nothing about startups. And I knew nothing about sales. My job was supposed to be sales.
But I was like, that sounds amazing. I get to learn about all these cool things.
So I showed up on the first day. And the CEO, instead of saying, hey, great, let's do sales. His question was, do you know SQL? Do
you know Shell Script? Can you build dashboards and implement customers? And I was like, no,
but I can figure that out. So I bought five or six books on what I thought were good books on SQL.
Looking back at one of them now, I found out
it was just a book on MySQL databases, which I didn't know. It had the word SQL in it.
So I was reading all these books trying to figure out how this stuff works. Same thing on the
shell script front. And then I was just banging my head against our production database, or rather a read replica of it.
And the goal was, what insights does our CEO need to run the business and track how things are going?
So that's where it started, was really just that day where someone was like, hey, we need insights.
Like, can you figure this out?
And I was like, yeah, that sounds fun.
And that was 2016.
So it's been eight years and I've gone pretty deep into this world at this point.
Okay, excellent. And then you ended up at LiveRamp. So I've heard the name LiveRamp a few
times. What did you do there then?
So we got acquired by LiveRamp. And the first year I was
the head of product for publishers. So LiveRamp is a big ad tech, martech company. It works with the Fortune 500
and moves petabytes of data.
It's an integration company.
And I was helping publishers. Did that for about a year.
And then I looked around and realized LiveRamp was a thousand person division of
a publicly traded company and it didn't have a centralized analytics
function. So I wrote the job description to become the head of business intelligence at LiveRamp.
And the exec team said, that sounds great. Do it. So kind of similar to my first role in data,
where I showed up and they were like, do you know SQL? And I was like, no, but I'll figure it out. Same thing kind of happened at LiveRamp where I was like, do I know how to stand up all of our
data architecture and infrastructure? And do I know what our execs care about? At the time, no,
but it was a phenomenal way for me to learn what matters to executives inside of a thousand person
company. How do you find that out? And a lot of it just comes down to
having lots of conversations with lots of people and asking them what they care about. And then
also looking internally at the various data infrastructure we already had in place,
because, shock, we did. We didn't have a centralized team, but marketing had their own
data warehouse with their own tooling, product had a different data warehouse with their own tooling,
and in finance someone had set up their own data stack. So we had these pieces, but none of the people
were talking to each other. And it was a phenomenal place for me to go look at, okay, here are all the
tools we could be using, here are all the approaches we could take. Is there one approach that would
work where we could centralize all of this, so that we don't have to restart three times every time we want to answer a question? So I did data there for about a year and then moved over to work for the
chief strategy officer and head of M&A. And I spent about a year taking a step back. Instead of
standing up infrastructure, interviewing execs to figure out what matters, and actually solving problems like writing queries against our data sources, I was looking at the data integration
ecosystem through the lens of partnerships and acquisitions. So a combination of anything from
customer data platforms and iPaaS tools to ETL tools and ELT tools to anything in the martech and
ad tech landscape. And I was coming at it through the lens of, what's out there that could
be a strategic asset to LiveRamp? And as I was digging in, I got really excited
about some of the no-code ETL and ELT capabilities.
And that's really what kicked off the idea for Portable.
Okay, okay.
We'll come back to that in a bit,
because obviously there's a massive area to talk about there.
But really, I suppose the way I'm particularly sort of aware of you
is that you're a bit of a man with an opinion, really.
And you post on LinkedIn, you run sort of low-key data meetups. So let's
get on to some of the things that you've been talking about. And in particular, I want to talk
about some of the thinking you have around, I suppose,
you know, when you start as a company's head of data, what should you do?
What should you be prioritizing?
Totally. And my general advice,
both when you start as the head of data at a company or just today if you're already inside
of a data team, is to take a step back and realize that the reason why you are there is to create
value for other people inside the organization. You are not going to go close the next deal. So you have to be there
effectively in service of other people inside the company. So the first thing I recommend
when you join, or even today, if you haven't done it in a while is go figure out what matters
to the people running the company. And hopefully like you are, or you become one of the people
running the company. But even if you are, you still need to figure out what everyone else cares about.
You have to go talk to the CEO.
You have to go understand what the board is asking about.
You have to talk to the CRO or the CMO or the head of HR.
And I've worked in analytics, I've worked in strategy, I'm CEO now.
And from the top down, companies have a plan. They have a strategy. They say, hey,
we could do a million things, but the best three things for us to do are A, B, and C.
If you are working on things as a data team that are not A, B, and C, the three things that are
top priority for the CEO, for the strategy of the business, and for the board, people are going to ignore it. It's not going to be high value, and they're not going to view you
as a critical contributor to the future of the business. And you immediately get relegated to,
oh, the data team's a cost center over there on the side, not we need this person to have a seat
at the table when we're talking about the future of our business. So I think the wrong answer is to go look at the data
or look at the tools. The right answer, in my opinion, is figure out what the people
at the company need to make better decisions and run the business, and start there.
You could close your computer
for the first week and just go talk to people. If it's a virtual company, you have to have your computer,
but that's always where I start.
Yeah. Do you find that's particularly the case now? People, or
organizations I suppose, are less interested in investing in data for the sake of data,
and they're looking to invest in it for certain reasons or for business purposes. Do you find that's particularly an issue now, where
the desire just to do data for data's sake is less there?
Definitely. Oh, 100%. It's one of those things where five years ago and today,
a lot of companies have similar sized data teams. Let's say you're a hundred person e-com brand. Five years ago, you had one data person, maybe two. Today,
you have one data person, maybe two. So you would think that they were very similar environments.
They are not. Five years ago, they had just hired their one or two data people and they
didn't know why they were there. They
didn't know what they were going to do. They didn't know how big of an opportunity data was. So it was
this: you want more money for your data team? You want more time to go do R&D? Go for it. And
you end up with this data for data's sake thing. And the sad part is that one to two person
data team from five years ago
grew to five people.
And then it grew to 10 people as rates were super low and valuations were crazy.
And no one along the way was asking the question of what do our execs care about?
How am I going to justify the 10 person headcount, which could be millions of dollars?
And then that team got cut down to two people
again. And so in the environment we're in today,
in my opinion, execs don't trust most data teams. They look at them as like, I trusted you before.
You had 10 people. Didn't work. Now we're down to two.
And the question is, why do we have two?
Do we need two?
Or do we need one?
Do we even need one?
So it's this question of along the way, we always should have been asking the question
of how do we add value to the business?
But we had the benefit of trust five years ago. Whereas today we don't,
which is kind of good because it means you have to justify everything. It's painful. A lot of
people don't like doing it because five years ago they didn't have to. But in today's market,
the execs have seen it not work, and it's like the whole fool me once thing.
They need to know that every additional headcount or every additional
hundred thousand dollars you want to spend on tooling is going to pay off five times over. So
that's kind of the market we're in today. And it just means you can't go stare at an IDE for 40 hours a week and assume people will be happy with that.
You have to go find the stuff that matters to the business, and then you can go do the stuff that you might think is fun.
Do you think really over the last, say, five years, the role of the head of data has changed a fair bit?
It used to be really more about how quickly could you hire a team and who do you know?
What do you think?
I don't think five years ago
people thought it was a
can-you-hire-everyone type of role.
That's what it turned into
because the budget was just there.
I think that's kind of not just a data thing.
That's been everywhere.
It's like you joined a company five years ago,
most companies were growing quickly and you could go from a one-person team to a three-person team to a 10-person team.
So that was kind of the default.
There was a lot of cool technology that was also coming out.
All the tech companies in the data world were wildly overfunded, which meant they could give things away.
They didn't have to charge for stuff.
You could try them all and it seemed fun. If you joined five years ago and you said, I need more headcount, I want
to hire a team, people would be like, yeah, that sounds cool.
Let's do it.
That'll add more value somehow.
Today, there's a lot of companies where there's no expectation you'll be able to hire anyone.
There's no expectation your team's going to grow.
So you have to realize that the team you're in today is probably going to be the same size, if not smaller, in a year.
And people, not just leaders of data teams, but people on data teams, need to realize the
skill set that's valuable today is a generalist skill set.
Whereas the skill set of
five years ago was a hiring skill set. And three years ago was a specific skill. You could be a
data scientist, or you could be an analytics, or you could be a data engineer because you're on a
10 person team. You were a data scientist two years ago on a 10 person team. And now your team
is two people. You're no longer a data scientist.
Like you are now a generalist and you have to be a generalist if you want to
do what needs to be done for your business. So I think what's being rewarded today
is people that can be scrappier, that can learn more parts of a tech stack, that can do more with less.
Three years ago that was different, five years ago it was different, but today it's:
create more value than you cost the business, and do it efficiently.
Okay, okay. Something
I'm noticing in the UK is that there are fewer
projects or initiatives coming along where it's an organization that off its own back is saying we want to invest in analytics to achieve this thing.
You know, where we're finding there is project work is more where maybe they're doing a transformation.
They're moving from, say, on premises to cloud or whatever. You know, we're seeing less over here of, I suppose, projects that are being started
voluntarily by organizations to try and improve their revenue, for example, using data. Is that
maybe just a UK thing? Or are you finding that in the US as well?
I think a lot of the hype stuff of just like, let's make an R&D investment into a thing has
shifted from data and analytics to AI. So I think you've seen more of those types of events, like
execs in bigger companies and even smaller companies really have one thing at a time
where they're like, I need to make this bet because my board is telling me to make this bet.
Should they make it? I don't know. But like five years ago, that was data. That was like,
okay, great. We need to do this. Let's hire the one person. Oh,
the one person says we need two more people and $100,000 in software. We've got to make this
bet because it's somehow going to pay off. And there was this blind trust. I think a lot of that
model has now shifted into AI. So I've seen a lot more of that. It's kind of sad, but the reality is
I've seen data teams where they were small, then they grew, they got a bunch of funding for just this analytics wave
a few years ago, and then they were starting to downsize. And then the board came to them and said,
can you help us with AI? And they had a very real question of, do they say, no, we're
the data team, and continue to decline?
Or do they say, sure, we'll help you with AI, and take the exploratory budget?
So they took the exploratory budget, in one case at least.
So I think that's what's shifted.
That big bet has been replaced by AI in a lot of companies.
And then the other part is I think we've lost a lot of that trust.
We have to earn it back.
Someone's not going to come back in and say,
let's invest in data for data's sake again
after it didn't work the first time
and they had to fire all those people.
So you have to go in and be like,
here's the business problem we're going to solve that's going to save half a million dollars a year.
That's enough to hire one person.
That's all we're going to ask for to start.
If we can find another $3 million in savings, let's do it.
But the math has to be there now.
It didn't have to be five years ago.
Okay.
So again, a lot of things you've been talking about in the past have been about saying no.
So saying no to every data request that comes through, saying no to
creating reports for people who aren't the ultimate bosses but
report to those bosses, and also saying no to expensive technology for the sake of it, really.
So tell us about that then. Why do you think that's important?
I've been in both, I've done both. So I worked at Goldman Sachs
right out of college. Goldman Sachs and big banks teach you a lot. I'm very, very happy I
did it. It taught me some great things, and taught me some things I had to unlearn. One of those was
you cannot say no. So in the banking world what you effectively have to do is you work
120 hours a week. That's the answer. So you get everything done,
but you say yes to everything. Someone could say, oh, print these papers, and you have to go print
the papers. Is it worth doing that instead of doing something else?
Doesn't matter because you're just going to get it all done because you have to work 120 hours.
Most people don't. Most people don't work 120 hours and they have to make some sort of trade
off. And when I went to Arbor after Goldman, I learned a ton from the CEO there of just the
importance of saying no, because it frees up
time to focus on the highest value things. So if you think about it,
everyone has some amount of hours they can work in a week. For some people that's 40,
for some people it's 60, for some people it's 120. Let's say it's 40. If you only have 40 hours a week and you say yes to 39 one-hour tasks that
are low value, you only have one hour left to work on the highest value thing for your
company.
Whereas if you say no to 39 things, you now have 40 hours a week to go try and solve the
biggest problem your company faces right now.
If you can solve that problem, I promise you it's worth
a lot more than 40 times solving all the others. The power of that
didn't sink in for me until I had kind of unlearned everything I'd learned before. But if you go to an exec in your company
and you say, hey, I could do these 30 things, or I could do that one and it would be great,
and it would add $5 million to our top line or save us
$2 million on our bottom line, most execs are going to tell you to do the big one. They're
going to say,
do the thing that I care about. Ignore all the other stuff that the junior person that works
for the junior person on my team has asked you for. You can't say no if you haven't identified
the high value thing, because then you're just saying no and you're not doing any work.
So it all comes down to say no to low value stuff in service of doing high value things.
But for that to work, you also need to be finding the high value things.
You have to spend the time to identify what are the biggest things you can do for the
business.
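As a rough illustration of the 40-hour arithmetic in that answer, here is a minimal sketch. It is not something from the episode: the function name and the default one-hour task size are assumptions borrowed from Ethan's example.

```python
def hours_left_for_top_priority(weekly_hours: float,
                                small_tasks_accepted: int,
                                hours_per_small_task: float = 1.0) -> float:
    """Hours remaining for the highest-value problem after small tasks are accepted."""
    committed = small_tasks_accepted * hours_per_small_task
    # Time cannot go negative: once the week is full, nothing is left over.
    return max(weekly_hours - committed, 0.0)


# Saying yes to all 39 one-hour requests in a 40-hour week.
print(hours_left_for_top_priority(40, 39))  # 1.0
# Saying no to all of them.
print(hours_left_for_top_priority(40, 0))   # 40.0
```

The numbers only show where the time goes. Whether the big problem is really worth more than the 39 small ones is the judgment call the answer says you have to make first.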
So to your mind, where do you see consultants adding value at the moment?
And do you see consultants being used?
And do you think infrastructure requires bringing in specialists these days to set up?
Yeah, I have a lot of perspectives on this.
So when I think about data teams, consultants are great. It depends on the situation. Let me
explain. There are, in my opinion, two very different skill sets you need out of a data
function. So let's say you want to have a one person data team. You can extrapolate this out to bigger teams too, but let's say you want to have
a one person data team. And right now you have nothing. Option number one, what a lot of people
default to is we need a data team of one. I'm going to open a job just like I do for anything
else. And I'm going to hire the data person. And the problem with that is the hiring manager,
could be the CEO, could be the CTO, could be the CFO says, I'm going to open a job for a head of analytics.
They need to know SQL, a BI tool that I've randomly picked and some other tech that someone told me is cool.
They don't know how to evaluate that person.
They don't know how to hire that person.
They don't know how to measure the success of that person.
Even people that have been doing this for a while don't know how to measure the success of a data team.
But once you hire that person, they have to do two fundamentally different jobs at different parts of their life cycle.
The first one month, six months, they have to stand up all the stuff. They have to actually go from, we have nothing to,
we need a data warehouse, a BI tool, someone to get data from our systems into the data warehouse.
And we need to actually get value in front of someone. You can add all the other tools
if you want, but the minimum viable tech stack they have to stand up is warehouse, BI tool,
some way to get data in. And the problem is, and
this was kind of me, not kind of, this was me at LiveRamp, is if you hire someone
in to do that who isn't constantly doing that, it's going to take them a lot longer. Like for me,
the amount of time it took me to stand up our data stack at LiveRamp, I was learning it all from scratch.
I didn't know the latest.
I didn't even know the options for warehouses or BI tools or ETL tools.
So they were effectively paying me, which was a great learning experience for me, to research the market to understand which tools I should use
to do the thing. And then because I was researching it and I was new to this, me and everyone
around me had to question every decision. It's like, do we use BigQuery or do we use Snowflake?
Do we use this ELT tool or that ELT tool? Which visualization tool do we use?
And the problem with that is it makes everything take 10 times as long. It is going to take that full-time hire 10
times as long as if they hire someone like you or another consultant and just say, come
in, please stand up my data stack.
And you'll come with an
opinion. You'll come and be like, I've seen this before. I just stood something up last month.
And I know the nuances of your business. I know because you're pulling this type of data,
we can't use this. Or, oh, you're a healthcare company, therefore HIPAA applies,
which shrinks everything we can do in terms of tooling. Here's the other approach.
The standing up of all the infrastructure
is something that consultants do nonstop. So in that situation, the premium you pay,
and a lot of the times it might not even be a premium, but the idea of going to a consultant
and saying, I need all this stood up, here are my requirements. You'll get opinions, you'll get
speed, and you'll get everything
working together. And it'll also give the hiring manager, who might not know a lot about data,
enough context on the tools, the outputs, and how to think about it from working with the
consultant that they can then hire the person to do the steady state. The steady state person can also have some
opinions about the tech stack, but the steady state person you probably want full-time.
It depends on the company, but you probably want a full-time person who is going to walk the hallways
of your business and spend time constantly saying, CMO, what matters to you right now?
Director of marketing,
what matters to you right now? CRO, what matters to you right now? So it's like,
there's two different jobs. The first job is to stand everything up. The second job is walk the
hallways and find out what matters to the business strategically every month, every quarter.
The first job, in my opinion, it's cheaper, faster. You're going to get better solutions from paying a great consultant
to do it quickly. The second one, there's actually two answers to the second one. If you can't afford
a full-time person, you hire consultants on retainer for cheaper than a full-time person,
which is another scenario where it makes sense to just use freelancers or consultants.
Once you're effectively paying enough to justify a full-time equivalent,
then the question is just where is the best talent?
There's great IC and manager talent out there that will work in-house.
It is more difficult to find than finding great consultants,
in my opinion.
So that's how I would describe it.
Yeah.
Yeah.
Okay, okay.
So let's talk about Portable then.
Okay, so obviously, when you were at LiveRamp,
you looked at the market, and there were a fair number of data integration and data extraction products out there, back then and now. So
what was the problem you were trying to solve? What was, I suppose, the gap in the market you
were trying to fill with Portable that wasn't
being solved before?
If you talk to enough data integration tools, you realize they all
prioritize in the same way. They all say, what connector do you need? And they say it to the
market. And what that leads to is everyone comes in and they say, I need Salesforce. I need
Postgres. I need NetSuite. I need the most common things, Facebook ads, Google ads, Shopify. And
that's great because the market for each of those is very, very large.
The number of people that use Salesforce is very large.
The number of people that use Shopify is very large.
So there's a big market there.
What that ends up leading to, though, is you end up with 100 ETL tools that all have the same connectors.
So you end up with 100 tools that all have Salesforce same connectors. So you end up with a hundred tools that all have
Salesforce and they all have Shopify and they all have Postgres and MySQL. And as a data person,
great, you have a hundred options for Salesforce. But
let's say they all have a hundred connectors. When you need connector number 101, the thing that
none of them prioritize, the problem is none of them have built it. So they've all built the same
thing a hundred times over and none of them have built any of the other stuff. So it's
this supply and demand imbalance, in my opinion: there is more demand for Salesforce than there is for some niche CRM system
for a vertical. However,
even though there's less demand for the niche CRM system, there is zero supply for it.
So for us, the question we asked was
is there a way to build all the stuff no one else
has ever decided to build?
And the big challenge there is just you need to be able to do this more efficiently because
if the market for that niche CRM system is very small, you can't make a three engineer
investment into building it.
So you have to be really, really efficient at building those connectors.
So when we started, our vision was, is it possible
to build 10,000 integrations? The high watermark at the time in our space, in the ELT space,
was 150 integrations. So we were trying to 10x that and then 10x that again. That was our vision and
mission. And it sounded absolutely insane.
I talked to founders of other companies in our space
and they were like, this scales with people, impossible,
not gonna happen.
But that's been our goal is,
is there a way to build 1000 integrations?
And we started with the long tail.
We at this point have 1500 plus integrations,
no code integrations.
People can log in, put in
their credentials, click run. And it just means that if someone needs a niche connector, we build
it today. Yesterday I built a connector on a call with a prospect, or rather an
existing customer. On the call, they were like, hey, I really need your help. Is there any chance
your team's going to be able to support this integration? I was like,
can you send me the docs? And by the end of the call, this was a 30 minute call,
it was live in Portable. They had put in their API key, set up a new schema in Snowflake,
clicked run, and they were looking at the data in Snowflake. And this was a tool that they had
stood up the day before, and it is a startup.
There is no other,
no one else is going to build that integration.
We built it on a call.
And now it runs every hour for them through Portable.
So that's where we focused up until now: all these things that no one else wants to support.
We have a path to build 10,000 integrations.
Most of where
we're spending our time now actually is the biggest integrations.
So, okay, the obvious
counter to what you just said there is the approach that Stitch took and Airbyte are taking now,
which is to open source the connector part and then
crowdsource those connectors. And I suppose if there is demand out there,
then logically, you know,
the open source community would kind of rise to that,
although there's not necessarily a monetization
sort of strategy there.
So, okay, why are you going to succeed?
And why did Stitch maybe, you know,
you could say fail, but they were bought by Talend.
But why are you going to succeed, and why are Stitch
and Airbyte not going to solve that problem?
Totally.
So a couple of things.
One, I actually would say Stitch is one of the biggest success stories of any
ELT out there.
They didn't raise any money and they sold for $60 million.
So they pioneered a lot of the stuff that is being used by someone like Airbyte and other
people in the open source world and just helped define the category.
So I have a ton of respect for their team, what they've built, the product, et cetera.
I do think they proved that open source connectors are bound to fail though, is the answer. And I think Airbyte has also reproven the same thing. And let me explain this. So if you think about an open
source project, like a database, like Postgres, you've got one thing that thousands of companies,
if not more, are all working on together. If there's a problem with that one thing,
someone is going to raise their hand and say, I need this fixed. I'm going
to open a PR and update this thing. Amazing. You can now draw the analogy and say, wow, like
Singer is the same thing or Airbyte open source, same thing. Like if there's something wrong with
Airbyte open source, I'm going to open a pull request and contribute
to it. Great. They have however many contributors, a thousand contributors. That's not how integrations
work though. The way integrations work is every single connector in the Singer catalog is its own
open source project. And when you look at contributor counts
and you look at how many people actually care
about contributing,
the Postgres connector from Singer
is going to be 500 people that care about it.
The Salesforce connector is going to be 400 people
that care about it.
The Pipedrive connector is going to be 20 people
that care about it.
When you get to a niche connector, one person cares about it. They're the one that maybe opened the initial pull request. And then the question there is what really incentivizes someone in one of these long-tail open source connectors to resubmit whatever change they made versus just fork it and write the code themselves. Because there's no community around connector number 200. No real incentive for anyone to push a change back to Singer or Airbyte for that connector.
So the way to think about open source integration solutions is fundamentally different than the way you have to think about something like Postgres or a single open source project. Each of those tools is 100 to a thousand open source projects. When something breaks for a customer, who's on call? Who fixes the problem? If it's Postgres, someone will raise their hand and say, I'll fix this. If it's not, someone's just going to go write the code in their own local deployment of whatever that connector is. So that's the dynamic at play in the open source world.
It's why, for the whole long tail of integrations, it's an answer, and it's better than "we can't support it at all." Their answer is: write the code and maintain it yourself and we'll run it for you.
But you have to maintain it yourself. So the biggest difference between us and them is
we treat it as our problem. If we build connector 899 for you and it breaks, our team's going to
reach out to you and tell you what happened and
what we're doing to fix it. And if we can't fix it, what you need to do to help us to go fix it
on your behalf. We treat every one of these as our problem. And if something fails, I'm looking at it,
our team's looking at it, and we're saying, what do we do to fix this for our customers? Not
how do we make our customers deal with this problem themselves?
So that's the fundamental difference between us and the open source integration world is it's our problem.
If things do not work, it's my problem.
It's not our customer's problem.
Okay.
So I suppose another difference I've noticed is the way you price things. It's become a bit of a meme over the years how much it costs to run, say, Fivetran, although obviously it can be a good service. And everyone's moving to consumption-based pricing, but you're not. So why is that, and how do you make money?
Yep. You kind of have to think about the dynamics in the market on this. I get this question a lot. I'm like, hey, fixed fee: we'll move your Salesforce data for a fixed fee. We'll move 10 of your data sources for 1,500 bucks a month. I don't care how much data you're moving. You can move a hundred million Klaviyo records a month. I don't care.
And people come to me and they're like, this is stupid. This is a bad business. What are you doing? I don't want to work with you if every time we move data, we're losing you money. And that's a very reasonable question to ask. The answer is the narrative that's been
spun in this world is the cost to move one record every month is wildly expensive for every ELT tool.
And part of that's true. And part of that's wildly false. The part that's wildly false is that the
compute and networking to copy and paste data from point A to point B without really any
transformation is expensive. It's not, is the fully transparent answer. The price,
like you paying $70,000 a year to copy and paste your Salesforce data is not $65,000 a year in
networking cost for an ELT vendor. Not even close is the answer. So that's the part that's wildly false. People think that the economics of an ELT tool are data volume, data volume, data volume.
That's not how it works.
The economics of an ELT tool are much simpler than that.
How many employees you have is by far the most expensive line item on your income statement.
What does that mean? We're a very, very lean team. We're less than 10 people at this point. We've invested years in building a platform on which we can build and maintain integrations at scale for very, very large companies. Whereas someone like Airbyte is 100 people. 100 people, $200,000 apiece, $20 million a year in headcount cost.
I know that that dwarfs their cloud costs.
And it's open source, so they're actually punting all the cloud costs onto their customers.
So they're not paying anything for networking.
Like, why do data volumes matter at all if you're running it inside your own environment? So there's that one. And then someone like Fivetran has 1,200 employees.
Now you're talking almost $300 million in headcount cost. That's why the price is so high.
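The headcount arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the employee counts and the $200,000 per-person figure are the rough numbers quoted in the conversation, and the function name is made up for this example.

```python
def annual_headcount_cost(employees: int, cost_per_employee: int) -> int:
    """Rough annual headcount cost: number of people times cost per person."""
    return employees * cost_per_employee


# Per-person figure quoted in the conversation (an estimate, not audited data).
COST_PER_EMPLOYEE = 200_000

hundred_person_team = annual_headcount_cost(100, COST_PER_EMPLOYEE)
large_team = annual_headcount_cost(1_200, COST_PER_EMPLOYEE)

print(f"100 people:   ${hundred_person_team:,} per year")
print(f"1,200 people: ${large_team:,} per year")
```

At a flat $200,000 per person, 1,200 employees works out to $240 million a year; the "almost $300 million" figure mentioned above would correspond to roughly $250,000 per person.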
Fivetran, as the largest player in the space, is the one that's dictated the pricing model.
And they effectively use it as a proxy for value; that's the reason why everyone uses volume-based pricing.
It's a proxy for value.
Bigger companies have more data.
They can afford to pay more.
Therefore, in order to find some way to make more money from big companies
that appears objectively fair, it's more data equals more value. If you're moving Google Search Console data
to figure out how your SEO is working, and if you're moving HubSpot data, and if you're moving
Calendly data, Google Search Console data should not cost you $50,000. It's not that valuable.
And your HubSpot data is probably pretty valuable,
but it shouldn't cost you $30,000 a year.
So it's one of those things of,
like the value to customers is
they just want their data sources in a place
so they can build the dashboards they need
and run their business.
There's a reasonable price they should pay to be able to do that. And they shouldn't have to worry about it every month, Mark. I have people all the time pinging me being like, my bill went from $200 a month to $5,000 in a week. That's like an existential risk to my job. Or, we did a backfill and we just spent $10,000 worth of credits in a day. And that's a scary spot to be in as a data team when it's out of your control and you're talking those types of swings in terms of money.
So when I think about it, we can be, economically, a very solid business.
As long as we work with customers that are happy, we're efficient with our headcount, and
we have the tooling to do things efficiently. And if we do that, we can make our customers'
lives significantly easier by saying, you pay us a fixed fee.
We move your data.
You don't have to worry every day: did you do a backfill of 190 million records?
It's not going to cost you anything.
It's already included in paying for Portable.
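To make the contrast between the two pricing models concrete, here is a minimal sketch. The $1,500 monthly fee and the 190-million-record backfill come from the conversation; the per-million-row rate is entirely hypothetical and is not any vendor's actual price, and real volume-based pricing is usually tiered rather than linear.

```python
def fixed_fee_bill(monthly_fee: float, rows_moved: int) -> float:
    """Fixed-fee pricing: the bill is the same no matter how many rows move."""
    return monthly_fee


def volume_bill(rows_moved: int, price_per_million_rows: float) -> float:
    """Simplified volume-based pricing: the bill grows linearly with rows moved."""
    return rows_moved / 1_000_000 * price_per_million_rows


FIXED_FEE = 1_500            # dollars per month, as quoted in the conversation
HYPOTHETICAL_RATE = 50.0     # dollars per million rows, invented for illustration

for rows in (2_000_000, 190_000_000):  # a quiet month vs. a large backfill
    fixed = fixed_fee_bill(FIXED_FEE, rows)
    by_volume = volume_bill(rows, HYPOTHETICAL_RATE)
    print(f"{rows:>11,} rows  fixed: ${fixed:,.0f}  volume-based: ${by_volume:,.0f}")
```

Under these made-up numbers the fixed-fee bill stays at $1,500 whether or not a backfill happens, while the volume-based bill swings with the row count, which is the unpredictability being described.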
So that's what's been going on: the markup on one row going through compute and networking is so high that... that's what we're trying to...
Okay. So you also said, earlier on in the conversation, that within a few minutes on the phone you put together an integration. So is it easy today to create integrations? Are you just getting an OpenAPI spec and using ChatGPT and so on? Where are the difficult parts in it, and where are things like AI actually making you faster at this now? So where's the complication, and what is happening now to make it easier for you?
There's kind of a twofold answer to this. One,
we are very very specialized in what we do so like I have personally built anywhere from 700 to 800
integrations myself I've read thousands of sets of API documentation. So there's a big difference.
Like I have to be able to decipher, me and people on my team have to be able to decipher,
is something as simple as we just pull in the, like you look at the open API spec,
you map some stuff and it moves the data. Or is there something about it that's off? And if there's something about it that's off, then the question is, are we able
to handle that? If not, what do we do to handle it? So like even today, so I got a request for
a connector from someone I looked at, I was like, oh, this looks feasible. And then I spent a little
more time getting set up. And I was like this is a non-standard way to think about
authentication. And the only way, like the only way I can tell that is I have to actually go
through their partner approval process, create an application, like read the things and then find
out that like, oh, wait a second, this isn't OAuth, AuthCode is a made up thing that they created.
And you could parse the open API spec,
like if you can't authenticate with the API
and it's not standard,
like you need to figure that out.
So APIs are a spectrum. Sure, there's a very, very small number of APIs that have a perfect OpenAPI spec, use standard authentication mechanisms, and accurately document their schemas, their parameters, et cetera.
Then there's the middle tier, which is not really standard.
The docs aren't really great.
The pagination isn't defined.
Maybe you have to just try it and see what happens.
Or you have to call someone or talk to a support person, or you just have to realize that the 500 error that you would think is a server error is actually a rate limit problem.
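The "500 that is really a rate limit" case can be sketched as a retry wrapper. This is a minimal illustration under stated assumptions, not how Portable handles it: `call_with_backoff` and its parameters are hypothetical names, and `send` stands in for whatever function actually makes the HTTP request and returns a status code and body.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    throttle_statuses: frozenset = frozenset({429, 500}),
) -> Tuple[int, str]:
    """Call send(); if the status looks like throttling, wait and retry.

    For an API known to report rate limiting as a 500, that status is
    treated like a 429 and retried with exponential backoff.
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if status not in throttle_statuses:
            break
        # Delays grow as base_delay * 1, 2, 4, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body


# Demo with a fake API that "fails" twice before succeeding.
# The sleep function is swapped for a recorder so nothing actually waits.
fake_responses = iter([(500, "error"), (500, "error"), (200, "ok")])
recorded_delays = []
result = call_with_backoff(lambda: next(fake_responses), sleep=recorded_delays.append)
print(result, recorded_delays)
```

Which statuses belong in `throttle_statuses` is exactly the kind of per-API knowledge described above: it usually comes from trying the API or asking its support team, not from the documentation.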
Or then you get to the other end of the spectrum.
This is my favorite anecdote for connector development.
We were integrating with a shipping and logistics tool, and the API docs are a guy named Jesse. You just call Jesse, and Jesse explains to you how you make the API calls. That's the other end of the spectrum, where AI isn't going to help with that.
And then there's all the maintenance stuff. If the problem is the customer's problem, they can maintain it.
They can figure it all out.
That's the open source model: you build something, you write a Python script, and then the customer is going to spend their three hours a week, every week, for the entire year troubleshooting issues.
That's not your problem as the ELT tool. For us,
we also need to figure out how we efficiently support and
troubleshoot things. There's a lot that goes into that from our own internal tooling, monitoring,
and learning perspective. And it's not as simple as like, oh, here's a Python script, go deploy it.
If a connector can be written as a simple Python script, it's a commodity. Those are the first things that are going to be commoditized. It's the stuff that's not like that, where it's got weird edge cases. We just released Salesforce, and everyone has a Salesforce integration. But some of the API endpoints return CSVs. There are two different APIs: a Bulk API and a REST API.
The pagination for the Bulk API isn't standard.
It's using header parameters.
And they don't return all the fields.
You have to get a list of fields,
then take the list of fields.
And some of them aren't available.
So you have to actually remove them.
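The "get the list of fields, then remove the ones that aren't available" step might look something like the sketch below. It is an illustration only: the function names, the example field names, and the idea of keeping a set of known-unavailable fields are assumptions for this example, not Salesforce's API or Portable's implementation.

```python
from typing import Iterable, List


def queryable_fields(all_fields: Iterable[str], unavailable: Iterable[str]) -> List[str]:
    """Keep the described fields, in order, minus those known to be unavailable."""
    blocked = set(unavailable)
    return [name for name in all_fields if name not in blocked]


def build_select(object_name: str, fields: List[str]) -> str:
    """Build a simple SELECT statement over the remaining fields."""
    if not fields:
        raise ValueError("no queryable fields left")
    return f"SELECT {', '.join(fields)} FROM {object_name}"


# Hypothetical field list, as if returned by a metadata call.
described = ["Id", "Name", "LegacyScore__c", "CreatedDate"]
# Hypothetical field that the query endpoint rejected when it was tried.
rejected = ["LegacyScore__c"]

fields = queryable_fields(described, rejected)
print(build_select("Account", fields))
# SELECT Id, Name, CreatedDate FROM Account
```

How the rejected list gets populated is the hard part and is not shown here; in the account above it comes from trying requests and seeing what fails.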
So it's that type of stuff, where it's not documented. You have to just try stuff, talk to people, figure it out. So that's my take on it. I think there's stuff AI will help with, but in our company we've removed that. We don't really have a ton of space to do tedious stuff.
We spend a lot of our time on really weird, complex things
and trying to find whatever way possible to get customers the data they need.
Okay.
So I suppose one final area where you say no: if you go to your site and try to access it from the EU, you don't serve customers there. That's quite a bold move, isn't it?
Yeah, GDPR is the answer. So the high
level on this is there's two different ways to think about privacy, security, etc. One of them
is you open everything and then you remove things as you realize you shouldn't be doing business in
different parts of the world or things come up. Our approach has just been like, we would love to do business in Europe.
If it was just, this is GDPR, it's done, it is a box that is never going to change, it's much simpler for us to say, okay, great.
There's a price tag for us to support those rules. Let's do it. That's not how it works.
Every country has its own regulation. In the U.S., states have different regulation. In Canada, territories have different regulation around data. And they're not static. They change all the time. So for us, it's just a question of how many of those rules and changes we are able to stay up to date on from a legal perspective and a privacy and security perspective. It's not that we don't have the security measures in place. It's just the question of, tomorrow, if rules change in Europe, are we going to pay the lawyer in Europe to read the updates and tell us how that impacts our specific business? That's the part that we've held off on for now. But over time, we are excited to open up Europe.
It tends to be more complex because it's Europe,
but then there's also each country has their own perspective on things.
And as new cases take place and new things are introduced,
it just gets complicated.
The U.S. isn't that much... California has got its own rule.
Like it's changing.
So like I'm not saying it's any simpler here.
But it's just the question of complexity.
As I said, we're a super lean team.
So our answer has been, for now, to just say, hey, sorry, we don't support those other regions. That way we don't waste people's time and we're efficient. And then when we do open things up, we're excited to help.
And finally, Ethan, if somebody wants to find out more about Portable, how can they do that?
So to find out about Portable, you can go to portable.io. You can sign up and try it today.
You can also follow me on LinkedIn.
It's Ethan Aaron.
I post my thoughts.
If you have perspectives, feel free to comment.
I'll comment back or send me a message.
I'm happy to grab time.
Okay, brilliant.
Well, it's been fantastic speaking to Ethan.
Appreciate your time now.
So thank you very much and stay in touch.
Absolute pleasure.
Really enjoyed the conversation.