Drill to Detail - Drill to Detail Ep. 113 ‘Data Teams, Portable and Data Integration’s Long Tail’ featuring Special Guest Ethan Aaron

Episode Date: October 2, 2024

Mark Rittman is joined in this episode by Ethan Aaron, Founder & CEO of Portable, to talk about what you should do if you’re the first data hire at a company, when and when not to hire a consultant, the founding story of Portable.io and the long tail (and economic model) of the data integration connectors market.

Ethan Aaron LinkedIn Profile
Portable.io Homepage
“The biggest misconceptions about data integrations..” (LinkedIn)
“You join a 100 person company as the head of data. What should you do?” (LinkedIn)
“The stuff no one will tell you about running a data team” (LinkedIn)

Transcript
Starting point is 00:00:00 So welcome to this second episode in a new series of Drill to Detail, sponsored by Rittman Analytics, and I'm your host, Mark Rittman. I'm very pleased to be joined this episode by none other than Ethan Aaron. So welcome to the show, Ethan. Thanks so much for having me, Mark. Excited for the conversation today. Thank you. So Ethan, for anyone who doesn't know you, maybe just introduce who you are and what you're currently doing at Portable. Totally. So I'm the founder and CEO of Portable.
Starting point is 00:00:44 Portable is an ELT tool. We help data teams extract data from 1,500 different applications and centralize it into their data warehouses: Snowflake, BigQuery, Redshift, et cetera. My background before this: I was running the data team at LiveRamp, and before that I was the data person at a startup. And even today, inside of Portable, I am the data person. So I do a combination of building integrations and helping data teams with ELT, as well as thinking about best practices and strategies for managing internal or external analytics, depending on how your data team creates value. Okay, fantastic.
Starting point is 00:01:21 So how did you get into... I mean, you mentioned that you were doing work at LiveRamp and so on, but how did you get into, I suppose, the work you do now? Because I think you were at Goldman Sachs first of all, doing a different sort of work. Give us a bit of the story of how you got into this kind of world. I just love to learn new stuff; that's the answer there. In undergrad, I studied mechanical engineering and business because I couldn't decide if I liked technology more or business more. Then I went to Goldman Sachs and I was doing real estate investing, so real estate private equity. We were buying office buildings, multifamily units, et cetera. It seems like a finance job, which it was, but I found myself getting really excited about
Starting point is 00:02:02 the operational side: hey, we have all these meetings and certain documents, can we centralize them in one place so we can slice and dice across all the information we have? So I was getting really excited about the operational side and less excited about the actual deals. A couple of years in, I was like, I need to go to a startup. Something about it just felt right. So I joined a 12-person data startup called Arbor. I knew nothing about data, I knew nothing about startups, and I knew nothing about sales. My job was supposed to be sales. But I was like, that sounds amazing, I get to learn about all these cool things.
Starting point is 00:02:42 So I showed up on the first day, and the CEO, instead of saying, hey, great, let's do sales, asked: do you know SQL? Do you know shell scripting? Can you build dashboards and implement customers? And I was like, no, but I can figure that out. So I bought five or six books that I thought were good books on SQL. Looking back at one of them now, it was just a book on MySQL databases; I didn't know the difference, it just had the word SQL in it. So I was reading all these books trying to figure out how this stuff works, and the same thing on the shell script front. And then I was just banging my head against our production database, well, a read replica of it. And the goal was: what insights does our CEO need to run the business and track how things are going?
Starting point is 00:03:31 So that's where it started: that day when someone was like, hey, we need insights, can you figure this out? And I was like, yeah, that sounds fun. That was 2016, so it's been eight years and I've gone pretty deep into this world at this point. Okay, excellent. And then you ended up at LiveRamp. I've heard the name LiveRamp a few times. What did you do there? So we got acquired by LiveRamp, and the first year I was the head of product for publishers. LiveRamp is a big ad tech and martech company. It works with the Fortune 500,
Starting point is 00:04:08 moves petabytes of data. It's an integration company. I was helping publishers and did that for about a year. Then I looked around and realized LiveRamp was a thousand-person division of a publicly traded company, and it didn't have a centralized analytics function. So I wrote the job description to become the head of business intelligence at LiveRamp, and the exec team said, that sounds great, do it. So it was similar to my first role in data, where I showed up and they were like, do you know SQL? And I was like, no, but I'll figure it out. The same thing happened at LiveRamp, where I was like, do I know how to stand up all of our
Starting point is 00:04:50 data architecture and infrastructure? And do I know what our execs care about? At the time, no, but it was a phenomenal way for me to learn what matters to executives inside a thousand-person company. How do you find that out? A lot of it just comes down to having lots of conversations with lots of people and asking them what they care about, and also looking internally at the various data infrastructure we already had in place. Shockingly, we didn't have a centralized team, but marketing had their own data warehouse with their own tooling; product had a different data warehouse with their own tooling; in finance, someone had set up their own data stack. So we had these pieces, but none of the people
Starting point is 00:05:29 were talking to each other, and it was a phenomenal place for me to look at: okay, here are all the tools we could be using, here are all the approaches we could take; is there one approach where we could centralize all of this so that we don't have to restart three times every time we want to answer a question? So I did data there for about a year, then moved over to work for the chief strategy officer and head of M&A, and I spent about a year taking a step back. Instead of standing up infrastructure, interviewing execs to figure out what matters, and actually solving problems like writing queries against our data sources, I was looking at the data integration ecosystem through the lens of partnerships and acquisitions: anything from customer data platforms and iPaaS tools to ETL tools, ELT tools, and anything in the martech and ad tech landscape. And I was coming at it through the lens of: what's out there that could
Starting point is 00:06:33 be a strategic asset to LiveRamp? As I was digging in, I got really excited about some of the no-code ETL and ELT capabilities, and that's really what kicked off the idea for Portable. Okay, okay. We'll come back to that in a bit, because obviously there's a massive area to talk about there. But I suppose the way I'm particularly aware of you is that you're a bit of a man with opinions.
Starting point is 00:07:04 You post on LinkedIn, and you run sort of low-key data meetups. So let's get on to some of the things you've been talking about. In particular, I want to talk about some of your thinking around, I suppose, when you start as a company's head of data: what should you do, and what should you prioritize? Totally. My general advice, both when you start as the head of data at a company and today if you're already in that role, is to take a step back and realize that the reason you are there is to create value for other people inside the organization. You are not going to go close the next deal. So you have to be there
Starting point is 00:07:46 effectively in service of other people inside the company. So the first thing I recommend when you join, or even today if you haven't done it in a while, is to go figure out what matters to the people running the company. Hopefully you are, or you become, one of the people running the company. But even if you are, you still need to figure out what everyone else cares about. You have to go talk to the CEO. You have to go understand what the board is asking about. You have to talk to the CRO or the CMO or the head of HR. I've worked in analytics, I've worked in strategy, and I'm CEO now.
Starting point is 00:08:21 And from the top down, companies have a plan. They have a strategy. They say, hey, we could do a million things, but the best three things for us to do are A, B, and C. If you're working on things as a data team that are not A, B, and C, the three things that are top priority for the CEO, for the strategy of the business, and for the board, people are going to ignore it. It's not going to be high value, and they're not going to view you as a critical contributor to the future of the business. You immediately get relegated to: oh, the data team's a cost center over there on the side. Not: we need this person to have a seat at the table when we're talking about the future of our business. So, in my opinion, the wrong answer is to go look at the data or look at the tools. The right answer is to figure out what the people
Starting point is 00:09:18 at the company need to make better decisions and run the business, and start there. You could close your computer for the first week and just go talk to people. If it's a virtual company you'll need your computer, but that's always where I start. Yeah. Do you find that's particularly the case now? Organizations, I suppose, are less interested in investing in data for the sake of data, and they're looking to invest in it for specific business purposes. Do you find that's particularly an issue now, where the desire to just do data for data's sake is less there? Definitely. Oh, 100%. It's one of those things where, five years ago and today,
Starting point is 00:09:59 a lot of companies have similar-sized data teams. Let's say you're a hundred-person e-com brand. Five years ago, you had one data person, maybe two. Today, you have one data person, maybe two. So you would think they were very similar environments. They are not. Five years ago, they had just hired their one or two data people, and they didn't know why they were there, they didn't know what they were going to do, and they didn't know how big of an opportunity data was. So everything went upward: you want more money for your data team, you want more time to go do R&D, go for it. And you end up with this data-for-data's-sake thing. The sad part is that one- or two-person data team from five years ago
Starting point is 00:10:45 grew to five people. Then it grew to 10 people as rates were super low and valuations were crazy. And no one along the way was asking: what do our execs care about? How am I going to justify the 10-person headcount, which could be millions of dollars? And then that team got cut down to two people again. So the environment we're in today is that, in my opinion, execs don't trust most data teams. They look at them like: I trusted you before, you got to 10 people, it didn't work, and now we're down to two.
Starting point is 00:11:25 And the question is, why do we have two? Do we need two? Or do we need one? Do we even need one? Along the way, we always should have been asking how we add value to the business. But we had the benefit of trust five years ago, whereas today we don't, which is kind of good because it means you have to justify everything. It's painful. A lot of
Starting point is 00:11:55 people don't like doing it because five years ago they didn't have to. But in today's market, the execs have seen it not work. It's the whole fool-me-once thing: they need to know that every additional headcount, or every additional hundred thousand dollars you want to spend on tooling, is going to be paid back five times. So that's the market we're in today. And it just means you can't go stare at an IDE for 40 hours a week and assume people will be happy with that. You have to go find the stuff that matters to the business, and then you can go do the stuff that you might think is fun. Do you think that over the last, say, five years, the role of the head of data has changed a fair bit? It used to be more about how quickly you could hire a team and who you know.
Starting point is 00:12:46 What do you think? I don't think five years ago people thought it was a can-you-hire-everyone type of role. That's what it turned into, because the budget was just there. And I think that's not just a data thing.
Starting point is 00:12:59 That's been everywhere. If you joined a company five years ago, most companies were growing quickly and you could go from a one-person team to a three-person team to a 10-person team. So that was kind of the default. There was also a lot of cool technology coming out. All the tech companies in the data world were wildly overfunded, which meant they could give things away. They didn't have to charge for stuff. You could try them all, and it seemed fun. Five years ago, if you joined and said, I need more headcount, I want
Starting point is 00:13:32 to hire a team, people would be like, yeah, that sounds cool, let's do it, that'll add more value somehow. Today, there are a lot of companies where there's no expectation you'll be able to hire anyone. There's no expectation your team's going to grow. So you have to realize that the team you're on today is probably going to be the same size, if not smaller, in a year. And not just leaders of data teams but people on data teams need to realize that the skill set that's valuable today is a generalist skill set.
Starting point is 00:14:04 Whereas the skill set of five years ago was a hiring skill set, and three years ago it was a specific skill. You could be a data scientist, or an analyst, or a data engineer, because you were on a 10-person team. If you were a data scientist two years ago on a 10-person team and now your team is two people, you're no longer a data scientist. You are now a generalist, and you have to be a generalist if you want to do what needs to be done for your business. So what's being rewarded today is people who can be scrappier, who can learn more parts of a tech stack, who can do more with less.
Starting point is 00:14:46 Three years ago that was different, and five years ago it was different again, but today it's: create more value than you cost the business, and do it efficiently. Okay, okay. Something I'm noticing in the UK is that there are fewer projects or initiatives coming along where an organization, off its own back, says we want to invest in analytics to achieve this thing. Where we're finding there is work, the project work is more where maybe they're doing a transformation: they're moving from, say, on-premises to cloud or whatever. We're seeing fewer, I suppose, projects started voluntarily by organizations to try and improve their revenue, for example, using data. Is that maybe just a UK thing, or are you finding that in the US as well?
Starting point is 00:15:34 I think a lot of the hype stuff, of just making an R&D investment into a thing, has shifted from data and analytics to AI. So I think you've seen more of those types of events. Execs in bigger companies and even smaller companies really have one thing at a time where they're like, I need to make this bet because my board is telling me to make this bet. Should they make it? I don't know. But five years ago, that was data. That was like, okay, great, we need to do this. Let's hire the one person. Oh, the one person says we need two more people and $100,000 in software. We've got to make this bet because it's somehow going to pay off. There was this blind trust. I think a lot of that
Starting point is 00:16:16 model has now shifted into AI. So I've seen a lot more of that. It's kind of sad, but the reality is I've seen data teams where they were small, then they grew, they got a bunch of funding in this analytics wave a few years ago, and then they were starting to downsize, and then the board came to them and said, can you help us with AI? And they had a very real question: do they say no, we're the data team, and continue to decline? Or do they say, sure, we'll help you with AI, and take the exploratory budget? In at least one case, they took the exploratory budget. So I think that's what's shifted.
Starting point is 00:16:58 That big bet has been replaced by AI in a lot of companies. The other part is that we've lost a lot of that trust, and we have to earn it back. Someone's not going to come back in and say, let's invest in data for data's sake again, after it didn't work the first time and they had to fire all those people. So you have to go in and be like,
Starting point is 00:17:22 here's the business problem we're going to solve that's going to save half a million dollars a year. That's enough to hire one person, and that's all we're going to ask for to start. If we can find another $3 million in savings, let's do it. But the math has to be there now. It didn't have to be five years ago. Okay.
Starting point is 00:17:41 So again, a lot of things you've been talking about in the past have been about saying no: saying no to every data request that comes through, saying no to creating reports for people who aren't the ultimate bosses but who report to those bosses, and also saying no to expensive technology for the sake of it. So tell us about that. Why do you think that's important? I've done both. I worked at Goldman Sachs right out of college. Goldman Sachs and big banks teach you a lot. I'm very, very happy I did it; it taught me some great things, and it taught me some things I had to unlearn. One of those was that you cannot say no. In the banking world, what you effectively have to do is work
Starting point is 00:18:27 120 hours a week. That's the answer. You get everything done, but you say yes to everything. Someone could say, oh, print these papers, and you have to go print the papers. Is it worth doing that instead of something else? It doesn't matter, because you're just going to get it all done, because you have to work 120 hours. Most people don't work 120 hours, and they have to make some sort of trade-off. When I went to Arbor after Goldman, I learned a ton from the CEO there about the importance of saying no, because it frees up time to focus on the highest-value things. So if you think about it,
Starting point is 00:19:11 everyone has some number of hours they can work in a week. For some people that's 40, for some people 60, for some people 120. Let's say it's 40. If you only have 40 hours a week and you say yes to 39 one-hour tasks that are low value, you only have one hour left to work on the highest-value thing for your company. Whereas if you say no to those 39 things, you now have 40 hours a week to go try and solve the biggest problem your company faces right now. If you can solve that problem, I promise you it's worth a lot more than solving all the other things. The power of that
Starting point is 00:19:54 didn't sink in for me until I kind of unlearned everything I'd learned before. But if you go to an exec in your company and say, hey, I could do these 30 things, or I could do that one, and it would add $5 million to our top line or save us $2 million on our bottom line, most execs are going to tell you to do the big one. They're going to say, do the thing that I care about, and ignore all the other stuff that the junior person who works for the junior person on my team has asked you for. But you can't say no if you haven't identified the high-value thing, because then you're just saying no and you're not doing any work.
Starting point is 00:20:39 So it all comes down to this: say no to low-value stuff in service of doing high-value things. But for that to work, you also need to be finding the high-value things. You have to spend the time to identify the biggest things you can do for the business. So, to your mind, where do you see consultants adding value at the moment? Do you see consultants being used? And do you think infrastructure requires bringing in specialists to set it up these days? Yeah, I have a lot of perspectives on this. When I think about data teams, consultants are great, but it depends on the situation. Let me
Starting point is 00:21:15 explain. In my opinion, there are two very different skill sets you need out of a data function. Let's say you want a one-person data team. You can extrapolate this out to bigger teams too, but let's say you want a one-person data team, and right now you have nothing. Option number one, what a lot of people default to, is: we need a data team of one, I'm going to open a job just like I do for anything else, and I'm going to hire the data person. The problem with that is the hiring manager, who could be the CEO, the CTO, or the CFO, says, I'm going to open a job for a head of analytics; they need to know SQL, a BI tool that I've randomly picked, and some other tech that someone told me is cool. They don't know how to evaluate that person.
Starting point is 00:21:59 They don't know how to hire that person. They don't know how to measure the success of that person; even people who have been doing this for a while don't know how to measure the success of a data team. But once you hire that person, they have to do two fundamentally different jobs at different parts of their life cycle. In the first one to six months, they have to stand up all the stuff. They have to actually go from having nothing to having a data warehouse, a BI tool, and something to get data from our systems into the data warehouse, and they need to actually get value in front of someone. You can add all the other tools if you want, but the minimum viable tech stack they have to stand up is a warehouse, a BI tool, and
Starting point is 00:22:42 some way to get data in. And the problem is, and this was exactly me at LiveRamp, if you hire someone to do that who isn't constantly doing it, it's going to take them a lot longer. Take the amount of time it took me to stand up our data stack at LiveRamp: I was learning it all from scratch. I didn't know the latest; I didn't even know the options for warehouses or BI tools or ETL tools. So they were effectively paying me, which was a great learning experience for me, to research the market and understand which tools I should use to do the thing. And because I was researching it and I was new to this, me and everyone
Starting point is 00:23:34 around me had to question every decision. Do we use BigQuery or do we use Snowflake? Do we use this ELT tool or that ELT tool? Which visualization tool do we use? The problem with that is it makes everything take 10 times as long. It's going to take that full-time hire 10 times as long as if they hire someone like you or another consultant and just say, come in, please stand up my data stack. And you'll come with an opinion. You'll come in and be like, I've seen this before, I just stood something up last month, and I know the nuances of your business. I know that because you're pulling this type of data,
Starting point is 00:24:13 we can't use this. Or, oh, you're a healthcare company, therefore HIPAA applies, and that shrinks everything we can do in terms of tooling, so here's the other approach. Standing up all the infrastructure is something that consultants do nonstop. So in that situation, whatever premium you pay (and a lot of the time it might not even be a premium), when you go to a consultant and say, I need all this stood up, here are my requirements, you'll get opinions, you'll get speed, and you'll get everything working together. And it'll also give the hiring manager, who might not know a lot about data,
Starting point is 00:24:50 enough context on the tools, the outputs, and how to think about it, from working with the consultant, that they can then hire the person to do the steady state. The steady-state person can also have some opinions about the tech stack, but the steady-state person you probably want full-time. It depends on the company, but you probably want a full-time person who is going to walk the hallways of your business and spend time constantly asking: CMO, what matters to you right now? Director of marketing, what matters to you right now? CRO, what matters to you right now? So there are two different jobs. The first job is to stand everything up. The second job is to walk the
Starting point is 00:25:33 hallways and find out what matters to the business strategically, every month, every quarter. For the first job, in my opinion, it's cheaper and faster, and you're going to get better solutions, by paying a great consultant to do it quickly. For the second one, there are actually two answers. If you can't afford a full-time person, you hire consultants on retainer for less than a full-time person, which is another scenario where it makes sense to just use freelancers or consultants. Once the need is big enough to justify paying for a full-time equivalent, the question is just where the best talent is. There is great IC and manager talent out there that will work in-house.
Starting point is 00:26:22 It is more difficult to find than great consultants, in my opinion. So that's how I would describe it. Yeah. Yeah. Okay, okay. So let's talk about Portable then. So obviously, when you were at LiveRamp,
Starting point is 00:26:40 you looked at the market, and there were a fair number of data integration and data extraction products out there, back then and now. So what was the problem you were trying to solve? What was, I suppose, the gap in the market you were trying to fill with Portable, the problem that wasn't being solved before? If you talk to enough data integration vendors, you realize they all prioritize in the same way. They all ask, what connector do you need? They ask the market. What that leads to is everyone comes in and says, I need Salesforce, I need Postgres, I need NetSuite: the most common things, Facebook Ads, Google Ads, Shopify. And that's great, because the market for each of those is very, very large.
Starting point is 00:27:28 The number of people who use Salesforce is very large. The number of people who use Shopify is very large. So there's a big market there. What that ends up leading to, though, is 100 ETL tools that all have the same connectors. You end up with 100 tools that all have Salesforce, Shopify, Postgres, and MySQL. And as a data person, great, you have 100 options for Salesforce. But let's say all these companies have 100 connectors. When you need connector number 101, the thing that
Starting point is 00:28:05 none of them prioritize, that's the problem. They've all built the same thing a hundred times over, and none of them have built any of the other stuff. So it's a supply and demand imbalance, in my opinion. There is more demand for Salesforce than there is for some niche CRM system for a vertical. However, even though there's less demand for the niche CRM system, there is zero supply for it. So for us, the question we asked was: is there a way to build all the stuff no one else has ever decided to build?
Starting point is 00:28:45 The big challenge there is that you need to be able to do this more efficiently, because if the market for that niche CRM system is very small, you can't make a three-engineer investment into building it. You have to be really, really efficient at building those connectors. So when we started, our vision was: is it possible to build 10,000 integrations? The high watermark at the time in our space, the ELT space, was 150 integrations. So our vision and mission was to 10x that, and then 10x that again. And it sounded absolutely insane.
Starting point is 00:29:26 I talked to founders of other companies in our space and they were like, this scales with people; impossible, not going to happen. But that's been our goal: is there a way to build 10,000 integrations? We started with the long tail. At this point we have 1,500-plus integrations, no-code integrations.
Starting point is 00:29:44 People can log in, put in their credentials, and click run. It just means that if someone needs a niche connector, we build it today. Yesterday I built a connector on a call with a prospect, well, they're an existing customer, and on the call they were like, hey, I really need your help: any chance your team's going to be able to support this integration? I was like, can you send me the docs? By the end of the call, a 30-minute call, it was live in Portable. They had put in their API key, set up a new schema in Snowflake, clicked run, and they were looking at the data in Snowflake. And this was a tool that they had
Starting point is 00:30:22 stood up the day before, and it's a startup. No one else is going to build that integration. We built it on a call, and now it runs every hour for them through Portable. So up until now, that's where we've focused: all these things that no one else wants to support. We have a path to build 10,000 integrations. But most of where we're spending our time
Starting point is 00:30:45 now, actually, is the biggest integrations. So, okay, the obvious counter to what you just said there is the approach that Stitch took and Airbyte is taking now, which is to open source the connector part and then have the market crowdsource those connectors. And I suppose if there is demand out there, then logically, you know, the open source community would rise to that, although there's not necessarily a monetization strategy there.
Starting point is 00:31:13 So, okay, why are you going to succeed? And why did Stitch, you know, you could say, fail, although they were bought by Talend? Why are you going to succeed where, say, Stitch and now Airbyte aren't going to solve that problem? Totally. So, a couple of things. One, I actually would say Stitch is one of the biggest success stories of any
Starting point is 00:31:30 ELT out there. They didn't raise any money and they sold for $60 million. They pioneered a lot of the stuff that is being used by someone like Airbyte and other people in the open source world, and they helped define the category. So I have a ton of respect for their team, what they've built, the product, et cetera. I do think, though, that they proved open source connectors are bound to fail, and I think Airbyte has proven the same thing again. Let me explain. If you think about an open source project like a database, like Postgres, you've got one thing that thousands of companies, if not more, are all working on together. If there's a problem with that one thing,
Starting point is 00:32:22 someone is going to raise their hand and say, "I need this fixed. I'm going to open a PR and update this thing." Amazing. Now you can draw the analogy and say Singer is the same thing, or Airbyte open source is the same thing: if there's something wrong with Airbyte open source, I'm going to open a pull request and contribute to it. Great. They have however many contributors, say a thousand. That's not how integrations work, though. The way integrations work is that every single connector in the Singer catalog is its own open source project. And when you look at contributor counts and how many people actually care
Starting point is 00:33:09 about contributing, the Postgres connector from Singer is going to have 500 people that care about it. The Salesforce connector is going to have 400. The Pipedrive connector is going to have 20. When you get to a niche connector, one person cares about it. They're the one that maybe opened
Starting point is 00:33:29 the initial pull request. And then the question is: what really incentivizes someone using one of these long-tail open source connectors to contribute back whatever change they made, versus just forking it and writing the code themselves? There's no community around connector number 200, so there's no real incentive for anyone to push a change back to Singer or Airbyte for that connector. So the way you have to think about open source integration solutions is fundamentally different from the way you think about something like Postgres. Each of those tools is 100 to 1,000 open source projects. When something breaks for a customer, who's on call?
Starting point is 00:34:20 Who fixes the problem? If it's Postgres, someone will raise their hand and say, I'll fix this. If it's not, someone's just going to go write the code in their own local deployment of whatever that connector is. So that's the dynamic at play in the open source world, and it plays out across the whole long tail of integrations. And it's an answer; it's better than "we can't support it at all." Their answer is: write the code and maintain it yourself, and we'll run it for you. But you have to maintain it yourself. So the biggest difference between us and them is that we treat it as our problem. If we build connector 899 for you and it breaks, our team's going to
Starting point is 00:35:03 reach out to you and tell you what happened and what we're doing to fix it, and if we can't fix it, what you need to do to help us fix it on your behalf. We treat every one of these as our problem. If something fails, I'm looking at it, our team's looking at it, and we're asking: what do we do to fix this for our customers? Not: how do we make our customers deal with this problem themselves? So that's the fundamental difference between us and the open source integration world: it's our problem. If things do not work, it's my problem, not our customer's problem.
Starting point is 00:35:38 Okay. So I suppose another difference I've noticed is the way you price things. It's become a bit of a meme over the years how much it costs to run, say, Fivetran, although obviously it can be a good service. Everyone's moving to consumption-based pricing, but you're not. So why is that, and how do you make money? Yep. You kind of have to think about the dynamics in the market on this. I get this question a lot. I'm like, hey, fixed fee: we'll move your Salesforce data for a fixed fee, we'll move 10 of your data sources for 1,500 bucks a month, I don't care how much data
Starting point is 00:36:16 you're moving. You can move a hundred million Klaviyo records a month; I don't care. And people come to me and they're like, this is stupid, this is a bad business, what are you doing? I don't want to work with you if every time we move data, we're losing you money. And that's a very reasonable question to ask. The answer is that the narrative that's been spun in this world is that moving one record every month is wildly expensive for every ELT tool. Part of that's true, and part of that's wildly false. The part that's wildly false is that the compute and networking to copy and paste data from point A to point B, without really any transformation, is expensive. It's not, to be fully transparent. The price,
Starting point is 00:37:06 you paying $70,000 a year to copy and paste your Salesforce data, is not $65,000 a year in networking cost for an ELT vendor. Not even close. So that's the part that's wildly false. People think that the economics of an ELT tool are data volume, data volume, data volume. That's not how it works. The economics of an ELT tool are much simpler than that: how many employees you have is by far the most expensive line item on your income statement. What does that mean? Take us: we're a very, very lean team, fewer than 10 people at this point, and we've invested years in building a platform on which we can build and maintain
Starting point is 00:37:56 integrations at scale for very, very large companies. Whereas someone like Airbyte is 100 people. 100 people at $200,000 apiece is $20 million a year in headcount cost. I know that dwarfs their cloud costs. And it's open source, so they're actually punting all the cloud costs onto their customers; they're not paying anything for networking. So why do data volumes matter at all if it's running inside your own environment? So there's that one. And then someone like Fivetran has 1,200 employees. Now you're talking almost $300 million in headcount cost. That's why the price is so high. Fivetran, as the largest player in the space, is the one that's dictated the pricing model.
Starting point is 00:38:56 And they effectively use volume as a proxy for value; that's the reason everyone uses volume-based pricing. Bigger companies have more data, and they can afford to pay more. So, to find some way to make more money from big companies that appears objectively fair, it's more data equals more value. But say you're moving Google Search Console data to figure out how your SEO is working, and you're moving HubSpot data, and you're moving Calendly data. Google Search Console data should not cost you $50,000. It's not that valuable.
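To make the volume-versus-fixed-fee contrast concrete, here is a toy comparison in Python. It is a sketch under stated assumptions: the $50-per-million-rows rate and the normal-month row count are hypothetical, and real consumption pricing models have their own units and tiers. Only the $1,500 flat monthly fee and the 190-million-row backfill come from the conversation; `volume_based_bill` and `fixed_fee_bill` are illustrative names, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class Month:
    label: str
    rows_moved: int  # rows synced across all sources in the month


# Hypothetical rate for illustration only; not any vendor's actual price.
RATE_PER_MILLION_ROWS = 50.0
FLAT_MONTHLY_FEE = 1_500.0  # the "10 sources for 1,500 bucks a month" example


def volume_based_bill(rows: int, dollars_per_million_rows: float) -> float:
    """Simplified consumption pricing: the bill scales linearly with rows moved."""
    return rows / 1_000_000 * dollars_per_million_rows


def fixed_fee_bill(_rows: int, monthly_fee: float = FLAT_MONTHLY_FEE) -> float:
    """Flat pricing: the row count is ignored."""
    return monthly_fee


months = [
    Month("normal month", 5_000_000),  # hypothetical baseline volume
    Month("month with a backfill", 190_000_000),  # backfill size mentioned in the conversation
]

for m in months:
    volume = volume_based_bill(m.rows_moved, RATE_PER_MILLION_ROWS)
    flat = fixed_fee_bill(m.rows_moved)
    print(f"{m.label:<22} volume-based ${volume:>9,.2f}   fixed fee ${flat:>9,.2f}")
```

Under these made-up numbers the volume-based bill jumps from $250 to $9,500 in the backfill month while the flat fee stays at $1,500; the point is the shape of the curve, not the specific figures.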
Starting point is 00:39:44 And your HubSpot data is probably pretty valuable, but it shouldn't cost you $30,000 a year. The value to customers is that they just want their data sources in one place so they can build the dashboards they need and run their business. There's a price they should pay
Starting point is 00:40:04 that's reasonable for being able to do that, and they shouldn't have to worry about it every month. And Mark, it does happen. I have people pinging me all the time saying, my bill went from $200 a month to $5,000 in a week, that's an existential risk to my job. Or, we did a backfill and we just spent $10,000 worth of credits in a day. That's a scary spot to be in as a data team, when it's out of your control and you're talking about those kinds of swings in money. So when I think about it, economically we can be a very solid business as long as we work with customers that are happy, we're efficient with our headcount, and we have the tooling to do things efficiently. And if we do that, we can make our customers'
Starting point is 00:41:02 lives significantly easier by saying: you pay us a fixed fee, we move your data, and you don't have to worry every day about whether you did a backfill of 190 million records. It's not going to cost you anything; it's already included in what you pay for Portable. So that's what's been going on: the markup on one row going through compute and networking is so high, and that's what we're trying to... Okay. So you also said earlier on in the conversation that within a few minutes on the phone, you put together an integration. So
Starting point is 00:41:45 is it easy today to create integrations? Are you just getting an OpenAPI spec and using ChatGPT and so on? Where are the difficult parts, and where are things like AI actually making you faster? So where's the complication, and what's happening now to make it easier for you? There are kind of two parts to this. One, we are very, very specialized in what we do. I have personally built somewhere between 700 and 800 integrations myself, and I've read thousands of sets of API documentation. So there's a big difference. Me and the people on my team have to be able to decipher: is it something as simple as, you look at the OpenAPI spec,
Starting point is 00:42:41 you map some stuff, and it moves the data? Or is there something about it that's off? And if there's something off, then the question is: are we able to handle that? If not, what do we do to handle it? Even today, I got a request for a connector from someone. I looked at it and thought, oh, this looks feasible. Then I spent a little more time getting set up, and I realized this is a non-standard way to think about authentication. The only way I can tell that is to actually go through their partner approval process, create an application, read through everything, and then find out: oh, wait a second, this isn't OAuth auth code, it's a made-up thing that they created. And you could parse the OpenAPI spec,
Starting point is 00:43:26 but if you can't authenticate with the API and it's not standard, you need to figure that out. So APIs are a spectrum. Sure, there's a very, very small number of APIs that have a perfect OpenAPI spec, use standard authentication mechanisms, and are documented accurately: their schemas, their parameters, et cetera.
Starting point is 00:43:51 Then there's the middle tier, which is not really standard. The docs aren't really great. The pagination isn't defined. Maybe you have to just try it and see what happens, or call someone or talk to a support person, or you just have to realize that the 500 error you would think is a server error is actually a rate-limit problem. So it's this,
Starting point is 00:44:14 and then you get to the other end of the spectrum. This is my favorite anecdote from connector development. We were integrating with a shipping and logistics tool, and the API docs are a guy named Jesse. You just call Jesse, and Jesse explains to you how you make the API calls. That's the other end of the spectrum, and AI isn't going to help with that. And then there's all the maintenance stuff. If the problem is the customer's problem, they can maintain it. They can figure it all out.
Starting point is 00:44:48 That's the open source model: you build something, you write a Python script, and then the customer is going to spend three hours a week, every week, for the entire year troubleshooting issues. For us as the ELT tool, that's not the customer's problem, so we also need to figure out how we efficiently support and troubleshoot things. There's a lot that goes into that from our own internal tooling, monitoring, and learning perspective. It's not as simple as, oh, here's a Python script, go deploy it. If a connector can be written as a simple Python script, it's a commodity;
Starting point is 00:45:26 those are the first things that are going to be commoditized. It's the stuff that's not like that, where it's got weird edge cases. We just released Salesforce, and everyone has a Salesforce integration, but some of the API endpoints return CSVs. There are two different APIs: a Bulk API and a REST API. The pagination for the Bulk API isn't standard; it's using header parameters. And they don't return all the fields. You have to get a list of fields,
Starting point is 00:45:59 then take the list of fields, and some of them aren't available, so you have to actually remove them. It's that type of stuff where it's not documented; you have to just try stuff, talk to people, and figure it out. So that's my take on it. I think there's stuff AI will help with, but in our company we've already removed that kind of work. We don't really have a ton of space for tedious stuff. We spend a lot of our time on really weird, complex things, trying to find whatever way possible to get customers the data they need.
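The Salesforce quirks described here (header-based pagination, CSV response bodies, pruning the field list) plus the 500-that's-really-a-rate-limit case can be sketched as a small extract loop. This is a minimal, hypothetical Python sketch, not Portable's implementation: the injected `fetch` transport, the fake pages, the `describe` payload, and the helper names `select_fields`, `get_with_retry`, and `extract_rows` are all made up for illustration. The `Sforce-Locator` header name and its `"null"` last-page value reflect our understanding of Salesforce Bulk API 2.0 query results and should be checked against Salesforce's documentation, as should the assumption that compound field types are the ones to drop.

```python
import csv
import io
import time
from typing import Callable, Dict, FrozenSet, Iterator, List, Optional, Tuple

# A response is (status_code, headers, body_text). A real connector would get this
# from an HTTP client; here the transport is injected so the sketch runs offline.
Response = Tuple[int, Dict[str, str], str]
Fetch = Callable[[Optional[str]], Response]  # argument: page locator (None = first page)

# Assumption: modeled on Salesforce Bulk API 2.0 query results, which (as we
# understand it) page via a "Sforce-Locator" response header set to "null" on
# the last page. Verify against Salesforce's documentation.
LOCATOR_HEADER = "Sforce-Locator"
LAST_PAGE = "null"

# Assumption: this hypothetical API sometimes answers 500 when it is really
# rate limiting, so 500 is retried alongside the usual 429 and 503.
RETRYABLE = {429, 500, 503}


def select_fields(describe: List[dict],
                  unsupported: FrozenSet[str] = frozenset({"address", "location"})) -> List[str]:
    """Keep only fields the bulk endpoint can return (assumed: drop compound types)."""
    return [f["name"] for f in describe if f["type"] not in unsupported]


def get_with_retry(fetch: Fetch, locator: Optional[str], max_attempts: int = 5,
                   base_delay: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call fetch, retrying retryable statuses with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        status, headers, body = fetch(locator)
        if status == 200:
            return status, headers, body
        if status not in RETRYABLE:
            raise RuntimeError(f"non-retryable HTTP {status}")
        if attempt < max_attempts - 1:  # no point sleeping after the final attempt
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"gave up after {max_attempts} attempts")


def extract_rows(fetch: Fetch, sleep: Callable[[float], None] = time.sleep) -> Iterator[dict]:
    """Follow the locator header page by page, parsing each CSV body into dicts."""
    locator: Optional[str] = None
    while True:
        _, headers, body = get_with_retry(fetch, locator, sleep=sleep)
        yield from csv.DictReader(io.StringIO(body))
        locator = headers.get(LOCATOR_HEADER, LAST_PAGE)
        if locator == LAST_PAGE:
            return


# Offline demo: a fake API with two CSV pages, where the first request for
# page 2 fails with a spurious 500.
PAGES = {None: ("Id,Name\n001,Acme\n", "abc"),
         "abc": ("Id,Name\n002,Globex\n", LAST_PAGE)}
calls: List[Optional[str]] = []


def fake_fetch(locator: Optional[str]) -> Response:
    calls.append(locator)
    if locator == "abc" and calls.count("abc") == 1:
        return 500, {}, ""
    body, next_locator = PAGES[locator]
    return 200, {LOCATOR_HEADER: next_locator}, body


describe = [{"name": "Id", "type": "id"},
            {"name": "Name", "type": "string"},
            {"name": "BillingAddress", "type": "address"}]
soql = f"SELECT {', '.join(select_fields(describe))} FROM Account"
rows = list(extract_rows(fake_fetch, sleep=lambda seconds: None))
print(soql)
print(rows)
```

Injecting `fetch` and `sleep` keeps the retry and pagination logic testable without network access; a real connector would also need authentication, job creation, and error reporting, none of which is shown here.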
Starting point is 00:46:37 Okay. So I suppose one final area where you say no: if you go to your site and try to access it from the EU, you don't serve customers there. That's quite a bold move, isn't it? Yeah, GDPR is the answer. At a high level, there are two different ways to think about privacy, security, et cetera. One of them is you open everything up, and then you remove things as you realize you shouldn't be doing business in different parts of the world, or as things come up. Our approach has just been: we would love to do business in Europe. If regulation were just, this is GDPR, it's done, a box that
Starting point is 00:47:20 is never going to change, it would be much simpler for us to say, okay, great, there's a price tag for us to support those rules, let's do it. That's not how it works. Every country has its own regulation. In the U.S., states have different regulation, and in Canada, territories have different regulation around data. And they're not static; they change all the time. So for us, it's just a question of how many of those rules and changes we're able to stay up to date on from a legal, privacy, and security perspective. It's not that we don't have the security measures in place. It's the question of: tomorrow, if rules change in Europe, are we going to pay a lawyer in Europe to read the updates and tell us how that impacts our specific business? That's the part we've held off on for now. But over time, we are excited to open up Europe.
Starting point is 00:48:28 It tends to be more complex because it's Europe, but each country also has its own perspective on things. And as new cases take place and new things are introduced, it just gets complicated. The U.S. isn't that different: California has got its own rules, and it's changing. So I'm not saying it's any simpler here. It's just a question of complexity.
Starting point is 00:48:56 As I said, we're a super lean team. So our answer for now has been to just say, hey, sorry, we don't support those other regions. That way we don't waste people's time, and we stay efficient. And then when we do open things up, we're excited to help. And finally, Ethan, if somebody wants to find out more about Portable, how can they do that? To find out about Portable, you can go to portable.io. You can sign up and try it today. You can also follow me on LinkedIn; I'm Ethan Aaron. I post my thoughts there.
Starting point is 00:49:32 If you have perspectives, feel free to comment, and I'll comment back. Or send me a message; I'm happy to grab time. Okay, brilliant. Well, it's been fantastic speaking to you, Ethan. I appreciate your time. So thank you very much, and stay in touch. Absolute pleasure.
Starting point is 00:49:43 Really enjoyed the conversation.