a16z Podcast - AI Agents and the Fight for Customer Data

Starting point is 00:00:00 There is a new reason to have all your data in one place, which is AI agents need context. If you don't do that, then it's sort of like using ChatGPT from before ChatGPT was connected to the internet. Postgres, contrary to popular belief, is very old technology. It is not a good database simply because it was written a long time ago. It has a lot of technical debt. Satya has said that there's going to be the collapse of SaaS. Do you think that SaaSpocalypse is a thing and we're going to see a massive shift? The bigger threat is that AI-native companies will just,

Starting point is 00:00:30 zoom and catch up to the established incumbents and maybe be better. Like, we'll actually have an HR, and that HR team will onboard AIs as they come. They'll be part of teams, they'll join the Slack. And in that world, these aren't software. That's actually more seats, more consumption of software. And so do you think that for enterprise agents, we're moving more to these, you treat them like humans? Or do you think that that's too far? For years, companies built data infrastructure to answer questions about the business.

Starting point is 00:01:00 Now, they're building it for AI. As agents become more capable, the challenge is no longer collecting data. It's making sure the right systems can access the right context at the right time. That shift is forcing companies to rethink everything from data platforms and APIs to enterprise software and systems of record. Martin Casado speaks with Fivetran co-founder and CEO George Fraser about AI data infrastructure and why the next wave of enterprise software may look very different from the last. So our guest today is George Fraser, who is the CEO of Fivetran. Fivetran announced the merger with dbt.

Starting point is 00:01:41 So maybe to start, just give a quick overview of what Fivetran does. So Fivetran, we've been around for a while. We've been around since 2013. Had customers since 2015. 2013? Yeah, 13 years? Yeah, exactly. I've been doing this long enough that a slide about the past state in my own slides is the same

Starting point is 00:01:58 slide as the future state from when I started. But what Fivetran does is we help our customers get all of their data from all their systems like Salesforce, NetSuite, all their SaaS tools, their own databases into one place. Getting all your data in one place, it's not a new thing. Businesses have had the need to do this since filing cabinets. The primary reason historically that people used Fivetran to get all their data in one place was to do business intelligence, was to build reports about things like, what's your revenue, what's going on with your sales team, what are we forecasting for this quarter, all those great things. And now there is a new reason to have all your data in one place, which is if you want to use AI agents in business,

Starting point is 00:02:42 AI agents need context. And it turns out that the same data foundations that work well for business intelligence and reporting with some additions and some modifications actually can work really well for AI agents as well. I mean, talk about a sector of the industry, which is under a lot of change because of AI. And so maybe can you give like a high level overview of how it is evolving, what are some of the considerations about the shifts in data? And in particular, like, we're seeing a lot of changes

Starting point is 00:03:10 in how vendors view their own data, how the big labs use data. So just talk a bit about what the industry is. The thing about data in the context of business is it is always born somewhere else. It's always born in systems of record, like Salesforce, like Workday, like SAP, and even if it's your own application's data, if you're a software company and you

Starting point is 00:03:34 run your own database, the data is born in that database. And since, as I said, time immemorial, businesses have had the need for internal use to centralize a copy of all their data in another location. It doesn't work to just go and do all of your reporting and ask all your questions in each system individually. Some kinds of questions require you to look across the entire system. And so that is not new. However, these AI agents are new. And there has been, in the last year, a reaction, which really started with the stock market. As we saw the SaaSpocalypse happen, and as we saw the stock prices of all these systems of record that I'm talking about plummet, people viewed them as under threat from AIs. We have seen some of these companies start to think

Starting point is 00:04:21 that a great strategy for dealing with AI might be to lock it out and to say your data is our data now and you can't take it elsewhere and if you want to use AI on it, you have to use the AI tools that we provide. Notably, just a couple weeks ago, SAP announced a new API policy that literally said all AI agent access was banned except in a way specifically approved by SAP. Now, if you're an SAP user, don't panic. This is just a policy. You have contracts with SAP. Those are authoritative as to what you are and are not allowed to do. So don't overreact to these policy memos. But it just shows how extreme the reaction of some of these companies has been. I just want to tease this apart. I think there's a lot of confusion on what exactly is going on. Right. So this is locking down

Starting point is 00:05:13 access to the data that an agent would use instead of an app, right? It's not access to data because you're going to train your own model. That's right. Very few people are in the business of training their own models. Most people, when they want to access their own data and their own systems of record, even if those systems of record

Starting point is 00:05:29 are managed by vendors, they are using it for context. They're using it in order to ask and answer questions about what's going on in their business. So the concern is my SaaS app has less value as an interface because now the

Starting point is 00:05:45 agents can access the data directly and basically perform the same functions the SaaS app was before. Is that the concern? I think there are many concerns. I think that's one of them. Can you just straw man the set of concerns? Because this is one of the biggest reactions I've seen in the industry in a very long time. And I'm kind of trying to come to grips with what the actual worry is. I think people are worried that their systems that they've spent many years building will simply be less valuable in a world where their users are no longer humans, but their agents. I think they're worried that... But why isn't this just another seat? I mean,

Starting point is 00:06:18 it seems like, I mean, arguably, this is positive because there's going to be more consumers of... So agents don't need as many individual identities. When you have AI agents accessing systems, you really just need roles. You don't need the same granularity of users. You have many product managers, each will have their own identity in a system. But if you have a product manager bot, you really might just have one role that it uses, and it might have a single identity and yet do the work of hundreds or thousands of people.

Starting point is 00:06:53 So there's not an easy answer like that. Furthermore, these companies have a history of having open APIs. Open APIs are a good thing. If these companies did not have open APIs, they would have been consigned to the dustbin of legacy SaaS decades ago. I mean, this is a thing that happened in the 90s, right? The evolution of open APIs. And their customers have been using them

Starting point is 00:07:17 and depending on them for decades. And those same APIs are the primary target of AI agents. So it is very hard to differentiate whether the users are accessing the APIs in the same way that they always have been, or whether they are accessing them in agentic ways that may substitute for human workflows. I'm going to keep poking on this because I still think it's not a real concern. So let me make another straw man of that argument. So let's say you had opened up all your APIs in the 90s, which is the case. Like, why couldn't I just write a procedural app, which is my own version of the SaaS,

Starting point is 00:07:52 and therefore also disintermediate your SaaS? Like, why are agents somehow different than me just writing my own software or my own dashboard? Maybe they're not. This may all be kind of much ado about nothing. I think it's foolish for them to close down their APIs. So you're putting me in a weird position. No, I do. I do.

Starting point is 00:08:09 I'm trying to defend a position that I think is stupid. So I think a lot of these threats are not new. Like, well, maybe they'll use programmatic access and thereby use less seats. Maybe they'll move some functionality to their own interfaces. I mean, that is a real thing. That has been happening for years. I just want to let you know I am old enough to remember these discussions in the 90s.

Starting point is 00:08:31 The rhetoric was exactly the same. They reacted the exact same way. We could never open up APIs. We can never have them do this because they're going to disintermediate us. And it just turns out that if you're buying into a business process, like the operational flow of something, that is set up by the company that you're buying it from.

Starting point is 00:08:46 Salesforce knows how to run a sales force. And so whether it's an agent that's consuming it or SaaS, I would argue that there's still the value there. I completely agree, and I will point out another piece of evidence for that claim, which is if you look at the budgets of real companies

Starting point is 00:09:02 that are heavy consumers of software, they spend 5 to 10% of headcount on software. Software costs are immaterial in the grand scheme of things. Software compared to everything else a typical business spends money on is so cheap. The idea that they're going to use AI to value engineer the number of seats they have on Slack or something is ridiculous. They're going to use AI to go make their business work better in whatever it is

Starting point is 00:09:31 that they do. They're not trying to take that 5% software spend and turn it into 4.5. That is not the highest and best use of AI. I mean, famously, all the big AI labs, including Andreessen Horowitz, and we're all very heavy users of AI, like, still use these SaaS tools. And so four years into this, we don't have a lot of evidence that, like... As do we, as do OpenAI and Anthropic, who are both Fivetran customers, and we replicate lots of data from these very SaaS tools on their behalf into their data lakes. So if they're still using them, do we really think the company of the future is not going to be? Right, right. But one thing we can both agree on is this is bad for customers, right?

Starting point is 00:10:14 It's kind of, locking down the APIs is bad for customers. So maybe talk through, like, A, why it's bad, which may just be obvious, and then B, kind of your recommendation for how to manage that, like assuming that this is happening industry-wide. So the reason it's bad for customers anytime vendors put up walls and try to regulate data access is that you need to have all your data in one place in order to do meaningful reporting, in order to understand what the heck is going on in your business, and in order for AI agents to work in the context of business. If you don't do that, then it's like using ChatGPT from before

Starting point is 00:10:52 ChatGPT was connected to the internet. If you used it back then, you remember it used to have this knowledge cutoff and it would tell you I can only answer questions about things that happened in my training window. Six months ago. Yeah, six months ago was when I was trained. I don't know anything after that because I'm not connected to the internet. That is what using today's AIs is like if you're using them in a business context and they don't have access to your business data. So it's very important for every company

Starting point is 00:11:20 who wants to do things with data to create their own data platform where they have a copy of all of their own data and it's being kept continuously up to date. And any time vendors start putting up barriers, it just makes it harder to get that done. And the customers will still, they'll do it. They'll just work around these barriers at great cost and complexity.

Starting point is 00:11:42 One of my favorite things that you've done as a company to educate the customers on this is this benchmark. Can you maybe talk through what that is? Well, we have a website. Is that what you're talking about, Open Data Infrastructure? opendatainfrastructure.com. It's a benchmarking. Like it does scoring, right? Yeah, so we score, the list is growing. We're trying to make it as long as possible. We score as many vendors as we've been able to catalog so far on their data access policies. So we basically score them on whether they try to charge egress charges,

Starting point is 00:12:16 whether they try to make you pay for getting your own data out, whether they make it impossible to get a complete copy of your data because some vendors will do that or they'll make it just very difficult, and whether they have terms of use restrictions on accessing your own data. So there's a big grid on there, and it rates a one or three-grateful. Are you comfortable saying who the worst offenders are? Well, or we should just point people to the website and read it. It's all very fact-based and very well-evidenced.

Starting point is 00:12:51 I mean, the worst offenders historically have been, SAP has always been really bad. I mean, even when I was running a large business, I mean, that was always... It's interesting because they were getting better, and I've sensed in the last few years that there's sort of two camps within SAP, one of which regards it as the customer's data. The customer's got to be able to do what they need to do with their own data.

Starting point is 00:13:12 And then there's sort of the old guard who views it as SAP's data and you'll do with it what we tell you to. And then, you know, historically Salesforce has been really good with the exception of Slack where they are terrible. Yeah, I know. But

Starting point is 00:13:29 they've started to get squirrelly about this. So it's a moving target. I am hopeful that this is merely a brief flirtation with closed data by most of these vendors, and they will realize this is not a good idea, that they are at war with their own customers, and it's not even going to work anyway, and we will go back to the trend towards ever more openness. You just answered my next question, but I'm asking anyways, which is like, do you feel like this is just like a repeat of the Open Data API?

Starting point is 00:13:57 It'll resolve quickly. Like, you know, we always kind of go through this soul searching, and then we will resolve back to where we were, which is like Open Data's the right thing. I do think so. You would think that's the path? Do you think it's different this time? Yes, I think it will be the same. That's my prediction.

Starting point is 00:14:09 It will be the same path as vendors will discover that they cannot provide inside their own platform a solution to every data problem that their customers have because they are simply so diverse. And instead, they simply create a mechanism for the customers to replicate it to their data platform of choice and do with it what they will. And even if they charge fees for that, it's not the end of the world. On opendatainfrastructure.com, you really get yellow for charging fees. You know, it's only red when you try to actually block it.

Starting point is 00:14:38 At the end of the day, if you want to have a little toll, that's not the end of the world. The problem is when you start, when you start actually blocking it. Like there is no option. And saying, oh, if you want to do anything with data, you have to come use my tools inside my walled garden, which never works. Because all the rest of your data is not in that walled garden. And it's not going to be. And you can never create enough tools to support all the different things customers want to do with data. You know, this is probably related to this notion of, you know, or this belief in data gravity.

Starting point is 00:15:11 And, I mean, one thing that I've loved working with you over the years is, like, you're exceedingly smart and you're exceedingly contrarian. It's just so fun to, like, kind of, you know, watch your opinions diffuse, and you're more often right than wrong. And one thing you have said is that data gravity is either overrated or not real. So do you think that, like, A, do you stand by the statement? And B, do you think that this is driving people to, like, try to do these walled gardens? I think data gravity is completely fake. I am the only person who thinks this. If you want to see evidence.

Starting point is 00:15:45 Well, first, maybe describe what data gravity means. People use the term a lot, but, like, I don't think they understand the implications. Data gravity is the idea that business data is so large that it's very expensive to move around because of egress charges of cloud vendors, and that therefore it's very important that you choose a physical region in the world where all your data is going to live, in a specific region of a specific cloud, and then you build all of your data-consuming services in that same location.

Starting point is 00:16:22 Or you can partition it. Especially because of egress. Well, this is A-Berg. The term data gravity does get used to mean more. It's very general. This is one particular incarnation of the idea of data gravity, and this is the one that I am saying is fake, that egress charges are so important.

Starting point is 00:16:38 And if you want to see evidence against this, come look at the networking dashboard of Fivetran's various AWS and GCP accounts. You will be astonished. Despite replicating huge datasets for thousands of companies, we have 7,000 customers of size and thousands more little ones, the amount of data being moved at any given time is tiny.
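
A rough way to see the point about replicating only changes: the sketch below counts bytes moved when a whole table is recopied once a day versus when only the changed rows are sent after one initial sync. It is an illustration under made-up assumptions, not Fivetran's implementation. The function name `simulate`, the table size, the row size, and the daily change rate are all hypothetical, and real change data capture typically reads a database's change log rather than a Python dict.

```python
def simulate(rows=10_000, row_bytes=100, changed_per_day=50, days=30):
    """Return (full_copy_bytes, change_only_bytes) for a toy table.

    All sizes and rates are hypothetical defaults chosen for illustration.
    """
    # Toy "source table": primary key -> fixed-size payload.
    table = {key: "x" * row_bytes for key in range(rows)}

    full_copy_total = 0
    # Change-only replication still needs one initial full sync.
    change_only_total = sum(len(value) for value in table.values())

    for day in range(days):
        # Hypothetical workload: a small, different slice of rows is updated each day.
        changes = {
            (day * changed_per_day + i) % rows: "y" * row_bytes
            for i in range(changed_per_day)
        }
        table.update(changes)

        # Nightly full copy: every row is read and shipped again.
        full_copy_total += sum(len(value) for value in table.values())
        # Change-only replication: just the rows that changed are shipped.
        change_only_total += sum(len(value) for value in changes.values())

    return full_copy_total, change_only_total


if __name__ == "__main__":
    full, incremental = simulate()
    print(f"full copy every day: {full:,} bytes")
    print(f"initial sync + changes: {incremental:,} bytes")
```

With these made-up numbers the nightly full copy moves 30,000,000 bytes over 30 days, while the initial sync plus daily changes moves 1,150,000 bytes. The gap widens the longer the pipeline runs, because the full copy pays for the whole table every day.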

Starting point is 00:17:04 And the reason is that we're doing change data capture. You can have a huge data set, but if you just replicate the changes, the changes are always much smaller than people think. And I think that a lot of this idea of data gravity came from dumb

Starting point is 00:17:19 data pipelines that people wrote where they would copy their entire company's data sets out of their database every day, once a day at midnight. And so they just had this crazy read amplification. You know, they were just repeatedly copying the same data over and over, and it gave them the impression that they had so much data, but they really don't. Maybe it's a good argument not to roll your own on these things anyways? Yes, yes. This is what happens when people roll their own data pipelines is they fall back

Starting point is 00:17:47 on these patterns that are easy to get right. Like that, that pattern I just described, it's extremely easy to get a correct replica just by copying it all over and over, but very expensive to operate in the long run. Maybe before we leave this topic, so let's say there's a CIO listening to this right now, and the CIO is like, oh, how do I navigate these kind of uncertain three to six months? Like, what kind of leverage do I have with these SaaS vendors as they're closing things off? Do you have any sort of guidance for them? Yeah, I think, number one, you should insist on having a copy of all of your own company data

Starting point is 00:18:23 in a data lake that you control. Don't let go of that for any vendor. You have a lot of leverage. These vendors actually have a lot of obligations to let you do that. The reason why they get away with blocking people is simply because people don't fight. So pick fights.

Starting point is 00:18:44 And write it into your, if you have big contracts with vendors and you're redlining your MSAs, write language guaranteeing your own data access into those MSAs. Are you seeing those show up now? Actually, I hadn't even,

Starting point is 00:19:00 I hadn't even considered that. We have model language on opendatainfrastructure.com that we recommend you incorporate into your MSAs. And even if you don't get it, just by asking for it, you are sending a signal.

Starting point is 00:19:14 So, you know, if it's a $10,000 contract, don't do it. But if you're signing, you know, 500K or million-dollar contracts, insist on data access in your MSA

Starting point is 00:19:27 and you will find surprisingly often that you get it. I want to talk a little bit about agents now. So I feel like we've kind of gone through multiple phases

Starting point is 00:19:37 already in agents. So the first one is they're just purely treated as software. They're like, it was almost like search plus plus. Like take all your data,

Starting point is 00:19:44 put it into a data lake, and then you have an LLM that has access to it. And then we got to that and it's like enterprise search. And then we went to, like, the agents like OpenClaw. And then that model was, like, a personal agent, but it kind of, like, was part of you.

Starting point is 00:20:01 So you'd give it access to your email, and you would give it access to, like, your API keys and give it access to your accounts. So it was, like, part of you, an extension of you. Now we're saying... I actually set up OpenClaw. That's great. I use NanoClaw. Well, I set up OpenClaw, and then I switched off of it onto NanoClaw.

Starting point is 00:20:19 Yeah. Because it's such a monstrous piece of over-complex software. And then I actually switched off NanoClaw onto Nanobot, which is what I've stuck with. But I use it to manage my tennis team. But it has its own identity. Okay. So I don't know this is what I'm going to say. It has its own email.

Starting point is 00:20:33 So this is, I think we've all come to this conclusion, which is like, I don't want it having access to my email. I want it. So I also have the Mac Mini. I run to actually now I'm working with somebody like build a harness. But like they're all the same, right? I run on a VM. Yeah, perfect. It's got its own WhatsApp number, its own email.

Starting point is 00:20:49 So it's got its own phone number, its own email address. And then, you know, now as we think about a16z, we're actually thinking about, like, you know, why don't we just treat all these agents like this? Like, we'll actually have an HR. And that HR team will onboard AIs as they come. They will train them. They will show them the access to the documents

Starting point is 00:21:07 that you need. They'll be part of teams. They'll join the Slack, you know, just like humans do. And in that world, and we touched on this a little bit before, but in that world, these aren't software. That's actually more seats, more consumption of software. And so do you think that for enterprise agents, we're moving more to these, you treat them like humans,

Starting point is 00:21:25 or do you think that that's too far? I think it is an intermediate form. I think the reason this works well is because you can slot it in to the existing workflows without having to refactor the whole universe. So in my example of my agent that manages my USTA tennis team, it can email with the players,

Starting point is 00:21:49 it can go to the USTA website, check the schedule, make lineups, check availability. And it works well having its own identity because it can slot into all of these existing workflows that were designed for humans. At Fivetran, we have an AI agent

Starting point is 00:22:05 that helps respond to support tickets, that goes and inspects the logs, inspects the code. It uses all the context that we centralize with Fivetran to find out what is going on with this customer, what might be the solution to this problem, and it drafts responses. And right now it slots into the system a lot like a person would, but we are working

Starting point is 00:22:34 on making the whole thing just a closed loop, pure AI system where it will only have one identity. There will just be the connector, troubleshooter, Borg, hive mind. And there may be, but let me give the argument for the multiple. So why do I use a Mac Mini for my agent, right? I mean, there's a couple reasons. One of them, so it can access iMessage because there's no programmatic way to access iMessage. And so, like, you know, it has a desktop.

Starting point is 00:23:03 Another one is like everybody is moving headless, like Salesforce is doing their headless thing and there's headless browsers. But it turns out, let's just talk about browsers. Like, if you have a headless browser, all of the anti-scraping software kicks in and then like it's not as functional. So it's actually much better to just give it a fully functional version of Safari. So you could

Starting point is 00:23:25 argue that the interfaces that have evolved over the last 30 years to deal with unpredictable users that know how to use computers is the UI we have today and actually the simplest thing to do rather than try to rewrite all of that stuff is just to give these agents

Starting point is 00:23:41 that have been trained on human data access to full end-to-end systems. I don't know. In the systems that I've worked on, browser use has never been necessary. Browser use has a big cost, which is it's very slow. And it consumes a lot of tokens. Yes. And I have found that in my tennis example,

Starting point is 00:24:03 the USTA website does not have any anti-scraping provisions. You're not trying to read from LinkedIn or Zillow. Try that if you are. Yes. And so I'm just using, like, Python browser automation. I'm actually using Selenium. And then there's actually a skill that emits just exactly what you want to know

Starting point is 00:24:31 so that the agent is not reading HTML and consuming all those tokens all the time. And then at Fivetran, for example, we are working right now on a Salesforce administration agent to basically do continuous integration of small changes into our Salesforce, which is very labor-intensive right now. And it also does not use browser automation because the Salesforce CLI we found does everything we need to do. We'll use browser automation if we need to,

Starting point is 00:24:58 if we come to something that can't be done with the Salesforce CLI, but to their credit, the Salesforce CLI is quite comprehensive. Pretty much anything you can do in the UI, you can do with a CLI command. And the agents seem to already know how to use it. So can I do a $1 bet with you for when we're on this podcast in five years? So I think the majority of use for agents in five years is going to be the same interfaces that humans are using

Starting point is 00:25:23 just because it's the long tail of integration has already been solved and all the protections and all the sharing and everything else. And would you say the majority would be through APIs that are kind of more of a traditional computer software. Yeah, they'll just hit the APIs. Yeah. What do you think about things like, you know,

Starting point is 00:25:39 these technologies that mediate that, like MCP, which have emerged to kind of try and solve that problem. Do you think there's a future for those, or do you think that those just give way to, like, strict tool API usage? You know, in theory, it seems like they're an unnecessary layer because these agents are great at calling APIs and calling command line tools. So why bother having this other layer? In practice, when you sit down and actually try to build systems, MCPs do solve important problems, particularly authentication

Starting point is 00:26:09 and just like discoverability of what's available. So even though you can say kind of from first principles that maybe they shouldn't exist, when you actually sit down and write a real system that's accessing context, you almost always end up sticking an MCP server into it. And this is authorization, authentication, authorization, and discoverability, like the fact that this thing exists. Yeah, and there's just also a lot of little affordances in the AI tools that are built around MCP, like user granting authorization for specific tasks

Starting point is 00:26:43 are done at the tool level on MCPs. And that's just a rule that's been baked into the harnesses that ends up working well for a lot of situations. It also works badly in some situations. But the tools on the consuming side have started to grow around MCP. And thus, even if maybe theoretically you don't need it, I think in practice, it's taken hold. The thing that I find so strange about this is like even the tool use itself, like, you could argue that like as smarter models come out, they could build better tools anyways.

Starting point is 00:27:13 And so sometimes I wonder if like an agent should just be the most minimal thing ever, like it manages, like, durable state, it manages compute, and then, you know, you run like whatever, like the Anthropic SDK or the OpenAI SDK, and then you just tell it to build its own tools. Like, you know, build your connection to this. Well, that's how NanoClaw works. No, it's exactly how it works right now. But you could argue that, like, there's so much money being poured into these foundation models,

Starting point is 00:27:37 like tens of billions of dollars, maybe becoming hundreds of billions. So the most intelligent thing at any point on the planet is one of these models. So why would you use an old tool if it could build a better tool? Yeah, so I mean, NanoClaw, for those who don't know, is a personal AI agent, sort of like OpenClaw, except the way it works is you fork the repo when you start up. And then you just sort of vibe code it into whatever you want. And the problem I encountered is it sort of went awry.

Starting point is 00:28:00 And I didn't really want to go and troubleshoot all of the details. I debugged NanoClaw. At some point, I spent enough time debugging NanoClaw that I was like, I want something that has more separation of concerns where there's sort of an agent over here that has an API, and it works the way it works, and then I just write, like, skills and stuff on top of that. Ironically, even when I did that,

Starting point is 00:28:24 I ended up having to add things to Nanobot because, like, it couldn't differentiate WhatsApp group messages from DMs, and I think I actually have a PR against Nanobot because of that. So I sort of ended up back in the same place a little bit. But I take your point. Like, if they get smart enough, they may, you know, they may just build their own intermediate abstractions as needed. Yeah, yeah, that's right.

Starting point is 00:28:45 Yes, you go. just to maybe wrap this bit up. I mean, you know, Satya has said that there's going to be the collapse of SaaS. And, you know, we know Databricks is trying to rebuild a lot of SaaS on top of their platforms. I mean, they've been very public about that. So do you think we're going to see like a massive, like, do you think that SaaSpocalypse is a thing and we're going to see a massive shift? And like, you know, it's going to be agents and it's going to be on new types of infrastructure that they run. or do you think that, you know, SaaS is fine,

Starting point is 00:29:18 the markets are overblown, Satya is, you know, wrong. Well, to give credit to the public markets, I think that they are accurately, I mean, whether the magnitude is right, I cannot say, but the direction is right. There's a lot more uncertainty embedded in all these SaaS companies, including my own, than there was a year ago.

Starting point is 00:29:39 And that's reflected in the changing, in the declining price. But I don't think it's for this, I don't really buy into this reason that all the SaaS categories are going to disappear and be replaced with vibe-coded software. I think there will be some, but I think the bigger threat is simply new companies coming along. It is just so much easier to write software now that AI-native companies will just zoom and catch up to the established incumbents and maybe be better in some ways. You know, it's so interesting. I mean, you just mentioned that about Fivetran, but if you look across the companies we work with that are traditional companies,

Starting point is 00:30:19 and I actually would not include you in that, I mean, listen, you know, like both of the labs are your primary customers. But if you look across the traditional companies, like you don't actually see in the data that they're slowing down. Like Fivetran is doing great. It's actually the company is accelerating. And so how do you reconcile this high-risk concern that you just voiced with the actual business?

Starting point is 00:30:41 Is it more just kind of existential angst? No, it's like X-risk. You know, some new way of doing what you do comes along that's dramatically better. And this will happen to some companies. I know, of course it will, right? But we're four years into this at this point. You know, I don't know. I just feel like you can't derive this from the data as far as I can tell.

Starting point is 00:31:01 And like maybe it's coming. Yeah, it's not even really in the public company data. No, it's not. No, across the board. It's when you start to see like less than one net dollar retention, then, you know, it's come. Yeah, yeah, for sure, yeah. And maybe it will.

Starting point is 00:31:17 Maybe it's in the future, but we've been having this conversation. I remember, I actually remember when you put in the call, and it was a year ago or so, and you're like, Martin, this AI stuff is, like, really real, and, you know, like, you can kind of code connectors with it pretty good, and it's pretty good, and it's coming. But that was actually quite a while ago when you put in that call,

Starting point is 00:31:34 and the company's done fantastic since then. So it could be the case that, like, rather than someone trying to redo an existing company that, you know, has figured out like a long tail of stuff, they'll go work on different problems that are more suitable. So we have, you know, in our particular case, we have been trying ourselves to use AIs to build data replication connectors, which is the core of what we do, for years, since GPT-3.

Starting point is 00:32:01 And they continue to improve in terms of what they can put out. They still do not discover this long tail of complexity. It is, it is, it really surprises people how difficult it is just to make an accurate copy of a system and keep it up to date. And so, and, and now we are actually starting to see new capabilities inside Fivetran to push the bounds of quality, particularly quality even further, like, like completeness of coverage of the sources and the correctness of replication.

Starting point is 00:32:39 You can imagine how you can use AIs to do that. more comprehensively than you ever could with human beings. Yeah. And so I think in addition to the sort of, you know, the AI threat is getting closer, but it's still, I think, a ways away from what we do. We're actually starting to see the opportunity pull us forward. So we're starting to get better at our own core business by leveraging AI internally in extremely non-obvious ways that I don't think anyone else has discovered yet.

Starting point is 00:33:08 Can you just, so, can you talk to those, or is that a great secret? Well, it's, it's, you know, at the end of the day, what's going on inside Fivetran is just this crazy mass troubleshooting effort that never ends. Wait, that sounds like every startup ever. Well, we have, it is, but it's, it's, the breadth of it is much larger for us. Because we have 750 connectors to different systems of record, everything from Oracle to SAP to Qualtrics to you name it. They all have different idiosyncrasies.

Starting point is 00:33:45 And you only discover these idiosyncrasies when real customers bump into them. And they show up as performance problems, correctness problems, and failures. And, you know, the way we have always solved it is, I always like to say, the trick is there's no trick. It's just a lot of effort behind the scenes. And it's kind of an economic trick. We only have to fix every bug once and then every customer who uses that connector benefits from it. But you can imagine how you can use AI coding agents, which are basically an infinite supply of junior engineers, that is a particularly valuable tool for this kind of problem. And the details of putting that into practice turn out to be quite tricky.

Starting point is 00:34:30 But we've really, especially the last couple months, started to see it work and started to see, you know, like improvements at scale. Many, many, many small improvements start. We've seen the flood start to come. And I think you'll see the quality and reliability of Fivetran take yet another leap this year because of that. You know, you're in this very unique vantage point because you have the big labs as customers,

Starting point is 00:34:59 like OpenAI and Anthropic are customers. So, you know, they are AI native. They're at the forefront. Do they use Fivetran differently than traditional enterprises, or is there anything that the enterprise can learn from that? No, their use cases are very typical.

Starting point is 00:35:14 They use Fivetran to replicate data from lots of different systems of record into a centralized data lake. And they do analytics with that. They feed that as context into their own internal AI workflows. So they have built data foundations that look very much like

Starting point is 00:35:31 the data foundations of many other companies. The systems at Anthropic, one of the people who helped set them up was a consultant who had set up Fivetran and DBT at many other companies. So their data platforms look very typical. And I think this is a very important message.

Starting point is 00:35:51 If you are thinking about data foundations for AI, do not make the mistake of thinking you need to build some exotic new system as a data foundation for AI. The right data foundation for AI is probably the one you already have. If you have a reasonably modern data platform, something like Snowflake, Databricks, or BigQuery, or maybe even you have transitioned to an Iceberg data lake with those compute systems running on top, that is a great foundation for your context for AI as well. You know, there used to be this idea. And again, we've touched on it in the context of Fivetran, but more broadly, there's this idea that,

Starting point is 00:36:33 AI commoditizes infrastructure broadly, right? And so the idea was like, well, it can write anything. The opposite seems to be true. More software is being written than ever before. The software is actually pretty buggy. It needs kind of stable infrastructure below it, you know. And so most infrastructure companies have seen a lift as a result of this. You know, in your sense, is this transitory?

Starting point is 00:36:53 Like, the eventual "AI consumes infrastructure" is coming? Or do you actually, let me just give you my quick view on this, which is, building rock solid software that you can operate for long periods of time is just not what AI is best at, and you're probably better spent focusing it on other things. But is my view blinkered in how powerful it's going to get over time? No one knows how powerful it will get over time. I mean, the nice thing about that is that if it gets sufficiently powerful,

Starting point is 00:37:24 all these questions become sort of moot, because we'll just be living in a post-scarcity world. But I think if we look at the present day, I think it is mostly true that AIs are just creating more demand for infrastructure and not commoditizing it at all. You can think of infrastructure as having layers, and at the bottom are like data centers, and then you go to cloud vendors like AWS, and then you have systems like Convex, sort of serverless platforms that try to make the cloud vendors easier.

Starting point is 00:37:56 and then you even have, you know, systems that sit at a higher level of abstraction than that, which could include, you know, Databricks. A lot of their business is hosting notebooks, right? I think that last layer is the one that is threatened by AI. Yeah, the consumption layer. Yeah, for sure. AI is quite good at navigating slightly more complicated infrastructure. So if you have an AI agent, maybe you don't really

Starting point is 00:38:26 need that very most user-friendly layer, you can drop down to the next one and use that. Yeah, I mean, you could argue that whenever the consumption layer is up for grabs, which also happened with the internet, right? Like, you kind of went to different places to go do things. Like, it changed the UI. Like, it changes a bunch of stuff. But, like, the core infrastructure stays in place, right?

Starting point is 00:38:50 Like, you still have operating systems. You still have chips. You still have databases. And, like, they kind of evolve over time rather than they get replaced. Yeah, and maybe you peel one layer or maybe you peel three. Exactly. You're not going to peel it all the way back down to glass. Yeah, that's right.

Starting point is 00:39:02 Let's talk, let me actually just do a quick time check here just because I just enjoy talking to you so much. We can just talk forever. Okay. Let's change topics a bit to the DBT merger. Yeah. So you acquired Census in 2025 and then Tobiko Data and SQLMesh, and then you signed with DBT Labs.

Starting point is 00:39:28 And so, I mean, this has always been a space that's been relatively acquisitive, but I would say for the new style companies, Fivetran has been the most acquisitive. So maybe can you talk through the strategy and the plan? Or is this ad hoc? Is there some grand strategy? Well, I am the child of investment bankers. So maybe I'm just realizing my destiny.

Starting point is 00:39:48 Not just investment bankers. PE, right? Well, my brother did PE and my cousin, and then a bunch of other people in my family. There you go. But not my parents. My mother was a commercial banker, and my dad was an M&A investment.

Starting point is 00:40:01 Oh, there you go. Okay, M&A, well, I'll see. But in the... Between people, M&As. Okay, I see. I see. But anyways, I got you, listen, for a startup, and having watched you do it

Starting point is 00:40:10 has actually been very impressive to watch you run the strategy. Well, but seriously, it was not something we set out to do. Fivetran does not have a corp dev function. I felt with every single... I never really thought about that. That's true. Yeah. I've always felt that any acquisition or merger, and the first big one was really HVR.

Starting point is 00:40:33 HVR, yeah, I remember very well. It should feel like it's for these unique reasons, and it feels like it's the last one you're ever going to do. And it's not going to be the last one you ever do, but the reasons to do it should be really, really strong. You shouldn't go looking for this. Nonetheless, we have found these strong reasons several times. I think the DBT one is a great fit. These are two products that have historically almost always been used together. And they kind of go together.

Starting point is 00:41:06 Fivetran is the tool that gets all your data in one place. DBT is the tool that you use to organize it and turn it into a model that reflects the particular details of your business. And then that is what feeds into all of the data consumers. You know, you've actually said publicly that DBT is going to be one of the biggest beneficiaries of coding agents. Can you kind of pencil that out a bit? Yeah, yeah. So there's this great... I don't even really know what that means.

Starting point is 00:41:34 Well, I think there's going to be way more usage of DBT. I think coding agents are going to write tons of DBT models. I told you that. I was already seeing that. I mean, that's it. Yeah, and it's going to be a great beneficiary. There is this great quote from Dijkstra, I think, which is...

Starting point is 00:41:52 Dijkstra's algorithm, Dijkstra? Yeah, that computer code should be seen as a means of communication between humans and only incidentally as an execution format for computers. And nowhere is that more true than in SQL queries in DBT projects. It is a great way to express: these are the rules of data at my company.

Starting point is 00:42:14 And even if it's being written by AI, you still want to have that artifact that is executable documentation of how your business works. All right. So you have the pleasure of being the CEO of a relatively large company during the AI wave. You've gotten back to writing code, which a lot of us have. You're running a lot of experiments.

Starting point is 00:42:36 Like you mentioned experiments, your nanoclaw experiment. Though you're now running what? Nanobot. Nanobot. Yeah. So how much of this is, you know, George, the scientist, the techie versus like you actually do this like pragmatically useful for the CEO of a large company. Yeah, I don't know if it's a good idea. I just can't resist. And coding agents are great for

Starting point is 00:42:59 CEOs who want to write code on the side because they work sort of asynchronously. So you can, you can have a lot of things spinning in the background. I have, I have a lot of projects going right now. Can you name them? I mean, I, I mean, I have things I am just doing as hobbies, like, like the system for managing my tennis team. I'm working a little on a tennis statistics machine vision app. But then I have many things at Fivetran. They're all experimental proofs of concept that I share with people. And we talk about, there's a potential, like, nano data lake catalog that I have going that attempts to,

Starting point is 00:43:43 when you use data lakes, you have to adopt this additional service called a catalog. and it's the answer to the question, could we make the catalog invisible? So that's an example. I'm working on just for the hell of it, a from scratch, classic OLTP SQL database. What?

Starting point is 00:44:03 I think there is an, well, it's crazy, it's crazy, right? But the whole point of the project is, no, no, it's not distributed. It's, what it attempts to do is to be, like, SQLite except S3 is the backing store. Oh, that's cool. That's a great idea.

Starting point is 00:44:24 For it. Because when you build AI workflows, you have this need for like zillions of tiny databases. And it's a proof of concept. It's mostly an exploration of like, could you with sufficient AI coding just take on something absolutely ridiculous? If you're hearing this and you want to work on this

Starting point is 00:44:44 and you are an expert in databases. Do you have a public GitHub? I don't, I don't. But if you want to work on this, come talk to me, and maybe you can come do this at Fivetran. You don't even have to use my proof of concept. I think there's actually a real opportunity, right, at this moment. I really lament that we're sort of stuck with Postgres forever.

Starting point is 00:45:01 Postgres, contrary to popular belief, is very old technology. It is not a good database. Undergraduates, writing class projects, write better databases than Postgres. Not because the people who built Postgres were not smart, but simply because it was written a long time ago. It has a lot of technical debt. And I really think the world should create a new operational database rather than just endlessly repackaging Postgres.

Starting point is 00:45:28 That's an amazing take. Yeah, that's another one of my contrarian takes. Postgres is bad, actually. I should also say. No, no, even better, like an undergrad in a database course is writing better databases than with underlying. In many respects, the storage engine that you would write as an undergraduate in a database course is better than Postgres' heap storage engine.

Starting point is 00:45:50 Postgres' storage engine, I don't think the creators of, the people who promote it today would admit it is not a good design. It's been patched up in a lot of ways, but the Postgres storage engine. I'm not going to digress on this. A million questions about, but I will say just like this conversation, you ever find this becomes a bit of a distraction? Like, it is so dazzling and so fun. It's interesting to work on these things.

Starting point is 00:46:15 and you're like, oh, you know, I should be like doing that one-on-one, but instead I'm here in my... No, I think about that a lot. And I make sure, you know, you've got to, like, keep it at bay. Because it is like, you know, there's a danger of Claude psychosis, which is a term that I love, but you can just get sucked in. But, you know, I have a lot of time. I don't have any kids, so I have a lot of time.

Starting point is 00:46:36 I mean, what people with children will say? It's like having a whole other self. Yeah, yes. If you don't. Trust me. And I do. Yeah, yeah. I mean, one thing I do, I do really appreciate about you as an executive is this: you, as a founder, you're actually quite reflective.

Starting point is 00:46:54 And one of the mental exercises that you've been doing as long as I've known you was, like, pretending if you were a new CEO brought in by the board to fix Fivetran, like, what would you immediately unwind? Which, by the way, it always happens when you bring in a CEO. And I know you kind of do this mental exercise. So is there anything recently that you've thought, like, boy, if I was brought

Starting point is 00:47:22 I mean, a not so recent example is I did do some things to try to simplify pricing. I think those have been successful. They were painful for the company because they mostly cut prices for small customers. Another exercise I do is I ask myself, what should other CEOs do? It's hard. It's a trick to get yourself to do things that are big and scary. Yeah. So in the DBT merger, you know, when I was reflecting on that,

Starting point is 00:47:48 on whether that was a thing we should seriously consider, one of the tricks I use is I ask myself, what should Sridhar do? Sridhar is the CEO of Snowflake, who we've worked with for a long time. It's a good mental exercise. And then I just go do that. And that was like a clear answer in my mind was like merge with DBT, absolutely. By that framework. Even though it seems very big and scary, when you imagine,

Starting point is 00:48:10 is this a good idea for someone else? Then you can kind of get there. Yeah. Maybe just kind of a softball cliche, final question. But, you know, you do have, like, the unique perspective of managing as a CEO during this transition. So first, what are you most kind of existentially worried about? And the second, what are you most excited about?

Starting point is 00:48:31 Well, I think the thing I most worry about is just, you know, that at some point the coding agents will get so good at writing connectors that people will just shift to DIY. I think that's a real threat to Fivetran. I think some different businesses are more and less threatened by, like, maybe the customers will just vibe code it themselves. That is a thing I worry about with Fivetran. And we will find a way to thrive in that world if we get there and we, we will provide the tools that you use to do that if that indeed becomes possible, even if it comes at a short-term cost to ourselves. But I do worry a lot about that. And,

Starting point is 00:49:12 And then the biggest opportunity, I think, is that AI is just a whole new set of things to do with data. The need for getting all your data in one place and organizing it is so much greater now than ever before. I think that there's a whole set of tools that people are going to need on the other side of that data platform. And I think we, and especially we and DBT, are perfectly positioned to provide them. Amazing. Well, it's always a pleasure to have you, George. Thanks for coming. Good to see you, Martin. Thanks for listening to this episode of the A16Z podcast. If you like this episode, be sure to like, comment, subscribe, leave us a rating or review, and share it with your friends and family. For more episodes, go to YouTube, Apple Podcasts, and Spotify. Follow us on X at A16Z and subscribe to our Substack at A16Z.com. Thanks again for listening, and I'll see you in the next episode.

Starting point is 00:50:13 As a reminder, the content here is for informational purposes only. Should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see A16Z.com forward slash disclosures.
