The Data Stack Show - 178: How to Build a Data Stack to Win PLG, Featuring Peter Chapman

Episode Date: February 21, 2024

Highlights from this week’s conversation include:
Peter's background and journey in data (0:26)
Introduction to PLG (4:18)
Starting in data at Heroku (6:05)
Building the data stack at Heroku (8:13)
Data ...stack requirements for early-stage companies (12:00)
Differentiating PLG companies from open source companies (19:26)
Venture capital and open source as a lever for growth (22:56)
Initial data modeling and analysis (25:38)
Operationalizing Data (29:16)
Sales and Marketing Operationalization (31:52)
Identifying Signals (34:16)
Challenges in Developing Signals (37:07)
Account Management for Developer Tools (42:30)
Challenges in Achieving Margins (45:02)
Leveraging Infrastructure for Margins (47:35)
Inference vs Training (54:55)
Final thoughts and takeaways (57:02)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by Rudderstack, the CDP for developers. You can learn more at rudderstack.com. Peter, welcome to the Data Stack Show. Great to be here, Eric. All right, give us a quick background of your, I guess, data history, data journey.
Starting point is 00:00:32 I can't think of a catchy term. Well, it started at a company called Heroku, where I joined as Heroku was just starting to build its business function. And I ended up building the data and revenue operations teams there. And spent some time there, had a two-year stint in venture. And since then I have been, I guess the term is, a fractional data and revenue leader for a bunch of developer tool startups. Awesome. Very excited to chat with you today. Likewise. Yeah. And now it's, based on the script, like, my turn to talk. But you know, one of the things
Starting point is 00:01:16 that I've learned and I got introduced to from Peter is improv, actually. So I'm going to change the script a little bit, and I'd like to add something here. So Peter, actually, is the reason that I first came to San Francisco. When he was at Heavybit, he was a customer, actually, although I think you were not paying, but it worked. We had, because, you know, we're good at RevOps here, and, like, reporting of Blendo. And I remember, like, talking with him, and he was, like, the person who actually encouraged me to come to San Francisco and Silicon Valley.
Starting point is 00:01:59 Like, until that time, I was like, oh, like, okay. It was just like a, I don't know, like a tiny thing on the other side of the world that probably nobody cares about. So what am I going to do there? So that's how I came the first time. And actually I spent a lot of time at the Heavybit offices. I had like a couch there. I was very cozy. The street out there was also very interesting for me. That was on 9th, right? In San Francisco. Yes.
Starting point is 00:02:33 It was like deep SoMa, I would say. Yeah, deep SoMa. Anyway, so I'm really happy to have him here also because of that, like the personal connection I have with him, but also he's a person who has an extremely deep knowledge of how data is actually used to deliver value to companies. Starting from Heroku, which was a company pretty much ahead of its time.
Starting point is 00:03:00 From what it seems, seeing, like, how things come back today and companies are like, oh, we're going to rebuild Heroku in 2023, now that Salesforce decided that they don't need it. So he has a very deep knowledge of that. I'd like to hear and talk more about the connection of data, how the need for data emerges in the company, and how it sparks, let's say, different functions inside, like RevOps; hear about PLG, what it is and why it is important, and understand better what it is. And also talk a little bit about how things have changed from Heroku to the AI craziness that we have today, right?
Starting point is 00:03:50 Because things change, but also remain constant in a way when it comes to business. So it's great to have someone who has seen things in so many different contexts, like from Heroku to today, and compare and, like, learn from that. So that's what I have in my mind. I'm sure we're going to come up and improv a little bit with the questions, hopefully. What about you, Peter? What would you like to talk about?
Starting point is 00:04:18 Well, I'm always excited to talk about PLG. And question for you, is this, have we beaten this topic to death on this show, or do we think that the audience is excited to learn what PLG means and what a PLG stack looks like? I don't think we've ever talked about it. Okay, great.
Starting point is 00:04:38 We'll see. I think what we always try to do is we try to drive the conversation based on like the curiosity, my curiosity and like Eric's curiosity. So we'll see. I'm pretty sure that like at some point we will divert.
Starting point is 00:04:56 But I definitely want to learn more about PLG. It is like one of these things that is, it remains kind of abstract, but it feels important to me. So yeah, I'd love to hear from you how you think about it and how you implement it also, because you've done that. I'm always excited to talk about PLG and developer tools and how you actually make money off these things. What do you think, Eric? Should we go on and record? I'm ready.
Starting point is 00:05:28 I'm ready. Yeah. Let's do it. All right. We are here with Peter Chapman on the Data Stack Show. Peter, thanks for joining us. Great to be here. Thanks for having me, Eric.
Starting point is 00:05:40 All right. Well, you gave us a brief background in the intro, but I'm interested to know, how did you get into data? I mean, you've had sort of an industrious career across multiple companies at data, but is it something that you were interested in? Did you sort of happen into it? You know, I studied math in college and I think, so I've always had, I've always had sort of a quantitative slant and I've enjoyed looking at the world through a quantitative lens. When I first joined Heroku, I was not a data guy. I joined to manage partnerships.
Starting point is 00:06:28 And I was hired to be a partner manager. And, you know, because I'm mathy, I guess, my first step was to be like, all right, well, how do I know? How do I measure the impact of this work? Like, how do I figure out which partners are bringing us revenue and which partners I should spend time in? And like, can I ask for more money to invest in partnerships? Is there ROI there? And so I started trying to run some reports and it was really hard. Like we just didn't have good foundational reporting about revenue. So I ended up building a sort of holistic model of revenue at Heroku, just so I could understand how I fit into the picture, right? I said like, all right, this is what I think revenue looks like. Most of
Starting point is 00:07:17 it comes from customers that look like this, and customers grow like this. And attrition looks like this. Oh, and by the way, partners do this. And that was interesting enough to leadership at the time that they were like, oh, do that. You know, like we can get someone else to do the partnership stuff. But understanding our business from an end-to-end way and seeing what the levers are, that feels really useful. Let's get you a team. So that's how I stumbled into it. Wow. That's super interesting.
Starting point is 00:07:54 One thing, actually, I'm interested to know: you talked about the reporting being really difficult. Yeah. What was the stack like, and how did you, you know, was there a stack in place, or were you hitting the financial system and querying prod to figure stuff out? Yeah, I mean, we were querying copies of prod. Heroku at the time had this very mixed blessing of having hired mostly engineers, which meant that... And the product people were also incredibly technical. All of them were fluent in SQL.
Starting point is 00:08:31 So it meant that from time to time, a product person would be like, oh, I wonder how this is doing. And they'd just query a database and produce what we called a dataclip, which is like a saved SQL query. But there was no data warehouse. There was no BI system. And as you can imagine, asking questions that spanned multiple sources of data was not possible. So a lot of what I did over those early years was build a data warehouse, install Looker,
Starting point is 00:09:02 build the sort of fundamental infrastructure we needed to both run reports and operate the business. When you look back on that, is there anything you would have done differently in building out that stack? Well, the tooling was so different back then. This was, like, pre-dbt, and the ETL tools available were less good. We were doing a bunch of database copies and I wouldn't do that today, but that's what was sort of available to us at the time.
Starting point is 00:09:33 Yeah. Yep. Super interesting. One thing, so we were talking about this a little bit as we were chatting before the show, but you've built stacks a number of different times at a number of different companies, both large organizations and startups. But let's focus on maybe a startup or maybe more of a blank slate. Maybe Heroku is a good example, right? In modern day where there really is no stack in place, right? There's no data warehouse.
Starting point is 00:10:02 Where would you start? What is your sort of minimum viable stack if you were doing that again today, knowing what you know now, obviously? Okay, so let's start with the requirements. When I first start working with companies, the first conversation I have with them is I say, walk me through your funnel. Show me everything you understand about your business, from website visit to payment and upsell. And one of two things happens: either you do walk me through your funnel, or you, as a founder, look kind of embarrassed. You're like, you know, here's something, you know, I know a little bit from Stripe, and then, like, here's, I think this is what Google Analytics is telling me. Like, I have a kind of
Starting point is 00:10:46 fuzzy picture of what's going on on the marketing side, but, like, this end-to-end funnel you're talking about, I don't actually know, right? Like, I can't actually tell you the ROI of a sign-up. And if that's where you are, and this is pretty common for the companies I work with, I go, great, step one is going to be to help you build this. And to build this, we're going to have to start dumping a bunch of disparate data into your data warehouse. So kind of step zero for me is something that consolidates data, right? So something like, I'm going to reference RudderStack. Something that pushes data from SaaS services into your data warehouse, something that pushes
Starting point is 00:11:33 data from your production database into your data warehouse. You almost definitely need dbt from day one. It's going to make your life easier. I'll stop there. I think that's step one. And we can talk about step two later. How many companies do you... The companies that you work with,
Starting point is 00:11:55 how many are looking at Stripe and Google Analytics? Because maybe a better way to ask that is, the tools have become very accessible today. And so are more companies adopting that stack? Or is it just that the vendors want to believe that? No one. It's really rare for companies I've worked with. Well, if you're a seed or an early Series A company, and you don't have a full-time marketing team, your Google Analytics is completely neglected.
Starting point is 00:12:30 And website conversion data is totally ignored. Yep. Because Stripe is a really easy way to get revenue data and all companies care about revenue, you might be pulling Stripe for financial information, but as I just mentioned, Stripe gives you no ability to see conversion rates or understand attrition or growth within accounts. Yep. How many people... Oh, yeah. Go ahead, Kostas.
Starting point is 00:12:58 One question here, because you are both talking about the moment where, like, Peter, you ask the question, like, okay, take me through your funnel. And I think there's also a question here about timing for a company, right, to actually engage in this conversation and try to formalize, let's say, the funnel and the business itself, right? Because that's what it is at the end. The funnel is a representation of how this thing that we call a company actually generates value for both sides, right? So when is the right time to do that?
Starting point is 00:13:46 Because, okay, I can see also people over-engineering that too early in a way. Totally. Right? So based on your experience, when should people do that? Well, I advise companies to start thinking about their data stack from day zero. And that doesn't mean you have to buy dbt and a bunch of data connectors and spin up a large data warehouse. But just having a rough roadmap, right?
Starting point is 00:14:16 Like having a very simple Google Doc that's like, hey, here's the reporting we want to get to. Here's what we think we'll need to get there. Can help avoid a lot of architectural pain later. So much of this stuff is easier to build than rewire. Mm-hmm. Right? I'd say the timing of when you build your stack
Starting point is 00:14:41 is a function of your go-to-market approach. So if you're doing top-down enterprise sales, give me an example of your favorite enterprise product, Kostas. My favorite enterprise product? Oh, is there such a thing as a favorite enterprise product? You don't think about this all the time? Well, let's say, I don't know, like buying IBM, whatever that means. Let's say you're selling a security product. Yeah. And there's no bottoms-up motion. You're just selling directly to teams.
Starting point is 00:15:39 You could probably get a lot of what you need out of Salesforce or pick your favorite CRM, provided you have good discipline around Salesforce, right? Like, maybe you have a contact us button on your website and you want to see how that converts. And you're doing a bunch of outbound SDR stuff and you need to see how that converts. But because your funnel is a sales funnel, as long as you're instrumenting your CRM correctly, you don't need a lot of sophisticated reporting out of that. And in fact, in this case, you may not even be using Stripe, right? You're just invoicing your clients. So if that's who you are, top-down enterprise sales, I live and die by the success of my sales team, you don't need BI for a while. In fact, Salesforce, like all these tools, comes with their own tool-specific analytics suite.
Starting point is 00:16:26 So you can get away with just using the Salesforce reports or just using the HubSpot reports. That's option one, enterprise top-down sales. Option two, PLG, our favorite acronym. You might also hear me refer to it as bottoms-up. When I say PLG, I'm talking about companies that people can start using without talking to a sales rep and then have a lot of organic usage. A PLG company is a company where most of your users start using you without ever talking to you. And then it's your job to figure out which of those users to talk to.
Starting point is 00:16:59 If that's the world you're in, building a cohesive data stack becomes a lot more important because now your sales pipeline is just a sliver of the information you need, right? There's some important activity happening in your CRM that you absolutely need to track, but a lot of really important activity that determines the overall success of your funnel is happening on your website and it's happening with your product. So tying all that together becomes a lot more paramount and needs to happen a lot earlier in a company's journey. Okay. That makes sense. Sorry. One last question, Eric, and then I'll give the microphone back to you.
Starting point is 00:17:39 This is great. But you mentioned, like, the bottoms-up motion there, and you said you have users that interact with the company and the product, and probably they won't even talk to anyone, right? Yeah. There is also, especially in developer tooling, there's a lot of... Part of it is also the open source. So are open source and product-led growth compatible things or different things? How do you combine them in a way? And the reason I'm asking you is because
Starting point is 00:18:17 I've seen and I've experienced at Starburst the most traditional version of that, which is we have an open source project used by all the Fortune 500 companies out there in production on their own, having actual people getting paid, right? To set it up and run it. And then a company that's trying to monetize that. But at the same time, the company itself is a very sales-driven company, right? So it's kind of like a weird mix of having the extreme version of Bottoms Up in a way. But at the same time, the company itself is doing the more traditional enterprise sales-driven motion,
Starting point is 00:19:03 like getting implemented there. So how does this thing work together and how they have also changed? Because what Trino is doing or what Spark was doing in 2010, it's not necessarily what companies in 2023 that they want to incorporate open source in their strategies are doing. Things have probably changed. So what's your take on that? Okay, so I'm going to answer a slightly different question.
Starting point is 00:19:30 OK. I'm going to answer a slightly different question, which is, what is a PLG company? I would say that there's three ingredients. One, you need a product that is immediately valuable to one person, ideally, or a single team. Two, there needs to be an organic way for usage to grow from one person to many people or from moderate usage to lots of usage, right? So that could mean one user invites another user invites another user, or it could mean I build a prototype app and then I move it to production and then I build more apps. And the third thing you need for this motion to be successful is you need a way to talk to your
Starting point is 00:20:20 users without being weird. Open source companies, most open source companies that I've worked with get number one right. Sometimes they get number two right. But if you're really open source, you don't have number three, right? You don't have any way of contacting your users. You're not getting any information about them. And even if you do get information about them,
Starting point is 00:20:45 it would actually feel really strange to reach out to them because you're sort of breaking the expectation of open source. So unless you've built a product where customers are actually logging in and having sort of an in-product experience, your open source company is not a PLG company. OK. But you can turn an open source company into a PLG company if you implement the third part, right?
Starting point is 00:21:17 Yeah. OK, that's interesting. We also had a lot of experience like that when I was at RudderStack, because we were trying to do that. Anyway, it's like a huge conversation and I think it's really hard for people to do it right. Open source is always like tricky.
Starting point is 00:21:36 But anyway, Eric, back to you. I don't want to... Yeah, it really is an interesting conversation because I think that... And Peter would be interested in your thought on this, right? I mean, there are obviously examples, multiple examples that we could all think of gigantic companies that grew out of open source projects, right? But when we think about technical tools, data tools, etc., it is really strange. You have this ethos of open source, but then you mix in venture capital.
Starting point is 00:22:13 And a very classic way in the DevTools space is to take an existing open source project that you have built and go to venture capitalists and say, look how many stars I have, look how many downloads I have. I am building the hosted version of this. Can I have money? Investors love investing in tools like this because there's already a proven user base and a proven use case, right?
Starting point is 00:22:45 The mechanics of getting it right are tricky. I've seen a lot of companies build really successful open source projects and struggle to build successful commercial offerings on top of them. One of the things that makes it hard is you're always competing against that free... Yourself. Yeah, exactly! So if you've built a really good open source project that's easy to install
Starting point is 00:23:10 and use, it can be very difficult to build a hosted version of that that's actually competitive. Right. Yep. Yeah, that's a super helpful perspective. Yeah, it is interesting.
Starting point is 00:23:28 Like the success of open source is actually a limiting factor in your commercial growth, right? But it's also an ingredient to the success of the technology generally, which is quite a needle to thread. One thing that we were talking about a little bit before the show was how you use some of the data that is being combined in the warehouse, right? So let's use the example that you talked about, which is you have product usage data, you have financial data,
Starting point is 00:24:04 you have multiple sources, like, going into the warehouse, right? And someone's going to use a tool like dbt or write SQL in order to, you know, model that data. What are the first ways, let's say you get that stack set up, what are the first things that you do with that? What is your, you know, sort of in sequence, like the playbook of, like, okay, here's where we start once we have all that data. Yeah. So the first thing I build almost all the time is a table where the rows are accounts
Starting point is 00:24:39 and the columns are timestamps and the timestamps represent important lifecycle events. So I want to see first website visit, first sign-in, first product usage. Depending on the product, I might look for things like first invitation sent, first invitation accepted, first payment above 100, first payment above 1,000. The reason this is so important is because without this consolidated account-based view,
Starting point is 00:25:18 it's really hard to know where to focus. And in particular, a lot of the tools, if you're not building a consolidated view, it's easy to get confused because the tools that you're working with will give you user-specific views. But if you're building
Starting point is 00:25:39 a team-based product, your user conversion rate is different from your account conversion rate, and that's a super important distinction. Tell me if I'm going, if this is too wonky for you. No, no, it makes total sense. Yeah.
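The account-by-lifecycle-timestamp table Peter describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event log, the event names, and the account and user IDs are all hypothetical, and in practice this would more likely be SQL in the warehouse. It records the first time each user hit each lifecycle event, rolls that up to accounts by taking the earliest time, and shows how the user conversion rate and the account conversion rate can differ on the same data.

```python
from datetime import datetime

# Hypothetical event log: (account_id, user_id, event_name, timestamp).
EVENTS = [
    ("acme", "u1", "website_visit", datetime(2024, 1, 2)),
    ("acme", "u1", "sign_in", datetime(2024, 1, 3)),
    ("acme", "u2", "website_visit", datetime(2024, 1, 1)),
    ("acme", "u2", "product_usage", datetime(2024, 1, 9)),
    ("globex", "u3", "website_visit", datetime(2024, 1, 5)),
]


def user_lifecycle(events):
    """Earliest timestamp of each event type per (account, user)."""
    firsts = {}
    for account_id, user_id, name, ts in events:
        per_user = firsts.setdefault((account_id, user_id), {})
        per_user[name] = min(ts, per_user.get(name, ts))
    return firsts


def account_lifecycle(user_firsts):
    """Roll users up to accounts: earliest time any user hit each event."""
    accounts = {}
    for (account_id, _user_id), events in user_firsts.items():
        per_account = accounts.setdefault(account_id, {})
        for name, ts in events.items():
            per_account[name] = min(ts, per_account.get(name, ts))
    return accounts


def conversion_rate(table, start, goal):
    """Share of rows that reached `start` and also reached `goal`."""
    reached_start = [row for row in table.values() if start in row]
    reached_goal = [row for row in reached_start if goal in row]
    return len(reached_goal) / len(reached_start) if reached_start else 0.0


users = user_lifecycle(EVENTS)
accounts = account_lifecycle(users)
user_rate = conversion_rate(users, "website_visit", "sign_in")
account_rate = conversion_rate(accounts, "website_visit", "sign_in")
```

On this toy data one of three users signed in, but one of two accounts did, which is the user-versus-account distinction in miniature.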
Starting point is 00:25:54 It makes total sense. And so actually, like, the way we do this, you know, if I were to sort of get into the weeds of dbt, is I actually first want to see this at a user level. All right. So, like, by user, when do they first visit the website, sign up, use the product, start paying? And then I want to aggregate that at an account level and take the minimum time across the users in that account so I can get that consolidated funnel. Yep. Okay, so that's one thing we're doing: we're building this foundational model that has users with timestamps, accounts
Starting point is 00:26:28 with timestamps. Oh, and part of this, that's always a fun part, is actually defining what an account is. Because you're going to have a bunch of competing definitions, right? At almost all the companies I've worked with, you've got
Starting point is 00:26:43 an internal representation called an organization or a team that comes from your product. Yep. Then hopefully you're using a CRM. Let's just say you're using Salesforce. So Salesforce has an idea of what an account is. It's called an account in Salesforce. Those are probably different definitions. Yes.
Starting point is 00:27:04 And it's possible that a single company in Salesforce has multiple teams associated with it. Then you've got this third heuristic, which is that people who sign up from the same non-free email address are probably from the same company. So part of what you're doing as you're building this foundational data model is you're just getting the entire company to agree on what is an account? Yep.
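The third heuristic, grouping sign-ups by non-free email domain, might look roughly like the sketch below. The list of free email providers is a small illustrative assumption rather than a complete one, and the `account_key` and `group_signups` helpers are hypothetical names. A real pipeline would also have to reconcile these groups with the product's teams and the CRM's accounts, which this does not attempt.

```python
# Illustrative, deliberately incomplete list of free email providers.
FREE_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}


def account_key(email):
    """Group corporate addresses by domain; keep free-mail users separate."""
    local, _, domain = email.strip().lower().rpartition("@")
    if not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    if domain in FREE_EMAIL_DOMAINS:
        # Two Gmail users are probably not colleagues, so do not merge them.
        return f"user:{local}@{domain}"
    return f"domain:{domain}"


def group_signups(emails):
    """Map each candidate account key to the sign-up emails behind it."""
    groups = {}
    for email in emails:
        groups.setdefault(account_key(email), []).append(email)
    return groups


groups = group_signups(
    ["ann@nike.com", "Bob@Nike.com", "carl@gmail.com", "dee@gmail.com"]
)
```

Here the two nike.com addresses collapse into one candidate account, while the two Gmail users stay separate.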
Starting point is 00:27:30 What is the source of truth for what an account is? So that like when my sales team talks about, you know, revenue by company, they're using the exact same language as my marketing team and my product team and my finance team. Yep. Okay, so that's kind of track one, which is just like building this foundational user and account model that lets
Starting point is 00:27:53 me see my funnel. Track two, for most of the companies I work with, which are PLG companies, is starting to operationalize that data in order to proactively engage with your most important customers. Makes total sense. And you're doing that on the marketing side and the sales side? Like when you say operationalize, what does that encompass? Yeah. So I think about this as having two domains. One is I call it hand-to-hand combat.
Starting point is 00:28:38 You want to make sure that the people who are talking to customers are talking to the right customers at the right time, and they know what to talk about. That's the most important thing. Like, step one, even before you've built a sophisticated account and user funnel thing, is you need to make sure that the people who are paying you are talking to you. Hmm. Yeah. And then very often that sounds simple. It's like, yeah, I mean, that sounds obvious, but that's actually quite hard.
Starting point is 00:29:08 Yeah, it's funny. I remember my early days at Heroku, we were faced with this problem, right? We had a ton of paying customers and a really small sales team. We had like three salespeople and tens of thousands of people paying us. So the question of who do we engage with was paramount.
Starting point is 00:29:27 And people were really developing, like, incredibly sophisticated mechanisms to determine what a promising account felt like. Right. Someone was like, oh, I think if their usage grows by more than $100 in a week, that's a signal. Right. Someone's like, oh, I think if we see a production database, that's a signal. And my signal was, hey, if they're in the top 20% of revenue, that's the signal, right? Before you start looking at product usage or revenue growth, the first thing we're doing is we're setting up sort of good defense, which means the people who are already paying us the most need to have a relationship with us. Yeah. That was a total tangent. Take me back. What was the original question here, Eric?
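The "top 20% of revenue" signal is simple enough to sketch directly. The revenue figures and account names below are made up for illustration, and the 20% cutoff is just the number from the conversation passed in as a parameter.

```python
import math


def top_revenue_accounts(monthly_revenue, fraction=0.2):
    """Return the highest-paying accounts, covering `fraction` of all accounts."""
    ranked = sorted(monthly_revenue, key=monthly_revenue.get, reverse=True)
    cutoff = math.ceil(len(ranked) * fraction)
    return ranked[:cutoff]


# Hypothetical monthly revenue per account, in dollars.
revenue = {
    "a": 5000, "b": 1200, "c": 300, "d": 90, "e": 50,
    "f": 25, "g": 20, "h": 10, "i": 7, "j": 5,
}
needs_relationship = top_revenue_accounts(revenue)
```

The resulting list is the set of accounts that should already have a named person talking to them before any usage-based signal is considered.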
Starting point is 00:30:17 Yeah, no, we were talking about, like, what you do, you know, sort of the uses that you have of the data, like when you said operationalizing. And so I was asking, you know, what does operationalizing encompass, right? So, and sales is obviously an example, like which companies should our salespeople try to build relationships with? What other things are, like, on the marketing side? How are you operationalizing the data? Great. So I almost always handle sales before marketing. If you are, you know, every single company I've worked with, their revenue falls into a heavily skewed distribution, right?
Starting point is 00:30:55 Where it's like a huge number of their, a huge amount of the revenue is coming from a relatively small amount of customers, which means that we have to invest in really good personal relationships with those companies before it makes sense to worry about our long tail. Yep. So sales comes first, and that means two things. One, it means telling your sales people or your account managers or your customer success people who they should be talking to. And the second component of that is telling them when to initiate a conversation and what's
Starting point is 00:31:31 relevant, right? So I'll give you an example. We're just building alerts here. And an alert could mean, like, all right, if a customer spends over $1,000 in a month, assign them to a sales rep. But another alert could be, if a customer's usage drops by 20% week over week, send that sales rep a Slack message, create a task for that sales rep in the CRM.
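The two alerts in this example can be written as plain rules. The sketch below is an assumption-laden illustration: the `AccountSnapshot` fields and the alert names are hypothetical, and actually delivering the alert (a Slack message, a CRM task) is left out because it depends on the tools in use. Only the $1,000 monthly spend and 20% week-over-week thresholds come from the conversation.

```python
from dataclasses import dataclass

SPEND_THRESHOLD = 1000        # dollars per month
USAGE_DROP_THRESHOLD = 0.20   # week-over-week decline


@dataclass
class AccountSnapshot:
    """Hypothetical per-account row produced by the warehouse models."""
    account_id: str
    monthly_spend: float
    usage_last_week: float
    usage_this_week: float
    has_sales_rep: bool


def alerts_for(snapshot):
    """Return the names of the alerts this account should trigger."""
    alerts = []
    if snapshot.monthly_spend > SPEND_THRESHOLD and not snapshot.has_sales_rep:
        alerts.append("assign_sales_rep")
    if snapshot.usage_last_week > 0:
        drop = (
            snapshot.usage_last_week - snapshot.usage_this_week
        ) / snapshot.usage_last_week
        if drop >= USAGE_DROP_THRESHOLD:
            alerts.append("notify_rep_usage_drop")
    return alerts


growing = alerts_for(AccountSnapshot("acme", 1500, 100, 70, False))
steady = alerts_for(AccountSnapshot("globex", 200, 100, 95, True))
```

Keeping the rules as data-in, names-out makes it easy to add a third rule, such as "customer started using a given feature", without touching the delivery side.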
Starting point is 00:31:56 Yep. It could also mean if a customer is using this feature of the product, tell the sales rep because this is a good thing to initiate a conversation around. Once... Please, Kostas. Yeah, sorry for interrupting, but one of the things that I observed
Starting point is 00:32:15 here in the conversation is that we talk a lot about signals. And, okay, you gave some heuristics about, don't start over-engineering, trying to figure out, like, you know, the best possible signal there. Start with the basics. Like, if someone is in the top 20%, go and, like, chat with these people. But as you go through, like, what you're saying now, you keep coming up with signals.
Starting point is 00:32:44 And it feels to me at least like these signals are very context sensitive, like it's business or like product. Actually, it sounds like it has to do with the product a lot, but tell us a little bit about that. How do you think about finding the right signals there and what they look like. Great. I use a four-step process. Signal number one, existing spend. Good news about this one, it's both your most important signal
Starting point is 00:33:17 and it's pretty easy to capture. Yeah. Signal number two, demographic data, by which I mean company customer size and customer revenue. You'll use a data augmentation tool like Clearbit to get this information, or HubSpot now has this built into their product. You want to make sure that if someone from Nike signs up for your product, they're immediately getting excellent support and a really friendly account manager.
Starting point is 00:33:49 Signal number three. So you're going to get like 80% of the value you can get from these two alone. Okay. And very often, if you're a really early-stage company, my approach is like, let's build these and then pause. Give your account team time to digest this, see how they do with it, like get people used to this new signal-based assignment world. Make sure that we're actually seeing an effect from this. Let's let this marinate for
Starting point is 00:34:22 at least three months before we touch it at all. Okay, let's fast forward in time. Let's say you've built your foundational signals, your customer team is not overwhelmed, and you're trying to eke some more percentage points of growth out. Now it's time to get to signals three and four. Signal three is usage, and I'll say it's sort of choose your own adventure, right? Where you might develop your own intuition about what features signal a propensity for growth and start running experiments. It's pretty easy to measure this stuff, right? So you could just say, let's use the production database example. You might say, all right, I think that everyone who uses a production database
Starting point is 00:35:09 deserves account management. So let's start assigning these folks to account managers, or let's start assigning most of these folks to account managers, and let's measure what happens. And the fourth approach, the most sophisticated and expensive approach, is ML-derived signals,
Starting point is 00:35:35 where instead of you guessing at what the signals are, you dump all your data into a vendor's database, and you say, hey, you tell me what I should be looking for. And that vendor is going to spit back a scoring model for you that you can use to generate signal. I would assume that this last one also requires a substantial amount of data to make sense of that, right? It's not something that's... Because I can see, especially first-time founders with engineering backgrounds,
Starting point is 00:36:14 that they feel like technology can help solve everything, so let's throw a model on this data and see what the model is going to say. But usually the model is just random shit coming out because you don't have data yet that can support that. So I think, I don't know, my experience, at least my advice is like, avoid the sophisticated stuff until you are really growing and have like a lot of data points there and, most importantly in my opinion, you have personally built intuition about your business. So outside of like what the math can
Starting point is 00:36:54 tell, you can also use your intuition to assess how these things work or don't work. But anyway, that's my experience at least. Eric, back to you again. Well, I agree with that because the other thing is, I was actually going to ask you about this, Peter. One of the things you have to think about is that in an early stage company, a lot of things change, right? And so if you try to develop really sophisticated propensity scores, right? But then there's a significant change in your product or the way people use it or those sorts of things. It's like, this stuff is pretty dynamic in early stage companies, right? And so it makes a ton of sense to sort of focus on
Starting point is 00:37:45 the first two steps because those are going to remain stable even, you know, or somewhat stable, right? If you look at existing revenue and then the demographics, right? Even if there are significant changes in your product or model, those are going to be fairly durable. You know, the other thing that makes this tricky is that the ideal signal is not propensity to spend. It's a signal of how much you are able to be influenced by human interaction. Right? So it's easy to get sort of false positive signals
Starting point is 00:38:24 that measure inevitable growth. Right? Like you might say, if you dump all your data into a machine learning model, very likely that model is going to be like, oh, people who put down the credit card are way more likely to spend money. It's like, well, yeah. We know that. But I don't know that everyone who puts in credit card information needs to talk to a salesman. Yeah, we were doing some... we're super
Starting point is 00:38:55 nerdy at Rudderstack, so I wanted to look at a couple of different multi-touch models for our funnel. All right, let's just see what, you know. And it's funny, like the first one is, yeah, here's all the data points or whatever, right? And one of the things is like, oh man, people who respond to an SDR email are the most likely to go on to have a sales deal, right? And it's like, oh, wow. That's amazing. So yeah, that stuff is super interesting. Costas, I feel like I've been monopolizing the conversation here and you've jumped in. But what questions do you have for Peter?
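The distinction drawn above, between a signal of propensity to spend and a signal of how much an account can be influenced by human interaction, can be sketched as a simple uplift comparison. This is only an illustration under stated assumptions: the records are invented, the field names (`has_credit_card`, `uses_prod_db`, `assigned_am`, `growth`) are hypothetical, and it assumes some accounts with each signal were deliberately left without an account manager so there is a control group.

```python
from statistics import mean


def uplift(records, signal):
    """Mean growth of managed accounts minus unmanaged ones, among accounts with `signal`."""
    treated = [r["growth"] for r in records if r[signal] and r["assigned_am"]]
    control = [r["growth"] for r in records if r[signal] and not r["assigned_am"]]
    if not treated or not control:
        raise ValueError("need both managed and unmanaged accounts with this signal")
    return mean(treated) - mean(control)


# Invented example data: growth is the change in monthly spend, in dollars.
records = [
    # Accounts with a credit card on file grow whether or not a human talks to them.
    {"has_credit_card": True, "uses_prod_db": False, "assigned_am": True, "growth": 100},
    {"has_credit_card": True, "uses_prod_db": False, "assigned_am": True, "growth": 120},
    {"has_credit_card": True, "uses_prod_db": False, "assigned_am": False, "growth": 110},
    {"has_credit_card": True, "uses_prod_db": False, "assigned_am": False, "growth": 110},
    # Production database users grow more when an account manager is assigned.
    {"has_credit_card": False, "uses_prod_db": True, "assigned_am": True, "growth": 300},
    {"has_credit_card": False, "uses_prod_db": True, "assigned_am": True, "growth": 500},
    {"has_credit_card": False, "uses_prod_db": True, "assigned_am": False, "growth": 100},
    {"has_credit_card": False, "uses_prod_db": True, "assigned_am": False, "growth": 200},
]

credit_card_uplift = uplift(records, "has_credit_card")
prod_db_uplift = uplift(records, "uses_prod_db")
```

In this made-up data the credit card signal predicts growth but shows no uplift from account management, while the production database signal does, which is the false-positive pattern described above.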
Starting point is 00:39:43 Yeah. Okay, we've talked about PLG. We gave a definition. That's one of the things that I really want to hear about. But one of the things that I'd like to ask you, Peter, is you've seen growth in tech from, let's say, the early SaaS and cloud days up to today with the craziness of people literally killing
Starting point is 00:40:11 for GPUs out there to do something with AI. And things, again, my feeling is that things change very rapidly, but they also remain constant in some ways. There are some things, like learnings that you can take from the days of Heroku, and they still apply today. So I'd love to hear from you about that. What have you seen that remains, let's say, like constant when it comes like to building growth functions, right? And what has changed because of like, not that I wouldn't say that necessarily the technology,
Starting point is 00:40:58 I would probably say more of like the demand from the markets, because my feeling is that that drives more change than the actual technologies out there. But I'd love to hear that from you because you do have, in my opinion, a very unique experience going through all these different phases of the industry.
Starting point is 00:41:21 What's the same and what's changed? You know, one constant is that if you want to sell to developers, you need to talk to your customers. And I know maybe this sounds really obvious, but I need to say it in public because I watched so many developer tool startups get built by engineers who would love to not do sales. This is maybe the trap of building a successful PLG company: you might delude yourself into thinking, well, I've got this great open source product. I've got this great free to use product. I've got this great product that anyone can sign up for and use. So all I need to do is build a great product experience and really good documentation and maybe hire a support team
Starting point is 00:42:14 and then growth should just happen, right? But the lesson that's been hammered into me time and time again is that if you want to see real, significant, venture-exciting revenue growth, you need to get serious about account management. And very often that means, and this is tricky, because when you think sales, it might feel somewhat orthogonal to the company culture that you as a technical founder are trying to build, right? You probably don't see yourself as a salesperson. And you, as an engineer, hate being sold to and don't want to have to get on a call with someone to use their product. So figuring out how to integrate account management into a developer tool company, not just from
Starting point is 00:43:02 a technical and operational perspective, but also from an organizational and cultural perspective, is both critical to a company's success and difficult. You probably don't want to hire your standard enterprise sales bro. You want to hire someone who speaks the language, who gets the culture, who knows when to engage, and who's also smart enough to not be the pushy salesperson when that's not the right approach. So that's the constant. You have to do sales, and you have to do sales right,
Starting point is 00:43:39 and it's hard. It's hard in this industry. Yeah. I'd say what's different and unique about machine learning is, boy, is it hard to find margins. You can read about this. We're seeing company after company come out with the fact that they have negative margins. I was just at Replicate, and Replicate had negative margins for my entire time there. I could talk a lot about why margins are hard,
Starting point is 00:44:10 but maybe the TLDR is that, boy, is the machinery expensive. Like, the machines are really expensive and the market expectations are really low and everyone is racing to grab market share, so there's a real market temptation to produce negative margin products.
Starting point is 00:44:38 That's very interesting. Do you think the answer to that will come from, I don't know, hardware, having more availability of hardware there and pushing prices down? Or from a paradigm shift of how products are built on top of this hardware? Because at the end, and I wouldn't like to be very reductionist here, a company like Replicate, what it sells is like an API on top of GPUs, right?
Starting point is 00:45:14 That's what it is. It gives GPUs to people to go and do it. In the same way that a company like Spark, sorry, like Databricks, what it sells to a very specific group of people is CPU cycles, right? And that's where you start building your margins, through the abstractions that you add there
Starting point is 00:45:35 and all these things, like with multi-tenancy and all that stuff. At least with CPUs, they have figured it out a little bit better, I would say, compared to GPUs. But where do you think the margins will come from at the end? Because they have to, right? It has to turn into a viable market at the end.
Starting point is 00:45:52 We can't just donate hardware out there using money from pension funds at the end, right? Right. Thank you to the school teachers of America for powering their APIs. I'm thinking about where margins come from. There are a lot of different levers here. I'll spit out some of the categories that I think about when I think about margins.
Starting point is 00:46:26 One is totally beyond your control. And it's just like as hardware gets cheaper and better, it's easier to find margins. So you could say, like, hey, got bad margins now. We can just wait. Maybe we can just wait and margins will get better. Two, your internal infrastructure engineering is paramount to finding good margins. Trying to figure out how to approach this here. So there's a trade-off. There's a trade-off between latency and cost efficiency.
Starting point is 00:47:12 In any infrastructure product, but it's especially impactful on machine learning products. Let's say you're building an API on top of a machine learning model, or you have an API that just serves images. You care a ton about your user experience, so you need a relatively fast response time, and you're already burdened by the fact that machine learning models are inherently
Starting point is 00:47:36 chunkier than asking your Rails app to tell you what's in the cart or whatever. In order to get the lowest possible response time, you want to keep whatever it is you're serving running all the time. But it's really expensive. And so, you know, what I see companies doing is they're, like, running stuff.
Starting point is 00:48:09 Let me back up a bit. The faster you're able to boot up an instance, the better your margins are. So the ability to, like, quickly load whatever machine learning model you're serving onto your instance and let it serve, let it respond to calls, that has a huge impact on your margins. Consolidating hardware has a big impact on margins, right? You get the best prices when you buy reservations. Do you know what a reservation is? I think so. I mean, it's like you go to someone like AWS and
Starting point is 00:48:46 you pay upfront for a certain amount of compute and usually for a certain period of time also, and you get a better price for that, right? Yeah, exactly. So part of the way that vendors like Heroku or like Replicate find margins is they buy a lot of compute and they commit to using that compute over a long time period. And that allows them to achieve deep, deep discounts. And then they charge a price that's somewhere between, say, the reserve price and the on-demand price, the price that a customer would pay if they're using it without a reservation. This is kind of where you want to be competitive, right? It's nice to be able to offer prices that are similar to on-demand while you're paying reserve prices.
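The reservation arithmetic described above can be sketched as follows. All prices here are made-up numbers for illustration, not figures from Heroku, Replicate, or any cloud provider, and `utilization` (the fraction of reserved hours actually sold to customers) is an assumed parameter added to show why unused reserved capacity erodes the margin.

```python
def gross_margin(price_per_hour: float, reserved_cost_per_hour: float, utilization: float) -> float:
    """Gross margin as a fraction of revenue, per reserved instance-hour.

    The vendor pays `reserved_cost_per_hour` for every reserved hour,
    but only earns `price_per_hour` on the fraction of hours it sells.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    revenue = price_per_hour * utilization
    return (revenue - reserved_cost_per_hour) / revenue


# Hypothetical prices: on-demand is $4.00/hour and the reserved rate is $2.00/hour.
# The vendor charges $3.50/hour, between the two.
full_usage_margin = gross_margin(price_per_hour=3.50, reserved_cost_per_hour=2.00, utilization=1.0)
half_usage_margin = gross_margin(price_per_hour=3.50, reserved_cost_per_hour=2.00, utilization=0.5)
```

Under these assumed numbers the margin is roughly 43% when every reserved hour is sold and turns negative when only half of them are.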
Starting point is 00:49:39 And it also means that if your customers compare you to running it themselves, you're compelling even without the product being really sticky. Does that make sense? It makes a little sense. Yeah. Yeah. Getting this right is tricky, right? Because if your usage is super spiky, it's hard to reserve the right amount of instances. If your usage is distributed across a bunch of different hardware types,
Starting point is 00:50:09 it's hard to find savings on reserved instances. If you want to consistently be on the latest and greatest hardware type, which is coming out in six months, it's hard to be on the best reserved instances. So, you know, back in the days of Heroku, when we were just selling CPU compute, we didn't care that much about being on the latest and greatest instances. Like it was nice,
Starting point is 00:50:33 but a CPU instance is a CPU instance. And that meant that if we'd bought some three-year reservations and they eventually became outdated, it was kind of fine. If you are trying to be the hot new machine learning startup, you want to be on the latest and greatest NVIDIA stuff, which means that making a three-year reservation might be painful for you. But if you don't make that reservation,
Starting point is 00:50:58 your margins suck. That was a deep dive into why margins are hard in ML. I'm happy to go deeper or abandon it entirely. No, it's very interesting because it's, I mean, again, I think it's one of the things that remains constant in a way that's, okay, you need to figure out your margins at the end and figuring out the margins is not straightforward,
Starting point is 00:51:22 especially when you're talking about infrastructure products. Right? First of all, you can't have a simple model to predict demand. It's not like a SaaS application, where it's much easier to predict what the usage is going to be there. And also you have some very standard tools to improve your latency, for example. You can put caching there, right? These things that we've done in the past, like, I don't know, 20 years
Starting point is 00:51:54 when we were building applications. But when we're building and selling infra, it's quite different, right? And I would like to ask you, do you see a case there that these companies at the end may have to break ties with the cloud providers and start getting their own hardware
Starting point is 00:52:15 and actually building on top of that? Like, are we going to see a full cycle going back to the data center again for this to happen? Because that would be, I don't know, very interesting. Yeah, it's really tempting. I'll try to dig it up. There's a great essay by Martin Casado
Starting point is 00:52:37 on why buying your own hardware makes sense at a certain scale and the impact that can have on your margins. It is much cheaper. That said, boy, is it hard to run a data center. You know, it's a whole new organization that you need to run and a whole new set of inventory and finance problems that you need to solve. What I'm seeing right now is a scramble for market share, not margins. And I think as long as we're in that stage where market share trumps profitability, no one's going to be buying their own machines. The mandate is to move as quickly as possible.
Starting point is 00:53:29 And the way to move quickly is to get on a large cloud provider. Yeah. Also, one last question from me, and then I'll give it back to Eric. ML and AI, whatever we like to call it. There are always two parts of it. Okay. There's inference, which is what actually drives the user experience in the end, right?
Starting point is 00:53:52 Like, I mean, these low latencies and all that stuff. But there's also training, and that's expensive, right? It is the big investment for the companies. From your experience so far, what is actually driving most of the growth out there for these new AI companies? Is it more the inference part or the training part? Inference, 100% inference. Training is expensive and training can take a huge amount of time.
Starting point is 00:54:27 But most companies are not training perpetually. It's like a discrete stage in their own development, and then once that model is trained, you're letting it run for a long time.
Starting point is 00:54:44 And which one of the two has the potential for better margins for the vendor? So I want to zoom out a little bit here. Because I actually think that reselling infrastructure is a poor business decision. Look, if you position yourself as an infrastructure reseller, your customers are always going to compare you to buying directly from a cloud provider or running it themselves.
Starting point is 00:55:19 And it's hard to find margins there. I think you want to find your margins on enterprise features and functionality. Okay. Right? Like, yes, you have to price for infrastructure usage, but that's not where you want to make money. That's actually where you want to be competitive.
Starting point is 00:55:38 And then you want to find the margins for feature-specific stuff. Okay. That's interesting. Okay, we need another episode to go through that. I have like a lot of questions about that. Eric, back to you. Yeah, well, we're actually here at the buzzer. Brooks is telling us that we have used all of our allotted time,
Starting point is 00:55:57 Costas, which is about par for the course. Peter, this has been such an awesome show. We've learned so much and would love to have you back on several topics that we didn't get to. But thanks for giving us some of your time today. My pleasure. Thanks for having me. Thanks for listening to the show. Peter is a consultant who specializes in helping PLG companies drive more revenue with data. If you'd like to connect with Peter about his advisory services, his email is peter at chapman-coaching.com.
Starting point is 00:56:32 That's P-E-T-E-R at chapman-coaching.com. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by Rudderstack,
Starting point is 00:56:59 the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.
