The Data Stack Show - 178: How to Build a Data Stack to Win PLG, Featuring Peter Chapman
Episode Date: February 21, 2024
Highlights from this week's conversation include:
Peter's background and journey in data (0:26)
Introduction to PLG (4:18)
Starting in data at Heroku (6:05)
Building the data stack at Heroku (8:13)
Data stack requirements for early-stage companies (12:00)
Differentiating PLG companies from open source companies (19:26)
Venture capital and open source as a lever for growth (22:56)
Initial data modeling and analysis (25:38)
Operationalizing Data (29:16)
Sales and Marketing Operationalization (31:52)
Identifying Signals (34:16)
Challenges in Developing Signals (37:07)
Account Management for Developer Tools (42:30)
Challenges in Achieving Margins (45:02)
Leveraging Infrastructure for Margins (47:35)
Inference vs Training (54:55)
Final thoughts and takeaways (57:02)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by Rudderstack, the CDP for developers.
You can learn more at rudderstack.com.
Peter, welcome to the Data Stack Show.
Great to be here, Eric.
All right, give us a quick background of your, I guess, data history, data journey.
I can't think of a catchy term.
Well, it started at a company called Heroku, where I joined as Heroku was just starting to build its business function.
And I ended up building the data and revenue operations teams there.
And spent some time there, had a two-year stint in venture.
And since there have been, I guess the term is a fractional data and revenue leader for a bunch of developer tool startups.
Awesome. Very excited to chat with you today.
Likewise.
Yeah. And now it's based on the script, like my turn to talk. But you know, one of the things
that I've learned and I got introduced to from Peter is improv actually. So I'm going to change a little bit the script, and I'd like to add something here.
So Peter, actually, is the reason that I first came to San Francisco.
When he was at Heavybit, he was a customer, actually.
Although I think you were not paying, but it worked,
because, you know, we were doing the RevOps there and the reporting at Blendo.
And I remember like talking with him and he was like the person who actually encouraged
me, like to come to San Francisco in the Silicon Valley.
Like until that time, I was like, Oh, like, okay.
It was just like a, I don't know, like a tiny thing on the other side of the world that
probably nobody cares about. So what am I going to do there? So that's how I came the first time.
And actually I spent a lot of time at the Heavybit offices. I had like a couch there.
I was very cozy. The street out there was also very interesting for me.
That was on the 9th, right?
In San Francisco.
Yes.
It was like deep SoMa, I would say.
Yeah, deep SoMa.
Anyway, so I'm really happy
to have him here also because
of that, like the personal connection I have
with him, but also he's a person who has like an extremely deep knowledge of how data is
actually used to deliver value to companies.
Starting from Heroku, which was a company pretty much ahead of its time.
From what it seems, seeing like how things like come back today and like companies are like, oh, we're going to rebuild Heroku in 2023.
Now that Salesforce decided that they don't need it.
So he has a very deep knowledge of that. I'd love to hear and talk more about the connection of data, how the need for data emerges in
the company, and how it sparks, let's say, different functions inside,
Like RevOps, hear about PLG, what it is and why it is important, and understand better
what it is.
And also talk a little bit about how things have changed from Heroku to the AI
craziness that we have today, right?
Because things change, but also remain constant in a way when it comes to business.
So it's great to have someone who has seen things in so many different
contexts, from Heroku to today, and to compare and learn from that.
So that's what I have in my mind.
I'm sure we're going to improvise a little bit with
the questions, hopefully.
What about you, Peter?
What would you like to talk about?
Well, I'm always excited to talk about PLG.
And question for you, is this, have we beaten this topic to death on this show,
or do we think that the audience
is excited to learn what PLG
means and what a PLG stack looks like?
I don't think
we've ever talked about it.
Okay, great.
We'll see.
I think what we
always try to do is we try to
drive
the conversation
based on like the curiosity, my curiosity and like Eric's curiosity.
So we'll see.
I'm pretty sure that like at some point we will divert.
But I definitely want to learn more about PLG.
It is like one of these things that is, it remains kind of abstract, but it feels important to me.
So yeah, I'd love to hear from you how you think about it and how you implement it also,
because you've done that.
I'm always excited to talk about PLG and developer tools and how you actually make money off
these things.
What do you think, Eric? Should we go on and record?
I'm ready.
I'm ready.
Yeah.
Let's do it.
All right.
We are here with Peter Chapman on the Data Stack Show.
Peter, thanks for joining us.
Great to be here.
Thanks for having me, Eric.
All right.
Well, you gave us a brief background in the intro, but I'm interested
to know, how did you get into data? I mean, you've had sort of an industrious career
across multiple companies in data, but is it something that you were interested in? Did you sort of happen into it?
You know, I studied math in college and I think, so I've always had, I've always had
sort of a quantitative slant and I've enjoyed looking at the world through a quantitative
lens. When I first joined Heroku, I was not a data guy.
I joined to manage partnerships.
And I was hired to be a partner manager.
And, you know, because I'm mathy, I guess, my first step was to be like, all right, well, how do I know?
How do I measure the impact of this work? Like, how do I figure out which partners are bringing us revenue and which partners I should
spend time in? And like, can I ask for more money to invest in partnerships? Is there ROI there?
And so I started trying to run some reports and it was really hard. Like we just didn't have
good foundational reporting about revenue. So I ended
up building a sort of holistic model of revenue at Heroku, just so I could understand how I fit
into the picture, right? I said like, all right, this is what I think revenue looks like. Most of
it comes from customers that look like this, and customers grow like this. And attrition looks like this. Oh, and by the way, partners do this.
And that was interesting enough to leadership at the time that they were like, oh, do that.
You know, like we can get someone else to do the partnership stuff.
But understanding our business from an end-to-end way and seeing what the levers are, that feels really useful.
Let's get you a team.
So that's how I stumbled into it.
Wow.
That's super interesting.
One thing, actually, I'm interested to know, you talked about the reporting being really difficult.
Yeah.
What was the stack like? And how did you, you know, was there a stack in place, or were you hitting the financial system and querying prod to figure stuff out?
Yeah, I mean, we were querying copies of prod. Heroku at the time had this very mixed blessing of having hired
mostly engineers, which meant that...
And the product people were also incredibly technical.
All of them were fluent in SQL.
So it meant that from time to time, a product person would be like, oh, I wonder how this
is doing.
And they'd just query a database and produce what we called a dataclip, which is like
a saved SQL query.
But there was no data warehouse.
There was no BI system.
And as you can imagine, asking questions that spanned multiple sources of data was next to impossible.
So a lot of what I did over those early years was build a data warehouse, install Looker,
build the sort of fundamental infrastructure we needed to both run reports
and operate the business.
Do you, when you look back on that, is there anything you would have done differently in
building out that stack?
Well, the tooling was so different back then.
This was like pre-dbt, and the ETL tools available were less good. We were doing a bunch of database copies
and I wouldn't do that today,
but that's what was sort of available to us at the time.
Yeah.
Yep. Super interesting.
One thing, so we were talking about this a little bit
as we were chatting before the show,
but you've built stacks a number of different times at a
number of different companies, both large organizations and startups. But let's focus
on maybe a startup or maybe more of a blank slate. Maybe Heroku is a good example, right?
In modern day where there really is no stack in place, right? There's no data warehouse.
Where would you start? What is your sort of minimum
viable stack if you were doing that again today, knowing what you know now, obviously?
Okay, so let's start with the requirements. When I first start working with companies,
the first conversation I have with them is I say, walk me through your funnel. Show me everything
you understand about your business, from website visit to payment and upsell. And one of two things happens: either you do walk me through your funnel,
or you as a founder look kind of embarrassed. You're like, you know, here's something I know, I know a little bit from Stripe, and then here's, I think, this is what Google Analytics is telling me. I have a kind of fuzzy picture of what's going on on the marketing side, but this end-to-end funnel you're talking about, I don't actually know, right? I can't actually tell you the ROI of a signup.
And if that's where you are, and this is pretty common for the companies I work with, I go, great,
step one is going to be to help you build this. And to build this,
we're going to have to start dumping a bunch of disparate data into your data warehouse.
So kind of step zero for me is something that consolidates data, right? So something like,
I'm going to reference RudderStack.
Something that pushes data from SaaS services into your data warehouse, something that pushes
data from your production database into your data warehouse.
You almost definitely need dbt from day one.
It's going to make your life easier.
I'll stop there.
I think that's step one.
And we can talk about steps two later.
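To make that "step zero" concrete, here is a minimal sketch of the kind of dbt staging model it implies, assuming a connector has already landed a raw Stripe charges table in the warehouse. The source, table, and column names are illustrative placeholders, not anything specified in the episode.

```sql
-- models/staging/stg_stripe_payments.sql (hypothetical dbt model)
-- Assumes a connector has synced a raw `stripe.charges` table; names are illustrative only.
select
    id                as charge_id,
    customer          as stripe_customer_id,
    amount / 100.0    as amount_usd,     -- Stripe stores amounts in cents
    created           as charged_at,
    status
from {{ source('stripe', 'charges') }}
where status = 'succeeded'
```

From there, downstream dbt models can join this cleaned payments data with product and CRM data to build the end-to-end funnel Peter describes.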
How many companies do you...
The companies that you work with,
how many are looking at Stripe and Google Analytics?
Because maybe a better way to answer that is
the tools have become very accessible today.
And so are more companies adopting that stack?
Or is it just that the vendors want to believe that?
No one.
It's really rare for companies I've worked with.
Well, if you're a seed or an early Series A company, and you don't have a full-time marketing team, your Google Analytics is completely neglected.
And website conversion data is totally ignored.
Yep.
Because Stripe is a really easy way to get revenue data and all companies care about revenue, you might be pulling Stripe for financial information, but as I just mentioned,
Stripe gives you no ability to see conversion rates
or understand attrition or growth within accounts.
Yep.
How many people...
Oh, yeah. Go ahead, Kostas.
One question here,
because you are both talking about the moment where, like, Peter, you asked the question, like, okay, take me through your funnel.
And I think there's also a question here about timing. When is the right time for a company, right, to actually engage in this conversation and try to formalize, let's say,
the funnel and the business itself, right?
Because that's what it is at the end.
The funnel is a representation of how this thing that we call a company actually generates
value for both sides, right?
So when is the right time to do that?
Because, okay, I can see also people overengineering that too early in a way.
Totally.
Right?
So based on your experience, when should people do that?
Well, I advise companies to start thinking about their data stack from day zero. And that doesn't mean you have to buy dbt
and a bunch of data connectors
and spin up a large data warehouse.
But just having a rough roadmap, right?
Like having a very simple Google Doc that's like,
hey, here's the reporting we want to get to.
Here's what we think we'll need to get there.
Can help avoid a lot of architectural pain later.
So much of this stuff is easier to build than rewire.
Mm-hmm.
Right?
I'd say the timing of when you build your stack
is a function of your go-to-market approach.
So if you're doing top-down enterprise sales, give me an example of your favorite
enterprise product, Kostas. My favorite enterprise product? Oh,
is there such a thing as a favorite enterprise product? You don't think about this all the time?
Well, let's say, I don't know, like buying IBM, whatever that means.
Let's say you're selling a security product.
Yeah. And there's no bottoms-up motion.
You're just selling directly to teams.
You could probably get a lot of what you need out of Salesforce or pick your favorite CRM, provided you have good discipline around Salesforce, right?
Like, maybe you have a contact us button on your website and you want to see how that converts.
And you're doing a bunch of outbound SDR stuff and you need to see how that converts.
But because your funnel is a sales funnel, as long as you're instrumenting your CRM correctly, you don't need a lot of sophisticated reporting out of that.
And in fact, in this case, you may not even be using Stripe, right?
You're just invoicing your clients.
So if that's who you are, top-down enterprise sales, I live and die by the success of my sales team, you don't need BI for a while.
In fact, Salesforce, like all these tools, comes with their own tool-specific analytics suite.
So you can get away with just using the Salesforce reports or just using the HubSpot reports.
That's option one, enterprise top-down sales.
Option two, PLG, our favorite acronym.
You might also hear me refer to it as bottoms-up.
When I say PLG, I'm talking about companies
that people can start using without talking to a sales rep and then
have a lot of organic usage. A PLG company is a company where most of your users start using you
without ever talking to you. And then it's your job to figure out which of those users to talk to.
If that's the world you're in, building a cohesive data stack becomes a lot more important
because now your sales pipeline is just a sliver of the information you need, right?
There's some important activity happening in your CRM that you absolutely need to track,
but a lot of really important activity that determines the overall success of your funnel
is happening
on your website and it's happening with your product. So tying all that together becomes a
lot more paramount and needs to happen a lot earlier in a company's journey. Okay. That makes
sense. Sorry. One last question, Eric, and then I'll give the microphone back to you.
This is great.
But you mentioned like the bottoms-up motion there, and you said you have users that they
interact with the company and the product, and probably they won't even talk to anyone, right?
Yeah.
There is also, especially in developer tooling, there's a lot of... Part of it is also the open source. So are open source and product-led growth
compatible things or different things?
How do you combine them in a way?
And the reason I'm asking you is because
I've seen and I've experienced at Starburst
the most traditional version of that,
which is we have an open source project used by all the Fortune 500 companies out there in production on their own, having actual people getting paid, right?
To set it up and run it.
And then a company that's trying to monetize that.
But at the same time, the company itself is a very sales-driven company, right?
So it's kind of like a weird mix of having the extreme version of Bottoms Up in a way.
But at the same time, the company itself is doing the more traditional enterprise sales-driven motion,
like getting implemented there.
So how do these things work together, and how have they also changed?
Because what Trino is doing or what Spark was doing in 2010,
it's not necessarily what companies in 2023 that want to incorporate
open source into their strategies are doing.
Things have probably changed.
So what's your take on that?
Okay, so I'm going to answer a slightly different question.
OK.
I'm going to answer a slightly different question, which
is, what is a PLG company?
I would say that there's three ingredients.
One, you need a product that is immediately valuable to one person, ideally, or a single team.
Two, there needs to be an organic way for usage to grow from one person to many people or from moderate usage to lots of usage, right? So that could mean one user invites another user invites another user
or could mean I build a prototype app and then I move it to production and then I build more apps.
And the third thing you need for this motion to be successful is you need a way to talk to your
users without being weird. Open source companies, most open source companies that I've worked with
get number one right.
Sometimes they get number two right.
But if you're really open source,
you don't have number three, right?
You don't have any way of contacting your users.
You're not getting any information about them.
And even if you do get information about them,
it would actually feel really strange to reach out to them
because you're sort of breaking the expectation of open source.
So unless you've built a product where customers are actually logging in
and having sort of an in-product experience,
your open source company is not a PLG company.
OK.
But you can turn an open source company into a PLG company
if you implement the third part, right?
Yeah.
OK, that's interesting.
That's also, like, we
had a lot of experience like that when I was at RudderStack,
because we were trying to do that.
Anyway, it's like a huge conversation
and I think it's really hard for people to do it right.
Open source is always like tricky.
But anyway, Eric, back to you.
I don't want to...
Yeah, it really is an interesting conversation
because I think that...
And Peter would be interested in your thought on this, right?
I mean, there are obviously examples, multiple examples that we could all think of gigantic companies that grew out of open source projects, right?
But when we think about technical tools, data tools, etc., it is really strange.
You have this ethos of open source, but then you mix in venture capital.
And a very classic way in the DevTools space is to take an existing open source project that you have built and go to venture capitalists and say,
look how many stars I have,
look how many downloads I have.
I am building the hosted version of this.
Can I have money?
Investors love investing in tools like this
because there's already a proven user base
and a proven use case, right?
The mechanics of getting it right are tricky.
I've seen a lot of companies build really successful open source projects and struggle
to build successful commercial offerings on top of them.
One of the things that makes it hard is you're always competing against that free...
Yourself.
Yeah, exactly!
So if you've built a really
good open source project that's easy to install
and use,
it can be very difficult to
build a hosted version of that
that's actually competitive.
Right. Yep.
Yeah, that's
a super helpful perspective.
Yeah, it is interesting.
Like the success of open source is actually a limiting factor in your commercial growth, right?
But it's also an ingredient to the success of the technology generally, which is quite a needle to thread. One thing that we were talking about
a little bit before the show
was how you use some of the data
that is being combined in the warehouse, right?
So let's use the example that you talked about,
which is you have product usage data,
you have financial data,
you have multiple sources like going into the warehouse, right? And someone's going to use a tool like dbt or
write SQL in order to model, you know, model that data. What are the first ways, let's say you get
that stack set up, what are the first things that you do with that? What is your, you know, sort of
in sequence, like the playbook of like, okay, here's where
we start once we have all that data.
Yeah.
So the first thing I build almost all the time is a table where the rows are accounts
and the columns are timestamps and the timestamps represent important lifecycle events.
So I want to see first website visit,
first sign-in, first product usage.
Depending on the product,
I might look for things like first invitation sent,
first invitation accepted,
first payment above $100, first payment above $1,000.
The reason this is so important is because without this consolidated account-based view,
it's really hard to know where to focus. And in particular,
a lot of the tools,
if you're not building a consolidated view,
it's easy to get confused
because the tools
that you're working with
will give you user-specific views.
But if you're building
a team-based product,
your user conversion rate
is different from your
account conversion rate,
and that's a super important distinction.
Tell me if I'm going, if this is too wonky for you.
No, no, it makes total sense.
Yeah.
It makes total sense.
And so actually, the way we do this, if I were to get into the weeds of dbt, is I actually first want to see this at a user level.
All right, so by user: when do they first visit the website, sign up, use the product, start paying? And then I want to aggregate that at an account level and take the minimum time for each user at that account level, so I can get that consolidated funnel.
Yep.
Okay, so that's one thing we're doing: we're building this foundational model that has users with timestamps, accounts with timestamps.
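As a rough illustration of that pattern, here's what such a dbt model might look like. All model and column names (stg_users, stg_page_views, stg_payments, and so on) are hypothetical stand-ins, not the actual models Peter built.

```sql
-- models/marts/account_lifecycle.sql (illustrative sketch, not a model from the episode)
with user_lifecycle as (
    select
        u.user_id,
        u.account_id,
        min(v.viewed_at)    as first_website_visit_at,
        min(u.signed_up_at) as first_signup_at,
        min(p.paid_at)      as first_payment_at,
        min(case when p.amount_usd >= 1000 then p.paid_at end) as first_payment_over_1000_at
    from {{ ref('stg_users') }} u
    left join {{ ref('stg_page_views') }} v on v.user_id = u.user_id
    left join {{ ref('stg_payments') }}   p on p.user_id = u.user_id
    group by 1, 2
)

-- Roll up to the account grain: each lifecycle timestamp is the earliest across the account's users.
select
    account_id,
    min(first_website_visit_at)     as first_website_visit_at,
    min(first_signup_at)            as first_signup_at,
    min(first_payment_at)           as first_payment_at,
    min(first_payment_over_1000_at) as first_payment_over_1000_at
from user_lifecycle
group by 1
```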
Oh, and part of this,
that's always a fun part,
is actually defining what an account
is. Because you're going to have a bunch of
competing definitions, right?
At almost all the companies I've worked with,
you've got
an internal representation called an organization or a team that comes from your product.
Yep.
Then hopefully you're using a CRM.
Let's just say you're using Salesforce.
So Salesforce has an idea of what an account is.
It's called an account in Salesforce.
Those are probably different definitions.
Yes.
And it's possible that a single company in Salesforce
has multiple teams associated with it.
Then you've got this third heuristic, which is that people who sign up
from the same non-free email domain are probably from the same company.
So part of what you're doing when you're building this foundational data model
is you're just getting the entire company to agree
on what an account is.
Yep.
What is the source of truth for what an account is?
So that like when my sales team talks about,
you know, revenue by company,
they're using the exact same language
as my marketing team and my product team
and my finance team.
Yep.
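One common way to encode that agreement in the warehouse is a small identity-resolution model like the sketch below. Everything here (the table names, the email-domain heuristic, the Salesforce join key) is an assumption for illustration; the right mapping depends entirely on a company's own systems.

```sql
-- models/marts/account_identity.sql (illustrative sketch)
-- Maps product organizations to CRM accounts, falling back to the email-domain heuristic.
with org_domains as (
    select
        o.org_id,
        o.org_name,
        min(split_part(u.email, '@', 2)) as email_domain   -- crude: one corporate domain per org
    from {{ ref('stg_product_orgs') }} o
    join {{ ref('stg_users') }} u on u.org_id = o.org_id
    where split_part(u.email, '@', 2) not in ('gmail.com', 'outlook.com', 'yahoo.com')
    group by 1, 2
)

select
    d.org_id,
    d.org_name,
    coalesce(sf.account_id, 'domain:' || d.email_domain) as canonical_account_id
from org_domains d
left join {{ ref('stg_salesforce_accounts') }} sf
    on sf.website_domain = d.email_domain
```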
Okay, so that's kind of track one, which is just like building this foundational user and account model that lets
me see my funnel. Track two, for most of the companies I work with, which are PLG companies, is starting to operationalize that data
in order to proactively engage with your most important customers.
Makes total sense.
And you're doing that on the marketing side and the sales side?
Like when you say operationalize, what does that encompass?
Yeah.
So I think about this as having two domains.
One is I call it hand-to-hand combat.
You want to make sure that the people
who are talking to customers
are talking to the right customers
at the right time
and they know what to talk about.
That's the most important thing. Like, step one, even before you've built a sophisticated account and user funnel thing, is you need to make sure that the people who are paying you are talking to you.
Hmm. Yeah.
And then very often, that sounds simple, it's like, yeah, I mean, that sounds obvious, but that's actually quite hard.
Yeah, it's funny.
I remember my early days at Heroku,
we were faced with this problem, right?
We had a ton of paying customers
and a really small sales team.
We had like three salespeople
and tens of thousands of people paying us.
So the question of who do we engage with was paramount.
And people were really developing like incredibly sophisticated mechanisms to determine what a promising account felt like.
Right. Someone was like, oh, I think if their usage grows by more than $100 in a week, that's a signal.
Right. Someone's like, oh, I think if we see a production database, that's a signal. And my signal was, hey, if they're in the top 20% of revenue,
that's the signal, right? Before you start looking at product usage or revenue growth,
the first thing we're doing is we're setting up sort of good defense,
which means the people who are already paying us the most need to have a relationship with us.
Yeah.
That was a total tangent. Take me back. What was the original question here, Eric?
Yeah, no, we were talking about, you know, sort of the uses that you have of the data, like when you said operationalizing, and so I was asking, you know, what does operationalizing encompass, right? So,
and sales is obviously an example, like which company should our salespeople
try to build relationships with? What other things are like on the marketing side?
How are you operationalizing the data?
Great. So I almost always handle sales before marketing.
If you are, you know, every single company I've worked with, their revenue falls into
a heavily skewed distribution, right?
Where a huge amount of the revenue is coming from
a relatively small number of customers, which means that we have to invest in really good personal relationships with
those companies before it makes sense to worry about our long tail.
Yep.
So sales comes first, and that means two things.
One, it means telling your sales people or your account managers or your customer success
people who they should be talking to.
And the second component of that is telling them when to initiate a conversation and what's
relevant, right?
So I'll give you an example.
We're just building alerts here.
And an alert could mean like, all right, if a customer spends over $1,000 in a month,
assign them to a sales rep.
But another alert could be if a customer's usage drops by 20% week over week,
send that sales rep a Slack message,
create a task for that sales rep in the CRM.
Yep.
It could also mean if a customer is using this feature of the product,
tell the sales rep because this is a good thing
to initiate a conversation around.
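A warehouse-side sketch of one of those alerts, the week-over-week usage drop, might look like the query below, with the results pushed to Slack or the CRM by a reverse-ETL or automation tool downstream. The model and column names are assumptions, and the 20% threshold is just the example from the conversation.

```sql
-- Accounts whose usage dropped more than 20% week over week (illustrative; names are assumptions).
with weekly_usage as (
    select
        account_id,
        date_trunc('week', used_at) as week,
        sum(usage_amount)           as usage
    from {{ ref('stg_product_usage') }}
    group by 1, 2
)

select
    cur.account_id,
    'usage_drop_20_pct' as alert_type,
    prev.usage          as last_week_usage,
    cur.usage           as this_week_usage
from weekly_usage cur
join weekly_usage prev
    on prev.account_id = cur.account_id
   and prev.week = cur.week - interval '7 days'
where cur.week = date_trunc('week', current_date)
  and cur.usage < 0.8 * prev.usage
```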
Once...
Please, Kostas.
Yeah, sorry for interrupting,
but one of the things that I observed
here in the conversation is that we talk a lot about
signals.
And, okay, you gave
some heuristics about
don't start over-engineering, trying to figure out, like, you know, the best possible signal there.
Start with the basics.
Like, if someone drops 20%, go and, like, chat with these people.
But as you go through, like, what you're saying now, like, you keep coming up, like, with signals.
And it feels to me, at least, like these signals are very context-sensitive, like per business or per product. Actually, it sounds like it has to do with the product a lot. But tell us a little bit about that. How do you think about finding the right signals there, and what do they look like?
Great.
I use a four-step process.
Signal number one, existing spend.
Good news about this one, it's both your most important signal
and it's pretty easy to capture.
Yeah.
Signal number two, demographic data,
by which I mean customer company size and customer revenue.
You'll use a data augmentation tool like Clearbit to get this information, or HubSpot now has this
built into their product. You want to make sure that if someone from Nike signs up for your
product, they're immediately getting excellent support
and a really friendly account manager.
Signal number three.
So you're going to get like 80% of the value you can get
from these two alone.
Okay.
And very often, if you're a really early-stage company,
my approach is like, let's build these and then pause. Give your account team time to
digest this, see how they do with it, like get people used to this new signal-based assignment
world. Make sure that we're actually seeing an effect from this. Let's let this marinate for
at least three months before we touch it at all.
Okay, let's fast forward in time. Let's say you've built your foundational signals, your customer team is not overwhelmed, and you're trying to eke some more percentage points of growth out. Now it's time to get to signals three and four. Signal three is usage, and I'll say it's sort of choose your
own adventure, right? Where you might develop your own intuition about what features signal
a propensity for growth and start running experiments. It's pretty easy to measure this
stuff, right? So you could just say, let's use the production database example.
You might say, all right,
I think that everyone who uses a production database
deserves account management.
So let's start assigning these folks to account managers,
or let's start assigning most of these folks
to account managers,
and let's measure what happens.
And the fourth approach,
the most sophisticated and expensive approach,
is ML-derived signals,
where instead of you guessing at what the signals are,
you dump all your data into a vendor's database,
and you say, hey, you tell me what I should be looking for.
And that vendor is going to spit back a scoring model for you
that you can use to generate signal.
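For the first two signals, the warehouse logic can stay very simple. Here's a hedged sketch of a spend-plus-firmographics tiering query; the thresholds and model names (account_revenue, stg_enrichment) are made-up placeholders, not recommendations from the episode.

```sql
-- Illustrative tiering on signals one and two: existing spend plus enriched company size.
select
    r.account_id,
    r.trailing_90_day_spend_usd,
    e.employee_count,
    case
        when r.trailing_90_day_spend_usd >= 1000 then 'assign_account_manager'
        when e.employee_count            >= 1000 then 'assign_account_manager'
        when r.trailing_90_day_spend_usd >= 100  then 'watchlist'
        else 'long_tail'
    end as account_tier
from {{ ref('account_revenue') }} r        -- hypothetical revenue rollup model
left join {{ ref('stg_enrichment') }} e    -- e.g. Clearbit/HubSpot enrichment synced to the warehouse
    on e.account_id = r.account_id
```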
I would assume that this last one also requires
a substantial amount of data to make sense of that, right?
It's not something that's... Because I can see, especially first-time founders with engineering backgrounds,
that they feel like technology can help solve everything,
so let's throw a model on this data and see what the model is going to say.
But usually the model is just random shit coming out
because you don't have data yet that can support that.
So I think, I don't know, my experience, at least my advice is like,
avoid the sophisticated stuff until you are really growing
and have a lot of data points there, and most importantly, in my opinion, you have personally built intuition about your business. So outside of what the math can tell you, you can also use your intuition to assess how these things work or don't work.
But anyway, that's my experience at least. Eric, back to you again.
Well, I agree with that because the other thing is, I was actually going to ask you about this,
Peter. One of the things you have to think about is that in an early stage company,
a lot of things change, right? And so if you try to develop really sophisticated propensity scores, right? But then there's a significant
change in your product or the way people use it or those sorts of things. It's like,
this stuff is pretty dynamic in early stage companies, right? And so it makes a ton of
sense to sort of focus on
the first two steps because those are going to remain stable even, you know, or somewhat stable,
right? If you look at existing revenue and then the demographics, right? Even if there are
significant changes in your product or model, those are going to be fairly durable. You know, the other thing that makes this tricky
is that the ideal signal is not propensity to spend.
It's a signal of how much a customer can be influenced
by human interaction.
Right?
So it's easy to get sort of false positive signals
that measure inevitable growth.
Right?
Like you might say, if you dump all your data into a machine learning model, very likely
that model is going to be like, oh, people who put down the credit card are way more
likely to spend money.
It's like, well, yeah.
We know that. But I don't know that everyone who puts in credit card information needs to talk to a salesman.
Yeah, we were doing some, we're super nerdy at RudderStack, so I wanted to look at a couple of different multi-touch models for our funnel. All right, let's just see. And it's funny, the first one is, yeah, here's all the data points or whatever, right? And one of the things is like, oh man, people who respond to an SDR email are the most likely to go on to have a sales deal, right? And it's like, oh, wow.
That's amazing.
So yeah, that stuff is super interesting.
Costas, I feel like I've been monopolizing the conversation here and you've jumped in.
But what questions do you have for Peter?
Yeah.
Okay, we've talked about PLG.
We gave a definition.
That's one of the things that I really want to hear about.
But one of the things that I'd like to ask you, Peter,
is you've seen growth in tech
from, let's say, the early SaaS and cloud days
up to today with the craziness of people literally killing
for GPUs out there to do something with AI.
And things, again, my feeling is that things change
very rapidly, but they also remain constant in some ways.
There are some things, like learnings that you can take
from the days of Heroku, and they still apply today.
So I'd love to hear from you about that.
What have you seen that remains, let's say, like constant when it comes like to building growth functions, right?
And what has changed because of like, not that I wouldn't say that necessarily the technology,
I would probably say more of like the demand from the markets,
because my feeling is that like that drives more change
than actual like the technologies out there.
But I'd love to hear from you that
because you do have like, in my opinion,
like a very unique experience
going like from all these different
like phases of the industry.
What's the same and what's changed?
You know, one constant is that if you want to sell to developers,
you need to talk to your customers. And I know this sounds, maybe this sounds really obvious,
but I need to say it in public, because I watched so many developer tool startups get built by engineers who would love to not do sales.
This is maybe the trap of building a successful PLG company is you might delude yourself into thinking, well, I've got this great open source product.
I've got this great free to use product.
I've got this great product that anyone can sign up and use. So all I need to
do is build a great product experience and really good documentation and maybe hire a support team
and then growth should just happen, right? But the lesson that's been hammered into me time and time
again is that if you want to see real significant venture exciting revenue growth,
you need to get serious about account management. And very often that means, and this is tricky,
because when you think sales, it might feel somewhat orthogonal to the company culture
that you as a technical founder are trying to build, right? You probably don't see yourself as a salesperson.
And you, as an engineer, hate being sold to and don't want to have to get on a call with
someone to use their product.
So figuring out how to integrate account management into a developer tool company, not just from
a technical and operational perspective, but also from an
organizational and cultural perspective, is both critical to a company's success and difficult.
We probably don't want to hire your standard enterprise sales bro. You want to hire someone
who speaks the language, who gets the culture,
who knows when to engage,
and who's also smart enough to not be the pushy salesperson when that's not the right approach.
So that's the constant.
You have to do sales, and you have to do sales right,
and it's hard.
It's hard in this industry.
Yeah.
I'd say what's different and unique about machine learning is, boy, is it hard to find margins.
You can read about this.
We're seeing company after company come out with the fact that they have negative margins.
I was just at Replicate, and Replicate had negative margins for my entire time there.
I could talk a lot about why margins are hard, but maybe the TLDR is that, boy, is the machinery expensive. The machines are really expensive, and the market expectations are really low, and everyone is racing to grab market share, so there's a real market temptation to produce negative-margin products.
That's very
interesting. Do you think the answer to that
will come
from
like, I don't know, come from hardware, having more availability of
hardware there and pushing prices down? Or from a paradigm shift of how products are built on top of this hardware?
Because, and I don't want to be very reductionist here,
at the end, a company like Replicate, what it sells is an API on top of GPUs, right?
That's what it is.
It gives GPUs to people to go and do it.
In the same way that a company like Spark, sorry, like Databricks, what it sells is to a very specific group of people
CPU cycles,
right?
And that's where you start building your margins, through the abstractions that you add there and all these things, like with multi-tenancy and all that stuff.
At least with CPUs, they have figured it out a little bit better, I would say, compared to GPUs.
But where do you think the margins will come from at the end?
Because they have to, right?
It has to turn into a viable market at the end.
We can't just donate hardware out there
using money from pension funds at the end, right?
Right. Thank you to the school teachers of America
for powering their APIs.
I'm thinking about where margins come into question.
There are a lot of different levers here.
I'll spit out some of the categories
that I think about when I think about margins.
One is totally beyond your control.
And it's just like as hardware gets cheaper and better, it's easier to find margins.
So you could say, like, hey, got bad margins now.
We can just wait.
Maybe we can just wait and margins will get better. Two, your internal infrastructure engineering is paramount to finding good margins.
Trying to figure out how to approach this here.
So there's a trade-off.
There's a trade-off between latency and cost efficiency.
In any infrastructure product, but it's especially impactful on machine learning products.
Let's say you're building an API on top of a machine learning model, or you have an API that just serves images.
You probably want, you care a ton about your user experience so you need
a relatively fast response time
and you're
already burdened by the fact that
machine learning models
they're inherently
chunkier
than asking
your Rails app to tell you what's in the cart
or whatever.
In order to get the lowest possible response time,
you want to keep whatever it is you're serving running all the time.
But it's really expensive.
And so, you know, what I see companies doing is they're, like, running stuff.
Let me back up a bit.
The faster you're able to boot up an instance, the better your margins are.
So the ability to, like, quickly load whatever machine learning model you're serving onto your instance and let it serve, let it respond
to calls, that has a huge impact on your margins.
Consolidating hardware has a big impact on margins, right?
You get the best prices when you buy reservations.
Do you know what a reservation is?
I think so. I mean, it's like you go to someone like AWS and
you pay upfront for a certain amount of computes and usually for a certain period of time also,
and you get a better price for that, right? Yeah, exactly. So part of the way that vendors like Heroku or like Replicate find margins is they buy a lot of compute and they commit to using that compute over a long time period.
And that allows them to achieve deep, deep discounts.
And then they charge a price that's somewhere between, say, the reserve price and the on-demand price,
the price that a customer would pay if they're using it without a reservation.
This is kind of where you want to be competitive, right?
It's nice to be able to offer prices that are similar to on-demand
while you're paying reserve prices.
And it also means that if your customers compare you
to running it themselves,
you're compelling even without the product being really spiffy. Does that make sense?
It makes a little sense. Yeah.
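For readers following along, here's the arithmetic of that spread with entirely made-up numbers; none of these prices come from the episode or any real provider's rate card.

```sql
-- Hypothetical GPU-hour prices to illustrate the reserved-vs-on-demand spread.
select
    3.00 as on_demand_price,                        -- what a customer would pay the cloud directly
    1.80 as reserved_price,                         -- what the vendor pays with a long-term commitment
    2.70 as vendor_list_price,                      -- priced just under on-demand to stay compelling
    round((2.70 - 1.80) / 2.70, 2) as gross_margin  -- roughly 0.33 on the compute itself
```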
Yeah. Getting this right is tricky, right? Because if your usage is super spiky,
it's hard to reserve the right amount of instances.
If your usage is distributed
across a bunch of different hardware types,
it's hard to find savings on reserved instances.
If you want to consistently be on the latest
and greatest hardware type,
which is coming out in six months,
it's hard to be on the best reserved instances.
So, you know, back in the days of Heroku,
when we were just selling CPU compute,
we didn't care that much about being on the latest and greatest instances. Like it was nice,
but a CPU instance is a CPU instance. And that meant that if we'd bought some three-year
reservations and they eventually became outdated, it was kind of fine. If you are trying to be the hot
new machine learning startup,
you want to be on the latest and greatest
NVIDIA stuff, which means
that making a three-year reservation
might be painful for you.
But if you don't make that reservation,
your margins suck.
That was a deep dive
into why margins are hard in ML.
I'm happy to go deeper or abandon it entirely.
No, it's very interesting because it's, I mean,
again, I think it's one of the things that remains constant
in a way that's, okay, you need to figure out your margins at the end
and figuring out the margins is not straightforward,
especially when you're talking about infrastructure products, right? First of all, you can't have a simple model of predicting demand.
It's not like a SaaS application, where it's much easier to predict what
the usage is going to be.
And also you have some very, some very standard tools,
like, to improve, like, your latency, for example.
You can put, like, caching there, right?
Like, these things that we've done, like, in the past,
like, I don't know, like, 20 years
when we were, like, application building.
But when we're, like, building and selling infra,
it's, like, it's quite different, right?
And I would like to ask you,
do you see, like, a case there
that these companies at the end
may have to break ties with the cloud providers
and like start getting their own hardware
and actually building on top of that?
Like, are we going to see like a full cycle
going back to the data center again
for this like to happen?
Because that would be, I don't know, very interesting.
Yeah, it's really tempting.
I'll try to dig it up.
There's a great essay by Martín Casado
on why buying your own hardware makes sense at a certain scale
and the impact that can have on your margins.
It is much cheaper.
That said, boy, is it hard to run a data center.
You know, like you're going to have to really, it's a whole new organization that you need to run and a whole new set of inventory and finance problems that you need to solve.
What I'm seeing right now is a scrabble for market share, not margins.
And I think as long as we're in that stage where market share trumps profitability, no one's going to be buying their own machines.
The mandate is to move as quickly as possible.
And the way to move quickly is to get on a large cloud provider.
Yeah. Also, one last question from me,
and then I'll give it back to Eric.
ML and AI, whatever we like to call it.
They're like always two parts of it.
Okay.
There's inference, which is like what drives actually like the user experience on the end,
right?
Like, I mean, like this low latencies and all that stuff, but there's also training
and like expensive, right?
Like it is like the big investment for the companies.
From your experience so far,
what is actually driving most of the growth out there for these new AI companies?
Is it more the inference part or the training part?
Inference, 100% inference.
Training is expensive and training can take a huge amount of time.
But
most companies are not training
perpetually.
It's like
a discrete stage in their own
development, and then once that model is trained,
you're letting it run
for a long time.
And which one of the two has the potential for better margins for the vendor?
So I want to zoom out a little bit here.
Because I actually think if you're, if you are,
I think that reselling infrastructure is a poor business decision.
You don't want to look, if you position yourself
as an infrastructure reseller, your customers are always going
to compare you to buying directly from
a cloud provider or running it themselves.
And it's hard to find margins there. I think
you want to find your margins
on enterprise features and functionality.
Okay.
Right?
Like, yes, you have to price for infrastructure usage,
but that's not where you want to make money.
That's actually where you want to be competitive.
And then you want to find the margins
for feature-specific stuff.
Okay.
That's interesting.
Okay, we need another episode to go through that.
I have, like, a lot of questions, but Eric, back to you.
Yeah, well, we're actually here at the buzzer.
Brooks is telling us that we have used all of our allotted time,
Kostas, which is about par for the course.
Peter, this has been such an awesome show.
Uh, we've learned so much and
would love to have you back on several topics that we didn't get to. But thanks for giving
us some of your time today. My pleasure. Thanks for having me.
Thanks for listening to the show. Peter is a consultant who specializes in helping PLG
companies drive more revenue with data. If you'd like to connect with Peter about his advisory services,
his email is peter at chapman-coaching.com.
That's P-E-T-E-R at chapman-coaching.com.
We hope you enjoyed this episode of the Data Stack Show.
Be sure to subscribe on your favorite podcast app
to get notified about new episodes every week.
We'd also love your feedback.
You can email me, Eric Dodds, at eric at datastackshow.com.
That's E-R-I-C at datastackshow.com.
The show is brought to you by Rudderstack,
the CDP for developers.
Learn how to build a CDP on your data warehouse
at rudderstack.com.