Y Combinator Startup Podcast - #111 - Jake Klamka and Kevin Hale
Episode Date: February 6, 2019
Jake Klamka founded Insight. Insight provides intensive 7 week professional training fellowships in fields such as data science and data engineering. Insight was in the YC Winter 2011 batch.
Kevin Hale is a Visiting Partner at YC. Before YC Kevin was the cofounder of Wufoo, which was funded by YC in 2006 and acquired by SurveyMonkey in 2011.
You can find Jake on Twitter at @jakeklamka and Kevin at @ilikevests.
The YC podcast is hosted by Craig Cannon.
***
Topics
00:37 - Kevin's intro
01:07 - Jake's intro
1:42 - Applying to YC with one product then changing it
4:07 - How Insight started
4:57 - Jake's first students and initial coursework
8:37 - Finding out what companies want from data scientists
10:37 - Picking the first class of students
12:07 - Common pitfalls for people transitioning into data science
15:07 - Types of data science roles
17:22 - What data scientists should look out for in companies
18:17 - Chuck Grimmett asks - When do you know you need to bring in seasoned data scientists?
20:37 - How Insight has scaled and changed
22:37 - What happens in the program
23:57 - Examples of a good project for a data science resume
26:27 - Will more data scientists be founders in the future?
28:37 - Teaching product
29:37 - Cleaning data
32:07 - Tools for tracking data
32:57 - Track what are you trying to optimize
35:57 - Churn and conversion
39:37 - Is there an ideal background for a data scientist?
41:37 - Which startups recruit well at Insight?
43:37 - Contracting
46:17 - Fields Jake is excited about
Transcript
Hey, how's it going? This is Craig Cannon, and you're listening to Y Combinator's podcast.
Today's episode is with Jake Klamka and Kevin Hale. Jake founded Insight. Insight provides intensive
seven-week professional training fellowships in fields such as data science and data engineering.
Insight was in the YC Winter 2011 batch. Kevin's a visiting partner at YC. Before YC, Kevin was a co-founder of Wufoo,
which was funded by YC in 2006 and acquired by SurveyMonkey in 2011.
You can find Jake on Twitter at @jakeklamka and Kevin at @ilikevests.
All right, here we go.
So, Kevin, for those of our listeners that don't know who you are, what's your deal?
I'm a partner here at Y Combinator.
I actually was in the second-ever batch.
I was in Winter 2006, and I founded a company called Wufoo.
We ran that for five years, and then we were acquired by SurveyMonkey, and that moved us from Florida to California,
and that's when PG asked if I'd be interested in helping out at YC.
And I've been there pretty much ever since.
Yeah. And you suggested Jake as a guest for this episode.
So Jake, what do you do?
So I'm the founder and CEO of Insight. So Insight is an education company.
We run fellows programs that help scientists and engineers transition to careers in data science and AI.
And it's a pretty unique model because these fellowships are completely free.
They're full time. The companies sort of fund the process.
Engineers and scientists build projects for seven weeks.
They meet top data teams and they get hired on those teams.
We've got over 2,000 Insight alumni working as data scientists now across the US and Canada.
Nice.
And you haven't always been working on this.
So you applied to YC for the Winter 2011 batch.
That's right.
Yeah.
And what was your idea then?
So I was back in, so I started my career, and this is relevant to why I started Insight, because I basically started what I wish had existed when I was around.
I was a physicist.
Yeah.
At the University of Toronto.
I thought I was going to be a scientist for the rest of my life.
And then partway through my PhD, I realized I want to go into technology.
And I thought to myself, I'm writing code, I'm building machine learning models, this is great, I've got what I need.
And it frankly took me a long time to transition.
Eventually got into Y Combinator, came down here for the Winter '11 batch.
I was building a bunch of mobile sort of productivity apps that were machine learning enabled.
And didn't quite get the up-and-to-the-right graph that you would hope for after YC.
But it was an incredible experience.
And, you know, in that sort of late 2011, call it six to 12 months after YC, I was searching for a new idea.
And actually went, spoke with Paul Graham and a few other advisors.
And the recommendation was work on a problem you yourself have.
You're kind of building these apps that, you know, you're trying to use these machine learning models.
And hopefully somebody's got that as a problem.
But flip it around.
Start with a problem you've had, then figure out what the solution is.
And when I reflected on it, it took me a few years to really make this transition.
I'd been so close all along, but I didn't know product. I wasn't really
connected in the valley. There's a bunch of, you know, technically I had the fundamentals, but a lot of
the tool sets were different in the industry. So I didn't know what I didn't know. And when I got down
here and I started talking to people, that's when I finally started figuring it out. And I was seeing
a lot of my friends having that same struggle. So brilliant, you know, mathematicians, neuroscientists,
biologists, also engineers later, we found the same thing, kind of getting stuck. And like,
they're like, I want to go into data science.
I want to go into AI.
I want to go into these cutting edge fields.
But, you know, it doesn't say the right thing on my resume or I'm kind of like, you know,
just getting that last mile is really hard to, to cross.
And I thought, okay, well, this is the problem I want to solve because these are some of
the most brilliant people I'd ever worked with.
A lot of them were my former colleagues from physics.
And I thought, what does the solution for this look like?
And at first, I was focused on it's going to be an app again, right?
It's some machine learning enabled app.
And then I realized, no, it actually probably looks more like an in-person program where folks are getting together, building cool projects and then getting started from there.
And so did you just go ahead and teach a class?
Yeah.
So I basically, you know, started talking.
First, I talked to companies and I said, listen, I've got these brilliant friends coming out of academia who I think you should be hiring.
Why aren't you hiring them?
And basically what they told me is I know they're brilliant.
I know they got all these great skills.
But they're probably like one to two months away from where I need them to be. If I had full days to mentor them for a month or two, they'd be incredible data scientists.
But they're like, I don't have a month or two to mentor them.
So I say no in the interview, right?
And so I'm like, well, I have a month or two.
So maybe what Insight is going to be is that month or two where folks are filling in those last pieces of the puzzle, learning the sort of cutting edge techniques and sort of tool sets and other things.
And then let's bring those data scientists in the room and have them hire.
And then we just jumped right in and I ran the first sessions.
The first session was just me.
The first students, so the focus was PhDs.
So that first group was in 2012.
Were they like your friends?
No, no.
I mean, I had to go beyond my friends.
So at first I started talking to my friends in academia.
So, you know, I got confirmation from my friends in academia. I mean,
I already knew that they were looking for jobs and they were excited about transitioning.
I got confirmation from the hiring managers, who said,
listen, we're hiring, we can't find folks with the full skill set.
If you bring them into a room, we'll go look at them.
And then the rest was kind of getting the word out and getting applications.
How did you know what to teach them?
Because you mentioned that you didn't know what you didn't know.
Yeah.
I mean, by that time, I had spent like three years figuring it out, including doing YC and meeting
a bunch of data scientists and building a bunch of data products.
So, you know, by that point, I kind of knew what the pieces were.
But also, really, the program was focused not on me teaching the fellows.
It was focused on me bringing in the sort of leading data scientists at the time and having them directly tell them.
So we had, you know, Facebook, LinkedIn, Twitter, Square, all these early data teams in 2012,
their heads of data science come in.
So they're willing to do like one day.
They just couldn't commit like a lot.
Yeah.
Yeah, that's exactly it.
That's exactly it.
They're like, I'll come in for a few hours, but I don't have two months.
And I'm like, well, if I have a bunch of you come in for a few hours, plus, you know,
really have these folks kind of working away for a few months, learning from each other,
learning from these mentors.
Once we had alumni, too, it was incredible.
We had all these alumni coming in to help.
How big was that first class?
It was eight fellows.
And then how many of them did you get jobs?
All of them, pretty much, yeah.
All of them, yeah.
100%.
Yeah, one went to Facebook, one to Square, one went to LinkedIn, one went to Twitter.
I mean, at the time, these were like, they still are the top data teams, but, I mean,
it was a clear success.
It was super stressful.
I didn't have the model.
I hadn't figured it out.
It was crazy.
I mean, what mistakes did you make?
Like was that first class kind of a shit show?
Oh, of course.
Of course.
So what was the problem?
In the sense that it was the first time I was doing it.
And a lot of it, a career transition is always stressful.
Whenever people are doing insight, they're stressed.
But at least there's a track record there.
And now we have things baked pretty well.
At that time, the overall idea was there.
But a lot of the details weren't there.
And so, and frankly, the track record wasn't there.
So a lot of these folks are like, what have I done?
I'm in a room with this guy who's never done this before.
So there was a lot of stress just around, is this even going to work, this weird model? But we made it work.
What I'm trying to understand is, what did those eight students believe, right? Like, were they desperate, or were you
great at sales?
No, no, I think they were genuinely excited. Part of the
application process, I got way more applications than I expected when I started it. There was a real
demand to get into the field. I didn't have a track record,
but I basically went around to these universities and said,
I'm going to have the head of data science from Facebook, LinkedIn, Twitter,
all these companies coming in and you're going to meet them.
So the roster made them feel a lot more comfortable.
That's right.
Yeah.
And my interview process really centered around how excited you are about this.
So the folks who are like,
I really don't want to do this,
but I need a plan B.
No, thank you.
Right?
It was the people who said to me,
I love my work as a scientist,
but I really want to kind of move,
have a kind of more of an applied impact in the world.
I'm excited by what I'm seeing here.
Here's what I think I can do.
I mean,
that's the kind of folks I would take into the program.
Totally makes sense.
Starting off with like qualifying the lead.
It's a more and more common technique you're seeing a lot of startups use now,
like Superhuman, for example.
Yeah.
Heavily qualifying a lead before they'll even give them access to the product.
So that way you're trying to guarantee that the time I do spend
is with someone that's going to have a spectacular experience.
Yeah,
that's why these hiring managers wanted to come in.
How did you figure out what students were going to be the most excited about this? Like, what do you ask them?
Yeah. So, I mean, I had some opinions, but really what I did is I went to these early heads of data science teams
and said, what do you look for? And what they said is, you know, they'd list off some technical
skills. It's kind of like, they need to know SQL. They need to know
Python. They need to know... And I'm like, okay, but what really would clinch it
for you? Like, you want this person. And there were two things, always. There's,
they have a side project. And their eyes would light up. They'd go, oh, if they had a side project,
if you send me a URL, oh my god, then I know they're excited. And so that's where
the idea came from: hey, this isn't about, you know, these folks have been through enough classes.
It's about actually building, actually creating something and proving that, I've got
all this great background, but now I'm going to do this last piece of the puzzle to show you I can do
something relevant in this area. And the second thing that they wanted, and I think this is where the
project really shows this, but what they wanted overall is just curiosity.
So folks, and I thought that they weren't being serious, to be honest with you.
Because I was like, yeah, you say you want curiosity.
But really you just want somebody who's good at like SQL or something, right?
Or good at like machine learning.
And it proved to be true.
The people they would hire would be the ones who said, hey, I studied astrophysics.
But in my spare time, I was like, kind of dabbling with genomics.
And then I got into machine learning on the side.
And then I built this cool for-fun project that, I don't know, predicts where I should go
camping or something, because I'm a big camper or something.
And then you take a person like that and that's the kind of folks that these teams want,
wanted and still want because these problems are so open-ended.
Well, it's like curious people don't get blocked as much, right?
They're willing to try the things.
And, you know, it's such a new field.
The roles our fellows are getting hired into, at most of the companies,
it's not like, we know what we need you to do, just do it.
Yeah.
It's what can we even do here, right?
What kind of impact can we have with data?
Again, how did you ask, how did you test for that?
Like, curiosity.
Yeah.
Like the project seems like, okay, that's something we have to shoot for.
But again, it's like, how do you know that these were the right eight people?
Well, so, you know, a lot of it was trial and error.
I would do like 12 plus interviews a day and kind of, you kind of get to know folks and kind of get to know it.
But I think the main thing, the signal that I saw was it's kind of that example I gave, is that almost, people would be almost apologetic.
They'd be like, listen, what I'm going to tell you is not part of my usual work, but it's on the side.
And it's like, no, no, I want to hear about that.
I remember I had this, one of the fellows came, well, she became a fellow, but she was in an early session.
She was a mathematician at Berkeley.
And she had done all this incredible analysis.
I can't quite remember what.
Like this really cool data analysis project.
I think on like maybe flight times or something in sports.
I can't quite remember.
And partway through the interview, I'm like, but you're a mathematician.
Like, don't you do pencil and paper math?
Yeah.
And I had done some math.
She's like, oh, yeah, like, I can't remember what the field was.
And she's like, oh, yeah, this is not even part of my.
And she almost felt kind of apologetic about it.
I was like, this is who I want as a fellow, right?
Brilliant mathematician doing incredible work.
And able to on the side on the weekend, quickly pick up Python, this, that, the other, make something useful.
She went on to, well, she continues to work at Facebook.
She went to Facebook after the program, been super successful.
So it's people like that that you're like, I want you.
So this is related to one of these overarching questions we had for you. So basically it's like,
how can people get into data science? And then what are the pitfalls for people who say have a PhD,
you know, they know Python? They're like at a higher level than like a coding boot camp person.
Yeah. What are the pitfalls they make when they're trying to bridge that gap and get into a data science role,
provided that they didn't do your program? Yeah, absolutely. And we see it because we, you know,
I started with scientists and now we also have programs for engineers who are transitioning to machine learning
engineering, deep learning research.
And you sort of see very similar problem on both sides, which is folks are extremely
focused on the sort of technical, let me get the algorithmic knowledge down, let me know
every last algorithm, which of course you need and you need those foundations.
Yeah.
But when you're already dealing with someone who has, you know, been doing a bunch of work for
years in a PhD or in engineering in these areas, what you actually want to see and what
these teams want to see is communication ability.
It's the ability to understand the underlying business and product problem, because what they want to do is hire someone who's going to first think about what we are trying to accomplish here.
How can we help our users?
How can we help our company succeed?
And then figure out how do I use my tool set of machine learning or analysis to do it?
And what often happens, and this is the pitfall, is, you know, part of why you get into it is because you're excited about that analysis.
You're excited about the machine learning.
And so you start always putting that first.
And you're always like, let me tell you the algorithms I can build.
And folks who are trying to transition into it need to start thinking about product, need to start thinking about business, need to ask.
Like the skills there is like making them a better salesperson.
And what's interesting about the advice that we give to a lot of people about sales is like it's not about selling your own thing.
It's about understanding their problem.
And then fitting whatever you have to them.
And so it seems like for the data science, the same thing needs to happen.
It's not to say, here's all the things I have.
It is like try to figure out what it is that you fit into for them.
Exactly right.
And it's like, understand the underlying problem. Forget data, forget machine learning.
What are we trying to accomplish here?
What's our mission?
What are we trying to do for our users?
And then.
Like making yourself look like the solution, not trying to be like, oh, I have a bunch
of stuff.
Which one of these things are you interested in?
Exactly.
I have a hammer and a screwdriver, like, which I can use all of them.
It's like, what are we trying to build here?
Yeah.
And sometimes that's actually a separate role. So for instance, say, like, Facebook might list a data science job.
Whereas some, you know, smaller startup would say, like, we have an engineering role open.
Right. And you might classify yourself as a data scientist. So if you have to pitch data science to a startup, right. How do you do that as an engineer?
Right. This is a great question. So first of all, data science, machine learning, these are all like super broad umbrella terms. It's such a new field.
Yeah. Maybe you should define data science. So, yeah. Maybe what I'll do is define data science. And I think this is essentially answering your question,
which is, so what we see in the industry, kind of broadly speaking, in broad terms,
let's not worry about the details, is I see kind of three big pieces of how
data science is used. So some data science roles are what I kind of call product analytics or
business analytics roles. The idea there is you're looking for a better understanding.
You're analyzing data about users or a company and trying to understand how to improve it.
Help users succeed, help the business succeed.
The second types of roles that we see are data product roles.
So these are roles where you're actually using machine learning and predictive models
to actually change the user experience and give them something they want right there and then as part of the product.
And the third one is kind of, you know, usually what you hear termed as like AI, which is, you know,
AI roles, machine learning engineering roles where it's not just a feature in the product,
like a prediction. It's like the product is machine learning, right?
Like a self-driving car.
Like if that doesn't, if the machine learning doesn't work, like the whole product doesn't
work.
And so you'll have an example of a lot of teams misunderstanding which one, which one they need.
Is there not a category in between where it's like, oh, machine learning like supplements
a feature?
Yeah.
Or augments?
Yeah.
Yeah.
So usually that's where folks talk about data products.
So when they talk about data products, it's often like a feature.
So like the Netflix recommendation engine.
That's a situation where honestly, if they didn't have machine learning, they could still
just say, like, here are the top movies, go watch them. But with that predictive model,
you're really getting a much better experience. And so we have probably 30 plus fellows working at
Netflix. A lot of them work on that stuff. But some of them work on analytics, which is,
how are people even using this product? What could we, at a more product level, do to improve it?
And there, the output isn't a feature that the user sees, like an actual algorithm serving
recommendations. There it's like they have to go and communicate with the product team to say, hey,
users seem to want us to be building this sort of product for them. Let's over the next six to
12 months take the product in that direction. It's like a very different role.
Here's an interesting question. So I know what the dream scenario for a lot of data scientists is.
Like, I want to get a job working on these interesting problems. Right. What should they
look out for that they should avoid in a company? Because I think everyone,
or more companies than should,
is looking for a data scientist.
What should a data scientist be worried about?
It's like, oh, they're not ready to actually hire me.
And if I go here, this will be a bad experience.
So you remember how I said the data scientist needs to know what the actual problem is?
The company needs to know what the actual problem is.
And so the companies you need to be wary about are the ones where it's like,
hey, do you know what?
Just like, I want deep learning.
And it's like, what does that mean?
What do you want us to do here?
And why do you need it?
And the company you want to go to is the one that's got a mission you align with.
You want to see them succeed.
You want to have whatever solution they're bringing to the market, you know, thrive in the world.
And then they have a clear sense of, if we add some data analysis to this, if we add machine learning to this, it's going to be better.
And then you can help them get there.
So someone from Twitter asked this.
Chuck Grimmett asked, when do you know you need to bring in seasoned data scientists?
So like, is there any kind of benchmark you can offer?
Yeah, I think, so first of all, I think you have to start as a founder, start with the idea.
And you can do this.
I recommend this before you have a data scientist.
Understand, is data sort of critical to my, to building my product?
Or is it something that I'll just add on once it's already working and I need to kind of optimize the experience?
So, you know, an example, an example for sort of something critical is like Amazon Alexa, right?
Like if you're building Alexa, those algorithms, the voice recognition algorithms, had better work from day one, versus a scenario where, say, you're on the analytics team at Airbnb and you already have a lot of users and you're just trying to optimize that experience, right?
And so for a startup, figure that out first.
And then if you need one from day one, hire one from day one.
If you get a machine learning engineer in the door who really, that's their forte, you're going to be better set up for success,
instead of trying to sort of, you know, kind of hack it and then have to kind of catch up later.
Because often you don't know what you don't know.
And you might not be tracking the right data or you're not sort of setting things up, your infrastructure in a way that's going to help you scale later.
And then those, especially in products where machine learning is critical, that becomes challenging.
One thing I recommend to startups actually is just talk to folks in the industry and frankly get an advisor,
right? If you're not ready to hire a data scientist yet, at least maybe think about getting
a data science advisor because they're going to be able to sit down. Where do you find those?
Yeah, good question. So I'm trying to understand like who gives that information for free?
Email me. Yeah. No, I mean, you'd be surprised a lot of, I mean, maybe some of the top folks
who started the data science team at like LinkedIn, you know, that's, that's hard to get an advisor.
But, you know, I think even any sort of data scientist who's been in the field, who knows what
they're doing, will be able to sit with a founder and say, listen, you're probably going to
want to instrument these features to collect this data because you're going to want to analyze this
later. Or here's the type of work you want done probably down the road.
You want someone to help you understand how to lay the groundwork to actually do that hire.
You guys started off with eight students in that first class. Can you talk about where it is
right now? How many students are you processing now? And then also, like, what is different about
the curriculum and program? Yeah. Yeah. So it's definitely scaled up a bit since then.
We're now in five cities, so San Francisco, New York, Boston, Seattle, Toronto.
Toronto's my hometown. We just launched it this year, which is fun.
And we've got a bunch of different specializations now.
So data science is one, data engineering, health data, AI.
We're even sort of doing product management now helping product managers transition to AI.
So overall, we do three sessions a year.
It's like almost like you have different classes depending on where you're starting on.
On the specializations, yeah.
Because the field specialized, right?
It used to be like you just hire a data scientist who you hope will take care of everything.
And now you want folks who are building infrastructure, the data engineers.
You want the data scientists who are sort of building the early prototypes and figuring out what to build.
And then more often than not now, you need machine learning engineers to really kind of put that into production now.
And so you see these different specializations.
And we essentially have a program for each.
So the data science program's for PhDs, because that sort of scientific experience is critical.
The AI program, for instance, is for PhDs, but predominantly for engineers, actually, who are going into machine learning engineering roles.
Then how big are these classes?
So overall, across all the cities and programs, we're at about 300, oh, just over 300 fellows per session now.
But each program is small.
So we keep it sort of maximum 20 to 30, 35 fellows.
And because the idea is each one of those sub programs.
That's right.
Each program is in each location because you want that, the collaboration is critical.
You want that group to sort of gel.
Everybody's working on a project.
You want people kind of tapping each other on the shoulder, asking for help.
You want that alumni who's coming in to be able to kind of sit with the fellows.
How long is like the class?
And the small groups really are critical for that.
How long is the class?
Seven weeks.
And then super fast.
So what gets done in seven weeks?
Yeah.
So it's pretty incredible how fast people learn and what they build.
So literally, you know, they'll go from in week one trying to come up with the idea or partnering with a startup.
So often fellows work with startups.
We have a partnership with YC.
So right from the get go they start with a project.
Well, week one is figure out what project.
So like your first week is like, should I come up with something on my own and build it based on advice I'm getting from our alumni, from our mentors, our team.
Or should I go partner with a YC startup who's got a data challenge that they want solved.
And so that's step one is figure out what you're building.
And again, figure out what problem you're building.
In the next couple of weeks, you better build it fast.
So folks have to go from literally nothing to like an MVP in a week or two.
And then they're out presenting those projects in a few weeks time.
They're working individually because they're trying to show that they're able to kind of execute end-to-end on a real world problem.
But it's incredibly collaborative.
So if you come to insight, it's like it doesn't look like a classroom.
It looks like kind of like a startup office and everybody just kind of at desks sitting together.
And people are on whiteboards.
They're talking to each other, helping each other,
because, you know, you encounter the same problems, technical or otherwise. And it's that
collaborative aspect that allows people to move super fast and learn a ton.
And if you're in the program, or you're just checking out the program, maybe applying for jobs like this,
what are the types of projects that you recommend avoiding? You know, things people have seen
a hundred times before.
Yeah. I recommend, like, "are people happy on Twitter" is, like, that's
maybe done. That's a bad example. I'll give a general example, because people
have been doing things like this. The more useful advice, I think, is: make something useful.
Right? So I think it's really easy to just be like, I took this algorithm that used to operate at 99.1
accuracy and now I'm going to make it 92.3, you know, and I don't know why, but like, it's
better now, right? Or what you see scientists sometimes do is this very generic, like, I studied, you know...
I'll give an example of a project I love that a fellow
came up with. So here's the bad version, and here's the version someone did at Insight, but you can
do this at home. So the bad version, so let's say the topic is solar panels.
it at, you know, do this at home. So the bad version, so let's say the topic is solar panels.
You want to understand solar panel usage and really, uh, enable people to adopt solar panels.
Bad project is I analyze general trends about solar panel usage in California, right?
It's like look at this interesting fact I found about like, okay, whatever, right? Maybe for an analyst
report that's interesting, but not for actually getting anything done. To me, it's like it has no call to
action.
Exactly.
Like you want it to be almost opinionated because that way a business knows, it's like,
oh, I can look at this and know what to do.
That's exactly right.
I think the bad projects are the ones that feels like,
oh,
now I have homework.
That's right.
Oh,
here's some information you may have to,
like,
and it's actually the problem I have with a lot of analytics startups.
It's like,
all you do is like just tell me that I don't know anything.
Yeah.
But now I still don't know what to do.
So I've paid to be told to figure stuff out, and that feels dumb.
Exactly.
And so the good version of this project,
which a fellow did and is one of my favorites, is,
I'm a homeowner.
should I buy solar or not?
Will solar be profitable on my roof?
Okay, that's a hard problem.
What's the weather like?
What's the, I mean, a ton of different factors.
Press, there's some predictive aspects of all that.
The fellow took all this data, synthesized it, and built a predictive model.
I come in, I type in my address.
It tells me whether I should buy solar.
Oh, they basically built a product.
Yeah.
Yeah.
So all these projects are very product focused.
They're so product focused that sometimes companies are like,
why are you showing us products when like we just want data scientists?
And the answer is because that demonstrates that people can think product wise.
And they end up loving it. They sort of abstractly don't understand why we're showing them products,
but people gravitate to real solutions.
And then, well, they hire the fellows.
So this is related to something we talked about the other day, which is like in the future,
are more data scientists going to become founders?
Or is that like personality, that mentality?
Like is that best suited within a big company?
Yeah, the thing I...
Absolutely, I think...
Oh, really?
Yeah.
So this is not going to be a case, like, designers who, like, for some reason, designers don't
tend to become founders.
You know, we'll see how it shapes up in terms of, like, is it going to be en masse data
scientists?
But certainly, I would say probably about a quarter of every fellows program I see, like, raise
their hands when asked if they want to start a company in the next five years.
Oh, shit.
I need to get my ass out there.
Yeah.
Yeah.
And so I think that's going to be a
big thing. We've already seen some of our alumni start companies, although again, it's early.
In the early days, we had very few fellows. But Diana Wu was in one of the early sessions,
started Trace Genomics, a genomics company, which uses genomic data to tell farmers when to plant,
when not to plant. Super interesting. Not an alum, but, like, an early mentor, Ben Kamens,
used to be kind of the founding engineer at Khan Academy. And he hired
one of our fellows, Lauren, who is a physicist. She went there and helped them, you know,
hopefully impact a bunch of kids' lives by, like, helping them learn
faster, because they really have millions of data points on how people learn.
And she was there for a few years with Ben helping with education.
And now Ben went off.
I mentioned Ben because he's very much kind of a data scientist at heart.
And although, you know, his title is officially CTO, he went off and founded,
it's Spring Discovery now.
They're sort of helping with aging-related diseases, using machine learning to do that.
Lauren went over there with him, sort of part of the founding team.
And so, you know, again, TBD in terms of like what the stats are going to be in terms of founders,
but that founder spirit is there.
And the skill set is so useful.
I mean, that's the thing.
Like regardless, having an understanding of product is like the pickle.
Absolutely.
That's what you can go for.
Absolutely.
Because whether you're a founder or employee 10 or 100 or, frankly,
a thousand, you better know what's what.
And so do you teach that as well?
Oh yeah, it's one of the biggest things.
I mean, how do you teach it?
You know, I've found the only way to teach is by doing.
Yeah.
So you say, like, build a product, and then they don't. They give you a graph that shows you interesting
things, and you say, no, no. And you iterate. You just iterate. I mean, that's the learning
experience. You do it wrong, and then you iterate and you fix it and get better. And so the model at Insight
is really just continual feedback.
So if at the end of the program, I tell you, like, no, that's wrong,
then that's a bad learning experience.
But at Insight, you'll be told, like, half a day in that that's, like, not the way to go.
And by the, like, next half day, you'll be closer there.
And by the first week, you'll hopefully be on a good path to building a cool product.
So that's that fast iteration as you go.
Cool.
I think one of the things that ends up being a problem for a lot of startups or for even people getting into the data science field is, like, they're encountering very dirty data.
And so now a lot of time actually is like, this is not like, oh, I'm solving cool problems and making products.
It's like, oh, I'm just sitting here cleaning up this data just so I can get to this point.
And so what I'm trying to figure out is, like, is this something that data scientists need to be aware of, that you're just going to walk into this?
Or is it something, like, startups need to start thinking about, and, like, what can they do to prevent that?
Both.
But I think you can never avoid it.
So it really is the data scientist's job to be prepared for that, to do well at that.
And what is that?
What is the ratio of the job,
of, like, cleaning versus... Like, there's this joke that, like, 90% of the job is data cleaning.
I don't know if it's 90, but it's a lot.
And it's not just data cleaning, because data cleaning sounds kind of lame, like you're just
kind of cleaning things up.
Yeah.
It's, I think more interesting than that.
It's like literally like what data even makes sense to get here.
It's not obvious.
In advance, you think it's obvious.
You're like, oh, just throw some data in. What data?
Of what?
And how can you combine that data?
And what does it mean to have clean, relevant data?
And do you have an example?
That's a skill set.
Well, you know, I'll have an example around the founder side, right?
So I think founders often make the sort of assumption that they're tracking all the right things.
And then we've had many experiences where, you know, we'll talk to a founder.
If I was going to work with like a founder and they'll say, yeah, we got all the data.
We got everything.
Big data, big data.
Yeah.
It's all, it's always big data.
It's the best data.
And then, you know, and then you open it up.
And it's like, oh, shit, they didn't track user logins, like which user was logging in.
They're tracking all the movements on the site, but not which user was doing
that and at what timestamp.
And again, it's like, oh, my God.
Like all this data is borderline unusable because we can't kind of peg it to specific
behavior and model that behavior.
And, you know, when you're looking at it from the data perspective, it sounds, like, hilarious.
Like, why don't you track users?
But you know what?
I'm a founder.
I know when you're a founder, you're thinking about a million different things.
You have a million different tradeoffs.
And honestly, like, yeah, the logging's turned on.
Like, let's go, right?
Let's build.
And then a year later, you're regretting that.
So again, I think a lesson learned for sure.
That's why it's like, hey, have a coffee with the data scientist.
Like, maybe all you'll get out of it is, like, log your user logins.
But that might be enough.
And then a year later, you can get started with a data scientist.
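As a rough illustration of the logging point above, here is a minimal Python sketch of an event record that ties every action to a specific user and a timestamp. The function name `log_event`, the field names, and the JSON-lines format are assumptions made for this example; nothing in the conversation specifies a particular schema or tool.

```python
import io
import json
from datetime import datetime, timezone

def log_event(stream, user_id, event, **properties):
    """Append one JSON line recording who did what, and when (hypothetical schema)."""
    record = {
        "user_id": user_id,                                   # which user acted
        "event": event,                                       # what they did
        "timestamp": datetime.now(timezone.utc).isoformat(),  # when, in UTC
        "properties": properties,                             # optional extra context
    }
    stream.write(json.dumps(record) + "\n")
    return record

# Usage: an in-memory stream stands in for a log file or event pipeline.
log = io.StringIO()
log_event(log, user_id="u_123", event="login")
log_event(log, user_id="u_123", event="page_view", path="/pricing")
events = [json.loads(line) for line in log.getvalue().splitlines()]
```

With the user ID and timestamp on every row, later questions about a specific user's behavior over time can at least be asked of the data.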
What are the best tools that people should use for tracking data? Or, like, is there a product that startups should use, just, right,
to get that, you know, that if they do this, they're just going to start on a good...
So, you know, honestly, I saw some of the questions on Twitter, and, you know, folks always ask about tools. So I was
actually asking around some of my team, like, hey, what's the latest on this? And there are
great tools, I think, for just sort of basic analytics tracking of websites. But if you're
really building products, it's still, to this day, we see the teams roll their own, because
there's so much, um, there's so much...
Such a disappointing answer.
It is a disappointing answer. And I think, you know, listen, there are companies
working on it, some YC companies, and they're slowly progressing up to more sophisticated
sort of data products. But at the end of the day, if your lifeblood is a very specific product
that does something very specific, nothing beats just having somebody
very thoughtfully say, what do we actually care about tracking here?
Okay. So stepping back then, yeah, assuming there is no easy answer then,
you're a founder.
You just started your thing.
Can you give me like five or ten things that I should be tracking?
Well, I mean, it really depends on the company, right?
Sure.
Okay, fine.
So I think the number one thing you have to think about as a founder is actually not even what you're tracking.
Because honestly, if you think about this first thing, right, I think that'll become more obvious.
Yeah.
The first thing you got to think about and think about it right is what are you actually trying to optimize?
What's the one or two metrics you actually care about?
If you're thinking about machine learning and building predictive models, say you had a magic machine learning model that did whatever you want, but you only had one or two.
Which problem in your company would you apply it to?
Because I think what what I see folks do is, oh, I know my business in and out.
And so I know my metric is this, this, this, this, this, this, this and this.
And then, oh, machine learning will build this, this, this, this, this.
And you know what?
You might at some point down the road.
but initially you're going to have to focus.
And if you don't have that focus,
that's where you get into this habit of,
I'll just track everything or nothing.
Whereas if you know what you're trying to optimize is,
let's say I'm Netflix.
Yeah.
What am I going to start tracking?
I mean, you're,
you obviously want to see how long people are watching the video,
how far they get in that video.
One of the things there, less obvious,
is that people are using different devices on different bandwidth.
So they track, I mean, they test this stuff and track it on all sorts of different machines.
So again, in a generic tool, would you have a situation where you're testing, like, a stream on a hundred different devices and a hundred...?
No, you wouldn't, because if that's not the core of your business, why would you ever do that?
Right.
But if you're Netflix, you better be doing that.
And because you know that user experience is the key, right?
Khan Academy is something different, right?
For Khan Academy, it's like, you know, maybe it's the amount of time
kids are spending on a question, and that's telling you something about whether they're learning,
whereas on another site it's like, you don't really care about the timing, you just care about the flow.
Can I just simplify that? So for us, like, for any startup and most companies, it's always like,
my goal is growth. Yeah. And for us at YC, we've actually pretty much simplified it, where it's just
like, look, for the most part, the KPI that your company is actually interested in driving, it's
either going to be revenue, right, and that's like 99% of companies. Yeah. And for some, like, consumer,
very difficult play is like I'm going after engagement.
Like daily active users is the ideal.
Sometimes it's weekly active users.
That's just the nature of the product.
And so to me it's just like, okay, what drives those two things?
I really just like, only two numbers.
It's like conversion and then like churn.
And so I imagine like most questions fall into those two categories.
Like what increases conversion for revenue and what reduces churn for revenue and the same thing for like engagement?
So maybe I'll speak directly to those.
those because now you're kind of zeroing in on certain types of companies.
And so for churn, we often have fellows build churn prediction models for startups.
So again, they're customized because there's, I mean, churn, churn for what, what's happening?
Yep.
But when we're talking about churn, it's a customer deciding to stop using the product.
And if we can predict that ahead of time, then they're able to intervene, maybe offer a discount,
maybe engage that user, get feedback.
So those are top of the list.
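To make the churn prediction idea above a little more concrete, here is a minimal sketch of a logistic regression trained by gradient descent in plain Python. Everything specific in it is an assumption for illustration: the two features (recent logins and days since last login, both scaled), the eight made-up customers, and the function names. A real model would be customized to the company's own data, as described above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of log loss with respect to the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def churn_probability(x, weights, bias):
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical features: [logins in last 30 days / 10, days since last login / 10]
customers = [
    [2.0, 0.1], [1.5, 0.2], [1.8, 0.0], [1.2, 0.3],  # stayed
    [0.1, 2.5], [0.3, 2.0], [0.0, 3.0], [0.2, 1.8],  # churned
]
churned = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic(customers, churned)
active_user = [1.6, 0.1]  # logs in often, seen yesterday
quiet_user = [0.1, 2.2]   # rarely logs in, gone for three weeks
```

A customer whose `churn_probability` comes out high is the one to reach out to with a discount or a request for feedback before they leave.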
And for conversion, experimentation is the key.
It's like these experimentation frameworks.
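One simple shape such an experimentation framework can take is an epsilon-greedy multi-armed bandit, which keeps testing several variants while sending most traffic to whichever is converting best so far. The sketch below is an illustration only: the conversion rates, the function name `epsilon_greedy`, and the simulated visitors are all assumptions, not anything described in the conversation.

```python
import random

def epsilon_greedy(true_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over variants with hidden conversion rates."""
    rng = random.Random(seed)
    pulls = [0] * len(true_rates)  # visitors shown each variant
    wins = [0] * len(true_rates)   # conversions observed per variant
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_rates))  # explore: pick a random variant
        else:
            rates = [w / p if p else 0.0 for w, p in zip(wins, pulls)]
            arm = rates.index(max(rates))         # exploit: best observed rate so far
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:        # simulated visitor converts
            wins[arm] += 1
    return pulls, wins

# Three hypothetical signup-page variants with made-up conversion rates.
pulls, wins = epsilon_greedy([0.05, 0.10, 0.50])
best_arm = pulls.index(max(pulls))
```

Compared with a fixed 50/50 A/B split, this trades some statistical cleanliness for less traffic spent on losing variants.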
I always feel like a lot of times startups, especially early ones, they neglect that whole
churn question.
Because I always tell them, it's like, look, you're obsessed about conversion.
Right.
Because you're in sales mode and trying to bring them in.
But I always feel like it's very expensive.
And I feel like improving churn by the same percentage, that's exactly
the same thing for conversion.
But it's way easier, cheaper, et cetera.
And so is that usually what the first projects that startups and companies should be looking at
if they haven't at all?
Absolutely agree.
And, you know, one thing I'll add about churn is it's often more reflective of what is actually working or not working, right?
It's like, make something people want.
If you improve churn, that means you're truly understanding what the user wants.
Maybe you can get them to sign up or convert just by sort of, you know, having a flashy sales pitch.
But churn, you really have to understand it.
And then that's where the exploratory data analysis comes in.
Do you really understand what your user is doing?
That's where the A/B testing comes in, and often what's called multi-armed bandit testing, where you're trying various different
experiments at once. That's where you're predicting churn and then trying to intervene to
help the customer. But you see what I'm saying? It's like a number of different
things, all of which are grounded in: do I understand what my user wants, and am I
building to what they really care about?
I think the other big trend, that you're having people
sort of obsessed with metrics-wise, is like cohorts and retention curves over time. And so what are
the best things people should do? Like, yes, just understanding and knowing it, that's
sometimes really difficult. But in terms of improving that, where does data science usually
help?
Right. I mean, I think it's coming back to churn, right? Because if you're seeing
folks drop off at month three in your early cohorts, I mean, that's a churn problem
right there. So yeah, I think it goes back to churn. A lot of those sort of dashboards, you know,
there are great tools for those. So certainly, when I started, people would
hand-code cohort analyses. Now there's a bunch of tools for that. So certainly I think
in the metrics dashboard domain, there's a
lot of solutions. When I was saying that there isn't really a ready-made solution,
it's more that stuff where you're actually building models to
improve the product in a very sort of deep way.
Do you guys have a favorite for, like, because you
said, like, good startups have good problems.
Are you waiting for a sponsorship?
I'm trying to understand.
What do you mean?
For, like, some tool to pay you money to say what's great?
Honestly, at Insight, almost everybody just uses open source, right? So everybody's building in Python, you know,
yeah, it's all open source.
And because that's actually what we're seeing reflected in the industry.
So if you go to a top data science team, by far and away,
the vast majority of what they're using and building on is open source.
What are those projects?
Like, I think Python has definitely, it used to be like Python and R.
They're still building it themselves.
Yeah.
Yeah, absolutely.
Yeah.
Right.
And then they just use like what?
Jupyter notebooks, stuff like that.
For prototyping.
And then you got to then start building.
And then you roll your own.
And frankly, at that point, as soon as you get away, as soon as you get past the prototyping
stage, you're really just building product, right?
It's the same thing an engineering team does at a startup, right?
It's like, what tools are they using to build the fundamental product?
And that's where you're living.
Those data scientists are often embedded with the team building directly.
Who makes the best data scientists?
Like, from what field have you noticed
where you're like, oh, it's much better that they come from this field?
What's kind of been shocking and amazing.
I want to know who your favorite children are.
What's been, I was, early days I was accused of, you know, I'm from physics.
So, but now, you know, we have fellows from all sorts of different backgrounds.
So, so they all succeed.
No, I mean, I think that's been the, uh, the shocking thing is like how different the backgrounds are.
We have a fellow in this session who's an archaeology PhD.
We had a fellow a session ago
who was, like, an engineer at SpaceX, right?
I imagine each of them has certain kinds of problems.
Like, you have, like, a mathematician.
That's exactly right.
Going to get the math, but understanding the product and selling themselves and understanding
problems is probably going to be a challenge.
Exactly.
So often you'll find, like, mathematicians, for instance, have made great data engineers
because they think about large-scale systems and how they can fail.
I mean, in math it's logic systems, but then they kind of transfer that sort of mode of thinking
to data infrastructure.
But someone like, for instance, psychology was one that like in the early days, I didn't
really have a network in kind of psychology or neuroscience.
So we did a lot of work to try to kind of put the word out there.
We found social scientists are incredible data scientists quite often because they know how to
ask the right question.
Yeah, and they know how to think about people.
And ultimately, you know, obviously data is branching out.
But most of the time, when you're talking about users, you're talking about customers, it's people, right?
And so fantastic data scientists from those fields.
But it's just one of my favorite parts of my job, actually, is the fact that I'll sit at
lunch or a happy hour or just hang out at the office.
And it's like an astrophysicist with a psychologist with a software engineer with a, you know, electrical engineer.
And they're all kind of working and collaborating.
And it's just incredible, incredible kind of environment to be around all these different people.
You have all these companies coming in talking with all your students during the program.
And they're usually coming like with a problem?
Or are they just talking about here's the kind of problems we work on and solve?
Because they're kind of doing a little bit of recruiting in addition to giving an understanding.
Oh, absolutely.
Yeah.
Yeah.
And who's been really great at that?
We have, I mean.
Or what do they do that's really great?
There's a bunch of teams.
You know, listen, I think, so the way the program works is fellows will often work
with a startup company on a project.
But most of the interactions that the fellows have with companies is actually
companies coming in who try to hire them, right?
And when I say companies, I mean like the actual technical data team coming in talking
about what they work on.
Yeah.
And, but, you know, and trying to hire them.
And so the teams that do really well, listen, obviously the ones with great brands, the Airbnbs, the Lyfts, the Ubers, the Facebooks...
I want to know what the little guys have to do to compete with them.
But this is what I found: when startups come in, what often happens is fellows come in like, what's this startup?
I haven't heard about it. Why do I have to go to this?
And they come out, and they're like, this is my dream job. I want to work at this company.
And I started trying to figure out what certain startups did to do that.
And what it really boils down to is impact.
The startups that do well recruiting data scientists make the pitch: you are critical to our success.
If we, if, if, oh, they made it.
It looks like they're going to be all stars.
And, and, and they're telling the truth because there, a lot of companies these days,
frankly, if the machine learning or if the analytics doesn't work, like the company will
fail.
Like, that's what they're pitching to.
Well, also when there's one of you versus 300 of you.
Right.
Well, that's a personality thing, right?
Some people are excited about, I'm going to be the first data scientist.
And some people are like, I want, I want, I want some mentorship.
I need a little motor on me.
Yeah.
Yeah.
But when it comes to, like, I've never heard of this company before.
And then an hour later, like, oh, my God, I want to work for them.
It's always the impact piece.
It's always the like, if you come here, what you do will matter in a big way.
And obviously there's the technical piece that you're going to work on cool stuff.
Yeah.
But I thought the technical piece would be the biggest one.
But the biggest one actually is the impact for sure.
So one thing we haven't talked about, and I actually don't know if you have an opinion on this, is contracting.
So for an average startup, say they're, like, a couple of
years in and they're like, I don't know if we really have a need for this, but we have all this data.
Maybe we could put it to use.
Do you see people like doing two month contracts and getting a system up and then just
letting it go?
What happens?
Yeah, I think contracting is good for prototyping.
So we see a lot of like when I'm saying YC startups work with our fellows.
That's essentially, it's a pro bono consulting, but they're working with them for, you know,
the program and helping to deliver some results.
And where that works really well is, you know, often it is integrated, but it's at this
sort of prototyping stage.
Will this even work?
Or I've got a model.
Will this one work better if we try this?
So let me give you an example of one I really like. Recently,
a fellow worked with iSono Health, a YC startup.
A really amazing product that does, like, sort of in-home breast cancer screening.
So it's a device: instead of going once a year to get screened for breast cancer,
if you're at high risk, you can do it at home.
It's the second leading cause of cancer death in women.
So huge impact potentially, life-saving technology.
And obviously a big part of that is, can we, do we have the right algorithms to detect and notify a user that, hey, you need to go speak to your doctor or notify a doctor?
Obviously, a doctor does the final thing, but is there something abnormal here that we need to be taking a closer look at?
They had algorithms that were working great and doing well for them, especially at that stage of, hey, let's just bring it to a doctor to be safe.
But they were curious about, hey, are some of these more like these newest sort of deep learning algorithms that are just coming out in the papers?
Are they going to do better for us?
So a fellow did that.
They took the data and essentially used some brand new sort of convolutional neural network techniques that had just kind of been published and got better results for them that were almost on par with sort of expert radiologists.
And so I mean, that's that's awesome, right?
And so, of course, that team then has to do some more work to implement it.
But that's an example where I think consulting works is like, is this going to work?
Is this feasible?
Is it a prototype?
Anytime you actually, I think then kind of, anytime that becomes a part of your product, you need a team.
Right.
Because it's never static.
Something's going to evolve and change and you need to be able to evolve it.
It's just like asking, like, can a startup just have like contract software engineers overseas?
It's like, well, maybe to prototype something, but in general, probably the answer is no, because that product's going to keep evolving every month, every year.
And you need folks on the staff to do it.
Makes sense to me.
Great.
Cool.
So I think one last thing I wanted to talk about was just areas you're excited about in particular.
Yeah.
We mentioned health the other day.
Yeah.
But yeah, what's exciting to you right now in the field?
No, I mean, there's a bunch of stuff that's excited me, but health is the top one that I'm pumped about because it,
I mean, the impact's there, right?
Like the example I just shared with you, I mean, early detection, disease monitoring.
I mean, you're literally saving people's lives if this stuff works.
And what's interesting is people have been talking about the impact in data science,
machine learning, and health for years because, you know, you start thinking about this stuff
and pretty quickly you're like, oh, this can make an impact.
But, you know, actually getting it to work is tough.
And only I think in the last few years, we've been seeing a lot of teams actually making
really amazing progress there. I'll give you an example I love of like the impact here.
Memorial Sloan Kettering Cancer Hospital in New York has hired a team of data scientists and data
engineers from us over the last few years. And what they do is they build essentially data
products, but ones that are used internally by their doctors. So these are cancer doctors.
These are really tough situations, and they're faced with the question of what clinical trial do I recommend
to my patient.
And there's thousands of clinical trials.
And there's new ones coming online every day.
Which one do you suggest?
And so they're building these kind of data products where the doctor gets based on the specific
personalized, you know, whether it's genomic or clinical factors.
Hey, you should at least think about these new clinical trials that are coming online.
And again, the doctor makes a final decision.
But it's, hey, maybe one of those trials they hadn't heard about now saves that patient's life.
Right.
And it's, it's, it's a hospital that's a hospital.
Right.
And then soon thereafter, New York Presbyterian hired a fellow.
And then Mount Sinai hired a fellow.
And now pharma companies are hiring fellows.
And like, it's really fascinating to see data broadened out when companies realize that they can just take it beyond.
Oh, I want to optimize this like business and efficiency and really think what can I create that's going to add incredible value.
And so health is one I'm excited about,
but there's a ton more out there.
That's super cool.
All right.
Well, thanks for coming in.
Thanks so much.
Yeah, thanks, Craig.
All right.
Thanks for listening.
So as always, you can find the transcript and the video at blog.ycombinator.com.
And if you have a second, it would be awesome to give us a rating and review wherever you find your podcasts.
See you next time.
