Y Combinator Startup Podcast - #111 - Jake Klamka and Kevin Hale

Episode Date: February 6, 2019

Jake Klamka founded Insight. Insight provides intensive 7 week professional training fellowships in fields such as data science and data engineering. Insight was in the YC Winter 2011 batch.

Kevin Hale is a Visiting Partner at YC. Before YC Kevin was the cofounder of Wufoo, which was funded by YC in 2006 and acquired by SurveyMonkey in 2011.

You can find Jake on Twitter at @jakeklamka and Kevin at @ilikevests.

The YC podcast is hosted by Craig Cannon.

Topics
00:37 - Kevin's intro
01:07 - Jake's intro
1:42 - Applying to YC with one product then changing it
4:07 - How Insight started
4:57 - Jake's first students and initial coursework
8:37 - Finding out what companies want from data scientists
10:37 - Picking the first class of students
12:07 - Common pitfalls for people transitioning into data science
15:07 - Types of data science roles
17:22 - What data scientists should look out for in companies
18:17 - Chuck Grimmett asks - When do you know you need to bring in seasoned data scientists?
20:37 - How Insight has scaled and changed
22:37 - What happens in the program
23:57 - Examples of a good project for a data science resume
26:27 - Will more data scientists be founders in the future?
28:37 - Teaching product
29:37 - Cleaning data
32:07 - Tools for tracking data
32:57 - Track what are you trying to optimize
35:57 - Churn and conversion
39:37 - Is there an ideal background for a data scientist?
41:37 - Which startups recruit well at Insight?
43:37 - Contracting
46:17 - Fields Jake is excited about

Transcript
Starting point is 00:00:00 Hey, how's it going? This is Craig Cannon, and you're listening to Y Combinator's podcast. Today's episode is with Jake Klamka and Kevin Hale. Jake founded Insight. Insight provides intensive seven-week professional training fellowships in fields such as data science and data engineering. Insight was in the YC Winter 2011 batch. Kevin's a visiting partner at YC. Before YC, Kevin was a co-founder of Wufoo, which was funded by YC in 2006 and acquired by SurveyMonkey in 2011. You can find Jake on Twitter at @jakeklamka and Kevin at @ilikevests. All right, here we go. So, Kevin, for those of our listeners that don't know who you are, what's your deal?
Starting point is 00:00:41 I'm a partner here at Y Combinator. I actually was in the second-ever batch. I was in Winter 2006, and I founded a company called Wufoo. We ran that for five years, and then we were acquired by SurveyMonkey, and that moved us from Florida to California, and that's when PG asked if I'd be interested in helping out at YC. And I've been there pretty much ever since. Yeah. And you suggested Jake as a guest for this episode. So Jake, what do you do?
Starting point is 00:01:06 So I'm the founder and CEO of Insight. So Insight is an education company. We run fellows programs that help scientists and engineers transition to careers in data science and AI. And it's a pretty unique model because they're completely free, these fellowships. They're full time. The companies sort of fund the process. Engineers, scientists build projects for seven weeks. They meet top data teams and they get hired on those teams. We've got over 2,000 Insight alumni working as data scientists now across the US and Canada. Nice.
Starting point is 00:01:35 And you haven't always been working on this. So you applied to YC for the Winter 2011 batch. That's right. Yeah. And what was your idea then? So I was back in, so I started my career, and this is relevant to why I started Insight, because I basically started what I wish had existed when I was around. I was a physicist. Yeah.
Starting point is 00:01:52 At the University of Toronto. I thought I was going to be a scientist for the rest of my life. And then partway through my PhD, I realized I want to go into technology, and I think to myself, I'm writing code, I'm building machine learning models, this is great, I've got what I need. And it frankly took me a long time to transition. Eventually got into Y Combinator, came down here for the Winter '11 batch. I was building a bunch of mobile sort of productivity apps that are machine learning enabled. And didn't quite get the up-and-to-the-right graph that you would hope for after YC.
Starting point is 00:02:24 But it was an incredible experience. And, you know, in that sort of late 2011, call it six, 12 months after YC, I was searching for a new idea. And actually went, spoke with Paul Graham and a few other advisors. And the recommendation was work on a problem you yourself have. You're kind of building these apps that, you know, you're trying to use these machine learning models. And hopefully somebody's got that as a problem. But flip it around. Start with a problem you've had, then figure out what the solution is.
Starting point is 00:02:52 And when I reflected on it, it took me a few years to really make this transition. I'd been so close all along, but I didn't know product. I wasn't really connected in the valley. There's a bunch of, you know, technically I had the fundamentals, but a lot of the tool sets were different in the industry. So I didn't know what I didn't know. And when I got down here and I started talking to people, that's when I finally started figuring it out. And I was seeing a lot of my friends having that same struggle. So brilliant, you know, mathematicians, neuroscientists, biologists, also engineers later, we found the same thing, kind of getting stuck. And like, they're like, I want to go into data science.
Starting point is 00:03:26 I want to go into AI. I want to go into these cutting edge fields. But, you know, it doesn't say the right thing on my resume or I'm kind of like, you know, just getting that last mile is really hard to cross. And I thought, okay, well, this is the problem I want to solve because these are some of the most brilliant people I'd ever worked with. A lot of them were my former colleagues from physics. And I thought, what does the solution for this look like?
Starting point is 00:03:48 And at first, I was focused on it's going to be an app again, right? It's some machine learning enabled app. And then I realized, no, it actually probably looks more like an in-person program where folks are getting together, building cool projects and then getting started from there. And so did you just go ahead and teach a class? Yeah. So I basically, you know, started talking. First, I talked to companies and I said, listen, I've got these brilliant friends coming out of academia who I think you should be hiring. Why aren't you hiring them?
Starting point is 00:04:16 And basically what they told me is I know they're brilliant. I know they got all these great skills, but they're probably like one to two months away from where I need them to be in terms of if I had full days to mentor them for a month or two, they'd be incredible data scientists. But they're like, I don't have a month or two to mentor them. So I say no in the interview, right? And so I'm like, well, I have a month or two. So maybe what Insight is going to be is that month or two where folks are filling in those last pieces of the puzzle, learning the sort of cutting edge techniques and sort of tool sets and other things, and then let's bring those data scientists in the room and have them hire.
Starting point is 00:04:53 And then we just jumped right in and I ran the first sessions. The first session was just me. First students, so the focus was PhDs. So that first group was in 2012. Were they like your friends? No, no. I mean, I had to go beyond my friends. So at first I started talking to my friends in academia.
Starting point is 00:05:10 So, you know, I got confirmation from my friends in academia that, I mean, I already knew that they were looking for jobs and they were excited about transitioning. I got confirmation from the hiring managers saying, listen, we're hiring, we can't find folks with the full skill set. If you bring them into a room, we'll go look at them. And then the rest was kind of getting the word out and getting applications. How did you know what to teach them? Because you mentioned that you didn't know what you didn't know.
Starting point is 00:05:33 Yeah. I mean, by that time, I had spent like three years figuring it out, including doing YC and meeting a bunch of data scientists and building a bunch of data products. So, you know, by that point, I kind of knew what the pieces were. But also, really, the program was focused not on me teaching the fellows. It was focused on me bringing in the sort of leading data scientists at the time and having them directly tell them. So we had, you know, Facebook, LinkedIn, Twitter, Square, all these early data teams in 2012,
Starting point is 00:05:58 their heads of data science come in. So they're willing to do like one day. They just couldn't commit like a lot. Yeah. Yeah, that's exactly it. That's exactly it. They're like, I'll come in for a few hours, but I don't have two months. And I'm like, well, if I have a bunch of you come in for a few hours, plus, you know,
Starting point is 00:06:13 really have these folks kind of working away for a few months, learning from each other, learning from these mentors. Once we had alumni, too, it was incredible. We had all these alumni coming in to help. How big was that first class? It was eight fellows. And then how many of them did you get jobs? All of them, pretty much, yeah.
Starting point is 00:06:29 All of them, yeah. 100%. Yeah, one went to Facebook, one to Square, one went to LinkedIn, one went to Twitter. I mean, at the time, these were like, they still are the top data teams, but, I mean, it was a clear success. It was super stressful. I didn't have the model. I hadn't figured it out.
Starting point is 00:06:42 It was crazy. I mean, what mistakes did you make? Like was that first class kind of a shit show? Oh, of course. Of course. So what was the problem? In the sense that it was the first time I was doing it. And a lot of it, a career transition is always stressful.
Starting point is 00:06:56 Whenever people are doing Insight, they're stressed. But at least there's a track record there. And now we have things baked pretty well. At that time, the overall idea was there. But a lot of the details weren't there. And so, and frankly, the track record wasn't there. So a lot of these folks are like, what have I done? I'm in a room with this guy who's never
Starting point is 00:07:15 done this before. Like, so there's a lot of stress just around, is this even going to work, this weird model? But we made it work. What I'm trying to understand is like, what did those eight students believe, right? Like, were they desperate or like, were you great at sales? No, no, I think they were genuinely excited. Like, part of the application process, I got way more applications than I expected when I started it. There was a real demand to get into the field. I didn't have a track record, but I basically went around to these universities and said, I'm going to have the head of data science from Facebook, LinkedIn, Twitter,
Starting point is 00:07:52 all these companies coming in and you're going to meet them. So the roster made them feel a lot more comfortable. That's right. Yeah. And they were, and my interview process really centered around how excited you are about this. So the folks who are like, I really don't want to do this,
Starting point is 00:08:05 but I need a plan B. No, thank you. Right? It was the people who said to me, I love my work as a scientist, but I really want to kind of move, have a kind of more of an applied impact in the world. I'm excited by what I'm seeing here.
Starting point is 00:08:18 Here's what I think I can do. I mean, that's the kind of folks I would take into the program. Totally makes sense. Starting off with like qualifying the lead. It's such a common technique you're seeing a lot of startups do now, like Superhuman, for example.
Starting point is 00:08:30 Yeah. Heavily qualifying a lead before they'll even give them access to the product. So that way you're trying to guarantee that, like, the time I do spend with someone is going to be a spectacular experience. Yeah, that's why these hiring managers wanted to come in. How did you figure out what students were going to be the most excited about this? Like, what do you ask them? Yeah. So, I mean,
Starting point is 00:08:49 I had some opinions, but really what I did is I went to these early heads of data science teams and said, what do you look for? And what they said is, like, you know, they'd list off some technical skills. But, you know, it's kind of like, they need to know SQL. They need to know Python. They need to know. And I'm like, okay, but what really would clinch it for you? Like, you want this person. And there's two things always. There's like, they have a side project. And their eyes would light up. They'd go, oh, if they had a side project, if you send me a URL, oh my God, then I know they're excited, then I know they're. And so that's where the idea came around for, hey, this isn't about, you know, these folks have been through enough classes,
Starting point is 00:09:29 it's about actually building, it's about actually creating something and proving that, I've got all this great background, but now I'm going to do this last piece of the puzzle to show you I can do something relevant in this area. And the second thing that they wanted, and I think this is where the project really shows this, but what they wanted overall is just curiosity. So folks, and I thought that they weren't being serious, to be honest with you. Because I was like, yeah, you say you want curiosity. Really you just want somebody who's good at like SQL or something, right? Or good at like machine learning.
Starting point is 00:09:55 And it proved to be true. The people they would hire would be the ones who were like, hey, I studied astrophysics. But in my spare time, I was like, kind of dabbling with genomics. And then I got into machine learning on the side. And then I built this cool for-fun project that, like, I don't know, predicts where I should go, you know, I don't know, camping or something, because I'm a big camper or something. And then you take a person like that and that's the kind of folks that these teams wanted and still want because these problems are so open-ended.
Starting point is 00:10:23 Well, it's like curious people don't get blocked as much, right? They're willing to try the things. And, you know, it's such a new field. The roles our fellows are getting hired into, at most of the companies, it's not like, we know what we need you to do, just do it. Yeah. It's, what can we even do here, right? What kind of impact can we have with data?
Starting point is 00:10:39 Again, how did you ask, how did you test for that? Like, curiosity. Yeah. Like the project seems like, okay, that's something we have to shoot for. But again, it's like, how do you know that these were the right eight people? Well, so, you know, a lot of it was trial and error. I would do like 12 plus interviews a day and kind of, you kind of get to know folks and kind of get to know it. But I think the main thing, the signal that I saw was it's kind of that example I gave, is that people would be almost apologetic.
Starting point is 00:11:05 They'd be like, listen, what I'm going to tell you is not part of my usual work, but it's on the side. And it's like, no, no, I want to hear about that. I remember I had this, one of the fellows came, well, she became a fellow, but she was in an early session. She was a mathematician at Berkeley. And she had done all this incredible analysis. I can't quite remember what. Like this really cool data analysis project. I think on like maybe flight times or something in sports.
Starting point is 00:11:32 I can't quite remember. And partway through the interview, I'm like, but you're a mathematician. Like, don't you do pencil and paper math? Yeah. And I had done some math. She's like, oh, yeah, like, I can't remember what the field was. And she's like, oh, yeah, this is not even part of my. And she almost felt kind of apologetic about it.
Starting point is 00:11:46 I was like, this is who I want as a fellow, right? Brilliant mathematician doing incredible work. And able to, on the side, on the weekend, quickly pick up Python, this, that, the other, make something useful. She went on to, she continues to work at Facebook. She went to Facebook after the program, been super successful. So it's people like that that you're like, I want you. So this is related to one of these like overarching questions we had for you. So basically it's like,
Starting point is 00:12:14 how can people get into data science? And then what are the pitfalls for people who, say, have a PhD, you know, they know Python? They're like at a higher level than like a coding boot camp person. Yeah. What are the pitfalls they make when they're trying to bridge that gap and get into a data science role, provided that they didn't do your program? Yeah, absolutely. And we see it because we, you know, I started with scientists and now we also have programs for engineers who are transitioning to machine learning engineering, deep learning research. And you sort of see a very similar problem on both sides, which is folks are extremely focused on the sort of technical, let me get the algorithmic knowledge down, let me know
Starting point is 00:12:53 every last algorithm, which of course you need and you need those foundations. Yeah. But when you're already dealing with someone who has, you know, been doing a bunch of work for years in a PhD or in engineering in these areas, what you actually want to see and what these teams want to see is communication ability. It's ability to understand the underlying, like, business and product problem, because what they want to do is hire someone who's going to first think about what are we trying to accomplish here. How can we help our users? How can we help our company succeed?
Starting point is 00:13:22 And then figure out how do I use my tool set of machine learning or analysis to do it? And what often happens, and this is the pitfall, is, you know, part of why you get into it is because you're excited about that analysis. You're excited about the machine learning. And so you start always putting that first. And you're always like, let me tell you the algorithms I can build. And it's like, folks who are trying to transition into it need to start thinking about product, need to start thinking about business, need to ask. Like, the skill there is like making them a better salesperson. And what's interesting about the advice that we give to a lot of people about sales is like it's not about selling your own thing.
Starting point is 00:13:58 It's about understanding their problem. And then fitting whatever you have to them. And so it seems like for the data scientist, the same thing needs to happen. It's not to say, here's all the things I have. It is like, try to figure out what it is that you fit into for them. Exactly right. And it's like, understand the underlying, forget data, forget machine learning. What are we trying to accomplish here?
Starting point is 00:14:19 What's our mission? What are we trying to do for our users? And then. Like making yourself look like the solution, not trying to be like, oh, I have a bunch of stuff. Which one of these things are you interested in? Exactly. I have a hammer and a screwdriver, like, which I can use all of them.
Starting point is 00:14:33 It's like, what are we trying to build here? Yeah. And sometimes that's actually a separate role. So for instance, say, like, Facebook might list a data science job. Whereas some, you know, smaller startup would say, like, we have an engineering role open. Right. And you might classify yourself as a data scientist. So if you have to pitch data science to a startup, right. How do you do that as an engineer? Right. This is a great question. So first of all, data science, machine learning, these are all like super broad umbrella terms. It's such a new field. Yeah. Maybe you should define data science. So, yeah. Maybe what I'll do is define data science. And I think this is essentially answering your question.
Starting point is 00:15:06 which is, so what we see in the industry, kind of broadly speaking, broad terms, details, let's not worry about the details, is sort of, I see kind of three big pieces of how sort of data science is used. So some data science roles are what I kind of call product analytics or business analytics roles. The idea there is you're looking for a better understanding. You're analyzing data about users or a company and trying to understand how to improve it. Help users succeed, help the business succeed. The second types of roles that we see are data product roles. So these are roles where you're actually using machine learning and predictive models
Starting point is 00:15:43 to actually change the user experience and give them something they want right there and then as part of the product. And the third one is kind of, you know, usually what you hear termed as like AI, which is, you know, AI roles, machine learning engineering roles where it's not just a feature in the product, like, that's the prediction. It's like the product is machine learning, right? Like a self-driving car. Like if the machine learning doesn't work, like the whole product doesn't work. And so you'll have an example of a lot of teams misunderstanding which one they need.
Starting point is 00:16:18 Is there not a category in between where it's like, oh, machine learning like supplements a feature? Yeah. Or augments? Yeah. Yeah. So usually that's where folks talk about data products. So when they talk about data products, it's often like a feature.
Starting point is 00:16:28 So like the Netflix recommendation engine. That's a situation where honestly, if they didn't have machine learning, they could still just say, like, here are the top movies, go watch them. But with that predictive model, you're really getting a much better experience. And so we have probably 30 plus fellows working at Netflix. A lot of them work on that stuff. But some of them work on analytics, which is, how are people even using this product? What could we at a more product level do to improve it? And there, the output isn't a feature that the user sees, like an actual algorithm serving recommendations. There it's like they have to go and communicate with the product team to say, hey,
Starting point is 00:17:03 users seem to want us to be building this sort of product for them. Let's over the next six to 12 months take the product in that direction. It's like a very different role. Here's an interesting question. So I know, like, what the dream scenario for a lot of data scientists is. Like, I want to get a job working on these interesting problems. Right. Like, what should they look out for that they should avoid in a company? Like, what is a company who says, because I think everyone's kind of thinking, or more than should, is like looking for a data scientist.
Starting point is 00:17:33 What should a data scientist be worried about? It's like, oh, they're not ready to actually hire me. And if I go here, this will be a bad experience. So you remember how I said the data scientist needs to know what the actual problem is? The company needs to know what the actual problem is. And so the companies you need to be wary about are the ones where it's like, hey, do you know what? Just like, I want deep learning.
Starting point is 00:17:50 And it's like, what does that mean? What do you want us to do here? And why do you need it? And the company you want to go to is the one that's got a mission you align with. You want to see them succeed. You want to have whatever solution they're bringing to the market, you know, thrive in the world. And then they have a clear sense of if we add some data analysis to this, if we add machine learning of this, it's going to be better. And then you can help them get there.
Starting point is 00:18:14 So someone from Twitter asked this. Chuck Grimmett asked, when do you know or when do you know you need to bring in seasoned data scientists? So like, is there any kind of benchmark you can offer? Yeah, I think, so first of all, I think you have to start as a founder, start with the idea. And you can do this. I recommend this before you have a data scientist. Understand, is data sort of critical to my, to building my product? Or is it something that I'll just add on once it's already working and I need to kind of optimize the experience?
Starting point is 00:18:44 So, you know, an example for sort of something critical is like Amazon Alexa, right? Like if you're building Alexa, those algorithms, voice recognition algos better work from day one, versus a scenario where like, say, you're on the analytics team at Airbnb and you already have a lot of users and you're just trying to optimize that experience, right? And so for a startup, figure that out first. And then if you need one from day one, hire one from day one. If you get a machine learning engineer in the door who really, that's their forte, you're going to be better set up for success, instead of trying to sort of, you know, kind of hack it and then have to kind of catch up later. Because often you don't know what you don't know. And you might not be tracking the right data or you're not sort of setting things up, your infrastructure in a way that's going to help you scale later.
Starting point is 00:19:35 And then, especially in products where machine learning is critical, that becomes challenging. One thing I recommend to startups actually is just talk to folks in the industry and frankly get an advisor, right? If you're not ready to hire a data scientist yet, at least maybe think about getting a data science advisor because they're going to be able to sit down. Where do you find those? Yeah, good question. So I'm trying to understand like who gives that information for free? Email me. Yeah. No, I mean, you'd be surprised a lot of, I mean, maybe some of the top folks who started the data science team at like LinkedIn, you know, that's hard to get as an advisor. But, you know, I think even any sort of data scientist who's been in the field who knows what
Starting point is 00:20:16 they're doing will be able to sit with a founder and say, listen, you're probably going to want to instrument these features to collect this data because you're going to want to analyze this later. Or here's the type of work you want done probably down the road. You want someone to help you understand how to lay the groundwork to actually do that hire. You guys started off with eight students in that first class. Can you talk about where it is right now? How many students are you processing now? And then also, like, what is different about the curriculum and program? Yeah. Yeah. So it's definitely scaled up a bit since then. We're now in five cities, so San Francisco, New York, Boston, Seattle, Toronto.
Starting point is 00:20:52 My hometown just launched it this year, which is fun. And we've got a bunch of different specializations now. So data science is one, data engineering, health data, AI. We're even sort of doing product management now helping product managers transition to AI. So overall, we do three sessions a year. It's like almost like you have different classes depending on where you're starting on. On the specializations, yeah. Because the field specialized, right?
Starting point is 00:21:17 It used to be like you just hire a data scientist who you hope will take care of everything. And now you want folks who are building infrastructure, the data engineers. You want the data scientists who are sort of building the early prototypes and figuring out what to build. And then more often than not now, you need machine learning engineers to really kind of put that into production now. And so you see these different specializations. And we essentially have a program for each. So the data science program's for PhDs because that sort of scientific experience is critical. The AI program, for instance, is for PhDs, predominantly for engineers, actually, who are going into machine learning engineering roles.
Starting point is 00:21:53 Then how big are these classes? So overall, across all the cities and programs, we're at about 300, oh, just over 300 fellows per session now. But each program is small. So we keep it sort of maximum 20 to 30, 35 fellows. And because the idea is each one of those sub programs. That's right. Each program is in each location because you want that, the collaboration is critical. You want that group to sort of gel.
Starting point is 00:22:15 everybody's working on a project. You want people kind of tapping each other on the shoulder, asking for help. You want that alumni who's coming in to be able to kind of sit with the fellows. How long is like the class? And the small groups really are critical for that. How long is the class? Seven weeks. And then super fast.
Starting point is 00:22:30 So what gets done in seven weeks? Yeah. So it's pretty incredible how fast people learn and what they build. So literally, you know, they'll go from in week one trying to come up with the idea or partnering with a startup. So often fellows work with startups. We have a partnership with YC. So right from the get go they start with a project. Well, week one's, figure out what project.
Starting point is 00:22:51 So like your first week is like, should I come up with something on my own and build it based on advice I'm getting from our alumni, from our mentors, our team. Or should I go partner with a YC startup who's got a data challenge that they want solved. And so that's step one is figure out what you're building. And again, figure out what problem you're building. In the next couple of weeks, you better build it fast. So folks have to go from literally nothing to like an MVP in a week or two. And then they're out presenting those projects in a few weeks time. They're working individually because they're trying to show that they're able to kind of execute end-to-end on a real world problem.
Starting point is 00:23:26 But it's incredibly collaborative. So if you come to Insight, it's like it doesn't look like a classroom. It looks like kind of like a startup office and everybody just kind of at desks sitting together. And people are on whiteboards. They're talking to each other, helping each other. Because, you know, you encounter the same problems, technical or otherwise, and it's that collaborative aspect that allows people to move super fast and learn a ton. And if you're in the program or you're just checking out the program, maybe applying for jobs like this, yeah,
Starting point is 00:23:53 what are the types of projects that you recommend avoiding you know things people have seen a hundred times before yeah I recommend uh like are people happy on Twitter is like that's maybe done that's a bad example I'll get a general general example because there's people have been doing like this uh uh the more kind of i think useful example is make something useful right so i think it's really easy to just be like i took this algorithm that used to operate at 99.1 and now accuracy and now i'm going to make it 92.3 like you know and i don't know why but like it's better now right or uh what you see scientists sometimes do is this very generic like i studied um you know i'll give an example of a project i love that i felt
Starting point is 00:24:39 come up. So here's the bad version. Here's the version someone did at Insight, but you can do it at, you know, do this at home. So the bad version, so let's say the topic is solar panels. You want to understand solar panel usage and really, uh, enable people to adopt solar panels. Bad project is I analyze general trends about solar panel usage in California, right? It's like look at this interesting fact I found about like, okay, whatever, right? Maybe for an analyst report that's interesting, but not for actually getting anything done. To me, it's like it has no call to action. Exactly.
Starting point is 00:25:10 Like you want it to be almost opinionated because that way a business knows it's like, oh, I can look at this, know what to do. That's exactly right. I think the bad projects are the ones that feel like, oh,
Starting point is 00:25:18 now I have homework. That's right. Oh, here's some information you may have to, like, and it's actually the problem I have with a lot of, like, analytics startups. It's like, all you do is like just tell me that I don't know anything.
Starting point is 00:25:27 Yeah. But now I still don't know what to do. So I've paid to be told to figure stuff out or that feels dumb. Exactly. And so the good version of this project, which a fellow did and is one of my favorites, is, I'm a homeowner. Should I buy solar or not?
Starting point is 00:25:41 Will solar be profitable on my roof? Okay, that's a hard problem. What's the weather like? What's the, I mean, a ton of different factors. Press, there's some predictive aspects of all that. All this, the fellow took all this data, synthesized it, built a predictive model. I come in, I type in my address. It tells me whether I should buy solar.
Starting point is 00:26:00 Oh, they basically built a product. Yeah. Yeah. So all these projects are very product focused. They're so product focused that sometimes companies are like, why are you showing us products when like we just want data scientists? And the answer is because that demonstrates that people can think product-wise. And they end up loving it because they sort of abstractly don't understand why they're showing us products.
Starting point is 00:26:21 But people gravitate to real solutions. And then well, they hire the fellows. So this is related to something we talked about the other day, which is like in the future, are more data scientists going to become founders? Or is that like personality, that mentality? Like is that best suited within a big company? Yeah, the thing I... Absolutely, I think...
Starting point is 00:26:39 Oh, really? Yeah. So this is not going to be a case, like, designers who, like, for some reason, designers don't tend to become founders. You know, we'll see how it shapes up in terms of, like, is it going to be en masse data scientists? But certainly, I would say probably about a quarter of every fellows program I see, like, raise their hand when they say they want to start a company in the next five years.
Starting point is 00:27:02 Oh, shit. I mean to get my ass out there. Yeah. Yeah. And so, and so I think that's going to be a big thing. We've already seen some of our alumni start companies, although again, it's early. In the early days, we had very few fellows. But Diana Wu was in one of the early sessions, started Trace Genomics, a genomics company, which uses genomic data to tell farmers when to plant,
Starting point is 00:27:23 when not to plant. Super interesting. Not an alumni, but like an early mentor, Ben Kamens, used to be the kind of founding engineer at Khan Academy. And he hired a, one of our fellows, Lauren, who is a physicist, she went there, helped them sort of, you know, help impact a bunch of, hopefully impact a bunch of kids' lives by, like, helping them learn faster because they really have millions of data points of data on how people learn. And she was there for a few years with Ben helping with education. And now Ben went off. I mentioned Ben because he's very much kind of a data scientist at heart.
Starting point is 00:28:01 And although his founder, you know, his title is officially CTO, he went off and founded, it's Spring Discovery now. They're doing sort of helping aging-related diseases using machine learning to do that. Lauren went over there with him, sort of part of the founding team. And so, you know, again, TBD in terms of like what the stats are going to be in terms of founders, but that founder spirit is there. And the skill set is so useful. I mean, that's the thing.
Starting point is 00:28:27 Like regardless, having an understanding of product is like the pickle. Absolutely. That's what you can go for. Absolutely. Because whether you're a founder or employee 10 or 100 or frankly, a thousand, you better, you better know what. And so do you teach that as well? Oh yeah, it's one of the biggest things. I mean, how do you teach it? Uh, you know, I found the only way to teach is by doing. Yeah. So you say, like, build a product and, uh, then they don't, they give you a graph that shows you interesting
Starting point is 00:28:52 things and you say, no, no, like, no. And you iterate, you just iterate. I mean, that's the learning experience. You do it wrong and then you iterate and you fix it and get better. And so the model at Insight is really just continual feedback. So, uh, if at the end of the program, I tell you, like, no, that's wrong, then that's a bad learning experience. But at Insight, you'll be told, like, half a day in that that's, like, not the way to go. And by the, like, next half day, you'll be closer there. And by the first week, you'll hopefully be on a good path to building a cool product.
Starting point is 00:29:23 So that's that fast iteration of you go. Cool. I think one of the things that ends up being a problem for a lot of startups or for even people getting into the data science field is, like, they're encountering very dirty data. And so now a lot of time actually is like, this is not like, oh, I'm solving cool problems and making products. It's like, oh, I'm just sitting here cleaning up this data just so I can get to this point. And so I'm trying to figure out is like, is this something that data scientists need to be aware of that you're just going to walk into this? Or something like startups need to start thinking about and like what can they do to like prevent that? Both.
Starting point is 00:29:57 But I think you can never avoid it. So it really is the data scientist's job to be prepared for that to be to do well at that. And that's, what is that? What is the ratio of the job of like cleaning versus, like, there's this joke that like 90% of the job is data cleaning. I don't know if it's 90, but it's a lot. And it's not just data cleaning, because data cleaning sounds kind of lame like you're just kind of cleaning things up.
Starting point is 00:30:16 Yeah. It's, I think more interesting than that. It's like literally like what data even makes sense to get here. It's not obvious. In advance, you think it's obvious. You're like, oh, just throw some data in. What data? Of what? And how can you combine that data?
Starting point is 00:30:28 And what does it mean to have clean, relevant data? And do you have an example? That's a skill set. Well, you know, I'll have an example around the founder side, right? So I think founders often make the sort of assumption that they're tracking all the right things. And then we've had many experiences where, you know, we'll talk to a founder. If I was going to work with like a founder and they'll say, yeah, we got all the data. We got everything.
Starting point is 00:30:55 Big data, big data. Yeah. It's all, it's always big data. It's the best data. And then, you know, and then you open it up. And it's like, oh, shit, they didn't track user logins, like which user was logging in. They're tracking all the movements on the site, but not which movement, which user was doing that and at what time stamp.
Starting point is 00:31:16 And again, it's like, oh, my God. Like all this data is borderline unusable because we can't kind of peg it to specific behavior and model that behavior. And, you know, when you're looking at it from the data perspective, it sounds like hilarious. Like, why don't you track users? But you know what? I'm a founder. I know when you're a founder, you're thinking about a million different things.
Starting point is 00:31:35 You have a million different tradeoffs. And honestly, like, yeah, the logging's turned on. Like, let's go, right? Let's build. And then a year later, you're regretting that. So again, I think a lesson learned for sure. That's why it's like, hey, have a coffee with the data scientist. Like, maybe all you'll get out of it is like log your user logins.
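[Editor's sketch] The tracking gap described above (site activity recorded without the user or the timestamp) can be illustrated with a minimal event record. This is only a sketch under stated assumptions: `make_event`, `log_event`, the field names, and the sample events are hypothetical and do not come from any tool mentioned in the episode.

```python
import json
import time
from typing import Optional


def make_event(user_id: str, action: str, timestamp: Optional[float] = None) -> dict:
    """Build one event record tied to a specific user and a specific time."""
    if not user_id:
        # An event with no user cannot be pegged to anyone's behavior later.
        raise ValueError("user_id is required")
    return {
        "user_id": user_id,
        "action": action,
        "timestamp": time.time() if timestamp is None else timestamp,
    }


def log_event(event: dict, sink: list) -> None:
    """Append the event as one JSON line; `sink` stands in for a log file."""
    sink.append(json.dumps(event, sort_keys=True))


# Hypothetical usage: a login followed by a page interaction by the same user.
log_lines: list = []
log_event(make_event("user-42", "login", timestamp=1549411200.0), log_lines)
log_event(make_event("user-42", "click:pricing", timestamp=1549411260.0), log_lines)
```

Refusing events without a `user_id` is one possible design choice; the point is only that each row can later be joined back to a user and ordered in time.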
Starting point is 00:31:53 But that might be enough. And then a year later, you can get started with a data scientist. What are the best tools that people should use for tracking data or like, is there a product that startups should use just right to get that, you know, that if they do this, we're just going to start on a good... So, you know, honestly, I saw some of the questions on Twitter and, you know, folks always ask about tools. So I was actually asking around some of my team, like, hey, like what's the latest on this? And, uh, there are great tools, I think, for just sort of like basic analytics tracking of like websites. But if you're really building products, like, it's still to this day, we see the teams roll their own, um, because
Starting point is 00:32:29 there's so much, um, there's so much... Such a disappointing answer. It is a disappointing answer. And I think, you know, listen, there are companies working on it, some YC companies, and they're slowly progressing up to more sophisticated, sort of data products. But at the end of the day, if your lifeblood is a very specific product that does something very specific, like there's like nothing beats just having somebody very thoughtfully say, what do we actually care about tracking here? Okay. So the, so stepping back then, yeah, like assuming there, there is no easy answer then, you're a founder.
Starting point is 00:33:00 You just started your thing. Can you give me like five or ten things that I should be tracking? Well, I mean, it really depends on the company, right? Sure. Okay, fine. So I think the number one thing you have to think about as a founder is actually not even what you're tracking. Because honestly, if you think about this first thing, right, I think that'll become more obvious. Yeah.
Starting point is 00:33:20 The first thing you got to think about and think about it right is what are you actually trying to optimize? What's the one or two metrics you actually care about? What if you're thinking about machine learning and building predictive models like say you had a magic machine learning model that like did whatever you want, but you only had one or two. Which problem in your company would you would you apply it to? Because I think what what I see folks do is, oh, I know my business in and out. And so I know my metric is this, this, this, this, this, this, this and this. And then, oh, machine learning will build this, this, this, this, this. And you know what?
Starting point is 00:33:53 You might at some point down the road. but initially you're going to have to focus. And if you don't have that focus, that's where you get into this habit of, I'll just track everything or nothing. Whereas if you know what you're trying to optimize is, let's say I'm Netflix. Yeah.
Starting point is 00:34:09 What am I going to start tracking? I mean, you're, you obviously want to see how long people are watching the video, how far they get in that video. One of the teams there, less obvious is people are using different devices on different bandwidth. So they track, I mean, they test this stuff and track it on all sorts of different, different machines. So again, like if in a generic tool, would you have a situation where you're testing like a stream on a hundred different devices and a hundred?
Starting point is 00:34:40 No, you wouldn't because like if that's not the core of your business, why would you ever do that? Right. But if you're Netflix, you better be doing that. And because you know that user experience is the key, right? Khan Academy is something different, right? For Khan Academy, it's like, you know, maybe it's the amount of time kids are spending on a question and that's telling you something about whether they're learning, where on another site it's like you don't really care about the timing, you just care about the flow.
Starting point is 00:35:03 Can I just simplify that? So it's, so for us, like, for any startup in most companies, it's like always like my goal is growth. Yeah. And, and for us at YC, we've actually pretty much simplified it where it's just like, look, for the most part, the KPI that your company is actually interested in driving, it's either going to be revenue, right? And that's like 99% of the companies. Yeah. And for some, like, consumer, very difficult play, it's like I'm going after engagement. Like daily active users is the ideal. Sometimes it's weekly active users. That's just the nature of the product.
Starting point is 00:35:33 And so to me it's just like, okay, what drives those two things? I really just like, only two numbers. It's like conversion and then like churn. And so I imagine like most questions fall into those two categories. Like what increases conversion for revenue and what reduces churn for revenue and the same thing for like engagement? So maybe I'll speak directly to those because now you're kind of zeroing in on certain types of companies. And so for churn, we often have fellows build churn prediction models for startups.
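[Editor's sketch] To make "churn prediction model" concrete, here is a minimal logistic regression fitted by plain gradient descent in standard-library Python. It is an illustration, not the approach any Insight fellow actually used: the two features (scaled logins and support tickets), the toy data, and the function names are all made up for this example.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_churn_model(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this customer drives the gradient.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def churn_probability(x, weights, bias):
    """Estimated probability that a customer with features x churns."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Made-up toy data: [logins in the last 30 days / 10, support tickets]; 1 = churned.
rows = [[0.1, 2], [0.2, 3], [0.0, 1], [2.0, 0], [1.5, 1], [2.5, 0]]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_churn_model(rows, labels)
at_risk = churn_probability([0.1, 2], weights, bias)   # rarely logs in
engaged = churn_probability([2.2, 0], weights, bias)   # logs in often
```

In practice a team would likely reach for an open source library and features specific to its own product; the sketch only shows the shape of the problem: past behavior in, a churn probability out, which can then trigger an intervention.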
Starting point is 00:36:01 So again, they're customized because there's, I mean, churn, churn for what, what's happening? Yep. But when we're talking about churn, it's a customer deciding to stop using the product. And if we can predict that ahead of time, then they're able to intervene, maybe offer a discount, maybe engage that user, get feedback. So those are top of the list. And for conversion, experimentation is the key. It's like these experimentation frameworks.
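[Editor's sketch] One simple form an experimentation framework can take is an epsilon-greedy multi-armed bandit, which keeps showing the variant that converts best so far while occasionally exploring the others. The class below is a sketch under stated assumptions: the variant names, the simulated conversion rates, and the class itself are hypothetical and not taken from the episode.

```python
import random


class EpsilonGreedyBandit:
    """Pick among variants, mostly exploiting the best observed conversion rate."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {arm: 0 for arm in self.arms}
        self.conversions = {arm: 0 for arm in self.arms}

    def conversion_rate(self, arm):
        # Arms that were never shown are treated as 0.0 until data arrives.
        return self.conversions[arm] / self.pulls[arm] if self.pulls[arm] else 0.0

    def choose(self):
        # Explore a random variant with probability epsilon, else exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=self.conversion_rate)

    def record(self, arm, converted):
        self.pulls[arm] += 1
        self.conversions[arm] += int(converted)


# Simulated "true" conversion rates, made up purely for illustration.
true_rates = {"signup_page_a": 0.05, "signup_page_b": 0.15}
bandit = EpsilonGreedyBandit(true_rates, epsilon=0.1, seed=42)
visitors = random.Random(7)
for _ in range(5000):
    arm = bandit.choose()
    bandit.record(arm, visitors.random() < true_rates[arm])
```

Compared with a fixed 50/50 split, this trades some statistical cleanliness for sending less traffic to the weaker variant while the experiment is still running.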
Starting point is 00:36:23 I always feel like a lot of times startups, especially early ones, they neglect that whole churn question. Because I always tell them as like, look, you're obsessed about conversion. Right. Because you're in sales mode and trying to bring them in. But I always feel like it's very expensive. And I feel like improving churn, like improving churn by the same percentage, that's exactly the same thing for conversion.
Starting point is 00:36:44 But it's way easier, cheaper, et cetera. And so is that usually what the first projects that startups and companies should be looking at if they haven't at all? Absolutely agree. And you know, you know, one thing I'll add about churn is it's often more reflective of what is actually working or not working, right? It's like make something people want. It's like if you improve churn, that means you're truly understanding what the user wants. Maybe you can get them to sign up or convert just by sort of, you know, having a flashy sales pitch.
Starting point is 00:37:11 But churn really you understand it. And then that's where the exploratory data analysis comes in. Do you really understand what your user is doing? That's where the A/B testing and often what's called like multi-armed bandit testing, where you're trying various different experiments at once. That's where you're predicting churn and then trying to intervene to help the customer. But you see what I'm saying? It's like it's like a number of different things, all of which are grounded in, do I understand what my user wants and am I building to what they really care about? I think the other big trend that you're having people
Starting point is 00:37:40 sort of obsessed with metrics wise is like cohorts and like retention curves over time. And so what do you do like the best things people should do? Like yes, just understanding and knowing it. Like that's sometimes really difficult. But in terms of improving that, like where does data science usually help? Right. I mean, I think it's coming back to churn, right? Because if you're, if you're seeing folks drop off at month three in like your early cohorts, I mean, I mean, that's a churn problem right there. So yeah, I think it goes back to churn. A lot of those sort of dashboards are, you know, are, you know, there are great tools for those. So certainly like when I started, people would like hand code like cohort analyses, now there's a bunch of tools for that. So I'm not saying, certainly I think
Starting point is 00:38:19 in the metrics sort of dashboard domain, there's a lot of solutions. When I was saying that there isn't really a ready-made solution, it's more, it's more that stuff that's, that's kind of, where it's, you're actually building models to improve the product in a very sort of deep way. Do you guys have a favorite for like, because you said like good startups have good problems. Are you waiting for a sponsorship? I'm trying to understand. What do you mean? For like some tool to pay you money to say what's great? Honestly, at Insight, almost everybody just uses open source, right? So everybody's building in Python, you know, yeah, it's all the open source.
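[Editor's sketch] The hand-coded cohort analysis mentioned above can be sketched in a few lines: group users by signup month, then compute what fraction of each cohort is still active N months later. The function name, the integer month encoding, and the sample users are assumptions made for this illustration only.

```python
from collections import defaultdict


def retention_by_cohort(signup_month, activity):
    """Return {cohort: {months_since_signup: fraction_of_cohort_active}}."""
    cohort_users = defaultdict(set)
    for user, month in signup_month.items():
        cohort_users[month].add(user)

    # active[cohort][offset] = set of users seen `offset` months after signup
    active = defaultdict(lambda: defaultdict(set))
    for user, month in activity:
        cohort = signup_month[user]
        offset = month - cohort
        if offset >= 0:
            active[cohort][offset].add(user)

    return {
        cohort: {
            offset: len(users) / len(cohort_users[cohort])
            for offset, users in sorted(active[cohort].items())
        }
        for cohort in sorted(cohort_users)
    }


# Months are plain integers (0 = first month of data); all data is made up.
signup_month = {"ann": 0, "bob": 0, "cy": 0, "dee": 1}
activity = [("ann", 0), ("bob", 0), ("cy", 0), ("ann", 1), ("bob", 1),
            ("ann", 3), ("dee", 1), ("dee", 2)]
curves = retention_by_cohort(signup_month, activity)
```

Reading a cohort's row left to right gives its retention curve; a sharp drop at a particular offset (month three, in the example from the conversation) is the kind of signal that points back to a churn problem.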
Starting point is 00:38:50 And because that's actually what we're seeing reflected in the industry. So if you go to a top data science team, by far and away, the vast majority of what they're using and building on is open source. What are those projects? Like, I think Python has definitely, it used to be like Python and R. They're still building it themselves. Yeah. Yeah, absolutely.
Starting point is 00:39:11 Yeah. Right. And then they just use like what? Jupyter notebooks, stuff like that. For prototyping. And then you got to then start building. And then you roll your own. And frankly, at that point, as soon as you get away, as soon as you get past the prototyping
Starting point is 00:39:23 stage, you're really just building product, right? It's the same thing an engineering team does at a startup, right? It's like, what tools are they using to build the fundamental product? And that's where you're living. Those data scientists are often embedded with the team building directly. Who makes the best data scientist? Like, from what field have you noticed? Where you're like, oh, it's much better that they come from this field.
Starting point is 00:39:44 What's kind of been shocking and amazing. I want to know who your favorite children are. What's been, I was, early days I was accused of, you know, I'm from physics. So, but now, you know, there's, we, we have fellows from all different backgrounds. So, so they all succeed. No, I mean, I think that's been the, uh, the shocking thing is like how different the backgrounds are. We have a fellow in this, uh, session who's an archaeology PhD. Uh, we had a fellow, a session ago,
Starting point is 00:40:10 who was like an engineer at like, uh, SpaceX, right? Like, we had a, imagine each of them have, so we have certain kinds of problems. Like, you have, like, a mathematician. That's exactly right. Going to get the math, understand the product and so selling themselves and understanding problems probably going to be a challenge. Exactly. So often you'll find like a mathematician is great, for instance, they've made great data engineers
Starting point is 00:40:28 because they think about large-scale systems and how can they fail. I mean, in math is logic systems, but then they kind of transfer that sort of mode of thinking to data infrastructure. But someone like, for instance, psychology was one that like in the early days, I didn't really have a network in kind of psychology or neuroscience. So we did a lot of work to try to kind of put the word out there. We found social scientists are incredible data scientists quite often because they know how to ask the right question.
Starting point is 00:40:56 Yeah, and they know how to think about people. And ultimately, you know, obviously data is branching out. But most of the time, when you're talking about users, you're talking about customers, it's people, right? And so fantastic data scientists from those fields. But it's just one of my favorite parts of my job, actually, is the fact that I'll sit at lunch or a happy hour or just hang out at the office. And it's like an astrophysicist with a psychologist with a software engineer with a, you know, electrical engineer. And they're all kind of working and collaborating.
Starting point is 00:41:26 And it's just incredible, incredible kind of environment to be around all these different people. You have all these companies coming in talking with all your students during the program. And they're usually coming like with a problem? Or are they just talking about here's the kind of problems we work on and solve? Because they're kind of doing a little bit of recruiting in addition to giving an understanding. Oh, absolutely. Yeah. Yeah.
Starting point is 00:41:48 And who's been really great at that? We have, I mean. Or what do they do? That's really great. There's a bunch of teams. You know, listen, I think, so the way the program works is fellows will often work with a startup company on a project. But most of the, most of the interactions of the fellows have with companies is actually
Starting point is 00:42:04 companies coming in who try to hire them, right? And when I say companies, I mean like the actual technical data team coming in talking about what they work on. Yeah. And, but, you know, and trying to hire them. And so the teams that do really well, listen, obviously the ones with great brands,
Starting point is 00:42:18 the Airbnbs, the Lyfts, the Ubers, the Facebooks, I want to know what the little guys have to do to compete with them. But this is what I found
Starting point is 00:42:23 is when startups come in, what often happens is, fellows come in, what's this startup? I haven't heard about it. Why do I have to go to this? And they come out, they're like,
Starting point is 00:42:31 this is my dream job. I want to work at this company. And I started trying to figure out what certain startups did to do that. And what it really boils down to is impact. The startups that do well recruiting data scientists,
Starting point is 00:42:46 make the pitch, you are critical to our success. If we, if, if, oh, they made it. It looks like they're going to be all stars. And, and, and they're telling the truth because there, a lot of companies these days, frankly, if the machine learning or if the analytics doesn't work, like the company will fail. Like, that's what they're pitching to. Well, also when there's one of you versus 300 of you.
Starting point is 00:43:06 Right. Well, that's a personality thing, right? Some people are excited about, I'm going to be the first data scientist. And some people are like, I want, I want, I want some mentorship. I need a little motor on me. Yeah. Yeah. But when it comes to, like, I've never heard of this company before.
Starting point is 00:43:18 And then an hour later, like, oh, my God, I want to work for them. It's always the impact piece. It's always the like, if you come here, what you do will matter in a big way. And obviously there's the technical piece that you're going to work on cool stuff. Yeah. But I thought the technical piece would be the biggest one. But the biggest one actually is the impact for sure. So one thing we haven't talked about, and I actually don't know if you have an opinion on this, is contracting.
Starting point is 00:43:42 So for an average startup, say they're like a couple of, years and they're like, I don't know if we really have a need for this, but we have all this data. Maybe we could put it to use. Do you see people like doing two month contracts and getting a system up and then just letting it go? What happens? Yeah, I think contracting is good for prototyping. So we see a lot of like when I'm saying YC startups work with our fellows.
Starting point is 00:44:04 That's essentially, it's a pro bono consulting, but they're working with them for, you know, the program and helping to deliver some results. And where that works really well is, you know, often it is integrated, but it's at this sort of prototyping stage. Will this even work? Or I've got a model. Will this one work better if we try this? So let me give you an example of one I really like recently,
Starting point is 00:44:24 a fellow worked with iSono Health, a YC startup. Really amazing product, does like sort of in-home breast cancer screening. So it's a device, instead of going once a year to get screened for breast cancer. If you're in high risk, you can do it at home. Second leading cause of cancer death in women. So huge impact potentially, life-saving technology. And obviously a big part of that is, can we, do we have the right algorithms to detect and notify a user that, hey, you need to go speak to your doctor or notify a doctor? Obviously, a doctor does the final thing, but is there something abnormal here that we need to be taking a closer look at?
Starting point is 00:44:59 They had algorithms that were working great and doing well for them, especially at that stage of, hey, let's just bring it to a doctor to be safe. But they were curious about, hey, are some of these more like these newest sort of deep learning algorithms that are just coming out in the papers? Are they going to do better for us? So a fellow did that. They took the data and essentially used some brand new sort of convolutional neural network techniques that had just kind of been published and got better results for them that were almost on par with sort of expert radiologists. And so I mean, that's that's awesome, right? And so, of course, that team then has to do some more work to implement it. But that's an example where I think consulting works is like, is this going to work?
Starting point is 00:45:40 Is this feasible? Is it a prototype? Anytime you actually, I think then kind of, anytime that becomes a part of your product, you need a team. Right. Because it's never static. Something's going to evolve and change and you need to be able to evolve it. It's just like asking, like, can a startup just have like contract software engineers overseas? It's like, well, maybe to prototype something, but in general, probably the answer is no, because that product's going to keep evolving every month, every year.
Starting point is 00:46:10 And you need folks on the staff to do it. Makes sense to me. Great. Cool. So I think one last thing I wanted to talk about was just areas you're excited about in particular. Yeah. We mentioned health the other day. Yeah.
Starting point is 00:46:22 But yeah, what's exciting to you right now in the field? No, I mean, there's a bunch of stuff that excites me, but health is the top one that I'm pumped about because it, I mean, the impact's there, right? Like the example I just shared with you, I mean, early detection, disease monitoring. I mean, you're literally saving people's lives as this stuff works. And what's interesting is people have been talking about the impact in data science, machine learning, and health for years because, you know, you start thinking about this stuff and pretty quickly you're like, oh, this can make an impact.
Starting point is 00:46:52 But, you know, actually getting it to work is tough. And only I think in the last few years, we've been seeing a lot of teams actually making really amazing progress there. I'll give you an example I love of like the impact here. Memorial Sloan Kettering Cancer Hospital in New York has hired a team of data scientists and data engineers from us over the last few years. And what they do is they build essentially data products but that are used internally by their doctors. So these are cancer doctors. It's really tough situations and they're faced with a situation of what clinical trial do I recommend to my patient.
Starting point is 00:47:29 And there's thousands of clinical trials. And there's new ones coming online every day. Which one do you suggest? And so they're building these kind of data products where the doctor gets based on the specific personalized, you know, whether it's genomic or clinical factors. Hey, you should at least think about these new clinical trials that are coming online. And again, the doctor makes a final decision. But it's, hey, maybe one of those trials they hadn't heard about now to save that patient's life.
Starting point is 00:47:56 Right. And it's, it's, it's a hospital that's a hospital. Right. And then soon thereafter, New York Presbyterian hired a fellow. And then Mount Sinai hired a fellow. And now pharma companies are hiring fellows. And like, it's really fascinating to see data broadened out when companies realize that they can just take it beyond. Oh, I want to optimize this like business and efficiency and really think what can I create that's going to add incredible value.
Starting point is 00:48:24 And so health is one I'm excited about, but there's a ton more out there. That's super cool. All right. Well, thanks for coming in. Thanks so much. Yeah, thanks, Craig. All right.
Starting point is 00:48:32 Thanks for listening. So as always, you can find the transcript and the video at blog.ycombinator.com. And if you have a second, it would be awesome to give us a rating and review wherever you find your podcasts. See you next time.
