Y Combinator Startup Podcast - #128 - Michael Babineau and Kevin Hale

Episode Date: May 29, 2019

Michael Babineau is cofounder and CEO of Second Measure. Second Measure analyzes billions of credit card transactions to answer real-time questions on consumer behavior. They were in the Summer 2015 batch of YC and you can check them out at SecondMeasure.com.

Kevin Hale is a Partner at YC. Before working at YC he cofounded Wufoo.

You can find Michael on Twitter @mikebabineau and Kevin is @ilikevests.

The YC Podcast is hosted by Craig Cannon.

Y Combinator invests a small amount of money ($150k) in a large number of startups (recently 200), twice a year.

Learn more about YC and apply for funding here: https://www.ycombinator.com/apply/

***

Topics

00:00 - Intro
00:35 - What idea did Mike apply to YC with?
01:20 - Where did the idea come from?
04:35 - From project to company
10:20 - What info did investors want to know that Second Measure could provide?
12:05 - Their first customers
14:35 - The primary use case of Second Measure for VCs
15:20 - What questions are they trying to answer?
19:35 - Data examples from their blog
21:05 - Post: Fashion retailers have nothing to fear (yet) from the rise of Stitch Fix
23:35 - Post: Holiday sales rocket Peloton memberships ahead of SoulCycle active riders
25:05 - Post: Prime members deliver for Amazon every day
27:35 - Second Measure's product development process
29:35 - Finding good data scientists who work from first principles
37:05 - Why is credit card data so messy?
42:05 - Cleaning data
44:20 - Using their product for competitive analysis
47:35 - Their sales process
49:05 - Raising money from Goldman Sachs and Citi
52:05 - Focusing on a specific problem
54:05 - Keeping the product compelling when it's table stakes

Transcript
Starting point is 00:00:00 Hey, how's it going? This is Craig Cannon, and you're listening to Y Combinator's podcast. Today's episode is with Michael Babineau and Kevin Hale. Michael is co-founder and CEO of Second Measure. Second Measure analyzes billions of credit card transactions to answer real-time questions on consumer behavior. They were in the summer 2015 batch of YC, and you can check them out at SecondMeasure.com. Kevin is a partner at YC. Before working at YC, he co-founded Wufoo. You can find Michael on Twitter at @mikebabineau, and Kevin is @ilikevests. All right, here we go.
Starting point is 00:00:35 Mike, Kevin was your group partner when you did YC in the summer 2015 batch. What idea did you apply with? So our basic idea at the time was really to use credit card data to help investors make better investment decisions. And that is actually not really far from what we do today. The main evolution is that now we work with companies as well, not just investors. But I think a big part of the idea, though, is not just to look at credit card data and try to find interesting things and then tell investors about it, but instead to build an analytics platform, throw that in front of investors, and then let them answer their
Starting point is 00:01:14 own questions. And what led you to coming up with that idea? Oh, that is a good question. So I did not, like, I don't come from an investing background or I don't come from finance at all. I actually worked in video games. And the same is true of my co-founder, Lillian. So she and I met at Electronic Arts. We worked together there and then at another gaming startup. Before that, I was in ad tech. And like, I've always been in, you know, we're both software engineers. Like, we've always been in the tech world. But we've got plenty of friends in finance. And one of those friends just out of the blue called me one day and was like, Mike, I need your help. I've got two terabytes of data on a hard drive. How do I load this into Excel? And that, that was like,
Starting point is 00:01:57 It was one of those moments where, again, as a software engineer, right? Yeah. You know, I get this question. I'm like, oh, God. You know, like, why? Like, why are you asking me this, right? So he's in New York. Like, I'm in the Bay Area.
Starting point is 00:02:11 It's the middle of the afternoon. Why, like, why am I fielding this? And I, I, like, wasn't feeling particularly helpful. I was like, like, what did, what did your engineer, like, what, you know, did you ask your dev team? Yeah. engineering team. And I just hear silence.
Starting point is 00:02:31 And then: Mike, what are you talking about? We've got an IT guy. And that's it. And that blew my mind because he was at a $30 billion hedge fund. And like I just assumed that all hedge funds, you know, look like Two Sigma or RenTech or, you know, these like just these places that have hundreds of quants and hundreds of engineers. But in reality, most hedge funds have a handful of analysts.
Starting point is 00:02:56 And just some back office support, right? They don't have any coders in house. I think that's when we realize there's this huge opportunity because investors are like, you know, investors, they make money off of, off of having an information edge, right? Off of knowing things that other people don't. And they're like, and a lot of people who work at hedge funds are very, very clever, right? They're looking for this edge like wherever they can find it. And over the like over recent years, increasingly they've been looking at.
Starting point is 00:03:26 at things like Google Trends, right, to see like, oh, is there some leading indicator in search terms that would indicate some, you know, some like bigger shift in consumer sentiment about, I don't know, some company. Very unsophisticated sort of analysis. Yeah. Yeah. But at the same time, like a clever idea and it, you know, oftentimes works, right? You've got investors like subscribing to things like ComScore, looking at how many, how many visits to a website are happening. And because that roughly is like roughly correlated with with actual sales. And it's also like this nice leading indicator in the sense that public companies only come out with, they only report metrics once a quarter. And it's like not right at the end of the quarter. It's actually sometime
Starting point is 00:04:14 afterwards. So you can actually look at how many people visit. If you can see how many people visited Amazon.com over the past quarter, then like you can look at the full quarter of information and then you can see how well that correlates with, you know, the resulting reported performance. So how'd you go from, like, helping someone with like a two-terabyte Excel problem and working on video games to being like, okay, this is now time for us to, like, quit our jobs and then solve this problem? Because, like, what were you doing there at EA? Like, what was your role? Yeah, so, um, so we are not video game programmers, right? We were working at a video game company, but my specialty was building high-scale infrastructure.
Starting point is 00:04:58 And Lillian's specialty is building data pipelines and analytics teams. And when you look at the video game space, you know, like what, like how does, how does like a company like Zynga, I think Zynga epitomizes this, right? They're like very metrics driven, very data driven, right? They were, one of the things they did very well is like optimize. They optimized the hell out of their games. So when you think about an online game and you think about what you want to optimize for, right? You want to actually, let's just let's talk about this in terms of fun.
Starting point is 00:05:34 If your game is too hard, no one's going to play it. And if the game is too easy, no one's going to play it. So you have to find like this balance where the game is not too hard and not too easy. And if you have an online game, then you have this like amazing leg up over, over games that are, you know, like pressed to disc and then shipped out, because you can update them. And like the best way to tell if your game is too hard or too easy is to simply look at, you know, how far people make it into the game. And so, um, for instance, you could look at how many players make it, you know, from level one to level two, um, and if there's like a severe drop
Starting point is 00:06:08 off, right, if like not enough people are doing it, then it's a signal that, hey, you know, maybe we need to tweak this. Of course, the person who needs to answer that question, um, is a game designer. And usually game designers aren't writing SQL. And so, you know, you've got all these metrics, like, you're tracking all these, all these events, right, of like, oh, a player, you know, they passed level one or like player died or whatever. All of these events are being tracked and you, you know, you're like, this is, this is like a standard sort of analytics pipeline, right? You, you, you, you instrument your application. You have all these events streaming out. You store them somewhere, you do some sort of processing on them, and then you dump them into some place
Starting point is 00:06:54 where you can, at some place that you can query. But then you've got these people who typically aren't coders, like a game designer or a product manager, they want to answer questions about how people are behaving in the game. And, you know, you basically have two paths at this point, right? If a game designer says, like, how many people have made it to level two, then as, like, as, you know, somebody on the data team, you can say, okay, you know, let me run that report for you. And then you go and you like query it and you put together the results and you send it back. And then, you know, they look and they say, oh, this is great.
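[Editor's sketch, not part of the conversation: the "how many people made it to level two" report described here can be illustrated with a few lines of Python. The event tuples, the `level_passed` event name, and both function names are hypothetical; nothing here reflects EA's or Second Measure's actual pipeline. It only shows the general idea of counting distinct players per level and the drop-off between consecutive levels.]

```python
from collections import defaultdict

# Hypothetical event log: (player_id, event_name, level).
events = [
    ("p1", "level_passed", 1),
    ("p1", "level_passed", 2),
    ("p2", "level_passed", 1),
    ("p3", "player_died", 1),
    ("p4", "level_passed", 1),
    ("p4", "level_passed", 2),
    ("p4", "level_passed", 3),
]


def level_funnel(events):
    """Return {level: number of distinct players who passed that level}."""
    players_by_level = defaultdict(set)
    for player_id, event_name, level in events:
        if event_name == "level_passed":
            players_by_level[level].add(player_id)
    return {level: len(players) for level, players in sorted(players_by_level.items())}


def drop_off(funnel):
    """Return {level: share of the previous level's players who did not pass this level}."""
    levels = sorted(funnel)
    return {cur: 1 - funnel[cur] / funnel[prev] for prev, cur in zip(levels, levels[1:])}


funnel = level_funnel(events)
print(funnel)
print(drop_off(funnel))
```

[A self-serve tool of the kind discussed in this episode would wrap a query like this behind an interface, so the designer can change the level or the event without asking the data team.]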
Starting point is 00:07:27 How many people made it to level three? And you're like, oh, you roll your eyes and you're like, I see where this is going. And at that point, you're like, okay, I have a choice. I can either play this like go between, right? And like fetching data over and over again. Or I can build tools, right? And if I build a tool and hand that tool to the designer and say, you know, here, like answer this yourself, then, you know, I can focus on.
Starting point is 00:07:50 doing like much cooler things and much more interesting things. Also I'm out of the I'm like I'm out of the way now like I'm no longer in the way of This this this person answering their own questions This is this is exactly what we did in the in you know in the video game space and This is a pattern that we recognize could be really useful in the investment space right? You've got all these all these Investment analysts and they like they know so much about the companies that they are making investment decisions on and they know what questions they want to answer. And so you can either put yourself in a position where you're trying to guess at the questions and like writing, prewriting reports
Starting point is 00:08:29 and trying to sell them reports or you can just give them like give them some sort of tool that actually empowers them to answer like all the crazy questions that they thought they couldn't answer. So this is the thing that's like fascinating. You guys are building tools for understanding how to improve video games. How does that become all of a sudden the skill set needed to sell financial and analytic software and insights to people who run like hedge funds and investment firms or even to do corporate like competitive tracking. Because to me it's just like I imagine they're going to ask about like what's your background, how do I know?
Starting point is 00:09:07 Like how does that start? Like what made you realize that like we could probably do this? Yeah. I think it comes down to like what is the fundamental problem being solved. And you know like the fundamental problem is that you've got somebody. who probably isn't a coder, and they want to answer a question of behavioral data. You're at video games,
Starting point is 00:09:28 and then how do you decide what the first product was going to be? Yeah, I mean, I think for us, it was really digging in further to understand what types of data investors were most interested in. And what we found is that transactional data, like specifically credit card transaction data, is one of the things that they were really excited about, but they were banging their heads against
Starting point is 00:09:50 it, right? Like, fundamentally, credit card transaction data is a messy data set with unstructured data problems baked into it. And the skill sets of investors, even the more technical ones,
Starting point is 00:10:08 those tend to lean more towards like time series analysis as opposed to dealing with large messy data sets. What kind of questions were the investors interested in? Like from that data set. Yeah. I mean, I think like one of the one of the main things is just like how is Chipotle doing, right?
Starting point is 00:10:26 Like are they, so like they famously had a food poisoning incident a couple of years ago. Actually, I think they had several. But, you know, investors, eager investors wanted to know, you know, what is the impact to their actual revenue. How come there was no way to answer this before you guys came onto the scene? Yeah. So this is one of the interesting things. So there actually was a way to answer it. It was just a terrible path. Right. The way to answer this before was with a survey. Right. So you go to some market research company and you say like, hey, there's this, you know, like Chipotle, the whole food poisoning thing. You know, can you, can you help me understand like how many people stopped going to Chipotle? And they just have to like try to find a bunch of people that match a demographic and then hope like these people answer. Are they going to represent, like... Exactly. It takes weeks or months, it costs tens or hundreds of thousands of dollars, you end up with this tiny sample of, you know, like, oh good, we got a hundred respondents, and they said that, you know, from this pool of a hundred, you know,
Starting point is 00:11:34 like 20 of them said that they have considered stopping their Chipotle, like, dining altogether. And then what do you guys do instead? So for us, you know, because we have direct observations of millions of U.S. consumers, like, we see all the purchases, right? We can just look it up right away. In fact, like, we don't even need to look it up. We can just give you a tool, and then you can find the answer yourself. And so you got your first customers during YC, correct? How did you go about even getting them? Yeah, that's a good question. I have to think back. So this is 2015. So our very first customer was a VC who also ended up investing in us. And I think one of the things that was interesting is that actually this is one of the
Starting point is 00:12:23 this is one of the things where we got to, I feel like we got to cheat a little bit, because we were in YC, like, VCs are always excited to talk to YC companies. And that's, I mean, they're trying to figure out who is in the batch and then try to invest before Demo Day. That's exactly right. And so we had a whole bunch of these, like, funny meetings where, you know, we're trying to get in front of them to, you know, pitch them on a product. And, you know, they're happy to take the meeting because they want to hear about what we do. And so it ends up being this like, dual purpose thing where they're like, okay, show me the product. Now tell
Starting point is 00:12:59 me your business model. And you're like, well, would you like to buy the product? And, you know, fortunately, like a lot of times the answer was, like, did end up being yes. And now most, most of the VCs here in the Bay Area, they are our customers, you know, but it was really interesting navigating those early conversations. What were they excited about? Because with credit card data, there's some things that it's really good at showing and identifying and some things that are not so good. So, for example, it tends to be great for predicting consumer trends.
Starting point is 00:13:28 Yeah, I mean, it's basically, I think you just have to keep in mind, like, what is it we're actually seeing? And what we're seeing is spending for a large proportion of U.S. consumers. And so, like, if it's, if you want to understand a company that doesn't, that doesn't target consumers, if it doesn't target specifically U.S. consumers, and more specifically, if it doesn't sell things directly to them, right?
Starting point is 00:13:53 If it, you know, if they, like, we're not going to see General Mills, right? That's all sold through grocery stores. But if it's something that you might see on your credit card statement, then like those are the things that we can help with. Like Uber, Lyft. Exactly.
Starting point is 00:14:07 All the meal, like, Gobble, etc. And then what you're not going to see is like B2B enterprise companies, et cetera. It tends to be like lots of people are interested in consumer stuff because they're like the fastest growing, most interesting segment. Exactly. There's like there's more, yeah, exactly. The market is more than big enough. And so are they using it as a market sizing tool? Not as because if you're investing in a seed stage company. Yeah. So probably the primary use case among, among VCs is actually diligence, right? And when you think about like, like put yourself in the shoes
Starting point is 00:14:36 of a venture capitalist. So, you know, some company walks in and they say, you know, they throw some numbers on, on some slides. They show it to you. And you're like, okay, great. Like, is this, I have like lots of follow-on questions. Do I, you know, do I try to get these, the numbers from you? And then additionally, there's a whole bunch of questions I have about about your market, which you may not even know the answers to. So a good example of this would be, if you are, like, so as a VC, somebody comes in and pitches you, they're in, actually, let's just talk about Bird and Lime, right? So imagine you're a VC. Bird comes in and pitches you and they show you this chart and it's like, it's the perfect hockey stick chart. You're
Starting point is 00:15:21 like, this is amazing, you know, like, I've never seen growth like this before. And at the same time, though, you've, you know, you've heard of other companies. You know, Lime is out there, you've heard of, like, Jump Bikes. You want to pick the best one. Exactly. Like, are you talking to number one? Are you talking to number two? You know, like what's, and also fundamentally are the like if, you know, if Bird is showing good unit economics, like is that, is that best in class, you know, or could it be even better? And this is an area, like this is one of the key areas where we help VCs is in giving them visibility, not just into the company they're talking to, but into into their competitors, right? Into each of those, like every company in that space in relation to one another. So we can say like, oh, yeah, Bird, Lime. Like, here's where Bird's winning. Here's where Lime's winning, right?
Starting point is 00:16:13 Here are where the differences are in how, like, how well those customers perform, like, how much they spend and so on and so forth. But when you say unit economics, how do you uncover that data? So we don't see unit economics. Sorry. So obviously, like, we don't see the cost side. We just see the spending side. Right. So you could say, you know, an average Bird customer spends $40 a week versus a Lime customer that might be 20. Exactly. And I think generally, like, again, if you're a VC, like you have your own ideas for how to estimate the cost side of the equation. Okay. Gotcha. What other metrics are you able to show? Like, I was always impressed when looking into the dashboard inside of Second Measure about like, wow, I can not just see like how much revenue is like being pulled.
Starting point is 00:17:03 But also things like cohorts, lifetime value, et cetera. And so like what metrics get investors like super excited? Yeah. I mean, let's, I guess taking a step back, let's think about like, what are the main problems that we're trying to solve? So one is generally the one is generally focused on company performance, right? And this includes things like competitive intelligence and benchmarking, right? Like show me, you know, what is? I don't know, like what what is the relative market share of the various meal kit players?
Starting point is 00:17:39 You know, how long do their customers stick around, right? How much do they spend over time, right? Like what are the lifetime sales after 12 months? And again, if we split those into different cohorts, you know, are newer cohorts performing better or worse than older cohorts? So there's all of these things in and around company performance,
Starting point is 00:18:14 Things intended to help you get a better picture of, you know, who your customers are and, like, really help you hone in on, like, who your best customer. And I'm saying you, but really, it could be you. It could be your competitor. It could be a company you're doing diligence on, you know, some target company. What are some good examples of that? because your blog is basically just this, right? It's like just insights.
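[Editor's sketch, not part of the conversation: the cohort questions raised above (lifetime sales after 12 months, newer versus older cohorts) can be illustrated roughly as follows. The transaction tuples, the month-granularity window, and the function names are assumptions made for illustration; this is not Second Measure's methodology.]

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount_usd).
transactions = [
    ("a", date(2018, 1, 5), 60.0),
    ("a", date(2018, 2, 5), 60.0),
    ("a", date(2019, 3, 1), 60.0),  # falls outside customer a's 12-month window
    ("b", date(2018, 1, 20), 40.0),
    ("c", date(2018, 7, 2), 80.0),
    ("c", date(2018, 8, 2), 80.0),
    ("c", date(2018, 9, 2), 80.0),
]


def months_between(start, end):
    """Whole calendar months from start to end (day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def lifetime_sales_by_cohort(transactions, horizon_months=12):
    """Average spend per customer within horizon_months of their first purchase,
    grouped by the (year, month) of that first purchase."""
    first_purchase = {}
    for customer, day, _ in sorted(transactions, key=lambda t: t[1]):
        first_purchase.setdefault(customer, day)

    totals = defaultdict(float)
    members = defaultdict(set)
    for customer, day, amount in transactions:
        start = first_purchase[customer]
        cohort = (start.year, start.month)
        members[cohort].add(customer)
        if months_between(start, day) < horizon_months:
            totals[cohort] += amount
    return {cohort: totals[cohort] / len(members[cohort]) for cohort in sorted(members)}


cohort_ltv = lifetime_sales_by_cohort(transactions)
print(cohort_ltv)
```

[Comparing the values across cohort keys is one simple way to ask whether newer cohorts are performing better or worse than older ones.]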
Starting point is 00:18:36 Yeah, yeah. It's interesting, right? Because our core product is really about, it's really about empowerment and saying, like, hey, you know, you as a user, you can answer whatever questions you want, like, within this space of U.S. consumer spending. But then, and we don't sell, we don't sell research. Oh, so you don't answer questions for people directly. So we'll do it on a case by, like a project by project basis. but we're not the ones coming up with the questions, right?
Starting point is 00:19:04 If somebody comes to us and says, like, I have the specific question, you know, I tried, I tried this in your application. Like, you know, I can't quite answer it yet. Like, I have this more specific question. Can, you know, could it be answered? Those are cases where we can, you know, we can do it like a one-off research project. But those are, and those are like paid projects, but we don't publish those. The thing we don't do is we don't proactively do research and go out and, like, you know,
Starting point is 00:19:30 call up 10 of our clients and try. to sell it to them. Gotcha. What's some stuff that you guys have put on the blog recently? That's your favorite. Yeah. So we've started some, one thing we've started doing is so, actually, if we talk about our blog, we also need to talk about like our press mentions.
Starting point is 00:19:47 So we actually work with the press a whole ton, right? And so we keep getting quoted in like Wall Street Journal, Financial Times, et cetera. And I mean, this has been great for us. It's great for the reporters too because, you know, they're trying to write about like the upcoming, like, potential Lyft IPO or, you know, whatever, and they want to support their reporting with more information, and we can help provide them with that information. We're happy to do so. The Uber/Lyft thing is like a recurring topic, and so in our blog we've decided, you know what, we're just going to keep periodically publishing updates on that. So when you
Starting point is 00:20:23 choose like a question you want to ask about, like the Uber versus Lyft, do you guys, like, come up with the initial questions, and now you listen to what the press are kind of asking you that they want to verify, or is it always you guys coming up with it? So I'd say it is us always coming up with it. We actually have a dedicated editorial team. Gotcha.
Starting point is 00:20:42 So we've got, you know, we literally have a team of data scientists and writers who just pay attention to what's going on in, like in the news, you know, what's going on, you know, with companies that like could potentially be interesting to others.
Starting point is 00:20:59 The person who runs it, like, you know, she has a journalistic background. I mean, this is their core focus, right: find interesting things to write about and then write about them. So let's talk about some examples. So before we started recording, one you mentioned was Stitch Fix and where the customers of Stitch Fix do and do not maybe spend. Yeah, so specifically we had, um, so this is a really interesting thing, right? Because part of understanding what people are asking is like just going out and talking to people. And one recurring question we heard about Stitch Fix was like is Stitch Fix cannibalizing like department store sales, right? Like are they, are they competitive with department stores? And so we decided to dig in. We had no idea what the
Starting point is 00:21:46 answer was. We decided to dig in and we attacked the problem by basically saying, okay, let's look at people's spending at department stores before and after they become a Stitch Fix customer. And what we found is that Stitch Fix had no impact on department store spend. People just started spending more on clothes, period. Right. And in fact, the people who are Stitch Fix's best customers actually spent even more on clothes after becoming a Stitch Fix customer than before. Oh, like Stitch Fix inspired them to go out and find more clothes or to buy more.
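[Editor's sketch, not part of the conversation: the before-and-after comparison described here can be illustrated with a naive version in Python. The sign-up dates, purchase tuples, and the 180-day window are invented for illustration; the actual analysis behind the blog post is not described in this episode and would likely need controls such as a comparison group and seasonality adjustments.]

```python
from datetime import date

# Hypothetical data: when each customer first subscribed to the new service.
signup_dates = {"u1": date(2018, 6, 1), "u2": date(2018, 6, 1)}

# Hypothetical department store purchases: (customer_id, purchase_date, amount_usd).
department_store_purchases = [
    ("u1", date(2018, 3, 10), 100.0),
    ("u1", date(2018, 8, 15), 120.0),
    ("u2", date(2018, 4, 1), 50.0),
    ("u2", date(2018, 7, 1), 50.0),
    ("u3", date(2018, 5, 1), 75.0),  # never subscribed, so ignored here
]


def average_spend_before_after(signup_dates, purchases, window_days=180):
    """Average per-subscriber spend in equal windows before and after sign-up."""
    before = {customer: 0.0 for customer in signup_dates}
    after = {customer: 0.0 for customer in signup_dates}
    for customer, day, amount in purchases:
        if customer not in signup_dates:
            continue
        delta = (day - signup_dates[customer]).days
        if -window_days <= delta < 0:
            before[customer] += amount
        elif 0 <= delta < window_days:
            after[customer] += amount
    n = len(signup_dates)
    return sum(before.values()) / n, sum(after.values()) / n


avg_before, avg_after = average_spend_before_after(signup_dates, department_store_purchases)
print(avg_before, avg_after)
```

[If the "after" average were clearly lower than the "before" average, that would hint at displaced spend; in the toy data it is not.]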
Starting point is 00:22:29 to characterize it is that is that it you know piques their interest in in fashion and so they they don't they don't spend any any less they just but part of it is like it probably jump starts like a variety they're like oh I'm introduced to a variety of stuff I never would have considered beforehand and now it's like oh now when I'm out there at in the real world looking at stuff I'm like oh I'm more I there's more things that might appeal to me because I've been exposed to them yeah and the key thing is that it's not displacing the spend right and that was I mean that was a real surprise. And also, like, it's also like a really important question to answer because if you're at a department store and you're trying to figure out, like, you know,
Starting point is 00:23:06 is this Stitch Fix, friend or foe, right? Like this, this really points more to friend. So do you actively track like the rise and fall of brands? Because I'm wondering, there must be instances of certain things being swapped out. A recent post was on Peloton memberships going up ahead of SoulCycle, right? So that's really interesting. Is that, is that, are there trades happening that you could follow? So sorry, when you say trades, do you mean people, so, you know, sign up for Peloton instead of SoulCycle? Yeah.
Starting point is 00:23:36 So, I mean, really, we, again, we, this is something we will attack from an editorial perspective. But again, it's, you know, like our core business is about putting a product in front of our clients through which they can answer their own questions. Now, on the blog side, yeah, I mean, the Peloton and SoulCycle story is super interesting. Like Peloton is a beast. And SoulCycle has some interesting, like, actually, so, so after we came out with this article, SoulCycle, basically they had a, uh, a nice, like, non-denial denial, um, where they basically said, uh, like, we don't know what they're
Starting point is 00:24:15 talking about. Their numbers are, like, our numbers are great, um, but didn't actually dispute the metrics. To give some context, what did your blog post say? And then what was it that, like, SoulCycle was nervous about? I mean, the short version is that Peloton has now surpassed SoulCycle in terms of like the number of active Peloton members, right? And this is based on spending behavior. Active Peloton members on a monthly basis have surpassed the number of SoulCycle, like, active riders on a monthly basis. Is there an overlap, like a Venn diagram of like people who used to be SoulCycle and they've switched to Peloton? There is, there's both like a current overlap and there's like a, you know, the Sankey diagram type thing of like,
Starting point is 00:24:59 you know, people who used to be one and now are another. Well, have you been following how AmazonBasics has developed their products? I am generally familiar with it. I'd say for us that is not something that we have a lot of visibility into because, at the end of the day, we just see an Amazon... General Amazon. Exactly. But you've done some research about Amazon Prime people. Yes. Yeah, we did. So this is a case where we did a much deeper dive,
Starting point is 00:25:32 and we actually gave, we gave several talks on this. So one thing, and this is, you know, this is spearheaded again by an editorial team. You know, one of our data scientists, Brandon. So he dug into Amazon's customer base. And specifically, you know, he wanted to understand really the differences in behavior between Amazon Prime members and non-Prime members and like how that's changed over time and really like how important Amazon Prime members are to Amazon. And one of the, I think one of the interesting takeaways is that increasingly Amazon is looking more and more like a subscription business. Like they're increasingly reliant on Amazon Prime customers for their revenue.
Starting point is 00:26:24 And then another interesting thing is that even people who, so people who became an Amazon Prime subscriber, even if they lapse, right? Even if they are no longer a subscriber, they're still spending more on Amazon after than they did before. How do you get to that conclusion that like, what was the evidence that showed that like, oh, Amazon is more focused on a subscriber? Like, how do you guys sort of like get to that conclusion? I would characterize that they're less, it's not that they're more focused on subscribers, but instead that an increasing proportion of their revenues derive from people who are Amazon subscribers. I got you. So it's one of these things where it's like, oh, it's turning out like Amazon's most valuable revenue stream comes from the Amazon Prime subscribers.
Starting point is 00:27:15 Yes. And we don't know the reason why, but like there's obvious things that people can sell it, 10, for example, it's just like, hey, they already pay for this membership, so they might as well use it when the ordering and buying stuff. So it's like an excuse to have something delivered to your house versus go to the store because I'm already paying for the membership. It's like a cost-sunk thing. And so when it comes to product development on your side, are you incorporating this data in any way or is it just talking to your users developing product from there? Yeah, so when we think about, when we think about improving a product, like we have a few
Starting point is 00:27:48 different streams for like really feeding the backlog. So one is internally driven, right? And this is, it's based on, you know, it's based on like where we know we want to take our, our application. And also factors in, you know, us going out and proactively speaking with our own customers, like doing that user research and really like digging into their use cases and then figuring out where the gaps are and then attacking those. That's one. Another is, I mentioned earlier that we do some custom research for customers. This is like, you know, think of it as a professional services like approach. You know, this is something that also helps feed our backlog because if we see recurring requests, then, you know, this is probably something we should add to our product.
Starting point is 00:28:38 And then finally, we have like the editorial side, which, you know, for us is like the best form of dogfooding, right? So we're, you know, we can go in and like try to use our app to answer a question. If we find that we hit a wall, right? We can't, it's like, well, we've dug as far as we can go, and now we have to go to the data behind it to answer the question. Like, you know, that's a great signal that this is something we should probably build. One thing that's interesting to me is that I feel like we just like recently just talked to Jake Klamka at Insight Data Science.
Starting point is 00:29:11 And I feel like data scientists, like hearing about your company, like this seems like a dream job. I work on interesting problems and questions and then even if it's with your editorial board that's figuring that stuff out it seems fascinating to me as like oh every problem is going to be kind of different we put that out there and whether solving it stuff for your customers
Starting point is 00:29:29 or stuff that like promote the company like how do you look at like finding like because you guys are hiring right now right yes like how do you find a good data science like what are traits that you're looking for that you know it's going to be a good fit for this kind of like nebulous work. Yeah. It's such a good question. I feel like data scientists is such an overloaded and I think a bit overused, like an overstretched term. I think for us, specifically what we're
Starting point is 00:30:02 looking for are people who are like scientists with a capital S who have very strong quantitative backgrounds and can understand from first principles like the problems that they're trying to solve. I think very frequently what you find are, you know, people interested in data science, you know, they learn a lot of the tools, but maybe skip over the fundamentals. When you say, like, are able to think from first principles, I think this is something I hear as a common theme also for people who are looking for good engineers or product managers, etc. Like, what does that mean exactly? Yeah. So let's think about it this way. So we have, so a third of our company have PhDs, right?
Starting point is 00:30:44 We have, we're basically equally, so most of the team is technical. How big is the team? So we're 60 people today. And most, so most of the team is technical. And it's about an, you know, 50, 50 split between engineers and data scientists. Now, on the data side, what you'll find is that we have people, you know, with backgrounds ranging from statistical genetics to cognitive neuroscience to string theory to like to, like, to like earth science to climate, you know, climate science, like really all over the place.
Starting point is 00:31:16 And like the common theme, though, is that all of them are extremely good in statistics, right? So that you've got this, there's sort of this statistical foundation that, you know, that in our opinion, like everything is built on top of. And it's our view that if you come in with that, that strong, like, you know, mathy foundation, that learning the tools, like the tools can be taught, right? We can, like, we're happy to help people get onboarded with, like, using Python. Like, okay, cool, you've only used R. Like, that's fine, right? We can help you like learn to switch over to Jupyter notebooks. Um, but the thing that we're not going to teach you, uh, is we're not going to teach you how to do math. Mm-hmm. And then how does that translate,
Starting point is 00:32:02 like, into the first principles? So, because I usually think of it as like, someone who's willing to challenge, like, I will give someone a task, and so sometimes they will come back and say, like, actually, can we just dive out? I was like, what's the reason behind this task? And maybe just be able to be like, oh, actually, I think I can improve the question we need to be looking into instead. Yeah, I think this, a lot of this ties in with like the nature of the types of problems we're trying to solve. Right? You can't, like, there's no, there's no like, I don't know, playbook of best practices for dealing with the problems associated with transactional data.
Starting point is 00:32:37 Right? There's no playbook on building an analytics platform focused on consumer spending behavior. A lot of the things that we're doing, you know, they're either, like we're either, we're doing them for the first time. And in some cases, maybe they are simply being done for the first time. So it's something where we benefit from people who, you know, who can approach these like big, nebulous and open-ended problems and come in and figure out how to structure and decompose the problem. And then tackle it piece by piece.
Starting point is 00:33:10 So do you train for that or you just hope that they have it? Like, what is the test is my question, really? Yeah. Because I mean, because it's really just like, here's a problem. But then before you get overwhelmed by the problem, because often you're told like, hey, you have to take route A or B. Usually there's options like C through infinity, right? And so you have to ask why.
Starting point is 00:33:31 And so how do you, whether it's through interviews or training, get that out of employees? Yeah. For us, I mean, I think of this. is less something that we train people to do and more something that we hire for, like we screen for in the hiring process. So we've taken great care in designing and actually iterating on our interview process. And I'd say that there is significant technical evaluation where we're trying to test for exactly these types of things. For data scientists, one of the things that we do is we actually, you know, give them a big messy data set. And we say,
Starting point is 00:34:11 do some, like, it's open-ended. Do some research. Tell us what you're, and then present it to us. Right. Tell us what you were looking for and tell us what you found. What's some common mistakes that, like, people do that end up not working out so well? And what's some stuff that the really great employees and applicants have been able to do? I know I'm trying to help people like cheat on your... I'd say, like, the number one mistake that people make is that they, you know, they assume too much of the data. They assume the data is perfect, right? They assume that what we give them, you know, that like, oh, like, this is easy. All I have to do is just, like, you know, load it into whatever, like into pandas or load it into, like, throw it on a database and
Starting point is 00:34:54 just start running queries, get the answers, and then throw it into a slide and be done with it. Like, that never really works, because this just isn't how data in our world works. Like, there are always dragons somewhere. And so a big part of this exercise is like, well, how, you know, like how diligent were you in looking for dragons, right? And anticipating these, like, problems. And then, you know, you don't necessarily need to solve all of them, but you need to be aware of them because they actually can distort your findings. And so as long as you, like, if you identify them and even if you have findings that are invalid, but you're able to identify that, you know, hey, like, I found this thing, but I made this, like, I deliberately made this
Starting point is 00:35:44 assumption, the simplifying assumption so I could complete it in a reasonable amount of time. Like, that's fine. So the good people, what they're good at is like not starting from their own assumption, but actually trying to query and figure out what were the assumptions that I'm working with. Yeah, exactly. Whether it's in the data, the question, et cetera. And so once you have that, it helps you understand as like, how strong or how weak is my ultimate conclusion going to be as a result.
Starting point is 00:36:08 Yeah. I mean, it's like, it's sort of like building a house, right? If you, if you were to hire a construction crew to come out and build a house and they just came and they just like came out on site and they just started like erecting walls and then, you know, they hand over the keys. you slam the front door, the whole thing falls over because it was on a shaky foundation, right? Then, like, clearly they failed.
Starting point is 00:36:29 And so for us, you know, what we like is to find people who really like to understand the foundation that they're working with to make sure that it will be sound when they build the house. So I've never done a project involving credit card data. Can you, but then I use these like tools like Mint and it consistently classifies things as the wrong thing, right? Can you explain to me why this stuff is not normalized? Because it seems like incredibly valuable, potentially not that difficult.
Starting point is 00:36:57 Obviously, it is difficult. But like, why isn't it normalized? Why do you have to clean it all? Yeah. So I think, I guess, you know, maybe the easiest place to start is like, think about your, think about your last credit card statement, right? Like, think about a time where you've looked at your credit card statement and you saw a transaction on there and it says something like S bucks or like, I don't know, like MW
Starting point is 00:37:22 San Carlos, which would be like Men's Wearhouse San Carlos. It doesn't say Men's Wearhouse. It doesn't say Starbucks, right? It says something, which if you like squint at it and you scratch your head a little bit, like you as a human can probably figure out what it is. Now, the problem is that there are many different companies all, you know, putting in, you know, some piece, actually the fundamental problem here is that some human decided how to represent that store in a credit card statement. And they're working within this constraint of a limited space, right? They only have a certain number of characters and they have to type something in,
Starting point is 00:38:04 which, again, communicates to a human that, like, yeah, you were at Walmart. Yeah. So you don't dispute the charge. But it was never designed for a machine to read. And so, like, the result of this is that there are, you end up with this, this cardinality problem, right? You end up with many different variants for a single, for a single merchant. And part of our job is to find all the variants and to map it back to that singular merchant.
Starting point is 00:38:39 But they're, so you're saying there are multiple text strings associated with Men's Wearhouse in San Jose or San Carlos or whatever. Yeah. So within our data set, we have, so we're looking at, like, 50 plus billion transactions, we have one billion unique transaction descriptions. And I'll tell you what, there are not one billion merchants in the U.S. Right. Okay. So like Macy's alone has like three million different representations.
Starting point is 00:39:12 Yeah, I'm just like kind of baffled that it was never like, hey, Macy's, your store number 1,200, whatever. So there are two, there are basically two layers of problems. So one is that, you know, one is the human layer, right, where you've got a human and they're setting up the point of sale, you know, system, like the swiping device for, you know, for a certain Macy's store. Let's actually, let's just talk about McDonald's for a second. So McDonald's, you've got franchises. So when somebody sets up their franchise, you know, they work with like a point of sale provider and they get their point of sale set up. And like, okay, well, you know, what should this be? It should be like McDonald's, I don't know, like F
Starting point is 00:39:51 139. Okay, great. Right. Now we've got this one location. The problem is, depending on how the transaction is processed, the apostrophe that you expected to appear in McDonald's could be a space, it could be a star. It could be deleted, right? Could just be, you know, McDonald's, nothing. Right? And like basically, the two problems, you know, one is a human one where different humans could describe things differently. They can even typo the name of their own company, which happens. And then the second problem is there are like various perturbations that can take place in the processing chain. I think part of it was like the corrections had to happen by users of Mint. Yeah. And I think humans don't want to correct that data. No.
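[Editor's illustration] The apostrophe example above can be sketched in a few lines of Python. This is a minimal sketch, not Second Measure's actual pipeline: the `ALIASES` table, the `normalize` and `resolve` helpers, and the sample descriptor strings are all hypothetical, and simple prefix matching is an assumption made only to keep the example small.

```python
import re
from typing import Optional

# Hypothetical alias table: canonical merchant name -> normalized prefixes
# that raw statement descriptors are assumed to start with.
ALIASES = {
    "McDonald's": ["MCDONALDS"],
    "Starbucks": ["STARBUCKS"],
}


def normalize(descriptor: str) -> str:
    """Uppercase the descriptor and drop everything except letters and digits.

    This collapses the apostrophe / space / star / nothing variants of
    "McDonald's" into one string.
    """
    return re.sub(r"[^A-Z0-9]", "", descriptor.upper())


def resolve(descriptor: str) -> Optional[str]:
    """Map a raw descriptor to a canonical merchant, or None if unknown."""
    key = normalize(descriptor)
    for merchant, prefixes in ALIASES.items():
        if any(key.startswith(prefix) for prefix in prefixes):
            return merchant
    return None


if __name__ == "__main__":
    for raw in ["McDonald's F139", "MCDONALD S F139", "MCDONALD*S F139", "MCDONALDS F139"]:
        print(repr(raw), "->", resolve(raw))
```

Normalizing like this only handles the processing-chain perturbations. It does nothing for typos, and it cannot tell apart two different businesses whose descriptors are identical, so a real system would need more signal than the text string alone.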
Starting point is 00:40:46 Diligently. And also if it turns out, it's like, oh, I can see. a human getting really frustrated where it's like, this is the 50th time I had to correct that this is coming from McDonald's. And therefore, like, I no longer want to correct this anymore because, like, this is just not any good. And so the problem actually is like, oh, all of them are so different. And so humans are giving up on the classification when really it's like, this is actually more complicated. I have like such limited incentive to classify my end. I don't really care. I mean, I'm sure some people do, but I don't really care how much I spend
Starting point is 00:41:15 on food. The problem gets even worse, right? Sometimes I don't want to know. It's like I need to sit in that like fast food denial. Yeah. If Amazon was all classified in one category, that would not be good. So, I mean, like this, you know, if you're coming into this like with a, I don't know, like a software mindset, right, you're thinking like, oh, yeah, there should be some like unique identifier for Blue Apron, right? But if you actually just look at all the Blue Apron transactions, what you're going to find out is that, you know, there's actually more than one Blue Apron. Did you know that there's a Blue Apron grocery store? That's very close. It's in Brooklyn. Yeah, like things like that or like United, like United Airlines, of course, but then there's also a United grocery store.
Starting point is 00:41:58 And they show up, in some cases, they show up the exact same on your credit card statement. How much time are you guys spending cleaning up data? Is it like perpetual and nonstop? So we don't think of it as like a fundamentally human. There are human elements of it, but I mean, really it's something that we, you know, try to use machine-based approaches to get, to like operate as a giant lever. For, I guess we think of it this way, right? We've basically had to build two different products.
Starting point is 00:42:28 So one is this pipeline, which ingests raw transactional data, and then outputs something useful. And like, you know, the things that we do in that process are things like this entity resolution, which is what we've just been talking about with merchants. But it also includes, like, you know, figuring out for an Uber transaction, it says San Francisco. It always says San Francisco. But obviously not all Uber rides are like in this city.
Starting point is 00:42:53 Oh, looking at other transactions around it to see, like, oh, maybe this originated somewhere else. Exactly. So we figure out the location of the purchaser based on where their other purchases are. And that lets us like fill in the gap. So we say like, oh, you know what? Ignore this location for Uber and instead, you know, use this computed location. There are other things that we need to solve. And then there's this whole other thing around debiasing, right?
Starting point is 00:43:25 Because we basically have this longitudinal study going on, right? We have this panel, the panel of consumers. And obviously, it's not going to be a perfectly representative sample of the U.S. So we endeavor to figure out all the ways in which it isn't representative and then apply corrections to make sure that, you know, whatever results you get do represent the greater population. So anyway, so that's one thing that we're building is this pipeline. And we've got 10 to 15 people working on that. But then we also have our analytics platform, right? This is the, think of it as the hyper-specialized Tableau where, you know, we've built in lots of different analyses that operate on this nice, clean data set that the pipeline has output.
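[Editor's illustration] One textbook way to correct a non-representative panel is post-stratification: weight each panelist by how under- or over-represented their group is relative to the population. The transcript does not say which corrections Second Measure applies, so the sketch below is only an assumption-laden example; the group labels, population shares, and spend figures are made up for illustration.

```python
from collections import Counter


def poststratify_weights(panel_groups, population_share):
    """Return a weight per group: population share divided by panel share."""
    counts = Counter(panel_groups)
    n = len(panel_groups)
    return {g: population_share[g] / (counts[g] / n) for g in counts}


def weighted_mean(values, groups, weights):
    """Average of values where each observation carries its group's weight."""
    total_weight = sum(weights[g] for g in groups)
    return sum(v * weights[g] for v, g in zip(values, groups)) / total_weight


if __name__ == "__main__":
    # Hypothetical panel: urban consumers are over-represented (75% of the
    # panel versus an assumed 50% of the population).
    groups = ["urban", "urban", "urban", "rural"]
    spend = [100.0, 100.0, 100.0, 20.0]
    population_share = {"urban": 0.5, "rural": 0.5}

    weights = poststratify_weights(groups, population_share)
    print("raw mean:     ", sum(spend) / len(spend))
    print("weighted mean:", weighted_mean(spend, groups, weights))
```

In this toy panel the raw average overstates spend because the higher-spending group is over-sampled; reweighting pulls the estimate back toward what the assumed population mix would give.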
Starting point is 00:44:11 One increasingly growing set of customers for you guys are like corporations doing this for like sort of. I guess competitive analysis? Yeah. How did that come up? And so like why is that? I mean, I can see why it would be interesting to them. But I'm just wondering, are they looking at questions very differently when they're looking at your platform to answer them?
Starting point is 00:44:35 Yeah. I think this is like, this is a really interesting journey for us because, you know, we started out building a platform that was focused on helping investors, understand company performance, right? And YC, hammers,
Starting point is 00:44:47 you know, hammers in that you need to like focus, focus, right? That it's not like, it's better to have something a small number of people love than something that many people just like. And we took that, like, you know, we really took that to heart and we didn't want to work with companies for a long time because we were afraid that it would spread out our focus. One of the things that changed our thinking was this, so there's a book from Clayton Christensen. So he's a professor at HBS and he wrote The Innovator's Dilemma. More recently, he published a book called Competing Against Luck.
Starting point is 00:45:22 And in it, he talks about the theory of jobs to be done. And like the basic premise is that when you're thinking about, you know, substitutes for your product, you shouldn't be thinking about things that just look similar to your product. Instead, you should be thinking about, fundamentally, what is the job that your customer is hiring your product to do, right? And this, I guess, this changed the way we thought about focus because, you know, like this whole time we've been thinking like, oh, investors, investors, investors. But in truth, there are many different use cases for investors, right?
Starting point is 00:46:00 A fundamental discretionary hedge fund, right? Like, think of it as a group of analysts who are, you know, working in Excel and trying to figure out, like, is, you know, is Stitch Fix, like, poised for growth in the longer term. Like, they have a very different use case from a quant investor who's focused, like, someone who has a purely systematic strategy and is trying to trade, you know, on a daily, weekly or even like, like, just quarter to quarter based on where they think companies are likely to beat or miss relative to expectation. Right?
Starting point is 00:46:40 These are different use cases. Now, if we think about one of our core use cases, is this being, uh, helping people understand company performance, then that's when we began to understand like, okay, well, investors want to know how companies are performing, but so do other companies, right? Companies want to know how they're, uh, how their, uh, competitors are doing. And, um, we had a really convenient way into this because we were working with so many VCs. They were actually bringing our product into the boardroom. You know, they were showing like they were showing their portfolio companies. And then the CEO would raise their hand and say, like,
Starting point is 00:47:18 wait, how do I get that? It's an interesting sales strategy. Yeah, I think, like, maybe you could speak to that a little bit more because there are so many YC companies. And oftentimes people just think, like, YC is just consumer. Very much not true. YC is just software, also not true. How do you guys think about your sales process? Yeah. I mean, this is, this is an area of focus for us now. We were very, very fortunate to have just a ton of, I mean, really, like, a ton of virality, which is like a funny thing to talk about in the context of really enterprise sales. But we actually haven't done any outbound sales yet. We have 150 clients. Every single one of them came to us through inbound, right? They basically, you know, somebody signed up and
Starting point is 00:48:11 then they told their friend about us. Their friend reached out. love what they saw, signed up, told their friends, and so on. I mean, it's a box of secrets. Yeah. And so to me, it's just like, hey, I have this thing and it lets me see stuff that it's like that I've never been able to see before. And so, like, that's a very remarkable thing that's easy to spread around. Yeah, exactly.
Starting point is 00:48:31 Like, yeah, everyone knows that, you know, Uber's bigger than Lyft, but, like, how much? We can actually quantify it. And I think that's, it's a lot of, like, it's a lot of fun. And for certain people, right, it sort of, it unlocks like a new way of doing their job. And so it's become like table stakes, and that's been great for us. But now, like, you know, we just raised our Series A. So that was led by Bessemer and co-led by Goldman Sachs. And then we also had participation from Citi. Citibank? Correct. Goldman Sachs and Citi, that's such interesting partners or investors to be leading a round. Why were they super excited, especially? Yeah. Yeah, I think
Starting point is 00:49:14 So we fall into, so I'd say that the reasons are different for each. So we fall into this general category. When you're talking about the investment world, we fall into this category of companies, generally known as like alternative data companies. So alternative data basically refers to anything that can, any information that can help you understand how companies are performing, that isn't just the traditional reported fundamentals or like stock prices or things like that. So this collectively, it's referring to credit card data, satellite imagery, web traffic data, geolocation, like data for mobile devices and so on.
Starting point is 00:50:02 Goldman Sachs is making, has made a big push into the alternative data space. And, you know, they they had not made an investment in any company touching, dealing with credit card data. And so we're like, you know, we're their horse in that race, if you will. Awesome. And they've been just phenomenal. I think, I think like here in the Bay Area, there's like so much of, like, you know, everybody's focused on working with, you know, with like big traditional VCs. But I think, you know, we've actually had a tremendous success working with sort of like,
Starting point is 00:50:39 I don't know, less, less expected players, I guess, out here. So our seed round was actually led by Jefferies, another investment bank. And one thing that we found to be true for both Jefferies and Goldman is that they are extraordinarily well connected, you know, like in New York City, on the East Coast, with not just investors, but also with companies, right, because they're investment banks. So they've been just tremendous in terms of helping us get in front of more, you know, more of the types of, you know, clients we want. Now, for Citi, of course, they have a ton of transactional data. And like this is something that, you know, they, like this is a pain point that they feel internally.
Starting point is 00:51:27 Like all the things that I described about messy transactional data, they understand. It seems odd to me that they wouldn't have a handle on this. already themselves. So it's a really, really hard problem. Like, I can't understate that enough. Like, why are they so bad?
Starting point is 00:51:45 Why is everyone else so bad? It's not, I wouldn't say that it's, that everyone else is so bad. I think it's just that, you're so good. Or their other products are so profitable. Yeah.
Starting point is 00:51:56 I think it's that people are focused on solving specific problems. And so, like, I wouldn't say that, you know, like Mint is, I wouldn't say that Mint is terrible at, at identifying,
Starting point is 00:52:06 at like understanding transactions, right? They're just, they're, they're good at different things because they're focused on solving a different problem, right? Like, Mint.com is not trying to, like, they're trying to solve the problem of, you know what, we need a best guess as to what this transaction is, but we need to do it for all the transactions, right? Like, we flip that problem upside down. We say, you know what?
Starting point is 00:52:26 We don't care about most transactions. We only care about the, you know, 5,000 or so companies that we track and growing, right? We care about that and we can't be wrong. Because if we're wrong, somebody's going to lose millions of dollars. So the constraints actually help make it much easier as a result of not having to focus on everything. Exactly. It makes the problem tractable. And because we're focused on that, like, what we're discovering is that there are surprisingly interesting applications of this thing that we built for this, like, hyper-specific use case.
Starting point is 00:53:00 You know, suddenly we're finding out that, like, oh, this could, you know, this could help, you know, this type of company. I don't know, find, find new customers, right? Like, it's a company that sells to other businesses and they want to find fast-growing businesses so they can sell to them. This is, I think this has been one of the interesting parts about our journey is discovering, like, really by accident, you know, all of these additional use cases
Starting point is 00:53:28 that we really didn't anticipate. One thing that's tricky, and it's probably one of these, like, great problems to have as a company, is that if you're like people's secret weapon and it becomes table stakes to be like, hey, if we want to stay ahead of the game and I have to, like, Bloomberg is a good example.
Starting point is 00:53:45 It's like, oh, I have to sign up for Bloomberg if I'm a trader to use this. And I think Second Measure might easily come into that category as well for a lot of investors. I feel like the tricky part is then like if all of a sudden now everyone is using us, like how do you develop the product? Like how do you keep it interesting?
Starting point is 00:54:03 Yeah, so... Keep people on board. versus like jumping ship or trying to find some other solution. Yeah, I mean, this is a really, really good point in particular for the investment audience, right? Because investors are looking, like they make money off of information edge. They make money off of knowing things that other people don't. And this actually informed a lot about how we tackled this problem because we could have very easily focused on selling. quote insights or quote signal to hedge funds, right?
Starting point is 00:54:39 Where we say like, oh, here are the most interesting, I don't know, like trading signals and we send those out. But as we add more and more customers, then, you know, the value to each one becomes significantly diluted. And so, you know, we took the view that in particular because transactional data, there's no single owner of transactional data. there's no way to to like control how many people have access to it. Why not just assume everybody's going to have access to it one day and then focus on building,
Starting point is 00:55:14 building a tool to help people, you know, answer more creative questions, right? And our view is that even if everybody has access to the same data, that if they simply focus on asking better questions, they'll still find their own edge. Now, that's for the investment community, though. On the corporate side, I mean, really, the fact that somebody else uses the product doesn't... I think that would be delightful. It's like every major corporate company is like, we have to use this for competitive analysis. I mean, like, if the worst case scenario was you were Bloomberg, you'd be okay. Yeah. I think Bloomberg's doing just
Starting point is 00:55:52 fine. Yeah. Right. All right. Awesome, Mike. Thanks for coming in. Oh, definitely. Thank you. All right. Thanks for listening. So as always, you can find the transcript and the video at blog.ycombinator.com. And if you have a second, it would be awesome to give us a rating and review wherever you find your podcasts. See you next time.
