Drill to Detail - Drill to Detail Ep.61 'Kaggle, DataRobot and How to Be a ... Zillionaire!' With Special Guest Jordan Meyer
Episode Date: March 20, 2019
Mark Rittman is joined by DataRobot's Jordan Meyer to discuss Kaggle, machine learning, deep neural networks and his team's strategy to win the $1m Zillow Prize, beating over 1,000 other teams to come up with the most accurate home value prediction.
Zillow Prize: Zillow's Home Value Prediction (Zestimate) | Kaggle
And the Zillow Prize Goes to… Team ChaNJestimate!
Meet the 'Zillow Prize' winners who get $1M and bragging rights for beating the Zestimate
https://github.com/jordanmeyer
Data Scientist Spotlight: Jordan Meyer
Deep Learning definition on Wikipedia
ABC - How To Be A Zillionaire (Wall Street Mix)
Transcript
[Intro music: ABC - 'How To Be A Zillionaire' (Wall Street Mix)]
So hello and welcome to the second episode
in this new season of the Drill to Detail podcast
and I'm your host, Mark Rittman.
My guest on the show hit the headlines recently
when his team won the Zillow Prize and a million dollars.
And he's actually an ex-colleague of mine,
and he's come on the show to tell you how he owes it all to his previous CTO at that company.
Welcome to the show, Jordan Meyer.
Hey, thanks for having me.
So Jordan, just explain who you are, what you're doing now in terms of work,
and how we knew each other from the past.
Sure, yeah.
So I'm Jordan Meyer.
I'm a data scientist.
I'm currently a data scientist at DataRobot.
It's an automated machine learning company.
I came by way of Rittman Mead, where I was previously a consultant, then head of
R&D, and finally CTO.
So yeah, looking forward to chatting with you.
Excellent.
Well, it's good to have you on the show, Jordan.
So actually, we worked for quite a while together a few years ago.
And I was CTO at the time, and you were our kind of data scientist.
You did some interesting things there with us.
You built quite a few interesting demos and did some things at conferences.
Maybe just tell people what sort of things you used to do back in your consulting career.
Sure, yeah.
You know, I think probably one of the first ones I did at Rittman Mead was when I was getting interested in social network analysis.
So I was looking for ways to present at Oracle conferences, but not necessarily use Oracle technologies, do the things I was more
interested in. And so I just kind of wedged it in there by doing a social network analysis of
Oracle ACEs and Oracle ACE Directors. Like, you were an ACE Director, or still are,
and it's a badge of honor, essentially. So I analyzed the Twitter network, all the people who were following and reverse-following each other, and computed different measures of centrality, like PageRank. That's how Google originally figured
out what was an important web page: by how central the other pages that linked to it were. I ran very
similar analyses and essentially tried to make a predictive model of whether someone was
going to be an ACE or an ACE Director. And it had really high accuracy because, of course,
you know, the more central you were in this Oracle network, the more likely you were to get that badge.
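Here is a minimal sketch of that kind of centrality analysis using networkx; the accounts and edges are made-up toy data, not the actual ACE network.

```python
# A toy version of the analysis described above: score accounts in a
# follower graph by PageRank-style centrality. Names are hypothetical.
import networkx as nx

# Directed graph: an edge (a, b) means account a follows account b.
g = nx.DiGraph()
g.add_edges_from([
    ("alice", "bob"), ("carol", "bob"),
    ("bob", "alice"), ("dave", "alice"),
])

# PageRank weights a node by how central its followers are, the same
# idea Google used for web pages.
scores = nx.pagerank(g, alpha=0.85)
print(sorted(scores.items(), key=lambda kv: -kv[1]))

# These centrality scores could then become features in a classifier
# predicting ACE / ACE Director status, as described in the episode.
```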
Okay, okay.
I was going to bring up two things from around that time.
One was that you never used the slide deck I gave you;
you always managed to use your own one at the time, which was funny.
The other was that you were always able to explain these concepts,
concepts around machine learning and AI and so on,
in a way that didn't make you feel stupid.
You know, you were always very good at doing this
in a way that was good for the layman and so on.
And you've gone on now to work for DataRobot.
Is that right?
That's right, yep.
Okay, so maybe at the end of the talk,
we'll kind of go through some of the things you're doing there
and just get an idea of, I suppose, how you do those sorts of things now work-wise.
But the thing that really struck me was I was looking on Twitter a while ago,
and I saw that you won what's called the Zillow Prize with the team.
So tell us a bit, what is the Zillow Prize, first of all,
and what's it kind of about and what's the kind of point to it, really?
Sure. So Zillow's mostly in the US.
It's a website that allows you to look at pretty much any house in the country and see if it's on sale.
You can see the MLS listing, the sale price, or the asking price, that kind of
thing. And even if it's not on sale, it will actually estimate the value of a home. So when
it first came out, pretty much everyone I knew got on the website, looking at the house they grew up in, their neighbors' houses, their own houses, just to see the estimate.
And they called it the Zestimate, which is an estimate of what your house would sell for
if it sold today, for any house in the country.
So this prize was, it's been five or six years, I guess,
since they released that, maybe longer.
And, similar to how the Netflix Prize looked to make recommendations better, they wanted to see if the data science community could contribute and make the Zestimate better.
So when the Zestimate first started, I think it was off by, on average, maybe more than 10%.
And it's gotten down to about 4.5%.
And then through this competition,
our winning team got it down to just under 4% on average. So they were, you know, really looking to
minimize the error of all of these home estimates they're making.
Okay, so it was organized through Kaggle, is that correct? Tell us what Kaggle is and
the background to that.
Sure. Yeah. So Kaggle's a machine learning competition website.
So companies will go on with similar kinds of challenges like this.
Maybe they're looking to predict customer churn, and they'll put their data sets, make
it available to the Kaggle community, and the Kagglers will compete, trying to make
a predictive model that will best predict which customers are
going to churn. And they're ranked according to these metrics that are very specific. And
often the distance between the first and the 10th place will be something like 0.0001 or something
like that. So it's a hyper competitive place where people try to beat each other at
making the best machine learning models.
Okay, okay. And I remember back in the past that we used Kaggle then. I mean, is that part of how you learned
how to do the things you do now? Is that part of, I suppose, how you
better yourself and learn new kinds of techniques and so on?
A little bit, you know, so actually I joined eight years ago,
and the first time I competed was the Zillow Prize.
So round one actually was my first Kaggle competition
where I submitted something, but I have watched it very closely.
So I would, at the beginning of any interesting competition,
have a look at the data, look at the people competing, read the discussions and so forth.
And then at the end, often the winners will reveal the tricks that they use to win.
So I paid a lot of attention to that over the years and have tried to incorporate it in my work.
But I actually wasn't super interested in competing because I felt like I lived in a bit of a middle ground. Like I see Kaggle as being great for people trying to learn and great for the best Kagglers,
the best machine learning people in the world, because one of them's going to learn something
and the other one's going to make money. And anyone who's in between, you know, probably
isn't going to benefit that much from it. So I figured, if I'm consulting and making money
on this, why would I try my hand at just the chance of maybe making money at it?
Okay, okay. So we'll get into the competition in
a bit, but tell us a bit about your background before we met,
and how you got into the world of this sort of thing. So stats and machine
learning and so on, what was your kind of route into this?
Sure. So as an undergraduate, I was interested in neural networks and genetic algorithms and
those kinds of things, as a computer science major. And when I got out of school, I realized that
business intelligence and analytics seemed to be the growth market. So I went to work at a
couple of different universities. The first one was UNC
Chapel Hill. And it was a really great place to start my career because they actually allow you
as an employee to take a couple of classes each semester for free. And you don't really have any
restrictions. So I was able to take graduate courses in operations research, statistics,
information science, those kinds of things,
all the while making a paycheck and getting real-world experience with
the actual analytics challenges of the university. So it was a good place for me to learn data
engineering and, I guess, data science at the same time.
Okay, okay. And didn't you work for a startup
at some point in between? You actually worked at
my old company a couple of times, but you worked at a startup for a while, didn't you as
well?
I did, yeah. So right after my first stint with Rittman Mead, where I worked as a consultant,
I spent a couple of years at Slacker Radio, where I worked as a data
scientist, building predictive models for the
user funnel: will this user convert from a free listener to a paid listener, and if they are
a paid listener, will they stop? Those types of predictive models, as well as things like
music recommendations and how we categorize our music, those types of problems.
Okay. I remember at the time as well that when I first met you,
there was a lot of talk about Hadoop, for example.
And I think Hadoop and this kind of world of machine learning
and so on were very kind of synonymous.
But I think I remember speaking to you at the time,
and although it's important, a lot of the work you did
was on quite small data sets,
and it's not all about doing it on massive ones.
That was a surprise to me
and how much of the work was involved
in just tidying the data as well.
Yeah, you know, I've always seen Hadoop
as kind of an engineering problem
and not necessarily a data science one.
You know, it's something that I hoped
would be abstracted away.
And I think we've gotten lucky and it has been.
We don't have to write MapReduce directly now; we can abstract that out.
Things like Spark will take care of that for you.
So you don't need to do that anymore.
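To make that concrete, here's a minimal sketch of the kind of aggregation that once needed a hand-written MapReduce job; the data and column names are made up for illustration.

```python
# A MapReduce-style grouping expressed in Spark's DataFrame API: no
# custom mapper or reducer code. Data here is a hypothetical example.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("sketch").getOrCreate()

df = spark.createDataFrame(
    [("west", 100.0), ("east", 250.0), ("west", 75.0)],
    ["region", "amount"],
)

# In classic Hadoop this would be a mapper emitting (region, amount)
# pairs and a reducer summing them; Spark plans all of that for you.
df.groupBy("region").agg(F.sum("amount").alias("total")).show()
```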
And what's your current, I mean, apart from DataRobot,
what's your current toolkit of choice
with these kind of competitions
and generally doing this kind of work?
What would you tend to use these days?
So professionally, I've used R consistently.
So pretty much anytime I'm doing anything work-wise, I'll use R.
And then in Kaggle, I actually tried to take the opportunity to learn Python by doing the Kaggle competitions in Python.
So if I was doing maybe something with deep learning for a client, I probably would go into Python, but it's almost always R.
Okay, okay.
All right, let's get into the competition then.
So, you know, there were 4,000,
is it 4,000 other teams or competitors
that were taking part?
And I think the original entry from your team
was from your two colleagues or two team members, wasn't it?
And you joined them afterwards.
I mean, how did that work?
And what were they doing first of all, really?
And what was their, I suppose,
first attempt at this kind of problem?
Yeah, so there were two rounds to the Zillow Prize. In the first round, I had a different teammate, and the two people that I ended up teaming up with in the second round
were teamed up with each other in the first round. So we sort of started as two teams. And yeah, in the first
round there were almost 4,000 teams. And I think they said
from like 91 different countries, lots of people competing. The first round was specifically to
predict where the Zestimate is wrong. So not predicting the home price directly, but predicting
the error that the Zestimate would have, which
is an interesting formulation of the problem.
And you weren't allowed to bring in external data.
And it was very much closer to what I would consider a standard Kaggle competition, where
the differences in places were very, very, very small.
There's only so much predictive power you can pull out of the smaller set of data.
And the second round, though, you were allowed to bring in any data you wanted. There were production requirements on the models that you delivered.
So they couldn't run past a certain amount of time on commodity hardware.
They had to be fully reproducible in a Docker container and that kind of thing.
And knowing that that's what the second round would be is why I
got interested in this particular Kaggle competition. Because I thought, you know,
if I could just survive round one, get past the standard Kaggle part, then I could use my
more professional experience with pulling in disparate data sources and trying to bring them
all together to build a model. I thought that would give me a leg up in round two.
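As a rough sketch of that round-one formulation, modeling the estimate's error rather than the price itself, something like the following is plausible; the file and column names here are hypothetical, not the competition's actual schema.

```python
# Round-one style target: how far off the Zestimate was, in log space,
# rather than the sale price itself. Columns are hypothetical.
import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor

homes = pd.read_csv("training_homes.csv")  # hypothetical training file

# Log error: positive when the estimate was too high.
y = np.log(homes["zestimate"]) - np.log(homes["sale_price"])
X = homes.drop(columns=["zestimate", "sale_price"])

# Gradient boosted trees fit this residual signal directly.
model = LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(X, y)
```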
Okay. Okay. So just as, I mean, for the layman really, or someone who is new to this, I mean,
why is it tricky to try and predict house prices? I mean, why is it not just a case of looking at
the other sales in that area and just kind of adding something to it? Or why is it a hard problem to solve, really?
Yeah, it's a great question.
You know, there's a few answers, I guess.
The first answer is there's always going to be inherent uncertainty in a house price.
So, for example, if a house reminds a buyer of their childhood home, let's say, they might
pay more for that house or be willing to pay more for
that house. That's one area where you'll just never be able to get the exact price because
there's so many sort of human elements involved, but you can get very close if that was the only
problem. The other problem is that for Zillow, they're trying to predict every house in the country, not just the
ones on sale. So there are plenty of houses in more rural areas where there's not a lot of data
about them. You just know the lot size is roughly an acre and there's a house on there that was
built 50 years ago. And there's a lot of uncertainty in that. It wouldn't necessarily compare to the houses around it if it's either falling apart or recently renovated.
So there ends up being a lot that you have to kind of impute to understand what's going to make that house easier to predict.
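A minimal sketch of the kind of imputation implied here, assuming made-up columns rather than Zillow's real fields:

```python
# Filling in sparse rural records before modeling; columns hypothetical.
import pandas as pd
from sklearn.impute import SimpleImputer

homes = pd.DataFrame({
    "lot_acres": [1.0, None, 0.25],
    "year_built": [1970, 1995, None],
})

# Median imputation is a common, simple baseline for gaps like these.
imputer = SimpleImputer(strategy="median")
filled = pd.DataFrame(imputer.fit_transform(homes), columns=homes.columns)
print(filled)
```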
Okay.
So I understand that your speciality in this was deep neural networks.
And that's the kind of thing that you brought to this.
And there was external data and so on.
Again, for the layman, just explain what deep neural networks are.
And I suppose in a way, why also is it an area of interest to you?
Sure.
Yeah.
So deep neural networks are getting a lot of press lately for advancements they're making in a lot of cool areas, but not necessarily getting press for how they're helpful in business settings.
So you'll hear about deep neural networks being able to outpredict doctors at spotting lung cancer, for example.
Recently, I think a deep neural network beat the professional players at StarCraft.
And of course, it beat Go a couple of years ago.
So we hear a lot about that kind of stuff.
But what interests me about them is how can we take all of this cutting edge, really interesting research by these superstars at Google and Facebook and apply that to the types of problems that people like you and I solve?
Customer churn,
forecasting sales, that kind of thing. And it turns out that you can reformulate a lot of normal problems into problems that these types of things can solve. So deep neural networks
currently hold the record for the best translation. If you look at Google Translate now compared to,
let's say, five years
ago, it's just remarkably better. And that's because it's powered by these deep neural
networks. But behind the scenes, that sort of sequence processing as it reads the sentence in
order to spit out another sentence is a technique you can use for forecasting grocery store sales, for example. So I did a Kaggle competition called Corporación Favorita that was predicting item-level sales.
And I used techniques from Google that were actually for text-to-speech.
So there's something called a WaveNet where it will read the historical speech, and then you can give it words, and it will speak in that voice.
If you wanted your Google Home to sound like C-3PO or something, you can feed it enough information, and it learns the phonetics.
That particular architecture actually worked incredibly well
at predicting grocery store sales. So in that competition, with just that one model, I got
10th place. And the person who got fourth place used one. And first place, I think, also used
that same architecture. It was one that the person in fourth place kind of popularized,
and we all thought, oh, that's a great idea, let's try it.
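To give a feel for that family of models, here is a minimal Keras sketch of stacked dilated causal convolutions reading a sales history; it illustrates the WaveNet-style idea only, not the team's actual architecture, and the window length is an assumption.

```python
# Dilated causal convolutions over a daily sales series, WaveNet-style.
# Each layer doubles its dilation, so the top layer sees far back in time.
import tensorflow as tf

window = 128  # days of history per training example (assumed)

inputs = tf.keras.Input(shape=(window, 1))
x = inputs
for dilation in (1, 2, 4, 8, 16):
    x = tf.keras.layers.Conv1D(
        filters=32, kernel_size=2, padding="causal",
        dilation_rate=dilation, activation="relu",
    )(x)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(1)(x)  # next-day sales forecast

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mae")
```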
Okay, so try this again: explain in layman's terms, what is a deep neural
network? I mean, what is it compared to, say, what people's main experience with
machine learning is? How is this different, and why is it particularly
useful and valuable and so on? But first, really: what is it?
Yeah, well, it's very much similar to a linear regression,
where if you're familiar with machine learning,
you know, it's taking lots of predictors.
If we're talking about, you know, home sales,
it would be each one of the things you might know about a home would be a
variable that this would take. And then it's going to assign to each variable some weight, some sort
of importance, and then add all that up to predict sale price. So a single
layer of a neural network is essentially just a regression. But as you start to stack these,
they can learn abstractions. So instead of just having one output that could be a price,
you could have one layer that's your initial internal features, and then at the top, some
price that you want it to predict, and in between, lots and lots of layers, each layer just being a bunch of calculations. What generally happens with these is they
learn more and more abstract representations as you move up. I think a good example is
for image recognition. If you look inside these neural networks, the original layer will see the edges.
So the neurons that are in this neural network will activate when they see a particular edge,
maybe an upright or vertical edge, maybe a diagonal edge or something like that.
And then the next layer will actually activate when
there's a few of those. So it'll start to see squares and triangles and small shapes. And as
you move up the network, it gets more and more abstract until at the top of the network,
a single neuron in this network may only light up when it sees a cat's face because it's taking as an input this giant tree of activations that
are all getting excited when it sees a whisker, let's say.
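Here's a minimal sketch of that stacking idea in Keras; the input size and layer widths are illustrative assumptions, not anyone's production model.

```python
# One Dense layer is essentially a (generalized) regression; stacking
# layers lets the network learn increasingly abstract features.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                   # 20 raw home attributes
    tf.keras.layers.Dense(64, activation="relu"),  # low-level combinations
    tf.keras.layers.Dense(32, activation="relu"),  # more abstract features
    tf.keras.layers.Dense(1),                      # predicted sale price
])
model.compile(optimizer="adam", loss="mse")
```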
Okay. So is this something that's been possible to do for a while, or has it been recent advances in computing
and so on that have meant this is now possible?
You know, I think the original neural networks predate some of the things like boosted trees. I think the perceptron was maybe the 1950s.
But yeah, neural networks have been around. I did neural networks in college. So
in 2001, I tried to make a neural network play Go on a very small Go board where you already knew
the answer. So if there's a Go board where there's an obvious
next move, I was trying to get a neural network to find that, and then take a step back, look at a
larger board, and see if it could translate that knowledge of perfect moves. So this was, you know,
almost 20 years ago now. But back then, they just weren't as good. We didn't have GPUs. We didn't
have some of these modern enhancements that we've had to them.
So probably just in the past eight years or so, they've really started to take over again and be the go-to for cutting edge accuracy.
Okay.
So what kind of kit or what kind of services did you use then to do this particular thing?
Did you use some cloud services or something?
How did you get access to this kind of compute power?
So I just have two NVIDIA Titan X GPUs. They're really good gaming GPUs that people are now repurposing as deep neural
network and Bitcoin mining machines.
Okay, excellent. And so how did this deep neural network stuff link with the rest of your team? How did what you were doing link to their solution and complement that?
So I was doing two models at the time, one neural network and one gradient boosted trees model, which is just a way of doing an ensemble
of decision trees. And the other two were actually doing the same: one of them was focusing
entirely on a neural network and the other was focusing entirely on a boosted trees model.
And we didn't know that as we were starting to talk about teaming up, we just both were in the
top five of the leaderboard. So we were considering teaming up together, and you can't
really share much, because if you don't end up teaming up, then you don't want to give away too much to a competitor.
But also, you know, it's just against the rules to go into too much detail if you're not on a team.
So when we teamed up, we found out, oh, actually, we're taking very, very similar approaches.
Like I was putting a lot of work into my neural network, as was Chahu.
And then I was putting a lot of work into my gradient boosted trees model, as was Nima.
And so one of the things we had to do really early on was figure out how do we want to
diverge these models so that we don't have just like essentially four models where there's
really only two because two of them agree too much.
So we were looking for ways to make them different.
So how, I mean, how would you make a better model than somebody else?
And what would typically you be doing? What would you and somebody else be doing differently
that might mean that yours is more successful or more accurate or whatever? What are the kind of
inputs into how you work and how you build these things?
Sure. Yeah. I think there's probably three dimensions along which you can make big improvements to your models, right? So you've got like
feature engineering. So if you just come up with better representations of the original features,
maybe it's not the square footage of the house that matters. Maybe it's the price per
square foot, let's say. So just coming up with the clever ratios and things that make the models
better. People spend a lot of time on hyperparameter tuning. So just the small
tweaks that you can do with these models. If you're talking about a tree model, maybe it's
the number of decisions each tree can actually make, those kinds of things.
So that's one where people spend a lot of time and I think maybe too much time in general
on Kaggle versus things like feature engineering.
And then the final area is model blending.
So making multiple models and then figuring out how to best blend them together within an ensemble.
And that can often mean picking different modeling approaches. I think one of the
open secrets of Kaggle is that it's important to have very diverse approaches
and then blend them together, because you get this mixture-of-experts effect where one model can do really well at one part of your data set.
Another model can do really well at another part of your data set.
And if you combine them well, you get this sort of supermodel where they're voting and typically doing better than any one of those models would do.
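A minimal sketch of that blending step might look like the following; the predictions and weights are made up, and real teams tune the weights on a holdout set.

```python
# Blend diverse models so each "expert" covers the others' weak spots.
import numpy as np

nn_preds = np.array([0.021, -0.004, 0.013])   # neural network outputs
gbt_preds = np.array([0.018, -0.001, 0.016])  # gradient boosted trees

# Equal weights here purely for illustration; tune on validation data.
blend = 0.5 * nn_preds + 0.5 * gbt_preds
print(blend)
```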
So do you think, I suppose, just working in this area, is it more of an art or a science?
I mean, it sounds to me like there's a bit of something like blending wines, really.
You know, there's a certain element of personal taste to it.
I mean, is it an art or a science, do you think?
Yeah, I think there's a surprising amount of art.
I'd like to say it's a pure science, but I do think there's quite a bit of experimentation and, I guess, art to it.
Yeah, interesting, interesting.
So I think also you mentioned earlier on about the second round,
you could use external data.
I mean, is that something that you brought into this then really?
I mean, did you use external data and if so, what?
Yeah, I did.
And I actually was a little disappointed that it didn't make a bigger difference.
So some of the stuff that I brought in that we kept were things like census
data. So a lot of interesting stuff came from the census. So things like knowing if a census tract
has a higher percentage of owner-occupied houses. So rental markets can be very different in their
pricing dynamics than places where people own. And that was easy to get from the census but hard to get from other data, so that would be a good census variable, and one I found to be worth the extra effort. So we did have
time limits on how long we could process. And that included the downloading and processing of the
external data itself. So I actually spent probably a week on pulling building permits,
open building permits, from New York City to figure out: has this house been recently renovated?
If someone purchased it for, let's say, $800,000 in Brooklyn, and then they
pumped in $400,000 for a building permit, then I could assume that it's going to be the original
price plus maybe twice what they put in on that building permit. And that did improve my
models for New York, but it was so
much data processing that overall it just wasn't worth it. We didn't feel it would be
a big enough addition to justify maybe not training our neural networks for as long.
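A minimal sketch of that kind of external-data join, with hypothetical columns standing in for the real census fields:

```python
# Attach census-tract owner-occupancy rates to homes before modeling.
import pandas as pd

homes = pd.DataFrame({"home_id": [1, 2], "census_tract": ["A1", "B2"]})
census = pd.DataFrame({
    "census_tract": ["A1", "B2"],
    "pct_owner_occupied": [0.82, 0.41],  # rental-heavy tracts price differently
})

homes = homes.merge(census, on="census_tract", how="left")
print(homes)
```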
Okay. So obviously, you know, you went on to win the prize. I mean, how did that
pan out? Was there a deadline and a kind of race to the end? What
was the end of it like? Or was it more a case of you handed it in and
then got the marks back later on? Was there tension at the end?
Oh yeah. So the deadline was, I think, July 31st of 2018, and we had to make predictions for September and October home sales.
So they essentially asked us for predictions for the actual future, not for some held-out set
that people could maybe find out about and cheat on or something. You actually had to
predict the future. And the race to the finish there was very difficult.
I think the final month was tough on us, because we all wanted to continue to experiment,
but we had to really pull everything together and test it
from every possible angle to make sure we didn't make any mistakes.
Even though we were still, you know, having ideas and wanting, right up to the
last moment, to find some breakthrough.
Okay. And so you won the competition; how close were you to number two? I mean, what do you put the
difference down to, really, in terms of why you won?
Yeah, so closeness is always funny in Kaggle; it's very relative. So I mean, on one hand, we won by a third decimal place, which, you know, isn't a lot.
However, I guess the distance between first and second was over 50 times the distance
between second and third.
So, you know, it's all relative, I guess.
You know, I think it was a convincing win from a Kaggle perspective.
But I mean, try telling a client that
you want to spend ten times as long building their model for a third-decimal-place improvement.
It doesn't necessarily translate to real-world stuff.
Okay, okay. So you won the prize. Tell us about the prize and how that happened;
you were obviously given that by Zillow. Did you go for a presentation, and what happened there?
Sure, yeah. So they did some interviews with us where they came to our houses and just did some human-interest interviews. Then, just a week or so before the actual announcement,
they were doing a second round of interviews,
which they told Nima and me was going to be about our team dynamic, that they wanted to just have us talk about our team.
So they came in just like they had done before,
set up cameras in our houses and stuff.
And then, just after the interview started,
there was a knock at my door, and I figured a neighbor was trying to figure out why I had a camera crew in my house.
But then Stan Humphries burst in with a million-dollar check.
So that was pretty fun.
Wow.
Yeah.
And he's their chief analytics officer, who I recognized, of course, because I'd been following him on Twitter and stuff, hoping for clues about who won or something.
So that was the sort of announcement.
And then we weren't allowed to tell anyone between the time that they did that
and then the actual full announcement, which was a tough few days
because I was still going to work.
Wow.
So trying to talk to clients and stuff right after staring at a giant
million-dollar check, that was a fun time, I guess.
So did you try and cash the check afterwards, going down to the bank with this massive kind of ten-foot check?
I did think that'd be pretty funny, but I haven't done it yet. That would be fun, though.
Excellent. So I mean, this is an interesting exercise, and it's great that you won this. And as I say, you know,
you owe it all to me, really, for the inspiration that I gave you kind of years ago.
Fair enough.
So how does this relate to what you do in a day job?
I mean, maybe tell people what it is you do for a day job now.
And what I'm interested in as well is elements of what you've been doing that you could apply to more day-to-day industry type questions, really.
Sure.
Yeah. So, I mean, lately, actually, just after submitting the Zillow Prize
submission that we did, about two weeks later, I started working at DataRobot,
and it's an automated machine learning company. So I work there as a customer facing data scientist,
and Gartner called the role data science concierge, which I like a lot.
I think that really kind of explains it well.
Most of the time, because the product is automated machine learning, the work where I'm doing some feature engineering and trying to match that with the particular predictive model type, and then hyperparameter tuning and all that kind of stuff, is all fully automated by the tool.
So I, as a data scientist there, get to kind of step back and talk with our customers more about how should they structure their data to make it work for a supervised learning problem like that?
What is
the right target that they should choose? Like what should they be trying to predict? And then
how can they operationalize those results and turn it into either cost savings or profit?
Okay, okay. And how does, even more mundane really, how would somebody use the output of
what you're doing in a kind of analytics tool, like say Looker, for example? I mean,
not to be precise on that one, but how do these things then get productionized or pushed
out as insights to people, you know, in a more kind of easy to understand way?
Sure. So, you know, some of our customers are banks. So pretty much anytime you swipe your
credit card, there's a predictive model behind the scenes asking: is this fraud or not? You know, is this person both here and halfway across the world trying to make a credit card
transaction?
That's the kind of things that those would pick up on.
And, you know, we build models like that.
So that credit card machine would hit a DataRobot API, let's say.
The API would say, this does not look like fraud.
And so then the
credit card goes through. That's a pretty simple example of the kinds of things that we do.
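A generic sketch of how a point-of-sale system might call a hosted scoring service is below; the endpoint, payload, and response fields are entirely hypothetical and are not DataRobot's actual API.

```python
# Hypothetical real-time fraud check against a hosted scoring API.
import requests

transaction = {"amount": 42.50, "country": "GB", "card_present": False}

resp = requests.post(
    "https://scoring.example.com/predict",  # made-up endpoint
    json=transaction,
    timeout=1.0,  # the check must answer before the card clears
)

# Assumed response shape: {"fraud_probability": 0.02}
if resp.json().get("fraud_probability", 0.0) < 0.5:
    print("approve transaction")
else:
    print("flag for review")
```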
I think with Looker, you might have, let's say, an insurance client might be examining
a bunch of potential leads of who they want to bring on as a customer,
and how expensive should they price the insurance
for any particular customer.
So they might have a predictive model behind the scenes
that's telling them, you know,
how likely is this person to file a claim?
Maybe another one that's,
how much would that claim cost if they filed it?
And then they would maybe look at Looker
and just do some visual analytics
there and pick the customers who would support the appropriate price with the least amount of
risk for them. Okay. Okay. And I understand Jen Underwood, who's also been on the show in the past,
has joined DataRobot recently. What was she doing there? And what's the kind of the
area that maybe she's going to try and build out and work with with you and other customers and so on yeah yeah it's really
exciting to work with her because uh i think we have similar backgrounds and similar interests
so she's uh she's um leading the charge at data robot to um help us build a uh like a community
of capable and motivated business intelligence professionals.
It's kind of a mouthful there, but, you know, it's essentially people who are already,
you know, 80% of the way to data science, right? Like if 80% of data science is cleaning data,
all BI professionals are already there, right? So it's that final 20% that DataRobot automates. So we're looking to
make it easy for people to make that jump and start building predictive models right away using
the skills they already have. So we're building integrations with things like Tableau and Qlik,
Power BI, Alteryx, if you're on the data engineering side. And she's very involved in that, trying to make sure that
we're addressing the right problems and making it very easy for people to make
that transition.
Okay, okay. Just to wrap up then, really: are there
any other competitions you're going for, or any other things that you're playing around with or
looking at in this area that would be interesting?
Well, you know, I started with DataRobot
right after the Zillow Prize,
so I've only recently felt like I've had spare time.
So, you know, no competitions yet.
I was thinking I would retire from Kaggle
when I found out we won.
I'm certainly not going to top this,
but if I get two more gold medals, I'll be
a Kaggle Grandmaster, which is their top rank.
Wow. Wow.
And so I'm thinking, you know, if I see a couple
more that look interesting, I'll probably jump back in. But right now I'm definitely
mostly focused on just playing music and making mixed drinks again, that kind of thing.
Well, exactly. I was about to say, hopefully at some point you'll resurrect your cocktail maker
from Christmas that year.
Just tell people what that was at the time.
Sure, yeah.
I made a bar optimizer.
You can tell it the bottles of liquor you have in your liquor cabinet,
and it'll tell you what's the next one to get to make the most new drinks. And yeah, I absolutely resurrected that recently and
restocked my bar, put about seven new bottles in, and it's just an endless
list of cocktails that I'm suffering my way through. It's terrible.
Excellent, that's good.
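For fun, here is a minimal sketch of that bar-optimizer idea, greedily picking the bottle that unlocks the most new cocktails; the recipes are toy data, not the actual app's database.

```python
# Pick the next bottle that makes the most new drinks possible.
owned = {"gin", "vermouth"}
recipes = {
    "martini": {"gin", "vermouth"},
    "negroni": {"gin", "vermouth", "campari"},
    "daiquiri": {"rum", "lime", "sugar"},
}

candidates = set().union(*recipes.values()) - owned

def new_drinks(bottle):
    # Count recipes that become makeable only once this bottle is added.
    have = owned | {bottle}
    return sum(
        ings <= have and not ings <= owned
        for ings in recipes.values()
    )

print("buy next:", max(candidates, key=new_drinks))
```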
Well, Jordan, it's been great speaking to you, and congratulations on winning that
prize. I mean, obviously I can't think of anyone more deserving, really, for getting that. And it's been great having you on to talk about
how you won it, a bit of a layman's intro, really, and
an update on DataRobot. So thank you very much for that; it's been great to have you with us.
Thanks, it was great to talk to you.
Excellent. Cheers.
Cheers.