Drill to Detail - Drill to Detail Ep.61 'Kaggle, DataRobot and How to Be a ... Zillionaire!' With Special Guest Jordan Meyer
Episode Date: March 20, 2019
Mark Rittman is joined by DataRobot's Jordan Meyer to discuss Kaggle, machine learning, deep neural networks and his team's strategy to win the $1m Zillow Prize, beating over 1,000 other teams to come up with the most accurate home value prediction.
Zillow Prize: Zillow's Home Value Prediction (Zestimate) | Kaggle
And the Zillow Prize Goes to… Team ChaNJestimate!
Meet the 'Zillow Prize' winners who get $1M and bragging rights for beating the Zestimate
https://github.com/jordanmeyer
Data Scientist Spotlight: Jordan Meyer
Deep Learning definition on Wikipedia
ABC - How To Be A Zillionaire (Wall Street Mix)
Transcript
[Intro music: ABC - 'How To Be A Zillionaire' (Wall Street Mix)]
So hello and welcome to the second episode
in this new season of the Drill to Detail podcast
and I'm your host, Mark Rittman.
My guest on the show hit the headlines recently
when his team won the Zillow Prize and a million dollars.
And he's actually an ex-colleague of mine,
and he's come on the show to tell you how he owes it all to his previous CTO at that company.
Welcome to the show, Jordan Meyer.
Hey, thanks for having me.
So Jordan, just explain who you are, what you're doing now in terms of work,
and how we knew each other from the past.
Sure, yeah.
So I'm Jordan Meyer.
I'm a data scientist.
I'm currently a data scientist at DataRobot.
It's an automated machine learning company.
I came by way of Rittman Mead, where I was previously a consultant, then head of
R&D, and finally CTO.
So yeah, looking forward to chatting with you.
Excellent.
Well, it's good to have you on the show, Jordan.
So actually, we worked for quite a while together a few years ago.
And I was CTO at the time, and you were our kind of data scientist.
You did some interesting things there with us.
You built quite a few interesting demos and did some things at conferences.
Maybe just tell people what sort of things you used to do back in your consulting career.
Sure, yeah.
You know, I think probably one of the first ones I did at Rittman Mead was when I was getting interested in social network analysis.
So I was looking for ways to present at Oracle conferences, but not necessarily use Oracle technologies, do the things I was more
interested in. And so I just kind of wedged it in there by doing a social network analysis of
Oracle ACEs and Oracle ACE Directors. Like, you were an ACE Director, or still are,
and it's a badge of honor, essentially. So I analyzed the Twitter network, all the people who were following and reverse-following each other, and computed different measures of centrality, like PageRank. That's how Google originally figured
out what was an important web page: by how central the other pages that linked to it were. I ran very
similar analyses and essentially tried to make a predictive model of whether someone was
going to be an ACE or an ACE Director. And it had really high accuracy because, of course,
you know, the more central you were in this Oracle network, the more likely you were to get that badge.
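Here is a minimal sketch of that kind of centrality analysis using networkx; the accounts and edges are made-up toy data, not the actual ACE network.

```python
# A toy version of the analysis described above: score accounts in a
# follower graph by PageRank-style centrality. Names are hypothetical.
import networkx as nx

# Directed graph: an edge (a, b) means account a follows account b.
g = nx.DiGraph()
g.add_edges_from([
    ("alice", "bob"), ("carol", "bob"),
    ("bob", "alice"), ("dave", "alice"),
])

# PageRank weights a node by how central its followers are, the same
# idea Google used for web pages.
scores = nx.pagerank(g, alpha=0.85)
print(sorted(scores.items(), key=lambda kv: -kv[1]))

# These centrality scores could then become features in a classifier
# predicting ACE / ACE Director status, as described in the episode.
```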
Okay, okay.
I was going to bring up two things from around that time.
One was that you never used the slide deck I gave you;
you always managed to use your own one at the time, which was funny.
The other was that you were always able to explain these concepts,
concepts around machine learning and AI and so on,
in a way that didn't make you feel stupid.
You know, you were always very good at doing this
in a way that was good for the layman and so on.
And you've gone on now to work for DataRobot.
Is that right?
That's right, yep.
Okay, so maybe at the end of the talk,
we'll kind of go through some of the things you're doing there
and just get an idea of, I suppose, how you do those sorts of things now work-wise.
But the thing that really struck me was I was looking on Twitter a while ago,
and I saw that you won what's called the Zillow Prize with the team.
So tell us a bit, what is the Zillow Prize, first of all,
and what's it kind of about and what's the kind of point to it, really?
Sure. So Zillow's mostly in the US.
It's a website that allows you to look at pretty much any house in the country and see if it's on sale.
You can see the MLS listing, the sale price, or the asking price, that kind of
thing. And even if it's not on sale, it will actually estimate the value of a home. So when
it first came out, pretty much everyone I knew got on the website, looking at the house they grew up in, their neighbors' houses, their own houses, just to see the estimate.
And they called it the Zestimate, which is an estimate of what your house would sell for
if it sold today, for any house in the country.
So this prize was, it's been five or six years, I guess,
since they released that, maybe longer.
And, similar to how the Netflix Prize looked to make recommendations better, they wanted to see if the data science community could contribute and make the Zestimate better.
So when the Zestimate first started, I think it was off by, on average, maybe more than 10%.
And it's gotten down to about 4.5%.
And then through this competition,
our winning team got it down to just under 4% on average. So they were, you know, really looking to
minimize the error of all of these home estimates they're making.
Okay, so it was organized through Kaggle, is that correct? Tell us what Kaggle is and
the background to that.
Sure. Yeah. So Kaggle's a machine learning competition website.
So companies will go on with similar kinds of challenges like this.
Maybe they're looking to predict customer churn, and they'll put their data sets, make
it available to the Kaggle community, and the Kagglers will compete, trying to make
a predictive model that will best predict which customers are
going to churn. And they're ranked according to these metrics that are very specific. And
often the distance between the first and the 10th place will be something like 0.0001 or something
like that. So it's a hyper competitive place where people try to beat each other at
making the best machine learning models.
Okay, okay. And I remember back in the past that we used Kaggle then. I mean, is that part of how you learned
how to do the things you do now? Is that part of, I suppose, how you
better yourself and learn new kinds of techniques and so on?
A little bit, you know, so actually I joined eight years ago,
and the first time I competed was the Zillow Prize.
So round one actually was my first Kaggle competition
where I submitted something, but I have watched it very closely.
So I would, at the beginning of any interesting competition,
have a look at the data, look at the people competing, read the discussions and so forth.
And then at the end, often the winners will reveal the tricks that they use to win.
So I paid a lot of attention to that over the years and have tried to incorporate it in my work.
But I actually wasn't super interested in competing because I felt like I lived in a bit of a middle ground. Like I see Kaggle as being great for people trying to learn and great for the best Kagglers,
the best machine learning people in the world, because one of them's going to learn something
and the other one's going to make money. And anyone who's in between, you know, probably
isn't going to benefit that much from it. So I figured, if I'm consulting and making money
on this, why would I try my hand at just the chance of maybe making money at it?
Okay, okay. So we'll get into the competition in
a bit, but tell us a bit about your background before we met,
and how you got into the world of this sort of thing. So stats and machine
learning and so on, what was your kind of route into this?
Sure. So as an undergraduate, I was interested in neural networks and genetic algorithms and
those kinds of things, as a computer science major. And when I got out of school, I realized that
business intelligence and analytics seemed to be the growth market. So I went to work at a
couple of different universities. The first one was UNC
Chapel Hill. And it was a really great place to start my career because they actually allow you
as an employee to take a couple of classes each semester for free. And you don't really have any
restrictions. So I was able to take graduate courses in operations research, statistics,
information science, those kinds of things,
all the while making a paycheck and getting real-world experience with
the actual analytics challenges of the university. So it was a good place for me to learn data
engineering and, I guess, data science at the same time.
Okay, okay. And didn't you work for a startup
at some point in between? You actually worked at
my old company a couple of times, but you worked at a startup for a while, didn't you as
well?
I did, yeah. So right after my first stint with Rittman Mead, where I worked as a consultant,
I spent a couple of years at Slacker Radio, where I worked as a data
scientist, building predictive models for the
user funnel: will this user convert from a free listener to a paid listener, and if they are
a paid listener, will they stop? Those types of predictive models, as well as things like
music recommendations and how we categorize our music, those types of problems.
Okay. I remember at the time as well that when I first met you,
there was a lot of talk about Hadoop, for example.
And I think Hadoop and this kind of world of machine learning
and so on were very kind of synonymous.
But I think I remember speaking to you at the time,
and although it's important, a lot of the work you did
was on quite small data sets,
and it's not all about doing it on massive ones.
That was a surprise to me
and how much of the work was involved
in just tidying the data as well.
Yeah, you know, I've always seen Hadoop
as kind of an engineering problem
and not necessarily a data science one.
You know, it's something that I hoped
would be abstracted away.
And I think we've gotten lucky and it has been.
We don't have to write MapReduce directly now; we can abstract that out.
Things like Spark will take care of that for you.
So you don't need to do that anymore.
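To make that concrete, here's a minimal sketch of the kind of aggregation that once needed a hand-written MapReduce job; the data and column names are made up for illustration.

```python
# A MapReduce-style grouping expressed in Spark's DataFrame API: no
# custom mapper or reducer code. Data here is a hypothetical example.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("sketch").getOrCreate()

df = spark.createDataFrame(
    [("west", 100.0), ("east", 250.0), ("west", 75.0)],
    ["region", "amount"],
)

# In classic Hadoop this would be a mapper emitting (region, amount)
# pairs and a reducer summing them; Spark plans all of that for you.
df.groupBy("region").agg(F.sum("amount").alias("total")).show()
```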
And what's your current, I mean, apart from DataRobot,
what's your current toolkit of choice
with these kind of competitions
and generally doing this kind of work?
What would you tend to use these days?
So professionally, I've used R consistently.
So pretty much anytime I'm doing anything work-wise, I'll use R.
And then in Kaggle, I actually tried to take the opportunity to learn Python by doing the Kaggle competitions in Python.
So if I was doing maybe something with deep learning for a client, I probably would go into Python, but it's almost always R.
Okay, okay.
All right, let's get into the competition then.
So, you know, there were 4,000,
is it 4,000 other teams or competitors
that were taking part?
And I think the original entry from your team
was from your two colleagues or two team members, wasn't it?
And you joined them afterwards.
I mean, how did that work?
And what were they doing first of all, really?
And what was their, I suppose,
first attempt at this kind of problem?
Yeah, so there were two rounds to the Zillow Prize. In the first round, I had a different teammate, and the two people that I ended up teaming up with in the second round
were teamed up with each other in the first round. So we sort of started as two teams. And yeah, in the first
round there were almost 4,000 teams. And I think they said
from like 91 different countries, lots of people competing. The first round was specifically to
predict where the Zestimate is wrong. So not predicting the home price directly, but predicting
the error that the Zestimate would have, which
is an interesting formulation of the problem.
And you weren't allowed to bring in external data.
And it was very much closer to what I would consider a standard Kaggle competition, where
the differences in places were very, very, very small.
There's only so much predictive power you can pull out of the smaller set of data.
And the second round, though, you were allowed to bring in any data you wanted. There were production requirements on the models that you delivered.
So they couldn't run past a certain amount of time on commodity hardware.
They had to be fully reproducible in a Docker container and that kind of thing.
And knowing that that's what the second round would be is why I
got interested in this particular Kaggle competition. Because I thought, you know,
if I could just survive round one, get past the standard Kaggle part, then I could use my
more professional experience with pulling in disparate data sources and trying to bring them
all together to build a model. I thought that would give me a leg up in round two.
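As a rough sketch of that round-one formulation, modeling the estimate's error rather than the price itself, something like the following is plausible; the file and column names here are hypothetical, not the competition's actual schema.

```python
# Round-one style target: how far off the Zestimate was, in log space,
# rather than the sale price itself. Columns are hypothetical.
import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor

homes = pd.read_csv("training_homes.csv")  # hypothetical training file

# Log error: positive when the estimate was too high.
y = np.log(homes["zestimate"]) - np.log(homes["sale_price"])
X = homes.drop(columns=["zestimate", "sale_price"])

# Gradient boosted trees fit this residual signal directly.
model = LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(X, y)
```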
Okay. Okay. So just as, I mean, for the layman really, or someone who is new to this, I mean,
why is it tricky to try and predict house prices? I mean, why is it not just a case of looking at
the other sales in that area and just kind of adding something to it? Or why is it a hard problem to solve, really?
Yeah, it's a great question.
You know, there's a few answers, I guess.
The first answer is there's always going to be inherent uncertainty in a house price.
So, for example, if a house reminds a buyer of their childhood home, let's say, they might
pay more for that house or be willing to pay more for
that house. That's one area where you'll just never be able to get the exact price because
there's so many sort of human elements involved, but you can get very close if that was the only
problem. The other problem is that for Zillow, they're trying to predict every house in the country, not just the
ones on sale. So there are plenty of houses in more rural areas where there's not a lot of data
about them. You just know the lot size is roughly an acre and there's a house on there that was
built 50 years ago. And there's a lot of uncertainty in that. It wouldn't necessarily compare to the houses around it if it's either falling apart or recently renovated.
So there ends up being a lot that you have to kind of impute to understand what's going to make that house easier to predict.
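A minimal sketch of the kind of imputation implied here, assuming made-up columns rather than Zillow's real fields:

```python
# Filling in sparse rural records before modeling; columns hypothetical.
import pandas as pd
from sklearn.impute import SimpleImputer

homes = pd.DataFrame({
    "lot_acres": [1.0, None, 0.25],
    "year_built": [1970, 1995, None],
})

# Median imputation is a common, simple baseline for gaps like these.
imputer = SimpleImputer(strategy="median")
filled = pd.DataFrame(imputer.fit_transform(homes), columns=homes.columns)
print(filled)
```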
Okay.
So I understand that your speciality in this was deep neural networks.
And that's the kind of thing that you brought to this.
And there was external data and so on.
Again, for the layman, just explain what deep neural networks are.
And I suppose in a way, why also is it an area of interest to you?
Sure.
Yeah.
So deep neural networks are getting a lot of press lately for advancements they're making in a lot of cool areas, but not necessarily getting press for how they're helpful in business settings.
So you'll hear about deep neural networks being able to outpredict doctors at spotting lung cancer, for example.
Recently, I think a deep neural network beat the professional players at StarCraft.
And of course, it beat Go a couple of years ago.
So we hear a lot about that kind of stuff.
But what interests me about them is how can we take all of this cutting edge, really interesting research by these superstars at Google and Facebook and apply that to the types of problems that people like you and I solve?
Customer churn,
forecasting sales, that kind of thing. And it turns out that you can reformulate a lot of normal problems into problems that these types of things can solve. So deep neural networks
currently hold the record for the best translation. If you look at Google Translate now compared to,
let's say, five years
ago, it's just remarkably better. And that's because it's powered by these deep neural
networks. But behind the scenes, that sort of sequence processing as it reads the sentence in
order to spit out another sentence is a technique you can use for forecasting grocery store sales, for example. So I did a Kaggle competition called Corporación Favorita that was predicting item-level sales.
And I used techniques from Google that were actually for text-to-speech.
So there's something called a WaveNet where it will read the historical speech, and then you can give it words, and it will speak in that voice.
If you wanted your Google Home to sound like C-3PO or something, you can feed it enough information, and it learns the phonetics.
That particular architecture actually worked incredibly well
at predicting grocery store sales. So in that competition, with just that one model, I got
10th place. And the person who got fourth place used one. And first place, I think, also used
that same architecture. It was one that the person in fourth place kind of popularized,
and we all thought, oh, that's a great idea, let's try it.
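To give a feel for that family of models, here is a minimal Keras sketch of stacked dilated causal convolutions reading a sales history; it illustrates the WaveNet-style idea only, not the team's actual architecture, and the window length is an assumption.

```python
# Dilated causal convolutions over a daily sales series, WaveNet-style.
# Each layer doubles its dilation, so the top layer sees far back in time.
import tensorflow as tf

window = 128  # days of history per training example (assumed)

inputs = tf.keras.Input(shape=(window, 1))
x = inputs
for dilation in (1, 2, 4, 8, 16):
    x = tf.keras.layers.Conv1D(
        filters=32, kernel_size=2, padding="causal",
        dilation_rate=dilation, activation="relu",
    )(x)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(1)(x)  # next-day sales forecast

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mae")
```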
Okay, so try this again: explain in layman's terms, what is a deep neural
network? I mean, what is it compared to, say, what people's main experience with
machine learning is? How is this different, and why is it particularly
useful and valuable and so on? But first, really: what is it?
Yeah, well, it's very much similar to a linear regression,
where if you're familiar with machine learning,
you know, it's taking lots of predictors.
If we're talking about, you know, home sales,
it would be each one of the things you might know about a home would be a
variable that this would take. And then it's going to assign to each variable some weight, some sort
of importance, and then add all that up to predict sale price. So a single
layer of a neural network is essentially just a regression. But as you start to stack these,
they can learn abstractions. So instead of just having one output that could be a price,
you could have one layer that's your initial internal features, and then at the top, some
price that you want it to predict, and in between, lots and lots of layers, each layer just being a bunch of calculations. What generally happens with these is they
learn more and more abstract representations as you move up. I think a good example is
for image recognition. If you look inside these neural networks, the original layer will see the edges.
So the neurons that are in this neural network will activate when they see a particular edge,
maybe an upright or vertical edge, maybe a diagonal edge or something like that.
And then the next layer will actually activate when
there's a few of those. So it'll start to see squares and triangles and small shapes. And as
you move up the network, it gets more and more abstract until at the top of the network,
a single neuron in this network may only light up when it sees a cat's face because it's taking as an input this giant tree of activations that
are all getting excited when it sees a whisker, let's say.
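Here's a minimal sketch of that stacking idea in Keras; the input size and layer widths are illustrative assumptions, not anyone's production model.

```python
# One Dense layer is essentially a (generalized) regression; stacking
# layers lets the network learn increasingly abstract features.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                   # 20 raw home attributes
    tf.keras.layers.Dense(64, activation="relu"),  # low-level combinations
    tf.keras.layers.Dense(32, activation="relu"),  # more abstract features
    tf.keras.layers.Dense(1),                      # predicted sale price
])
model.compile(optimizer="adam", loss="mse")
```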
Okay. So is this something that's been possible to do for a while, or has it been recent advances in computing
and so on that have meant this is now possible?
You know, I think the original neural networks predate some of the things like boosted trees. I think the perceptron was maybe the 1950s.
But yeah, neural networks have been around. I did neural networks in college. So
in 2001, I tried to make a neural network play Go on a very small Go board where you already knew
the answer. So if there's a Go board where there's an obvious
next move, I was trying to get a neural network to find that, and then take a step back, look at a
larger board, and see if it could translate that knowledge of perfect moves. So this was, you know,
almost 20 years ago now. But back then, they just weren't as good. We didn't have GPUs. We didn't
have some of these modern enhancements that we've had to them.
So probably just in the past eight years or so, they've really started to take over again and be the go-to for cutting edge accuracy.
Okay.
So what kind of kit or what kind of services did you use then to do this particular thing?
Did you use some cloud services or something?
How did you get access to this kind of compute power?
So I just have two NVIDIA Titan X GPUs. They're really good gaming GPUs that people are now repurposing as deep neural
network and Bitcoin mining machines.
Okay, excellent. And so how did this deep neural network stuff link with the rest of your team? How did what you were doing link to their solution and complement that?
So I was doing two models at the time, one neural network and one gradient boosted trees model, which is just a way of doing an ensemble
of decision trees. And the other two were actually doing the same: one of them was focusing
entirely on a neural network and the other was focusing entirely on a boosted trees model.
And we didn't know that as we were starting to talk about teaming up, we just both were in the
top five of the leaderboard. So we were considering teaming up together, and you can't
really share much, because if you don't end up teaming up, then you don't want to give away too much to a competitor.
But also, you know, it's just against the rules to go into too much detail if you're not on a team.
So when we teamed up, we found out, oh, actually, we're taking very, very similar approaches.
Like I was putting a lot of work into my neural network, as was Chahu.
And then I was putting a lot of work into my gradient boosted trees model, as was Nima.
And so one of the things we had to do really early on was figure out how do we want to
diverge these models so that we don't have just like essentially four models where there's
really only two because two of them agree too much.
So we were looking for ways to make them different.
So how, I mean, how would you make a better model than somebody else?
And what would typically you be doing? What would you and somebody else be doing differently
that might mean that yours is more successful or more accurate or whatever? What are the kind of
inputs into how you work and how you build these things?
Sure. Yeah. I think there's probably three dimensions along which you can make big improvements to your models, right? So you've got like
feature engineering. So if you just come up with better representations of the original features,
maybe it's not the square footage of the house that matters. Maybe it's the price per
square foot, let's say. So just coming up with the clever ratios and things that make the models
better. People spend a lot of time on hyperparameter tuning. So just the small
tweaks that you can do with these models. If you're talking about a tree model, maybe it's
the number of decisions each tree can actually make, those kinds of things.
So that's one where people spend a lot of time and I think maybe too much time in general
on Kaggle versus things like feature engineering.
And then the final area is model blending.
So making multiple models and then figuring out how to best blend them together within an ensemble.
And that can often mean picking different modeling approaches. I think one of the
open secrets of Kaggle is that it's important to have very diverse approaches
and then blend them together, because you get this mixture-of-experts effect where one model can do really well at one part of your data set.
Another model can do really well at another part of your data set.
And if you combine them well, you get this sort of supermodel where they're voting and typically doing better than any one of those models would do.
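A minimal sketch of that blending step might look like the following; the predictions and weights are made up, and real teams tune the weights on a holdout set.

```python
# Blend diverse models so each "expert" covers the others' weak spots.
import numpy as np

nn_preds = np.array([0.021, -0.004, 0.013])   # neural network outputs
gbt_preds = np.array([0.018, -0.001, 0.016])  # gradient boosted trees

# Equal weights here purely for illustration; tune on validation data.
blend = 0.5 * nn_preds + 0.5 * gbt_preds
print(blend)
```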
So do you think, I suppose, just working in this area, is it more of an art or a science?
I mean, it sounds to me like there's a bit of something like blending wines, really.
You know, there's a certain element of personal taste to it.
I mean, is it an art or a science, do you think?
Yeah, I think there's a surprising amount of art.
I'd like to say it's a pure science, but I do think there's quite a bit of experimentation and, I guess, art to it.
Yeah, interesting, interesting.
So I think also you mentioned earlier on about the second round,
you could use external data.
I mean, is that something that you brought into this then really?
I mean, did you use external data and if so, what?
Yeah, I did.
And I actually was a little disappointed that it didn't make a bigger difference.
So some of the stuff that I brought in that we kept were things like census
data. So a lot of interesting stuff came from the census. So things like knowing if a census tract
has a higher percentage of owner-occupied houses. So rental markets can be very different in their
pricing dynamics than places where people own. And that was easy to get from the census but hard to get from other data, so that would be a good census variable, and one I found to be worth the extra effort. So we did have
time limits on how long we could process. And that included the downloading and processing of the
external data itself. So I actually spent probably a week on pulling building permits,
open building permits, from New York City to figure out: has this house been recently renovated?
If someone purchased it for, let's say, $800,000 in Brooklyn, and then they
pumped in $400,000 for a building permit, then I could assume that it's going to be the original
price plus maybe twice what they put in on that building permit. And that did improve my
models for New York, but it was so
much data processing that overall it just wasn't worth it. We didn't feel it would be
a big enough addition to justify maybe not training our neural networks for as long.
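A minimal sketch of that kind of external-data join, with hypothetical columns standing in for the real census fields:

```python
# Attach census-tract owner-occupancy rates to homes before modeling.
import pandas as pd

homes = pd.DataFrame({"home_id": [1, 2], "census_tract": ["A1", "B2"]})
census = pd.DataFrame({
    "census_tract": ["A1", "B2"],
    "pct_owner_occupied": [0.82, 0.41],  # rental-heavy tracts price differently
})

homes = homes.merge(census, on="census_tract", how="left")
print(homes)
```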
Okay. So obviously, you know, you went on to win the prize. I mean, how did that
pan out? Was there a deadline and a kind of race to the end? What
was the end of it like? Or was it more a case of you handed it in and
then got the marks back later on? Was there tension at the end?
Oh yeah. So the deadline was, I think, July 31st of 2018, and we had to make predictions for September and October home sales.
So they essentially asked us for predictions for the actual future, not for some held-out set
that people could maybe find out about and cheat on or something. You actually had to
predict the future. And the race to the finish there was very difficult.
I think the final month was tough on us, because we all wanted to continue to experiment,
but we had to really pull everything together and test it
from every possible angle to make sure we didn't make any mistakes.
Even though we were still, you know, having ideas and wanting, right up to the
last moment, to find some breakthrough.
Okay. And so you won the competition; how close were you to number two? I mean, what do you put the
difference down to, really, in terms of why you won?
Yeah, so closeness is always funny in Kaggle; it's very relative. So I mean, on one hand, we won by a third decimal place, which, you know, isn't a lot.
However, I guess the distance between first and second was over 50 times the distance
between second and third.
So, you know, it's all relative, I guess.
You know, I think it was a convincing win from a Kaggle perspective.
But I mean, try telling a client that
you want to spend ten times as long building their model for a third-decimal-place improvement.
It doesn't necessarily translate to real-world stuff.
Okay, okay. So you won the prize. Tell us about the prize and how that happened;
you were obviously given that by Zillow. Did you go for a presentation, and what happened there?
Sure, yeah. So they did some interviews with us where they came to our houses and just did some human-interest interviews. Then, just a week or so before the actual announcement,
they were doing a second round of interviews,
which they told Nima and me was going to be about our team dynamic, that they wanted to just have us talk about our team.
So they came in just like they had done before,
set up cameras in our houses and stuff.
And then, just after the interview started,
there was a knock at my door, and I figured a neighbor was trying to figure out why I had a camera crew in my house.
But then Stan Humphries burst in with a million-dollar check.
So that was pretty fun.
Wow.
Yeah.
And he's their chief analytics officer, who I recognized, of course, because I'd been following him on Twitter and stuff, hoping for clues about who won or something.
So that was the sort of announcement.
And then we weren't allowed to tell anyone between the time that they did that
and then the actual full announcement, which was a tough few days
because I was still going to work.
Wow.
So trying to talk to clients and stuff right after staring at a giant
million-dollar check, that was a fun time, I guess.
So did you try and cash the check afterwards, going down to the bank with this massive kind of ten-foot check?
I did think that'd be pretty funny, but I haven't done it yet. That would be fun, though.
Excellent. So I mean, this is an interesting exercise, and it's great that you won this. And as I say, you know,
you owe it all to me, really, for the inspiration that I gave you kind of years ago.
Fair enough.
So how does this relate to what you do in a day job?
I mean, maybe tell people what it is you do for a day job now.
And what I'm interested in as well is elements of what you've been doing that you could apply to more day-to-day industry type questions, really.
Sure.
Yeah. So, I mean, lately, actually, just after submitting the Zillow Prize
submission that we did, about two weeks later, I started working at DataRobot,
and it's an automated machine learning company. So I work there as a customer facing data scientist,
and Gartner called the role data science concierge, which I like a lot.
I think that really kind of explains it well.
Most of the time, because the product is automated machine learning, the work where I'm doing some feature engineering and trying to match that with the particular predictive model type, and then hyperparameter tuning and all that kind of stuff, is all fully automated by the tool.
So I, as a data scientist there, get to kind of step back and talk with our customers more about how should they structure their data to make it work for a supervised learning problem like that?
What is
the right target that they should choose? Like what should they be trying to predict? And then
how can they operationalize those results and turn it into either cost savings or profit?
Okay, okay. And how does, even more mundane really, how would somebody use the output of
what you're doing in a kind of analytics tool, like say Looker, for example? I mean,
not to be precise on that one, but how do these things then get productionized or pushed
out as insights to people, you know, in a more kind of easy to understand way?
Sure. So, you know, some of our customers are banks. So pretty much anytime you swipe your
credit card, there's a predictive model behind the scenes asking: is this fraud or not? You know, is this person both here and halfway across the world trying to make a credit card
transaction?
That's the kind of things that those would pick up on.
And, you know, we build models like that.
So that credit card machine would hit a DataRobot API, let's say.
The API would say, this does not look like fraud.
And so then the
credit card goes through. That's a pretty simple example of the kinds of things that we do.
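A generic sketch of how a point-of-sale system might call a hosted scoring service is below; the endpoint, payload, and response fields are entirely hypothetical and are not DataRobot's actual API.

```python
# Hypothetical real-time fraud check against a hosted scoring API.
import requests

transaction = {"amount": 42.50, "country": "GB", "card_present": False}

resp = requests.post(
    "https://scoring.example.com/predict",  # made-up endpoint
    json=transaction,
    timeout=1.0,  # the check must answer before the card clears
)

# Assumed response shape: {"fraud_probability": 0.02}
if resp.json().get("fraud_probability", 0.0) < 0.5:
    print("approve transaction")
else:
    print("flag for review")
```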
I think with Looker, you might have, let's say, an insurance client might be examining
a bunch of potential leads of who they want to bring on as a customer,
and how expensive should they price the insurance
for any particular customer.
So they might have a predictive model behind the scenes
that's telling them, you know,
how likely is this person to file a claim?
Maybe another one that's,
how much would that claim cost if they filed it?
And then they would maybe look at Looker
and just do some visual analytics
there and pick the customers who would support the appropriate price with the least amount of
risk for them. Okay. Okay. And I understand Jen Underwood, who's also been on the show in the past,
has joined DataRobot recently. What was she doing there? And what's the kind of the
area that maybe she's going to try and build out and work with with you and other customers and so on yeah yeah it's really
exciting to work with her because uh i think we have similar backgrounds and similar interests
so she's uh she's um leading the charge at data robot to um help us build a uh like a community
of capable and motivated business intelligence professionals.
It's kind of a mouthful there, but, you know, it's essentially people who are already,
you know, 80% of the way to data science, right? Like if 80% of data science is cleaning data,
all BI professionals are already there, right? So it's that final 20% that DataRobot automates. So we're looking to
make it easy for people to make that jump and start building predictive models right away using
the skills they already have. So we're building integrations with things like Tableau and Qlik,
Power BI, Alteryx, if you're on the data engineering side. And she's very involved in that, trying to make sure that
we're addressing the right problems and making it very easy for people to make
that transition.
Okay, okay. Just to wrap up then, really: are there
any other competitions you're going for, or any other things that you're playing around with or
looking at in this area that would be interesting?
Well, you know, I started with DataRobot
right after the Zillow Prize,
so I've only recently felt like I've had spare time.
So, you know, no competitions yet.
I was thinking I would retire from Kaggle
when I found out we won.
I'm certainly not going to top this,
but if I get two more gold medals, I'll be
a Kaggle Grandmaster, which is their top rank.
Wow. Wow.
And so I'm thinking, you know, if I see a couple
more that look interesting, I'll probably jump back in. But right now I'm definitely
mostly focused on just playing music and making mixed drinks again, that kind of thing.
Well, exactly. I was about to say, hopefully at some point you'll resurrect your cocktail maker
from Christmas that year.
Just tell people what that was at the time.
Sure, yeah.
I made a bar optimizer.
You can tell it the bottles of liquor you have in your liquor cabinet,
and it'll tell you what's the next one to get to make the most new drinks. And yeah, I absolutely resurrected that recently and
restocked my bar, put about seven new bottles in, and it's just an endless
list of cocktails that I'm suffering my way through. It's terrible.
Excellent, that's good.
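For fun, here is a minimal sketch of that bar-optimizer idea, greedily picking the bottle that unlocks the most new cocktails; the recipes are toy data, not the actual app's database.

```python
# Pick the next bottle that makes the most new drinks possible.
owned = {"gin", "vermouth"}
recipes = {
    "martini": {"gin", "vermouth"},
    "negroni": {"gin", "vermouth", "campari"},
    "daiquiri": {"rum", "lime", "sugar"},
}

candidates = set().union(*recipes.values()) - owned

def new_drinks(bottle):
    # Count recipes that become makeable only once this bottle is added.
    have = owned | {bottle}
    return sum(
        ings <= have and not ings <= owned
        for ings in recipes.values()
    )

print("buy next:", max(candidates, key=new_drinks))
```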
Well, Jordan, it's been great speaking to you, and congratulations on winning that
prize. I mean, obviously I can't think of anyone more deserving, really, for getting that. And it's been great having you on to talk about
how you won it, a bit of a layman's intro, really, and
an update on DataRobot. So thank you very much for that; it's been great to have you with us.
Thanks, it was great to talk to you.
Excellent. Cheers.
Cheers.