a16z Podcast - a16z Podcast: On Data and Data Scientists in the Age of AI
Episode Date: December 5, 2017

Data, data, everywhere, nor any drop to drink. Or so would say Coleridge, if he were a big company CEO trying to use A.I. today -- because even when you have a ton of data, there's not always enough signal to get anything meaningful from AI. Why? Because, "like they say, it's 'garbage in, garbage out' -- what matters is what you have in between," reminds Databricks co-founder (and director of the RISElab at U.C. Berkeley) Ion Stoica. And even then it's still not just about data operations, emphasizes SigOpt co-founder Scott Clark; your data scientists need to really understand "What's actually right for my business and what am I actually aiming for?" And then get there as efficiently as possible. But beyond defining their goals, how do companies get over the "cold start" problem when it comes to doing more with AI in practice, asks a16z operating partner Frank Chen (who also released a microsite on getting started with AI earlier this year)? The guests on this short "a16z Bytes" episode of the a16z Podcast -- based on a conversation that took place at our recent annual Summit event -- share practical advice about this and more.
Transcript
Hi, everyone. Welcome to the a16z Podcast. Today's episode, continuing our series on translating AI into practice, is one of our shorter bites based on a panel discussion that took place at our recent annual a16z Summit event just last month.
Operating partner Frank Chen, who put out a microsite on getting started with AI earlier this year, talks with Ion Stoica, co-founder of Databricks, and Scott Clark, co-founder of SigOpt (both have been on this podcast before, if you want to hear more from them in other episodes), about the cold start problem for companies getting started with AI, especially focusing on the role of data scientists and domain experts in this context.
Between the two of you, you have now sort of been with the customer on their journeys from day one until they have models in production.
And so what advice do you have for people who aren't Google, Amazon, Facebook, Apple to realize
machine learning? What do they need to do on day one? We have many enterprise companies as customers, and out of them, over 70% actually have AI projects.
And what we see, actually, if you take a step back, is that there are three stages.
The first stage is to make sure that you have the data.
Many times this takes longer than actually building the machine learning or AI model.
The second stage, once you have the data, is to operationalize it, so to speak: to become a data-driven company, to figure out the KPIs, the key performance indicators, which are going to drive your business.
You need to take these KPIs, based on the data,
and operationalize them, meaning to have reports, dashboards, and so forth.
And now, once you have this,
then you are going to start and use machine learning and AI
to improve these KPIs.
So that's kind of the journey.
So that sounds great.
You have this sort of very methodical,
process-oriented roadmap to get from here to there.
So tell me, where can it go wrong?
Where are the pitfalls? Where have you seen people get stuck on this journey?
Yeah, at every single one of those stages, there are pitfalls that you're going to need to try to avoid
from just making sure that you have the right data, that it represents what's actually happening in the real world,
to defining those KPIs and metrics.
There needs to be this huge contextual component, and I think that's where data science is moving towards,
as more and more of these arduous tasks get automated: you need to be able to say,
what's actually right for my business, and what am I actually aiming for?
And then, of course, it's how do I get there as efficiently as possible?
Again, I cannot emphasize enough how important the data is.
And this is a continuous process.
You need to devote resources on a continuous basis to make sure the data is correct,
because you are going to get data from new sources.
You are going to change the software which logs some of the data.
Everywhere, mistakes can happen.
And like they say, you know, garbage in, garbage out, no matter how smart the thing you have in between is.
So that's number one.
So you really need to be paranoid about your data collection, the accuracy of your data.
I think the other thing is what I said about the second stage: typically it's about figuring out what the KPIs are.
That's why, you know, actually, when you hire data scientists, having data scientists who have a good understanding of your business,
or who can work with the business people, is extremely important.
Fundamentally, data science is about, you know, you need to know statistics,
and you need to know math and, of course, machine learning,
but you need to either be a domain expert in what you are doing
or work well with domain experts.
So I had asked, what are the pitfalls?
Where can it go wrong?
I'm going to ask the inverse of that question.
So number one is productivity of people. It's hard, as you know, hiring the best data scientists and retaining them, so the next best thing you can do is to make them more productive, and even more, to make your organization more productive by allowing them to share the artifacts they build, in terms of models, with everyone in the organization. Sometimes it will be as simple as using a model by writing a SQL query. So I think that's a very important aspect. The other one, which is related to that, is time to market. Basically, you know, for many companies we can cut the time to market from idea to product by one order of magnitude.
I want to go back to this sort of getting started, the cold start problem in AI, because I've met with hundreds of companies now who are beginning their AI journeys. And if I were to summarize their frustration, it would be this. It's like, you Silicon Valley guys drive me crazy. You told me I couldn't run on bare metal. I had to run on hypervisors. And then you said, I can't run in my own data center. I have to run in the cloud. And then you have to build an iPhone app that's native. You can't just do mobile web. And you have to do big data analysis and get really good at analysis. And now, like, you're coming and telling me, I have to do AI and machine learning. Like, I can't keep up. There's too much stuff. So as you think about the companies that have been successful with their projects, how do they get over the cold start problem? Do they hire consultants? Do they repurpose internal engineers? Do they send them to training classes? Do they hire people from all of these
data science boot camps? Yeah. So I think it's a very hard problem. And as with any hard problem, there is no single silver bullet. So we try to solve this problem by emphasizing different aspects, everything from education, deployment, and so forth. The one thing I want to also mention again
from our observation, the small companies actually start with an AI mindset. They're building
the AI platform to solve a specific problem, as opposed to being an incumbent that's then trying
to apply AI to what they already have. But let me talk a little bit about the enterprises,
which, you know, are 50-year-old or in some cases even over 100-year-old companies,
so they want to use AI, again, to improve their business and competitiveness.
So what we see is that the enterprises which are the most successful go all in.
What I mean is they have multiple projects.
It's not only one project.
And yes, you can try with the one project and so forth to kind of test it.
But at the end of the day, it's hard when you start a data science, AI project,
to know whether it's successful.
In many cases, it goes down to the fact that even after you have the data, it may not be enough to get the kind of improvement you expect.
So think about it like hedging.
Because for companies who have multiple projects, you know, some of these projects are going to be successful,
but not all of them can be successful.
We know companies which are actually very technical,
and some of their projects fail because there is not enough data.
They believe that there is enough data, but it's not enough.
There is not enough signal.
At least that's what we've seen.
And one of the things that we see is different than these traditional approaches is that the cold start used to take maybe a decade, to kind of move from your own bare-metal data centers to the cloud and things like that.
But now all the pieces are coming together for AI. A lot of these bottlenecks that would traditionally have taken the enterprise five or ten years to work through,
now you can kind of get up and running very quickly.
The pieces are there to move very quickly.
So I think that cold start problem, where it used to be this huge threshold you had to get over,
is now becoming easier and easier.
And there's less of an excuse why you're not actually doing it, to be honest.
That's a perfect springboard to my last question, which is: we're in this cycle right now
where the tools are improving rapidly, right?
And so what used to be a black art can now be an API call.
A million-dollar data science investment is
now an API call away.
So if I'm an organization, shouldn't I just wait for the tools to get better?
Like, why do I need data science?
Or maybe another way to ask the question is,
how does the data science job change over the next two years
as the tools get much better?
I think it's all about that context.
So, once again, TensorFlow is an incredible tool.
It's a way to kind of get up and running very quickly with deep learning.
But it's only as good as what you point it at.
And this happens all the time.
We can tune any underlying system,
but we can only tune it towards the metrics you point us at.
We'll hit any target in the world,
but if you point us at the wrong target,
we'll hit that wrong target better than anything else in the world.
And so the idea is you still need the data scientists to really understand
what it is that you're trying to achieve as a business. And how does that relate to your
customers, relate to your unique data sets, and how do you actually differentiate yourselves
from your competitors? And I think there's going to be a lot of tools that make it easier to
do that, but at the end of the day, you need to know where you want to go with the business.
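Scott's point, that an optimizer will dutifully hit whichever target you point it at, can be sketched with a small toy example. (This is hypothetical data and standard-library Python only, not SigOpt's actual API: we tune the same fraud-score threshold against two different metrics and get two very different "optimal" systems.)

```python
import random

random.seed(0)

# 1,000 hypothetical transactions: ~5% are fraud, and fraud tends to score higher.
examples = []
for _ in range(1000):
    is_fraud = random.random() < 0.05
    score = random.gauss(0.7 if is_fraud else 0.3, 0.15)
    examples.append((score, is_fraud))

def accuracy(threshold):
    """Fraction of all examples classified correctly at this threshold."""
    return sum((s >= threshold) == y for s, y in examples) / len(examples)

def recall(threshold):
    """Fraction of fraud cases flagged at this threshold."""
    frauds = [s for s, y in examples if y]
    return sum(s >= threshold for s in frauds) / len(frauds)

# The "optimizer": exhaustively pick the threshold that maximizes each metric.
thresholds = [i / 100 for i in range(101)]
best_for_accuracy = max(thresholds, key=accuracy)
best_for_recall = max(thresholds, key=recall)

# Optimizing raw accuracy on imbalanced data favors a high threshold
# (flag almost nothing); optimizing recall alone drives the threshold
# toward zero (flag everything). Neither optimizer is "wrong" -- the
# business has to decide which metric it actually cares about.
```

Both runs "succeed" perfectly by their own metric, which is exactly the trap: choosing the right target, not hitting it, is the data scientist's contribution.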
Yeah, so I cannot agree more. So fundamentally, like we discussed many times,
the most important thing is to figure out what your business objectives are,
and whatever you improve should relate to these business objectives.
So that's why the data scientists have to be acutely aware of the context.
And all these tools just allow them to get there faster,
to process more data, to hit this target faster, like Scott said.
But if the targets are wrong, or they are not going to move the needle,
there is not much you can do.
Everyone wants to do AI, but it doesn't really help to do it just for the sake of doing it.
Just checking a box and saying, okay, now we're doing AI isn't enough.
You need to know what it is you're shooting for.
And sometimes in like financial services, that might be relatively easy.
I just want to make as much money as possible.
But in other industries, it might be more difficult.
And setting up that success criteria early will be helpful to make sure that you build
towards the right goal and then eventually optimize towards it.
Well, Scott, Ion, thank you for joining us.
Thank you.
Thank you for having us.
Thank you.