The a16z Show - a16z Podcast: On Data and Data Scientists in the Age of AI
Episode Date: December 5, 2017
Data, data, everywhere, nor any drop to drink. Or so would say Coleridge, if he were a big company CEO trying to use A.I. today -- because even when you have a ton of data, there's not always enough signal to get anything meaningful from AI. Why? Because, "like they say, it's 'garbage in, garbage out' -- what matters is what you have in between," reminds Databricks co-founder (and director of the RISElab at U.C. Berkeley) Ion Stoica. And even then it's still not just about data operations, emphasizes SigOpt co-founder Scott Clark; your data scientists need to really understand "What's actually right for my business and what am I actually aiming for?" And then get there as efficiently as possible. But beyond defining their goals, how do companies get over the "cold start" problem when it comes to doing more with AI in practice, asks a16z operating partner Frank Chen (who also released a microsite on getting started with AI earlier this year)? The guests on this short "a16z Bytes" episode of the a16z Podcast -- based on a conversation that took place at our recent annual Summit event -- share practical advice about this and more.
Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Transcript
Hi everyone, welcome to the a16z Podcast. Today's episode, continuing our series on translating
AI into practice, is one of our shorter Bytes, based on a panel discussion that took place at our recent
annual a16z Summit event just last month. Operating partner Frank Chen, who put out a microsite on
getting started with AI earlier this year, talks with Ion Stoica, co-founder of Databricks,
and Scott Clark, co-founder of SigOpt (both have been on this podcast, if you want to hear more
from them in other episodes), about the cold start problem for companies getting started with AI,
especially focusing on the role of data scientists and domain experts in this context.
You guys now, between the two of you, have now sort of been with the customer on their journeys
from sort of day one until they have models in production. And so what advice do you have
for people who aren't Google, Amazon, Facebook, Apple to realize machine learning? What do they need to do on day one?
We have many enterprise companies, and out of them, over 70% actually have AI projects.
And what we see, actually, if you take a step back, is that there are three stages.
The first stage is to make sure that you have the data.
Many times this takes more than actually building the machine learning or AI model.
The second thing is, once you have the data, to operationalize it, so to speak:
to become a data-driven company, to figure out
what are the KPIs, the key performance indicators,
which are going to be driving your business.
You need to take these KPIs, based on the data,
and operationalize them, meaning to have reports, dashboards, and so forth.
And now, once you have this,
then you are going to start and use machine learning and AI
to improve these KPIs.
So that's kind of the journey.
So that sounds great.
You have this sort of very methodical, process-oriented roadmap
to get from here to there.
So tell me where can it go wrong? Where are the pitfalls? Where have you seen people get stuck on this journey?
Yeah, at every single one of those stages, there are pitfalls that you're going to need to try to avoid.
From just making sure that you have the right data, that it represents what's actually happening in the real world,
to defining those KPIs and metrics. There needs to be this huge contextual component.
And I think that's where data science is moving towards, as more and more of these more arduous tasks get automated:
you need to be able to say, what's actually right for my business,
and what am I actually aiming for?
And then, of course, it's how do I get there as efficiently as possible?
Again, I cannot emphasize enough how important the data is.
And this is a continuous process.
You need to devote resources on a continuous basis
to make sure the data is correct,
because you are going to get data from new sources.
You are going to change the software which logs some of the data.
Mistakes can happen everywhere, and like they say, you know,
garbage in, garbage out, no matter how smart what you have in between is.
So that's number one: you really need to be paranoid about your data collection
and the accuracy of your data. I think the other thing, as I said about the
second stage, is that typically it's about figuring out what the KPIs are. That's why,
you know, when you hire data scientists, having data scientists
who have a good understanding of your business,
or who can work with the business people,
is extremely important.
Fundamentally, data science is about,
you know, you need to know statistics,
and you need to know math and, of course, machine learning,
but you also need to either be a domain expert
in what you are doing or work well with domain experts.
So I had asked, what are the pitfalls?
Where can it go wrong?
I'm gonna ask the inverse of that question.
So number one
is productivity of people.
It's hard, as you know, hiring the best data scientists
and retaining them.
So the next best thing you can do is to make them more productive,
and even more, to make your organization more productive
by allowing them to share the artifacts they build,
in terms of models, with everyone in the organization.
Sometimes using a model will be as simple
as writing a SQL query.
So I think that's a very important aspect.
The other one, which is related to that,
is time to market, right?
Basically, you know, there are many companies where we can cut the time to market from idea to product by one order of magnitude.
I want to go back to this sort of getting started, the cold start problem, right, in AI, because I've met with hundreds of companies now who are beginning their AI journeys.
And if I were to summarize their frustration, it would be this.
It's like, you Silicon Valley guys drive me crazy.
You told me I couldn't run on bare metal, I had to run on hypervisors.
And then you said, I can't run in my own data center.
I have to run in the cloud.
And then you have to build an iPhone app that's native.
You can't just do mobile web.
And you have to do big data analysis and get really good at analysis.
And now, like, you're coming and telling me, I have to do AI and machine learning.
Like, I can't keep up.
There's too much stuff.
So as you think about the companies that have been successful with their projects,
how do they get over the cold start problem?
Do they hire consultants?
Do they repurpose internal engineers?
Do they send them to training classes?
Do they hire people from all of these data science boot camps?
Yeah, so I think it's a very hard problem. As with any hard problem, there is no single silver bullet. So we try to solve this problem by emphasizing different aspects, everything from education to deployment and so forth. The one thing I also want to mention, again from our observation, is that the small companies actually start with the AI mindset.
They're building the AI platform to solve a specific problem as opposed to being an incumbent that's then trying to apply AI to what they already have.
But let me talk a little bit about the enterprise. You know, these are 50-year-old or even, in some cases, over 100-year-old companies.
So they want to use AI, again, to improve their business and their competitiveness.
So what we see is that the enterprises that are the most successful go all in.
What I mean is that they have multiple projects.
It's not only one project.
And yes, you can try with one project and so forth to kind of test it.
But at the end of the day, it's hard when you start a data science, AI project,
to know whether it is successful.
In many cases, it comes down to the fact that even after you have the data,
it may not be enough to get the kind of improvement you expect.
So think about it like hedging,
because these companies have multiple projects they are doing.
You know, some of these projects are going to be successful.
But not all of them can be successful.
We know companies which are actually very technical.
And some of the projects fail because there is not enough data.
So you believe that there is enough data, but it's not enough.
It's not enough signal.
At least, that is what we've seen.
And one of the things that we see is different than kind of these traditional approaches is that cold start used to take maybe a decade to kind of move from your own bare metal data centers to the cloud and things like that.
But now, like, all the pieces are kind of coming together for AI.
Like, a lot of these traditional bottlenecks would have traditionally taken the enterprise, oh, we need to do this over five, ten years.
Now you can kind of get up and running very quickly.
Like, the pieces are there to move very quickly.
So I think that cold start problem, which used to be this huge threshold
that you had to get over, is now becoming easier and easier.
And there's less of an excuse why you're not actually doing it, to be honest.
That's a perfect springboard to my last question,
which is we're in this cycle right now where the tools are improving rapidly, right?
And so what used to be a black art can now be an API call,
a million-dollar data science initiative
is now an API call away.
So if I'm an organization,
shouldn't I just wait for the tools to get better?
Like, why do I need data science?
Or maybe another way to ask the question is,
how does the data science job change
over the next two years as the tools get much better?
I think it's all about that context.
So once again, TensorFlow is an incredible tool.
It's a way to kind of get up and running very quickly
with deep learning.
But it's only as good as what you point it at.
And this happens all the time.
We can tune any underlying system,
but we can only tune it towards the metrics you point us at.
We'll hit any target in the world,
but if you point us at the wrong target,
we'll hit that wrong target better than anything else in the world.
And so the idea is you still need the data scientists to really understand what it is
that you're trying to achieve as a business.
And how does that relate to your customers, relate to your unique data sets,
and how do you actually differentiate yourselves from your competitors?
And I think there's going to be a lot of tools that make it easier to do that,
but at the end of the day, you need to know where you want to go with the business.
Yeah, so I cannot agree more.
So fundamentally, like we discussed many times,
the most important thing is to figure out what your business objectives are
and how whatever you improve relates to these business objectives.
So that's why the data scientists have to be acutely aware of the context.
And all these tools just allow them to get there faster,
to process more data, to hit this target faster, like Scott said.
But if the targets are wrong or they are not going to move the needle,
there is not much you can do.
Everyone wants to do AI, but it doesn't really help to do it for the sake of just doing it.
Just checking a box and saying, okay, now we're doing AI isn't enough.
You need to know what it is you're shooting for.
And sometimes in like financial services, that might be relatively easy.
I just want to make as much money as possible.
But in other industries, it might be more difficult.
And setting up that success criteria early will be helpful to make sure that you build
towards the right goal and then eventually optimize towards it.
Well, Scott, Ion, thank you for joining us.
Thank you.
Thank you for having us.
