a16z Podcast - a16z Podcast: On Data and Data Scientists in the Age of AI
Episode Date: December 5, 2017

Data, data, everywhere, nor any drop to drink. Or so would say Coleridge, if he were a big company CEO trying to use A.I. today -- because even when you have a ton of data, there's not always enough signal to get anything meaningful from AI. Why? Because, "like they say, it's 'garbage in, garbage out' -- what matters is what you have in between," reminds Databricks co-founder (and director of the RISElab at U.C. Berkeley) Ion Stoica. And even then it's still not just about data operations, emphasizes SigOpt co-founder Scott Clark; your data scientists need to really understand "What's actually right for my business and what am I actually aiming for?" And then get there as efficiently as possible. But beyond defining their goals, how do companies get over the "cold start" problem when it comes to doing more with AI in practice, asks a16z operating partner Frank Chen (who also released a microsite on getting started with AI earlier this year)? The guests on this short "a16z Bytes" episode of the a16z Podcast -- based on a conversation that took place at our recent annual Summit event -- share practical advice about this and more.
Transcript
Hi, everyone. Welcome to the a16z Podcast. Today's episode, continuing our series on translating AI into practice, is one of our shorter bites based on a panel discussion that took place at our recent annual a16z Summit event just last month.
Operating partner Frank Chen, who put out a microsite on getting started with AI earlier this year, talks with Ion Stoica, co-founder of Databricks, and Scott Clark, co-founder of SigOpt (both have been on this podcast before, if you want to hear more from them in other episodes), about the cold start problem for companies getting started with AI, especially focusing on the role of data scientists and domain experts in this context.
Between the two of you, you have now sort of been with the customer on their journeys from day one until they have models in production.
And so what advice do you have for people who aren't Google, Amazon, Facebook, Apple to realize
machine learning? What do they need to do on day one? We have many enterprise companies as customers, and out of them, over 70% actually have AI projects.
And what we see, actually, if you take a step back, is that there are three stages.
The first stage is to make sure that you have the data.
Many times this takes longer than actually building the machine learning or AI model.
The second stage, once you have the data, is to operationalize it, so to speak: to become a data-driven company, to figure out the KPIs, the key performance indicators, which are going to drive your business.
You need to take these KPIs, based on the data,
and operationalize them, meaning to have reports, dashboards, and so forth.
And now, once you have this,
then you are going to start and use machine learning and AI
to improve these KPIs.
So that's kind of the journey.
So that sounds great.
You have this sort of very methodical,
process-oriented roadmap to get from here to there.
So tell me, where can it go wrong?
Where are the pitfalls? Where have you seen people get stuck on this journey?
Yeah, at every single one of those stages, there are pitfalls that you're going to need to try to avoid
from just making sure that you have the right data, that it represents what's actually happening in the real world,
to defining those KPIs and metrics.
There needs to be this huge contextual component, and I think that's where data science is moving towards,
as more and more of these arduous tasks get automated: you need to be able to say,
what's actually right for my business, and what am I actually aiming for?
And then, of course, it's how do I get there as efficiently as possible?
Again, I cannot emphasize enough how important the data is.
And this is a continuous process.
You need to devote resources on a continuous basis to make sure the data is correct,
because you are going to get data from new sources.
You are going to change the software which logs some of the data.
Everywhere, mistakes can happen.
And like they say, you know, garbage in, garbage out, no matter how smart the thing you have in between is.
So that's number one.
So you really need to be paranoid about your data collection, the accuracy of your data.
I think the other thing is what I said about the second stage: typically it's about figuring out what the KPIs are.
That's why, you know, actually, when you hire data scientists, having data scientists who have a good understanding of your business,
or who can work with the business people, is extremely important.
Fundamentally, data science is about, you know, you need to know statistics,
and you need to know math and, of course, machine learning,
but you need to either be a domain expert in what you are doing
or work well with domain experts.
So I had asked, what are the pitfalls?
Where can it go wrong?
I'm going to ask the inverse of that question.
So number one is productivity of people. It's hard, as you know, hiring the best data scientists and retaining them, so the next best thing you can do is to make them more productive, and even more, to make your organization more productive by allowing them to share the artifacts they build, in terms of models, with everyone in the organization. Sometimes it will be as simple as using a model by writing a SQL query. So I think that's a very important aspect. The other one, which is related to that, is time to market. Basically, you know, for many companies we can cut the time to market from idea to product by one order of magnitude.
I want to go back to this sort of getting started, the cold start problem in AI, because I've met with hundreds of companies now who are beginning their AI journeys. And if I were to summarize their frustration, it would be this. It's like, you Silicon Valley guys drive me crazy. You told me I couldn't run on bare metal. I had to run on hypervisors. And then you said, I can't run in my own data center. I have to run in the cloud. And then you have to build an iPhone app that's native. You can't just do mobile web. And you have to do big data analysis and get really good at analysis. And now, like, you're coming and telling me, I have to do AI and machine learning. Like, I can't keep up. There's too much stuff. So as you think about the companies that have been successful with their projects, how do they get over the cold start problem? Do they hire consultants? Do they repurpose internal engineers? Do they send them to training classes? Do they hire people from all of these
data science boot camps? Yeah. So I think it's a very hard problem. And as with any hard problem, there is no single silver bullet. So we try to solve this problem by emphasizing different aspects, everything from education, deployment, and so forth. The one thing I want to also mention again
from our observation, the small companies actually start with an AI mindset. They're building
the AI platform to solve a specific problem, as opposed to being an incumbent that's then trying
to apply AI to what they already have. But let me talk a little bit about the enterprises,
which, you know, are 50-year-old or in some cases even over 100-year-old companies,
so they want to use AI, again, to improve their business and competitiveness.
So what we see is that the enterprises which are the most successful go all in.
What I mean is they have multiple projects.
It's not only one project.
And yes, you can try with the one project and so forth to kind of test it.
But at the end of the day, it's hard when you start a data science, AI project,
to know whether it's successful.
In many cases, it goes down to the fact that even after you have the data, it may not be enough to get the kind of improvement you expect.
So think about it like hedging.
Because for companies who have multiple projects, you know, some of these projects are going to be successful,
but not all of them can be successful.
We know companies which are actually very technical,
and some of their projects fail because there is not enough data.
They believe that there is enough data, but it's not enough.
There is not enough signal.
At least that's what we've seen.
And one of the things that we see is different than these traditional approaches is that the cold start used to take maybe a decade, to kind of move from your own bare-metal data centers to the cloud and things like that.
But now all the pieces are coming together for AI. A lot of these bottlenecks that would traditionally have taken the enterprise five or ten years to work through,
now you can kind of get up and running very quickly.
The pieces are there to move very quickly.
So I think that cold start problem, where it used to be this huge threshold you had to get over,
is now becoming easier and easier.
And there's less of an excuse why you're not actually doing it, to be honest.
That's a perfect springboard to my last question, which is: we're in this cycle right now
where the tools are improving rapidly, right?
And so what used to be a black art can now be an API call.
A million-dollar data science investment is
now an API call away.
So if I'm an organization, shouldn't I just wait for the tools to get better?
Like, why do I need data science?
Or maybe another way to ask the question is,
how does the data science job change over the next two years
as the tools get much better?
I think it's all about that context.
So, once again, TensorFlow is an incredible tool.
It's a way to kind of get up and running very quickly with deep learning.
But it's only as good as what you point it at.
And this happens all the time.
We can tune any underlying system,
but we can only tune it towards the metrics you point us at.
We'll hit any target in the world,
but if you point us at the wrong target,
we'll hit that wrong target better than anything else in the world.
And so the idea is you still need the data scientists to really understand
what it is that you're trying to achieve as a business. And how does that relate to your
customers, relate to your unique data sets, and how do you actually differentiate yourselves
from your competitors? And I think there's going to be a lot of tools that make it easier to
do that, but at the end of the day, you need to know where you want to go with the business.
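Scott's point, that an optimizer will dutifully hit whichever target you point it at, can be sketched with a small toy example. (This is hypothetical data and standard-library Python only, not SigOpt's actual API: we tune the same fraud-score threshold against two different metrics and get two very different "optimal" systems.)

```python
import random

random.seed(0)

# 1,000 hypothetical transactions: ~5% are fraud, and fraud tends to score higher.
examples = []
for _ in range(1000):
    is_fraud = random.random() < 0.05
    score = random.gauss(0.7 if is_fraud else 0.3, 0.15)
    examples.append((score, is_fraud))

def accuracy(threshold):
    """Fraction of all examples classified correctly at this threshold."""
    return sum((s >= threshold) == y for s, y in examples) / len(examples)

def recall(threshold):
    """Fraction of fraud cases flagged at this threshold."""
    frauds = [s for s, y in examples if y]
    return sum(s >= threshold for s in frauds) / len(frauds)

# The "optimizer": exhaustively pick the threshold that maximizes each metric.
thresholds = [i / 100 for i in range(101)]
best_for_accuracy = max(thresholds, key=accuracy)
best_for_recall = max(thresholds, key=recall)

# Optimizing raw accuracy on imbalanced data favors a high threshold
# (flag almost nothing); optimizing recall alone drives the threshold
# toward zero (flag everything). Neither optimizer is "wrong" -- the
# business has to decide which metric it actually cares about.
```

Both runs "succeed" perfectly by their own metric, which is exactly the trap: choosing the right target, not hitting it, is the data scientist's contribution.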
Yeah, so I cannot agree more. So fundamentally, like we discussed many times,
the most important thing is to figure out what your business objectives are,
and whatever you improve should relate to these business objectives.
So that's why the data scientists have to be acutely aware of the context.
And all these tools just allow them to get there faster,
to process more data, to hit this target faster, like Scott said.
But if the targets are wrong, or they are not going to move the needle,
there is not much you can do.
Everyone wants to do AI, but it doesn't really help to do it just for the sake of doing it.
Just checking a box and saying, okay, now we're doing AI isn't enough.
You need to know what it is you're shooting for.
And sometimes in like financial services, that might be relatively easy.
I just want to make as much money as possible.
But in other industries, it might be more difficult.
And setting up that success criteria early will be helpful to make sure that you build
towards the right goal and then eventually optimize towards it.
Well, Scott, Ion, thank you for joining us.
Thank you.
Thank you for having us.
Thank you.