In The Arena by TechArena - Exploring Requirements for AI with EY

Episode Date: March 26, 2024

TechArena host Allyson Klein chats with EY's Global AI Innovation Officer, Rodrigo Mandanes, about what he's seeing from clients in their advancement with AI and what this means for the industry requirements for innovation.

Transcript
Starting point is 00:00:00 Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators and our host, Allyson Klein. Now, let's step into the arena. Welcome to the Tech Arena. My name is Allyson Klein. Today, I'm so delighted to have Rodrigo Mandanes with me. He is the Global AI Innovation Officer at EY. Welcome to the program, Rodrigo. Hi, Allyson. Nice to be invited here. Glad to be here. So, Rodrigo, I am so excited to talk to you because EY has really been stirring up a storm in terms of AI leadership. And I'm just going to start with this: why don't you go ahead and introduce yourself and tell us what it means to have that role at EY? Yes. Just to set the context for EY, most people know us because we audit a lot of companies, and we do a lot of tax work. We also do a lot of consulting work for companies, both on their strategy as well as their implementations of business processes or technology. We also do a lot of stuff internally. The firm has about 400,000 employees.
Starting point is 00:01:29 So we develop technology internally for our people to do a good job delivering services. And my job is to drive the innovation that uses AI for internal purposes: what tools, what AI tooling should we develop so our tax professionals, auditors, just our own group of people, can then use AI to be more productive. That's so interesting. You know, I think that I have a lot of conversations
Starting point is 00:01:59 about technology, and I don't think I can have a conversation today without talking about AI, and specifically large language models and the opportunity that they represent. I know that you talk to a lot of organizations, and I know that they're seeking a way to integrate this technology into their applications, into their business processes. Where do you think we are on this journey? You know, when you look at the broad enterprise, where do you see the biggest areas of opportunity?
Starting point is 00:02:29 So I have a story I tell, because I talk to so many people about where we are today. Let me focus on where we are today first, which is we're in 2024. It's a very different year than 2023. 2023 was the year when everyone was surprised by the impact of ChatGPT, which was announced in November 2022.
Starting point is 00:02:57 So there was no budget allocated for this scale of AI in most companies. The budgets had already been baked by January 2023. So 2023 is really the year of the pilot. It's the year where organizations figured out who's in charge of all this. Can we do some pilots so we learn?
Starting point is 00:03:17 And can we do a lot of the compliance work so that we are able to do Gen AI in large enterprises, which means privacy assessments, legal reviews, information security assessments. So all that hard work happened in 2023, and we didn't see that many production-at-scale use cases that year. 2024 is the first year where the budgets have been designed to develop Gen AI at greater scale. There is more clarity on who's in charge and how to move forward.
Starting point is 00:03:54 And the companies that managed to build out successful pilots in 2023 are now looking at what it means to deploy them in production in 2024. And that raises issues of compute costs, like how expensive is it to deploy this at scale? What is the impact in terms of change management for the organization? What is the right stack to support this going forward? So 2024 is the first year that we're seeing some AI at scale, and we're starting to see the ramifications of what that means. So that's the first part. In terms of the opportunities, the opportunities are huge. When you look at the small-scale studies of like 100 to 5,000 people, where they assign people a task without Gen AI and the same task with Gen AI,
Starting point is 00:04:48 the studies show a range of 10 to 40% productivity improvement in a diverse set of tasks, meaning they get it done 10 to 40% faster, with higher quality, and with higher satisfaction from the person doing the task and the people sort of on the other side. So there's a huge, I mean, you have to realize productivity gains typically hover at one to two percent a year in good years. So when we're talking about 30%, this is sort of a once-in-a-lifetime, step-function difference, right? Yeah, yeah. It's incredible, because I've read so many things in the past where technology is arguably not even moving the needle when it comes to productivity. And here we are looking at these massive changes. It's a really interesting opportunity, and it's a challenge for IT organizations because the industry is moving so fast. We could take this conversation in so many different ways across, you know, skill sets
Starting point is 00:05:51 inside of companies, you know, how do DevOps teams adapt and drive AppDev. But what I really want to talk to you about today is infrastructure. And, you know, there is so much demand for performance to actually deliver on this huge opportunity statement that you just introduced and to really re-architect platforms. And I've been in the industry for 25 years. I don't think I've ever seen this type of demand before. But let's unpack that a little bit. What about these AI models is driving that demand? And do you see any limits to deployments based on current infrastructure constraints? Yeah. It's fascinating the velocity at which we are deploying capacity and infrastructure. One of the things I also tell in the story of 2023 is that it was the year of compute scarcity, where it was very hard to get your hands on the sort of most powerful GPUs, because there were lead times of six months to a year just in terms of getting access to those GPUs. I think we're seeing a huge amount of
Starting point is 00:07:13 infrastructure build-out from the large cloud providers and the large enterprises, where they are seeing the capacity demand come from their clients and they're trying to build it out. And that capacity is both for training as well as for inference on models. And people are trying to make that distinction. The reason that distinction is important is that we are moving into a stage where we're probably going to start differentiating infrastructure, with dedicated chips and platforms for inference, which is what you use when you need to deploy AI and use AI, versus for training, where some companies are going to be doing a lot of training,
Starting point is 00:08:02 others less so, but we're definitely going to need a lot of inference capacity. Go ahead. I was going to say, right now a large chunk of it is current demand and a large chunk of it is anticipating demand, because when you see that sort of year-on-year growth, the infrastructure build-out always needs to happen ahead of the demand.
Starting point is 00:08:27 Because you need to meet that demand. So we're seeing a combination of those. The curve is getting really steep, the growth curve. And everybody's racing to put in place the infrastructure to serve demand. Yeah, I had another guest on the Tech Arena a few months ago talking about the forecasts of the large cloud providers and their greenfield build-outs being something between 50% growth and a doubling of capacity in the next five years, which is mind-blowing when you think about how much capacity they have today. But, you know, one question that I have in my mind is that there's a lot of infrastructure disruption. You talked about dedicated platforms for inference and for training. And I can get that picture when you talk to, you know, a Microsoft or a Google about what they're building at these massive scales. But when we look at enterprise, do you see the same types of infrastructure trends and the same infrastructure deployments? Are you seeing different types of
Starting point is 00:09:30 adoption when you get into different vertical markets? So I don't want to double-click on the vertical market piece. What I do see is that enterprises are lagging the sort of rollout, or the capacity build-out, in that most people underappreciate the amount of heavy lifting that needs to be done to get the house in order to deploy Gen AI at scale in a large enterprise. One of the things that's limiting, like I mentioned, is privacy assessments. Legal needs to be comfortable. You need to have information security. You need to have MLOps and LLMOps.
Starting point is 00:10:17 And those differ between regulated industries, like finance, and non-regulated industries, which have different requirements. And then you have to talk about the multi-jurisdictional requirements, where most of these large enterprises face different regulatory regimes in European countries versus the U.S. versus others. So with that complexity, deploying some of these proofs of concept into production is going to lag more than one expects, because they can't move at that speed. This is already a phenomenal speed they're moving at. Now, we're sitting in a really interesting time period in March, in that GTC is next week
Starting point is 00:11:02 and we'll be seeing a lot of advancement from across the industry around GPU technology and the solutions that are coming to market that utilize it. But I actually want to talk to you about a different conference that we're both going to in a couple of weeks, MemCon. And MemCon is, you know, a preeminent conference to learn about what lead users of advanced technology require from memory. And memory is becoming very central to these very challenges that we're talking about. How do you see that? And why is memory all of a sudden becoming so central to compute bottlenecks? Yeah, I think that's a badly kept secret, let's say, which is that, first of all, for doing Gen AI, you need GPUs. You need to have very powerful computers that are designed for, you know, large-scale matrix multiplication.
Starting point is 00:12:09 These models tend to have billions of parameters, you know, hundreds of billions of parameters. So you can imagine the size of a model and the amount of compute needed to do these matrix multiplications at scale and at speed. To do those very complex multiplications at speed and at scale, you need a lot of memory. And you need a lot of memory of very particular types. You need memory that is going to be accessible by the GPUs. Some of it is going to be distributed across the computations, across multiple GPUs.
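To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 70-billion-parameter model size, 16-bit weights, and 80 GB of memory per GPU are assumptions chosen for illustration, not figures from the conversation.

```python
# Rough, illustrative estimate of why large models need so much GPU memory.
# All numbers here are assumptions for the example, not from the episode.

def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory needed just to hold the model weights, in gigabytes."""
    return num_params * bytes_per_param / 1e9

# Hypothetical 70-billion-parameter model stored in 16-bit precision:
params = 70e9
weights_gb = weight_memory_gb(params)
print(f"Weights alone: ~{weights_gb:.0f} GB")            # ~140 GB

# Assume an accelerator with 80 GB of on-package memory: the weights alone
# do not fit on one device, which is why the model (and the matrix math)
# gets sharded across multiple GPUs, as described above.
gpu_memory_gb = 80
print(f"GPUs needed just for the weights: {weights_gb / gpu_memory_gb:.2f}")
```

Capacity is only part of it: generating each token also means streaming those weights through the memory system, so memory bandwidth and latency enter the picture as well, which is where the conversation goes next.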
Starting point is 00:13:21 So right now, or at least until a few years ago, we didn't have the demand to start creating many dedicated architectures for doing Gen AI at scale. So I think the topic of how you deal with provisioning the right memory architectures so they're not a bottleneck for these computations is going up the agenda. And it might be a different type of memory structure that we'll need. I don't think we've solved which is the right architecture, or the different ways to solve it in different parts of the compute stack. But it is a huge issue. Compute capacity and memory are the key bottlenecks in doing LLM computations at scale. When I talk to some consumers of Gen AI, one of the key issues they raise is, we can't afford that latency. It's just too slow of a user experience in many applications. And that latency comes from the compute and the memory bottlenecks. So I can tell you it's a problem. I can't tell you how to solve it. You know, I think we're looking for a memory that has the latency of HBM and the capacity of DDR5 and beyond.
Starting point is 00:14:15 And it's probably called unicorn memory because it doesn't exist yet. But I'm really excited to go to MemCon and hear from those in the user community about what the requirements are so that we can start shaping the future of memory technology. Now, I know you're giving a talk. Do you want to give us a preview of what you're going to be talking about at the conference? Yeah, I'm bringing a couple of other panelists, and we're just going to do a fireside chat on what we're seeing in terms of new potential opportunities for accelerating memory usage in Gen AI. We see memory as a key component of solving the Gen AI performance and compute capacity problem. So we'll look at that problem from different angles. Rodrigo, when you look forward in 2024, you've painted such an interesting picture of how 2024 is not like 2023. Do you think that the second half of 2024 will be similar to the first half? Given the speed that we're working at, what do you expect to see as we exit 2024 in terms of enterprise adoption of this technology? And, you know, if you want to,
Starting point is 00:15:34 you know, say how the industry is going to respond to the core challenges that we've been talking about with infrastructure, from supply chain challenges to, you know, limitations of current architectures. Yeah, my sense is, well, I don't know how the sort of supply side and demand side of the cloud infrastructure will play out. I can tell you what it looks like from the consumer side, or the user side, if you will. I think the conversation in the second half of 2024 is going to start revolving more around the cost of compute as we go from 1x to 1000x usage of these pilots.
Starting point is 00:16:22 And I've had conversations with some leaders who told me that choosing the right model to deploy and being more efficient in that deployment made a 100x differential in the cost of daily use. You're like, are you serious? It was 100x. That was how big it was. And some of these budgets are prohibitive right now in terms of compute costs. So my guess is that in the second half of 2024, compute costs are going to start moving more to the forefront.
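For a sense of how a 100x gap like that can arise from model choice alone, here is a minimal sketch with made-up numbers; the request volume and per-million-token prices below are assumptions for illustration, not figures quoted in the episode.

```python
# Illustrative only: hypothetical per-token prices and traffic, chosen to show
# how model selection can swing daily compute cost by roughly 100x.

requests_per_day = 50_000
tokens_per_request = 2_000                 # prompt + completion, assumed

price_large_per_1m_tokens = 30.00          # USD per 1M tokens, assumed
price_small_per_1m_tokens = 0.30           # USD per 1M tokens, assumed

daily_million_tokens = requests_per_day * tokens_per_request / 1e6

cost_large = daily_million_tokens * price_large_per_1m_tokens
cost_small = daily_million_tokens * price_small_per_1m_tokens

print(f"Frontier model:    ${cost_large:,.0f} per day")   # $3,000
print(f"Smaller model:     ${cost_small:,.0f} per day")   # $30
print(f"Cost differential: {cost_large / cost_small:.0f}x")
```

The real differential will also depend on prompt length, batching, caching, and how efficiently the deployment uses the hardware, which is exactly the compute-cost question raised here.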
Starting point is 00:16:57 And it'll be fully at the forefront in 2025. That's fantastic. I am looking forward to seeing how this conversation evolves. And I would love to have you back on the show to explore some of the other challenges of AI. But I just want to thank you for your time. I have one more question for you. Where can folks reach out to you and your team to engage? And can folks meet up with you at MemCon? Absolutely.
Starting point is 00:17:22 They should. And I will be meeting them at MemCon, and people can reach out to me on LinkedIn and message me if there are any questions. Or, if you go to ey.com and find my profile, you can also message me there. Well, Rodrigo, it was wonderful having you on the show. Really a big pleasure to get to meet you on the Tech Arena. We'd love to have you back. Thank you, Allyson. It's been great meeting you. And I didn't realize you were quite so techie until you were talking about the DDRs. You know more than I thought. Thank you for having me
Starting point is 00:17:57 on the show. Thanks for joining the Tech Arena. Subscribe and engage at our website, thetecharena.net. All content is copyright by The Tech Arena.
