Orchestrate all the Things - AI Chips in 2025: The end of “more GPUs is all you need”? Featuring InAccel CEO / Founder Chris Kachris

Episode Date: January 29, 2025

It’s early 2025, and we may already be witnessing a redefining moment for AI as we’ve come to know it in the last couple of years. Is the canon of “more GPUs is all you need” about to change? Truth is, when we arranged a conversation on AI chips with Chris Kachris, neither the Stargate Project nor DeepSeek R1 had burst onto the AI scene. Even though we did not consciously anticipate these developments, we knew AI chips are a topic that deserves attention, and Kachris is an insider. Join us as we explore how the AI chip market is shaped today and tomorrow.

AI chips and open source AI models are all part of the comprehensive curriculum on Pragmatic AI Training that is being developed by Orchestrate all the Things: https://linkeddataorchestration.com/services/training/pragmatic-ai-training/

Subscribe to the newsletter to be the first to know and enjoy discounted rates: https://linkeddataorchestration.com/orchestrate-all-the-things/newsletter/

Check out the article published here for additional background and references: https://linkeddataorchestration.com/2025/01/29/ai-chips-in-2025-the-end-of-more-gpus-is-all-you-need/

Transcript
Starting point is 00:00:00 Welcome to Orchestrate All The Things. I'm George Anadiotis, and we'll be connecting the dots together: stories about technology, data, AI and media, and how they flow into each other, shaping our lives. It's early 2025, and we may already be witnessing a redefining moment for AI as we've come to know it in the last couple of years. Is the canon of "more GPUs is all you need" about to change? Truth is, when we arranged the conversation on AI chips with Chris Kachris, neither the Stargate project nor DeepSeek R1 had burst onto the AI scene. Even though we did not consciously anticipate these developments, we knew AI chips are a topic that deserves attention, and Kachris is an insider.
Starting point is 00:00:41 I hope you will enjoy this. If you like my work on Orchestrate all the Things, you can subscribe to my podcast, available on all major platforms, to my self-published newsletter, also syndicated on Substack, Hackernoon, Medium and DZone, or follow Orchestrate all the Things on your social media of choice. It is my great pleasure today to come back to a returning topic, the topic of AI chips. It's something we first started exploring seven years ago already; it started back in 2018, and I have to say in advance that originally I had no background on the topic whatsoever, other than the general background that computer science graduates get. However, at some point I felt it was important to educate myself on it, for sort of obvious reasons: AI is, and had already been, a topic of growing importance, and obviously what's fueling growth, and what's also setting the boundaries of what's possible with AI, is hardware. More and more, we have been witnessing the evolution of hardware as well, with increasingly specialized and increasingly powerful so-called AI chips. So in order to explore the latest developments in this area, we have the pleasure of hosting today with us Mr.
Starting point is 00:02:14 Chris Cachris. Chris is a man who wears many hats. He has been in the industry driving the development of a startup called InAxial. He will share a few words on that. And he has also been, and still is, in fact, a researcher working on a number of topics under the AI chips umbrella. So, Chris, thank you very much for making the time for today's conversation.
Starting point is 00:02:41 I'll pass on to you to do a slightly more nuanced intro of yourself than the one I did for you. Thank you George, thank you very much for the invitation. I'm happy to be here. So I finished my PhD in Computer Engineering in Delft University of Technology and then after spending some time in the silicon valley in xilinx then i came back in in greece working on several research centers and at some point i was in the technical university of athens and doing some research on specifically on the fpgas and how they can be utilized in the domain data center and in the cloud. And out of this research, we tried to commercialize our experience.
Starting point is 00:03:36 So this is how we started in Axel with Elias Koromilas and Ioannis Stamelos. And the main goal of InAxel was how to utilize FPGAs in the data center and in the cloud. So we spent five years at InAxel and then I went back to the university, right? And so now I'm in the University of West Attica on the Department of Electrical and Electronic Engineering. And I'm still working on several research areas
Starting point is 00:04:11 in the domain of AI, cloud computing, and especially edge AI. So it is an exciting area. It is a really exciting area. You see a lot of innovation. And you see a lot of innovation. And you see a lot of innovation coming both from the industry and from academia, right. So you see a lot of research efforts from several
Starting point is 00:04:36 universities, especially for the art language model, you see a lot of innovation of AI tips, mostly from industry, because it is an extensive, expensive sport, but also from academia. Right, so it is a very exciting area. Great. Well, thanks. Thanks for introducing yourself and also giving a little bit of background on the area. And since, you know, the whole idea is trying to contextualize the discussion in current developments as much as possible and what's on everyone's minds these days is well a number of things which I find are somehow interrelated and also provide a good background to this conversation so on the one hand that's a few days ago, we have the announcement
Starting point is 00:05:25 of the so-called Stargate initiative, which is kind of, you would maybe call it like a public-private partnership, because it's an initiative that's announced by OpenAI, so the makers of chat GPT and so forth. But at the same time time they have the endorsement and backing of the u.s administration plus a number of industry heavyweights so the idea behind this initiative is to build infrastructure specifically for the us to be to enable them to to compete and to to outcompete rather the rest of the world basically basically. And that's a continuation of what we have been seeing in terms of US policy in the last couple of years. So their policies have basically been trying to give themselves an advantage compared to the rest of the world by doing things such as imposing limitations,
Starting point is 00:06:25 initially specifically for China and the BRICS countries. But as of late, they are also trying to impose limitations to access to GPUs for the rest of the world as well. And at the same time, some days ago as well, we are seeing another interesting development. So the rise of open source models, largely capable large language models, open source language models made in China, that are rapidly climbing the leaderboards of these models and are displaying very interesting capabilities at a fraction of the cost of
Starting point is 00:07:07 what it normally takes to train these types of models for the rest of the world, let's say, and despite having all of these limitations. So that already sort of sets the scene, let's say, for the beginning of the conversation. And I thought that we can have a look at the implications, let's say, of the development of AI chips on different levels. So we are sort of de facto starting with the upper layer. So the geopolitics scene. So what the Chinese companies have managed to achieve is
Starting point is 00:07:49 basically developing top quality, top capability open source large-angle models, while at the same time facing restrictions in the compute power available to them. So I think it's worth exploring how exactly they have managed to do that. Based on the information that we have available to us, it seems that China is not entirely deprived of the latest of GPU power. However, they have very limited access to the latest models. At the same time they seem to be strengthening their internal industry. Chinese companies are developing GPUs of their own and they're also doing something very interesting developing techniques that enable them to mix and match different GPU types and models. I'm sure that you must be able to shed some light on this.
Starting point is 00:08:47 Yeah, very interesting because it is an expensive sport, right? If you are working, especially if you want to develop a processor, an AI processor, especially for the training but also for inference, it is very expensive. So you need to allocate a lot of resources, not only in terms of money, but also in terms of talents, engineering, etc. So the Stargate initiative, it's very... It's complementary to this Chips Act in the US, that it's trying to make a much more stronger the US technology
Starting point is 00:09:28 and not so much depending on third parties like they have to outsource the manufacturing at the TSMC or other companies. And we saw that during coronavirus that it was some limitation and especially in the supply chain, exactly because the whole planet was basically depending on one or two companies based in Taiwan or some other place. So I think it's towards the right direction because they're trying to diversify the fabrication and they want to um to bring back the know-how right because the the manufacturing process was initially developed in the us right and then they outsource it some companies and now i think it makes sense that what they are trying to do is bring it back the expertise that they have right so so it
Starting point is 00:10:26 is definitely towards the right direction and at the same time they have to compete china in which we know that the government strongly support these initiatives um whether it is you know uh the chips whether it is the large language models etc so. So China, the government of China, is very strongly supporting all of these companies working towards these directions, and of course it makes sense also in the US and of course in Europe, to try, the governments there, to try to support these initiatives. Now, it's very interesting because what you mentioned about the China large language model, the nice, the interesting part is that they not only open source the model, but they also open source the training data. That is really something,
Starting point is 00:11:20 because most of the open source models, usually they don't or they hide the training data for several reasons, right? It can be that they inflate some copyrights, it can be anything else. So a lot of companies or a lot of industries or organizations, they are kind of reluctant to open source also the training data because, I don't know, maybe some infringement of the copyright patent something anything right so so it's really something that they open source also the training data right so so and not forget about europe that um i think these initiatives must also be applied also to europe right so definitely europe does not want to stay behind on this one. You know, it's hard to compete with NVIDIA, right? NVIDIA right now has 80-90% of the share market,
Starting point is 00:12:16 especially in the data centers, in the cloud, right? So, and it has huge amounts of money that they are willing, you know, to keep innovating. So it's very hard to compete. But I think what is really interesting is that still there are two things, right? First of all, it is the chips that are used for the training data. And then they are used for the prediction or for the generated ai right because you want to do the predictions or you want to do them to generate some text
Starting point is 00:12:51 the inference chips right so for the training of the data uh so nvidia is the leader and then you have some companies like amd you have the intel company, Intel with the Gaudi chips that they try to compete and they have some very good results. So it's very hard to compete there. But in the domain of inference, when you want not so much of floating point operation, et cetera. This is where there are space for several companies, startups, both in US, in Europe, in China to innovate. And this comes
Starting point is 00:13:36 from the fact that if you are talking about, for example, embedded systems you need different kinds of chips for the text for the video, for different sectors you need different kind of chips for the text for the video for different sectors you need different kind of chips in the factories you need different kind of chips in the hospitals in automotive cars right in autonomous cars etc right so i think there is a space for innovation specifically in the domain of edge AI. Okay, so interestingly you did mention Europe in this kind of landscape let's say of sorts of how different parts of the world are trying to cope and to compete in this AI race,
Starting point is 00:14:26 let's call it. And one of the notable things about Europe is that I've seen many voices, let's say, that are calling for more investment in AI, in compute for AI, and specifically in GPUs. And we would have to assume precisely because of what you said. Well, when people talk about GPUs, they generally mean Nvidia GPUs. So this is one way to go. And it also seems to be the way
Starting point is 00:14:56 that the US Stargate initiative is pointing to. So basically more compute and more power. However, I'm wondering, is there maybe something that the rest of the world can actually learn from the way that the Chinese open source models are being developed? Because it seems to me that maybe they have managed to be more efficient in the way that they do things
Starting point is 00:15:28 and also having to cope with the restrictions that they are facing may be more creative. So not just rely on just throwing more compute at the problem, but trying to make the training process more efficient and also to combine different chips not necessarily the latest models the more powerful ones but doing so in a way that that is more creative let's say do you do you think that is the case and if yes is that something that the rest of the world can learn from definitely it is a good use case because you know there is this motto that first you copy right but then you start innovating right so this is a typical you know for especially
Starting point is 00:16:18 a good way to start is first to copy the leaders right and then to start innovating and we have seen this what is happening for example with the electric, right, and then to start innovating. And we have seen this, what is happening, for example, with the electric cars, right? So, for example, a few years ago, there were a few major Chinese companies developing the electric cars, and now you see that there are some kind of leaders, right, outperforming even Tesla. So, the same, I think, is going to happen also for some domains, right? Outperforming even Tesla. So the same, I think, is going to happen also for some domains, right? First of all, about the large language model, it's very interesting
Starting point is 00:16:55 to see that the fact that they open source also the training data and the fact that they have managed to outperform some other models even with less parameters means that there is room for a lot of innovation. And they have given a very good example. Now the other thing that you mentioned that they can, for example, mix and match. They can combine different versions of GPUs and other processing units in order to create a powerful data center or cloud. This is very useful, especially if you think that now every one year,
Starting point is 00:17:41 in the past, you have to buy some new equipment every three years, every four years, etc. Now the innovation is so fast that almost every year you have more and more powerful chips and more powerful processors. And it does make sense to throw away processors that are one year old, two year old, right? So definitely you need to find a way to utilize the resources, even if it is a kind of heterogeneous resources, right? Even if there are different resources,
Starting point is 00:18:16 not even GPUs, right? You can allocate resource, for example, you can utilize resource from a GPU, a PGA, typical x86 processor, et cetera. And if this would be much more cost efficient, right? Instead of, you know, every time buying them the latest processors and throwing away the older one. So definitely makes sense
Starting point is 00:18:40 and we have to learn something from out of it. Right, so that's a very good cue for the next question I wanted to throw at you, which is sort of giving us a brief tool, let's say, of the other options out there. So if for whatever reason you don't have access to the latest and greatest NVIDIA GPUs,
Starting point is 00:19:03 what else is out there? I know, for for example you are very proficient in FPGAs because this is what you have been working for a number of years. However, there is also other options. So one that has I've personally been trying to keep an eye on for a number of years is the RISC-V architecture and I find that very interesting because it's an application of the open source idea in the domain of hardware so I find that particularly interesting. There's also the idea of chiplets so combining let's say your different units on the same chip and there's also custom chips in A6.
Starting point is 00:19:48 Which one of those do you see as viable for, as a viable GPU alternative? Okay, so if we go to the training part, right, there are some very good initiatives like the one from for example google has its own tpus now it is a version four right and or when they're coming version five that they're really powerful right and amd at the same time has um is releasing mi300 it's also very powerful and it's interesting to see that inside there are specialized units,
Starting point is 00:20:29 accelerating transformer algorithm, that it is the basic block of the chat GPT and the other algorithm, right? Intel has Gaudi chips that they're also very powerful. Right, so you see that there are some alternatives um the the nice thing about for example for example nvidia is that they don't sell just the cheap right so the a very novel idea is that they decided to go to sell the whole system, right? So instead of selling just the chip and trying to support some other vendors,
Starting point is 00:21:13 board vendors or CPU or computer vendors, drug vendors, et cetera, they decided to go vertically and they decided to provide the whole system, this DJX systems, it to go vertically and they decided to provide the whole system this djx systems so that it is ai in a box somehow right so that gives them the room to make some innovation for example they are use their proprietary and the link interfaces right whether right? Whether other companies, they're still based, the communication is still based on Ethernet, that it's not so fast, the NVLink, right?
Starting point is 00:21:54 So one innovation is that NVIDIA decided to provide the whole system, not only the chip, right? So while other companies, for example, in Delhi, they're just selling chips, and then they are based on some other vendors that they have to do the integration. So this creates some problems because usually, especially cloud providers, they want to have the whole system, etc. Or even universities or research centers or companies, they just want to have the box that works seamlessly. The other thing is that they have also a great ecosystem, right? So they have the software platform, they have all of these things. On the other hand, now, if we go to the inference part, right, I there there is a as i said before there is a big room
Starting point is 00:22:46 for a for innovation and you don't need so much powerful devices right so of course nvidia has a10 a30 the most cost efficient t4 example and aws has its own chip called Inferentia. And especially in this domain, FPGA, for example, can prevail because they can provide much lower latency. That is very critical in some applications. When we are talking about inference, we're talking whether you send an image and you want to see the prediction or whether you send a question and you want to see the prediction or whether you send a question and you want to see the answer. So FPGAs are very good because they can provide low latency.
Starting point is 00:23:34 And of course, you can see also other companies. You can see Krog, Cerebras, Graphcore, some of them are several companies that they are trying to gain some market share, especially in this area. And what you mentioned about this, the chiplets that you have, even if you are a startup, you can develop your own transformer accelerator, and you don't need anymore to develop your own chip. You can just provide the IP core that can be fabricated into a chiplet and then the chiplet can be integrated with processors, with memory, etc. It is still very challenging because if you want to develop a chiplet, you need also very fast interconnection network, right?
Starting point is 00:24:30 So that it takes a lot of area, especially in the dye. So there are still some challenges there, but definitely in the domain of FEDS, it makes sense to try to allocate some resources and try to do some differentiation there. And I think also if we talk about, for example, let's say in Europe, right? So currently right now, Qualcomm, Snapdragon, these devices that are everybody's phone, they have specialized units for AI. But in Europe, we have also very good companies like STMicro and XP. Traditionally, we have good companies that they are good, especially, and they have a good market share, especially in the domain of edge and edge AI. So I think there is room for innovation for European companies and the European ecosystem,
Starting point is 00:25:36 especially in the inference part where you see a lot of possibilities right um this is something that is supported by several people right from cambrian ai christos machillamos and several several researchers a supporter there is a room for innovation in the domain of inference especially in the domain of edge ai okay yeah actually yes your um your view i think is corroborated by a lot of independent analysts and and researchers that and also it makes sense from from a go-to-market point of view because even if training is is very expensive computation, actually the bulk of the compute operations in the lifetime of any AI model will be normally in inference. Yes, it does cost a lot to train it, but the actual operation will in due time actually accumulate to a larger amount of compute.
Starting point is 00:26:46 So it also makes sense to focus on inference from a business perspective. Another important aspect and actually a part of the reason why NVIDIA is in the position that they are, is the fact that they have invested heavily in their software stack. So the CUDA platform is pretty much omnipresent, let's say, and people are getting familiar with it from very early on. So it's very extensive and there's a very good developer advocacy program that makes sure that everyone knows how to program there. Obviously, this is something that competitors realize. And therefore, in the last few years, we are seeing efforts to sort of replicate, let's say, not necessarily in terms of feature, but learn from what CUDA has achieved and
Starting point is 00:27:46 try to build an alternative ecosystem. So we are seeing the one API initiative, which is spearheaded by Intel, and also AMD is trying to build its own environment called ROKAN. What's your opinion on these efforts? Do you see them gaining traction and being positioned, that they may be in the future in a position to be competitive to point CUDA started, you know, not only for graphical processing units but for GPUs and for games and etc. but also for high performance computing and then at some point there were a lot of researchers trying to program GPUs in CUDA and of course NVID very fortunate because you know a lot of matrix
Starting point is 00:28:48 multiplications that especially used in the GPUs for video games it's also the the struggling point or the most computational intensive part also for AI for HPC for generated AIative AI, etc. And they built on top of that, right? So they were very good at matrix multiplication. So they were able to transform this innovation also for generative AI and for AI application in general, right? So the software part is very important, right? Because at some point, whether you are a developer, you want to, even if another chip is better at AI or at HPC,
Starting point is 00:29:33 you don't want to rewrite your code at all, right? Whether it is written on Sys, Plus Plus, CUDA, et cetera, you need to be able to support any kind of a chip regardless of the programming language. And I think this is why one of the reason that FPGAs were struggling because it was very hard to program even using high level synthesis, it was very hard to program.
Starting point is 00:30:01 And once it was very hard to program and once it was, they released the OpenCL and HLS high-level synthesis, then it became much easier to program FPGAs. But if we go now in the domain of AI and the generated AI, I think that the companies, especially the vendors, the chip vendors need to provide an easy way to program these devices, right? Even if they have better performance, if it's not easy to program them, then they will not buy it, right?
Starting point is 00:30:37 Because no one wants to rewrite the code. So it is very important that if you are able to program these devices, whether it is in CUDA, ROCA, one API, but especially if it is easy to go from Python, you know, from Keras, from frameworks like this one, from TensorFlow, et cetera, to be able to program this one. And that's why we see that there is for some hiding place right so it's it's very important because using this framework it's much easier to explore and compare different platforms because you have everything there you have the data set you have the your code there and it's very easy to compare different solution there right so so the software part it's it's very easy to compare different solutions there. Right. So the software part, it's very important. Right.
Starting point is 00:31:28 So the software stack, you need to be able to program these chips very easily without having to rewrite your code. Right. Whether it is in CUDA, HLS or C, C++, Python, it is something very important but a lot of vendor companies usually they don't pay so much attention and at some point they they find it in front of them right so because they provide the best results but then nobody is going to bother because you you need to to rewrite your code or you need to learn a new language, right? So it's very important to provide the software stack that it's easy for the end user. Right.
Starting point is 00:32:14 Just out of curiosity, in your work with FPGAs, how did you manage to get around this issue? Yeah. How did you manage to get around this issue? Yeah, so actually this is our competitive advantage, right? So at the beginning, we started providing accelerators for machine learning in the FPGAs, but then we saw that this was the struggling point. I mean, nobody was willing to change how he writes code, right? So what the main innovation of Inaxel was that we developed the middleware
Starting point is 00:32:56 that allowed the data scientists and ML engineers to be able to utilize the FPGAs without having to change a single line of code. So this was the most important thing and that it was recognized by several vendors that we managed to expose the FPGA resources to the end user, to the data scientists, to the machine learning engineers and to be able to utilize the performance of the FPGAs without having to change, for example, their Python code. And just by importing a simple library, they were able to utilize the FPGAs. And this is how we made some traction, and this is how a lot of companies that are starting using our framework
Starting point is 00:33:47 etc right because we we knew that the data scientist and the engineering they don't want to rewrite their code right they don't want to change even a single line of code and this is very important you know for the important lessons for for lessons for anybody working on the domain of semiconductors and processors, right? You need to provide something that is able to be programmed very easy. Right, so that makes me wonder specifically about Intel. And I think it's very relevant to this conversation because of two things.
Starting point is 00:34:28 First, it seems that Intel, in a way, exemplifies what you just mentioned about the software being just as important or maybe even more important than the hardware itself. So, as you mentioned previously, Intel has its own line of AI chips that they got through an acquisition, which is called Gaudi. And the latest Gaudi version, Gaudi 3, seems to have missed its sales target largely, reportedly, due to software basically because of the fact that the software to access it, to program it was not in good shape and therefore that caused them to miss their targets. And another interesting piece of Intel, about Intel is the fact that they seem to be out in the market trying to sell off their FPGA branch which is called Altera and that makes me
Starting point is 00:35:33 wonder even though through the acquisition of Finaxel they seem to have managed to now offer a software layer for FPGAs that, as you mentioned, enables people to use them. They still, judging from the fact that they're not seem to be interested in retaining their FPGA unit, they still don't seem to be able to, well, to make the most of it. How do you explain that? Okay, so FPGAs can prevail in several sectors, especially because it has this low latency and much higher energy efficiency. There is not any killer application for this, but bothilinx and altera when they were standalone they had a lot of revenues because they were able to to sell these fpgs in several sectors from telecommunication sectors that was the main okay the mostly the killer application
Starting point is 00:36:41 was the telecommunication application networking networking, military application, etc. So they have some advantage that it's not always easy to utilize in sectors like, for example, the training part, the data center. However, still, there are some very good use cases, right? So for example, Microsoft Azure, they are utilizing FPGAs in order to process, for example, some searching and because they have coupled very novel, in a very novel way, the FPGAs with the Xeon processors. There are some very good examples of how the FPGAs can prevail and can provide several
Starting point is 00:37:33 advantages. It is not the killer application, for example, the AI training, and this is how, for example, NVIDIA is currently the dominant player there. However, there are still very good use cases, right? I told you about the telecommunication, networking, et cetera, whereas PDAs traditionally are being used and can offer several advantages,
Starting point is 00:37:57 low latency, low energy efficiency, and several other advantages. So I guess then we'll have to come to the conclusion that it has mostly to do with Intel's business strategy than the actual capability of the hardware itself. Yes, I guess so, right. I guess it is in the same way that they tried to to spin off the the foundry they have now they try to have a separate company for the foundry that can serve also third parties right so so i think this is part of the intel strategy to have it separate right Right, okay, so another interesting area to explore which you also touched upon already, inevitably I may add, is well this separation let's say between training and inference and for most organizations actually training I don't think is something that they will get into that much
Starting point is 00:39:07 because it's complicated, it costs lots of resources and also simply because of the fact that on a daily basis we are seeing new very capable models being released either as open source or in proprietary models and there are many ways we are already seeing many ways that people organizations are exploring the use of these models either by leveraging API's typically the API's of OpenAI or Anthropic that are building these proprietary models and making them available under certain licenses and terms and so on so it's a very common way for organizations to start experimenting or potentially many of them actually stay in on that on that trajectory let's say others however
Starting point is 00:40:01 choose to first start that way because it's simpler and faster. And then when they mature their use cases, what many of them do is they take some open source models and then try to adapt them to their use case. So that brings us in the area of inference. And by the way, something interesting that I sort of stumbled upon lately is the fact that you can also, there are already applications and frameworks that enable people to run these large language models, even locally on your local machine, assuming that it's sufficiently, let's say, powerful. So for most people, it's actually the inference that matters. And I know that you were also involved in a research effort recently that was precisely about exploring different ways to enable faster inference specifically for large language
Starting point is 00:41:03 models. So I was wondering if you could summarize your findings and most importantly whether any of these methods and frameworks that you investigated you think has the potential to be directly transferable to how people utilize this and run these models? So it's very interesting area, right? Especially for the inference part. And as you mentioned, most of the companies, even for companies, even if they want to have a specialized LLM, what they do is transfer learning,
Starting point is 00:41:40 meaning that they use an open source language model and they try to fine tune it or adapt it, fine tune it using their own documents, right? In order to make it more relevant to their area, right? So, and then they mostly care about the inference part. Now the inference part now the inference part is is very tricky because you know you don't need to go into 64-bit or 32-bits you can even do the processing using 16-bit or 8-bit or even 4-bits right so you need specialized architectures.
Starting point is 00:42:28 And there are even some startup companies that they are doing, for example, chips that it is specialized only for the transformer algorithm that it is the most computational intensive part. Now, in the domain of, you know, what is from commercial availability right now, it seems that FPGAs and GPU can provide the best performance. And when it comes to energy efficiency, maybe FPGAs can provide better energy efficiency compared to GPUs. But if we look at a little bit long term, right in three years or two years from now,
Starting point is 00:43:09 there are some potential, especially when we're talking about in-memory computing and the memory storage in general, meaning that there is this technology that you can couple together now you have the memory and you have the computing path with in-memory computing and memory store etc. you can combine together the memory and the computing power in the same way like our brain works right so if if this if this technology in memory computing can be commercially available
Starting point is 00:44:00 in a cost efficient way then it can provide much better performance compared to the typical processing technology that is currently using the typical CMOS technology. So I think that in a couple of years, we are going to see some novel technology using in-memory computing memories, et cetera, neuromorphic computing, as they call it, right? That is based on a neuromorphic computing that can provide much better performance. Currently, we see that the performance that the current chips have, it's really impressive,
Starting point is 00:44:48 but in terms of energy efficiency, it's much, much lower compared to how our brain works, right? Neuromorphic computing and in-memory computing, it's much more closer to how our brain works and it is much more energy efficient so i i think there is a room for innovation and some room for research especially in this in this domain as long as you know there are some cost efficient solutions that can be commercial developments not very exotic yeah actually that's a very important parameter and one that I'm personally glad to see that many vendors are starting to pay more attention to and emphasize. So it's not just about performance itself, but it's also about the performance to energy consumption ratio. So yes, and in that respect having an energy efficient solution is
Starting point is 00:45:49 very important for a number of reasons ranging from financial to environmental. So is this a direction that you see your research moving towards? Yeah, so especially in neuromorphic computing, you need some mix of experts, right, a group of experts, because it has to do with also with analog electronics, digital electronics, etc., right? So you need a larger group, right, in order to do some research in this domain that can have some expertise both in the analog mix signal digital domain etc but yeah definitely i think that some new technologies right whether it is neuromorphic computing or some other thing, definitely it's going to provide some much more energy efficient solution because the energy is translated to the energy bill, right, because at the bottom line, most of the users are interested how much tokens you have per
Starting point is 00:47:05 per dollar right and the donor is is depending especially on how much this processor consuming right so so so definitely i think it makes sense to to try to explore some other uh solution uh in order to find some much more energy efficient or cost efficient solutions. Right, so speaking of users and again sort of coming back to how most users end up using the available AI models and technology, again for most of them it's either through someone like OpenAI or Anthropic or which is probably the most typical scenario through one of the hyperscalers and I think it would be a miss of us not to at least mention what the hyperscalers are doing in this area so I would my summary would be that it's a kind of competition game. So all of them are very much collaborating with Nvidia because obviously it's the dominant player. So they have to
Starting point is 00:48:16 stack up their data centers and they have to provide the latest and greatest options to their users. But at the same time, because they don't want to be dependent on a third party, they're also investing heavily in developing their own custom technologies and AI chips. So I was wondering if you would like to offer like a brief take on each of the hyperscaler strategy and which ones you see potentially standing out and in what ways? It's a nice question because you know there are if you are a cheap vendor there are several ways that you can try to sell and promote your product.
Starting point is 00:49:05 And we have seen some of the cheap vendors, Krog or Graf, etc. We see that some of the vendors, of the cheap vendors, they even try to have their own resources or their own cloud and provide their FPGA or their product as a service, right? So we see that in some cases, this is also a good market fitter or a solution to go that instead of just trying to sell the chips, you just keep the chips for yourself and you
Starting point is 00:49:41 build your own infrastructure, right? And you provide the infrastructure as a service And you provide the infrastructure as a service or you provide the application as a service, right? So this is also an interesting strategy, right? What you mentioned now about the hyperscaler data center, this is very interesting because for example, we saw that AWS, they have their own chips, right? They have the inferential
Starting point is 00:50:06 and Google has its own chips TPU, but at the same time, they need to have also Nvidia GPUs to offer to their clients, right? So it's a balance that they have to do some balancing between their own chips and the NVIDIA to offer third-party solution at the same time. So you cannot exclude it, right? So you cannot only offer TPUs and not NVIDIA since most of the users are using NVIDIA chips, right?
Starting point is 00:50:42 But at the same time, you need to have a competitive advantage. So it's very interesting story and it is very elegant, you know, to try to balance these two forces, right? But I don't know, we'll see what happens in this domain. Yeah, I think one interesting development, let's say, along those lines is the fact that to some extent, at least, it seems to me that NVIDIA is also trying to do their way of addressing the very large inference market in a way that is also useful to their users. Yes, I guess that the hyperscaler provider will not be very happy about this initiative,
Starting point is 00:51:39 right, because it is competitive to their core market. So, and Nvidia on the other hand, does not want to lose their clients, right? Because Amazon, AWS, et cetera, it's the largest customers, right? So, it is really, you know, I'm also very curious to see what will happen and if Nvidia will um if nvidia will
Starting point is 00:52:07 manage to attract a lot of users and attract users from other space right or from aws or from azure or from a big cloud right so it's it's it's it's a very elegant, trying to balance these two markets. You need to handle it very carefully, I think, right? Because you don't want to lose also your main customers. Indeed. Yeah. Okay. So I think we're close to wrapping up.
Starting point is 00:52:39 We have just a few minutes left so let's do that by asking you to highlight some of the directions that you think are most promising for future developments. You already pointed towards neuromorphic computing and we sort of covered chiplets a little bit as well. There are also more exotic, let's say, solutions such as Photonic. We have seen with interest some new developments in Photonic that seem sort of promising. And again, coming back to the software stack,
Starting point is 00:53:22 we are also seeing the emergence of new programming languages that are specifically tailored for for AI models. And the promise that they bring is being closer to the hardware such as Mojo, for example, which is sort of like reimagining, let's say of Python, but with being tailored specifically for for AI chips and being more performant. Which one of those do you see as more promising going forward? It's hard to tell, but I think we are going to move like we are going to see a lot of new vendors especially in the domain of embedded systems and edge AI
Starting point is 00:54:09 where we will see I think specialized companies specialized for wearable devices or specialized for military specialized for video or for text.
Starting point is 00:54:26 And I think there is a huge market, right? Especially if we talk about the edge AI. And even if NVIDIA has a big percentage, even at this domain, there is room for some innovation. There is room for some companies. I think it's going to happen the same thing that happened, for example, let's talk about the GPUs, for example, right? So, NVIDIA is the leader in GPUs, but there was lack of GPUs for a wearable device, for example, for smartwatches and for a FitBand, etc. And we saw that, for example, a Greek company,
Starting point is 00:55:08 the Think Silicon, they developed a GPU that it was specialized for a fit band or for smart watches, etc. And it was acquired by applied material, right? So I think the main innovation is going to happen in areas that are too small for companies like NVIDIA or Intel or some other companies, but it is good enough for smaller companies that they can make specialized products for this area. So, and there we can see some exotic solutions. It can be neuro-morphic, it could be, for example, photonic, who will see, right? In memory computing, et cetera.
Starting point is 00:55:53 But exactly because there are different requirements and different specification in the domain of Edge AI, I think there is room for innovation in this area, especially in the HCI, for video, for text, different requirements are for, for example, for the hospitals, for the autonomous driving, for the aviation, anything, consumer electronics, etc. So I think I would definitely do some research, especially in the edge AI. Great. Well, thank you. Thank you very much for the conversation. Actually, my goal always when having conversations like this is to learn something
Starting point is 00:56:47 from the people I'm having the conversation with. And that definitely was the case today. So thank you very much for your time. And good luck with whatever it is that you choose to focus your next research on. Thank you. Thank you, George. Thank you for the invitation. Thanks for sticking around.
Starting point is 00:57:12 For more stories like this, check the link in bio and follow Link Data Orchestration.
