Orchestrate all the Things - AI Chips in 2025: The end of “more GPUs is all you need”? Featuring InAccel CEO / Founder Chris Kachris
Episode Date: January 29, 2025. It’s early 2025, and we may already be witnessing a redefining moment for AI as we’ve come to know it in the last couple of years. Is the canon of “more GPUs is all you need” about to change? Truth is, when we arranged a conversation on AI chips with Chris Kachris, neither the Stargate Project nor DeepSeek R1 had burst onto the AI scene. Even though we did not consciously anticipate these developments, we knew AI chips is a topic that deserves attention, and Kachris is an insider. Join us as we explore how the AI chip market is shaped today and tomorrow. AI chips and open source AI models are all part of the comprehensive curriculum on Pragmatic AI Training that is being developed by Orchestrate all the Things: https://linkeddataorchestration.com/services/training/pragmatic-ai-training/ Subscribe to the newsletter to be the first to know and enjoy discounted rates: https://linkeddataorchestration.com/orchestrate-all-the-things/newsletter/ Check out the article published here for additional background and references: https://linkeddataorchestration.com/2025/01/29/ai-chips-in-2025-the-end-of-more-gpus-is-all-you-need/
Transcript
Welcome to Orchestrate All The Things. I'm George Anadiotis and we'll be connecting the dots together.
Stories about technology, data, AI and media and how they flow into each other, shaping our lives.
It's early 2025 and we may already be witnessing a redefining moment for AI as we've come to know it in the last couple of years.
Is the canon of "more GPUs is all you need" about to change? Truth is, when we arranged the conversation on AI chips with Chris Kachris,
neither the Stargate project nor DeepSeek R1 had burst onto the AI scene.
Even though we did not consciously anticipate these developments,
we knew AI chips is a topic that deserves attention,
and Kachris is an insider.
I hope you will enjoy this.
If you like my work on Orchestrate All the Things,
you can subscribe to my podcast, available on all major platforms, my self-published newsletter,
also syndicated on Substack, Hackernoon, Medium and DZone, or follow Orchestrate All the Things
on your social media of choice. It's my great pleasure today to come back to a returning topic, the topic of AI chips. It's
something that we first started exploring seven years ago already, back in 2018,
and I have to say in advance that originally I had no background on the topic whatsoever,
other than, you know, the general background that computer
science graduates get.
However, at some point I felt that it was important to educate myself on it, for sort
of obvious reasons: AI is, and has already been, a topic of growing importance,
and obviously what's fueling growth, and what's also
setting the boundaries on what's possible with AI, is hardware. More and more, we have been
witnessing the evolution of hardware as well, with more and more specialized and more and more
powerful so-called AI chips. So, in order to explore the latest developments in this area,
we have the pleasure of hosting today with us Mr.
Chris Kachris.
Chris is a man who wears many hats.
He has been in the industry
driving the development of a startup called InAccel.
He will share a few words on that.
And he has also been, and still is, in fact, a researcher
working on a number of topics under the AI chips umbrella.
So, Chris, thank you very much for making the time for today's conversation.
I'll pass it on to you to do a slightly more nuanced
intro of yourself than the one I did for you.
Thank you George, thank you very much for the invitation. I'm happy to be here.
So I finished my PhD in Computer Engineering at Delft University of
Technology, and then, after spending some time in Silicon Valley at Xilinx, I came back
to Greece, working at several research centers. At some point I was at the Technical University
of Athens, doing research specifically on FPGAs and how they can be utilized in the data center domain and in the cloud.
And out of this research, we tried to commercialize our experience.
So this is how we started InAccel, with Elias Koromilas and Ioannis Stamelos.
And the main goal of InAccel was how to utilize FPGAs
in the data center and in the cloud.
So we spent five years at InAccel
and then I went back to the university, right?
And so now I'm at the University of West Attica,
in the Department of Electrical and Electronics Engineering.
And I'm still working on several research areas
in the domain of AI, cloud computing,
and especially edge AI.
So it is a really exciting area.
You see a lot of innovation coming both from industry and from academia, right?
You see a lot of research efforts from several universities, especially around large language models, and you see a
lot of innovation in AI chips, mostly from industry, because it is an expensive
sport, but also from academia. So it is a very exciting area.
Great. Well, thanks for introducing yourself and also giving a little bit of background
on the area. Since the whole idea is to contextualize the discussion in
current developments as much as possible, what's on everyone's minds these days is
a number of things which I find are somehow interrelated and also provide a good background
to this conversation. On the one hand, just a few days ago, we had the announcement
of the so-called Stargate initiative, which you would maybe call a public-private
partnership, because it's an initiative announced by OpenAI, the makers of ChatGPT
and so forth. But at the same time they have the endorsement and backing of the
US administration, plus a number of industry heavyweights. The idea behind this initiative
is to build infrastructure specifically for the US, to enable them to compete with, or rather
outcompete, the rest of the world. And that's a continuation of what we have been seeing in terms of US policy in the last couple of years.
So their policies have basically been trying to give themselves an advantage compared to the rest of the world
by doing things such as imposing limitations,
initially specifically for China and the BRICS countries.
But as of late, they are also trying to impose
limitations on access to GPUs for the rest of the world as well.
And at the same time, some days ago as well,
we are seeing another interesting development.
So, the rise of open source models:
highly capable open source large language models made in China,
that are rapidly climbing the model leaderboards and displaying very interesting capabilities at a fraction of the cost of
what it normally takes to train these types of models for the rest of the world, let's
say, and despite all of these limitations.
So that already sort of sets the scene, let's say, for the beginning of the conversation.
And I thought that we can have a look at the
implications, let's say, of the development of AI chips on different levels.
So we are sort of de facto starting with the upper layer.
So the geopolitics scene.
So what the Chinese companies have managed to achieve is
basically developing top quality, top capability open source large language models, while at the same
time facing restrictions in the compute power available to them. So I think it's worth exploring how exactly
they have managed to do that. Based on the information that we have available to us,
it seems that China is not entirely deprived of GPU power. However, they have very
limited access to the latest models. At the same time they seem to be strengthening their internal
industry. Chinese companies are developing GPUs of their own and they're also doing something
very interesting developing techniques that enable them to mix and match different GPU types
and models. I'm sure that you must be able to shed some light on this.
Yeah, very interesting because it is an expensive sport, right? If you are working, especially if
you want to develop a processor, an AI processor, especially for the training but also for inference,
it is very expensive.
So you need to allocate a lot of resources, not only in terms of money,
but also in terms of talents, engineering, etc.
So the Stargate initiative is complementary to the CHIPS Act in the US,
which is trying to make US technology much stronger
and not so dependent on third parties, where they have to outsource the manufacturing to
TSMC or other companies. And we saw during the coronavirus that there were some limitations, especially
in the supply chain, exactly because the whole planet was basically depending on one or two
companies based in Taiwan or some other place.
So I think it's in the right direction, because they're trying to diversify the fabrication and they want to bring back the
know-how, right? Because the manufacturing process was initially developed in the US,
and then they outsourced it to some companies, and now I think it makes sense that what they are trying
to do is bring back the expertise. So it is definitely in the right direction.
And at the same time they have to compete with China, where
we know that the government strongly supports these initiatives, whether it is the chips,
whether it is the large language models, etc. So China, the government of China, is very strongly supporting all of these companies
working in these directions, and of course it makes sense also in the US and of course in Europe
for the governments there to try to support these initiatives.
Now, it's very interesting, what you mentioned about
the Chinese large language model. The interesting part is that they not only open
sourced the model, but they also open sourced the training data. That is really something,
because most of the open source models usually don't, or they hide the training data for several reasons, right?
It can be that they violate some copyrights, it can be anything else.
So a lot of companies or organizations are kind of reluctant to also open source the training data because,
I don't know, maybe some infringement of copyright, a patent, something, anything, right? So it's really something that they also open sourced the training
data. And let's not forget about Europe: I think these initiatives must also be applied
in Europe, right? So definitely Europe does not want to stay behind on this one.
You know, it's hard to compete with NVIDIA, right?
NVIDIA right now has 80-90% of the market share,
especially in the data centers, in the cloud, right?
And it has huge amounts of money
that they are willing to spend, you know, to keep innovating.
So it's very hard to compete.
But I think what is really interesting is that there are still two things, right?
First of all, there are the chips that are used for training.
And then there are the ones used for the prediction, or for generative AI,
because you want to do the predictions or you want to generate some text:
the inference chips, right? So for the training, NVIDIA is the leader, and then you
have some companies like AMD, you have Intel with the Gaudi chips, that try to compete
and they have some very good results.
So it's very hard to compete there.
But in the domain of inference, where you don't need so much
floating point operation, et cetera, this is where there is space for several
companies, startups, both in the US, in Europe and in China,
to innovate. And this comes
from the fact that, if you are talking about, for example, embedded systems,
you need different kinds of chips for text,
for video, for different sectors. You
need different kinds of chips in the factories, different kinds of chips in the
hospitals, in autonomous cars, etc. So
I think there is space for innovation specifically in the domain of edge AI.
Okay, so interestingly, you did mention Europe in this landscape, let's say, of
how different parts of the world are trying to cope and compete in this AI race,
let's call it. And one of the notable things about Europe is that I've seen
many voices, let's say, calling for more investment in AI, in
compute for AI, and specifically in GPUs. We would have to assume that's precisely
because of what you said.
Well, when people talk about GPUs,
they generally mean Nvidia GPUs.
So this is one way to go.
And it also seems to be the way
that the US Stargate initiative is pointing to.
So basically more compute and more power.
However, I'm wondering, is there maybe something
that the rest of the world can actually learn
from the way that the Chinese open source models
are being developed?
Because it seems to me that maybe they have managed
to be more efficient in the way that they do things,
and also, having to cope with the restrictions that they are facing, maybe more creative.
So not just relying on throwing more compute at the problem,
but trying to make the training process more efficient, and also combining different
chips, not necessarily the latest or most powerful models, but doing so in a way
that is more creative, let's say. Do you think that is the case,
and if yes, is that something that the rest of the world can learn from?
Definitely, it is a good use case, because, you know, there is this motto that first
you copy, right, but then you start innovating. So this is typical:
a good way to start is first to copy the leaders and then to start innovating.
And we have seen this happening, for example,
with electric cars, right?
A few years ago, there were a few major Chinese companies
developing electric cars, and now you see that there are
some leaders outperforming even Tesla.
So the same, I think, is going to happen also
in some other domains. First of all, about the large language models, it's very interesting
to see that they open sourced the training data, and the fact that they have managed to outperform some other models even with fewer
parameters means that there is room for a lot of innovation.
And they have given a very good example.
Now, the other thing that you mentioned is that they can, for example, mix and match. They can combine different versions of GPUs
and other processing units
in order to create a powerful data center or cloud.
This is very useful,
especially if you think that, in the past,
you had to buy some new equipment every three years,
every four years, etc.
Now the innovation is so fast that almost every year you have more and more powerful
chips and processors.
And it doesn't make sense to throw away processors that are one or two years old, right? So definitely you need to find a way
to utilize the resources,
even if they are heterogeneous resources,
even if they are different resources,
not only GPUs.
You can utilize resources, for example,
from a GPU, an FPGA, a typical x86 processor, et cetera.
And this would be much more cost efficient, right?
Instead of, you know, every time buying
the latest processors and throwing away the older ones.
So it definitely makes sense,
and we have to learn something out of it.
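To make the mix-and-match idea concrete, here is a minimal sketch of the "use whatever accelerator is available" pattern, assuming PyTorch; FPGAs or other accelerators would be exposed through their own runtimes, which are not shown here.

```python
# Minimal sketch: prefer a GPU if one is present, otherwise fall back to the CPU.
# Other accelerators (FPGAs, etc.) would plug in through their own runtimes.
import torch

def pick_device() -> torch.device:
    """Return a CUDA device if available, otherwise the CPU."""
    return torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(1024, 1024).to(device)   # toy model standing in for a real workload
batch = torch.randn(32, 1024, device=device)     # toy input batch
with torch.no_grad():
    out = model(batch)
print(f"Ran inference on {device}, output shape {tuple(out.shape)}")
```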
Right, so that's a very good cue
for the next question I wanted to throw at you,
which is sort of giving us a brief tour,
let's say, of the other options out there.
So if for whatever reason
you don't have access to the latest
and greatest NVIDIA GPUs,
what else is out there?
I know, for example, that you are very
proficient in FPGAs, because this is what you have been working on for a number of years. However,
there are also other options. One that I've personally been trying to keep an eye on
for a number of years is the RISC-V architecture, and I find that very interesting because
it's an application of the open source idea in the domain of hardware.
There's also the idea of chiplets, so combining, let's say,
different units on the same chip, and there are also custom chips and ASICs.
Which one of those do you see as a viable GPU alternative?
Okay, so if we go to the training part,
there are some very good initiatives. For example,
Google has its own TPUs, now in version four, and version five is coming,
and they're really powerful. AMD at the same time is releasing the MI300,
which is also very powerful, and it's interesting to see that inside
there are specialized units
accelerating the transformer algorithm,
which is the basic block of ChatGPT
and the other models, right?
Intel has the Gaudi chips, which are also very powerful.
So you see that there are some alternatives.
The nice thing about NVIDIA, for example, is that they don't sell just
the chip. A very novel idea is that they decided to sell the whole system, right? So instead of selling just the chip
and relying on some other vendors,
board vendors or CPU or computer vendors, et cetera,
they decided to go vertical
and to provide the whole system,
these DGX systems, so that it is
AI in a box, somehow. And that gives them the room to make some innovations. For example, they
use their proprietary NVLink interfaces, whereas for other companies the
communication is still based on Ethernet, which is not as fast as NVLink, right?
So one innovation is that NVIDIA decided to provide the whole system, not only the chip.
While other companies, for example, are just
selling chips, and then they rely on some other vendors that have to do the integration.
This creates some problems, because usually cloud providers especially
want to have the whole system, etc. Or even universities or research centers or companies, they just want to have a box
that works seamlessly. The other thing is that they also have a great ecosystem, right?
So they have the software platform, they have all of these things. On the other hand, now,
if we go to the inference part, as I said before, there is a big room
for innovation, and you don't need such powerful devices. Of course, NVIDIA has the
A10, the A30, the most cost efficient T4 for example, and AWS has its own chip called Inferentia. And especially in this domain, FPGAs, for example, can prevail,
because they can provide much lower latency.
That is very critical in some applications.
When we are talking about inference, we're talking about whether you send an image
and you want to see the prediction, or whether you send a question
and you want to see the answer.
So FPGAs are very good because they can provide low latency.
And of course, you can also see other companies.
You can see Groq, Cerebras, Graphcore; there are several companies that are trying to gain some market share,
especially in this area. And regarding what you mentioned about chiplets:
even if you are a startup, you can develop your own transformer accelerator, and you don't need to develop your own chip anymore.
You can just provide the IP core that can be fabricated into a chiplet, and then the
chiplet can be integrated with processors, with memory, etc.
It is still very challenging, because if you want to develop a chiplet, you also need a very
fast interconnection network,
and that takes a lot of area, especially on the die.
So there are still some challenges there, but definitely in this domain it makes sense to try to allocate some resources and try to do some
differentiation there. And I think also, if we talk about, for example, Europe, right?
So currently, Qualcomm with Snapdragon, these devices that are in everybody's phone, have specialized units for AI.
But in Europe we also have very good companies, like STMicro and NXP.
Traditionally we have good companies that have a
good market share, especially in the domain of edge and edge AI.
So I think there is room for innovation for European companies and the European ecosystem,
especially in the inference part, where you see a lot of possibilities. This is something that is supported by several people, from Cambrian
AI, Christos Machillamos, and several other researchers: there is room for
innovation in the domain of inference, especially in the domain of edge AI.
Okay, yeah, actually, yes, your view I think is corroborated by a lot
of independent analysts and researchers, and it also makes sense from a go-to-market
point of view, because even if training is very expensive computation, the bulk of the compute operations
in the lifetime of any AI model will normally be in inference.
Yes, it does cost a lot to train it, but the actual operation will in due time
accumulate to a larger amount of compute.
So it also makes sense to focus on inference from a business perspective.
Another important aspect and actually a part of the reason why NVIDIA is in the
position that they are, is the fact that they have invested heavily in their software stack.
So the CUDA platform is pretty much omnipresent, let's say, and people are getting familiar
with it from very early on.
So it's very extensive and there's a very good developer advocacy program that makes sure that everyone knows how to program there.
Obviously, this is something that competitors realize.
And therefore, in the last few years, we are seeing efforts to sort of replicate, let's say, not necessarily in terms of features, but to learn from what CUDA has achieved and
try to build an alternative ecosystem.
So we are seeing the oneAPI initiative, which is spearheaded by Intel, and AMD is also trying
to build its own environment, called ROCm.
What's your opinion on these efforts? Do you see them gaining traction, and being positioned
so that in the future they may be able to compete with CUDA?
At some point CUDA started being used, you know, not only for
graphics processing and for games, etc., but also for high performance
computing, and then at some point there were a lot of researchers trying to program GPUs
in CUDA. And of course NVIDIA was very fortunate, because, you know, the matrix
multiplications that are especially used in GPUs for video games are also the struggling point,
or the most computationally intensive part, for AI, for HPC, for generative AI, etc. And they built on top of that, right?
So they were very good at matrix multiplication,
and they were able to transfer this innovation also to generative AI
and to AI applications in general, right?
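As a rough, hedged illustration of why matrix multiplication throughput carries over from graphics to AI workloads, here is a small timing sketch in PyTorch; the matrix size is arbitrary, and the GPU branch only runs if CUDA is available.

```python
# Toy comparison: the same dense matmul that dominates graphics pipelines also
# dominates transformer inference and training, so GPU matmul speed transfers to AI.
import time
import torch

n = 2048
a, b = torch.randn(n, n), torch.randn(n, n)

t0 = time.perf_counter()
a @ b                                   # matrix multiply on the CPU
cpu_s = time.perf_counter() - t0

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    a_gpu @ b_gpu                       # the same multiply on the GPU
    torch.cuda.synchronize()
    print(f"CPU: {cpu_s:.3f}s, GPU: {time.perf_counter() - t0:.3f}s")
else:
    print(f"CPU: {cpu_s:.3f}s (no GPU available)")
```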
So the software part is very important, right?
Because at some point, if you are a developer,
even if another chip is better at AI or at HPC,
you don't want to rewrite your code at all.
Whether it is written in C, C++, CUDA, et cetera,
you need to be able to support any kind of chip
regardless of the programming language. And I think this is one of the reasons
that FPGAs were struggling: they were very hard to program,
even using high-level synthesis.
And once they released OpenCL and HLS, high-level synthesis, then it became much easier to program FPGAs.
But if we go now into the domain of AI and generative AI,
I think that the companies, especially the vendors,
the chip vendors, need to provide an easy way
to program these devices, right?
Even if they have better performance, if it's not easy to program them, then people will not
buy them, because no one wants to rewrite their code.
So it is very important to be able to program these devices, whether it is in CUDA, ROCm, oneAPI, but especially if it is easy to go from Python, you know, from Keras, from frameworks like that, from TensorFlow, et cetera.
And that's why we see that there is, for example, Hugging Face, right? It's
very important, because using this framework it's much easier to
explore and compare different platforms, because you have everything there: you
have the dataset, you have your code there, and it's very easy to compare
different solutions. So the software part is very important.
With the software stack, you need to be able to program these chips very easily, without having to rewrite your code,
whether it is in CUDA, HLS or C, C++, Python. It is something very important, but a lot of vendor companies usually don't pay so much attention to it,
and at some point they find it in front of them, because they provide the best
results, but then nobody is going to bother, because you need to rewrite your code or you need to learn a new language.
So it's very important to provide a software stack
that is easy for the end user.
Right.
Just out of curiosity,
in your work with FPGAs,
how did you manage to get around this issue?
Yeah, so actually this was our competitive advantage, right?
So at the beginning, we started providing accelerators for machine learning on FPGAs,
but then we saw that this was the struggling point.
I mean, nobody was willing to change how they write code, right?
So the main innovation of InAccel was that we developed a middleware
that allowed data scientists and ML engineers to utilize
FPGAs without having to change a single line of code.
This was the most important thing, and it was recognized by several vendors:
we managed to expose the FPGA resources to the end user, to the data scientists, to the
machine learning engineers, so that they could utilize
the performance of the FPGAs without having to change, for example, their Python code.
Just by importing a simple library, they were able to utilize the FPGAs.
And this is how we got some traction, and how a lot of companies started using our framework,
etc., because we knew that data scientists and engineers don't want
to rewrite their code, they don't want to change even a single line of code. And this is
a very important lesson for anybody working in the domain of semiconductors
and processors: you need to provide something that is able
to be programmed very easily.
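To illustrate the "drop-in library" idea in general terms only: the sketch below uses plain scikit-learn, and the commented-out `fpga_accel` module and `FpgaLogisticRegression` class are invented names for this illustration, not InAccel's actual API. The point is simply that the user-facing code stays essentially unchanged.

```python
# Hypothetical sketch of a drop-in accelerated estimator behind a familiar interface.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
# from fpga_accel import FpgaLogisticRegression   # hypothetical accelerated drop-in

X, y = make_classification(n_samples=10_000, n_features=64, random_state=0)

clf = LogisticRegression(max_iter=200)            # unchanged data-science code
# clf = FpgaLogisticRegression(max_iter=200)      # swapping one line (or none, if the
#                                                 # import patches it) would move fit()
#                                                 # and predict() onto the FPGA
clf.fit(X, y)
print("accuracy:", clf.score(X, y))
```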
Right, so that makes me wonder specifically about Intel.
And I think it's very relevant to this conversation because of two things.
First, it seems that Intel, in a way, exemplifies what you just mentioned about the software being just as important as, or maybe even more important than, the hardware itself. As you mentioned previously, Intel has its own line of AI chips,
which they got through an acquisition, called Gaudi.
And the latest Gaudi version, Gaudi 3, seems to have missed its sales target
largely, reportedly, due to software, basically because
the software to access it and program it was not in good shape, and that
caused them to miss their targets.
Another interesting piece about Intel is the fact that they seem to be out in the market
trying to sell off their FPGA branch, which is called Altera. And that makes me
wonder: even though, through the acquisition of InAccel, they seem to now
offer a software layer for FPGAs that, as you mentioned, enables people to use them,
judging from the fact that they don't seem to be interested in retaining their FPGA unit,
they still don't seem to be able to make the most of it.
How do you explain that?
Okay, so FPGAs can prevail in several sectors, especially because they have this low latency and much higher energy efficiency.
There is not one killer application for them, but both Xilinx and Altera, when they were standalone,
had a lot of revenue, because they were able to sell these FPGAs in several sectors. The main, let's say killer,
application was telecommunications, networking, military applications, etc.
So they have some advantages that are not always easy to utilize in sectors like, for example, the training part, the data center.
However, there are still some very good use cases. For example,
Microsoft Azure is utilizing FPGAs in order to process, for example, search,
because they have coupled the FPGAs with the Xeon processors in a very novel way.
There are some very good examples of how FPGAs can prevail and can provide several
advantages.
AI training is not the killer application for them, and this is why, for example,
NVIDIA is currently the dominant player there.
However, there are still very good use cases, right?
I told you about telecommunications,
networking, et cetera,
where FPGAs are traditionally being used
and can offer several advantages:
low latency, high energy efficiency,
and several others.
So I guess then we'll have to come to the conclusion that it has more to do with Intel's business strategy than with the actual capability of the hardware itself.
Yes, I guess so. I guess it is the same way that they tried to spin off the foundry;
they now try to have a separate company for the foundry that can also serve third parties.
So I think this is part of Intel's strategy, to have it separate.
Right, okay, so another interesting area to explore, which you also touched upon
already, inevitably I may add, is this separation, let's say, between training and
inference. For most organizations, training is not something I think they will get into that much,
because it's complicated, it costs lots of resources, and also simply because
on a daily basis we are seeing new, very capable models being released, either as open source or
as proprietary models. And we are
already seeing many ways that people and organizations are exploring the use of
these models, either by leveraging APIs, typically the APIs of OpenAI or
Anthropic, who are building these proprietary models and making them available under certain
licenses and terms and so on. It's a very common way for organizations to start experimenting,
and potentially many of them actually stay on that trajectory, let's say. Others, however,
choose to start that way because it's simpler and faster,
and then, when they mature their use cases, what many of them do is take some open
source models and try to adapt them to their use case.
So that brings us to the area of inference.
And by the way, something interesting that I sort of stumbled upon lately is the fact that there are already applications and frameworks that enable people to run these large language models even locally, on your local machine, assuming that it's sufficiently powerful, let's say.
So for most people, it's actually the inference that matters.
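As a minimal sketch of what local inference can look like, assuming the Hugging Face transformers library and a deliberately tiny model so it fits on a laptop CPU; larger open-weight models, or dedicated runtimes such as llama.cpp or Ollama, follow the same idea.

```python
# Minimal local text generation with a small open model; no remote API involved.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")  # tiny demo model
result = generator("AI chips in 2025 will", max_new_tokens=30, do_sample=False)
print(result[0]["generated_text"])
```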
And I know that you were also involved in a research effort recently that was precisely
about exploring different ways to enable faster inference specifically for large language
models. So I was wondering if you could summarize your findings, and most importantly, whether
you think any of the methods and frameworks that you investigated have the potential to
be directly transferable to how people utilize and run these models.
So it's a very interesting area, right?
Especially for the inference part.
And as you mentioned, for most companies,
even if they want to have a specialized LLM,
what they do is transfer learning,
meaning that they use an open source language model
and they try to fine-tune it or adapt it
using their own documents,
in order to make it more relevant to their area.
And then they mostly care about the inference part.
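A minimal sketch of this transfer-learning pattern, assuming the Hugging Face transformers and peft libraries; the base model and hyperparameters are placeholders chosen only to keep the example small, not a recommendation.

```python
# Attach lightweight LoRA adapters to a frozen open-weight model, so that only a
# small fraction of parameters is trained on an organization's own documents.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "distilgpt2"                                   # stand-in for a larger open model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)                   # wraps and freezes the base model
model.print_trainable_parameters()                    # only the adapter weights train
# ...then run a standard training loop (e.g. transformers Trainer) on in-house text.
```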
Now, the inference part is very tricky, because, you know, you don't need to go
to 64-bit or 32-bit; you can even do the processing using 16-bit or 8-bit or even 4-bit, right?
So you need specialized architectures.
And there are even some startup companies that are doing, for example, chips
specialized only for the transformer algorithm,
which is the most computationally intensive part.
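As a small illustration of the reduced-precision point, here is a sketch using PyTorch's post-training dynamic quantization, which converts the linear layers of a toy model to 8-bit integer weights for inference; it is only one of several quantization approaches.

```python
# Post-training dynamic quantization: float32 Linear layers become int8 for inference.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8   # quantize only the Linear layers to int8
)

x = torch.randn(1, 512)
with torch.no_grad():
    print(quantized(x).shape)               # same interface, lower-precision weights
```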
Now, in terms of what is commercially available right now, it seems that FPGAs and GPUs can
provide the best performance.
And when it comes to energy efficiency, maybe FPGAs can provide better energy efficiency
compared to GPUs.
But if we look a little bit longer term, two or three years from now,
there is some potential, especially when we're talking about in-memory computing and
memristors in general. Meaning that there is this technology where you can
couple together the memory and the computing path;
with in-memory computing and memristors, etc., you can
combine the memory and the computing power, in the same way our brain
works. So if this in-memory computing technology can become commercially available
in a cost efficient way, then it can provide much better performance compared to the typical
processing technology, the typical CMOS technology that is currently being used.
So I think that in a couple of years we are going to see some novel technologies
using in-memory computing, memories, et cetera,
neuromorphic computing, as they call it,
that can provide much better performance.
Currently we see that the performance the current chips have is really impressive,
but in terms of energy efficiency it's much, much lower compared to how our brain works.
Neuromorphic computing and in-memory computing are much closer to how our brain works,
and they are much more energy efficient. So I think there is room for
innovation and room for research, especially in this domain, as long as, you know, there
are some cost efficient solutions that can become commercial developments, not very exotic ones.
Yeah, actually, that's a very important parameter, and one that I'm personally glad to see many vendors starting to pay more attention to and emphasize.
So it's not just about performance itself, but also about the performance to energy consumption ratio. In that respect, having an energy efficient solution is
very important for a number of reasons, ranging from financial to environmental. So is this a
direction that you see your research moving towards?
Yeah, so especially in neuromorphic computing, you need a mix of experts, right,
a group of experts, because it also has to do with analog electronics, digital electronics,
etc. So you need a larger group in order to do research in this domain, one that
has expertise in the analog, mixed-signal and digital domains, etc. But yeah, definitely, I think
that some new technologies, whether it is neuromorphic computing or something else, are definitely
going to provide much more energy efficient solutions, because the energy is translated
into the energy bill. At the bottom line, most users are interested in how many tokens you get
per dollar, and the dollar depends especially on how much this processor is consuming.
So definitely, I think it makes sense to try to explore some other solutions,
in order to find much more energy efficient or cost efficient solutions.
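A back-of-the-envelope version of the tokens-per-dollar point; all numbers below are illustrative assumptions, not measurements.

```python
# Tokens generated per dollar of electricity, ignoring hardware and hosting costs.
def tokens_per_energy_dollar(tokens_per_s: float, power_w: float, usd_per_kwh: float) -> float:
    kwh_per_s = power_w / 1000 / 3600        # energy drawn each second, in kWh
    usd_per_s = kwh_per_s * usd_per_kwh      # electricity cost per second
    return tokens_per_s / usd_per_s

# Hypothetical comparison: a big accelerator vs. a smaller, more efficient one.
print(f"{tokens_per_energy_dollar(1000, 700, 0.20):,.0f} tokens per energy dollar")
print(f"{tokens_per_energy_dollar(600, 150, 0.20):,.0f} tokens per energy dollar")
```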
Right, so speaking of users, and again sort of coming back to how most users end up using the
available AI models and technology: for most of them it's either through someone like OpenAI or Anthropic, or, which is probably the most
typical scenario, through one of the hyperscalers. And I think it would be a miss on our part not to at
least mention what the hyperscalers are doing in this area. My summary would be that it's a kind of competition game. All of them are very
much collaborating with Nvidia, because obviously it's the dominant player, so they have to
stack up their data centers and provide the latest and greatest options
to their users. But at the same time, because they don't want to be dependent on a third party, they're also investing heavily in developing their
own custom technologies and AI chips.
So I was wondering if you would like to offer a brief take on each hyperscaler's
strategy, and which ones you see potentially standing
out, and in what ways.
It's a nice question, because, you know, if
you are a chip vendor, there are several ways that you can try to sell and
promote your product.
And we have seen some of the chip vendors,
Groq or Graphcore, etc.;
we see that some of the chip vendors
even try to have their own resources,
or their own cloud, and provide their FPGA
or their product as a service, right?
So we see that in some cases this is also a good market fit, or a way to go:
instead of just trying to sell the chips, you keep the chips for yourself and you
build your own infrastructure,
and you provide the infrastructure as a service,
or you provide the application as a service.
So this is also an interesting strategy, right?
What you mentioned now about the hyperscaler data centers
is very interesting, because, for example,
we saw that AWS has their own chips, right?
They have Inferentia,
and Google has its own chips, the TPUs.
But at the same time,
they also need to have Nvidia GPUs
to offer to their clients.
So it's a balance; they have to do some balancing
between their own chips and NVIDIA, to offer third-party solutions at the same time.
So you cannot exclude it, right?
You cannot only offer TPUs and not NVIDIA, since most of the users are using NVIDIA chips.
But at the same time, you need to have a competitive advantage.
So it's a very interesting story,
and it is very elegant, you know,
to try to balance these two forces.
But I don't know, we'll see what happens in this domain.
Yeah, I think one interesting development, let's say, along those lines is the fact that, to some extent at least, it seems to me that NVIDIA is also trying to find its own way of addressing the very large inference market,
in a way that is also useful to their users.
Yes, I guess that the hyperscaler providers will not be very happy about this initiative,
because it is competitive with their core market.
And Nvidia, on the other hand,
does not want to lose their clients,
because Amazon, AWS, et cetera,
are their largest customers, right?
So, you know,
I'm also very curious to see what will happen,
and if Nvidia will
manage to attract a lot of users, and attract users from other spaces, from AWS or from
Azure or from the big clouds. So it's very elegant, trying to balance these two markets.
You need to handle it very carefully, I think,
because you don't also want to lose your main customers.
Indeed.
Yeah.
Okay.
So I think we're close to wrapping up.
We have just a few minutes left, so let's do that by asking you to highlight some of the
directions that you think are most promising for future developments.
You already pointed towards neuromorphic computing, and we sort of covered
chiplets a little bit as well. There are also more exotic, let's say,
solutions, such as photonics.
We have seen with interest some new developments
in photonics that seem sort of promising.
And again, coming back to the software stack,
we are also seeing the emergence of new programming languages that are specifically tailored
for AI models, and the promise that they bring is being closer
to the hardware. Mojo, for example, is sort of
a reimagining, let's say, of Python, but tailored
specifically for AI chips and more performant.
Which one of those do you see as more promising going forward?
It's hard to tell, but I think we are going to see a lot of new vendors, especially in the domain of
embedded systems and edge AI,
where we will see,
I think, specialized
companies:
specialized for
wearable
devices, or specialized for military,
specialized for video or for text.
And I think there is a huge market,
especially if we talk about edge AI.
And even if NVIDIA has a big percentage, even in this domain, there is room for some innovation.
There is room for some companies.
I think the same thing is going to happen that happened, for example,
with GPUs, right?
NVIDIA is the leader in GPUs, but there was a lack of GPUs for wearable devices,
for example for smartwatches and fitness bands, etc. And we saw that, for example, a Greek company,
Think Silicon, developed a GPU that was
specialized for fitness bands or for smartwatches, etc., and it
was acquired by Applied Materials. So I think the
main innovation is going to happen in areas that are too small for companies like NVIDIA or Intel or some other big companies,
but are good enough for smaller companies that can make specialized products for this area. And there we can see some exotic solutions.
It can be neuromorphic, it could be, for example,
photonics, we will see, right?
In-memory computing, et cetera.
But exactly because there are different requirements
and different specifications in the domain of edge AI,
I think there is room for innovation in this area, especially in edge AI: for video, for text, different requirements, for example, for hospitals, for autonomous driving, for aviation, for consumer electronics, etc. So I think
I would definitely do some research, especially in edge AI.
Great. Well, thank you.
Thank you very much for the conversation.
Actually, my goal always when having
conversations like this is to learn something
from the people I'm having the conversation with.
And that definitely was the case today.
So thank you very much for your time.
And good luck with whatever it is that you choose to focus your next research on.
Thank you.
Thank you, George.
Thank you for the invitation.
Thanks for sticking around.
For more stories like this, check the link in bio and follow Linked Data Orchestration.