Utilizing AI 3x20: GPUs and AI accelerators - What is the difference?
Episode Date: February 1, 2022

AI is everywhere, and so are AI accelerators, from CPU to GPU to special-purpose hardware. Eitan Medina, Chief Operating Officer of Habana Labs, an Intel Company, joins Frederic Van Haren and Stephen Foskett to discuss the various specialized AI processors being developed today. Habana Labs has created a special-purpose AI training and inferencing processor with many unique features. Since deep learning is done at scale today, it makes sense to integrate enterprise networking with an accelerator like Habana Gaudi to increase overall system performance thanks to RDMA over Converged Ethernet (RoCE) technology. Habana Gaudi is optimized for matrix math and also includes fully programmable vector cores for tensor processing. In October 2021, Amazon AWS launched the new DL1 instance based on Habana Gaudi, offering more performance than many GPU-based instances for a much lower total cost. Habana is very developer-focused as well, working with partners, data scientists, and end users to expand the accessibility of the platform in channels like GitHub and their own developer forum. Habana will soon introduce a 7 nm Gaudi 2 processor with much-improved performance and power efficiency. Habana Labs is also making their hardware more accessible thanks to their SynapseAI software stack, and recently acquired cnvrg.io to bring a higher-level MLOps pipeline to AI.

Three Questions:
Frederic: At what point in time do you believe AI will be able to show compassion (like humans), if ever?
Stephen: How big can ML models get? Will today's hundred-billion-parameter models look small tomorrow, or have we reached the limit?
Edward Cui, Founder of Graviti: Which will be more important in the future: bigger and bigger ML, or smaller ML?

Guests and Hosts:
Eitan Medina, Chief Operating Officer of Habana Labs, an Intel Company. Visit habana.ai to learn more.
Frederic Van Haren, Founder at HighFens Inc., Consultancy & Services. Connect with Frederic on Highfens.com or on Twitter at @FredericVHaren.
Stephen Foskett, Publisher of Gestalt IT and Organizer of Tech Field Day. Find Stephen's writing at GestaltIT.com and on Twitter at @SFoskett.

Date: 2/01/2022
Tags: @SFoskett, @FredericVHaren, @HabanaLabs
Transcript
I'm Stephen Foskett.
I'm Frederic Van Haren.
And this is the Utilizing AI podcast.
Welcome to another episode of Utilizing AI,
the podcast about enterprise applications for machine learning,
deep learning, data science,
and other artificial intelligence topics.
Frederic, one of the things that is interesting to those of us who nerd out
about machine learning and AI in general is the use of various discrete components to make
these things happen, to accelerate learning, to accelerate training, to accelerate models.
And we've seen or had discussions about all sorts of things from CPUs to GPUs to FPGAs to special purpose ASICs as ways of accelerating artificial intelligence tasks.
Do you have any feelings about which direction the industry is going here?
Well, I think the one direction we definitely know is that there is a huge demand for compute power, right? And so the traditional processors like a CPU
are working well, but are not efficient enough
to address the problems that DL is posing today.
So I think looking at different types of accelerators
is really a nice way to kind of create
an efficient pipeline for your workloads.
And then additionally, typically we talk about accelerators for training. I think inference
nowadays is also seeing a lot more accelerators. So I'm really excited to talk about accelerators
and where the market is going. Yeah, it really is horses for courses, in my opinion. I mean,
you see all sorts of machine learning accelerators being added basically everywhere from mobile processors all the way up to specialized components in the data center, as you mentioned, for inferencing and training.
And that's one reason that I was really excited to have a conversation with Habana Labs, an Intel company, which makes a really cool AI accelerator. So joining us today
on Utilizing AI, I'm really thrilled to introduce Eitan Medina, the Chief Operating Officer for
Habana Labs. Hi, I'm thrilled to be with you today and love to engage in a discussion about
where acceleration is going and how we think of helping the industry with the acceleration.
And I know that you have a background in this as well.
And of course, you're deeply involved in the development of the Habana Labs project.
Yes, my background is actually in engineering.
I started out as an electrical engineer.
I've been architecting chips. I've been CTO of Galileo Technology back
in the late 90s when we did the Ethernet switches and system controllers. I've spent time managing
engineering teams running development of mobile processors and moved to the business side some six years ago and was fortunate enough to join Habana Labs in 2018
when it was still a startup running in stealth mode
with really big ideas around how to architect processors
from the ground up for doing inference and training
in the data center and the cloud.
And this is what I've been doing since.
I wonder if, just to kind of set the groundwork here, maybe you could just tell us a little bit,
what exactly is the Habana Labs product? Sure. So if we kind of think about a server
in a data center running machine learning workloads and we open it up.
What you typically see when you open up a server that is dedicated to doing deep learning training or high capacity inferencing, you see a pair of host CPUs and you see a bunch of accelerators
that are connected to this host CPU and together as a system, they're supposed to execute really efficiently,
either training a deep learning model or doing some high capacity
inferencing in a data center.
And these data centers are ones that companies
that are really heavy-duty users of AI have,
or the same type of service would be in a cloud where someone would, you know, do a training job in the cloud or do even offline or online inferencing.
So when we started the company and we looked at what's the state of the art in terms of efficiency,
what was there back then was basically GPUs that were originally architected for graphics.
But because of the parallelism that they have, they've been very efficient compared to using just CPUs, standalone CPUs.
So when we looked at where the market was going, we thought of an opportunity that if you architect from the ground up,
the exercise was what could you do that would be both more efficient, but also allow users
to migrate to it.
Because one of the pitfalls of coming up with a ground-up design is that you make too
many architectural changes that are actually increasing friction for the end user.
So a big focus was how do we architect a really efficient accelerator that still gives both the flexibility and the programming models that customers are used to. So that
was one key question. The other opportunity that we saw is that since training of deep learning models is done more and more at scale, and this is because the models are so big that you need to gang up many accelerators to collaborate, we thought that if we can find a way to utilize enterprise networking more efficiently, this could actually both solve bottlenecks and reduce system costs.
So one of the unique things with the Habana Gaudi architecture,
which is our training line,
is that we managed to not only design the accelerator that does the compute,
but also integrate on it our own design of very, very high speed RoCE engines,
an RDMA over Converged Ethernet engine, so that networking and compute are integrated at the chip level.
And this way, we are utilizing the same Ethernet networking as the core protocol that the accelerators use to talk with each other in the box
and also outside the box.
So that's an innovation
that we thought could really help AI.
So that's what is actually deployed today.
Now, if we go back to the hardware level,
the question is, what can you do better?
After all, a GPU is a pretty efficient machine, definitely much more efficient for some of
the compute than a general-purpose CPU.
So the opportunity that we saw is coming up with a heterogeneous architecture.
If you look at a neural network, there are many operations where you do matrix math operations, and
there are a lot of operations that are pure vector. And all of these need to be done very
efficiently. So what we've done is we integrated a centralized matrix math engine that is very, very configurable and very efficient.
And a cluster of tensor processor cores that we architected from the ground up with a custom instruction set and AI-centric operations that are natively supported there.
So the combination of fully programmable tensor processing cores and a very highly efficient matrix math engine allows us to get really, really high throughput out of a given server compared to how much it costs to make it.
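[Editor's note: as a rough illustration of the split Eitan describes between matrix math and vector work (not a description of Gaudi's internals), the NumPy sketch below separates a dense layer into the matrix-multiply portion a matrix engine would handle and the elementwise portion that runs on programmable vector or tensor cores. All shapes are hypothetical.]

```python
import numpy as np

# Hypothetical shapes for one dense layer: a batch of 64 activations, 1024 -> 4096 features.
x = np.random.randn(64, 1024).astype(np.float32)    # input activations
W = np.random.randn(1024, 4096).astype(np.float32)  # layer weights
b = np.zeros(4096, dtype=np.float32)                 # bias

# Matrix-math portion: the bulk of the FLOPs, the kind of work a dedicated
# matrix-multiply (GEMM) engine is built for.
y = x @ W

# Vector portion: elementwise bias add and activation, the kind of work that
# runs on programmable vector/tensor cores.
y = np.maximum(y + b, 0.0)  # ReLU

print(y.shape)  # (64, 4096)
```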
So at the end of the day, the apples-to-apples comparison really requires looking at a workload and looking at the cost of using a server and what kind of
productivity you get out of it. So just this last October, AWS launched a server based on the Gaudi
training processor. This is the first server in their history that is not based on GPU acceleration, but based on the
Habana Gaudi acceleration.
And at the end of the day, what they've shown is what is the pricing of using the server
versus an existing GPU-based server and what the actual performance is.
So if you go to the EC2 website and you look at the DL1 pricing, which is based on Gaudi, versus, for example,
the P4d, which is based on the A100, or the P3dn, which is based on the V100, you can see that the Gaudi
server performance lands in between the V100 and A100 for a benchmark like, let's say,
ResNet-50 training. However, because the accelerator is so efficient,
the end cost to the consumer is much lower.
The P3dn costs $32.77 an hour in the spot pricing,
while the DL1 costs $13.11.
So $13.11 versus $32, you know, it ends up being over a 40% cost reduction to
train a ResNet-50 model on Gaudi versus the GPU. So that's the end result, right? And of course,
the reasons are efficiency at the chip level and efficiency at the system level, thanks to this high level of integration
that reduces component counts.
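[Editor's note: to make the quoted pricing concrete, here is the arithmetic on the hourly rates Eitan cites. The final cost to train also depends on each instance's time-to-train, which is left as a variable rather than assumed.]

```python
# Hourly prices quoted in the episode (US$/hour).
p3dn_price = 32.77  # P3dn instance (V100-based)
dl1_price = 13.11   # DL1 instance (Gaudi-based)

# Price-per-hour reduction, before accounting for any difference in time-to-train.
print(f"{1 - dl1_price / p3dn_price:.0%}")  # 60%

# Total cost to train = hourly price * wall-clock hours on that instance, so the
# end-to-end saving also folds in how fast each instance finishes the workload.
def cost_to_train(price_per_hour: float, hours: float) -> float:
    return price_per_hour * hours
```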
And at the end of the day,
all this is only useful if it's easy to use, right?
So we worked diligently to come up with a software stack
that integrates with frameworks like TensorFlow and PyTorch
and there's lots of collateral we prepared in the habana.ai developer site to show users
how to migrate their models, give them reference models, scripts, setups, and all the education
material to help them in the journey of trying out Habana Gaudi. And this is really the key.
The key, when you come in with an accelerator, is that you obviously have to create a benefit for
the end customer in the form of performance, price-performance, et cetera, but you also
need it to be easy to use, right?
So the biggest challenge in acceleration is how to make sure you cover both,
so that migrating to your accelerator doesn't require tons of effort from the end customer.
After all, people's time at the end is even more important than the cost to train.
So our mission is to lower the barriers to using AI by lowering the cost to train and lowering the time to train.
But we also need to support the right level of abstraction and programming languages and
programming models that customers need. So that's where we focused on training as well as inference.
Yeah, I think you brought up a lot of interesting questions. I think you already answered half of my questions so far.
But I totally agree on the migration and the entry barrier.
How easy or how difficult is it for somebody to swap out their workloads from, let's say, a traditional accelerator to Habana?
Is that difficult?
Is there a learning curve?
Or is it pretty straightforward considering you have PyTorch and, you know,
that the look and feel is the same?
So there is always a learning curve.
The key is how to make that learning curve not very steep and how to set expectations correctly
about what the customers would get.
So if we start from what is that effort, right?
If you go to habana.ai,
you can see a how-to guide that shows you very specifically,
you know, take that script,
add these two lines where you refer to the Habana HPU and run your workload.
We provide a list of reference models so people can really look at what models are running,
at what performance, what is the script, what type of parameters are used,
so that it's easy for them to take models
that are similar or derivative of it and really go through the steps to ease their work.
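[Editor's note: for a feel of what "add these two lines" looks like, here is a minimal PyTorch sketch of the kind of change the how-to guide describes. The habana_frameworks import, the hpu device string, and mark_step() follow Habana's public documentation but should be treated as assumptions; the guide on habana.ai is authoritative.]

```python
import torch
import torch.nn as nn

# Assumption: the habana_frameworks package from the SynapseAI stack is installed;
# module and function names follow Habana's public migration guide and may change
# between releases, so treat the how-to guide on habana.ai as authoritative.
import habana_frameworks.torch.core as htcore

device = torch.device("hpu")  # was: torch.device("cuda")

model = nn.Linear(1024, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 1024, device=device)
y = torch.randint(0, 10, (32,), device=device)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
htcore.mark_step()  # flush the accumulated graph when running in lazy mode
```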
We have set up also an online support structure where people can go to GitHub and write to
us, do pull requests and communicate with us.
And in terms of benefit to end customers,
you can see a list of customers who actually contributed quotes
to the actual announcement of the product
where they gave some evidence
on how easy it was for them to port their workloads.
So you can see companies coming from different industries.
For example, you can see a statement by RiskFuel,
which is in financial services, where they do a lot of derivative models.
Those derivative models are using residual networks.
These are not the typical benchmark, right? But they found it pretty straightforward
to take a workload, compile it, and actually profile it
in order to really extract the best performance.
And we're pleasantly surprised with their experience.
You can see statements by Seagate, by other companies,
each handling different types of applications
from manufacturing to retail to automotive
to other spaces.
Over time, of course,
we would add more and more models,
more reference models for more model families.
But what I'm encouraged to see
is that there's already a nice group of customers that can see an immediate benefit.
And, you know, if you're a customer that has their own unique model, you know, I would advise to just take a look at our GitHub.
Just get an impression of, you know, the types of models that we already ported, the performance that we are able to reach, and how to use them, and communicate
with us to evaluate, you know,
how close we are to making the experience easy or not.
You know,
our software teams and machine learning data scientists are continuously
expanding the coverage of what we have.
Right now, most of the focus we have is in vision and language models.
And so, you know, we are not yet covering other applications like, you know, speech-to-text, and, you know,
recommendation models are still sparsely covered.
But the list is growing.
So what I would advise is take a look, go to habana.ai, the developer site,
to get a quick impression of what we have and communicate with us to get a very straightforward answer of,
do you have support for my model?
If not, is it coming soon?
And, you know, we'd love to get the feedback from the community.
Right.
I think having an ecosystem is really important.
Now, you mentioned GitHub.
Do you allow customers to upload their models and what they have been doing with Habana?
Or is it more where they communicate with you directly
and then you post it on GitHub?
We'd love for customers to share with us models.
Typically what we find is customers are a bit cagey
about really giving their model or uploading their model,
because for many companies,
this is like the thing that they differentiate with.
And so typically that's not an issue.
The issue here is, you know, just making sure we have a communication going between us and
people.
So on GitHub, we really encourage people to, you know, post requests openly and
build a community, like you said. We are also working with ecosystem partners on trying to make models more accessible and
more available in different repositories and expanding the tools that are available to
customers.
So this work never ends.
And we know this is where we need to really invest our time. Even though we are
selling accelerators, most of our engineers are actually software engineers. To sell a piece of
hardware, you need more software engineers than hardware engineers, and that's where the growth
is. So there's lots of work to do to make it easier and more accessible to more and more people.
There's also lots of innovation ahead of us in terms of silicon that's about to come up.
Our first generation was 16 nanometer, which, relative to, let's say, 7 nanometer, is an old process node.
Still, with this 16 nanometer chip, we are able to give this 40% advantage in the example I just
mentioned, but we're really excited to introduce soon our 7 nanometer Gaudi 2,
which really will give a big leap in terms of performance and capacity, right, for customers.
And while we do that, obviously the software teams are continuously working on expanding our SynapseAI software stack.
So what we're most excited about is enabling more and more customers to use the accelerators and really watching what people are telling us via GitHub, via developer forum, whichever channel works for a customer, we'd
love to get feedback.
Do you see a particular type of workload that works really well with Habana, or do you feel
that Habana can address most of the common DL workloads today? I think that if you're looking for the area where the biggest advantage is with Gaudi
One today, it's definitely in the area of vision models, right?
In vision, we are able to show the biggest price performance advantage.
And of course, this price performance advantage would be realized if you
rent the DL1 instance over at Amazon, or you call Supermicro and buy a server, right, for on-prem.
So I would say vision, which includes things like image classification, object detection,
semantic segmentation.
You know, there are different modalities for vision models,
and these are applicable to various industries,
whether it's, you know, training models that would look at pictures or videos,
whether it's in a retail, automotive,
financial services, you know, manufacturing. There are many, many uses to vision models.
There are also, you know, interesting uses
in language models where we also have, you know,
an advantage for customers, and the space is evolving quickly.
We are always learning some new application
that we didn't think about, right?
Where people are transitioning.
And the feedback that we hear from our customers
is that the number one issue in AI today is cost
because the models are getting bigger
and more and more companies figure that,
you know, they have lots of data and to monetize it,
there's a big opportunity in deep learning,
but the cost to train models has become
like one of the biggest barriers to entry
for many companies.
So, you know, there's a mission for all of us, right?
Actually, all the accelerator companies
to lower the cost to train in order to
democratize, you know, AI so that more and more people can use it. And also the research
will be able to, you know, make those models more accurate and more useful. So everything kind of
points to a challenge to all of us, right? To find ways to make more efficient use of resources,
whether it's the OPEX
or the CAPEX, right?
Whether it's the cost of the hardware
or the cost of the power
to operate it.
And this is a common challenge
for everyone to find ways
to make it more efficient.
And at the same time,
what you see is that
on the software ecosystem side,
more and more abstraction layers are being developed in the market.
And if you, you know, if you take a typical new college graduate, right, that just graduated in software and gets hired into the industry, that graduate is already used to using higher level languages. That person is probably programming in Python or R. And that person is not going to enjoy very
much programming in CUDA or in C, or in all these lower level languages that, you know, six years
ago you had to use, right? So abstraction is going higher and higher so that more and more
people can actually join the industry, right? This is why Google developed TensorFlow. This is why
Facebook pushed PyTorch, right? They needed to hire more engineers, so they couldn't
afford to ask everyone to write in these low-level languages; nobody would join, right? So now the abstraction level goes higher all the time.
Right?
So TensorFlow and PyTorch will be low level languages
pretty soon, right?
There'll be higher abstraction, right?
So the mission of us, the hardware folks,
is to build the compilers,
you know, create containers that can be launched that will just hide everything below, right?
So people can program in these higher level languages.
So the software stack from the hardware vendors need to grow all the time, right?
And that's also an opportunity for the accelerator companies if they know how to
do that right. You don't need to be compatible with the low-level language of this other company;
you need to integrate all the way to the level where your end customer wants to interact with you.
And so that's why the software mission as an accelerator company is to create this integration
to these higher level languages.
And it also helps in a way, right?
Because in the past, CUDA was a barrier, right?
You know, that's a moat around the GPU,
everything had to be CUDA,
but the more time passes, you see, it's not the case, right?
The barrier is not the lower level language,
it's the ability to build the full stack, right?
The ability to provide the reference model,
the ability to integrate with ecosystem partners.
It's not a smaller challenge, it's a different challenge.
That's what we focused on.
And with the launch of Gaudi in the Amazon cloud
and the ability to provide it to customers, we believe that we established a very healthy starting point.
So now our mission is to expand it.
I wanted to bring up, on the note of both programming interfaces and other software interfaces, as well as the community,
the SynapseAI stack that you have developed, as well as the cnvrg.io
acquisition. So I wonder if you can talk a little bit about what was the purpose of developing
SynapseAI? I assume it's what you just described. And also, where is
that going? And then where are you going with cnvrg.io? Yeah. Okay. So let's maybe start from
the beginning. Let's say you're a startup, right? And you have this great idea and you just came
with a chip that's more efficient, right? Now, what your end customers are used to doing today when they're using, let's say, a GPU-based system is taking a TensorFlow container from NGC and a reference model and simply launching it, right, and running it on the hardware, right?
So you've got to build a software stack that integrates all the way to the framework. We call that software stack SynapseAI, but, you know,
from an end user perspective, you could call it whatever you want.
What they want is the ability to download the software,
launch the containers,
and take the exact same reference model they had, which, you know,
they had the script for, right, that ran it.
And just change a couple of lines, right?
Say it's running not on a GPU, but on your hardware,
and hopefully it just runs, right?
So when we started, that was the mission.
The mission is to build a software stack
that allows the ML developer to easily transition a model.
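[Editor's note: the TensorFlow side of the same "couple of lines" idea is sketched below, with heavy hedging: the habana_frameworks.tensorflow module and load_habana_module() call are taken from Habana's public documentation and may differ between releases; the containers and reference scripts on habana.ai are the authoritative path.]

```python
import tensorflow as tf

# Assumption: SynapseAI's TensorFlow integration is installed; the module and
# function names come from Habana's public documentation and may differ between
# releases. The containers and reference scripts on habana.ai are authoritative.
from habana_frameworks.tensorflow import load_habana_module

load_habana_module()  # registers the Habana device with TensorFlow

# From here the usual Keras script runs as-is; supported ops are placed on the
# Habana device automatically by the stack.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer="sgd",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```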
Now, cnvrg.io is coming to complement that with an
MLOps solution. So what is MLOps? MLOps is machine learning operations. It's actually a new
term. And the need for MLOps was born from the complexity of how to manage development of a full pipeline.
And just to explain, like, why do you need this higher level software stack and what it does,
you have to consider that if you look holistically at the flow of how you deploy AI,
you need to be able to manage data sets, right?
Those data sets need to be processed and revisioned, right?
As you try out training models.
Now, the models themselves, you try a bunch of different models, right?
Those training runs are actually long.
Some of them converge, some do not converge.
You need to manage through this whole process.
And if you're using a cluster of compute, it's not just the ML developer.
It's also the DevOps engineer, the IT, and the management, right?
All of them are stakeholders in how all this expensive infrastructure are used.
And if you are an ML developer engineer and you need to collaborate with,
let's say, 20 or 30 other engineers,
each of them dealing with a different area in the pipeline. You need to collaborate effectively,
right? So you kind of need some way to automate the processes that you do and make sure you keep
track of what you're doing, right? Otherwise, you lose productivity and the hardware, which is very expensive, is actually not utilized.
So MLOps, which is what cnvrg.io does, is actually a graphic user interface that gives all the constituents,
whether it's the ML developer, the DevOps, the IT, or top management, a kind of shared view of what's going on. And the tool is
able to help you execute policies in terms of priorities of who gets what machine time and
how, when to decide to stop a run or not stop a run, get all the insights into how well the training job is, let's say, progressing or not, and then act on it.
So this kind of solution allows increasing utilization
not just of the hardware, but also of people's time.
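[Editor's note: to make the MLOps idea less abstract, below is a deliberately generic Python sketch of the per-run bookkeeping Eitan describes: dataset version, model configuration, hardware backend, and run status with a toy stopping policy. None of these names reflect cnvrg.io's actual API.]

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class TrainingRun:
    """Generic per-run bookkeeping an MLOps layer tracks; not the cnvrg.io API."""
    run_id: str
    dataset_version: str      # which revision of the dataset was used
    model_config: dict        # hyperparameters / topology description
    backend: str              # illustrative labels, e.g. "gaudi-dl1", "gpu", "cpu"
    started_at: datetime = field(default_factory=datetime.utcnow)
    status: str = "running"   # running | converged | diverged | stopped
    metrics: dict = field(default_factory=dict)

    def should_stop(self, patience: int = 3) -> bool:
        """Toy stopping policy: stop if the last `patience` losses did not improve."""
        losses = self.metrics.get("loss", [])
        if len(losses) <= patience:
            return False
        return min(losses[-patience:]) >= min(losses[:-patience])
```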
So what we saw as an opportunity when we acquired cnvrg.io
is to actually invest in a solution that will actually increase
efficiency, you know, in a holistic way. And that's regardless of what
hardware backend you use, right? So if you take the cnvrg.io solution today and you download it,
cnvrg.io would support almost any hardware you have in the
backend, whether it's a GPU machine, a CPU machine, or a Gaudi-based DL1. It is able to
help you manage the utilization of resources both on-prem, if you have a data center inside the
company, or allow also bursting to any one of the clouds that you use.
So holistically, MLOps in our eyes is the way for companies to really get to the next
level of efficiency in terms of resource usage.
And in terms of the integration with cnvrg.io, it's really loosely coupled because we're
really trying to think from an end user perspective.
We need to allow the end user to choose
what hardware is most efficient for what they need to do
and still allow them to manage how they allocate it,
how they utilize it,
whether it's on-prem or on the cloud.
So cnvrg.io is a product of its own.
And the way to think about it is that as an end user,
you need this MLOps layer.
You need to be able to use it regardless of where your hardware is,
whether it's Habana, Intel, NVIDIA, or whatever combination.
We support basically everything that the end customers need.
And obviously, if we are doing a good job on both the accelerator
and the MLOps, customers would love to,
you know, keep working with us.
And so if you draw the full stack, the MLOps is at the top.
Below that, there's some layer of Kubernetes or, you know, scaling, right?
And below that basically is the ability to launch containers for whatever the hardware
is, right? So if you are using NVIDIA,
you probably take an NGC container.
If you're using Habana,
you take a container from our developer site.
If you're using straight CPUs to do ETL
or data pre-processing,
then you use that.
So this MLOps is a nice way to tie it all together
from an end customer user interface and policy execution command center.
Yeah, I totally agree.
I think MLOps is really the super glue for the ecosystem to make it work, right?
I think maybe my final question there is you talked a little bit about reconfiguring the Habana chip.
One of the efficiency problems customers have is that in their workflow, they have multiple accelerators involved, meaning that they have to move data around left and right.
When you say reconfigure the Habana chip, do you believe that you could replace different types of accelerators in the same workflow?
So if you look at the typical workflow, typically, you know, the workflow starts with ingesting data and, you know, manipulating data.
You know, if you take a training workflow, you manipulate it to make it appropriate for training, and when you want to use the Habana accelerator on it,
the software stack figures out which layers in the model
will execute more efficiently on the accelerator
or on the host CPU.
So the decision of what does the accelerator offload
is a decision that the compiler does when you give it your topology.
So the end user, typical end user would simply
develop their model.
If they want to go down to optimizing their performance,
which is completely optional, right?
They could use the profilers to see where time is spent.
They could see the level of priorities
in between the different engines
and start optimizing their model or the script.
But the starting point is
that you simply want to let them work at the level of programming model
that they're used to and let the software that comes with the solution figure out how the work
is distributed between the accelerators and the whole system. And for training at scale,
that same compiler would know how to distribute the work across nodes, right?
And do things like data parallel training or other forms of parallelism.
You want the ML developer to really focus on the neural network model for the most part
and let the hardware company take the responsibility of how to divide the work between the different engines.
So that's the general approach of the accelerator.
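[Editor's note: since the data-parallel pattern Eitan mentions is standard regardless of accelerator, here is a hedged PyTorch sketch of it, run with the gloo backend on CPU so it stays self-contained. The hccl backend and hpu device that Habana documents for Gaudi would slot into the same places, but those names are assumptions here; on Gaudi the compiler and runtime handle the distribution as described above.]

```python
import os
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP

# Each worker holds a model replica, trains on its shard of the batch, and
# gradients are all-reduced across workers during backward().
def train_worker(rank: int, world_size: int):
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    # "gloo" keeps the sketch runnable on CPU; Habana documents an "hccl" backend
    # and "hpu" device (an assumption here), and GPUs would use "nccl"/"cuda".
    dist.init_process_group(backend="gloo", rank=rank, world_size=world_size)

    model = torch.nn.Linear(1024, 10)
    ddp_model = DDP(model)  # wraps the replica and handles gradient all-reduce
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    for _ in range(10):
        x = torch.randn(32, 1024)            # this worker's shard of the batch
        y = torch.randint(0, 10, (32,))
        loss = F.cross_entropy(ddp_model(x), y)
        optimizer.zero_grad()
        loss.backward()                      # gradients averaged across workers here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    import torch.multiprocessing as mp
    world_size = 2
    mp.spawn(train_worker, args=(world_size,), nprocs=world_size)
```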
And typically what you'd want is to offload
as much as you can to the hardware,
which is typically doing a lot of the heavy data crunching,
I would say, or the heavy compute tasks
and make sure the CPU has enough headroom
to deal with the data ingestion,
with managing the workloads,
with executing command and control, right?
Let's say, because the framework,
usually the framework runtime
is what really gives the commands of which layers of the
graph go where. And you don't want to burden that host CPU to the point where the accelerator is
just waiting for it. So that's usually where the most time is spent by the software engineers in
the hardware company to make sure that their
architecture is balanced, the compiler is flexible enough, et cetera. But the end developer,
you don't want to burden them with this task because they'd rather spend time on their model.
Well, thank you so much for that and for joining us. But the time has come to move on to the final part of our podcast,
which is where we ask our guests three questions. Note to our listeners, our guest has not been
prepared for these questions ahead of time. So we're going to get their off the cuff answers.
And this season, we're also bringing in a little twist in that we're adding a question from a previous
Utilizing AI guest. So Frederic, I'll let you go first. You can ask the first of the three questions.
Sure. So at what point in time do you believe AI will be able to show compassion like humans, if ever?
Wow. I think it's so subjective, right, for different people, that you'd be surprised. I think
what will happen is that people will be able to develop interactive games where, on a basic level,
the reaction from that game could definitely be interpreted by a child
as the toy actually giving them some really meaningful feedback
that has this emotional impact.
However, if you'd expect AI to really create a connection
to a human, to a level where a grown-up would feel
there's an emotional connection developing, I think it's still a long time to go. So the answer is pretty quickly for
some people, young people, and some use cases, and many, many years in the future
for others. So it's not a binary answer in my mind.
Next, for my question, a little bit more technical.
How big can ML models get?
Today, we have 100 billion parameter models.
Will that look small tomorrow, or have we reached some kind of limit?
I think we're far away from the limit. The actual challenge is going to be
how economical is it for people to use what size models, not whether we can create
bigger and bigger models, right? There are always going to be these,
you know, very rich companies that are able to afford
running a gigantic model, right?
That give them business insights
that they have benefit in using.
And this race will continue.
I think it's that middle part, companies
that are mid-size:
what size models,
and what kind of cost to train and inference these models,
can we give this mid-market, right?
That's the real challenge in my eyes.
Not the, I mean, of course,
there's always this fun stuff
where we talk about the trillion, trillion parameters
for this one or two companies that, you know,
they can utilize the ad revenue
from the recommendation engines, right?
To justify using that.
But this is AI for the rich.
And where I think the challenge is,
where we can really have an impact on humanity,
is this middle section.
How can we make the models more efficient,
bigger but efficient from their standpoint?
Because this also impacts what you can deploy
in the field in different end products.
And that to me seems more important
than the really gigantic models.
And now, as promised, we're using a question from a previous guest.
Here's one from Edward Cui, founder of Graviti.
Hi, this is Edward Cui from Graviti.
My question will be, will machine learning models be bigger and bigger, or will smaller models in the future also be really important?
I think smaller ML is going to be more important to more people than bigger ML.
And I think that's in line with what you just said in answer to my question as well.
Well, thank you so much for joining us, Eitan.
We look forward to what your question might be for a future guest.
And if our listeners would like to join, they can just send an email to host at utilizing-ai.com and we'll record your question.
So, Eitan, where can people connect with you and follow your thoughts on enterprise AI?
Or is there something that you've recently done that you want to call attention to?
Well, as we are deploying our solution in the market
and we are busy educating people
on what they can do with Habana,
I'd love for people to check out our developer site
at habana.ai and check out webinars
and other educational material that we post out there
to show users what they can do
with our Habana accelerators.
How about you, Frederic? Is there anything new?
Yeah, you can find my latest blog on democratization of AI in the enterprise on LinkedIn
and on the HighFens website. And you can find me on LinkedIn and Twitter as Frederic Van Haren.
And as for me, and I guess Frederic as well,
if I'm not overstepping here,
I'm pretty excited to announce
that we've just set a date for AI Field Day 3.
So you can join us at AI Field Day 3,
either as a delegate or as a presenter,
just contact me.
Or if you'd like to just watch online,
that's going to be May 18th through 20th, 2022. So May 18 through
20. So thank you for listening to the Utilizing AI podcast. If you enjoyed this discussion,
please do remember to subscribe, rate, and review the show in your favorite podcast application.
And please do share it with your friends. This podcast was brought to you by gestaltit.com,
your home for IT coverage from across the enterprise.
For show notes and more episodes, you can go to utilizing-ai.com or find us on Twitter at utilizing underscore AI.
Thanks for joining us and we'll see you next week.