SemiWiki.com - Video EP1: A Discussion of Meeting the Challenges to Implement Gen AI in Semiconductor Design with Vishal Moondhra
Episode Date: February 5, 2025. In this inaugural episode of the new Semiconductor Insiders video series, Dan is joined by Vishal Moondhra, VP of Solutions Engineering at Perforce Helix IPLM. Dan explores the risks and challenges of using Gen AI in the semiconductor industry with Vishal. Liability, traceability, cost, and quality are discussed.
Transcript
Hello, my name is Daniel Nenni, the founder of SemiWiki, the open forum for semiconductor professionals.
Welcome to the Semiconductor Insiders video series, where we take 10 minutes to discuss leading-edge semiconductor design challenges with industry experts.
My guest today is Vishal Moondhra, Vice President of Solutions Engineering at Perforce.
Thank you for your time today, Vishal.
Hi, Dan. It's great to be here. So let's talk about
the risks and challenges of using Gen AI in the semiconductor industry. Sure, so as everybody's
aware, you know, Gen AI is really becoming a huge part of software and hardware development across
the world. But using Gen AI, especially in the semiconductor industry,
actually poses a set of unique challenges and risks,
which may be hurting the adoption of this technology
to a great extent.
So we wanted to talk about what these challenges
and risks are and kind of focus on one or two of them
to understand how those risks can be mitigated.
For example, we talk about risks of liability. If you have
a Gen AI model that you want to use to generate part of your design or whatever it is, you
need to make sure that the ownership or the licensing of the data that is used for training
that model is very clear because there's a lot of sensitivity around the content and
the IP that you've purchased from third-party vendors, for example.
There's also this notion of lack of traceability, exactly how a model was trained, what went into it, and what came out of it.
So that traceability and liability part tends to be a huge hurdle for a lot of companies.
Also, there are high stakes involved, right?
If you get something wrong, or if you head toward tape-out with an unknown bug or a design you don't fully understand,
the stakes are very high because it may lead to respins, which are very expensive,
especially at the smaller nanometer nodes.
So, as we all know, you have to be very, very careful as you head toward tape-out.
And then in general, there are data quality concerns: mixing well-understood internal design datasets
with data that you don't understand as well tends to
lead to a lot of quality issues. So these, in a nutshell, are the kinds of issues that are unique
to the semiconductor industry. Okay. So let's talk about the specific risks around design flows and Gen AI.
Sure, so one of the things that we talk about and we actually hear a lot from our customers,
almost everybody seems to agree with this, is provenance, one of the specific challenges,
right?
So how do we establish trust in making sure that an AI model was trained with the right
data sets, right?
In order to do that, you need to make sure
that there's clear and auditable data provenance
for all the data that's going into a data set.
You could be training it on old designs,
you could be training it on new,
on a whole bunch of things that you have with you,
but how do you know exactly what went
into that training data set?
And you need the actual provenance of all the data that went in, not just the fact that, hey, this is the corpus of data that we train on, but how did that data show up?
The provenance behind each of those design pieces is a very important and very difficult problem to solve, especially making sure that your training datasets are not polluted by something that might lead to compliance issues in the future,
security issues in the future, and making sure that you're actually allowed to use that particular data or IP in that training corpus.
Without that provenance in the data, you might run into these compliance and security issues,
which could be a huge burden on companies.
So this, I think, is the area that we would like to focus on from a Gen AI perspective
that will enable or help companies as they move down the journey of Gen AI in their design
flows.
So what about provenance specifically for AI training sets?
Right.
So to truly make sure that your AI training sets have the
right provenance, meaning provenance for all the data that goes into them, you actually
need to build out provenance as part of your workflow.
So you cannot bolt on provenance later on and say, hey, we're going to figure out how
to prove what went in after the fact, right?
So for that, we have talked to our customers, and we believe that
breaking down your design upfront into IPs is one of the key steps you need to take
before you can start building the training datasets for your Gen AI models. What that means
is that traditionally, for years and years, our designs have basically been
project-centric. So you have this big blob
of files, this big blob of intermingled design, which is typically called
a project. And then you go from project to project by copying these designs from one
project to the next. Essentially, this undifferentiated large blob of files typically
tends to hide within it a whole bunch of things that you may not want
to push into a training model, because that could mean you're violating compliance, you're
violating security rules, or you're inadvertently using IPs to train a data model that you're not
supposed to be using, right? So the first and foremost step is to take every design you're
working on, every project you're working on, and break it into IPs.
And we talk about this quite a lot in other contexts as well. There are many other reasons
to do it. But for Gen AI training models, the single most critical thing to do
is to feed your training data sets as a collection of hierarchical IPs.
Now, what you can do after that is establish provenance around each of these IPs
by attaching a lifecycle to each of them.
So that gives you traceability.
For a particular version of an IP that was fed into a training
model, what was its provenance?
How did it get to that point?
Because the system will track how the IP evolved from version to version,
all the way down to the changes in individual
files, right? So that's the advantage of doing it with a collection of hierarchical IPs.
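The versioned, hierarchical IP model described here can be sketched in a few lines of Python. This is a minimal illustration only, not Perforce Helix IPLM's actual data model; all names (`IPVersion`, `pcie_ctrl`, the file paths) are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class IPVersion:
    """One released version of an IP block, linked to the release it evolved from."""
    name: str
    version: str
    parent: "IPVersion | None" = None           # previous release, if any
    changed_files: list[str] = field(default_factory=list)
    children: list["IPVersion"] = field(default_factory=list)  # sub-IPs nest the same way

    def provenance(self) -> list[str]:
        """Walk back through releases to show how this version came to be."""
        chain, node = [], self
        while node is not None:
            chain.append(f"{node.name}@{node.version}")
            node = node.parent
        return chain

# A v2.0 release built on top of v1.0, with the file-level change recorded:
v1 = IPVersion("pcie_ctrl", "1.0")
v2 = IPVersion("pcie_ctrl", "2.0", parent=v1, changed_files=["rtl/phy.v"])
print(v2.provenance())  # ['pcie_ctrl@2.0', 'pcie_ctrl@1.0']
```

Because every version links back to its predecessor down to the changed files, the provenance of anything fed into a training set can be audited after the fact.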
Further, the system can also be extended with rules, which allow you to decide upfront whether
or not a given IP, with its provenance, is allowed to be part of a training dataset, because these
rules can be enforced using IP metadata.
So you might say, hey, this IP is a super-secure IP, or this IP has been purchased from a third-party vendor, and we flag that IP.
And as we build our training datasets, these hierarchical collections of IPs,
the system automatically flags any training dataset that those IPs are fed into
as not available for training.
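The metadata-driven rule described here amounts to a filter over tagged IPs. A minimal sketch, assuming hypothetical tags (`origin`, `classification`) rather than any real IPLM schema:

```python
# Hypothetical metadata attached to each IP version
ip_metadata = {
    "cpu_core@3.1":   {"origin": "internal",    "classification": "open"},
    "ddr_phy@2.0":    {"origin": "third_party", "classification": "licensed"},
    "crypto_eng@1.4": {"origin": "internal",    "classification": "secure"},
}

def allowed_for_training(ip: str) -> bool:
    """Example rule: exclude third-party and security-classified IPs from training corpora."""
    meta = ip_metadata[ip]
    return meta["origin"] == "internal" and meta["classification"] != "secure"

# Build the corpus; flagged IPs are automatically excluded.
training_set = [ip for ip in ip_metadata if allowed_for_training(ip)]
print(training_set)  # ['cpu_core@3.1']
```

The point is that the decision is made by a rule over metadata, not by a human inspecting a blob of files, so it can be enforced automatically as the training set is assembled.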
So because you have IPs, and because you have provenance for each IP, you can then attach
metadata to them, allowing the rules in your system to automatically restrict you from
training your AI datasets with the wrong data. It also helps you do what we call incremental
training, which obviously everyone is doing, where you can say that I have trained a model with a corpus of IPs, which are all allowed,
legal, compliant. And then if I want to incrementally train this model further,
I can actually look at this hierarchy of IPs that have gone into my training dataset and figure out
which of them have changed, and what the changes are. Do I have to drive new versions of the
training? Do I have to vary my training algorithms? And if I do, what
differences will that cause? So this incremental, constant maintenance of a
training model is also enabled by having this hierarchical BOM of IPs that is
used for the training data models. So these kinds of advantages will help teams, will
help customers get much further along on their Gen AI journey while using IP-centric design models.
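Incremental training, as described above, reduces to diffing two snapshots of the hierarchical BOM. A sketch under the simplifying assumption that a BOM is just a mapping of IP name to released version (the names `bom_trained`, `bom_current`, and the IP names are hypothetical):

```python
def changed_ips(old_bom: dict[str, str], new_bom: dict[str, str]) -> dict[str, tuple]:
    """Compare two BOM snapshots and report IPs that were added, removed, or re-versioned."""
    diff = {}
    for ip in old_bom.keys() | new_bom.keys():   # union of all IP names
        old_v, new_v = old_bom.get(ip), new_bom.get(ip)
        if old_v != new_v:
            diff[ip] = (old_v, new_v)            # None on either side means added/removed
    return diff

bom_trained = {"cpu_core": "3.1", "pcie_ctrl": "1.0"}   # what the model was trained on
bom_current = {"cpu_core": "3.1", "pcie_ctrl": "2.0", "usb_phy": "1.0"}
# pcie_ctrl was bumped and usb_phy was added, so only those need retraining:
print(changed_ips(bom_trained, bom_current))
```

Because each entry is a released IP version with its own tracked lifecycle, the diff tells you exactly which parts of the corpus changed, rather than forcing a full retrain from scratch.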
And how does IP lifecycle management address the provenance problem specifically?
So all the things I talked about so far, which is, you know, breaking down your design into IPs,
building a release methodology or a provenance methodology for each IP version,
putting together a hierarchical bill of materials
or a hierarchical BOM to feed your training dataset, all of this belongs in the IP lifecycle
management tool or IPLM.
So when we talk about IPs or IP lifecycle management, an IP is actually anything that
enables your design.
It could be the traditional definition of IP, which is, hey, something I purchased from
an external vendor, which is delivered to me as a black box, right? Or even delivered to me as a bunch of RTL files,
but it is a property or an intellectual property that I bought from an external vendor.
That is the traditional definition of IP. But in IP lifecycle management, when we talk about
design being an IP-centric design, we expand the definition of IP to include
not only those external IPs that you purchase, but also designs that are being worked on
inside your group, specific to the project you're working on. It also includes things like PDKs,
IPs delivered to you by central teams, and other blocks that have been reused from previous designs.
So anything, any of these design blocks can be called IP
and treated as if they were IP in the system.
That includes attaching metadata to all of these designs,
having high-level abstractions.
You can also have environment configurations for these IPs.
So all of these blocks, everything that is feeding your design is treated as a standalone
IP with its own lifecycle, which can then be treated just like any other IP that you
purchased.
So an IP lifecycle management platform or IPLM provides the basis for you to start down
this journey of breaking down your project into multiple IPs, attaching a lifecycle to
each of these IPs, regardless of whether
that's something you've purchased or something you've built.
Great.
Great discussion, Vishal.
Thank you.
Finally, just give us a little bit more about Perforce and IPLM.
Give us the elevator pitch here.
Sure.
So Perforce IPLM, or Helix IPLM as it is known, is an IP lifecycle management platform that provides
built-in traceability, release management, discovery and reuse, and workspace management,
and basically moves you toward IP-centric design workflows.
So all the features I talk about are built into this platform called IPLM.
We have plenty of customers who are using it already.
It's quite popular in the semiconductor space.
A bunch of the biggest semiconductor companies are using it or evaluating it.
We have customers up and down the chain, small and big, that all really like the value that
IPLM can provide as an IP lifecycle management platform that they can then build their designs
on top of.
Thank you, Vishal.
Thank you.
That concludes our video.
Thank you for watching
and have a nice day.