SemiWiki.com - Video EP1: A Discussion of Meeting the Challenges to Implement Gen AI in Semiconductor Design with Vishal Moondhra
Episode Date: February 5, 2025. In this inaugural episode of the new Semiconductor Insiders video series, Dan is joined by Vishal Moondhra, VP of Solutions Engineering at Perforce Helix IPLM. Dan explores the risks and challenges of using Gen AI in the semiconductor industry with Vishal. Liability, traceability, cost, and quality are discussed.
Transcript
Hello, my name is Daniel Nenni, the founder of SemiWiki, the open forum for semiconductor professionals.
Welcome to the Semiconductor Insiders video series, where we take 10 minutes to discuss leading-edge semiconductor design challenges with industry experts.
My guest today is Vishal Moondhra, Vice President of Solutions Engineering at Perforce.
Thank you for your time today, Vishal.
Hi, Dan. It's great to be here. So let's talk about
the risks and challenges of using Gen AI in the semiconductor industry. Sure, so as everybody's
aware, you know, Gen AI is really becoming a huge part of software and hardware development across
the world. But using Gen AI, especially in the semiconductor industry,
actually poses a set of unique challenges and risks,
which may be hurting the adoption of this technology
to a great extent.
So we wanted to talk about what these challenges
and risks are and kind of focus on one or two of them
to understand how those risks can be mitigated.
For example, we talk about risks of liability. If you have
a Gen AI model that you want to use to generate part of your design or whatever it is, you
need to make sure that the ownership or the licensing of the data that is used for training
that model is very clear because there's a lot of sensitivity around the content and
the IP that you've purchased from third-party vendors, for example.
There's also this notion of lack of traceability, exactly how a model was trained, what went into it, and what came out of it.
So that traceability and liability part tends to be a huge hurdle for a lot of companies.
Also, there are high stakes involved, right?
If you get something wrong, or if you head toward tape-out with an unknown bug or a design you don't fully understand,
the stakes are very high because it may lead to respins, which are very expensive,
especially at the smaller nanometer nodes.
So, as we all know, you have to be very, very careful as you head toward tape-out.
And then in general, there are data quality concerns: mixing well-understood internal design datasets
with data that you don't understand as well tends to
lead to a lot of quality issues. So these, in a nutshell, are the kinds of issues that are unique
to the semiconductor industry. Okay. So let's talk about the specific risks around design flows and Gen AI.
Sure, so one of the things that we talk about and we actually hear a lot from our customers,
almost everybody seems to agree with this, is provenance, one of the specific challenges,
right?
So how do we establish trust in making sure that an AI model was trained with the right
data sets, right?
In order to do that, you need to make sure
that there's clear and auditable data provenance
for all the data that's going into a data set.
You could be training it on old designs,
you could be training it on new,
on a whole bunch of things that you have with you,
but how do you know exactly what went
into that training data set?
And you need the actual provenance of all the data that went in, not just the fact that, hey, this is the corpus of data that we train on, but how did that data show up?
The provenance behind each of those design pieces is a very important and very difficult problem to solve, especially making sure that your training datasets are not polluted by something that might lead to compliance issues in the future,
security issues in the future, and making sure that you're actually allowed to use that particular data or IP in that training corpus.
Without that provenance in the data, you might run into these compliance and security issues,
which could be a huge burden on companies.
So this, I think, is the area that we would like to focus on from a Gen AI perspective
that will enable or help companies as they move down the journey of Gen AI in their design
flows.
So what about provenance specifically for AI training sets?
Right.
So to truly make sure that your AI training sets have the
right provenance, meaning provenance for all the data that goes into them, you actually
need to build out provenance as part of your workflow.
So you cannot bolt on provenance later on and say, hey, we're going to figure out how
to prove what went in after the fact, right?
So for that, we have talked to our customers, and we believe that
breaking down your design upfront into IPs is one of the key steps you need to take
before you can start building the training datasets for your Gen AI models. What that means
is that traditionally, for years and years, our designs have basically been
project-centric. So you have this big blob
of files, this big blob of intermingled design, which is typically called
a project. And then you go from project to project by copying these designs from one
project to the next. Essentially, this undifferentiated large blob of files typically
tends to hide within it a whole bunch of things that you may not want
to push into a training model, because that could mean you're violating compliance, you're
violating security rules, or you're inadvertently using IPs to train a data model that you're not
supposed to be using, right? So the first and foremost step is to take every design you're
working on, every project you're working on, and break it into IPs.
And we talk about this quite a lot in other contexts as well. There are many other reasons
to do it. But for Gen AI training models, the single most critical thing to do
is to feed your training data sets as a collection of hierarchical IPs.
Now, what you can do after that is establish provenance around each of these IPs
by attaching a lifecycle to each of them.
So that gives you traceability.
For a particular version of an IP that was fed into a training
model, what was its provenance?
How did it get to that point?
Because the system will track how the IP evolved from version to version,
all the way down to the changes in individual
files, right? So that's the advantage of doing it with a collection of hierarchical IPs.
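The versioned, hierarchical IP model described here can be sketched in a few lines of Python. This is a minimal illustration only, not Perforce Helix IPLM's actual data model; all names (`IPVersion`, `pcie_ctrl`, the file paths) are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class IPVersion:
    """One released version of an IP block, linked to the release it evolved from."""
    name: str
    version: str
    parent: "IPVersion | None" = None           # previous release, if any
    changed_files: list[str] = field(default_factory=list)
    children: list["IPVersion"] = field(default_factory=list)  # sub-IPs nest the same way

    def provenance(self) -> list[str]:
        """Walk back through releases to show how this version came to be."""
        chain, node = [], self
        while node is not None:
            chain.append(f"{node.name}@{node.version}")
            node = node.parent
        return chain

# A v2.0 release built on top of v1.0, with the file-level change recorded:
v1 = IPVersion("pcie_ctrl", "1.0")
v2 = IPVersion("pcie_ctrl", "2.0", parent=v1, changed_files=["rtl/phy.v"])
print(v2.provenance())  # ['pcie_ctrl@2.0', 'pcie_ctrl@1.0']
```

Because every version links back to its predecessor down to the changed files, the provenance of anything fed into a training set can be audited after the fact.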
Further, the system can also be extended with rules, which allow you to decide upfront whether
or not a given IP, with its provenance, is allowed to be part of a training dataset, because these
rules can be enforced using IP metadata.
So you might say, hey, this IP is a super-secure IP, or this IP has been purchased from a third-party vendor, and we flag that IP.
And as we build our training datasets, these hierarchical collections of IPs,
the system automatically flags any training dataset that those IPs are fed into
as not available for training.
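The metadata-driven rule described here amounts to a filter over tagged IPs. A minimal sketch, assuming hypothetical tags (`origin`, `classification`) rather than any real IPLM schema:

```python
# Hypothetical metadata attached to each IP version
ip_metadata = {
    "cpu_core@3.1":   {"origin": "internal",    "classification": "open"},
    "ddr_phy@2.0":    {"origin": "third_party", "classification": "licensed"},
    "crypto_eng@1.4": {"origin": "internal",    "classification": "secure"},
}

def allowed_for_training(ip: str) -> bool:
    """Example rule: exclude third-party and security-classified IPs from training corpora."""
    meta = ip_metadata[ip]
    return meta["origin"] == "internal" and meta["classification"] != "secure"

# Build the corpus; flagged IPs are automatically excluded.
training_set = [ip for ip in ip_metadata if allowed_for_training(ip)]
print(training_set)  # ['cpu_core@3.1']
```

The point is that the decision is made by a rule over metadata, not by a human inspecting a blob of files, so it can be enforced automatically as the training set is assembled.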
So because you have IPs, and because you have provenance for each IP, you can then attach
metadata to them, allowing the rules in your system to automatically restrict you from
training your AI datasets with the wrong data. It also helps you do what we call incremental
training, which obviously everyone is doing, where you can say that I have trained a model with a corpus of IPs, which are all allowed,
legal, compliant. And then if I want to incrementally train this model further,
I can actually look at this hierarchy of IPs that have gone into my training dataset and figure out
which of them have changed, and what the changes are. Do I have to drive new versions of the
training? Do I have to vary my training algorithms? And if I do, what
differences will that cause? So this incremental, constant maintenance of a
training model is also enabled by having this hierarchical BOM of IPs that is
used for the training data models. So these kinds of advantages will help teams, will
help customers get much further along on their Gen AI journey while using IP-centric design models.
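Incremental training, as described above, reduces to diffing two snapshots of the hierarchical BOM. A sketch under the simplifying assumption that a BOM is just a mapping of IP name to released version (the names `bom_trained`, `bom_current`, and the IP names are hypothetical):

```python
def changed_ips(old_bom: dict[str, str], new_bom: dict[str, str]) -> dict[str, tuple]:
    """Compare two BOM snapshots and report IPs that were added, removed, or re-versioned."""
    diff = {}
    for ip in old_bom.keys() | new_bom.keys():   # union of all IP names
        old_v, new_v = old_bom.get(ip), new_bom.get(ip)
        if old_v != new_v:
            diff[ip] = (old_v, new_v)            # None on either side means added/removed
    return diff

bom_trained = {"cpu_core": "3.1", "pcie_ctrl": "1.0"}   # what the model was trained on
bom_current = {"cpu_core": "3.1", "pcie_ctrl": "2.0", "usb_phy": "1.0"}
# pcie_ctrl was bumped and usb_phy was added, so only those need retraining:
print(changed_ips(bom_trained, bom_current))
```

Because each entry is a released IP version with its own tracked lifecycle, the diff tells you exactly which parts of the corpus changed, rather than forcing a full retrain from scratch.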
And how does IP lifecycle management address the provenance problem specifically?
So all the things I talked about so far, which is, you know, breaking down your design into IPs,
building a release methodology or a provenance methodology for each IP version,
putting together a hierarchical bill of materials
or a hierarchical BOM to feed your training dataset, all of this belongs in the IP lifecycle
management tool or IPLM.
So when we talk about IPs or IP lifecycle management, an IP is actually anything that
enables your design.
It could be the traditional definition of IP, which is, hey, something I purchased from
an external vendor, which is delivered to me as a black box, right? Or even delivered to me as a bunch of RTL files,
but it is a property or an intellectual property that I bought from an external vendor.
That is the traditional definition of IP. But in IP lifecycle management, when we talk about
design being an IP-centric design, we expand the definition of IP to include
not only those external IPs that you purchase, but also designs that are being worked on
inside your group, specific to the project you're working on. It also includes things like PDKs,
IPs delivered to you by central teams, and other blocks that have been reused from previous designs.
So anything, any of these design blocks can be called IP
and treated as if they were IP in the system.
That includes attaching metadata to all of these designs,
having high-level abstractions.
You can also have environment configurations for these IPs.
So all of these blocks, everything that is feeding your design is treated as a standalone
IP with its own lifecycle, which can then be treated just like any other IP that you
purchased.
So an IP lifecycle management platform or IPLM provides the basis for you to start down
this journey of breaking down your project into multiple IPs, attaching a lifecycle to
each of these IPs, regardless of whether
that's something you've purchased or something you've built.
Great.
Great discussion, Vishal.
Thank you.
Finally, just give us a little bit more about Perforce and IPLM.
Give us the elevator pitch here.
Sure.
So Perforce IPLM, or Helix IPLM as it is known, is an IP lifecycle management platform that provides
built-in traceability, release management, discovery and reuse, and workspace management,
and basically moves you toward IP-centric design workflows.
So all the features I talk about are built into this platform called IPLM.
We have plenty of customers who are using it already.
It's quite popular in the semiconductor space.
A bunch of the biggest semiconductor companies are using it or evaluating it.
We have customers up and down the chain, small and big, that all really like the value that
IPLM can provide as an IP lifecycle management platform that they can then build their designs
on top of.
Thank you, Vishal.
Thank you.
That concludes our video.
Thank you for watching
and have a nice day.