Grey Beards on Systems - 172: Greybeards talk domain specific AI with Dr. Arun Subramaniyan, Founder & CEO, Articul8 AI

Episode Date: November 24, 2025

At AIFD7, Articul8 AI presented their domain-specific GenAI solution for industry-specific challenges. On our podcast, Dr. Arun Subramaniyan, Founder and CEO of Articul8, discussed how they deploy domain-specific models across verticals. Their technology ingests and maps corporate data plus business processes to help automate and augment those business-critical systems of record. Listen to the podcast to learn more...

Transcript
Starting point is 00:00:00 Hey everybody, Ray Lucchesi here with Keith Townsend. Welcome to another sponsored episode of the Greybeards on Storage podcast, a show where we get Greybeards bloggers together with storage and system vendors to discuss upcoming products, technologies, and trends affecting the data center today. We have with us today Arun Subramaniyan, founder and CEO of Articul8 AI. He and his team just did a session at AI Field Day 7 a couple of weeks back with Keith and me in attendance, and I thought he would make a great guest for our show. So Arun, why don't you tell us a little bit about yourself and what Articul8 AI is all about?
Starting point is 00:00:53 Ray, thank you so much for having me. It's been a fantastic experience at the Field Day. So Articul8 is a domain-specific GenAI platform company that's focused on building a platform that enables high-value use cases across industries such as manufacturing, industrial, semiconductors, automotive, aerospace, across the board, where the use cases require a level of domain expertise to even get started. And once you get started, it's the level of scale and complexity that actually drives the requirements of a platform similar to what we're building. And we've been working very closely across several industries
Starting point is 00:01:44 and deploying it in production at scale. And that really has been the focus of what we've been building. So when you mention scale, is Articul8 running both in the cloud as well as in customer environments and things of that nature? That's right. So we are by nature focused on industries
Starting point is 00:02:06 that are typically highly regulated, which means security is a non-negotiable starting point. And in most of the cases, our customers run our entire platform inside their security perimeter. So even if they're running in the cloud, we are running inside their virtual private cloud. Right, right, right, right.
Starting point is 00:02:25 What does the backend for Articul8 look like? Is this running on a box? Is it a GPU box? What is the infrastructure? Yeah, so typically it involves at least a few nodes or servers, and we do require GPUs at a bare minimum to get started, because the platform is driven by quite a few domain-specific models and some general purpose models.
Starting point is 00:02:56 And typically, the smallest footprint we run on is about eight GPUs, but the recommendation is about 16 GPUs, and the GPUs can be anywhere from an L4 to an H100. Oh, that's actually quite a bit bigger than I thought. So one of the details we didn't get into at Field Day is kind of the breadth and depth of what's running on the cluster: how much of this is required for the control plane for Articul8, and how much of this is required as headroom for the models themselves that customers are using?
Starting point is 00:03:45 Yeah, that's a very good question, Keith. So most of that is really for the kind of scale the customer requires. The Articul8 platform itself and the control plane for Articul8 run on a very minimal footprint. It's really just a single CPU node, and nothing else is required for that. However, what's required for most of the customer use cases is a combination of domain-specific models and general purpose models that get deployed. And we have ModelMesh, that's our platform, which is a runtime reasoning engine that reasons about which models to call for what tasks and in what sequence, depending on the outcome that particular task is actually
Starting point is 00:04:32 requiring and what the underlying data is that feeds it. So it's effectively an agentic solution that calls multiple models based on whatever is required? That is true, and we have been doing it way before the term agentic framework was coined. But think of this as an agent of agents, because it's multiple agents working together at runtime, deciding autonomously or semi-autonomously what to do to solve a task. We call a task a mission, and when missions run, it depends on the complexity of the data that comes in and what kind of outcomes are generated by intermediate agents that trigger other agents downstream.
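To make that runtime routing idea concrete, here is a minimal sketch of an agent-of-agents style mission loop, where the result of one step decides which model or agent runs next. The task names, registry, and control flow are illustrative assumptions, not Articul8's actual ModelMesh API.

```python
# Hypothetical sketch of a runtime "model mesh": given a mission, decide at each
# step which model or agent to call next, based on the intermediate outcome.
# Model names, task types, and the routing table are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    task: str       # e.g. "extract_table", "summarize"
    payload: dict

# Registry of callable models/agents keyed by the task they handle.
MODEL_REGISTRY: dict[str, Callable[[dict], dict]] = {
    "extract_table": lambda p: {"table": "...", "next": "summarize"},
    "summarize":     lambda p: {"summary": "...", "next": None},
}

def run_mission(first_step: Step) -> list[dict]:
    """Run steps until no downstream task is triggered; keep a full trail."""
    trail, step = [], first_step
    while step is not None:
        result = MODEL_REGISTRY[step.task](step.payload)   # pick the model at runtime
        trail.append({"task": step.task, "result": result})
        nxt = result.get("next")                           # intermediate outcome decides the next agent
        step = Step(task=nxt, payload=result) if nxt else None
    return trail

print(run_mission(Step("extract_table", {"doc": "spec.pdf"})))
```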
Starting point is 00:05:18 So how tightly woven is your agent platform into kind of the underlay versus the desired outcome? Let's say that the desired outcome is that you want to reduce mean time to failure for a piece of hardware equipment, and there are some agents working toward that outcome. Where's the demarcation between Articul8 and the application that's controlling the outcome, or is that all melded together? So there is typically a clean separation between the application and the logic that drives the application. We are usually predominantly in the logic that is driving the application, all the way from, say, the infrastructure layer up to the application API layer.
Starting point is 00:06:15 And in the specific example you gave, the data sets required to feed, say, a mean-time-to-failure model, and also running the failure model, all of those would be taken care of by the Articul8 engine. But if there is user input required at the application level, that typically integrates into whatever application our customers are running; that layer is what sits above the Articul8 platform. I read somewhere in some of your documentation that you really don't capture data as much as capture metadata.
Starting point is 00:07:01 I think it's from a security perspective. Is that true? That's actually a very good point as well. So most of our customers do not want to copy their systems of record. And in most of our applications, we interface directly with systems of record. So there's no notion of us trying to move that data into our platform. By design, we, what we call,
Starting point is 00:07:27 perceive information from existing data stores and only store metadata that we need to store. So I give you the... No, please, go ahead. Yeah, I'll give you an example of that, right? So, for example, if you're reading a PDF document,
Starting point is 00:07:42 the actual document doesn't necessarily get transferred. However, if there is a table in the document, we understand the table, we crop it, and the image of the table gets stored on our side, along with the understanding of the table, meaning extracting that information out. The extracted table maybe gets stored as a data frame; that is what gets stored on our side, and then the associated, say, downstream processing, like embeddings of that particular
Starting point is 00:08:12 extracted information. But the raw data continues to stay on the customer side, with a semantic link to our knowledge graph. One of the things that I found surprising in reading your documentation is that I deal a lot with images and structured, well, it's not really a structured document as much as a structured image. Can you talk about that? I mean, it seems like you're able to extract understanding from, let's say, a picture of a spreadsheet almost. That's right.
Starting point is 00:08:46 So, in fact, most of our use cases involve us ingesting or understanding or perceiving multi-modal data sets. And some of our customers in regulated industries might have things like fully scanned, non-OCRable kinds of files as PDFs, sometimes from the 1960s or 70s. Being able to extract information that's meaningful from that is quite powerful from the get-go. The reason I use the table example is most of the models out there today can read tables at 70-80% accuracy. The problem with industrial settings, and especially in large enterprises, is most of the tables are
Starting point is 00:09:38 not clean, meaning there are rows or columns that are merged, there are units for the numbers in the tables that are not necessarily in the table itself but somewhere else, buried in the docs, or are cryptic, things of that nature. Yes. And when the user asks a question, naturally they want the system to be able to understand these nuances and connect the dots. And in many cases these documents have the same tables or similar tables, with similar-sounding column names or variable names, all over the place. So disambiguating that is really one of the things that we do well.
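As a rough illustration of the "perceive, don't copy" flow described above, here is a hypothetical sketch in which only derived metadata (an extracted table, a stand-in embedding, and a semantic link back to the system of record) is retained, while the source document stays put. The helper names and fields are assumptions for illustration, not Articul8's actual schema.

```python
# Hypothetical sketch of the "perceive, don't copy" flow: the source document
# stays in the customer's store; only derived metadata (the extracted table,
# a stand-in embedding, and a semantic link back to the system of record) is
# retained. Helper names and fields are illustrative assumptions.
import hashlib
import pandas as pd

def perceive_table(source_uri: str, raw_rows: list[dict]) -> dict:
    """Return only the metadata/artifacts to keep; never the raw document."""
    frame = pd.DataFrame(raw_rows)                       # the extracted table
    embedding = [float(len(c)) for c in frame.columns]   # stand-in for a real embedding model
    return {
        "source_uri": source_uri,                        # semantic link to the system of record
        "source_hash": hashlib.sha256(source_uri.encode()).hexdigest(),
        "table": frame.to_dict(orient="records"),
        "embedding": embedding,
    }

meta = perceive_table("s3://customer-bucket/specs/pump-1972.pdf",
                      [{"part": "seal", "mtbf_hours": 12000}])
print(meta["source_uri"], len(meta["table"]))
```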
Starting point is 00:10:33 So let's start to untangle some of the complexity of what you folks are abstracting. So at the lowest level, well, not even the lowest level, but probably the lowest level that you folks engage with in the overall infrastructure, I have something serving up a model. That model has been tuned by your team and the platform, but at the end of the day, to identify an image in the chain of agentic AI, some model has to run. And fundamentally, I am going to get a hit rate, a successful hit rate, just based on the model. This is where, you know, a lot of the failure happens. So, you know, let's say, without the fine-tuning, we're getting 70%, 80% accuracy. With the fine-tuning, we're jumping to 85, 90%. What's the secret sauce?
Starting point is 00:11:21 Where are you guys coming into play with your platform to improve upon what the raw model is doing? So two things we do, right? First and foremost, we build a lot of these task-specific or domain-specific models ourselves. And in many cases, these models are built from the ground up. These are not just fine-tuned models. And the reason for that is you need them to do some very specific things that would get diluted
Starting point is 00:11:49 if you take a large model or a general purpose model and try to fine-tune it. That's one. And the second piece is, even if you had a lot of these models that are individually super accurate, mixing them together is generally a hard problem, because the errors propagate if you string them along. And second, you need to know when to call what, and in what sequence. So it's really that intelligence layer.
Starting point is 00:12:15 Think of it as, the example I would use is, think of what DeepSeek did. So adding reasoning to the overall process and allowing that reasoning chain to get through to your final outcome. Think of doing that reasoning chain as a system. So instead of calling the reasoning chain inside a model, now you're calling the reasoning chain across multiple models. That's really the secret sauce that we've built. And of course, for that to work efficiently, we also need to be able to build the models in between and have a system that understands how to evaluate all of these different models on the same playing field.
Starting point is 00:12:59 So that's super helpful. Now this ties back to the earlier question about sizing. Are you sizing the infrastructure to do the training of those models within the customer's security domain, or are you folks bringing that ability to train to a different cluster? So typically the sizing when we get into a customer deployment initially would have no training. So it is purely deploying the existing models we have, or if we had to build new models for that particular domain, not necessarily that customer but for that particular domain, we would build them outside with data sets that we procure or augment ourselves. So we don't build any models with the customer's data that goes into the platform.
Starting point is 00:13:49 So the size of the cluster is really something that I'm curious about. A lot of the industry numbers are showing utilization of these similar-sized clusters at around 10%, because inferencing just really isn't that demanding, you know, unless these models are much bigger than what I'm guessing. I guess that's one part of the question. Then the second part of the question is, you know, what's the utilization of these clusters for a typical customer? Perfect question. So first and foremost, the general misconception is that a lot of these domain-specific
Starting point is 00:14:27 models are also small language models. In our case, the size of the model depends on two important things. One is how much data is there to ground the model? That's number one. But the second one is what is the specificity of the tasks that you want the model to do? So the way we build these models is that we will build the largest model possible that is supported by the data sets we have. And then we would condense the models down to the smallest size possible.
Starting point is 00:15:04 and smallest size is required for the specific tasks we are going after. And the reason we do that, and notice I'm using the word condense, not necessarily quantize or anything like that, because the way the large model gets processed to a small model depends again on the task and the particular dataset. Did you call it distillation? So the reason I'm not using the word distillation or quantization or anything like that
Starting point is 00:15:31 is because those are all associated with specific techniques. And we use all of them, but it depends on a particular task and the particular type of data. Yeah, I think you're highlighting one of the easy, obviously a misconception that I had is that the goal of these models isn't the same goal as a general purpose model. The goal of these models is to do a specific task. So you're trying to make it as accurate as possible for the given task. So the model itself may be quite bigger than your typical general purpose model. It's possible. So it depends on the domain, the model can be pretty large.
Starting point is 00:16:15 So just to give you a sense of why there are 8 or 16 GPUs required, and the GPUs can be different sizes, right? It can go all the way from an L4 to an A100. Now, that's because the model sizes we deploy go from a 3 billion parameter model all the way up to a few hundred billion parameter models. And it depends on the kind of use case we are walking into; that gives the range of the models that get deployed. Usually the larger models are the general purpose models
Starting point is 00:16:48 or a model that is built for an entire domain. Yeah, so, okay, I'm getting a better picture. So for a domain model, it might skew to that larger piece, but then there is ultimately some level of reasoning that's probably best done on a larger foundation model, because why recreate something that's already been proven? Yes.
Starting point is 00:17:12 Versus a task-specific model, and again coming back to a model like a table understanding model, which takes an image of a table and outputs the actual extracted table as, say, a JSON output; that is an 8 billion parameter model. So since you folks don't collect data yourself, I didn't hear any kind of notion of a vector store or a vector table to go back and do RAG against. Or am I misunderstanding when you say you don't store data? You're not storing, you're not copying the data, but you keep creating some type of vector store of the data that's already existing. Yeah, so we actually have all sorts of different stores, including vector stores, but our primary store is a graph. But the vector store is also very important, because the graph nodes connect to the vector stores.
Starting point is 00:18:14 So depending on what gets embedded, for example, an image would get embedded, or sometimes an entire document gets embedded, depending on what we choose to embed. But our primary source of retrieval is the graph, not the vector store directly. I mean, you showed at AI Field Day 7 the knowledge graph. I'm not sure what domain it was, but it was a very interesting three-dimensional graph, which had almost a parameterized view of the data, I guess. I'm not sure that's even correct. Yeah, so we call that the shape of data. The reason for that is we are using the word perceive very carefully, because it's something that's active.
Starting point is 00:19:00 So when the data is processed, you're not only just analyzing the information, but you're also analyzing the connectivity, meaning this table came from this image, and then from this page, but then there's another document with a similar table with a similar concept. So the concepts are also aggregated: topics are auto-generated, and the topics are aggregated into themes. So there is a hierarchical clustering that happens. And that hierarchy of information is really what is stored in the graph. And the reason we call it the shape of data
Starting point is 00:19:35 is this the distance that is displayed in the graph that you guys saw actually has meaning. Meaning if the nodes on the graph are near each other versus they are far away from each other, that shows whether the topics are close together in that particular space, or they're far apart. It's almost like you're mapping the vector embeddings into a three-dimensional graph.
Starting point is 00:19:58 Yeah, so this is, so this is mapping the vector embeddings, the actual connectivities, the parent-child relationships, the commonalities of the topics itself, all of it together. Yeah, so, and I guess one of the things that I'm missing from the ability to do the graph is because you're acting what a component, and I'm assuming this is there, the business process itself part of the graph. Yeah. So the business process is in addition to this, right? So this is just the raw data itself. And as the business process gets mapped, that gets mapped onto the data shape. So you have a process map and a data map, which we can now connect because many times what you find in large enterprises is you may have a large corpus of data. But surprisingly or unsurprisingly,
Starting point is 00:20:47 your business processes run on a very small portion of that data.
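A toy sketch of this graph-first arrangement, a knowledge graph whose nodes point into a vector store, with business processes overlaid as tags on the nodes they actually touch, might look like the following. The structures, node IDs, and process names are illustrative assumptions, not Articul8's actual data model.

```python
# Toy sketch of graph-primary retrieval with a process map overlaid: the
# knowledge graph holds topics/documents and their relationships, each node
# points into a vector store, and nodes are tagged with the business processes
# that actually touch them. Structures and names are illustrative assumptions.
KNOWLEDGE_GRAPH = {
    "topic:seal-failures":  {"children": ["doc:pump-1972#table3"], "processes": ["rca"]},
    "doc:pump-1972#table3": {"children": [], "processes": ["rca"]},
}
VECTOR_STORE = {  # node id -> embedding (stand-ins for real embeddings)
    "doc:pump-1972#table3": [0.12, 0.80, 0.33],
}

def retrieve(start_node: str, process: str) -> list[tuple[str, list[float]]]:
    """Walk the graph from a topic node, keep nodes used by the given process,
    and resolve their embeddings from the vector store."""
    hits, stack, seen = [], [start_node], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        meta = KNOWLEDGE_GRAPH.get(node, {"children": [], "processes": []})
        if process in meta["processes"] and node in VECTOR_STORE:
            hits.append((node, VECTOR_STORE[node]))
        stack.extend(meta["children"])
    return hits

print(retrieve("topic:seal-failures", "rca"))
```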
Starting point is 00:21:18 Yeah, I just ran a personal experiment that highlighted that: three thousand emails that I tried to do analysis on, and it's not as simple as it sounds, because of the different relationships between the business process I'm trying to understand and just the unstructured data. Yes. So with that said, let's bring this down from a high level. I like architecture, and I like to talk architecture from a roles-and-responsibilities perspective. So I have this cluster that's running Articul8. We have all of my maps and my graphs, my underlying data, my processes for ingesting this data, processing this data, et cetera. From an application developer's perspective, am I just calling an endpoint to access this capability? What's my interface?
Starting point is 00:21:49 Yes. Yeah, so we, the primary interface is really calling APIs. So we provide APIs that an application developer can immediately start using without necessarily worrying about any of the underlying pieces. What I mean by that is if you say, for example, won't enable search. You won't enable in your use case of meantime to failure, a root cause analysis application. All your calling is the ability to call a set of agents that are either pre-built or you can build your own agents.
Starting point is 00:22:24 There are a set of pre-configure tools. You can also bring in your own tools, mix them together in a framework that actually runs all the infrastructure for you underneath. So I have a set of agents. So there's two categories of agents. There's agents that I can create at the application layer. And then there's, and I call that in my. model layer three then underneath that is the infrastructure agents your agents that are able to go out in it's not just collecting data they they can do actions because they're agents what type of
Starting point is 00:22:59 actions happens on that on your layer versus what type of actions happens at that application yeah so for example in the articulate layer you were like say fetching information connecting the dots or even analyzing the information on the fly merging them together to give it to the application most of that happens on the articulate layer sight right but even the agent that the customer is building that would interface directly with that application it could be multiple agents getting orchestrated that would be say a business process that they're defining with a conditional get they may not necessarily have a set process for getting to the condition, but once, but before giving them an outcome or an output, they need to get
Starting point is 00:23:52 through their condition. That's what they're defining the application there. Those agents can also run on article. So the, if I'm thinking about this from a IT governance and group perspective, the platform team is managing, the platform engineering team is managing articulate and they get a set of requirements from a group of application owners and they create and maintain these agents. And then the application owners can access those, call those agents as needed, get the outcomes they need as needed, or create their own agents that will run on this platform level. That's correct. And the platform engineering team, for example, would control, say, our back at the org level and the application developers would control what their applications can do
Starting point is 00:24:46 and their agents can do we are a pass through to whatever the enterprise our back is so meaning we can enforce our back that the customer is providing we are not the our back input like uh provider right so that's the the security handshake so if i'm in aWS and i am using i am as part of my our back control plane then you just integrate with whatever that whatever i'm doing that's that's right so it's a it's a fine-grained access control so whatever fine-grained access control the customer has will be a pass-through and say for example a platform team can set what we call a tenant, a tenant-wide access policy. For example, they can say this tool is accessible to the entire tenant,
Starting point is 00:25:36 or this agent or this sets of agents are accessible to the entire tenant. So let's talk about the messiness of agents a little bit. And since I'm most familiar with, actually I'm familiar with either AWS or GCP, but let's talk about AWS because I think that's the easier one for the audience to grasp. Let's say that some of this are lambdas that have to occur or listening to the AWS message bus for S3 activity. How do you help customers if the end result is that that activity or that trigger has to happen off of that cloud bus or that cloud event? Are you helping customers figure this out or are customers just basically given the API to build these connections themselves? I would say both because in many cases the customers don't want to handle that level of detail, right?
Starting point is 00:26:42 Where if they have an existing system running, then we are just integrating with that system. If it's a new system, then they don't necessarily want to get down to that level. So in the SaaS world, it used to be different Lambda functions running, of course. But now agents, as you rightfully mentioned, are much messier, because it may not be very obvious to folks when they're running it, but most of the agents require multiple retries inside, where calling a tool and getting an answer that you actually need is almost never a one-time activity. Yeah.
Starting point is 00:27:19 And those kinds of retries, making sure that things are repeatable. And in our case, everything needs to be auditable. So across the board, like you mentioned, the CISO teams, but it's not just the security teams, the business teams also require full auditability of the process.
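A minimal sketch of the bounded retry-with-audit-trail pattern being described, where every attempt at a tool call is logged whether it succeeds or fails, might look like this. The tool, retry policy, and log structure are illustrative assumptions rather than the platform's actual mechanism.

```python
# Minimal sketch of bounded retries with an audit trail: every attempt at a
# tool call is recorded, success or failure. The tool, retry policy, and log
# structure are illustrative assumptions, not the platform's actual mechanism.
import time

AUDIT_LOG: list[dict] = []

def call_tool_with_retries(tool, payload: dict, max_attempts: int = 3, backoff_s: float = 1.0):
    for attempt in range(1, max_attempts + 1):
        try:
            result = tool(payload)
            AUDIT_LOG.append({"tool": tool.__name__, "attempt": attempt, "status": "ok"})
            return result
        except Exception as exc:                        # audit the failure, then retry
            AUDIT_LOG.append({"tool": tool.__name__, "attempt": attempt,
                              "status": "error", "detail": str(exc)})
            if attempt == max_attempts:
                raise
            time.sleep(backoff_s * attempt)             # simple linear backoff

def flaky_fetch(payload):                               # stand-in for a real tool call
    if payload.setdefault("tries", 0) < 1:
        payload["tries"] += 1
        raise TimeoutError("transient failure")
    return {"rows": 42}

print(call_tool_with_retries(flaky_fetch, {}), AUDIT_LOG)
```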
Starting point is 00:27:52 but processes and enterprises aren't necessarily that well documented. I mean, a lot of human activity goes on to do things and that sort of stuff. How do you extract that information in an automated fashion? So in this case, the process map is automatically recorded, right? So every process that happens in the system, everything that the user is asking the system to do, is automatically logged and tagged. You're building that in real time.
Starting point is 00:28:21 Yeah, in real time. Exactly. And we showed briefly the model mesh running, and every step of it, for example, you ask a question, or you ask to generate a report, or you're saying, like, compute, say, what is the mean time to failure? Each of those trigger a set of actions,
Starting point is 00:28:41 and all of those actions are logged and tagged. And why? It seemed like it was. affecting the knowledge map yes yeah because it's a so you have your data map and then you have your activity map which is not disconnected from your data map because it's acting on the data map and many times the data that the activity is generating goes right back into your knowledge graph as well so now that we've tested on the process of mapping I asked a really I thought a great awesome question and I had to step
Starting point is 00:29:14 out and take a personal call doing our non-recorded part of it. I'm really curious about the experts that you have with the process layer, so these domain experts that you hired, some of the more successful AI projects that I've seen have been when the domain experts are actually closest to the technology, closest to the AI, so that If they both, and I think the best example of this is code development. Coders get the most production out of AI because they're both experts in AI and they're experts in code so they can accelerate their outcomes. That's right. Where are you seeing the advantages and kind of the role blocks between when you're pairing these domain experts with the AI experts to get the outcome?
Starting point is 00:30:13 Got it. So first and foremost, we go one step further, like in terms of building domain-specific models. All of our domain-specific models were built by domain experts. What I mean by that is, for example, if you take our semiconductor design model, the main developer in our AI research team who develops that used to work for DSMC, and then got his PhD in AI, and then ended up building the semiconductor model for us. I need to pause that for a sec. You had this expert in semi-conductors, who probably has a PhD in some type of relevant phase.
Starting point is 00:30:58 Just casually went out and got another PhD in AI. No, yeah, no, so he had a master's in electrical engineering. Then he worked for TSMC as a designer, and then he went out and got a PhD in AI. Okay. That makes me feel a lot less, you know, intimidating. And so back into the domain experts.
Starting point is 00:31:22 I mean, you have, I don't know, a half a dozen, maybe a dozen specific domain models at this point that you support, right? Yeah, so we have about half a dozen domain specific model. So we have an aerospace model, an energy model, a telecommunications model, semiconductor model, but we also have task-specific models. And in terms of experts that we have in-house, that's the primary domains that we actually went after in terms of building models. And the data sets that we use for those are either procured or created by us or in partnership
Starting point is 00:32:03 with consortia that own a lot of these datasets. For example, one of those kinds of partnerships that are public is in the energy domain with EPRI, which is the Electric Power Research Institute. They not only own the datasets, but they also have all the experts who validate and continuously curate the models. And if you notice, we are not in places like healthcare, not because they're not important. It's primarily because we don't have in-house expertise. So the way for us to expand is we'll have partnerships who would bring in those kinds of experts, but the primary domains we're going after, like in the last two years at least, has been all the domains that we have in-house experts in. Yeah, so I guess that brings up a question. So let's say I'm a
Starting point is 00:32:51 aerospace giant and I want to bring in articulate. What's the adoption look like? How does it fire up? I mean, obviously I've got proprietary data, I've got industry specific data, I've got processes over, I don't know, gazillion processes if I'm an aerospace giant. I mean, how does how does something like deploying articulate in our environment look like yeah so like we went after this from the get-go wanting to reduce the pain of enterprise deployments and having been on the other side of trying to implement these kinds of systems and taking too long we designed the system to be as pain free as possible on both sides because we are not hands-on keyboard on the customer side We're not a consulting shop.
Starting point is 00:33:37 So when our product gets deployed, the customers, engineers are the ones deploying it. And they need to be able to do that with as minimal training, but as quickly as possible. So the typical deployment time in the cloud, if it's a cloud that we already support between AWS and GCP and Azure, it's a matter of about three hours total in terms of going cold, not having any infrastructure, just being given an MTVPC to having the platform deployed and the data ingestion started. And most of that time goes into the actual cloud spinning up their infrastructure. Like you go ask for an EKS cluster, for example, on EWS. It takes time.
Starting point is 00:34:25 So you're going to be ingesting all this data in real time once that clusters up and operational. As you get questions, then you're starting to build the process maps. Is that how this works? Yes. So typically our promise to our customers also is that we would ingest at least 95% of the data required for that particular use case within the first 24 hours of the deployment going live. And the reason for that is we can size and scale the cluster based on that. and the 24-hour period is to make sure that the knowledge graph gets sufficiently populated. It may not be 100%, but it's sufficiently populated for you to then start using the application.
Starting point is 00:35:11 And as you start using more, the knowledge graph gets more populated. And I guess none of this is really surprising, especially in complex organizations. You're installing this cluster probably in its own dedicated VPC. That's right. And all of the security, all of the system connectivity, all of the, that infrastructure minutia that has to be worked out, that takes, even though the deployment of the cluster probably doesn't take that long.
Starting point is 00:35:38 No, what really does take is the figuring out the connectivity of rights. Yeah, the deployment takes minutes, to be honest. The actual infrastructure buildup and the full monitoring stack, because we are deploying our platform inside the customer's VPC, we also need to bring up the full monitoring stack, that the customer can use to monitor. Because we don't have any connectivity inside,
Starting point is 00:36:03 not even logs come back to us. Yeah, and there has to be a SOP developed between you and the customer to, so when they need support. Yes. Because some of this stuff, I would imagine, some of this stuff becomes mission critical. Absolutely.
Starting point is 00:36:17 And figuring out that this is not a small, no. This is not a small undertaking. No. And when, say for example, when the customer needs to send us logs, even that, is logged to make sure that they have auditability on their side, what came to our side and all of that.
Starting point is 00:36:35 Now, the other side of the spectrum is if a customer just wants to consume agents from us, like they can just go to the AWS marketplace, agent marketplace, and consume an agent. So it's as simple as that. I saw an AWS, you have two specific agents at this point. And one was, I want to characterize it as an LLM evaluation, tool? That's right. We call it
Starting point is 00:37:00 LLMIQ and we built it first of all in-house because to build model mesh we needed a system that can actually evaluate LLMs on the fly and that particular agent we put out there thinking that it will be very useful for customers and we've been proven
Starting point is 00:37:16 right there which is it's not just evaluating and giving you reports. That's one of the things that the agent can do but really it's a live agent meaning you can just ask it send it a question and it will tell you what is the best model to answer that question or do a task? Yeah, so this is super, I've unfortunately, I guess fortunately, because I'm learning.
Starting point is 00:37:42 But this is one of those things that the industry hasn't done us a very good service to when it comes to benchmarking models. We understand how fast DeepSeek will run on. on H100, 200 versus Intel Zeon processor. But that's useless when it comes to understanding what model do I need to run to get my anthology correct when I'm building a digital version of myself. That's right.
Starting point is 00:38:16 And this one will do it in real time for you, because you say here is my question or here's my task, and in real time it'll tell you, within milliseconds, what is the best model to use for that particular task, and you can use it as a router in an application if you wanted. I cannot grok how you're making this decision. To say, you know, I'm asking, I want to digitize myself, it's the wrong question, obviously.
Starting point is 00:38:43 It needs to be much more specific. But how would you assess which of the, you know, half dozen LLMs out there would be the best to use? Very good question. So maybe I'll answer your question with a question. what do you think is my largest or some of my largest infrastructure costs for? Oh, as far as the company is concerned? Computational flow models, right?
Starting point is 00:39:09 Yeah, so typically the answer would be training models, which is true. So we do spend quite a bit on training models. But my second largest spend is on evaluating models. And the reason I can even produce an agent like LLMIQ is because we, we constantly, on a daily basis, evaluate pretty much all of the models that are out there that are state-of-the-art models on any of the benchmarks that are out there, on all our benchmarks. And most of those actually change on a daily basis. And we are running millions of inferences per model per day.
Starting point is 00:39:46 Yeah, so why let that data insight just sit internally? This becomes a very practical tool. Because as I've been, you know, building AI projects, one of my biggest questions, even on my little GV10 is, what's the right combination? Do I need a 70 billion parameter parameter model? Or will a 3 billion parameter model work for this specific task? And which 3 billion parameter model? Is it Quinn 2.5? Is it Quinn 3? Like, these are not, I'm just a guy in a basement. It will take me forever. to answer these questions. Yeah, and even if you did answer the questions, right, they are very effemittal. Like, they change almost on a day-to-day. They definitely change on a week-to-week basis. Yeah, when I, you know, in getting geekier, you know, which if I use the quantified, the quant-eight version of this model versus the 16, do I lose, do I lose a fidelity in my desire outcome? Like, these are all the questions.
Starting point is 00:40:53 that enterprises are struggling. Exactly. And the thing about LLMIQ is we today have two modes. One mode is it will generate a report for you. If you have a set of questions you just asked, or the other mode is it will give you a real-time answer that you can plug into a real-time pipeline. But we're adding a third mode, which is if you have questions like this,
Starting point is 00:41:14 exactly the same questions you ask, like should I use a quantized version, or should I use a version that is deployed on an L4 versus an A-10 versus an A-100, depending on whether you want more throughput or you want to optimize for latency, things like that. You don't necessarily have to be an expert to make these kinds of decisions. These are all like interrogations you can run with these. These come with serious infrastructure costs repercussions depending on, you know, you can really
Starting point is 00:41:52 overkill your infrastructure, you know, have overkill infrastructure. And it'll look right from a utilization perspective. The GPUs are being effectively utilized. You're just using the wrong models for the outcome. Yes. So this is a new feature of the LLM IQs to actually talk about the infrastructure level that could be used to run the model? Yes. So that's one of the features that we'll be releasing very soon, where we are like not just talk about infrastructure, but you can actually run an optimization. So many times this is a full-blown optimization where it's a perito optimal.
Starting point is 00:42:29 There is no one answer. It's about what trade-off you're willing to make. And the other AWS agent, I'm thinking it's a networking solution. Yes, the second one we debuted was a network topology agent. And the reason we also did that was every agent that you see, Every agent that you see out there most of the time is very general purpose. We wanted to introduce the notion of a domain-specific agent that does something very unique for certain domains.
Starting point is 00:43:02 So in this particular case, it's very useful in the networking world. It's also useful in the cybersecurity world. But if you have a complex set of logs and all you really have is logs, you don't necessarily know what system generated the logs. this would infer the topology of the underlying system, which in this particular case could be a set of networking gear in a telecommunications company, but it could also come from a complex cloud deployment
Starting point is 00:43:32 where you're looking at a cybersecurity log. And it actually infers the topology, and it can also detect whether there are any anomalies in the data set that is coming in.
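In the spirit of that description, here is a toy sketch of inferring a who-talks-to-whom topology purely from log lines and flagging edges that never appeared in the baseline as potential anomalies. The log format, device names, and anomaly rule are assumptions for illustration, not the agent's actual method.

```python
# Toy sketch of topology inference from logs alone: build a who-talks-to-whom
# graph from source/destination fields, then flag edges never seen in the
# baseline as potential anomalies. The log format, device names, and anomaly
# rule are assumptions for illustration, not the agent's actual method.
from collections import defaultdict

def infer_topology(log_lines: list[str]) -> dict[str, set[str]]:
    """Each line is assumed to look like 'SRC -> DST action'."""
    edges: dict[str, set[str]] = defaultdict(set)
    for line in log_lines:
        src, _, rest = line.partition(" -> ")
        edges[src].add(rest.split()[0])
    return edges

def new_edges(baseline, window):
    """Edges in the new window that never appeared in the baseline topology."""
    return {(s, d) for s, dsts in window.items() for d in dsts
            if d not in baseline.get(s, set())}

baseline = infer_topology(["core-sw1 -> agg-sw3 lldp", "agg-sw3 -> tor-12 bgp"])
window   = infer_topology(["core-sw1 -> agg-sw3 lldp", "tor-12 -> unknown-host ssh"])
print(new_edges(baseline, window))   # -> {('tor-12', 'unknown-host')}
```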
Starting point is 00:44:09 And all of these questions that I still have, there's, this is a, you guys have done a lot of work and a relatively short period of time given where AI is at in general. Yeah, so we've been fortunate to be at the right places at the right time. So in fact, the actual founding team and the team that's building this, we've been at this since 2010, 2011,
Starting point is 00:44:37 doing large-scale GPU runs from the high-performance computing world to building some of the earliest models with, if you remember, the days of auto ML. We were building thousands of models trying to figure out what architecture works. And we were also part of the core team
Starting point is 00:44:54 at say Amazon when all of this infrastructure and the initial training models ended up getting trained. It's really that pedigree that got us here. Of course, as a company, we've been around only for two years, so less than two years, but we've been at it for quite a while. Well, Keith, you got one more chance. Is there some, what's the last question? Is there any last questions you'd like to ask
Starting point is 00:45:20 Arun? So I guess that is, is how often does this break? Because it is, AI infrastructure is still pretty fragile. So when you say this, like you mean the actual, not the application layer, the not calling application layer, the agents, but like practical stuff changes from day to day. Yes. Like, me getting a working version of VLM of is like a daily struggle for it. That's right. So that's what we call, what I would call undifferentiated heavy lifting in at least the application space. Right.
Starting point is 00:45:59 So the reason why I started with, we need at least 8 GPUs is when you get into an enterprise, a minimum level of availability is needed. And just because one model hangs or one GPU hangs, you really cannot have your application fail. And it may not be full-blown high availability use case, but then there's a certain notion of an SLA for availability. That's one thing. And you need to be resilient in terms of failures. There is no notion of being fail safe. You just have to accommodate failure, but still have your application working. And this is where the agentic workflow comes into play. You know, because it's fragile, if I have agents running the process and I make an API request to say,
Starting point is 00:46:51 hey, give me this data set, you can have some level of guarantee because there's an agent working to make this request happen. And then we'll retry it until it succeeds. At least sometime until it succeeds. And also, we have full vertical control from the infrastructure layer all the way up. So all of the models are ours or models that we're. a test. And so that level of control is required for us to be able to go into any of these kinds of environments. All right. Arun, is there anything else you'd like to say to our listening
Starting point is 00:47:25 audience before we close? I would say first and foremost, thank you for having me. But then in the enterprise world, it's never been this exciting. The kinds of use cases that are coming to us and the kinds of end users who are actually able to take advantage of really complex, what you would call agentic flows. And doing this at scale in production is really what energizes us. So if you're out there like having use cases in any of the industries I mentioned or even in industries that I didn't particularly mention that you think would benefit from having a platform that can actually understand your domain and run complex use cases, please reach out. But we're super excited to be solving problems like that.
Starting point is 00:48:12 All right. Well, this has been great. Thanks again for being on our show today. Ray and Keith, thank you so much for having me. And that's it for now. Bye, Roon. Bye, Keith. Bye, Ray. Bye.
Starting point is 00:48:24 Until next time. Next time, we will talk to the list system storage technology person. Any questions you want us to ask, please let us know. And if you enjoy our podcast, tell your friends about it. Please review us on Apple Podcast, Google Play, and Spotify, as this will help get the word out. Thank you.
