Grey Beards on Systems - 172: Greybeards talk domain specific AI with Dr. Arun Subramaniyan, Founder & CEO, Articul8 AI
Episode Date: November 24, 2025
At AIFD7, Articul8 AI presented a unique GenAI solution for industry-specific challenges. On our podcast, Dr. Arun Subramaniyan, Founder and CEO, Articul8 discussed how they deploy domain-specific... models across verticals. Their technology ingests and maps corporate data plus business processes to help automate and augment those business critical systems of record. Listen to the podcast to learn more...
Transcript
Discussion (0)
Hey everybody, Ray Lucchesi here with Keith Townsend.
Welcome to another sponsored episode of the Greybeards on Storage podcast,
the show where we get Greybeards bloggers together with storage and system vendors
to discuss upcoming products, technologies, and trends affecting the data center today.
We have with us today, Arun Subramaniyan, founder and CEO of Articul8 AI.
He and his team just did a session at AI Field Day 7 a couple of weeks back with Keith and I in attendance.
I thought he would make a great guest for our show.
So Arun, why don't you tell us a little bit about yourself and what Articul8 AI is all about?
Ray, thank you so much for having me.
It's been a fantastic experience at the Field Day.
So Articul8 is a domain-specific GenAI platform company that's focused on building a platform that enables use cases across a variety of high-value use cases and industries, such as manufacturing, industrial, semiconductors, automotive, aerospace, across the board, where the use cases require a level of domain expertise
to even get started.
And once you get started, it's the level of scale
and complexity that actually drives the requirements
of a platform similar to what we're building.
And we've been working very closely across several industries
and deploying it in production at scale.
And that really has been the focus of what we've been building.
So when you mentioned scale,
is Articul8 running both in the cloud
as well as in customer environments and things of that nature?
That's right.
So we are by nature focused on industries
that are typically highly regulated,
which means security is a non-negotiable starting point.
And in most of the cases,
our customers run our entire platform
inside their security perimeter.
So even if they're running in the cloud,
we are running inside their virtual private cloud.
Right, right, right, right.
What does the backend for Articul8 look like?
Is this running on a box?
Is it a GPU box?
What is the infrastructure?
Yeah, so typically it involves at least a few nodes or servers,
and we do require GPUs at a bare minimum to get started
because the platform is driven by quite a few domain-specific models
and some general purpose models.
And typically, the smallest footprint we run on is about eight GPUs, but the recommendation is about 16 GPUs, and the GPUs can be anywhere from an L4 to an H100.
Oh, that's actually quite a bit bigger than I thought. So one of the details we didn't get into at Field Day is kind of the breadth and depth of the models that are running on the cluster: how much of this is required for the Articul8 control plane, and how much of this is required as headroom for the models themselves that customers are using?
Yeah, that's a very good question, Keith.
So most of that is really for the kind of scale the customer requires.
The Articul8 platform itself and the control plane for Articul8 run on a very minimal footprint. It's really just a single CPU node, and nothing else is required for that. However, what's required for most of the customer use cases is a combination
of domain specific models and general purpose models that get deployed. And we have model mesh,
that's our platform, that is a runtime reasoning engine and reasons on which models to call for
what tasks and in what sequence depending on the outcome that particular task is actually
requiring, and what is the underlying data that feeds it.
So it's effectively an agentic solution that calls multiple models based on whatever is required?
That is true, and we have been doing it way before the term agentic framework was coined. But think of this as an agent of agents, because it's multiple agents working together and, at runtime, deciding autonomously or semi-autonomously what to do to solve a task.
We call a task a mission, and when missions run, it depends on the complexity of the data that
comes in and what kind of outcomes are generated by intermediate agents that trigger other agents
downstream.
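To make the "agent of agents" idea concrete, here is a minimal Python sketch with invented model names, a toy select_model scorer, and a simplified mission loop; it illustrates the general pattern of a runtime router choosing which model handles which step and letting intermediate outputs trigger downstream steps, not Articul8's actual model mesh.

```python
# Hypothetical sketch of an "agent of agents" mission runner: a registry of
# specialist models plus a routing step that decides, at runtime, which model
# handles which sub-task and in what order. All names and logic are invented.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Step:
    name: str
    payload: dict
    done: bool = False

@dataclass
class Mission:
    steps: List[Step] = field(default_factory=list)
    outputs: Dict[str, dict] = field(default_factory=dict)

# Stand-ins for deployed domain-specific and general-purpose models.
MODEL_REGISTRY: Dict[str, Callable[[dict], dict]] = {
    "table_extractor": lambda p: {"table": f"parsed:{p['doc']}"},
    "domain_reasoner": lambda p: {"answer": f"reasoned over {p}"},
}

def select_model(step: Step) -> str:
    """Toy router: pick a model by task shape. A real mesh would score
    candidate models against the task and the data feeding it."""
    return "table_extractor" if "doc" in step.payload else "domain_reasoner"

def run_mission(mission: Mission) -> Dict[str, dict]:
    while any(not s.done for s in mission.steps):
        step = next(s for s in mission.steps if not s.done)
        result = MODEL_REGISTRY[select_model(step)](step.payload)
        mission.outputs[step.name] = result
        step.done = True
        # Intermediate outputs can trigger downstream steps, as described above.
        if "table" in result and not any(s.name == "analyze" for s in mission.steps):
            mission.steps.append(Step("analyze", {"table": result["table"]}))
    return mission.outputs

if __name__ == "__main__":
    print(run_mission(Mission(steps=[Step("extract", {"doc": "spec.pdf"})])))
```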
So how tightly woven is your agent platform into the underlay versus the desired outcome? So let's say that the desired outcome is that you want to reduce mean time to failure for a piece of hardware equipment, and there are some agents working toward that outcome. Where's the demarc between Articul8 and the application that's controlling the outcome, or is that all melded together?
So there is typically a clean separation between the application and the logic that drives the application.
We are usually predominantly in the logic that is driving the application all the way from
the, say, the infrastructure layer up to the application API layer.
And in the specific example you gave, the data sets required to feed, say, a mean time to failure model, and also running the failure model, all of those would be taken care of by the Articul8 engine.
But if there is a user input required at the application level, or typically that integrates into whatever application our customers are running, then that layer is what is above the Articul8 platform.
I read somewhere in some of your documentation that you really don't capture data as much as capture metadata.
I think it's from a security perspective.
Is that true?
That's actually a very good point as well.
So most of our customers do not want to copy their systems of record.
And most of our applications, we interface directly with systems of record.
So there's no notion of us trying to move that data into our platform.
by design, we
what we call
perceive information
from existing data stores
and only store metadata
that we need to store.
So I give you the...
No, please, go ahead.
Yeah, I'll give you an example of that, right?
So, for example, if you're reading a PDF document, the actual document doesn't necessarily get transferred.
However, if there is a table in the document, we understand the table, we crop it, and the image of the table gets stored on our side.
The understanding of the table, meaning extracting that information out, and then the extracted table maybe getting stored as a data frame, that is what gets stored on our side, and then the associated downstream processing, like embeddings of that extracted information. But the raw data continues to stay on the customer side, with a semantic link to our knowledge graph.
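A rough sketch of that "perceive, don't copy" flow, with stand-in extraction and embedding functions and an invented record shape; the point is only that derived artifacts plus a link back to the source are retained, while the raw document stays in the customer's system of record.

```python
# Hypothetical sketch: only derived artifacts (an extracted table, an
# embedding, and a semantic link back to the source) are kept; the raw
# document never leaves the customer side. All functions are stand-ins.
import hashlib
import pandas as pd

def extract_table(document_path: str) -> pd.DataFrame:
    # Stand-in for table detection + cropping + structure recovery.
    return pd.DataFrame({"part": ["A-100", "L4"], "mtbf_hours": [52000, 61000]})

def embed(text: str) -> list[float]:
    # Stand-in embedding; a real system would call an embedding model.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:8]]

def perceive(document_path: str) -> dict:
    table = extract_table(document_path)
    return {
        # Semantic link back to the system of record, not a copy of it.
        "source_uri": document_path,
        "source_fingerprint": hashlib.sha256(document_path.encode()).hexdigest(),
        "table_frame": table.to_dict(orient="records"),
        "embedding": embed(table.to_csv(index=False)),
    }

if __name__ == "__main__":
    print(perceive("s3://customer-bucket/specs/legacy_scan_1967.pdf"))
```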
One of the things that I found surprising in reading your documentation is that, and I deal a lot with images and structured data, it's not really a structured document as much as a structured image.
Can you talk about that?
I mean, it seems like you're able to extract understanding from, let's say, a picture of a
spreadsheet almost.
That's right.
So, in fact, most of our use cases involve us ingesting or understanding or perceiving multi-modal data sets.
And some of our customers in regulated industries might have things like fully scanned, non-OCRable kinds of files as PDFs, sometimes from the 1960s or 70s.
Being able to extract information that's meaningful from that is quite powerful from the get-go. The reason I use the table example is most of the models out there today can read tables at a 70-80% accuracy. The problem with
models out there today can read tables at a 70-80% accuracy. The problem with
industrial settings and especially in large enterprises is most of the tables are
not clean, meaning there are rows or columns that are merged, there are like
units of numbers in the tables that are not necessarily in the table itself but somewhere else, buried in the docs, or are cryptic, things of that nature.
Yes.
And when the user asks the question naturally, they want the system to be able to understand these nuances and connect the dots. And in many cases, these documents have the same tables or similar tables with similar-sounding column names or variable names all over the place. So disambiguating that is really one of the things that we do well.
So let's start to untangle some of the complexity of what you folks are abstracting.
So at the lowest level, well, not even the lowest level, but probably the lowest level that you folks engage with the overall infrastructure, I have something serving up a model. That model has been tuned by your team and the platform, but at the end of the day, to identify an image in the chain of agentic AI, some model has to run. And fundamentally, I am going to get a hit rate, a successful hit rate, just based on the model. This is where, you know, a lot of the failure happens.
So, you know, let's say, without the fine-tuning, we're getting 70%, 80% accuracy.
With the fine-tuning, we're jumping to 85, 90%.
What's the secret sauce?
Where are you guys coming in to play with your platform
to improve upon what the raw model is doing?
So two things we do, right?
So first and foremost, we build a lot of these task-specific
or domain-specific models ourselves.
And in many cases, these models are built from the ground up.
These are not just fine-tuned models.
And the reason for that is you need them to do some very specific things that would get diluted
if you take a large model or if you take a general purpose model and try to fine tune it.
That's one.
And the second piece is even if you had a lot of these models that are individually
super accurate, mixing them together generally is a hard problem because the errors propagate
if you string them along.
And second, you need to know when to call what.
and in what sequence.
So it's really that intelligence layer.
Think of it as, like the example I would use is think of what DeepSeek did.
So adding reasoning to the overall process and allowing that reasoning chain to get through
to your final outcome.
Think of doing that reasoning chain as a system.
So instead of calling the reasoning chain inside a model, now you're calling the reasoning
chain across multiple models. That's really the secret sauce that we've built. And of course,
like for that to work efficiently, we also need to be able to build the models in between
and have a system that understands how to evaluate all of these different models on the same playing field.
So that's super helpful. Now this ties back to the earlier question about sizing. Are you sizing the
infrastructure to do the training of those models within the customer's security domain, or are you folks bringing that ability to train to a different cluster?
cluster? So typically the sizing when we get into a customer deployment initially would have
no training. So it is purely deploying the existing models we have or if we had to build new
models for that particular domain, not necessarily that customer but for that particular domain,
we would build it outside with data sets that we procure or augment ourselves.
So we don't build any models with the customer's data that goes into the platform.
So I'm curious, because now the size of the cluster is really something that I'm curious about.
A lot of the industry numbers are showing utilization of these similar-sized clusters at around 10%, because inferencing just really isn't that demanding, you know, unless these models are much bigger than what I'm guessing. I guess that's one part of the question.
Then the second part of the question is, you know, what's the utilization of these clusters
for a typical customer?
Perfect question.
So first and foremost, the general misconception is that a lot of these domain-specific
models are also small language models.
In our case, the size of the model depends on two important things.
One is how much data is there to ground the model?
That's number one.
But the second one is what is the specificity of the tasks that you want the model to do?
So the way we build these models is that we will build the largest model possible
that is supported by the data sets we have.
And then we would condense the models down to the smallest size possible, and the smallest size that is required for the specific tasks we are going after.
And the reason we do that,
and notice I'm using the word condense,
not necessarily quantize or anything like that,
because the way the large model gets processed to a small model
depends again on the task and the particular dataset.
Did you call it distillation?
So the reason I'm not using the word distillation or quantization or anything like that
is because those are all associated with specific techniques.
And we use all of them, but it depends on a particular task and the particular type of data.
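As one illustration of the kind of condensation technique being referenced, here is the textbook teacher-student distillation step in PyTorch; the teacher, student, optimizer, and batch are assumed to exist, and this is a generic recipe rather than anything specific to how Articul8 condenses its models.

```python
# Generic knowledge-distillation step (one of the "condense" techniques named
# above), sketched with PyTorch. teacher, student, batch, and optimizer are
# assumed/hypothetical; this is the standard recipe, not a vendor method.
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, batch, optimizer, T=2.0, alpha=0.5):
    inputs, labels = batch
    with torch.no_grad():
        teacher_logits = teacher(inputs)
    student_logits = student(inputs)

    # Soft targets: match the teacher's temperature-scaled distribution.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: still learn the task labels directly.
    hard_loss = F.cross_entropy(student_logits, labels)

    loss = alpha * soft_loss + (1 - alpha) * hard_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```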
Yeah, I think you're highlighting one of the easy, obvious misconceptions that I had, which is that the goal of these models isn't the same goal as a general purpose model.
The goal of these models is to do a specific task.
So you're trying to make it as accurate as possible for the given task.
So the model itself may be quite a bit bigger than your typical general purpose model.
It's possible.
So it depends on the domain; the model can be pretty large.
So just to give you a sense of why there's 8 or 16 GPUs required,
and the GPUs can be different sizes, right?
So it can go all the way from an L4 to an A100.
Now, it's because the model sizes we deploy go from a 3 billion parameter model,
all the way up to a few hundred billion parameter models.
And it depends on the kind of use case we are walking into
that gives the range of the models that get deployed.
Usually the larger models are the general purpose models
or a model that is built for an entire domain.
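A back-of-the-envelope way to see why a 3 billion to multi-hundred-billion parameter range maps onto anything from an L4 to an A100: model weights alone take roughly parameter count times bytes per parameter, and real deployments need extra headroom for KV cache and activations. The GPU memory figures below are published specs; the rest is simple arithmetic, not a vendor sizing formula.

```python
# Weights-only GPU sizing estimate; real deployments need headroom on top.
import math

def min_gpus(params_billion: float, bytes_per_param: float, gpu_mem_gb: float) -> int:
    weight_gb = params_billion * bytes_per_param  # 1e9 params * bytes/param ~= GB
    return max(1, math.ceil(weight_gb / gpu_mem_gb))

for model_b, precision, bpp in [(3, "fp16", 2), (70, "fp16", 2), (70, "int8", 1)]:
    for gpu, mem in [("L4 (24 GB)", 24), ("A100 (80 GB)", 80)]:
        n = min_gpus(model_b, bpp, mem)
        print(f"{model_b}B @ {precision} on {gpu}: >= {n} GPU(s) for weights alone")
```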
Yeah, so, okay, I'm getting a better picture.
So for a domain model, it may, you know,
it might skew to that larger piece,
but then there is ultimately some level of reasoning
that's probably best done on a larger foundation model
because why recreate something that's already been proven?
Yes.
Versus a task-specific model,
and again coming back to a model like a table understanding model,
which takes an image of a table and outputs the actual extracted table
as, say, a JSON output, which is an 8 billion parameter model.
So since you folks don't collect data yourself, I didn't hear any kind of notion of a vector store or a vector table to go back and do RAG against, or am I misunderstanding when you say you don't store data?
You're not storing, you're not copying the data, but you're creating some type of vector store of the data that's already existing.
Yeah, so we actually have all sorts of different stores, including vector stores, but our primary store is a graph.
But the vector store is also very important because the graph nodes connect to the vector stores.
So depending on what gets embedded, for example, an image would get embedded, like sometimes an entire document gets embedded, depending on what we choose to embed.
But our primary source of retrieval is the graph, not the vector store directly.
I mean, you showed at AI Field Day 7 the knowledge graph.
I'm not sure what domain it was, but it was a very interesting three-dimensional graph,
which had almost a parameterized view of the data, I guess.
I'm not sure that's even correct.
Yeah, so we call that the shape of data. The reason for that is we are using the word perceive very carefully, because it's something that's active.
So when the data is processed, you're not only just analyzing the information, but you're also analyzing the connectivity,
meaning this table came from this image, but then from this page, but then there's another document with a similar table with a similar concept.
So the concepts are also aggregated, so topics are auto-generated, and the topics are aggregated into themes.
So there is a hierarchical clustering that happens.
And that hierarchy of information is really
what is stored in the graph.
And the reason we call it shape of data
is that the distance that is displayed in the graph
that you guys saw actually has meaning.
Meaning if the nodes on the graph are near each other
versus they are far away from each other,
that shows whether the topics are close together
in that particular space,
or they're far apart.
It's almost like you're mapping the vector embeddings into a three-dimensional graph.
Yeah, so this is, so this is mapping the vector embeddings, the actual connectivities,
the parent-child relationships, the commonalities of the topics itself, all of it together.
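A toy version of that combined structure, using networkx with invented topic names and embedding vectors: nodes carry embeddings, parent-child edges carry the hierarchy, and the distance between embeddings is what a 3-D "shape of data" layout would be visualizing.

```python
# Toy "shape of data" graph: topic nodes carry embeddings, parent-child edges
# carry hierarchy, and embedding distance is what a layout would visualize.
# Node names and vectors are invented for illustration.
import math
import networkx as nx

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (na * nb)

g = nx.DiGraph()
g.add_node("theme:reliability", embedding=[0.9, 0.1, 0.2])
g.add_node("topic:mean_time_to_failure", embedding=[0.85, 0.15, 0.25])
g.add_node("topic:supplier_contracts", embedding=[0.1, 0.9, 0.3])

# Hierarchical clustering result recorded as a parent-child edge.
g.add_edge("theme:reliability", "topic:mean_time_to_failure", relation="contains")

# "Distance has meaning": nearby topics are semantically close.
for a, b in [("topic:mean_time_to_failure", "topic:supplier_contracts"),
             ("topic:mean_time_to_failure", "theme:reliability")]:
    d = cosine_distance(g.nodes[a]["embedding"], g.nodes[b]["embedding"])
    print(f"{a} <-> {b}: distance {d:.2f}")
```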
Yeah, and I guess one of the things that I'm missing from the graph, and I'm assuming this is there, is the business process itself. Is that part of the graph?
Yeah. So the business process is in addition to this, right? So this is just the raw data itself. And as the business process gets mapped, that gets mapped onto the data shape. So you have
a process map and a data map, which we can now connect because many times what you find in large
enterprises is you may have a large corpus of data. But surprisingly or unsurprisingly,
your business processes run on a very small portion of that data.
Yeah, I just ran a personal experiment that highlighted it: three thousand emails that I tried to do analysis on. And it's not as simple as it sounds, because of the different relationships between the business process I'm trying to understand and just the unstructured data.
Yes.
So with that said, let's bring this, you know, from a high level. I like architecture, and I like to speak architecture from the roles and responsibilities perspective.
So I have this cluster that's running Articul8. We have all of my maps and my graphs, my underlying data, my processes for ingesting this data, processing this data, et cetera.
From an application developer's perspective,
from an application developer's perspective,
am I just calling an endpoint to access this capability?
What's my interface?
Yes.
Yeah, so we, the primary interface is really calling APIs.
So we provide APIs that an application developer can immediately start using
without necessarily worrying about any of the underlying pieces.
What I mean by that is, say, for example, you want to enable search, or you want to enable, in your use case of mean time to failure, a root cause analysis application.
All you're calling is the ability to call a set of agents that are either pre-built, or you can build your own agents.
There are a set of pre-configured tools.
You can also bring in your own tools, mix them together in a framework that actually runs all the infrastructure for you underneath.
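A hedged sketch of what that developer-facing surface could look like from the application side, with an invented endpoint, payload shape, and auth header; it only illustrates the idea of calling a mission or agent API without touching the underlying infrastructure, and is not Articul8's actual API.

```python
# Hypothetical sketch of an application calling a mission/agent API.
# The base URL, payload shape, and auth header are invented for illustration.
import requests

API_BASE = "https://articul8.internal.example.com/api/v1"   # assumed endpoint

def run_root_cause_analysis(asset_id: str, token: str) -> dict:
    payload = {
        "agents": ["data_fetcher", "failure_analyzer"],  # pre-built or custom agents
        "inputs": {"asset_id": asset_id, "objective": "reduce mean time to failure"},
    }
    resp = requests.post(
        f"{API_BASE}/missions",
        json=payload,
        headers={"Authorization": f"Bearer {token}"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

# Usage (illustrative): outcome = run_root_cause_analysis("pump-injector-17", token="...")
```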
So I have a set of agents.
So there's two categories of agents. There's agents that I can create at the application layer, and I'll call that, in my model, layer three. Then underneath that are the infrastructure agents, your agents that are able to go out and, it's not just collecting data, they can do actions, because they're agents. What type of actions happen on your layer versus what type of actions happen at that application layer?
Yeah, so, for example, in the Articul8 layer, you would be, say, fetching information, connecting the dots, or even analyzing the information on the fly and merging it together to give it to the application. Most of that happens on the Articul8 layer side, right?
But even the agent that the customer is building, that would interface directly with that application, and it could be multiple agents getting orchestrated. That would be, say, a business process that they're defining with a conditional gate. They may not necessarily have a set process for getting to the condition, but before giving them an outcome or an output, they need to get through their condition. That's what they're defining at the application layer. Those agents can also run on Articul8.
So if I'm thinking about this from an IT governance perspective, the platform engineering team is managing Articul8, and they get a set of requirements from a group of application owners, and they create and maintain these agents.
And then the application owners can access those, call those agents as needed, get the outcomes
they need as needed, or create their own agents that will run on this platform level.
That's correct. And the platform engineering team, for example, would control, say, RBAC at the org level, and the application developers would control what their applications can do and what their agents can do. We are a pass-through to whatever the enterprise RBAC is, meaning we can enforce the RBAC that the customer is providing; we are not the RBAC provider.
Right, so that's the security handshake. So if I'm in AWS and I am using IAM as part of my RBAC control plane, then you just integrate with whatever I'm doing?
That's right. So it's fine-grained access control, so whatever fine-grained access control the customer has will be a pass-through. And, say, for example, a platform team can set
For example, they can say this tool is accessible to the entire tenant,
or this agent or these sets of agents are accessible to the entire tenant.
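An illustrative, invented version of a tenant-wide policy and the pass-through check, just to show the shape of the idea: the policy can open a tool or agent to the whole tenant, while everything else defers to groups coming from the enterprise's own IAM or RBAC system.

```python
# Illustrative tenant-wide access policy and pass-through check. The policy
# shape and field names are invented; enforcement defers to whatever groups
# the enterprise's own IdP/IAM already provides.
TENANT_POLICY = {
    "tenant": "acme-aero",
    "tools":  {"table_extractor": {"allow": ["*"]}},                   # whole tenant
    "agents": {"root_cause_agent": {"allow": ["grp:reliability-eng"]}},
}

def is_allowed(resource_kind: str, resource: str, caller_groups: list[str]) -> bool:
    rule = TENANT_POLICY.get(resource_kind, {}).get(resource)
    if rule is None:
        return False
    allowed = rule["allow"]
    # "*" means tenant-wide; otherwise require a group from the enterprise RBAC.
    return "*" in allowed or any(g in allowed for g in caller_groups)

print(is_allowed("agents", "root_cause_agent", ["grp:reliability-eng"]))  # True
print(is_allowed("tools", "table_extractor", ["grp:finance"]))            # True (tenant-wide)
```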
So let's talk about the messiness of agents a little bit.
And since I'm most familiar with, actually I'm familiar with either AWS or GCP,
but let's talk about AWS because I think that's the easier one for the audience to grasp.
Let's say that some of these are Lambdas that have to occur, or listening to the AWS message bus for S3 activity.
How do you help customers if the end result is that that activity or that trigger has to happen off of that cloud bus or that cloud event?
Are you helping customers figure this out or are customers just basically given the API to build these connections themselves?
I would say both because in many cases the customers don't want to handle that level of detail, right?
Where if they have an existing system running, then we are just integrating with that system.
If it's a new system, then they don't necessarily want to get down to that level.
So in the SaaS world, it used to be different Lambda functions running, of course.
But now agents, as you mentioned, rightfully, are much messier, because it may not be very
obvious to folks when they're running it, but most of the agents require multiple retries
inside where calling a tool and getting an answer that you actually need is almost never
a one-time activity.
Yeah.
And those kinds of retries, making sure that things are repeatable.
And in our case, everything needs to be auditable.
So across the board, like you mentioned, the CISO teams, but it's not just the security teams,
but also the business teams require full auditability of the process.
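A minimal sketch of that retry-plus-auditability pattern, with invented names and an in-memory audit log standing in for whatever durable store a real deployment would use: every attempt is recorded, transient failures are retried with backoff, and the final outcome stays reconstructible.

```python
# Sketch of retry-plus-audit for agent tool calls: every attempt is recorded,
# transient failures are retried with exponential backoff. Names are invented.
import time
from datetime import datetime, timezone

AUDIT_LOG: list[dict] = []   # in practice: an append-only, queryable store

def call_with_retries(tool, args: dict, max_attempts: int = 3, base_delay: float = 1.0):
    for attempt in range(1, max_attempts + 1):
        entry = {"ts": datetime.now(timezone.utc).isoformat(),
                 "tool": tool.__name__, "args": args, "attempt": attempt}
        try:
            result = tool(**args)
            AUDIT_LOG.append({**entry, "status": "ok"})
            return result
        except Exception as exc:             # sketch only: catch-all for illustration
            AUDIT_LOG.append({**entry, "status": "error", "error": str(exc)})
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))   # exponential backoff

def flaky_fetch(dataset: str) -> str:
    # Stand-in tool that fails the first time it is called.
    flaky_fetch.calls = getattr(flaky_fetch, "calls", 0) + 1
    if flaky_fetch.calls < 2:
        raise TimeoutError("transient backend timeout")
    return f"rows for {dataset}"

print(call_with_retries(flaky_fetch, {"dataset": "failure_events"}))
print(len(AUDIT_LOG), "audit entries")
```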
That brings me back.
I was thinking back to our session at AI Field Day. There was a lot on the shape of data and the knowledge graph to some extent,
but there wasn't much on the process map.
From my perspective, and maybe I'm a little outdated,
but processes and enterprises aren't necessarily that well documented.
I mean, a lot of human activity goes on to do things and that sort of stuff.
How do you extract that information in an automated fashion?
So in this case, the process map is automatically recorded, right?
So every process that happens in the system,
everything that the user is asking the system to do,
is automatically logged and tagged.
You're building that in real time.
Yeah, in real time.
Exactly.
And we showed briefly the model mesh running,
and every step of it, for example, you ask a question,
or you ask to generate a report,
or you're saying, like, compute, say,
what is the mean time to failure?
Each of those trigger a set of actions,
and all of those actions are logged and tagged.
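A toy illustration of an automatically built process map, with invented structures: each user-triggered action is logged, tagged, and linked to the data nodes it touched, and generated outputs can flow back into the data graph, which is the connection discussed just below.

```python
# Toy process map: every user-triggered action is logged, tagged, and linked
# to the data nodes it touched, so the activity map and the data map stay
# connected. All structures and names are illustrative.
from datetime import datetime, timezone

DATA_GRAPH = {"topic:mean_time_to_failure": {"documents": ["legacy_scan_1967.pdf"]}}
PROCESS_MAP: list[dict] = []

def record_action(user: str, action: str, touched_nodes: list[str], output_node: str | None = None):
    event = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "touched": touched_nodes,          # link into the data map
        "tags": [action.split(":")[0]],
    }
    PROCESS_MAP.append(event)
    if output_node:
        # Generated results flow back into the knowledge graph, as described above.
        DATA_GRAPH[output_node] = {"derived_from": touched_nodes}

record_action("keith", "report:generate_mttf_summary",
              ["topic:mean_time_to_failure"], output_node="report:mttf_q3")
print(PROCESS_MAP[-1]["action"], "->", list(DATA_GRAPH))
```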
And why? It seemed like it was affecting the knowledge map.
Yes. Yeah, because, so you have your data map, and then
you have your activity map, which is not disconnected from your data map, because it's acting on the data map, and many times the data that the activity is generating goes right back into your knowledge graph as well.
So now that we've touched on the process mapping, I asked a, I thought, great, awesome question, and I had to step out and take a personal call during the non-recorded part of it.
I'm really curious about the experts that you have with the process layer, so these domain
experts that you hired, some of the more successful AI projects that I've seen have been
when the domain experts are actually closest to the technology, closest to the AI. And I think the best example of this is code development.
Coders get the most productivity out of AI because they're both experts in AI and experts in code, so they can accelerate their outcomes.
That's right.
Where are you seeing the advantages, and kind of the roadblocks, when you're pairing these domain experts with the AI experts to get the outcome?
Got it. So first and foremost, we go one step further, like in terms of building domain-specific
models. All of our domain-specific models were built by domain experts. What I mean by that is,
for example, if you take our semiconductor design model, the main developer in our AI research
team who developed that used to work for TSMC, and then got his PhD in AI, and then
ended up building the semiconductor model for us.
I need to pause that for a sec.
You had this expert in semiconductors, who probably has a PhD in some type of relevant field, who just casually went out and got another PhD in AI.
No, yeah, no, so he had a master's in electrical engineering.
Then he worked for TSMC as a designer,
and then he went out and got a PhD in AI.
Okay.
That makes me feel a lot less, you know, intimidated.
And so back into the domain experts.
I mean, you have, I don't know,
a half a dozen, maybe a dozen specific domain models at this point that you support, right?
Yeah, so we have about half a dozen domain-specific models.
So we have an aerospace model, an energy model,
a telecommunications model,
semiconductor model, but we also have task-specific models. And in terms of experts that we have
in-house, that's the primary domains that we actually went after in terms of building models.
And the data sets that we use for those are either procured or created by us or in partnership
with consortia that own a lot of these datasets. For example, one of those kinds of partnerships
that are public is in the energy domain with EPRI, which is the Electric Power Research Institute.
They not only own the datasets, but they also have all the experts who validate and continuously
curate the models. And if you notice, we are not in places like healthcare, not because
they're not important. It's primarily because we don't have in-house expertise. So the way for us
to expand is we'll have partnerships who would bring in those kinds of experts, but the primary
domains we're going after, like in the last two years at least, has been all the domains that
we have in-house experts in.
Yeah, so I guess that brings up a question. So let's say I'm an aerospace giant and I want to bring in Articul8. What does the adoption look like? How does it fire up? I mean, obviously I've got proprietary data, I've got industry-specific data, I've got, I don't know, a gazillion processes if I'm an aerospace giant. I mean, what does something like deploying Articul8 in our environment look like?
Yeah, so, like, we went after
this from the get-go wanting to reduce the pain of enterprise deployments and having been on the other
side of trying to implement these kinds of systems and taking too long we designed the system to be
as pain free as possible on both sides because we are not hands-on keyboard on the customer side
We're not a consulting shop.
So when our product gets deployed, the customers' engineers are the ones deploying it.
And they need to be able to do that with as minimal training as possible, and as quickly as possible.
So the typical deployment time in the cloud, if it's a cloud that we already support between AWS and GCP and Azure,
it's a matter of about three hours total in terms of going cold, not having any infrastructure,
just being given an empty VPC, to having the platform deployed and the data ingestion started.
And most of that time goes into the actual cloud spinning up their infrastructure.
Like you go ask for an EKS cluster, for example, on AWS.
It takes time.
So you're going to be ingesting all this data in real time once that cluster is up and operational.
As you get questions, then you're starting to build the process maps.
Is that how this works?
Yes.
So typically our promise to our customers also is that we would ingest at least 95% of the data required for that particular use case within the first 24 hours of the deployment going live.
And the reason for that is we can size and scale the cluster based on that, and the 24-hour period is to make sure that the knowledge graph gets sufficiently populated.
It may not be 100%, but it's sufficiently populated for you to then start using the application.
And as you start using more, the knowledge graph gets more populated.
And I guess none of this is really surprising, especially in complex organizations.
You're installing this cluster probably in its own dedicated VPC.
That's right.
And all of the security, all of the system connectivity, all of that infrastructure minutia that has to be worked out, that takes time, even though the deployment of the cluster itself probably doesn't take that long.
No, what really does take time is figuring out the connectivity and the rights.
Yeah, the deployment takes minutes, to be honest.
The actual infrastructure buildup and the full monitoring stack,
because we are deploying our platform inside the customer's
VPC, we also need to bring up the full monitoring stack,
that the customer can use to monitor.
Because we don't have any connectivity inside,
not even logs come back to us.
Yeah, and there has to be an SOP developed between you and the customer for when they need support.
Yes.
Because some of this stuff, I would imagine,
some of this stuff becomes mission critical.
Absolutely.
And figuring out that this is not a small,
no.
This is not a small undertaking.
No.
And when, say for example, when the customer
needs to send us logs, even that,
is logged to make sure that they have auditability on their side,
what came to our side and all of that.
Now, the other side of the spectrum
is if a customer just wants to consume agents from us,
like they can just go to the AWS marketplace,
agent marketplace, and consume an agent.
So it's as simple as that.
I saw on AWS you have two specific agents at this point.
And one was, I want to characterize it as an LLM evaluation tool?
That's right. We call it
LLMIQ and we built it first of all
in-house because to build
model mesh we needed a system
that can actually evaluate LLMs
on the fly and
that particular agent we put out there
thinking that it will be
very useful for customers and we've been proven
right there which is
it's not just evaluating and giving
you reports. That's one of the things that the agent
can do but really it's a live
agent meaning you can just ask it
send it a question
and it will tell you what is the best model to answer that question or do a task?
Yeah, so this is super. I've, unfortunately, I guess fortunately, because I'm learning.
But this is one of those things where the industry hasn't done us a very good service when it comes to benchmarking models.
We understand how fast DeepSeek will run on an H100 or H200 versus an Intel Xeon processor.
But that's useless when it comes to understanding what model I need to run to get my ontology correct when I'm building a digital version of myself.
That's right.
And this one will do it real time for you
because you say here is my question or here's my task.
In real time, it'll tell you within milliseconds what is the best model to use for that particular task,
and you can use it as a router in an application if you wanted.
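A toy stand-in for that kind of routing decision, with invented candidate models, scores, and a simple policy (cheapest model that clears a quality and latency bar); it shows how a router can answer almost instantly once the evaluation data exists, and is not the actual LLMIQ logic.

```python
# Toy model router: score candidates against a task's latency and quality
# requirements and return one name. Candidates and numbers are invented.
CANDIDATES = [
    {"name": "general-70b",      "quality": 0.92, "latency_ms": 900, "cost": 1.00},
    {"name": "domain-energy-8b", "quality": 0.88, "latency_ms": 120, "cost": 0.10},
    {"name": "tiny-3b",          "quality": 0.71, "latency_ms": 40,  "cost": 0.02},
]

def route(task: str, latency_budget_ms: int, quality_floor: float) -> str:
    eligible = [m for m in CANDIDATES
                if m["latency_ms"] <= latency_budget_ms and m["quality"] >= quality_floor]
    if not eligible:
        return max(CANDIDATES, key=lambda m: m["quality"])["name"]  # fall back to best quality
    # Cheapest model that clears the bar -- one reasonable routing policy.
    return min(eligible, key=lambda m: m["cost"])["name"]

print(route("summarize outage report", latency_budget_ms=300, quality_floor=0.85))
# -> domain-energy-8b (under these invented numbers)
```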
I cannot grok how you're making this decision.
To say, you know, I'm asking, you know, I want to digitize myself.
It's a wrong question, obviously.
It needs to be much more specific.
But how would you assess which of the, you know,
half dozen LLMs out there would be the best to use?
Very good question.
So maybe I'll answer your question with a question.
What do you think my largest, or some of my largest, infrastructure costs are for?
Oh, as far as the company is concerned?
Computational flow models, right?
Yeah, so typically the answer would be training models, which is true.
So we do spend quite a bit on training models.
But my second largest spend is on evaluating models.
And the reason I can even produce an agent like LLMIQ is because we constantly, on a daily basis, evaluate pretty much all of the models that are out there
that are state-of-the-art models on any of the benchmarks that are out there, on all our benchmarks.
And most of those actually change on a daily basis.
And we are running millions of inferences per model per day.
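A minimal sketch of what a recurring evaluation harness looks like in principle, with placeholder models, benchmarks, and grading; the output is a dated leaderboard that can be regenerated daily as models and benchmarks change.

```python
# Minimal recurring evaluation harness: run each candidate model over each
# benchmark, grade the answers, keep a dated leaderboard. Models, benchmarks,
# and the grader are placeholders, not real endpoints.
from datetime import date

MODELS = {"model-a": lambda q: "42", "model-b": lambda q: "forty-two"}
BENCHMARKS = {"arithmetic": [("What is 6 x 7?", "42")]}

def grade(answer: str, expected: str) -> float:
    return 1.0 if answer.strip().lower() == expected.lower() else 0.0

def run_eval() -> dict:
    board = {}
    for model_name, model in MODELS.items():
        scores = []
        for bench, items in BENCHMARKS.items():
            scores += [grade(model(q), expected) for q, expected in items]
        board[model_name] = sum(scores) / len(scores)
    return {"date": date.today().isoformat(), "scores": board}

print(run_eval())   # rerun daily; rankings shift as models and benchmarks change
```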
Yeah, so why let that data insight just sit internally?
This becomes a very practical tool.
Because as I've been, you know, building AI projects, one of my biggest questions, even on my little GB10, is, what's the right combination? Do I need a 70 billion parameter model? Or will a 3 billion parameter model work for this specific task? And which 3 billion parameter model? Is it Qwen 2.5? Is it Qwen 3? Like, I'm just a guy in a basement. It will take me forever
to answer these questions.
Yeah, and even if you did answer the questions, right, they are very ephemeral. Like, they change almost on a day-to-day basis. They definitely change on a week-to-week basis.
Yeah, and getting geekier, you know, if I use the quantized, the eight-bit version of this model versus the 16-bit, do I lose fidelity in my desired outcome? Like, these are all the questions that enterprises are struggling with.
Exactly.
And the thing about LLMIQ is we today have two modes.
One mode is it will generate a report for you.
If you have a set of questions you just asked,
or the other mode is it will give you a real-time answer
that you can plug into a real-time pipeline.
But we're adding a third mode, which is if you have questions like this,
exactly the same questions you ask,
like should I use a quantized version,
or should I use a version that is deployed on an L4 versus an A10
versus an A100, depending on whether you want more throughput or you want to optimize for latency,
things like that.
You don't necessarily have to be an expert to make these kinds of decisions.
These are all like interrogations you can run with these.
These come with serious infrastructure cost repercussions, depending on, you know, you can really overkill your infrastructure, you know, have overkill infrastructure. And it'll look right from a utilization perspective. The GPUs are being effectively utilized. You're just using the wrong models for the outcome.
Yes.
So this is a new feature of LLMIQ, to actually talk about the infrastructure level that could be used to run the model?
Yes. So that's one of the features that we'll be releasing very soon, where we not just talk about infrastructure, but you can actually run an optimization.
So many times this is a full-blown optimization
where it's Pareto optimal.
There is no one answer.
It's about what trade-off you're willing to make.
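To illustrate why the answer is a Pareto set rather than a single winner, here is a small sketch with made-up configurations: a configuration survives only if nothing else beats it on quality, latency, and cost simultaneously, so several options remain and the final pick depends on the trade-off you are willing to make.

```python
# Pareto-frontier illustration: keep every configuration that is not dominated
# on quality, latency, and cost. Configurations and numbers are made up.
CONFIGS = [
    {"name": "8b-int8-on-L4",    "quality": 0.86, "latency_ms": 90,  "cost_per_1k": 0.02},
    {"name": "8b-fp16-on-A10",   "quality": 0.88, "latency_ms": 70,  "cost_per_1k": 0.05},
    {"name": "70b-fp16-on-A100", "quality": 0.93, "latency_ms": 400, "cost_per_1k": 0.40},
]

def dominated(a, b) -> bool:
    """True if b is at least as good as a everywhere and strictly better somewhere."""
    better_or_equal = (b["quality"] >= a["quality"] and b["latency_ms"] <= a["latency_ms"]
                       and b["cost_per_1k"] <= a["cost_per_1k"])
    strictly_better = (b["quality"] > a["quality"] or b["latency_ms"] < a["latency_ms"]
                       or b["cost_per_1k"] < a["cost_per_1k"])
    return better_or_equal and strictly_better

pareto = [a for a in CONFIGS if not any(dominated(a, b) for b in CONFIGS if b is not a)]
print([c["name"] for c in pareto])   # more than one survivor: pick by your trade-off
```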
And the other AWS agent, I'm thinking it's a networking solution.
Yes, the second one we debuted was a network topology agent.
And the reason we also did that was every agent that you see,
Every agent that you see out there most of the time is very general purpose.
We wanted to introduce the notion of a domain-specific agent
that does something very unique for certain domains.
So in this particular case, it's very useful in the networking world.
It's also useful in the cybersecurity world.
But if you have a complex set of logs and all you really have is logs,
you don't necessarily know what system generated the logs. This would infer the topology of the underlying system,
which in this particular case could be a set of networking gear
in a telecommunications company,
but it could also come from a complex cloud deployment
and you're looking at a cybersecurity log.
And it actually infers the topology,
and it can also detect whether there are any anomalies
that are going through in the data set that is coming in.
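A toy version of topology inference from logs alone, with an invented log format: parse source/destination pairs, build a who-talks-to-whom graph, and flag rarely seen edges as candidate anomalies. A real agent would be far more sophisticated; this only illustrates the idea.

```python
# Toy topology inference from logs: extract src/dst pairs, build the graph of
# who talks to whom, and flag rare edges as candidate anomalies. Format invented.
from collections import Counter

LOGS = [
    "flow src=edge-router-1 dst=core-switch-a bytes=1200",
    "flow src=edge-router-1 dst=core-switch-a bytes=900",
    "flow src=core-switch-a dst=billing-db bytes=300",
    "flow src=edge-router-1 dst=unknown-host-99 bytes=15",   # rare edge
]

def parse(line: str) -> tuple[str, str]:
    fields = dict(kv.split("=") for kv in line.split()[1:])
    return fields["src"], fields["dst"]

edges = Counter(parse(line) for line in LOGS)

print("Inferred topology:")
for (src, dst), count in edges.items():
    print(f"  {src} -> {dst} ({count} flows)")

threshold = 2
anomalies = [edge for edge, count in edges.items() if count < threshold]
print("Candidate anomalies:", anomalies)
```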
I don't know about you, but I have about 5,000 more questions.
I don't know if we have the time because I've forgotten half of what we started out with.
And then I'm getting into operational issues like, you know, who's handling driver hell, and the driver hell experience I just had with my GB10 just a few days ago. And with all of these questions that I still have, you guys have done a lot of work in a relatively short period of time, given where AI is at in general.
Yeah, so we've been fortunate to be
at the right places at the right time.
So in fact, the actual founding team
and the team that's building this,
we've been at this since 2010, 2011,
doing large-scale GPU runs
from the high-performance computing world
to building some of the earliest models
with, if you remember,
the days of AutoML.
We were building thousands of models
trying to figure out what architecture works.
And we were also part of the core team
at say Amazon when all of this
infrastructure and the initial training models
ended up getting trained.
It's really that pedigree that got us here.
Of course, as a company, we've been around only for two years,
so less than two years, but we've been at it for quite a while.
Well, Keith, you got one more
chance. Is there some, what's the last question? Is there any last questions you'd like to ask
Arun?
So I guess that is, how often does this break? Because AI infrastructure is still pretty fragile.
So when you say this, you mean the actual, not the application layer, not the calling application layer or the agents, but like the practical stuff that changes from day to day?
Yes.
Like, me getting a working version of vLLM is like a daily struggle.
That's right.
So that's what we call, what I would call undifferentiated heavy lifting in at least the application space.
Right.
So the reason why I started with, we need at least 8 GPUs is when you get into an enterprise,
a minimum level of availability is needed.
And just because one model hangs or one GPU hangs, you really cannot have your
application fail. And it may not be a full-blown high-availability use case, but then there's a
certain notion of an SLA for availability. That's one thing. And you need to be resilient in terms of
failures. There is no notion of being fail safe. You just have to accommodate failure, but still
have your application working. And this is where the agentic workflow comes into play. You know,
because it's fragile, if I have agents running the process and I make an API request to say,
hey, give me this data set, you can have some level of guarantee because there's an agent
working to make this request happen.
And then we'll retry it until it succeeds.
At least sometime until it succeeds.
And also, we have full vertical control from the infrastructure layer all the way up.
So all of the models are ours, or models that we attest. And so that level of control is required for us to be able to go into any of these
kinds of environments. All right. Arun, is there anything else you'd like to say to our listening
audience before we close? I would say first and foremost, thank you for having me. But then in the
enterprise world, it's never been this exciting. The kinds of use cases that are coming to us
and the kinds of end users who are actually able to take advantage of really complex, what you
would call agentic flows. And doing this at scale in production is really what energizes us.
So if you're out there like having use cases in any of the industries I mentioned or even
in industries that I didn't particularly mention that you think would benefit from having a
platform that can actually understand your domain and run complex use cases, please reach out.
We're super excited to be solving problems like that.
All right. Well, this has been great.
Thanks again for being on our show today.
Ray and Keith, thank you so much for having me.
And that's it for now.
Bye, Arun.
Bye, Keith.
Bye, Ray.
Bye.
Until next time.
Next time, we will talk to another system storage technology person.
Any questions you want us to ask, please let us know.
And if you enjoy our podcast, tell your friends about it.
Please review us on Apple Podcast, Google Play, and Spotify, as this will help get the word out.
Thank you.
