Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 06x04: Keeping Your GPUs Fed with a Data Pipeline from Hammerspace with Molly Presley
Episode Date: March 11, 2024. AI training is a uniquely data-hungry application, and it requires a special data pipeline to keep expensive GPUs fed. This episode of Utilizing Tech focuses on the data platform for machine learning, featuring Molly Presley of Hammerspace along with Frederic Van Haren and Stephen Foskett. Nothing is worse than idle hardware, especially when it comes to expensive GPUs intended for ML training. Performance is important, but parallel access and access to multiple systems are just as important. Building an AI training environment requires identifying and eliminating bottlenecks at every layer, but many systems are simply not capable of scaling to the extent required by the largest GPU clusters. But a data pipeline goes way beyond storage: Training requires checkpoints, metadata, and access to different data points. And different models have unique requirements as well. Ultimately, AI applications require a flexible data pipeline, not just high-performance storage. Hosts: Stephen Foskett, Organizer of Tech Field Day: https://www.linkedin.com/in/sfoskett/ Frederic Van Haren, CTO and Founder of HighFens, Inc.: https://www.linkedin.com/in/fredericvharen/ Guest: Molly Presley, Head of Global Marketing at Hammerspace: https://www.linkedin.com/in/mollyjpresley/ Follow Gestalt IT and Utilizing Tech Website: https://www.GestaltIT.com/ Utilizing Tech: https://www.UtilizingTech.com/ X/Twitter: https://www.twitter.com/GestaltIT X/Twitter: https://www.twitter.com/UtilizingTech LinkedIn: https://www.linkedin.com/company/Gestalt-IT Tags: #UtilizingAI #AI #AITraining @Hammerspace_Inc @UtilizingTech
Transcript
AI training is a uniquely data-hungry application,
and it requires a special data pipeline to keep those expensive GPUs fed.
This episode of Utilizing Tech focuses on the data platform for machine learning,
featuring Molly Presley from Hammerspace, along with Frederic Van Haren and myself.
Welcome to Utilizing Tech, the podcast about emerging technology from Gestalt IT,
part of the Futurum Group.
This season of Utilizing Tech is returning to the topic of artificial intelligence,
where we will explore the practical applications and impact of AI on technological innovations in enterprise IT.
I'm your host, Stephen Foskett, organizer of Tech Field Day and publisher of Gestalt IT.
Joining me this season as my co-host is Mr. Frederic Van Haren. Welcome to the show.
Thanks for having me.
So a lot of companies are starting to look at building their own AI infrastructure.
Certainly the hyperscalers have their own AI infrastructure, but a lot of the big enterprises are as well.
And of course, the gating factor seems to be the incredible cost of GPUs, right? Yeah, definitely. I mean, no doubt people
know that AI is all about GPUs, but there is a lot of other stuff that goes beyond the GPUs, right?
AI is all about data pipelines and data pipelines consists of bringing data to the GPUs through
some kind of connectivity, which could be InfiniBand or Ethernet. But all of these components together make up an AI solution.
So each component has to be at its best in order to generate efficient AI.
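(A minimal sketch of what keeping the GPUs fed looks like at the framework level, assuming a PyTorch-style training loop; the dataset, batch sizes, and worker counts below are illustrative assumptions, not anything specific to this episode.)

    # Minimal sketch: overlap data loading with GPU compute so the accelerator
    # is never waiting on storage. Assumes PyTorch; sizes are illustrative.
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    dataset = TensorDataset(torch.randn(10_000, 1024), torch.randint(0, 10, (10_000,)))

    loader = DataLoader(
        dataset,
        batch_size=256,
        num_workers=8,          # parallel CPU workers reading/decoding ahead of the GPU
        pin_memory=True,        # page-locked buffers for faster host-to-device copies
        prefetch_factor=4,      # each worker keeps several batches staged
        persistent_workers=True,
    )

    device = "cuda" if torch.cuda.is_available() else "cpu"
    for inputs, labels in loader:
        inputs = inputs.to(device, non_blocking=True)   # async copy overlaps with compute
        labels = labels.to(device, non_blocking=True)
        # ... forward/backward/optimizer step would go here ...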
I mean, that's sort of the story of IT architecture forever, basically,
is you have to move the bottlenecks around.
You have to mitigate the bottlenecks to the extent that you are able to
leverage the system to its maximum potential, or at least leverage the expensive components to
their maximum potential, right? Yeah, exactly. Yeah. It's not easy, right? AI is really
kind of a different animal compared to traditional IT, where you buy the pieces separately. You buy
your storage separately, your network separately, and your compute, CPUs, and GPUs separately.
AI is a single environment that works hand in hand. And to make things more difficult,
every workflow is very dynamic. So things change. So you have to have the ability to swap
out some components, upgrade those components while maintaining your efficiency. So in order to learn a little bit more about these data pipelines,
we've invited on as a guest this time, Molly Presley, who is head of global marketing from
Hammerspace, who you and I heard at AI Field Day present on this exact topic, basically making data
pipelines work and feeding the GPU beast. Welcome to the
show. Well, super happy to be here. Thanks, Stephen. Good to see you again. Good to see you,
Frederic. So first off, I guess, tell us a little bit about yourself and let's dive in.
Yeah, certainly. So like you mentioned, I run marketing for Hammerspace, which is a data
environment, data platform company. Certainly we do a lot in AI as that's become such a growing trend,
but we do a lot in serving data pipelines and building data architectures for
supercomputing environments, you know,
visual effects studios environments who are doing a lot of rendering.
So essentially if there's a lot of GPUs being put to work,
Hammerspace is usually sitting next to them to feed the data to those GPUs. Yeah, it's interesting because you're not
a storage company. And let me tell you a little anecdote. I was talking to one of the big enterprise
storage companies the other day about AI. And of course, all the storage companies, what they talk
about when you talk AI, what they say is, oh, well, you need my product to feed the GPUs because my product is the best.
My product is this or that.
And they start talking about speeds and feeds.
And it was funny because then they mentioned that in the grand scheme of things, their product was essentially free because the cost of the rest of the infrastructure was so high that basically they were making easy sales.
And it kind of didn't resonate with me because I was thinking, you know, okay, so on the one hand,
you're saying that you need my product because it's so great. But on the other hand, you're
saying that they're just sort of getting thrown in there. It's not about storage, is it?
Oh, definitely not. I mean, I think that data has to sit somewhere, certainly. And sometimes it's a lot of data.
But the attributes of what it takes to build an AI data environment and know that you're going to be able to checkpoint, know that you're going to be able to stream the files, know that you're going to be able to do the training and always keep those GPUs fed are a lot more complex. You know, there's a lot
more involved in the workload attributes, the utilization of the infrastructure that's there.
So fully utilizing all the networking, all the storage systems, all the available nodes to keep
the GPUs busy. But in the end, yeah, storage isn't the solution to that.
It's just a necessary place to put the zeros and ones, so to speak.
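(For context on the checkpointing piece mentioned above, a minimal sketch of periodic checkpoints written to a shared mount, assuming a PyTorch-style training loop; the path and interval are illustrative assumptions.)

    # Minimal sketch: periodic checkpointing to a shared file system so training
    # can resume after a failure. Path and interval are illustrative assumptions.
    import os
    import torch

    CKPT_DIR = "/mnt/shared/checkpoints"   # hypothetical shared mount
    CKPT_EVERY = 1000                      # steps between checkpoints

    def maybe_checkpoint(step, model, optimizer):
        if step % CKPT_EVERY != 0:
            return
        os.makedirs(CKPT_DIR, exist_ok=True)
        tmp_path = os.path.join(CKPT_DIR, f"ckpt_{step}.pt.tmp")
        final_path = os.path.join(CKPT_DIR, f"ckpt_{step}.pt")
        torch.save(
            {"step": step,
             "model": model.state_dict(),
             "optimizer": optimizer.state_dict()},
            tmp_path,
        )
        os.replace(tmp_path, final_path)   # atomic rename so readers never see partial files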
Right. And definitely the complexity of data pipelines indicates that you're not
talking about a single storage solution. You're talking about multiple storage solutions,
which might be next to each other or in a separate data center. What are the challenges
you're seeing and what kind of challenges are you solving
in that area?
Yeah, definitely.
And there's a couple of ways I would look at it.
I'll start with a smaller environment and then talk about one of the really big hyperscale
ones.
In a smaller environment, we had a customer there running on an Isilon.
And Isilon's absolutely a great storage technology.
But then it was being put to work to try to feed 300 GPUs. And this was like a six or seven year old Isilon. And it just
couldn't do it. And you know, maybe if it was a modern one, it might have done better. You know,
it's not a bash against Isilon. It's more that those existing storage systems were not
designed for GPU computing, they were designed to scale out and add capacity and over time, be able to stream
some of the data back out of it as the application needed it, but they weren't
made for compute intensive environments.
Back in those days, that's where you went to the supercomputing market
and used technologies like GPFS or Lustre to feed big compute environments.
So your point that you made earlier of traditional IT environments just weren't designed for
this is absolutely true, yet they still exist and they still have data.
So in that particular customer's environment, the hyperscale NAS solution, which Hammerspace
talked about at AI Field Day, was leveraged just to plop on top of that environment and
use the existing hardware, but create a high performance
data path to the GPUs. So you take the same hardware, put a very high efficiency data
pipeline on top of it with the hyperscale NAS, and they got a couple of things. Their applications
could still write to the data the way they always have because it was standard enterprise NAS.
It met all their security requirements with data services and whatnot. So the IT team didn't reject it. You always have to get past IT,
of course. But the GPUs where they had 300 at the time that were sitting largely idle because the
storage system couldn't feed them, all of a sudden you put in this hyperscale NAS and they went and
bought 300 more GPUs. And with the same storage and same data set, now all of a sudden are computing with 600 GPUs.
So, you know, that was a smaller environment,
but it just really emphasizes that
you need to have an optimized data environment
for feeding the GPUs.
And that's really what Hammerspace does.
You swing over to the other end of the spectrum
and look at a hyperscale environment
where, in a single AI training environment, a large language model training environment, they have 16,000 GPUs. They already owned their network, their storage systems, which are commodity white box storage systems, and the GPUs. All they needed was the data path technology. So they needed the ability to have parallel file system
connectivity direct from their Linux clients to their commodity hardware using their Ethernet
network. And that's what Hammerspace provided is kind of that framework to connect everything.
And, you know, hyperscalers love to build their own stuff. But in this case, it was much better
for them to use what Hammerspace has built because it's already been built into Linux.
It already delivers all of their enterprise standard requirements and is able to deliver to these models at scale.
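(As a small aside, the client piece described here rides on the standard Linux NFS client; a quick, hypothetical way to confirm a dataset path is served over NFS v4.x, the versions where the kernel's pNFS layout support lives, is sketched below. The mount path is an assumption.)

    # Small sketch: report the NFS mount backing a dataset path by reading the
    # kernel's mount table. Purely illustrative; the path is an assumption.
    def nfs_mount_info(path: str = "/mnt/data") -> None:
        with open("/proc/self/mounts") as f:
            for line in f:
                source, mountpoint, fstype, options = line.split()[:4]
                if mountpoint == path and fstype.startswith("nfs"):
                    print(f"{path}: {source} type={fstype} options={options}")
                    return
        print(f"{path}: not an NFS mount")

    # nfs_mount_info("/mnt/data")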
So there's kind of the scale pieces.
And then there's the part you're talking about, Frederic, where often there's a lot of different data sources,
different storage systems, localities, things like that.
And that's the other piece Hammerspace addresses, which is ingesting those
different data sources into a single file system. So the AI model is working with a single data set,
even though it may be residing in different storage systems. So it's definitely just a
different way to think about architectures, like Stephen said, the entire AI environment versus
kind of a piece parted IT environment. Right.
I do see two components, right?
Scale, which is typical AI, right?
I mean, you have more and more data, so you need to process more and more data.
The other thing I see is data gravity, meaning, and you kind of referred to it a little bit,
right?
So people have storage devices which have petabytes and petabytes of storage.
And it's not easy to just say,
I'm going to buy a new storage device and migrate that data. Is there a way you can help those
customers that kind of have, maybe I wouldn't call them legacy storage devices, but at least
storage devices that are in place and very difficult to replace? Yeah, absolutely. We
call that metadata assimilation. So essentially you put
the Hammerspace hyperscale NAS technology in, and we assimilate the metadata out of the existing
storage systems. That could be an object storage, it could be a scale out NAS, whatever it is that
exists. And that happens almost instantly. So maybe you have two petabytes, but the assimilation of the
metadata takes a minute or two.
And so what you get is you're now working with our metadata and you're instantly accessing and using that environment.
And you can leave the files where they are.
You don't necessarily even have to move the files off the old hardware.
Or if you decide you're going to, that's done as an out-of-band process.
The files can move without it affecting the applications because the applications are connected to the metadata.
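(A toy illustration of the metadata-assimilation idea, not Hammerspace's actual implementation: walk an existing share, catalog paths, sizes, and timestamps, and leave the file contents where they are. All names here are hypothetical.)

    # Toy illustration of metadata assimilation: build a catalog of an existing
    # share without moving any file data. Not Hammerspace's implementation.
    import os
    import sqlite3

    def assimilate(root: str, catalog: str = "metadata.db") -> None:
        db = sqlite3.connect(catalog)
        db.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, size INTEGER, mtime REAL)")
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                try:
                    st = os.stat(full)
                except OSError:
                    continue                     # skip files that vanish mid-scan
                db.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
                           (full, st.st_size, st.st_mtime))
        db.commit()
        db.close()

    # Example: catalog an existing NAS mount; the file data itself stays put.
    # assimilate("/mnt/old-nas")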
We've had customers, and I am a marketing person, but this isn't marketing.
This is what our customers have actually said.
They've said it's like a little bit of black magic what you guys can do with that, that you can migrate the applications to a new data source like that and later deal with the actual
physical move of the files. I think it was actually Los Alamos National Labs who said that.
Gary Greider down there is like, I just can't believe, you know, our big multi-petabyte
environment, you were able to assimilate the metadata that quickly. It's an interesting point
you bring up as far as black magic is concerned.
Do you feel customers understand the needs and the requirements for storage or data pipelines?
Or is it all kind of throwing stuff at the wall and seeing what sticks?
Well, I think it's hard because everybody's website kind of sounds the same.
Everyone wants to talk about AI. So I think part of the problem is on us marketing people: how do you come up with a way to explain to customers what you do and why they need you when everybody's talking about it?
But that said, I think it's also confusing that a lot of the storage system companies want to be a part of this conversation.
And in a small environment, they absolutely can. But
even some of the most prominent, aggressive storage companies talking about AI tend to fall
down when you get to 10, 20, 30 storage nodes. Maybe that's a few hundred GPUs, even a thousand
GPUs. But when you're talking about any of this at scale, none of the existing storage systems
have the capability. And that's, you know,
any of the scale-out NAS or anything out there, the ability to feed these environments. So
I think it's confusing for customers. If you think about the really big ones who have been
building AI environments for a long time, like people who have the supercomputing space or the
hyperscale space, they get it very well. And the enterprises that
are starting to build their own AI environments will probably be looking closely at what those
organizations have done, you know, kind of fast following because the enterprises usually haven't
had these kinds of compute workloads in their environments. It is interesting, isn't it?
Because if you look, I mean, Molly, you and I go back in the enterprise storage market for a while. A lot of enterprise storage solutions, they don't really scale out all that much. In fact, most supposed scale out systems, you know, when they talk about scaling out, they're talking about four nodes or eight nodes. And most of them, frankly, aren't scaling out at all. A lot of them are just scaling up and really,
really fast. And that's great, but it's only great to a point. You're right that if you look at the HPC space, it's not about, you know, getting really, really fast for like a benchmark
or something. It's about delivering that at massive scale with parallel access. I think
that's the other thing too, is most enterprise
storage systems are great at having multiple people accessing multiple types of data or multiple
clients accessing multiple types of data, but having multiple clients, many, many clients
accessing the same data. Now that's a different story with performance. I mean, a lot of them can
do it without performance, but to actually have it
combine performance and parallel access and real scale, that's not something you see in a lot of
enterprise storage systems. Yeah, I think that if you kind of take that and work up the stack
in these AI environments, they have to chase performance at every layer of the stack. So getting rid of
controllers, extra network hops, that helps to speed the data pipeline, but it also takes cost
out of it. So most scale-out NAS systems insert six or seven different hops as you go across
controllers, internal networking, to be able to scale out. And so that slows performance. And then also, as you look at
their maximum scale, whether it's four nodes or 20 nodes or whatever they are,
that just doesn't meet the scale of a big AI environment. So there's cost and complexity.
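(A rough back-of-envelope illustration of the scale point, using assumed numbers rather than measured or vendor figures.)

    # Back-of-envelope: aggregate read bandwidth needed to keep a GPU cluster busy.
    # All numbers are assumptions for illustration, not measured or vendor figures.
    gpus = 600
    per_gpu_read_gbps = 0.5          # assumed GB/s of training data per GPU
    per_nas_node_gbps = 10.0         # assumed GB/s a single scale-out NAS node can serve

    aggregate_gbps = gpus * per_gpu_read_gbps
    nodes_needed = aggregate_gbps / per_nas_node_gbps

    print(f"Aggregate read bandwidth: {aggregate_gbps:.0f} GB/s")                     # 300 GB/s
    print(f"NAS nodes needed at {per_nas_node_gbps} GB/s each: {nodes_needed:.0f}")   # 30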
And then there's that idea of linear scale, where you scale for a bit and then all of a sudden
you just kind of curve and then fall off the cliff. Scale-out NAS environments all do that
at some point; it's just a matter of where that point is. In the hyperscaler environment we looked at,
or that we're running and I was mentioning earlier, I think the fastest scale-out NAS out there,
you know, the one who's most boisterous about their technology, did great up until about 2% of the test plan, and then it fell over.
And so that's fine, you know, for the right size environment.
But as you continue to go up, you might say, okay, well, it's not just about performance, but it's about cost efficiencies.
You know, if I have to add extra networking, if I have to go buy a special kind of flash storage, if I have to buy specialized storage systems, does that work for me?
Maybe, maybe not.
A lot of customers want to use the storage they already have.
They don't have a budget to go buy new storage for AI or a new network for AI or, you know, a new anything for AI.
So there's a lot of considerations on the economies also of
they have preferred storage, they have preferred vendors, and coming in with a new system isn't
necessarily an option. And then you kind of go on to that performance attribute of just the number
of clients or parallel streams. Supercomputing markets solved that a long time ago with
parallel file systems, but most enterprises have never run one.
They may or may not have the networking that's needed for them. They certainly don't have the
skill sets in-house typically. And they're just so specialized that when you're looking at
an enterprise standard deployment, standard server, standard everything, standard networking,
they just don't work. So these organizations are in a pickle trying to
figure out how to use the technologies that they have for this new initiative, and they don't
really work. And the ones maybe that do work, their IT teams don't support. So it's quite a
conundrum. And that's really what Hammerspace and Hyperscale NAS was designed to solve,
is bringing in the capabilities of HPC and then enterprise standard deployment
to help solve this problem.
Yeah, I also do see the difference between physical and logical.
And what I mean by that is the hardware device has a certain physical performance and criteria
and all that stuff.
But if somebody decides to load billions and trillions of files that are 10 bytes long,
that is a big, big problem for your metadata.
So your performance sometimes can be more logical than physical, right?
Where your metadata is becoming the bottleneck as opposed to the physical capabilities of
the device.
Is that something you see too?
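(To make the tiny-file point concrete, here is a small, hypothetical experiment: the same payload written as many 10-byte files versus one larger file shifts the cost to metadata operations rather than data transfer. Counts are illustrative.)

    # Quick illustration: the same payload as many tiny files is metadata-bound,
    # while one large file is bandwidth-bound. Counts here are illustrative.
    import os
    import tempfile
    import time

    payload = b"0123456789"               # 10 bytes
    count = 50_000

    with tempfile.TemporaryDirectory() as d:
        start = time.perf_counter()
        for i in range(count):
            with open(os.path.join(d, f"f{i}"), "wb") as f:
                f.write(payload)          # one create plus one tiny write per file
        tiny = time.perf_counter() - start

        start = time.perf_counter()
        with open(os.path.join(d, "big"), "wb") as f:
            f.write(payload * count)      # one create, one 500 KB write
        big = time.perf_counter() - start

    print(f"{count} tiny files: {tiny:.2f}s, one large file: {big:.4f}s")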
Oh, absolutely.
I mentioned Gary earlier and didn't necessarily plan to talk about him on this call. But when you think about what Gary Greider says
about their national lab down there in Los Alamos is they have no less than seven different types
of file systems because of the different workload needs. There's the metadata intensive ones,
there's the streaming intensive, there's the distribution file systems. And it's just for the reason you're saying, Frederick, that
different workloads tend to need different types of file systems, and then you still have home
directories. And now AI is a whole nother thing, which is pretty metadata intensive, typically.
And you have checkpointing. So yes, absolutely. Looking at how do you build your data pipeline and, you know,
the Hammerspace technology, this is what it was built for, is those mixed types of workloads
and mixed type of demands and raising that above the storage system. So you can address those
demands above the storage system, then separate from that, you make the decision on do I want NVMe or object storage or QLC or blob, whatever it is.
So, yes, you absolutely need to think about what are the attributes of your workload and your AI environment.
If you're training a large language model versus doing Gen AI, it's different.
And having a data pipeline designed to meet those workload needs and make sure you can checkpoint and do the things you need to is super important.
And then separate from that, ideally make the decision on which networks
or which storage you're going to run on based on what your IT team wants to do.
That's a good introduction to my next question is when we talk about generative AI,
do you see different needs, different behavior and different requirements
compared to like more traditional AI from a few years ago? I do. And from the conversations that
I'm a part of, and I think generally that Hammerspace is a part of, it has a lot to do
with which model they're going to use and where it's going to run. So there's a lot of complexity
of are people going to try to build their own
models? Or are they going to try to use somebody else's models and train it with their own data?
Are they going to try to build it in their own data center or do it up in the cloud or in
Snowflake or in Databricks? So a lot of the decisions come down to which model are they
going to use? And then there's the complexity of how on earth do I get my data into that model?
Right? You know, it may or may not be local to the model, and then they have to figure out
how to move it physically, you know, going back to how do you get the data to the model, but also
audit it, know who's loading which data where, and do you even want that data in the models,
and those kinds of decisions. I'm curious what your thoughts are there, Frederic, but that's
usually where we're hearing the conversations
around Gen AI is just around model selection
and where it's running.
Yeah, definitely.
I mean, I think generative AI is much closer
to people building applications, right?
If you look a little bit at the data pipelines
in traditional AI, you own the data
and you do the training, you do the inferencing,
you kind of do everything internally, and then you expose some kind of an application.
I think with generative AI, most of the large language models are actually built by a
handful of organizations, right? So the training mostly is done by those larger organizations.
And people are focusing more and more to bring their own data
into the large language model
and then develop applications based on that.
So I do see the trends being slightly different
in the sense that I see more and more organizations
working more on the inference side with slightly less training,
while the larger organizations
that are delivering the large language models,
they are heavily focused on the 10,000, 20,000 GPUs and the petabytes of data.
So I do see that trend growing.
But the good piece about this is that it allows many more organizations to use and consume
AI than before.
Yeah.
And I think that you mentioned something about data gravity earlier, and it kind of comes
back to this conversation too, that organizations are trying to figure out, do they have to
design so that their data is local to their model?
You know, do they have to build their entire environment in their data center or do they need to move everything to the cloud
or is there kind of a middle ground
where you can say, I want to use Mosaic
and I have all my data in the data center
and how do I do that?
You know, there's a lot of pieces there
of by overcoming data gravity,
by, you know, using local metadata to get performance
and only moving the files you need to
and things like that,
you know, data orchestration and pipeline orchestration is what I'm talking about,
is a really big deal here too. How do I orchestrate my pipeline and use models I want
and let my data sit where it is, is a really important conversation that I think a lot of
people are trying to figure out. We partner really heavily with Snowflake and increasingly with
Databricks because, you know, you look at those environments, they're great analytical environments, whether
it's AI or not. And they have different strategies. You know, Snowflake wants the data in Snowflake
Cloud. Databricks wants to reach out into the data center and orchestrating those pipelines
and delivering the performance where needed is super strategic to these organizations and a big
part of the decision criteria, you know, as Hammerspace is involved in these organizational conversations.
Well, since you are involved with these, that was kind of what I was going to ask is,
what are you seeing customers doing in the real world to answer these questions? So you mentioned
data gravity. Certainly, that's an important aspect when you're talking about cloud to on-prem to service provider.
But there's also the question of, I guess, making copies of data in order to have it be part of the data set.
Is there a desire to access data sources in their native locations and their native formats and, you know,
accessing different systems when training? Or is
it something where they basically take a checkpoint, pull in all the data, massage the data,
and then feed it all as sort of a coherent or consistent whole? You know, what's really
happening here in terms of data pipelines? My experience there is there's a drive to simplify ETL, you know, that extracting,
you know, making a copy of the data, doing stuff with it and putting it back in, you
know, is inefficient.
It certainly causes risks of which copy is the gold copy of data.
You know, there's a lot of problems with it.
You know, it's just hard, you know, for a data scientist to move quickly. So the idea of unifying data sets,
having shared metadata where updates are kind of done within the metadata and you're not doing
copies, you know, copies are kind of the bane of any of these organizations, you know, because it's
expensive to maintain copies, you know, just because you have to store multiple copies, but also how do you figure out which is the gold data? So eliminating
copies and being able to do data science, data analytics, and training without data copies is
really what we see organizations trying to figure out how to do. You know, if it's at a small scale,
letting data scientists continue to do things the way they've always done is totally fine. But at this larger
scale, they're trying to simplify and take some steps out of those processes.
One of the things I do see as well is organizations looking at a higher return on investment.
And they do that by having an on-premises baseline infrastructure,
and then using one or more public clouds for peak capacity, right? So their challenge then
is the time to kind of get the data into the public cloud and bring any data back.
Is that something you're seeing too? Oh, totally. You know, there's the
whole, you know, it's almost a religious debate
or a political debate, probably not quite as charged as either of those these days. But
do you move your data to your compute or your compute to your data? You know, customers,
that's just a basic architectural thing people have to think about. And, you know, it becomes
not a conversation if you don't own enough compute anyway. Maybe you're supply chain constrained and can't
buy GPUs because all the hyperscalers are consuming them, or maybe you don't have enough
money to buy all the GPUs you want for a specific job, whatever the reason is. Most organizations
are finding they really strongly desire to move their data to the compute so they can have the
flexibility of where the compute is, but they need the tools to do that, which requires data orchestration and pipeline orchestration, which is, you know, we, I wrote the book unstructured data orchestration for dummies, you know, so of course, this is the thing I'm passionate about, but orchestrating the data to the computer applications that are using it strategically is optimal. It's just, you know, not all companies, IT teams are used to setting up
their storage systems that way. The storage has owned the data because the file system is in the
storage and is not aware of anything outside of it. It can't run multiple, you know, the storage
system is kind of like a vault. And so orchestration opens up the ability to say, okay, let me let my data freely flow
to the compute environment.
So it's just a different way of thinking
about architecting data pipelines
and data and file systems
where the storage system doesn't own the data,
but instead it's orchestrated and placed
in whatever location
or next to whichever compute or application needs it.
It's a huge shift in the thinking of legacy architectures, but it's how AI people think.
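(A generic sketch of the "move the data to the compute" idea: stage only the files a burst job needs into cloud object storage. The bucket, prefix, and manifest are hypothetical, and this is an illustration of the pattern, not Hammerspace's mechanism.)

    # Minimal sketch: stage only the files a cloud burst job needs into object
    # storage. Bucket, prefix, and manifest are hypothetical placeholders.
    import os
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "example-burst-bucket"        # hypothetical bucket

    def stage_for_burst(manifest: list[str], local_root: str, prefix: str = "training-run-01/") -> None:
        """Upload just the files listed in the manifest, preserving relative paths."""
        for rel_path in manifest:
            local_path = os.path.join(local_root, rel_path)
            s3.upload_file(local_path, BUCKET, prefix + rel_path)

    # Example: only the shards this job actually reads get copied to the cloud.
    # stage_for_burst(["shard-0001.tar", "shard-0002.tar"], "/mnt/datasets/example")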
Yeah, it really is a huge shift. And I think that that's really fundamental to the challenge here,
not just for AI, but for basically every workload is, you know, the way that things have been done
is not always the way that things should be done. It's just the way that things have been done.
And so, you know, you look at how these systems are traditionally architected.
You're completely right.
I mean, it's always been about keeping the data, you know, in the storage, you know, having the storage system sort of own the data.
And that just doesn't reflect how things are going to be going with this modern world.
And, you know, it's completely true.
I think people are looking at leveraging the capabilities of service providers with the,
you know, GPU scarcity and the cost of GPUs to be able to access those.
You know, the things that you're saying, that would make all of that practical, right?
So somebody would be able to basically use burst capacity in a cloud or something like that.
That works.
Yeah, absolutely.
And it's done all the time.
I think when you think about the storage systems owning the data, that paradigm kind of started to fracture a bit with hybrid cloud.
You know, but the driving factor was mostly around cost savings. Like I'd rather use the cloud
because I might need a few less IT resources or I can use Glacier and it's super cheap.
And it wasn't strategic enough to force an organizational redesign. AI is. AI is like,
I'm going to grow my business and make a whole, you know, millions and millions of dollars more
for my company because of AI. So it's just forcing the issue more of, I need to be able to move my data and I'm going to
re-architect from legacy to something that helps me to be able to move my data to my
compute and applications and not have the storage systems own it. It's just, it's a
forcing function to me, but yeah, we move data around all the time. Our company's tagline is data in motion because we've raised the file system out of the storage so that the data can sit
in a global file system and be placed anywhere you want. A remote user's office, a cloud computing
region in London, a cloud computing region in Vancouver because it's cheaper compute and cheaper
power in a new data center, you know,
where I use all my GPU compute until I've used it all and then burst the rest to the cloud.
That's kind of the sweet spot of where Hammerspace sits. And in AI, it's all around
orchestrating the pipeline so that the models have access to the data. And then if you
need to do some burst movement, we take care of
that, knowing which files are rendering where, and then keeping that all as a single data set
for the applications and users. Applications and users don't know that's happening.
Right. One of the things I do see as well is that when people design AI and they don't have
a background in AI, they have a tendency to design for fire and
forget, meaning kind of a static environment, while in reality, there are so many moving parts
that they don't have the ability to keep building and improving their models efficiently.
Is that one of the areas where you can help as well, you know, help them kind of build on that efficiency in their pipelines.
Oh, absolutely.
The concept of designing for flexibility where you don't actually know what the results are or which data you might want to ingest later is part of it.
But also, most organizations are so early in their AI programs, they don't know which models or what their environment is going to look like
six months from now, much less two or three years out. So being able to design for flexibility where
you have a single source of truth, which is a global file system, which runs across all of your
compute and storage environments, a single metadata layer. And then when you decide you
want to make changes, you want
to try something different with your workload. It has the performance attributes to meet that change
in the workload, or you want to use another data source you didn't before. You don't have to
re-architect anything. You don't have to reconnect anything. You're just working with a single source
of truth. And that source of truth of the data set, which is the global file system,
can place the data wherever you want it with data orchestration. So it can place the data.
Okay, now I want to try out Snowflake. I wasn't using them before. Okay, now I actually want to
use a part of Llama 2 or Llama 3. I wasn't before. Okay, no problem. Just place the data where it is,
but you're still just working with a single file system. Yeah. I think that what you're describing sounds like a dream to a lot of
people and maybe even some of the people that are implementing machine learning and training,
and they might be listening to this saying, man, I wish I could learn how to do that.
I would tell those people, check out the AI Field Day presentation from Hammerspace.
You just use your friendly search engine and look up Hammerspace and AI Field Day, and you'll find the recent presentation about this.
Before we go, Molly, where can people continue the conversation with you and learn more about Hammerspace?
Yeah, I think the AI Field Day presentation is a great place to start. A couple
of the folks who are deep in architecting this were on there, and we could certainly connect
you to either of those gentlemen if you'd like. But if you come to our website, there's a lot
of resources on our artificial intelligence solutions pages on unifying data pipelines,
automating data orchestration, things that we've talked about today. But if you want to
go deeper, there is an AI white paper you can find on our website. And that goes into really how,
if you're an IT person going, what the heck, this doesn't sound like it's possible, or, you know,
how would I do this with my systems or my environment? There's a white paper out there.
There's only one on artificial intelligence on our website, and it will kind of go into the details of how this is actually possible and what it would look like in your environment.
And of course, we'd love to talk to you about it directly.
Just hit the contact us on our website.
And Frederic, thanks again for co-hosting.
Well, thanks for having me again.
If you've enjoyed this podcast, please do give us a rating or review.
You'll find us in your favorite podcast applications.
Again, you'll find us as Utilizing Tech.
That's the name of the podcast series, though this season we're calling it Utilizing AI because that's what we're focused on here.
This podcast is brought to you by GestaltIT.com, your home for IT coverage from across the enterprise and by the Futurum Group.
For show notes and more episodes, head over to our dedicated website, utilizingtech.com
or find us on X/Twitter and Mastodon at Utilizing Tech.
Thanks for listening and we will see you next Monday.