Grey Beards on Systems - 119: GreyBeards talk distributed cloud file systems with Glen Shok, VP Alliances, Panzura
Episode Date: May 11, 2021
This month we turn to distributed (cloud) filesystems as we talk with Glen Shok (@gshok), VP of Alliances for Panzura. Panzura uses a backend (cloud or on-prem, S3-compatible) object store with a ring of... software (VMs) or hardware (appliance) gateways that provides caching for local files as well as managing and maintaining metadata which creates …
Transcript
Hey everybody, Ray Lucchesi here with Keith Townsend.
Welcome to another sponsored episode of the Greybeards on Storage podcast,
a show where we get Greybeards bloggers together with storage and system vendors
to discuss upcoming products and technologies. Today we have Glen Shok, VP of Alliances from Panzura.
So Glenn, why don't you tell us a little bit about yourself and what Panzura is all about?
Thanks. Happy to be here. Again, Glen Shok, VP of Alliances over at Panzura.
I've been in the storage industry for 20 plus years at various companies.
I'm kind of a gray beard kind of guy.
Exactly.
EMC and Oracle and a few other stops and startups here and there.
And now I've joined Panzura a little under a year ago.
And it's been an amazing ride, this company.
Panzura itself is a next-generation NAS company
that kind of decouples the filer head from the backend storage.
So we install on-premises as well as in any of the clouds.
We're in all of the marketplaces.
And we use any S3-compatible backend object store as our storage device, which means that we can use
GCS on Google or Azure Blob or obviously Amazon S3. And we're also compatible with some
on-premises hardware like IBM COS and Cloudian. So what that enables you to have is a global file
system in any location that you have, as well as in the cloud.
And when you make it available in the cloud, you make your unstructured data available to all the cloud native applications.
So you get all of that capability on your company's unstructured data set.
Yeah, it brings up a whole slew of questions.
Yeah, I'm sure. So if it's sitting on S3, for instance, and I'm accessing it,
do I access it with, when I'm in the cloud, do I access it through Panzura or do I access it
directly, you know, just through Amazon S3 services? Obviously when I'm sitting on-prem
and I'm trying to access the data, I'm accessing it through Panzura.
And that would be an appliance sitting on prem.
And it would be one appliance or multiple and then high availability types of things.
Oh, wow. I didn't bring a pen and paper for the questions.
I'll try to remember them.
Okay.
So here we go.
So we do install in a VM on-premises and in the cloud.
So we install via VMDK or Hyper-V,
so VMware or Microsoft,
depending on your favorite hypervisor choice,
onto standard HCI gear or server, whatever.
We need CPU, memory,
and some local disk resource on-premises.
In the cloud, we also install in a VM of sorts,
depending on the shape that you want to configure us for based on the workload.
So imagine if you're migrating applications into the cloud.
One of the hardest things to do is take one application and move it to the cloud from your data center because you don't know what dependencies that application has on other applications in the data center. And when I worked at Oracle and our customers started moving to Oracle Cloud,
one of the biggest problems was they would move like JDE
and then forget that JDE was dependent on 30 applications
that are still up in a data center.
High latency, call that a failed migration;
they'd move back into the data center and say the cloud sucked, right?
Well, the great thing about Panzura
is if you're running the global file system on-premises in your data centers, as well as in a VM in the cloud, you move one application over.
That application can still access via SMB or NFS the entire global file system.
And the way that our filer works is it does local caching, intelligent prefetching, block delta moves.
So every IO
feels local to that app that's sitting in the cloud. And that's how our customers are using us,
or one of the ways our customers are using us, is slow migrations into the cloud and making all
their data set available in all locations. It also enables cloud native folks to go multi-cloud and have the same data set available.
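To picture the caching and prefetching behavior described here, a minimal, purely illustrative Python sketch of a read-through block cache with naive sequential prefetch follows. The class, parameters, and block source are hypothetical; this is not Panzura's implementation.

```python
from collections import OrderedDict

class EdgeBlockCache:
    """Toy read-through cache: serve hot blocks locally, fetch misses
    from the object store, and prefetch the next few blocks."""

    def __init__(self, fetch_block, capacity=1024, prefetch_depth=2):
        self.fetch_block = fetch_block      # callable(block_id) -> bytes
        self.capacity = capacity            # max blocks held at the edge
        self.prefetch_depth = prefetch_depth
        self.cache = OrderedDict()          # block_id -> bytes, in LRU order

    def read(self, block_id):
        if block_id in self.cache:          # cache hit: IO feels local
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        data = self._fill(block_id)         # cache miss: go to the object store
        for i in range(1, self.prefetch_depth + 1):
            self._fill(block_id + i)        # naive sequential prefetch
        return data

    def _fill(self, block_id):
        data = self.fetch_block(block_id)
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least-recently-used block
        return data

# Usage with a fake backend that returns 128K blocks:
cache = EdgeBlockCache(fetch_block=lambda b: b"x" * 131072)
cache.read(0)   # miss, also prefetches blocks 1 and 2
cache.read(1)   # hit, served from the local cache
```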
Yeah.
So I'm sitting here, I've got an on-prem NAS filer,
old traditional ancient iron kind of thing.
And I decided I want to move to the cloud.
I start, I fire up this Panzura VM effectively and then connect it to some cloud storage
someplace in the world.
And then I start moving files from the old
filer to the new filer. And all of a sudden I'm cloud native. And all of a sudden you can be cloud
native, right? So let's say that, and that's our primary target is NetApp and Isilon basically
refreshes. So let's say you have an old NetApp and you want, you know, your CIO or whatever says, okay, we want to start using the cloud and we want to get out of the data center business. So
what you do is you put a Panzura filer on-premises, run it again in a VMDK.
We help you, our professional services organization will help you migrate off of that NetApp
onto Panzura. Let's say that your favorite cloud provider is AWS. So your backend object store will
be S3 sitting, let's say you call it US East. And the great thing about that and using that object
store on the backend is that it's three-way replicated. You can also replicate one outside
the region in case the entire region goes down. So you have basically inherent off-siting,
you have inherent remote replication.
And now let's say that you want to,
and this is the fun part, use Google for analytics.
You can actually install Panzura in a VM in Google,
connect it to the S3 object store,
if that's what you wanted to do,
and run analytics on that data set, right?
You can use cloud native applications such as Cloud Functions, or do an ingest from unstructured
into structured data in BigQuery. And the joy of this is it kind of democratizes your data.
It makes your data available to all the clouds, to all the cloud native apps.
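As an aside on the "any S3-compatible object store" point: from a client's perspective this usually just means pointing a standard S3 SDK at a different endpoint. A hedged boto3 sketch, where the endpoint URL, bucket, and key are placeholders and credentials are assumed to come from the environment:

```python
import boto3

# Point a standard S3 client at any S3-compatible endpoint --
# AWS S3 itself, or an on-prem store such as Cloudian or IBM COS.
# The endpoint URL and bucket name below are placeholders.
s3 = boto3.client("s3", endpoint_url="https://objects.example.internal")

# Commit an (illustrative) data block as an object ...
s3.put_object(Bucket="example-filesystem-bucket",
              Key="blocks/0000000042",
              Body=b"...128K of file data...")

# ... and read it back from anywhere that can reach the same bucket,
# e.g. a VM running in another cloud for analytics.
obj = s3.get_object(Bucket="example-filesystem-bucket",
                    Key="blocks/0000000042")
print(len(obj["Body"].read()))
```

The same bucket can then be read by a VM in another cloud, which is the multi-cloud analytics pattern described above.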
You know, S3 is interesting and all that, but it costs.
It doesn't cost to ingest, but it costs to egress.
So is there something that you guys do to try to minimize the egress cost for some of those data?
So, you know, I'm reading and writing data on-prem.
So how does that play out, I guess? Good question. So we have really intelligent block moves on the back end, and we cache locally. So as you know, you've been in the storage
industry for a minute or two. Most NAS data, right? So if you're talking about 100% of the NAS data companies built up
over 20 years, 95% of that is old, right? It's not moving around very much. At the edges,
you got kind of like 5% hot, right? Everybody uses the same data set, the hot data at the edge.
And we cache at the edge. So depending on the workload our customers have, we cache at the edge between half a percent to a percent and a half of data. So when I came
on board, that was actually one of my questions is what are the egress fees look like here?
And they're amazingly small because again, if you have five sites installed, you have five
caches at the edge, all the data at the edge is what's kind of local.
And we only do block delta moves. So let's say that I have a PowerPoint file that I uploaded in the global file system in San Francisco. Somebody opens it up in New York and edits it
for a minute. And then now I want to open it up in San Francisco again for read and write.
The only thing I'm going to open up and only thing I'm going to take out of the cloud at all is maybe just whatever
edits they made. And in that case, the only time that would actually happen is when the edits were
large enough not to be moved with the lock. So our Panzura filers communicate with each other out of
band, right, just over the network. And they request locks from one another.
And if, let's say, I only made 8K worth of changes, let's say a graphic change in New York,
it's going to actually move that graphic change out of band.
So it's the data that the Panzura...
What? What? What?
Right.
What do you mean you're moving the data out of band?
Well, it commits it to the object store.
So the commit happened to the back end object store. But because you're only like 8K away from the whole file, it's going to say, hey, here's the lock, and here's the difference between my version and your version. And that's it, now you're up to date and you have the current version of the file.
What's also awesome is all of our backend is immutable, right? So all of our writes are additive; even deletes are additive information to the backend object store.
So we're inherently immune to ransomware, right? So anytime you change a file, it's just more and
more information. So if ransomware hits your company, all that means to us is new encrypted files.
Yeah, it's just new encrypted files, which our system can detect and start shutting down
the devices that are causing that and say, hey, this is crazy.
And tell the administrator, maybe you guys should take a minute.
But more importantly, you can revert to the previous version of that file before it was
encrypted.
So we're very proud to say that while a bunch of our customers have been hit with ransomware,
none of them have paid a ransom on data sitting on a Panzura filer. They've all been able to revert and recover painlessly.
So it's almost like continuous data protection.
You're versioning every block update to a file on a backend of
Panzura? Yeah. So yeah, which is why we separate the metadata from the data, and the filers are in
constant communication with one another. We do back up the metadata, obviously, to the backend
object store. So any filer can recover itself without any issues. But yeah, everything is a constant
change and update.
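To make the "writes are additive" and block-delta ideas concrete, here's a small illustrative sketch of an append-only, versioned block store: each save records only the changed blocks as new objects, nothing is overwritten, and reverting (say, after ransomware rewrites a file) is just reading an earlier version. The names and block size are invented for the example, not Panzura's on-disk format.

```python
import hashlib, time

class AppendOnlyFile:
    """Toy immutable store: each save appends a new version that records
    only the blocks that changed; nothing is ever overwritten."""

    BLOCK = 128 * 1024  # 128K blocks, purely for illustration

    def __init__(self):
        self.blocks = {}     # content hash -> block bytes (write-once)
        self.versions = []   # list of (timestamp, [hash per block])

    def save(self, data: bytes):
        hashes = []
        for i in range(0, len(data), self.BLOCK):
            chunk = data[i:i + self.BLOCK]
            h = hashlib.sha256(chunk).hexdigest()
            if h not in self.blocks:         # only new/changed blocks get stored
                self.blocks[h] = chunk
            hashes.append(h)
        self.versions.append((time.time(), hashes))

    def read(self, version=-1) -> bytes:
        _, hashes = self.versions[version]   # -1 = current, -2 = previous, ...
        return b"".join(self.blocks[h] for h in hashes)

f = AppendOnlyFile()
f.save(b"quarterly report v1" * 10000)
f.save(b"ENCRYPTED-BY-RANSOMWARE" * 10000)  # just becomes another version
clean = f.read(version=-2)                   # revert: read the prior version
```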
And then we also have a capability of secure erase.
So let's say your company has a policy that after seven years, data gets deleted,
right, just for legal hold or whatever. We can basically remove all the snaps and
update everything for up to seven years, and then erase everything prior to that.
So that we keep the retention time that the company wants to set.
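The secure-erase retention idea reduces to pruning versions older than the policy window. A minimal sketch, assuming version records carry timestamps as in the example above; the seven-year window is just the figure from the conversation:

```python
import time

SEVEN_YEARS = 7 * 365 * 24 * 3600  # retention window in seconds (illustrative)

def purge_expired(versions, now=None, retention=SEVEN_YEARS):
    """Keep versions newer than the retention window (plus the newest one,
    so the file itself survives); drop everything older."""
    now = now or time.time()
    cutoff = now - retention
    kept = [v for v in versions if v[0] >= cutoff]
    if not kept and versions:          # always keep at least the latest version
        kept = [max(versions, key=lambda v: v[0])]
    return kept

# versions is a list of (timestamp, block_hashes) tuples as in the sketch above
```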
So let's take a step back.
There's this use case of kind of transitioning off of a traditional SAN or HCI stack or whatever model
you had inside of your data center to a more virtual instance. Obviously, the dynamics between
an EMC VNX with a 10-gig connection, SSDs, blah, blah, blah. Going from that to a VMDK is going to be a different profile in performance and accessibility,
et cetera.
What is the customer experience in making that transition?
Right.
So our competition, we're not SAN, right?
So we're not block, we're NAS.
But it's the same case, right? You know, a massive NetApp or a massive Isilon,
performance-wise, compared to us. So that's a completely legit question. And again, what we're
finding is the technology has caught up to the company. I think Panzura is something like 12 years old as a company.
And they were honestly too early to a market
because the performance of local devices wasn't there.
The network performance wasn't there
and the cloud wasn't there.
But the great thing about Panzura today is-
It's all come back.
Yeah, everything is there, right?
Like a one gig line to the internet
costs you almost nothing, as well as really high performance VMs, right? I mean, you can buy some
Nutanix gear that blows away the front end of a NetApp or an Isilon because they're making
specialized hardware to do NAS,
whereas we're using general purpose machinery
that has crazy fast processors, really quick memory.
And then we also use NVMe locally as our cache.
And we use that as a cache for our metadata,
which makes our metadata operations quite fast.
And we also use it as a cache for the hot data
that's sitting at the edge.
So we're not going, let's say that you're using a file
at the edge, you know, on-premises somewhere.
We're not going constantly back to the object store
that could be sitting a thousand miles away.
You're doing the majority of your IO
at the cached edge of the appliance.
And then we only ship deltas to commit.
We only ship deltas to the backend.
And then to extract data, we only ship deltas.
We have this neat collaboration feature
that allows you to do byte range locking.
So for Adobe Premiere, Revit, Civil 3D, AutoCAD, all that,
you can have the same file open in multiple locations
for collaboration, for read and for write,
which makes us really popular in the AEC
or architecture, engineering, construction space,
as well as M&E, media and entertainment space.
But we don't have to-
By the way, is the locking on files?
Yes.
And the thing is, this is supported in the application.
So we have to support it in our file system.
But Adobe Premiere specifically supports it.
Revit, Civil 3D, AutoCAD, all of these.
They all specifically support having multiple writers to the same file as long as it's in a different byte range.
So you only lock a specific byte range of the file. I can easily, you know, I'm a content creator, so I can easily
see the use case for this if I'm in Final Cut.
This is a serious problem for me as a
video editor and creator. If we're now in the virtual world
and if I'm working on a project and it's
a, let's say it's an hour long video and I want to do, you know, I want to edit the introduction and then somebody else is editing the talking head bits of it.
And then yet a third person is editing the outro or something like that.
I can't, when I'm not done... You can't do it today?
Well, no, I can't do it remotely. I can do it if we're all on the same local network and we're hitting the same NAS.
Those Final Cut and Adobe and all that supports that.
But doing it over a wide area network, that's where it starts to get interesting and complicated because we're not connected.
You know, you're not going to try and do this over VPN for the most part.
So there's the latency part of it, but there's also the interesting point that this is backed
by S3, or object storage, which isn't block addressable.
And it's big objects.
And we're doing that translation at this VMDK level.
It's really, really interesting technology.
Yeah.
And we have customers jointly with WorkSpot,
the VDI company that do exactly that live, right?
So we put ourselves,
we put a Panzura filer at the same data center
as the Workspot VDI for the company.
Everybody goes in through VDI and all across the world and simultaneously edits the same files.
So you don't have to have a powerful PC in front of you or a Mac.
You're basically going in through VDI using the power of their machines and then using the power of Panzura that is in various data centers and editing the same files in real time.
And that's true for the AEC space.
So there's significant challenges in doing this sort of stuff with object storage.
I mean, and Keith mentioned one that, you know, it's number one, it's not bite addressable, right? You got this huge multi-gigabyte or multi, well, yeah, maybe 50 or 60 gigabytes of video.
And, you know, got four or five people trying to access bits and parts of it and editing it.
So there's that aspect of it.
Here's the eventual consistency thing with S3.
So, yeah, as you probably know, eventual consistency says that if I do a write and somebody
else does a write to the same object, it's not necessarily going to work very well because it's
only eventually consistent. Right. But our filers take care of exactly that. First of all, you're
not doing object IO, right? You're doing either SMB or NFS.
I understand that, but there's object IO behind it, right?
Yes, but there's caching in front of it, and we only ship, we do 32 times 128K IO to the back end. So
we're caching it, and we're caching it on NVMe, which is non-volatile, so that, you know, in case
downtime or whatever happens. But we're caching it and then committing.
And when we're doing simultaneous byte range locking,
you can see the IO between all of our Panzura filers
gets really, really chatty, right?
So our Panzura filers are talking to each other
about who's got what byte range in real time.
And they're not going to unlock that byte range
until we are 100%
positive that the commit's done and the lock is released. Our locking is very conservative.
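A rough sketch of what conservative byte-range locking can look like in principle: a lock is granted only when the requested range overlaps no held range, and it is not released until the commit has landed. This illustrates the general technique, not Panzura's actual wire protocol, and all names are invented:

```python
class ByteRangeLockTable:
    """Toy byte-range lock table: grant a lock only if the requested
    range overlaps no currently held range for that file."""

    def __init__(self):
        self.held = {}   # path -> list of (start, end, owner)

    def acquire(self, path, start, end, owner):
        for s, e, o in self.held.get(path, []):
            if start < e and s < end:        # ranges overlap: refuse the lock
                return False
        self.held.setdefault(path, []).append((start, end, owner))
        return True

    def release_after_commit(self, path, start, end, owner, commit_done):
        # Conservative: the range stays locked until the commit has landed.
        if not commit_done:
            return False
        self.held[path].remove((start, end, owner))
        return True

locks = ByteRangeLockTable()
locks.acquire("video.prproj", 0, 8192, "editor-nyc")       # True
locks.acquire("video.prproj", 4096, 12288, "editor-sf")    # False, overlaps
locks.acquire("video.prproj", 8192, 16384, "editor-sf")    # True, disjoint range
```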
That's one of the things that I tested really a lot before joining the company, because again,
storage 20 years, I've watched a bunch of people try to create global file systems,
and they've all failed. They've all failed because of data corruption. They tried to balance super high performance
with global file system and it doesn't balance out really well. One of the great things about
Panzura is the locking is even at the collaboration side is very conservative, right? There is a
strict communication between the filers. A lock
is not released until all the commits are done. And we know that it is releasable. So that's why,
again, we're so popular in that space for collaboration specifically because of the
conservative locking. We also have massive customers. We actually just signed as of May 1st,
we're going live with an MSP in New Zealand. New Zealand's telecom CCL is using us for global file
system as a service. All the New Zealand government will probably be on us, slowly migrating off of
their previous service. And the kind of testing that they did out there
was ridiculous.
Third party penetration tests, exactly.
Can, let's force a corruption,
let's do a split brain scenario.
Let's do everything we can do to kind of force a corruption
in a global file system.
And I mean, trust me, this has taken seven or eight months,
but we passed everything.
And we're finally going live.
Yeah, so talk to me a little bit.
So there's lots of, I'll call it, out-of-band communication going on.
You mentioned locking.
You even mentioned that block updates would actually be transmitted out of band. Yeah, you're committing it to the cloud,
but you're actually doing some,
it's almost cache to cache coherency going on
between Panzura VMDKs or something, right?
Yeah, at the metadata level.
It's exactly right.
We always say that we separate the metadata from the data.
So our filers talk metadata to one another
and they commit data to the backend object stores.
And then we also kind of back up our metadata to the backend object stores as well
in real time. But the metadata communication happens within what we call our filer ring
so that we ensure that every filer is kept constantly consistent. There is no one master
filer that has all of the metadata because that
would basically create a single point of failure, even if it was a backup. A lot of our competition
does that in the cloud. I'm not going to name them, but they basically say, okay, you have to
have connectivity to the cloud to talk to this master lock server. And that's the server that
has all the information. With us, all of our filers in
real time are constantly updated for metadata. And that's because WAN connections are cheap,
right? We're not doing egress from the cloud. We're just doing filer to filer communication
over WAN links that are very inexpensive right now.
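The "filer ring" Glen describes, where every filer keeps a full metadata copy and there is no master, is essentially peer-to-peer replication of metadata updates. A purely illustrative, in-memory sketch with hypothetical names and no real networking:

```python
class Filer:
    """Toy filer that keeps a full copy of the metadata and applies
    updates broadcast by its peers -- no master node."""

    def __init__(self, name):
        self.name = name
        self.metadata = {}     # path -> metadata record
        self.peers = []

    def join_ring(self, peers):
        self.peers = [p for p in peers if p is not self]

    def update(self, path, record):
        self._apply(path, record)
        for peer in self.peers:            # tell every other filer directly
            peer._apply(path, record)

    def _apply(self, path, record):
        self.metadata[path] = record

# Build a three-node ring; an update on any node is visible on all nodes.
filers = [Filer("sf"), Filer("nyc"), Filer("cloud")]
for f in filers:
    f.join_ring(filers)
filers[0].update("/projects/bridge.rvt", {"size": 42, "version": 7})
assert all(f.metadata["/projects/bridge.rvt"]["version"] == 7 for f in filers)
```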
So let's talk about one of the more practical problems that are less technical, but technology helps to ease some of the transition, which is namespaces.
It's great to have a global namespace, but this is a problem that I've tried to tackle
the past 15 years of my career in large enterprises. And it is not a, it's not, I mean,
going back to the beginning of Active Directory and being able to have a single directory space
in Active Directory, you're doing it via simple SMB sharing. Not a new concept. Everyone wants
it. But for obvious reasons we went over earlier in this podcast, it's really difficult from a
metadata, SMB, blah, blah, blah. One of the other challenges that I've run into when trying to
implement global namespace, regardless of the technology, is adoption. Like we put the global
namespace out there to be consumed and developers, and we'll use the term developer
generically because it's not just developers, it's end users too. Developers simply don't use them.
They just revert to the old. What have you seen as either a carrot or a stick, or a great
motivator, to get folks to actually adopt Panzura so that, you know, you guys get the follow-on
contract and license?
So they continue to use the old share name, for instance?
Is that what you're saying?
Yeah.
Yeah, simply use the old share name.
Something as simple as getting someone to change a share name in an Excel spreadsheet.
I mean, this is real work.
I mean, the business is run on Excel.
Yeah, he's not wrong.
Our number one competition is human nature.
And people don't like to change.
So we always say like, well, we have technical competition out there from a data sheet standpoint.
And all the traditional vendors, NAS vendors, as well as some of the new ones that are coming up.
But none of that's really our competition.
Our competition is human nature, right?
And their willingness to change.
And what we're finding is some of our customers are bottoms up
and some of our customers are top down.
And it just depends on who you're talking to.
So if you're talking at the C level, it's cost, right?
I have, and I'm a math guy, I built a TCO calculator that we have
that shows that after three sites, we're going to be about a third. So basically, you're replacing
three NetApps, let's say, at three different sites, we're going to be a third the total cost
of ownership of traditional NAS. Once you get to like five to 10 sites, we're a quarter to a fifth
the total cost of ownership.
And that's because of global deduplication, encryption, dedupe, you know, you're basically
centralizing everything. You don't have to do backups because everything is immutable
and it's inherently off-sited and it's inherently three-way replicated. And now we support cloud
mirroring, which if you don't trust Google, you can mirror to AWS or the other way around
or to Microsoft.
So that if the entire cloud vendor goes down,
which has happened,
our filers will vote and automatically revert
to the secondary object store.
So that's a top-down approach of like,
look, you're paying way too much
for using technology that's 20 years old. But then there's always the early adopters, right,
at the bottom that say, wow, we've tested this thing and it works. And yeah, we're going to have
to do some process changes, but, and this is appealing to the techie really, once we show them that you can also
put a VM into AWS or in Google and then have cloud native services attack the data. And I'm not going
to mention a customer, but like, think of like one of the stodgiest organizations ever,
and they have a ton of security data
and they had to do COVID tracing.
Federal government, for instance.
Let's call it a stodgy organization.
And they have to do COVID tracing.
And all the security cameras go, of course,
to the local NAS, as they always have,
and it's stuck there.
And then they had to the local NAS, as they always have. And it's stuck there. And then they
had to figure out how to do facial recognition, which there's no way they could do locally, and then
track if two people crossed paths. And everything would have had to have been implemented locally,
right? They were talking about implementing massive amounts of hardware at every location
to be able to do something like this.
And by the way, buy very expensive software. You know who has facial recognition?
Otherwise known as computer vision? Every cloud vendor. It's available. It's pre-written.
Wouldn't it be nice if you just put a VM in the cloud, make all that unstructured data available
from all of your sites to one cloud vendor who can
then do computer vision on all of your cameras with no hardware investment, with no commitment,
with no contract, shut it down at any time, and only bring it up when you're actually worried,
right? When you say, oh, well, maybe these two people cross paths in this location.
Let's check. That's when you bring up that entire environment. So you're only paying for that environment when you need it.
That's the game changer at the techie level, right? They go, wow, we don't have to buy anything. We
can rent it and only do it when we need to. And all we have to do is move off of our stodgy old
stuff onto this new stuff, which if we're only receiving video, it's not impactful, right?
They're not changing their entire environment. They're just doing one thing. And then they start looking at it for other things.
They go, wow, this thing worked. Can we do this over here? And maybe that over there?
And then they upsell themselves. So it's just a different angle of approach, depending on who
you're talking to. And the techies don't care about cost, right? They just care about, does
it work? Can I solve the problem?
And the costs are, you know, above their heads.
Glenn, you mentioned a couple of things
in that little spiel.
One was cloud mirroring and voting filers.
So if I understand what you're saying,
I could mirror my, let's say S3 object store
that's behind Panzura to, you know,
Azure Blob storage or Google storage. And then I would point my filers to
both mirrors, let's say. And then if one happens to go down, they would automatically, number one,
they're committing to both all the time. And number two, if one mirror went down, they'd go
off to the other one automatically. Correct. So the cloud mirroring is basically
a write-splitting technology
at every one of the filers.
So yeah, yeah, yeah, I know.
I'm explaining it to your audience.
So like it's a write-splitting technology
at the filer.
So yeah, it's doing commits
to both backend object stores simultaneously.
And then we don't want a split-brain scenario again.
So all the filers basically have to agree that they can't get to, or the majority of
the filers have to agree that they cannot get to the primary object store, but they
can all get to the secondary one.
And then the file system basically is now reading and writing from the second one or
to the second one until the primary goes down.
I'm sorry, it goes back up.
If the primary goes back up, then basically that's when the administrator has to come in and say,
we want to revert back to the primary, make sure that it's updated, and then everybody move over.
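As a rough illustration of the write-splitting and majority-vote failover just described (thresholds and class names are invented; this is not the product's actual logic):

```python
class MemStore(dict):
    """Trivial in-memory stand-in for an object store."""
    def put(self, key, data): self[key] = data
    def get(self, key): return self[key]

class MirroredObjectStore:
    """Toy cloud mirroring: every write goes to both object stores;
    reads fail over to the secondary only when a majority of filers
    agree the primary is unreachable."""

    def __init__(self, primary, secondary, filer_count):
        self.primary = primary
        self.secondary = secondary
        self.filer_count = filer_count
        self.use_secondary = False

    def put(self, key, data):
        self.primary.put(key, data)       # write-splitting: commit to
        self.secondary.put(key, data)     # both back ends simultaneously

    def vote_on_primary(self, unreachable_reports):
        # Fail over automatically on a majority vote; failing back is a
        # manual, operator-driven decision, as discussed above.
        if unreachable_reports > self.filer_count // 2:
            self.use_secondary = True

    def get(self, key):
        store = self.secondary if self.use_secondary else self.primary
        return store.get(key)

mirror = MirroredObjectStore(MemStore(), MemStore(), filer_count=5)
mirror.put("blocks/1", b"data")
mirror.vote_on_primary(unreachable_reports=3)   # 3 of 5 filers can't reach primary
assert mirror.get("blocks/1") == b"data"        # now served from the secondary
```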
So the fail back has to be, the fail over can be automatic. The fail back is much more
manual. It needs to be, right? Because somebody needs to make that decision. They just can't
willy nilly go back and forth. And there's data that has to be transferred to update the old
storage and stuff like that. So there's work to be done here. It's nice if it's automated,
but it has to be triggered through some sort of operator intervention of some type.
Yeah, I mean, the filers will, once they can all reach the old primary, they will update it.
They will move the deltas.
But then that failed back.
You know, there has to be some intelligence there.
Our filers are intelligent, but at some point you can't trust technology.
And you also mentioned that there was no need for backups.
I'm not sure I'd agree completely with that. I understand, you know, triply redundant with one
out-of-region version, and then having cloud mirroring across that, you know, you have almost
five different copies of the data. But the challenge is, you know, if the system dies or
there's a system bug or something like that, or somehow, you know, God forbid,
something happens security-wise to, you know, infect your filer or something like that.
There's still a need for backup here.
Wouldn't you agree?
That is a bridge too far for some people.
And so, again, we are entirely immutable.
So you can't change anything.
But your immutability is dependent upon your system structure and system logic and all that stuff. And if your system is corrupted? It's potentially possible.
Some of our customers run Cloudian on-premises as well as our filer on-premises. And so there's no
off-siting and there's, you know, it's immutable, but it's not off-sited. So they do backups for
off-siting. New customers that come on board, they understand it, but they don't believe it.
And then they do their backups, they go through the machinations,
and they find out that they're never using recovery from backups.
Because we integrate like VSS, you can kind of right click, go on properties,
choose your version of the file that you want to recover.
Or from the admin panel, you can basically recreate the file system someplace else in another bucket if you wanted to.
So that they find that like, okay, we're doing this.
We're doing the old way, but we are never using it.
At no point in time do our customers say, oh, we need to recover from this backup.
So then they go, okay, we're going to stop this stupid
incremental forever thing. And now we still don't trust you guys. So we'll do a backup,
a full backup once a month and just offsite that. And then they go, well, maybe not once every six
months because we're never using this stuff and we have to do it. Right. So the trust has to be earned. And yeah, I'm never going to argue with anybody who's
like, no, you must stop your backups. You can do whatever you want. It's America. But
over time they find that like, okay, this is a waste of time and money. So let's back off.
But yeah, absolutely, a lot of customers still do at least a once a year or once a month or once every six months.
They'll like push something off site.
I think that the problem or the mind shift is that the back end is very different. And I have the advantage of talking to Panzura the past few years as either an analyst
or a potential customer. Some of the questions that I've asked around this topic is just that.
So, you know, when you're thinking about backing up object storage, because at the end of the day,
the two components you need to back up is the object storage and the metadata system that does the translation
from object to SMB or NFS or the NAS.
And if I have, so I have to ask myself the question, what is my backup solution for object,
period?
Because I'm not necessarily backing up the NAS file system in itself.
I'm backing up object.
And that is a very different approach than what we do to NAS storage on-prem.
So the backup policy is very similar to what I would do in AWS.
I might do site-to-site replication, et cetera, et cetera.
But pure, you know, like going to tape or going to some archive medium,
that becomes a questionable aspect.
The one, because I do have version control over the metadata, et cetera, et cetera.
So I have the tools in my cloud platform to do versioning of the metadata,
which is essentially what I was doing in my backup solution on-prem.
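On the "what is my backup solution for object, period?" question: one common object-layer safeguard on AWS is simply enabling bucket versioning, so overwrites and deletes keep prior versions recoverable. A small boto3 sketch, with a placeholder bucket name, independent of anything Panzura itself does:

```python
import boto3

s3 = boto3.client("s3")

# Enable versioning on the backend bucket so every object overwrite or
# delete keeps the prior version recoverable at the object layer.
s3.put_bucket_versioning(
    Bucket="example-filesystem-bucket",          # placeholder name
    VersioningConfiguration={"Status": "Enabled"},
)

# List the versions of one (illustrative) object key.
versions = s3.list_object_versions(Bucket="example-filesystem-bucket",
                                   Prefix="blocks/0000000042")
for v in versions.get("Versions", []):
    print(v["VersionId"], v["LastModified"], v["IsLatest"])
```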
Yeah. Listen, you know, I'm an old-school kind of guy. I have Time Machine on the Mac.
I've used it for the last decade or so, and I've never gone to any of my backups.
But on top of that, I do a weekly backup to a separate disk.
I do a monthly backup, which I store in a safety deposit box.
Have I ever had to access any of these?
No.
But, you know, it's just,
you can't tell me it's not worthwhile to back up.
I do the same thing.
The only reason I have my AirPort Extreme
on my network,
because I have Eero now,
is because of Time Machine.
I'm like,
I have that thing on a defibrillator, right? And
it's just, it's amazing. It works great. I do everything that you just said. And then
one time a year, I just back it up because that's all my family photos and stuff.
And I keep a gun under my pillow. So it's just like, I'm with you, right? It's just like,
all of that is true, which is why I never argue with that. I was like, look, this is what the technology gives you.
This is what you can do, but you have to get to it yourself.
Granted, a lot of our TCO savings come from you don't have to do backups,
or at least you can step back or choose a different technology for doing backups
that's less expensive.
Back it up with Time Machine.
I don't care.
But you don't have to go the full Data Domain and remote replicated Data Domain and have two weeks on premises and all that because we do that for you. So why not just take a full every month and push it somewhere cheap? And granted, your RPO or RTO isn't that great with that, but we're giving you the RPO and RTO locally. Right, right, right.
So, all right. So next question, Glenn. So what about size of file systems and stuff like that?
We're talking, you know, object storage, typically petabytes kind of, can you, I mean, how many files
can you handle? Can you have 10 billion files, a trillion files? How many directories? Those sorts
of numbers, you know? Yeah, we don't run out.
We're basically ZFS on the backend.
So you're a storage tech guy.
Like it's on the backend, right?
So a lot of the technology we've incorporated from ZFS.
So we really don't run out of files and all that.
Now, granted, because we keep adding,
because we're immutable, so we have a lot of metadata.
And what we will do with larger customers, and by larger, I mean like 10, 20 petabytes plus, is say:
Look, I know you guys love the global file system, but it's time to maybe make two global file systems, an archive and a primary. And so
like where your hot data and stuff that's in the last year, let's say, will be in the primary ring.
And then everything else can be in a secondary ring. And all that means is you're mounting two
drive letters at that point. You're not getting the global dedupe,
but in theory, the new data is not really going to dedupe
with the old data.
And at the end of the day, object storage is cheap.
So if we find that we're in the multi-tens of petabyte space
with customers and the metadata is growing too large,
where even NVMe takes a minute to figure it out,
we'll ask them to split it up.
And you're talking about outliers for customers,
because, I mean, if you think about, you know,
how many customers have 40 petabytes of NAS globally,
it's not a lot.
Well, he said he's using Final Cut Pro, so yeah.
Yeah, exactly.
Putting all that stuff in the RV must be a big deal.
But to give context, I'm coming from a pharma background.
Even worse.
Genomic data and this stuff is petabytes at each site and making it accessible. The metadata challenge alone is big enough. So indexing and then making, helping scientists understand
where the data is at, even if, you know, I can't reasonably move it within, you know, a day or two
period of time. And if I have to send a snowball or whatever, I can identify if I have
the data or if I need to have the data recreated or created in the first place. So when you're
talking about drug discovery, and I have a set of genomic data sitting at a site that is critical
to my drug discovery process, and I don't know that it exists is a big problem.
Whether or not I can access it remotely is less of an issue
than knowing that the data actually exists.
That's too funny. I was just talking to a prospect in the last month,
a giant hospital with exactly that situation where they have 40 petabytes
worth of data. And the guy that runs it
all is like, look, my researchers, they're paranoid. I can't put this stuff on Glacier
because that's what he's been doing. Right. And it's not immediately available. If they can't see
it, if they can't touch it, they start freaking out. So we need to make everything available
online instantaneously because they can go back to this stuff. They usually don't,
but if they can't see it, they freak out. So those are the types of problems that we're
trying to solve. That's exactly right. So Glenn, how is this thing sort of charged for? Is it
charged for by filer or data managed? Obviously you charge something for the thing, right?
No, we're altruistic.
Yeah, we charge based on managed capacity minus all of the snapshotting capacity overhead that we do.
So, actually, primary storage kind of thing would be the…
Yeah.
And we don't actually charge for mirroring either, which I feel is bad business. As far as the object storage is concerned, is that separately
charged to the customer then? Correct. So right now you're paying your, you know, two cents a
gig per month to pick your favorite vendor or, you know, whatever contract you have. And then
you're, what we're finding is our customers are paying for more capacity on the back end
than they're paying for us capacity-wise on the front end.
Because again, we don't charge for snapshots.
We don't charge for mirroring.
Yeah, exactly.
So we're not really worried about that.
And the way I talk about us is we're basically a toll road.
We're infrastructure.
We're not interesting.
The interesting thing is the stuff that you can do with it
once your infrastructure is built.
You know, all the cloud native services in the cloud,
all the global file system stuff, the collaboration.
But yeah, we're charging for capacity effectively,
just being a toll road.
And we're almost out of time here,
but there's a couple of questions with respect
to other options or other products that you guys offer.
I was on your website and it seems like there's more than just a global file system,
right? Right. So we also have Cloud Block Store, which is a Kubernetes PV, basically. So we're
Kubernetes storage. Yeah, exactly. If you want, it is also effectively a little global file system.
But it's super widely scalable.
So it scales based on cache hits.
So if you have a high performance requirement in the Kubernetes space,
such as media entertainment or genomics, research, stuff like that,
it is a crazy, crazy fast PV that can be made available to as many
devices that need it in real time. And it adds another container once it's not getting the cache
hits that you set. And it keeps adding and adding and adding and growing sideways until you're
getting the performance that you want. And then it collapses itself down once you're done doing this.
So it automatically scales up and scales down the many filers per se that are doing this
persistent volume support?
Exactly.
Yep.
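A toy illustration of the scale-on-cache-hits behavior described above; the threshold, step size, and node cap are invented, and this is not the actual controller:

```python
def autoscale(nodes, hit_ratio, target=0.90, max_nodes=16):
    """Toy scaling rule: add a caching node while the hit ratio is below
    target, shrink back down once it is comfortably above it."""
    if hit_ratio < target and nodes < max_nodes:
        return nodes + 1          # not enough cache hits: grow sideways
    if hit_ratio > target + 0.05 and nodes > 1:
        return nodes - 1          # demand dropped: collapse back down
    return nodes

nodes = 1
for observed_hit_ratio in [0.62, 0.74, 0.88, 0.97, 0.98]:
    nodes = autoscale(nodes, observed_hit_ratio)
    print(nodes)   # grows to 4 while misses are high, then shrinks
```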
That's interesting.
We unfortunately launched it at a bad time during COVID because this has to be ground
floor marketing.
You kind of have to market this to the nerds.
But we will be doing KubeCon.
I think that's going to be live in LA in October.
So we have to get out there.
Right now, it's in GKE only, in Google only.
We're going to have it in AWS as well
because what we're finding is, again,
they have 75% of the market.
So we have to go with the big boys.
So we're doing that, and also, like I mentioned,
we launched our first global file system as a service offering in New Zealand with CCL,
but that's our future, to be quite honest.
At least that's what I feel.
I'm not going to speak for the rest of my executives.
But putting global file system,
making it available in the cloud,
if you think about it,
there's a lot of companies going fully cloud native,
but by cloud native,
they're not gonna be 100% in Google or Amazon or Microsoft.
They wanna use the best of breed for all three.
Wouldn't it be nice if all of your applications
and all the clouds had access to the same data set. Google,
Amazon, and Microsoft aren't going to make that happen. They're not going to acknowledge the existence of
the other cloud vendors. So Panzura is coming in and going, hey, we can offer global file system
as a service in all three, be in all three marketplaces. And that's what you're going to see
coming from us going forward is basically making Panzura as a service in the marketplaces and just charging dollars per gig per month or cents per gig per month for simultaneous access across the clouds.
And that's where everything is going, as a service, right?
You're going from cars and RVs and all the things are basically rentable by the minute now.
All right. So, Keith, any last questions for Glenn?
No questions for me. We've had a great conversation.
Hey, Glenn.
Yeah, this is awesome.
Glenn, anything you'd like to say to our listening audience before we close?
No, this has been fantastic. I love the back and forth.
And this has been a great experience. Thank you.
All right.
Well, this has been great.
Thank you very much, Glenn, for being on our show today.
No worries.
Thank you.
That's it for now.
Bye, Keith.
Bye, Ray. And bye, Glenn.
Until next time.
Next time, we will talk to another system storage technology person.
Any questions you want us to ask, please let us know.
And if you enjoy our podcast, tell your friends about it. Please review us on Apple Podcasts, Google Play, and Spotify,
as this will help get the word out. Thank you.