Grey Beards on Systems - 33: GreyBeards talk HPC storage with Frederic Van Haren, founder HighFens & former Sr. Director of HPC at Nuance
Episode Date: June 16, 2016. In episode 33 we talk with Frederic Van Haren (@fvha), founder of HighFens, Inc. (@HighFens), a new HPC consultancy, and former Senior Director of HPC at Nuance Communications. Howard and I got a chance to talk with Frederic at a recent HPE storage deep dive event, and I met up with him again during SFD10.
Transcript
Hey everybody, Ray Lucchesi here with Howard Marks.
Welcome to the next episode of Greybeards on Storage, a monthly podcast show where
we get Greybeards storage and system bloggers to talk with storage system vendors and others
to discuss upcoming products, technologies, and trends affecting the data center today.
This is our 33rd episode of Greybeards on Storage, which was recorded on June 9, 2016.
We have with us here today Frederic Van Haren, founder of HighFens Consulting
and former senior director of HPC at Nuance.
Well, Frederic, tell us a little bit about some of your experiences at Nuance.
Well, thank you. Yeah, so I started about 10 years ago in high performance, and it was kind of weird
in the sense that I was running text-to-speech for R&D, and since I touched Linux and Windows
and other flavors of Unix, I guess I was the perfect candidate to start high-performance
computing at Nuance. In those days, high-performance basically meant there were 10 desktops connected
to the main network. And that was it, right? That was the great high-performance computing
environment. And so my management was telling me in those days that speech recognition needs a lot
of data in order to
improve the product, right? So speech recognition is a statistical application. You're saying something
and then the application is trying to compare that with a subset of data. And then hopefully
from a statistical standpoint, the application can guess really what you're saying. So the mainstream
idea was let's start collecting as much data as we can
afford, because cost was still the driver. And then let's see how far we can push that. So in
the beginning, we started out with hiring a company that would provide us guidance on storage,
on servers, and, you know, on high-performance file systems.
And once we got all the equipment in, I kind of realized that there were a lot of nuances, if I can say it like that, between raw storage and actual capacity, multi-core, single-core.
And I got interested in knowing more on what's behind all of this. One thing I learned
really quickly is there is always a bottleneck. The question is where it is, and the goal is to provide a platform,
if you wish, that you can control as an individual and replace pieces as you go along. Because I
believe that if you own the platform, you could replace vendor A
with vendor B and improve performance and hide the complexity from users. So one of the things
we decided to do is to swap out the original high-performance file system with IBM GPFS.
GPFS?
Yes, GPFS. They call it Spectrum Scale nowadays, I believe.
So that's about eight years ago. We wanted to look for a reliable and a scalable piece of software that could glue our storage devices together and make it look like one.
So a little bit of a fast forward from a user perspective.
Our users are seeing the exact same file systems as
they saw eight years ago, but we have had to replace the hardware behind it four times, right? So if
you own the platform, something like GPFS, you have the power and control to make those decisions.
Because I came from R&D, next to GPFS, for some reason I was convinced that I needed three tiers of storage.
And don't ask me why, because I really don't know.
It was a gut feeling where I needed high-performing, medium, and then low-performing.
And so I was asking people, what do you think?
High performance, medium, and average.
And at some point, we stumbled over a company that did ATA over Ethernet.
Oh, our friends at Coraid, well,
our former friends at Coraid. There you go. And so we said, look, why don't we try with,
you know, a few shelves, you know, Supermicro, all that good stuff, 25 drives per shelf.
So we started and to a certain degree, we kind of said, well, what if we buy 40 of those shelves and how far can we push it?
And before we knew it, we had 1,000 drives of Coraid, based on SATA drives, 500 terabytes of SATA drives,
and performance was reasonable in the sense, you know, considering SATA drives and the amount of drives we had.
The only problem was, having 2,000 drives of that really was not an option. The management of it was a pain. Failure of drives was very, very difficult. At some point, we even decided
that we would put the webcams on the shelves just to see when the drives would fail.
What, the lights? You're kidding me. This is your management?
Well, this was the whole idea with Coraid.
It was, you know, we will make these things really stupid.
And if your goal is to be really stupid, it's easy to achieve.
That's right.
And so one thing I learned out of it is that, you know, having a Tier 1 and Tier 2 and Tier 3 wasn't really a necessity.
I did understand that with good basic equipment, you could scale it out if the hardware would allow you to scale out.
So at some point we said, you know, Coraid, that's great, but that's not going to work for us.
We wanted to go a step up.
So we wanted to go really with a JBOD that had relatively good management tools and preferably a CLI, because high-performance computing is all about automation. I mean,
nobody's going to log in to a thousand servers manually. So you want to do all of this in an
automatic fashion. And that's how we started working with HP and we asked HP, so what's your,
what kind of JBOD with management tools can we use and get off the ground? And then before we knew it,
we started working with the MSAs, version 1, version 2, version 3.
Then they renamed it to P2000.
Then we had up to 12,000 drives of MSAs.
And wide striping across 12,000 drives
solved your performance problem
so you didn't need a faster tier?
Yeah, so the way you should look at it,
the MSA had 96 drives per dual controller.
And so we purchased blocks.
On one hand, we had SAS drives.
And I think in those days, we used a 300 gig 15K RPM.
And then for slower storage, we used the 750 gig SATA drives.
So the SAS side, we called it Scratch, because it's heavy-duty read-write.
And then the SATA side we called Static. Basically, this is incoming data from users,
and we're not going to modify that data. So we're going to write it once, but read many times. And so
with the help of GPFS, we would create RAID blocks within an MSA stack. And then we took a bunch of LUNs
across multiple MSA stacks, and then we created file systems. So if you want to picture this,
imagine that we have stacks of MSAs in multiple racks, and then you create a file system
horizontally across all the different racks. And that's how you achieve your performance,
right? If you want more performance, you add more racks and more LUNs to the file system. So
12,000 drives was doable. Okay. And the MSAs were really a bit more than a JBOD because you were
using the basic data protection. That's right. That's right. From a price
point, we were looking for something as close to a JBOD as possible. So Howard was right, you were wide striping the data across all 12,000 drives?
Yes. Yeah.
This would be a performer. This would be a screamer.
Yeah. Well, you know, 100 IOPs per drive, 12,000 drives.
Yeah, we're talking millions. Millions.
We're talking, you know, a lot of IOPs.
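[For listeners who want the wide-striping arithmetic spelled out, here is a minimal sketch using the round numbers from the conversation: roughly 100 IOPS per spindle, 96 drives per MSA dual controller, and a GPFS file system striped across every LUN in every rack. The rack layout itself is a made-up illustration, not Nuance's actual configuration.]

```python
# Rough sketch of the wide-striping math described above.
# Assumptions (illustrative only): 96 drives per MSA dual controller,
# ~100 IOPS per spinning drive, and a GPFS file system striped
# horizontally across all the LUNs in all the racks.

DRIVES_PER_MSA = 96          # drives behind one dual-controller MSA
IOPS_PER_DRIVE = 100         # ballpark for a spinning drive
MSAS_PER_RACK = 5            # hypothetical rack layout
RACKS = 25                   # enough racks to reach ~12,000 drives

total_drives = DRIVES_PER_MSA * MSAS_PER_RACK * RACKS
aggregate_iops = total_drives * IOPS_PER_DRIVE

print(f"{total_drives} drives -> ~{aggregate_iops:,} aggregate IOPS")
# 12000 drives -> ~1,200,000 aggregate IOPS: "we're talking millions"
```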
That's right. And so we had great luck with this whole environment,
but from a growth perspective,
we were pretty much doubling every 18 months.
So nobody was looking forward to that 24,000 drives, right?
Just because it worked really well with 12,000 doesn't mean it would at 24,000.
So we were back to the same issue we had with the Coraid,
basically was we're questioning ourselves about management,
we're questioning ourselves about costs and scalability. So then we started looking at what's the next step?
So where do we look? And just for your information, we test a lot of vendors, right? So what we're
trying to understand is what the market has to offer, what performance, what's the latency, the costs. And most importantly, at least from
what we saw, is also support after purchasing the equipment is really, really important. And
the way you can look at it is when you have that many hard drives, and certainly for the MSA,
you are a statistic. So if there is an issue in the firmware, you're going to find it.
Right. This only affects one out of 10,000 drives.
Yeah, that would be.
Oh, that's me.
Yes.
So, and it was interesting because there were cases where we started to find stuff before anybody else found it, right?
So the moment they had the new firmware, we would wait a little bit.
I mean, you don't always want to be the guinea pig.
But at some point, you converged to the new firmware and then you would realize, you know, here are the problems. So at
some point we started to decide to share all our logs with HP, the Colorado team, and they would
regularly look into our log files to see if there's anything abnormal going on that would
point to a bug in the firmware. So when you're talking log files, you're not talking about storage per se, you know,
the outboard logs for the drives or the storage controller, but your internal logs from your
activity and console logs and those sorts of things.
Is that what you're talking about?
Yes, that's right.
Yes.
So we would work together with the Colorado team and then they would ask us, here are
the logs we want from our device.
Here are the logs we would like to see from your application and then they would put one and one
together and then figure out what the issues were. And, you know, they found a decent amount of stuff.
I mean, we were like analytics for them, right? So they have a new firmware, they
would give it to us, and then we would pretty much tell them where the issues are. Okay, and basically this, what looks to me like five or six petabytes of data,
is audio files of voice to use for statistical analysis?
Yeah, it's combined.
So typically when a speech recognition happens, there are two files that are being generated.
One is the, as you suspect, the WAV file, the binary file, which is the sound.
And then there is a text file associated with it,
which is a textual representation
of what the system thinks was being said.
So yes, by nature, it's a lot of small files.
You know, what we try to do on the Static side
is to take a lot of those files together
and tar them up to make
them, you know, megabyte- or gigabyte-sized files. And sometimes we do that based on the metadata of the
file. So, for example, we could put automotive data together, or in some cases, we would build that
tar file based on a customer. And then the Scratch side is a little bit more the Wild West, right? Scratch is heavy-duty read-write, and that's where you can expect to see smaller files.
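[As a rough illustration of the bundling Frederic describes, here is a small sketch that tars small WAV/transcript pairs into larger per-category files on the Static side. The directory paths, naming convention, and metadata-in-the-filename scheme are all hypothetical, not Nuance's actual pipeline.]

```python
import tarfile
from pathlib import Path
from collections import defaultdict

# Hypothetical layout: each utterance is a small .wav plus a .txt transcript,
# and a simple naming convention carries the metadata (e.g. "automotive_0001.wav").
incoming = Path("/static/incoming")        # illustrative path only
bundles = Path("/static/bundles")
bundles.mkdir(parents=True, exist_ok=True)

groups = defaultdict(list)
for wav in incoming.glob("*.wav"):
    category = wav.stem.split("_")[0]      # e.g. "automotive"
    groups[category].append(wav)

# Pack each category's small files into one large tar so the file system sees
# a few megabyte/gigabyte objects instead of millions of tiny files.
for category, files in groups.items():
    with tarfile.open(bundles / f"{category}.tar", "w") as tar:
        for wav in files:
            tar.add(wav, arcname=wav.name)
            txt = wav.with_suffix(".txt")
            if txt.exists():
                tar.add(txt, arcname=txt.name)
```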
And smaller files is a challenge to your metadata.
Oh, yeah.
It's also a challenge to your overall performance, but it's always a challenge to keep people in line.
So every time new data is being added, we wanted to make sure that when they added more
data, they would stack it into tar files. You found GPFS was up to the scaling, I mean,
from thousands of drives to 12,000 drives and beyond that? Yes. And the metadata, you know,
I'm not sure what the metadata engine looks like in GPFS, but it was able to handle literally millions,
if not billions, of files. Is that right? That's right. I mean, every time
I utter something to something like Siri, that would represent a separate file or two.
Yes. So the way you can look at it is that per second, GPFS is processing two or three million
files. And so with the GPFS product, you know, you separate the backend data, where the actual content of the file goes, from the metadata.
So the metadata is the metadata of the files, right?
It's pretty much like a database.
It's a textual row by row, and each row identifies metadata information for a file or a directory. And it's challenging, right? Because metadata is not a lot of capacity,
but you need a tremendous amount of IOPS, right?
And that's why...
Frederic, when you're talking metadata,
you're not talking about like standard NFS file metadata,
but rather data that your systems are applying
to indicate like it's automobile or somebody talking? No, no, no.
I'm actually talking about file like an NFS metadata.
POSIX metadata.
Yes.
So, I mean, the way you can look at it is anytime a user wants to do something, they're going to hit the metadata, right?
So even if they do a simple LS, they're going to hit the metadata. Now, when you talk about the voice metadata, that's where we used to put that into a Hadoop cluster.
So there was a Hadoop cluster with 76 servers.
And there you could put in a query for files.
So, for example, you could say, as a researcher, I want to know where all the American English files are, 8 kilohertz, female, collected in automotive.
That query, a SQL query, would then go to the Hadoop cluster
and return a list of paths
to where the data is actually stored.
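[A hedged sketch of what such a metadata query might look like. The table name, column names, and the run_query helper are invented for illustration; the real schema and client (Hive early on, Vertica later) were not described in the conversation.]

```python
# Illustrative only: schema and helper are hypothetical stand-ins.

QUERY = """
    SELECT file_path
    FROM   utterance_metadata              -- hypothetical table
    WHERE  language       = 'en-US'
      AND  sample_rate    = 8000           -- 8 kHz
      AND  speaker_gender = 'female'
      AND  collection     = 'automotive'
"""

def run_query(sql):
    """Placeholder for the real database client call (Hive, later Vertica)."""
    # The actual client and connection details are omitted here.
    return []

paths = run_query(QUERY)   # in production: a list of GPFS paths to WAV/text pairs
```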
Wait a minute, you had a separate metadata server.
A Hadoop cluster is a separate metadata server?
Yes.
Oh. Well, you have to, right? The last
thing you want, when you have 12,000 drives, is somebody going into Unix and running a find.
Yeah, you're going to be gone for days, and the file that you were looking for could be there at the
time you start your search, but it could be deleted in the middle of it, right?
So you need something that performs a lot better.
We started out with Hadoop because we believed that Hadoop was a good use case,
but then we switched over to HP Vertica,
and our queries went from 12 minutes to 53 milliseconds or so.
So that was a huge boost.
And Vertica is yet another scalable database service from HP?
It's a product from HP and HPE,
and it's a columnar database, right?
So I'm not sure if you're familiar with columnar databases.
Yeah, a little bit.
But it's still a SQL database solution.
Yes, it is.
And we went from 76 servers to 10.
Ray?
Yes, sir.
Vertica is also the back end that Nimble uses for InfoSight.
Ah, that's interesting.
I always thought they used Hadoop.
Nope.
That's a different discussion.
Ah, interesting, interesting.
So you mentioned that this was all statistically based.
It seems like the world is moving towards, and I'm not sure what the right terms are, but neural net, deep learning, machine learning kinds of things.
Is that transition happening for speech recognition as well?
Oh, yes.
Oh, yes.
It's very important. So the software or the algorithm you use to compare whatever you're saying versus the data you're trying to compare's say, a thousand speech recognitions at the same
time, you're going to need a thousand
CPU cores, right?
And that's a lot of equipment once you
want... Wait a minute, wait a minute. Something like Siri
that can be handling
almost a million, right? I mean...
Yes. You would need a million cores?
That's right. You can't
virtualize this stuff?
No. I mean, the problem... Virtualization doesn't give you more, right?
So if the CPU runs at three gigahertz, virtualizing is not going to give you six, right?
It might tell you you're going to get six, but you're not going to get more than three.
So the reality is, because of the algorithm, because of the single core approach, it was a bottleneck from a scaling perspective. And that's where machine learning
and neural networking is coming into play. And when I started working in speech recognition
15, 16 years ago, in those days, you were not going to run your speech recognition algorithm
on a CPU. You were going to a DSP. And today, what's the replacement for a DSP? It's a GPU, right? So
you're going to go to a company like NVIDIA and they will give you a lot more cores. Granted,
the GPU cores are not as fast as the CPU cores, but for speech recognition, it doesn't really
matter that much. It's really the amount of cores. So suddenly with neural networking and NVIDIA,
you have 3,000 cores or over 3,000 cores per GPU card,
and you can assign a bunch of those cores to a recognizer,
and now you can scale way, way beyond
what you could do in the past.
I'll give you some metrics, right?
If you buy a 1U server, like a DL360,
two sockets, you end up with, what, 24, 28 cores or so altogether. If you buy a server that can
host eight GPU cards, typically they are about 4U, so four times more volume, but you get
eight times 3,000 cores, so that's 24,000 cores.
Granted, they're GPU cores, so we have to be careful there. But for speech recognition,
this is a huge boost. So in 4U, you have 24,000 cores. So you can scale and you can provide a lot
more flexibility than before. Now, it does come with its own problems, right?
So if you imagine that each CPU core
needs access to a certain amount of IOPS
and you go to GPUs,
I mean, then the math is going the other direction, right?
Yeah, because now you've got two cores per spindle
and that's going to be a problem.
Yes, and then you're in a whole different ball game. But, you know, that's why we have technology and innovation, to help you solve these problems. If you want a lot of IOPS in a single point or a single server, there is technology out there that can do that for you. If you prefer a SAN architecture or a distributed architecture, that's going to work for you as well.
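[To make the "math goes the other direction" point concrete, here is a back-of-the-envelope sketch using the round numbers from the conversation: roughly 28 CPU cores in a 1U two-socket box versus 24,000 GPU cores in a 4U box, against the same ~12,000 spindles. Purely illustrative arithmetic, not measured figures.]

```python
# Back-of-the-envelope numbers from the discussion; purely illustrative.
cpu_cores_per_1u = 28                 # two-socket DL360-class server
gpu_cores_per_4u = 8 * 3000           # eight GPU cards, ~3,000 cores each
spindles = 12_000
iops_per_spindle = 100

print(cpu_cores_per_1u / 1, "CPU cores per U vs",
      gpu_cores_per_4u / 4, "GPU cores per U")
# 28.0 CPU cores per U vs 6000.0 GPU cores per U

# The storage side of the same math: IOPS available per core shrinks
# dramatically once the core count explodes.
print(spindles * iops_per_spindle / cpu_cores_per_1u, "IOPS per CPU core")
print(spindles * iops_per_spindle / gpu_cores_per_4u, "IOPS per GPU core")
# ~42,857 vs 50 -- "the math is going the other direction"
```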
But yeah, like I said in the beginning, right?
It's all about a bottleneck and the bottleneck moves around
and you just need to find ways to improve the bottleneck
and make sure that you control the platform
in the sense that you can make a decision to go left or right
without having to reteach your users on how to use the system.
Yeah, it's kind of the first lesson you have to learn as an IT architect
is you never solve anything.
You just move the bottleneck.
It's way too frequently that I see people,
especially in the early days of Flash, people would say,
I have a storage performance problem.
I only have 1,000 IOPS.
Let me go to a million IOPS.
And, of course, at 10,000 IOPS, the bottleneck was someplace else,
and they had already spent their whole budget for the year.
Yes, and I do like it, right?
But as you said, it's very difficult for people,
certainly the people who have to pay for the whole thing,
to accept that the bottleneck changes, right?
They look at the bottleneck as something negative,
while in reality a bottleneck just means you have improved the performance of your environment, and it's
just moving to somewhere else. But that doesn't mean that it's worse than before. It could,
but not necessarily. You mentioned that the machine learning, deep learning neural net
has got a different scalability, I guess, if I call it properly, than the old statistical Markov, hidden Markov
analysis. Can you explain that a little bit, Frederic? Yes, it's all about the ability to
parallelize, right? So if you look at a GPU card, for example, a GPU, it's running a single
application, but the data is different. I mean, if we go to the basics, what does a GPU card do typically in a laptop, for example?
The only application you're running is putting pixels on your screen,
but from position to position, the pixels are going to be different.
And so you can look at that as a highly parallelized application.
And that's the same thing you're doing with the neural networking,
where you basically say, I'm going to take
a large task, I'm going to cut it into pieces, I'm going to give a bunch of cores a task
or a set of data.
They all run the same application, and at the end of the cycle, they all come back together
and say, okay, we're going to take the data points with the highest return.
From a CPU perspective, it's quite the opposite, right?
Where with a CPU, you can run anything on a CPU or a core.
So you can have two cores running two totally different applications.
But if the only thing you're trying to do is the same thing over and over with different
data, then the ability of having the CPU doing two different things at the same time is not gaining you anything, right?
So now you have an architecture that provides a tremendous amount of flexibility you're not going to use.
You prefer to use technology where parallelization is built in.
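[A toy sketch of that "same program, different data" pattern: score many audio frames against many candidate models in one vectorized pass and keep the highest-scoring candidate per frame, which is the shape of work a GPU (or any wide SIMD engine) handles well. The array sizes and the dot-product scoring are made up for illustration; on a GPU this would be a single batched matrix multiply.]

```python
import numpy as np

# Toy data-parallel recognizer step: every "core" runs the same scoring
# function over a different slice of data, then we keep the best match.
rng = np.random.default_rng(0)
frames = rng.standard_normal((10_000, 64))      # feature vectors, one per frame
models = rng.standard_normal((512, 64))         # candidate acoustic models

# One identical operation applied across all (frame, model) pairs --
# exactly the kind of lockstep parallelism a GPU exposes thousands of cores for.
scores = frames @ models.T                      # (10_000, 512) similarity scores
best_model_per_frame = scores.argmax(axis=1)    # "take the data points with the highest return"

print(best_model_per_frame[:5])
```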
So the old hidden Markov approach was a CPU-based approach, and the new neural network approach is GPU-based.
Is that how I read this? Yeah, it's more about how wide you can parallelize.
And because GPUs have the ability to go to thousands of cores, it's naturally a better way to
use it. And, you know, in general, when I talk about GPU, you can replace the word GPU with accelerator cards, right? So
there's, you know, Intel also has the Xeon Phi, there are other companies, you know, FPGAs. There are a lot of
tools you can use depending on your use case. You need to understand how useful it's going to be to
your use case, and then you choose what you want to do. I think Google actually came out with some new... TPUs, right? Yeah.
That's right.
So basically what that is is exactly the same thing,
except that for their particular use case,
they found that one way of doing it
was better than GPUs or FPGAs or Xeon Phi.
In reality, it's all about use case.
And same thing with storage, right?
It all depends on your use case and how to use and how to pick the environment.
And things get really interesting in the next couple of years when Intel comes out with the Xeons that include some FPGA built in.
Yeah, I think it's a natural thing, right?
So when we talk about high-performance computing, it's kind of, you know, people say, oh, that's a niche and it's very specific.
But since storage has become so cheap, a lot of people have enough storage and the ability to store a lot more data than they used to.
And so they're really entering, to a certain degree, the world of high-performance computing without really wanting to call it high performance computing. Yeah, we're starting to see that the technologies of high performance computing, like, you know,
large clustered file systems like GPFS, move their way into the commercial data center.
Yeah. And then there's, you know, GPFS, which is kind of the old guard; they have been around
for a long time. Intel, although you would look at them as a hardware company, they are spending a lot
of money on Lustre, which is a competitor to GPFS, which has some interesting features.
You know, it's a fast-moving market.
It's not being ignored.
You have to keep track of all the different things.
And the good thing there, as always, is you have options, right?
Options depending on your use case.
Is GPFS open source?
No, no, no, no.
It's proprietary.
It's from IBM.
No, no, I know it was IBM, but I didn't know whether it was open source.
No, no, no.
Lustre has an enterprise, paying version and also has open source, so you...
Because the Scale Computing guys started with GPFS
before they developed their own object backend.
Yeah, but I think that was under license from IBM, though.
It's definitely under license, but they made a lot of changes
to make it a distributed as in addition to clustered file system.
Yeah, there was another one besides Lustre.
There was a GFS I think Red Hat has.
Yeah, GFS is pretty good as well,
but it's not as scalable as the others.
So if you want to use it up to 100 terabytes, that's okay.
But I would use Lustre or GPFS
if you are going to use more than 100 terabytes.
You mean I can almost have my own HPC site in my basement here?
Yes.
I just need one of these 4U, 3,000-core servers.
That's right.
24,000, sorry.
I got enough hardware.
I just need an application.
You know, that's the problem is you need to figure out, you know, infrastructure is grand,
but it's really trying to solve a problem that matters.
You know, one of the questions that the data world has these days is how many people does
it take to administer a 24,000-drive environment with two tiers of Scratch and Static and a
Hadoop cluster?
I mean, how many admins do you have in this world?
I guess that's a question.
So the first comment is we do a lot with automation. And so we are a 24 by 7 environment, but my people only work 9 to 5.
That doesn't mean stuff doesn't happen after 5.
It's just that the way the platform is built and the redundancy,
there is no need to jump and run to the data center to replace something.
So from an admin perspective, we have one guy in the data center to do break-fix. As you can suspect, mechanical devices break,
and they do break a lot. So a lot of his time goes to that. I have an operational team
that mostly works on application changes with the users. We don't have a standard application;
any script is considered an application. So there's a lot of work with the users to make sure that they don't
destroy the environment. I mean, you can imagine, you know, if you need to write one megabyte,
you don't want one million one-byte files. You want a single one-megabyte file. But you would
be surprised sometimes, you know, that people don't
understand scale. And then on the engineering side, I have one main architect, I have one person who
looks at the open source community, because we look a lot at Spark and that kind of stuff and we do a
lot of POCs, and then I have another person on the engineering side who works mostly on security
and on automation tools, xCAT, Puppet, and that kind of stuff. So it's a relatively small,
fast-moving group. And this is petabytes of storage, right? Five, six petabytes of storage?
Yeah, on the block side. And then we have about two on the object side. So yeah, we hear that a lot. A lot of people tell us, you know, that we
don't have a lot of people, but you know, you need to own the platform. You need to
automate a lot and have a good understanding of high availability, right? So in other words,
when you buy a storage device,
you have to make sure that your power is well divided. I mean, now with the new device,
it's not necessarily an issue. But if you look at the MSAs in the past, you know,
if you wired it wrong, you would end up with two power supplies on the same PDU,
the PDU would go down and boom, you pretty much have data loss, right? So yeah, it's making the right choice.
A lot of people ask me how we do it.
I guess it's difficult to explain.
For me personally, when we started, we just had two people, me and my main architect.
For some reason, when we design something and look at stuff, I kind of feel where the weak points are, and we try to approach them and handle them.
But we had issues in the past where we had bad batches of drives, bad batches of power supplies, and we never got hit by downtime because of the architecture.
You also had the advantage of not having 400 unique Snowflake applications that various departments bought that have to get supported.
Oh, yes. That's right. Yes. Yes.
The complication in the corporate data center, much of it comes from that.
Yes.
So back to the, you keep talking about owning the environment. I think that because you went to GPFS, you can effectively replace anything
underneath it that you want as long as it talks some sort of storage protocol. Is that kind of
how I should understand that? Yeah, it's not necessarily the storage protocol, but let's
give you an example. So for example, a lot of the devices out in the market, they will do tiering
for you, they will do snapshotting for you. But they all do it within the same device, right?
So imagine, let's take an example.
Let's assume I have three 3PAR devices and I want to build a file system across them.
So horizontally, I build the file system across, you know, 3PAR one, 3PAR two, and 3PAR three.
I want to be able to control the tiering because if I tell 3PAR1, go ahead and start tiering,
that's going to hit my performance, right?
Because that device can decide to move data at any point in time,
but it doesn't know what 3PAR2 and 3PAR3 is doing.
And it doesn't know anything about your data or your applications.
That's right.
You have so much more context
that you can make more intelligent decisions.
Yes.
And as a result of this, we put the intelligence in GPFS and tell GPFS, this is what you need to do.
And that gives us – there's pros and cons, right?
If somebody comes up with an incredible new way of doing things that we typically do at the GPFS level, yes, that's going to hurt us.
But nowadays, we rely on basic functionality of storage devices. You know, 3PAR uses chunklets,
so that works really well. But from a tiering and all that good perspective, we prefer to do
that at the GPFS level for control. And as a result of this, we can say we're going to replace
3PAR with something else or upgrade the 3PAR and so on.
And GPFS allows you to take a hardware device out while it's in production.
So, for example, if I say, you know, the example with the three 3PARs, if I want to upgrade 3PAR number three, I can tell GPFS move all the data from three to one and two.
It will do that online.
And then when all that's done,
I can take 3PAR 3 out of production
and the users don't know anything.
And then I can replace 3PAR 3 with, let's say, 3PAR 4
and then tell GPFS load balance the data
that's on 1 and 2 now across 1, 2, and 4.
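[For readers curious what that online swap looks like operationally, here is a minimal sketch driving the standard GPFS/Spectrum Scale administration commands from Python. The file system name, NSD names, and stanza file path are hypothetical, and the exact command flags should be checked against the mm* documentation for your release; this is not Nuance's actual tooling.]

```python
import subprocess

def mm(*args):
    """Run a GPFS/Spectrum Scale admin command and fail loudly on error."""
    subprocess.run(list(args), check=True)

FS = "speechfs"                      # hypothetical file system name

# 1. Drain the NSDs that live on the array being retired ("3PAR 3").
#    mmdeldisk migrates the data blocks to the remaining disks while online.
mm("mmdeldisk", FS, "nsd_3par3_01;nsd_3par3_02")

# 2. Add the NSDs carved out of the replacement array ("3PAR 4"),
#    described in a stanza file prepared beforehand.
mm("mmadddisk", FS, "-F", "/tmp/3par4_nsds.stanza")

# 3. Rebalance so existing data is restriped across old and new disks.
mm("mmrestripefs", FS, "-b")
```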
And all you're relying on the 3PAR, the MSA, to do
is provide basic resiliency.
Yeah, and that's how we went from, you know,
Coraid to MSA to 3PAR.
Our users are clueless on what we're doing.
As a matter of fact, when we asked them,
they have actually no idea.
They still think we're running on the same storage device
as we had eight years ago.
Okay, well, I mean, moving from MSA to 3PAR, there's a significant cost per gigabyte difference.
But having fewer devices managing more drives simplified management enough to justify that?
Yeah.
So there was a cost.
So if we talk about price, the original 3PAR devices, like the V400, they were a lot more expensive than the models we used afterwards, the 7400s.
So I would say we had four V400s that were really expensive, or at least more expensive than we wanted.
But the 7400s are price-wise, at least with our discount, much, much closer
to the MSA. Not the same price as the MSA, but at least closer to the MSAs and made it really
worthwhile to move for us. Right. And instead of having 96 drives per pair of controllers,
you've got several hundred drives. Oh, yes. And with a lot of cost savings too,
because each MSA came with a bunch of fiber
channel ports. And now we had a consolidation of fiber channel ports. I mean, fiber channel is not
cheap. So going from the MSAs to the 3PARs, we actually reduced the number of
fiber channel ports significantly. Wait a minute, you're a fiber channel SAN? I would have thought
this would have been IP SAN, iSCSI or something like that. No, no, no, it's fiber channel.
Because of throughput, latency, or?
Because of the bad taste in their mouth from ATA over Ethernet.
Ah!
Wait a minute, wait a minute, wait a minute.
The world is, you know, moving, I think.
Yes, but remember, this is what we, that's what we started doing 10 years ago.
And just full disclosure, I mean, the platform I just described, we're going to replace it soon with a completely new architecture where we do it differently.
But for the last, you know, eight, nine years, it was all fiber channel.
I mean, it works really well.
We have the fiber channel switches.
I have clients that are fiber channel suppliers, you know.
Biggest problem with fiber channels, paying for it.
That's right.
You seem like you guys do a lot of proof of concepts
for vendors, more so than
others, I would think.
I can't tell, really.
How often do you do proof of concepts?
All the time.
You're looking for a new technology that can provide
cost, reliability, or performance
advantages?
All of the above.
So what happened a couple of years ago is every time we tested a device, we would find issues.
And then the vendor was really interested in how we did our testing.
I mean, imagine we have like 10, 12 racks available to us for testing, right?
So who has 12, 13 racks they can deploy against a new device?
I mean, Howard is probably the only guy I know that has 10 or 12 racks.
I've only got five.
Ah, I see.
And so what we noticed is that people like HP and IBM, because we talk to pretty much everybody,
they started saying, hey, can you look at our device and see what do you like,
what don't you like, and so on. And as we were doing that,
we were getting higher and higher into the engineering teams
with all these companies.
And so we got a lot of information.
And then at some point, we had companies tell us,
hey, we're under NDA, but can you guys look at this?
And we have this new thing, and can you try it out?
And then before I know, I have VCs calling me up and say,
hey, we heard from such and such
that you tried, you know, product such and such in alpha. And what do you guys think? And
is this good technology? And we kept on doing that. I talked to a lot of startups.
Some of them don't even have a website. Others have a website and others are in a stage where
they're hitting the market. And it seems like we provide good technical feedback
that they can use moving forward.
And I came from R&D, so for me, it's all about technology.
If I can learn something and I can help people understand what we do
and I can learn what they're doing, then I consider that like a win-win.
Yep. And Nuance Management supported you in that, which is good.
Yes.
Most of the time, yes.
Well, and the other thing is you can help guide their technology to something that's
more amenable to what you're looking for, too.
I mean...
That's right.
Yes.
And I think that we have a tremendous reputation within the company of making things work,
right? So, and sometimes it's not a positive where if something really, really goes bad in a
different department, you know, they know where you live, right?
So then you're in troubleshooting.
But overall, I like technology.
I like working with vendors.
I like listening to what other people have to say.
And, you know, and that's why I pretty much decided to jump ship and say,
why don't I focus even more on technology than I used to?
And so far, so good.
I really like it.
There's no better way to say it.
So at HighFens, you're consulting with these startups and companies about the technology?
Oh, yes.
Yeah.
You're also working with the HPC companies to identify technology that might be applicable
to their environments?
Yes.
It's interesting.
I mean, I left on May 2nd, and I have companies calling me up, asking me to work with them
from an HPC perspective, because they believe they have something that should be
good or a good fit for high-performance computing. So I get a lot of traction in those areas. We have
to see how we can make that all work. But there is clearly at this point no lack of interest.
Yeah, that's great. For a consultant just starting out, that's mind-boggling. I wish I was there.
So you mentioned Lustre and GPFS as predominantly the two solutions that you would look at in this sort of environment.
Yes.
You know, like I said, GPFS is kind of the older technology out there.
It has some things we don't like from a scalability perspective that newer technology like Lustre is taking over.
And GPFS started out as a hundred percent block, right? So now they've added features where there's
object, you know, with the Cleversafe acquisition, or they start to tier. Lustre is based on more of
a block-slash-object concept out of the gate, right? So the architecture is more modern
and more targeted, if you ask me.
And there are others out there, right?
So there's a lot of homegrown file systems.
But once you talk about petabytes and scalability,
it comes back to a statistic, right?
Do you want to be the person who is the number one or the first customer of
a new startup that created the new high-performance file system? We did it before, to be honest. I
mean, eight years ago, we did that with a company and we got burned significantly.
You know how they say you can tell the pioneers.
Yeah.
Pioneers are the guys with the arrows in their ass.
Yeah. Well, yeah, I've been there, done that.
I understand that completely.
And file systems take a while to mature.
Oh, yes, a lot of testing, a lot of validation.
Howard, do you have any last-minute questions to ask?
No, no, I'm getting it.
I've enjoyed the conversation.
Yeah, this has been great.
Frederic, do you have anything you want to say to the
Greybeards on Storage audience?
I would say keep listening. I think it's
a great venue, and
let's keep it going. And we like it a lot.
And any of your listeners out there
interested in doing a review on iTunes
for the podcast, that would be great
as well and get us a little bit more exposure.
Well, this has been great. It's been a
pleasure to have Frederic with us here
on our podcast. Frederic,
where can we find you on the Twitters?
I have two. The first, the personal
one is FVHA,
which is an abbreviation
of my name, so FVHA.
And I also have one for
my company, which is @HighFens.
That's great.
And www.highfens.com is your URL?
That's right.
Yeah.
I'm working on it.
Like you said, I didn't expect to have so much work and so much interest in the beginning,
so my website isn't really there yet.
I think we can all say that.
Yeah.
And we've been at it for years.
Some of us decades.
We won't go there anymore.
All right.
Well, next month we will talk to another startup storage technology person.
Any questions you might want us to ask, please let us know.
That's it for now.
Bye, Howard.
Bye, Ray.
Until next time, thanks again, Frederic, for being on our show.
Yep.
Thank you for having me.