Storage Developer Conference - #80: Thinking Fast & Slow: Intuition, Reasoning, and Emerging Memory
Episode Date: November 27, 2018...
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair.
Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community.
Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage
Developer Conference. The link to the slides is available in the show notes at snia.org
slash podcasts. You are listening to SDC Podcast Episode 80. Well, good morning, and I see several
familiar faces, but lots of faces that I don't recognize.
So I think that makes it a good mix for this conference.
And what am I going to talk about today? We're going to start actually with your brain.
And so we're going to explore your brain just a little bit.
And I know that may sound dangerous, especially at 9 o'clock in the morning.
It may not be quite up and running yet.
But that's what we're going to do here in just a minute.
And I have a very specific reason for exploring your brain and how it operates.
So let's jump into it.
So here's some of the key points I want to make today.
First of all, that the brain can be modeled as two systems.
How many of you already believe me that your brain has two independent systems? Absolutely,
100% believe me. Okay, two basic systems. Okay, I only saw a couple hands go up, so
let's see a little bit later this morning if you agree with me.
So as we move ahead into the artificial intelligence era,
we have two really key tools in memory to make that happen,
and that's DRAM and NAND.
And I like to say the future of NAND is NAND.
The future of DRAM is DRAM.
And I stole that from somebody about 10 years ago.
And it's still true today.
But there are alternatives that are coming. And we're going to talk about how those emerging memories can be used
to solve some of these limitations that we have with DRAM and NAND.
So if you're at Micron or Toshiba or Samsung or Hynix, don't worry.
Your technologies aren't going away overnight.
And all of us that are designing memory or storage systems around that,
we don't have to throw out that knowledge,
but we do have to learn about the new technology.
So we're going to talk about that. And then the key point here is that emerging memory, as it gets commercialized,
helps accelerate this movement into the AI era.
So when S.W. Wirth asked me to talk, I asked him over the phone, I said, well,
what do you want me to talk about? And he says, anything you want.
And I thought, well, that's really dangerous. It could
be anything. But this is a book I'm working my way through. And how many of you read this book?
Okay, only a couple hands. I really highly recommend it to you. And I'm just going to
give you a tiny little taste of why I do that. This is a gentleman
who is a professor of psychology, yet he's done this work very scientifically over
decades of time. And what it describes is that we do have these two systems in our brain,
and they really help us in how we think and make decisions. So a lot of the book, which we're not going to go into here,
but is actually how our brain works and biases our decision making.
So these two systems, the one is a very fast, intuitive system.
It takes no effort.
The other is more deliberative, reasoning, etc.
All right, you ready for a quiz? I know it's nine in the morning and you're going, gee, I didn't, you know, do I have to use the number two pencil?
I wasn't ready for a quiz. I didn't study last night, but here we go. I'm going to show you two
images. Don't say anything, just going to show those two images and then we'll talk about what happened.
Okay, what happened in your brain? You saw two images come up. We're going to go into this. And by the way, this is exactly the example that he uses in the book to lead off.
You saw this image on the left hand side. And immediately, what did your brain do? What are
some things you thought about? What's her emotion? She's angry. You know, who is this person? You might have said, that's my wife, my sister,
somebody here I met yesterday at the conference that wasn't very happy with me.
You know, you came to a bunch of conclusions immediately. You probably even might have
assumed, what are her next words? Not very positive, right? You have a model in your brain. As soon as you saw that image,
you called up that model and you had certain assumptions about it. Now, maybe true, maybe not,
but you were automatically geared without any effort to have certain information. You
retrieved that information immediately. How about on the right-hand side? How many of you solved that problem right away?
Okay, a handful of you math nerds probably did.
It turns out that we can hold like two-digit numbers.
If I came back a couple of minutes from now and asked you what problem I was asking you to solve,
you'd remember those numbers.
But you parked it.
You recognized it as an equation,
but you said, I can come back later and solve it. So the system on the left is your intuition
system. It actually uses associative memory. And there's models built in your associative memory
that you had instant recall. The one on the right,
you parked it and you gave an interrupt to your reasoning system to say, do I have to solve that
now? If I have to solve that now, I know I can go do it, but I'll kind of do that offline. I'll do
that in the background. I'll park that and come back. So these are the way our two systems interact. The intuition system is immediate.
The reasoning system is interrupt driven and gets called by the intuition system and says,
what do I need to do with that information? Now, what are some more attributes about these two
systems? So in the case of the intuition system, like I said, it's lightning fast. It happened right away. How many of you did it take effort to recall that model? I bet none of
you took effort to immediately have a response there. So it's real time and it's approximate.
You didn't know who that woman was, but again, you might have assigned it to somebody you know in your life, etc. Now, how about the reasoning system? Oh, I'm sorry. Intuition, if we think
about where that happens for compute, that happens on the edge. And this is some, as we push compute
towards the edge, this is one of the things we want to have there. How about the reasoning system?
The reasoning system is slow.
It's deliberative.
It's very precise.
In other words, you could give an answer.
You could multiply 17 times 24 and give a very precise answer, but it takes effort.
And actually, science shows that when we invoke our reasoning system, we burn power.
Our blood pressure goes up.
Our attention to everything else goes down.
Think about it. When you're concentrating on solving a problem
and applying effort to it,
you miss things around you.
There's actually, they talk about in the book
that there's a very famous experiment where they say,
watch these basketball players
and tell me how many times they pass the ball.
And so if you're concentrating on counting the number of times they pass the ball,
you miss that, in the background, somebody in a gorilla suit comes by and waves their arms.
How can that be?
You're focusing on counting.
Your reasoning system is taking over and consumes all that power.
So this reasoning system is what we have in our data centers,
and it's what we have traditionally for compute.
Okay, let me see how many of you now believe that you have at least two systems in your brain.
Okay, a few more hands, so I convinced some of you.
So again, this is just a little bit of taste from that book, but I again, highly recommend you read it.
Now let's go on.
We're going to spend most of our time
talking about the reasoning system in the data center
because that's something you can go build today
with the components you have today.
But we will come back and talk about
the intuition system at the edge.
So if you recognize this guy, Dr. von Neumann,
he gave us an architecture a long time ago for our reasoning system.
It works really well.
It works really well.
What do we have for the intuition system?
Well, there is a guy that's pretty similar,
and his name is Dr. Modha, and he's from IBM.
And maybe someday we'll talk about Dr. Modha
the way we talk about Dr. von Neumann.
And this diagram from Dr. Modha is literally
the mapping of a monkey brain.
No, they have not dissected live human beings
to map the network in the brain at this point.
But one of the things he teaches
is the architecture that we've used
for reasoning systems
is not the architecture to be used for intuition.
So someday we may talk about him in the same sentence or the same way,
but we're going to store him.
So if you're a load and store person, we're going to store him for now.
And we'll come back later and talk about him.
But let's go into the reasoning system because we want to talk about
the memory tools we have today and the ones we have tomorrow.
So as we look at that von Neumann architecture, there's that memory unit.
And a long, long time ago, we had HDD and we had DRAM.
We're pretty happy with that.
But then this thing called NAND got invented.
And so we replaced it.
And for a while, we were happy with this situation.
But there's some problems.
So let's go in and talk about what are some of those problems
in our von Neumann architecture with DRAM and NAND.
So my friends at Rambus, this actually comes from Gary Bronner,
if you happen to know him.
He's presented it a couple times.
And what he does show is the ASP, the selling price for DRAM and NAND. There's an old axiom that memory
declines cost-wise at 30% per year. And if you map that out, that actually goes back many decades.
However, we see a big problem here on DRAM, and DRAM is flattening out. So its rate of change, now ASP and cost are not the same.
That is true.
But the rate of change in DRAM has really slowed over the last decade.
So there are scaling limits, fundamental scaling limits for DRAM,
the way we've known DRAM, that are causing its costs
and subsequently its price to flatten out.
Conversely, NAND is continuing along pretty much that historic curve of 30% per year.
So NAND's doing pretty well on pricing, but bottom line is DRAM costs too much.
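To put rough numbers on that axiom, here is a minimal Python sketch of how a fixed annual decline compounds over a few years; the starting prices are illustrative placeholders, not the actual ASPs from that chart.

```python
# Rough sketch of the "memory declines ~30% per year" axiom, and of what
# happens when DRAM's rate of decline flattens. Prices are placeholders only.
def projected_price(start_price_per_gb, annual_decline, years):
    """Compound a fixed annual cost decline over a number of years."""
    return start_price_per_gb * (1.0 - annual_decline) ** years

years = 5
nand_now, dram_now = 0.30, 7.00                   # $/GB, illustrative only
print(projected_price(nand_now, 0.30, years))     # NAND on the historic ~30%/yr curve
print(projected_price(dram_now, 0.30, years))     # DRAM if it still followed that curve
print(projected_price(dram_now, 0.10, years))     # DRAM with the decline flattened to ~10%/yr
```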
And a friend of mine, Mark Webb, who is another consultant that speaks,
he has a great quote on that.
He says, well, exactly zero of the DRAM manufacturers
are concerned about this.
They love it.
They're making lots of money right now.
And if you're a customer, you're not too happy about that.
Okay, so let's go on and talk about NAND.
This is a look back at 2015.
Again, I stole this one from Mark Webb.
He had this nice talk at Flash Memory Summit this year. What about latency? Well, what we see is there's a continuum of
latency, but there is a big gap between NAND and DRAM. This is where we go from latency of hundreds
of microseconds down to tens of nanoseconds. So that is the gap that we'd like to fill.
And certainly there are new memory technologies that help us fill that gap.
So that's the other problem. NAND is too slow. So let's sum that up. So what are our limitations? DRAM costs too much.
NAND, it's latency.
The latency is too long,
specifically the read latency for storage systems.
Okay, so what do we want to do about that?
Well, we have this general category called emerging memory.
I don't even call it persistent memory or storage class memory. It's just all different types of memory that are coming forward.
And from talking to customers, what we hear is that it has to be a cost,
about one-third the DRAM cost or price, in order to be interesting.
We used to say half, and now it seems to be a third.
It seems to be the good target.
How about latency?
Well, the read latency of an emerging memory needs to be below one microsecond.
And I mean for the system, not just the memory chip itself,
not just that simple element, but it has to be all the way through.
And as a matter of fact, there are only some workloads that can be handled at one microsecond, and those are mostly the streaming workloads.
But you have to get down to 500 nanoseconds in order for it to be used for all workloads.
So that's what we're hearing from architects today.
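Since those two targets come back later in the talk, here is a minimal sketch that just encodes them as a check; the reference DRAM price and the candidate numbers are placeholder assumptions, not figures from any vendor.

```python
# The two customer targets mentioned above, encoded as a simple check.
# All numbers here are placeholders for illustration, not measurements.
DRAM_PRICE_PER_GB = 7.00                  # illustrative reference point

def meets_targets(price_per_gb, system_read_latency_ns, all_workloads=True):
    """Cost must be ~1/3 of DRAM; system read latency under 1 us
    (under 500 ns if it has to serve all workloads, not just streaming)."""
    cost_ok = price_per_gb <= DRAM_PRICE_PER_GB / 3.0
    latency_limit_ns = 500 if all_workloads else 1000
    latency_ok = system_read_latency_ns <= latency_limit_ns
    return cost_ok and latency_ok

print(meets_targets(price_per_gb=2.0, system_read_latency_ns=400))   # True
print(meets_targets(price_per_gb=4.0, system_read_latency_ns=400))   # False: too expensive
```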
Well, in this category of emerging memory, does it even exist?
Does something like this even exist?
So I used this slide a couple of years ago at a talk I gave at Flash Memory Summit.
And these don't even come close to all the switching mechanisms.
There's actually an excellent talk and paper.
I wish it was on video, but a guy goes through and presents this great memory switching,
and it turns out it's a banana.
He put two electrodes in a banana and showed it switch.
So I like to say any fool can get anything to switch, just about.
So there's lots and lots of switching mechanisms.
Now, he had a talk yesterday, I believe, which was more on the
physics itself and the mechanisms of switching. So I'm not going to go into that. But this chart
gives you some idea of the diversity of physical effects that are there. And there's probably half
a dozen missing from this chart. So the bottom line here is there's too many switching mechanisms. There's almost an abundance of choices.
But let's dive in to a couple that have seen some work being done over the last decade.
And on the far left is MRAM, specifically spin torque MRAM.
That's the latest type, which gives us the speed we need.
The middle would be PCM, phase
change memory. I like to call 3D crosspoint rebranded phase change. I see Amber
in the back has shown up now, so I'll pick on Amber later. Intel has a lot of names for this.
This is Optane technology, 3D crosspoint. The mechanism is known
as phase change memory. And on the far right is resistive RAM. So these are the three that have
been researched a lot. There has been some limited commercialization of these technologies, both in
the case of standalone as well as embedded memory. So instead of getting into the physics, the science
behind it, let's look at companies.
Let's look at who's doing what in this area.
And this tends to change every couple of years.
So I had to update this chart.
So on the far left, I spent the past couple of years working on spin torque MRAM.
And one of the places where there's a really good value proposition for spin torque MRAM is in the embedded space.
So as an embedded memory, there are now four different
foundries, three of which you can design on their PDK right now, with spin torque MRAM at a 28 or 22
nanometer node. So this technology is really coming forward. And that's being used for code
storage as well as some caching. Now in the standalone space, there's one company, which is
Everspin, which has brought products to market. And so we're going to talk about them a little bit
later in that area. Now how about in the middle? In the middle with phase change memory, there's Intel
and Micron, who unfortunately are getting divorced after a number of years, but have pursued this
technology. And we're going to talk quite a bit about 3D cross point and its relevance to both
storage and memory systems. In the embedded space, STMicroelectronics is the lone holdout, but continues to develop this technology for embedded.
And then finally in resistive RAM, I used to have a lot more companies up here.
It seems like resistive RAM is slowing down a little bit.
We're not seeing the same adoption in the industry, et cetera.
So there's fewer companies pursuing it.
In the embedded space, Panasonic has developed a very nice technology,
and Crossbar also has licensed their technology to a foundry,
although we haven't seen as much uptake of that technology as we once thought.
In the standalone space, Adesto has shown their version of resistive RAM,
which is conductive bridging RAM,
and that's for relatively low density and almost can be considered embedded.
And then at Flash Memory Summit this year, Sony talked about pursuing a resistive RAM technology
that is in the hundreds of gigabit range, but is still years out. So let's focus in on things which
are shipping, certainly sampling, and products that
are shipping. Because again, we want to apply this to our data center, to our reasoning system,
and we want to see what can we build with these emerging memories that exist today.
Now I realize, and even later today there'll be a talk on another technology. There are dreams of many other sorts of technologies.
I put a splash of letters up here
because it always seems like there's a new one every year
that gets talked about.
But we don't see evidence of that.
There's no samples.
There's no products.
There's nothing we can build with that.
So let's be mindful of these other devices.
Although I'm still really curious about Micron's new memory B and when we're going to see that, what it is. Okay, so let's go back to the
key question. Does such an emerging memory exist that gives us one-third the cost of DRAM and a
read latency of less than one microsecond in the system? 3D cross point doesn't really cost one third of DRAM, according to my
friend Mark Webb. And the latency is not less than one microsecond. Oh, that's a problem.
How about MRAM? Okay, let's look at MRAM. Okay, MRAM's got the low latency,
but if there was a hell no symbol for cost, I would use that. I mean, it's not even
close. It's really, really expensive. Yeah, middle finger. Okay, yeah, you're right. I didn't think
of that. Good, good ad. Good ad. I like that. Oh, so that's not good. That's not good. Okay,
so our emerging memory candidates, neither of them meet our targets.
And this is one of the lessons, guys,
is that, you know, we keep setting up these targets.
I actually don't like storage class memory as a definition.
If you go back to the original IBM paper,
they defined something which my 30 years experience
tells me will never happen.
So we have to be a little bit careful about that.
And so I want to follow on that comment with, I call it the R word. I don't like to say it,
replacements. And those of you old enough to remember The Graduate, that's what it's taken
from. But there are no replacements. There are no one-to-one replacements in the memory business.
And I could go back in history and prove that to you.
The system always changes around the lowest cost memory, the smallest cell size.
So stop looking for those replacement technologies.
They don't exist.
They're not going to exist.
So we're moving from a NAND DRAM world to one with combinations of memory.
So now let's apply that concept to the technologies I just showed and also talk about what products are being built with those combinations of technologies.
Okay, so on 3D cross point, what can we do? Well, we can combine that on the memory bus itself with other DIMMs of DRAM
to give us a winning combination with reduced latency.
So that allows us to achieve very low latency
from a system point of view.
So maybe the raw memory technology itself
didn't give us that 500 nanosecond read latency we wanted,
but how can we architect around DRAM and combine the two, 3D crosspoint plus DRAM to get there?
Didn't do anything for cost, though.
So again, I think that's one of the challenges for the manufacturers here is keep bringing the cost of that technology down.
How about an MRAM?
Well, in MRAM, we had a cost problem. So let's combine it with
NAND. NAND is a lot cheaper than DRAM. How do we combine it? And so by combining MRAM and NAND and
working those two together in a system, we can achieve our goals, both low cost and low latency.
Okay, so what are some products, again, Flash Memory Summit was last month, so, you know, good
time to talk about this. What are some products that we saw at Flash Memory Summit that featured
emerging memory and actual things that were being demoed? Let's put it that way.
Well, certainly, Intel Optane DIMMs are very exciting. And this is putting that technology, 3D crosspoint,
putting it on the DDR bus.
So Intel has done a variant of the DDR4 interface,
and you put both of these types of modules there.
And this came straight from their talk
that they gave at Flash Memory Summit.
And what we can see is, you know, even in a module,
these are very high capacity.
You know, hundreds of gigabytes going to terabytes on a single DIMM.
That's pretty exciting, especially when DRAM scaling is slowing down.
And as has been pointed out, even on DDR5,
we're kind of expecting the highest capacity might be 32 gigabytes of DRAM on a DIMM.
So these are really big numbers.
So nice job there on the capacity.
Now, Intel is focused very heavily
on architecting the whole system to make it useful.
And I think that's been an excellent effort
that we as a memory community owe a debt to Intel
in sponsoring that work.
Part of that has been done
through SNIA in the Persistent Memory TWG. And it's a total effort to look at the architecture,
the software, the hardware, everything necessary to take these new memory technologies and make
them as useful as possible. So this chart here is showing latency, showing the stack up of the drive itself,
the controller, and the software, and how by architecting all the way down to a special
variant of the DDR4 bus, what that looks like. And what they're saying is we got the latency
really close to zero. Unfortunately, it's not less than one microsecond. But it is just a couple microseconds. So, again, good job.
And so what happens here is you have the DRAM as near memory,
and then you use the Optane, the 3D crosspoint as far memory,
and that gets the overall system latency where you need to be.
And then the bottom line is you can get terabytes of a memory pool
that each of your processor clusters can talk to.
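To make the near-memory/far-memory idea concrete, here is a back-of-the-envelope sketch: the effective latency of the combined pool is just a hit-rate-weighted average of the two tiers. The latencies and hit rates below are assumptions for illustration, not Intel's published figures.

```python
# Two-tier memory sketch: DRAM as "near memory" in front of 3D crosspoint
# as "far memory". Numbers are illustrative assumptions, not vendor specs.
def effective_read_latency_ns(near_hit_rate, near_latency_ns, far_latency_ns):
    """Hit-rate-weighted average read latency of the combined pool."""
    return near_hit_rate * near_latency_ns + (1.0 - near_hit_rate) * far_latency_ns

dram_ns, xpoint_ns = 80, 1200       # assumed device-level read latencies
for hit_rate in (0.50, 0.90, 0.99):
    print(hit_rate, effective_read_latency_ns(hit_rate, dram_ns, xpoint_ns))
# 0.50 -> 640 ns, 0.90 -> 192 ns, 0.99 -> 91.2 ns
```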
Now, what's the only downside here? From my point of view, it's that it's Intel-controlled.
So if you want this technology, you're locked into Intel. If you want a wider supply base,
kind of out of luck. So again, that's one of the disadvantages here. But we'll come back and talk
about other efforts in that space.
Now let's switch to our other technology, which was MRAM.
And what did we see from IBM this year?
We saw from IBM this Flash Core module that combines MRAM and NAND.
And it uses just a small amount of MRAM for a very fast, persistent write cache, and that allows you to do logging, journaling
inside. And the nice thing about this is described by the architect, this is Brent Yardley from IBM
that talked about it, is that it goes with the drive itself. So you no longer have to worry about
backing up things that you might have outside of the drive. It's all internal. So this is a case where a small amount of MRAM with a large amount of NAND
is giving us that low latency and that low cost that we want.
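Here is a toy sketch of that small-MRAM-plus-big-NAND combination: writes are acknowledged as soon as they land in a tiny persistent log, and get destaged to NAND in the background. This is just the concept, not IBM's FlashCore implementation; the class and the capacity numbers are made up for illustration.

```python
# Toy model of the MRAM + NAND combination: a small persistent write log
# (MRAM) in front of bulk NAND. Illustrative only, not IBM's design.
class HybridDrive:
    def __init__(self, mram_log_entries=4):
        self.mram_log = []                       # small, fast, persistent write cache
        self.mram_capacity = mram_log_entries
        self.nand = {}                           # large, cheap, slower backing store

    def write(self, lba, data):
        self.mram_log.append((lba, data))        # acknowledged once it hits MRAM
        if len(self.mram_log) >= self.mram_capacity:
            self.flush()

    def flush(self):
        for lba, data in self.mram_log:          # destage the journal to NAND in the background
            self.nand[lba] = data
        self.mram_log.clear()

    def read(self, lba):
        for logged_lba, data in reversed(self.mram_log):
            if logged_lba == lba:                # newest logged copy wins
                return data
        return self.nand.get(lba)

drive = HybridDrive()
drive.write(0, b"journal entry")
print(drive.read(0))                             # served from the MRAM log before the flush
```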
So these are two of the technologies here that we talked about
and where they fall on that latency spectrum.
And they do a pretty good job of filling that in
and giving us new options for storage and memory systems.
There are a few other things, though, that are in there, and I'm not going to spend too much time in this talk discussing that.
But let's talk about what are some of those other memory combinations or other things you might want to know about as we try and fill in this latency
spectrum and give more options to storage system architects. So other stuff that I saw at Flash
Memory Summit this year that you might find interesting. I apologize for the poor graphic.
It looks like a spy graphic, etc. I think it largely was. Toshiba didn't post their presentation this year,
but they announced something called XL NAND.
And I like the graphic a lot
because typically NAND is architected
just to give you the highest density
per square millimeter of bit cells.
But what they're doing now
is they're architecting inside the memory chip itself
as all these small tiles.
And when you have small tiles like that,
you can do a massively parallel access,
reduces the latency.
So that was the nice part about it.
And they're saying it's a 10x reduced latency versus TLC.
So again, this is not a new technology,
but this is a new design architecture utilizing NAND to deliver lower latency.
So that's also a trend.
It's still not one microsecond.
Sorry about that.
But it does give us something that's new.
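A quick sketch of why the tiled approach helps: independent reads spread across many small tiles finish in parallel instead of queuing behind a few big planes. The latencies and tile counts below are assumptions for illustration, not Toshiba's numbers.

```python
# Toy illustration of why many small tiles help: independent reads can be
# serviced in parallel instead of queuing behind one large plane.
# Latency numbers and tile counts are assumptions for illustration only.
def total_time_us(num_reads, read_latency_us, parallel_units):
    """Time to finish a burst of independent reads with N parallel units."""
    rounds = -(-num_reads // parallel_units)     # ceiling division
    return rounds * read_latency_us

burst = 64
print(total_time_us(burst, read_latency_us=50, parallel_units=2))    # few big planes: 1600 us
print(total_time_us(burst, read_latency_us=5, parallel_units=32))    # many small tiles: 10 us
```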
How would you use this?
And I think the way you would use it seems to be a market that it's starting to get going,
which is very low latency SSDs attached to compute nodes
that are not part of the storage system.
Maybe they're attached to the networking system itself,
somewhere like that.
So there seems to be this development of new capabilities
or new products that are trying to attack that market
for low latency SSD.
There are some competitors here.
Intel has their Optane SSD, which is 3D Crosspoint itself,
which is out there in the market.
And there's also Samsung Z-NAND, which is doing something similar,
which is, let's just make NAND faster.
So that's one of the things that I saw that also helps fill in
that latency spectrum that we talked about.
What's an activity that goes outside of that Intel-controlled environment?
In JEDEC, there's work being done
on something called NVDIMM-P.
And I know there are multiple versions of NVDIMM.
We seem to like funky nomenclature in this industry.
Too many letters and hard to keep track.
You need a decoder ring. But the idea here is that you re-architect the DDR5 bus
to have a protocol which handles non-deterministic behavior. Non-volatile memory needs non-deterministic behavior.
When you give a write command to an NVM,
it doesn't know necessarily how long it's going to take to write.
So it needs to have that ability to stall the host, etc.
So this is an effort to do that without interrupting
the progress of the DRAM on the same bus.
So we're headed towards a future here where the memory bus is used for two different types of
memory, volatile and non-volatile, with different characteristics. So we need a different protocol.
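For a feel of what a non-deterministic, credit-based protocol means, here is a minimal sketch: the host can only issue a write while it holds a credit, and the device returns credits whenever it finishes work. This is the general concept only, not the JEDEC NVDIMM-P specification; the class and numbers are invented for illustration.

```python
# Minimal sketch of credit-based flow control with non-deterministic writes:
# the host can only issue a write when the device has granted a credit, so a
# slow non-volatile device can hold the host off without blocking the bus.
# General concept only, not the JEDEC NVDIMM-P protocol itself.
import random

class NvdimmModel:
    def __init__(self, write_credits=4):
        self.credits = write_credits          # outstanding writes the device will accept
        self.pending = 0

    def try_write(self, data):
        if self.credits == 0:
            return False                      # host must stall or do other work
        self.credits -= 1
        self.pending += 1
        return True

    def service(self):
        """Device completes some pending writes after a variable delay
        and returns credits to the host."""
        done = random.randint(0, self.pending)
        self.pending -= done
        self.credits += done
        return done

dev = NvdimmModel()
accepted = sum(dev.try_write(i) for i in range(8))
print(accepted, "writes accepted before the host had to back off")
print(dev.service(), "completions returned, freeing credits")
```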
And Bill Gervasi, who's going to be speaking later today about Nantero's technology, talks about it being a non-deterministic, credit-based
system that allows time for cleanup. But basically it's holding off the host and having this ability
to do both. So this is a very interesting activity, and it is intended to be an open standard.
It'll compete with the Intel Optane DIMMs, but it's backed by all the major memory companies.
Now, if we could just get a major processor company, AMD, I'm looking at you, to sign up,
then things would be good. So I think that's what's needed here is still
support from the processor company. So this is something to keep track of. So let's sum up.
This is kind of what I put together for you, your cheat sheet, your watch
list, things I learned from Flash Memory Summit this year, things to keep watching. I think the
Everspin MRAM in low latency SSDs is an interesting one. We'll see how IBM does with their Flash Core
module, which is the first implementation of this. I think Intel's 3D crosspoint in the Optane DIMMs for server memory
continue to be a very interesting space. It's easy to make fun of Intel and say you over-promised
and under-delivered, but at the same time, we do expect that hundreds of millions of dollars of
3D crosspoint will be shipped within the next few years. So it is still the largest leading emerging memory effort that's out there.
So I'll rein in my making too much fun of Intel.
The Toshiba XL NAND, I think we're going to see more of this.
How to make NAND faster and give us lower latency.
And then finally, that NVDIMM-P.
So that's your watch list. Now let's go back and let's
load Dr. Modha and talk about intuition at the edge. I'm going to switch gears. So this is a
really different topic. It's one I personally find extremely interesting. Now, what are some things that Dr. Modha has taught us?
And this came out of a project at IBM called TrueNorth.
I think they have a new name for it now.
I don't happen to know what that is.
Your intuition system in your brain operates on 18 watts of power.
You do a lot on 18 watts. Give yourself an A+. All right, way to go, brain. You are super
efficient. If we took a von Neumann architecture, took it to the edge to do everything your
intuition system does, that takes more than 20 gigawatts. In other words, it ain't working. You cannot use a von Neumann
architecture. It will not be energy efficient enough to do your intuition job at the edge.
Also, if we look at the architecture of the monkey brain, what we see is a lot of little compute nodes with a lot of networking.
And it's quite fascinating, actually, to follow through that, the different elements of your
brain and how it's networked.
So we have a highly, highly networked, again, it's what IBM, I think, used to call in the
past mesh computing.
And so it's an old idea brought back again.
How can we implement that?
Now, there's very good evidence, including excellent academic papers
and some industry papers now,
showing that trained neural nets perform that lightning-fast intuition.
Now, that makes a lot of sense,
because when I showed you the picture of the woman at the very top, you called up immediately a model from your associative memory.
In other words, you store all these models.
If I said the word room to you, you'd immediately call up a model.
What does it look like, et cetera?
So you have all these models.
It makes sense that we would have these trained models stored somewhere in our brain.
And neural nets do that extremely well.
Now, an interesting way that you can use emerging memory is to hold the weights inside a trained
neural net. So again, that's the way we're storing information. And it's a very compact way of
storing. And when I say hold a weight in an analog memory, this is like MLC plus. This is lots and lots of levels. And I'll show you a little bit how we utilize this.
Once we have the weights, then we want to access this array in parallel and then sum the weights
to choose which model is the best to apply to the situation. So that's using what I call analog combined circuitry
to run that through and then find that correct model.
And keep in mind this is an approximate.
Our intuition system gives us an approximate answer.
It doesn't need to be precise.
So the most efficient implementation
is using analog memory,
and that might be six or more levels per cell.
I mean, again, IBM did some work on this that showed five to six levels is particularly good for this.
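Here is a small digital simulation of what that analog array is doing: weights quantized to a handful of cell levels, the whole array driven at once, and the "currents" summed into a multiply-accumulate. A real part does this with conductances and summed currents; this sketch only shows the arithmetic and the effect of the limited number of levels.

```python
# Digital simulation of the analog idea above: each cell stores a weight as
# one of a few conductance levels; driving all rows at once and summing the
# column currents performs a multiply-accumulate in a single access.
# This shows only the arithmetic, not any particular vendor's device.
import numpy as np

LEVELS = 6                                         # ~5-6 levels per cell, as mentioned above

def quantize(weights):
    """Map real-valued weights onto a small number of discrete cell levels."""
    lo, hi = weights.min(), weights.max()
    steps = np.round((weights - lo) / (hi - lo) * (LEVELS - 1))
    return lo + steps * (hi - lo) / (LEVELS - 1)

rng = np.random.default_rng(0)
weights = rng.standard_normal((3, 3))              # trained weights for one small layer
cells = quantize(weights)                          # what actually gets stored in the array
inputs = rng.standard_normal(3)                    # voltages driven onto the rows

exact = weights.T @ inputs                         # what a precise digital MAC would give
analog_like = cells.T @ inputs                     # summed "currents" from the quantized array
print(exact)
print(analog_like)                                 # approximate, which is all intuition needs
```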
Here's a nice thing.
You don't need seven nanometer. We see 28 nanometer chips utilizing this technology
outperforming 14 or 16 nanometer FinFET by a lot. Now that's power efficiency, that's prediction
accuracy, it's the architecture. The architecture is much better than using a GPU
and doing lots of matrix math.
So as the architecture changes,
the cost paradigm changes.
Now this makes sense to do at the edge.
And some of those places we'll talk about in a minute
are in vision or image processing.
How many of you have heard of this company called
Mythic IC? Okay, very few. And maybe a few years from now, you'll think back and say,
yeah, I heard about that. They're a startup. I don't know if they'll succeed. But they're doing
something very interesting. And I also like that they have competitors. There's one called Syntiant in Southern California. They did a paper at Hot Chips that was a fascinating paper.
Some of their members don't like it, but their CTO went and posted all the slides online and
did a full explanation. So if you're interested in this topic, search Mythic at Hot Chips 2018, and you can find a very, very good
description of their technology and how they implement it, as well as a good walkthrough of
neural nets and why you should care. I'm going to touch on a couple things really quickly. Over in
the right-hand side, this is the memory array itself, just shown as a three-by-three. So each
individual memory cell that's acting as an analog device, think of it as a
variable resistor. And then what we're doing down here is figuring
out how much current flows when we access all these things in parallel. So that is what helps us decide
what model to choose. Now it's a blended system because while this is the neural net in the middle, in this single tile,
you can see that there's some digital logic using von Neumann concepts
that are driving it.
So it's not just a non-von Neumann machine.
It's a mixture, a blend of the two.
Then one of the things we mentioned very quickly
when we talked about the brain
is that there's several compute nodes and then you
put them together in this mesh or sea of tiles. So again, this architecture is very close to how
our brain operates and gives us results similar to how our brain operates. And they plan on using it for vision and image recognition
and voice recognition right now.
They're currently using embedded NOR flash because
it exists. It's easy to integrate
and it holds those weights in that
analog mode.
That's my takeaway for it,
what you get out of that.
Another company that presented at Flash Memory Summit was Crossbar,
and what's interesting is Crossbar has kind of gone in this direction
of AI and the application for resistive RAM in that space.
And they also talked about, here's their specific,
describing the flow of how information goes through
and how an RRAM array in a trained neural net can do that.
So that's, I think, an interesting question,
is that is RRAM a technology which is especially suited
for analog memory and neural nets?
Okay, so those are two examples of companies
that are doing interesting things in this space.
So one of my favorite questions to ask,
whether it's somebody else speaking
or when I'm preparing to speak, is why do I care?
And hopefully not too many of you are thinking that right now.
Hopefully I've made my case of why you care.
But let's recap a little bit.
What is emerging memory enabling in artificial intelligence?
And I think in the data center it comes down to this.
How do I get terabytes of memory, a larger memory pool that I can access?
More data, better training, better decisions, more precise outcomes.
So how do we get that larger amount of memory to operate on?
On the edge, we talked about having this analog memory in this new architecture
and then giving us this brand new capability
which our brain does
but our compute systems don't do very well
which is that lightning fast intuition
at very low power.
So again I think this is the major takeaway
of how I am stating that emerging memory
accelerates us for AI.
Okay, final wrap-up, takeaway points.
Your brain has two distinct systems.
And I saw more hands go up, so let's see it again.
How many now believe this?
Yeah, I think there's a lot more hands. Okay.
And so good. I think I've made that point there. Our reasoning systems that are based on DRAM and NAND need reduced latency and cost. And I gave you very specific numbers about what we want to
achieve. Unfortunately, none of our memory candidates do a one-to-one replacement.
And so stop thinking about memory as a one-to-one replacement, but think about combinations.
How do we combine different memory technologies to achieve the system-level goal?
And then in the last section, I wrapped up, the intuition systems really do require that new architecture
to get us to operate at low power.
And we utilize emerging memory in an analog mode
to hold those weights.
And the goal here is to accelerate both the performance
and the adoption of AI systems.
Okay, advertisements.
I get to wrap up finally with advertisements.
So Persistent Memory Summit is coming.
This is a very popular event.
If you haven't signed up before, you can literally register right now.
You can take out your laptop and bring it up and register right now.
And I advise you to do it because it's standing room only.
And I think the progression of the industry can be clearly seen every year through the advance in persistent memory adoption at that event.
And then finally, this is my advertisement.
And if you want to follow me, you can track me on LinkedIn or on Twitter.
Okay. Very good. Thank you very much.
Do I get to introduce Amber?
That could be dangerous.
We have time for questions. We have ten minutes.
Good. Thank you.
I don't know the answer to that, Jim.
Jim, would you please give us the answer of why it's being held here?
It's huge.
It's huge.
Jim Pappas finally gave in.
Other questions?
If not, all right, thanks, Dave.
Oh, Tom's raising his hand.
I'll give you one.
He doesn't need a mic.
Probably not.
Can you hear me?
So, Dave, one thing you did mention on some of the neural network processing
is you sort of indirectly got into it with the in-memory processing.
And I'm curious, because of your background,
maybe you could speak a little bit about those two aspects.
Yeah, I think there's two different spaces.
And so, yeah, I'll repeat the question
or I'll rephrase the question from Tom.
You know, he was asking me to widen out a little bit
and talk about the application of emerging memory
to more of the tasks.
I focused on the intuition system at the edge.
There certainly is, and I think NVIDIA does a very good job of explaining it,
so if you want to reference material of why we do training in the data center,
we do training right now today with GPUs and lots and lots of SRAM,
and sometimes we use DRAM.
So that's the training function done in the data center,
edge we talked about.
In the data center for training,
SRAM scaling also has had difficulty
and is getting more and more expensive.
So there is a need to bring more memory into that space,
and in particular, to have it be persistent.
This is, in particular, a task where that standalone STT-MRAM can go
and offer new capabilities in that training function.
That can be very helpful
as you're doing the MAC operations,
doing that matrix operation,
to have that be non-volatile
and be able to store entries at relatively low power and
go back and forth. So I think that would be my quick way of answering that: there's
application for other emerging memories to other parts of the AI system in the data center for
training.
What I'm trying to take from it is: are there any implications that you see, in essence, for less network traffic, say, over cellular networks, because you're able to do more processing?
Sure.
This is my daughter, by the way.
Thanks, Lucy. So because you're able to do more processing at the edge, you're not shipping back and forth all the data to the server farms.
And perhaps more latency in my great phone here.
Sorry, less latency in the phone and less data getting stored where it could be mined by people that I don't want it to be mined by.
No question.
There are not just two levels in an IoT-to-data-center system.
I've seen talks by Oracle before that there's five different levels.
You store and process information of different complexity at each level.
In other words, if I want to do something locally and make a decision,
I don't want to go back to a data center, AWS data center,
hundreds to thousands of miles away.
I want to do it right there.
There is a time and cost penalty
every time you transmit data a long distance.
Sometimes referred to as fog computing,
intelligence at the edge,
how do we do more decision-making locally
and store less and less
or store more complex information
all the way back at the data center.
So I think that drives the embedded adoption of MRAM in particular,
giving us a high-density option of an emerging memory,
and allows integration directly into the SoC.
You can see it in your own phone, though, too,
is that the amount of storage capability you have in your own phone has kind of plateaued over the last couple of years.
Now we're talking about how do we enable sensors to have enough memory, enough intelligence to do the job so that we can even unburden the mobile processor.
So I think it's that multiple layers that we have to think about.
So, Dave, we've been watching persistent memory emerge for a long time. In
your crystal ball, what do you see in five years? Or, uh, what's taking so long? When the hell
are you going to get there? Yes. This is a common question. Usually I'm the moderator and I get to
shout that at the panelists, you know, what's taking so long. Um, it's a very good question. I think, again, the leading effort is the total effort that
Intel has done. I would continue to look for when do Intel processors ship with the right
memory controller integrated that allows access to persistent memory. That's still somewhat limited.
Myself, I'm looking at 2021 and beyond, which is a little more conservative than some other estimates,
but I think it is taking a while. You know, the NVDIMM-N portion is still very nichey. It's still
quite small. I hear those producers complaining that they're not shipping enough of it. So again,
I think it's going to take still years of time. I don't think this is a hockey stick in 2019.
Okay. All right. I think we've run out of questions.
Thanks for listening. If you have questions about the material presented in this podcast,
be sure and join our developers mailing list by sending an email to developers-subscribe at snia.org.
Here you can ask questions and discuss this topic further with your peers in the storage developer community.
For additional information about the Storage Developer Conference,
visit www.storagedeveloper.org.