Grey Beards on Systems - 38: GreyBeards talk with Rob Peglar, Senior VP and CTO, Symbolic IO

Episode Date: November 17, 2016

In this episode, we talk with Rob Peglar (@PeglarR), Senior VP and CTO of Symbolic IO, a computationally defined storage vendor. Rob has been around almost as long as the GreyBeards (~40 years) and most recently was with Micron and prior to that, EMC Isilon. Rob is also on the board of SNIA. Symbolic IO …

Transcript
Starting point is 00:00:00 Hey everybody, Ray Lucchesi here with Howard Marks here. Welcome to the next episode of Greybeards on Storage, a monthly podcast show where we get Greybeards storage and system bloggers to talk with storage and system vendors to discuss upcoming products, technologies, and trends affecting the data center today. This is our 38th episode of Greybeards on Storage, which was recorded on November 12, 2016. We have with us here today Rob Peglar, Senior Vice President and CTO of Symbolic IO. So, Rob, why don't you tell us a little bit about yourself and your company? Well, thanks, Ray and Howard. I appreciate
Starting point is 00:00:45 you guys having me on the podcast. Appropriately titled, I might add, myself included. You have a beard? I used to. And if I dared to grow one out now, it would most definitely be grey. Yeah, I wanted to go with Alter Kockers on Storage, but that would only work in Israel. That's exactly right. Anyway, so where do I begin? I could begin at the beginning, but that would be boring. So approaching my 40th year in the business, dare I say it. Congratulations. Well, thank you, sir.
Starting point is 00:01:17 It's been a long and very interesting ride and still is really super interesting, at least for me. I joined Symbolic.io back in July. We are all of 26-something people, maybe 27 by now. We might have hired our next person. Well, with hyper growth, you know. That's exactly right. That's exactly right. So we're growing by the proverbial leaps and bounds.
Starting point is 00:01:40 Been there since July and leading the charge as CTO and a senior vice president and executive role as well. It's a great amount of fun. It's led and founded by a guy who you probably know, Brian Ignomirello, a former CTO at HP Storage. Did stints at EMC, did stints at NetApp. Brilliant, brilliant guy. I just enjoy the heck out of working for him. And he actually is the big reason why Symbolic IO started; he has a number of the patents
Starting point is 00:02:09 and devised a lot of the intellectual property that we use, and we have plenty of it, which is a great thing. Now, before that I was VP of Advanced Storage at Micron for not quite 18 months, 2015 into 2016. That was a great gig. Learned a lot about the semiconductor business and SSDs and whatnot. And I really enjoyed the role, but the gig at Symbolic IO was just too good to pass up. Doing some really interesting things all around computationally defined storage,
Starting point is 00:02:38 which I'm sure we'll get into. And then before that, I tended to work in decades. I was at Xiotech, you now know them as X-IO Storage, for the better part of a decade. I was at EMC Isilon as the CTO of Americas for four years or so. That was a wonderful gig. Really enjoyed that. And then before that, back a decade at StorageTek, and I started my career writing HPC code for Control Data. So my roots are actually as a programmer. I wrote code for a
Starting point is 00:03:06 living for the better part of 20 years, mostly in the HPC arena: operating systems, storage stacks, go figure, IO routines, runtimes, compilers, you name it, we did it. Mostly in Assembler and C, to go along with the graybeard theme here. Yeah, I can see that. Such things should be written in relatively low-level languages. That's exactly correct. So in May, I will enter my 40th year. And as far as birthdays, I'm not sure how I stack up with you guys. Probably pretty close, but I prefer to state them in hexadecimal.
Starting point is 00:03:38 I just had my three-Charlie-th birthday a couple of weeks ago. So there you have it. Okay, well, we can figure this out. That's a big one for you. Congratulations. It is. It is. Three-Charlie is a big one. But like I said, you're as young as you feel, and I feel very young, especially doing some of the things we're doing now around computationally defined storage. It's a terrific thing. You didn't mention your role at SNIA. You've been with SNIA now for quite a while too, right? You know, I have. I have, Ray. This is my second, kind of second big go-around with SNIA. I served on the board and as treasurer, actually. A glutton for punishment,
Starting point is 00:04:15 I suppose, but I was treasurer for the better part of three years in the mid-2000s. SNIA is an absolutely wonderful organization. I must give it a little plug here. Despite its many critics over the years, we're doing great work. Critics? SNIA critics? Oh, yeah. Oh, right, me. Something like that.
Starting point is 00:04:35 That's right. But these days, like I said, this is my second go-around. I was reelected to the board in October, another two-year term. Congratulations and condolences. Thank you, sir, on both accounts. I appreciate that. But I had a hiatus when I was with EMC because one of the SNIA rules is thou shalt not have two board members from the same company. Wayne Adams was on the board at the time and still is, actually, chairman emeritus, as well as an active board participant. So I stepped away. And then when I joined Micron in 2015, I was reappointed to the
Starting point is 00:05:10 board in an appointed seat, and have now stood for election. And here we are with another two-year term. So I enjoy the work. We're doing some great things, especially in the area of non-volatile memory, SSDs and whatnot, and the NVM programming model, which is finding great favor amongst the industry. So we're actually branching out quite a bit, not necessarily away from storage, but there are so many different forms and so many different aspects of storage these days that we at SNIA now have several new initiatives going, like I said, around non-volatile memory in particular. So that's a very interesting area. Yeah, well, as far as I'm concerned, wherever data persists is storage.
Starting point is 00:05:50 You could say that, and you'd be right, as it turns out. Yeah, and now the difference, of course, between volatile or non-persistent memory and persistent memory, and we now have companies, Micron included, but to be fair, many others as well, that are into that market. And now I think we'll start to have a memory hierarchy in addition to a storage hierarchy that we've had for quite a few years now. Well, everything old is new again. Yeah, it's the graybeards. No, it's a memory hierarchy that existed a long time ago, actually. You bet it did, Ray. Oh, yeah. When I was in college, you know, the college mainframe was a 360/65,
Starting point is 00:06:29 and it had 2 megabytes of core and 8 megabytes of LCS, which was slow core. That's exactly right. And, by the way, core was persistent memory. That's just the memory. Yeah. Right. That's exactly right. So, no, I remember those days as well. And doing some undergrad work on a 360 Model 50 in PL/I, no less. That was a lot of fun. Yeah, me too. Yeah, I was a chemist, so we used Fortran.
Starting point is 00:06:55 There you go. I used Fortran, PL/I, and Assembler in college. Very good. Yeah, so what is old is absolutely new again. You know, the old saying in IT, which is usually true, is that there's no new ideas in IT. There are only old ones resurrected. And persistent memory today, in its various forms, with NVDIMMs in particular as kind of the first wave of persistent memory, if you like, is one of those. We're utilizing persistent memory technology at Symbolic IO. That's one of the major kind of tech reasons why I became very interested in the company
Starting point is 00:07:30 because they are arguably way out ahead on the use of persistent memory, how it works, getting it to work in a garden variety server and changing code to do that, restructuring some of the elements inside Linux to deal with persistent memory. And it's been a lot of work and a lot of both hardware and software engineering, but the results are fantastic, at least so far. So what is this computationally defined I.O.? I'm not even sure I said it right. Well, you got two-thirds of it right, Ray. That's pretty good. Computationally defined storage. So, you know, IO has been well known for literally decades, especially applications running on top of a Linux kernel and doing the so-called syscalls and POSIX IO: read, write, open, close, ioctl, for decades now. So what we haven't done, though, as an industry is thought about the nature of
Starting point is 00:08:27 binary, right? How do we treat binary both inside a CPU and in memory? Now, you might think this is a pretty crazy notion. It's like, Rob, well, geez, it's just a bunch of 0s and 1s, right? Well, it is. But how do you deal with it in an efficient manner, both inside the CPU and in memory? So actually, our founder, after he left NetApp, he spent quite a bit of time literally just thinking about this. And as you guys well know, we as an industry have been pursuing, you know, kind of faster, bigger, stronger for quite some time. Different forms of media, NAND and now 3D NAND. Persistent memory types like 3D XPoint and ReRAM and MRAM and you name it. There's a number of efforts going on. And then all sorts of buses and PCI Express, Gen 2, 3, 4, OmniPath, you know, you name it.
Starting point is 00:09:20 There's been a lot of advancements in the technology surrounding the data itself, but very few, if any, arguably, have stopped to consider the nature of binary. So that is what computationally defined storage started with. You know, how do we deal with binary in a much more efficient manner? And for us, it became clear that the answer is not storing a lot of raw data in memory. Obviously the CPU needs data; it's got to be able to read and write memory over its memory channels to get anything done. You know, the registers are only so large and onboard cache is only so large. So it has to be tied to this thing that we know as memory, specifically volatile DRAM, which has been the memory of choice in the volatile world for going on 30 years now. And how do you represent data in memory? Well, do you do it with the usual 32 and 64-bit quantities and ASCII representation and things like that, floating point representation,
Starting point is 00:10:18 that's all great. But it turns out there's a much more efficient way to actually treat binary in memory than its raw form. This goes back to the days of encoding data. And strictly speaking, computationally defined storage is both an encoding method, but it's also a compute method. Again, the CPU needs data in registers. I'm not revealing anything NDA here, but instead of treating data in binary as raw in memory, we represent the data in memory in a much smaller, much more efficient way, still zeros and ones, of course; we call them symbolic bit markers. And then when the CPU or the application, I should say, you know, does a read, right, a POSIX read or a read of a file or of a block device or any IO, then we take that bit marker and in the CPU, take that bit marker in memory and then
Starting point is 00:11:12 re-represent the data, typically in a register or maybe in the L1 cache, but it's going to be in a register at some point, the CPU has to operate on it, and then reconstruct the data. And now, once again, you have nice 32 and 64-bit patterns, but they all sprung forth from a relatively small marker and a set of instructions to go along with it. So you may be getting the point why we call it Symbolic IO, because data is represented by symbols. We call them symbolic bit markers. And then we let the CPU do the classic dirty work, if you like, of reassembling those into 32 and 64-bit quantities, which the CPU likes to deal with. Remember, the CPU can't deal with something like a 512-byte sector. Its registers aren't wide enough. It has to do a lot of fetching to get all that data into memory, and then a lot of fetching to get it into registers.
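To make the bit-marker idea concrete, here is a toy sketch in Python. It is purely illustrative and is not Symbolic IO's patented encoding: frequently seen fixed-size patterns are replaced by small markers, and the raw words are reconstructed only when a read actually needs them.

```python
# Toy illustration only -- not Symbolic IO's method. Frequent 8-byte patterns are
# replaced in "memory" by small integer markers; the raw pattern is rebuilt on read.
from collections import Counter

raw_words = [b"\x00" * 8, b"\xff" * 8, b"DEADBEEF", b"\x00" * 8, b"\x00" * 8]

# Build the marker table: the most frequent pattern gets the smallest marker.
freq = Counter(raw_words)
table = {pattern: marker for marker, (pattern, _count) in enumerate(freq.most_common())}
inverse = {marker: pattern for pattern, marker in table.items()}

encoded = [table[w] for w in raw_words]   # what would sit in (persistent) memory
decoded = [inverse[m] for m in encoded]   # what the CPU reconstructs on a read

assert decoded == raw_words
print(encoded)  # [0, 1, 2, 0, 0] -- small markers standing in for 8-byte words
```

The real scheme obviously has to handle patterns it has never seen, choose marker sizes carefully, and keep the table itself persistent; per the discussion that follows, that is where the engineering effort went.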
Starting point is 00:12:06 Something like a Huffman encoding of the data? Well, you can think of it that way. You know, Professor Huffman at MIT invented this method back in the 50s. You talk about graybeards on storage. These guys were thinking about that back then. And how do you represent data efficiently, right? And Huffman's technique was pretty interesting because it looked at data in patterns, you know, again, zeros and ones, kind of like DNA almost, right? A, C, T, G, and there's only two sets of base pairs, and it's either a zero or a one. So it's looking at the encoding of information, right? Taking a symbol and the length of a symbol,
Starting point is 00:12:41 what is the most efficient way to represent that data, that symbol, in bits? Because that's the material we have to work with inside of memory: zeros and ones. So, Huffman devised this method where he could take arbitrary data, doesn't matter what type it is; unlike compression and dedupe, encoding works on any kind of binary data. And this goes back even further, Howard and I talked about this, to the work of Claude Shannon, who ironically worked at AT&T and Bell Labs, which, by the way, is the home office in Holmdel, New Jersey for Symbolic IO. So there is a nice historic tie to that. But Shannon's paper back in the 1940s looked at the optimal methods of transmitting data over a
Starting point is 00:13:23 channel, right? There were things like channel capacity, entropy, and other concepts. And I urge the listeners to go back and read that paper, nearly 70 years old now, but still very relevant, used all sorts of- And quite a bit more readable than any paper I've read, written in the past 10 or 15 years. You're right, Howard. It is very readable, although quite long. It reads much more like a textbook, if you like, than an actual academic paper. Claude Shannon took the time to write it in such a manner that, as you said, Howard, lots of people could read it. It was very readable. And it proved mathematically the optimal way to drive symbols across a
Starting point is 00:14:03 channel, either with errors or without, you know, an error-free channel, and how do you structure the symbols to do that? And then Huffman went even further, a few years later in the early 50s, and looked at encoding of symbols. What is the optimal in terms of size, right? Smaller is better. So how do I look at various patterns in the data and construct encodings, or markers as we call them, and represent that data efficiently? And we've used proprietary methods to go way, way, way beyond, and I mean way beyond, what Huffman originally envisioned in terms of bit markers and what you do with zeros and ones. And the net result is twofold. So number one, we save space in memory. And it turns out we use persistent memory, which we've figured out how to use; it took us over a year to get everything working right.
Starting point is 00:15:06 Because the NVDIMM-Ns in the market today, they're made by several manufacturers. That's both the good news and the bad news. And they're not as standardized, as it turns out, as DRAM DIMMs are. So we did a lot of work to get NVDIMMs to play nicely with each other. And they do now, using our code. I love that about early technologies, like early Fibre Channel. Yeah, isn't that great? Yeah, and JEDEC standardized NVDIMMs.
Starting point is 00:15:29 You know, there are three standardized types now. But you still need to qualify everything. Oh, yes, you do. And tweak your BIOSes. Oh, yes. So we did all sorts of work at all sorts of levels, including BIOS work, to get these nice little critters to play together in the same sandbox.
Starting point is 00:15:45 So that was a lot of the time spent at Symbolic IO, which Brian actually started back in 2012. So it's about a four-year-old company now. Came out of stealth officially in May of 2016. I joined in July, and we're steamrolling, guns blazing here to get to a full GA product at the end of the year and into early 2017. Oh, excellent. Yeah, so there is a website, symbolicio.com, and it gives you some, not a great level of detail, but enough to get by on what we call the IRIS, or Intensified RAM Intelligence Server, IRIS. From the, frankly, Rob, from the website, it looks like there's necromancy involved. Yeah, I give you my full assurance that there is not. The website's actually going through a
Starting point is 00:16:39 redesign now, so we'll take that under advisement, as it were. But in any event, we decided, you know, the key to symbolic I.O. is software. It's a software company at heart. And yes, we're using a server to instantiate the algorithms. We do need some processing to do the encoding and decoding. Again, we're doing this all in memory, all with CPU instructions. That's actually where my concern comes. Go ahead. Because now, today, I do a read, and the processor fetches a whole cache line. You bet. Right.
Starting point is 00:17:17 From memory and gives me whatever part of that cache line I was actually asking for. Right. And that's some small integer number of clock cycles. Correct. With you, I'm going to issue a read and you're going to pull up a cache line and take an even smaller piece of what got returned and then process it to become the data I asked for. Isn't there a lot of latency involved in that?
Starting point is 00:17:45 Well, sure there is. Nothing is for free in compute, as you guys well know, or storage, by the way. You know, it's always this classic time-space trade-off. And I'll tell you what, if you can reduce the space that's being used in memory, number one, you can get an awful lot more data represented in memory than you could before. And number two, the CPU is the fastest thing that we have, right? It's unquestionably much, much faster in terms of generating data. Remember, the CPU is a bit generator by any other name. Oh, yeah. It's really good at constructing zeros and ones in registers.
Starting point is 00:18:21 It's really good about tearing apart cache lines once the data gets in there. So the question becomes, okay, do you put a bunch of raw data through the cache line, or do you put symbolic bit markers through the cache line? The CPU goes, oh, I know what to do with this. And believe me, the CPUs are really good at Boolean arithmetic, especially these, uh, you know, with multiple cores, multiple threads, several gigahertz per processor. So you're making basically the same argument that, you know, Storwize and the other guys who were doing compression made, which is by squeezing more data through the fat pipe, it more than makes up for the extra compute you need. Well, I'll tell you what, there is some truth to that. And, you know,
Starting point is 00:19:05 getting data from a persistent media, for example, over a bus into memory, you know, the classic DMA technique, that's one thing. And doing that on compressed data helps. As you well know, Howard, the best IO is no IO; the best data transfer is no data transfer. Happens in zero time. So I always liked those zero-length records where, you know, from a storage perspective, they were very quick. Right. That's exactly right. So, you know, reading fewer bits through the channels, if you like, from peripherals, and again, that's an old graybeard word, peripherals, as von Neumann taught us. But it is true, because the peripheral is quite distant. So the less data you have to send across
Starting point is 00:19:45 the PCI Express bus, you know, a common instantiation of a bus now, into memory, that's a good thing. And now what Symbolic IO has done is tackled kind of the final mile, if you like, and that's what happens over the memory channels between the CPU and DRAM. And however much data you end up transferring over that, it is certainly much faster than transferring data over a bus, no question about it. Each CPU memory channel now, at reasonable memory speeds, is 17 gigabytes per second. And then depending on CPU type, there are either three or four of them per socket. So you're into the 50, 60 gigabytes per second per socket. And now with advanced memory speeds going all the way up to 3,200, that number goes up and we approach 90 gigabytes a
Starting point is 00:20:32 second. That's by far the fastest channel in any architecture. And it's not even close if you compare it to a PCI Express bus. So using that resource efficiently now becomes paramount because, A, it's the fastest thing we have out there. And it's the thing that the CPU depends on, because in order to do a DMA transfer, the CPU basically has to give up and let somebody else handle the transaction. It's too large, right? The granularities of data are too large. So instead of trying to dedupe or compress, you know, a 512-byte sector or a 4K sector, something like that, we're dealing with very small quantities and we let the CPU reconstruct the data, again, using its very, very fast techniques, you know, several instructions per nanosecond. These are pipelined processors, multi-core, multi-threaded, and they can do Boolean math many orders of magnitude faster than you can suck data through a PCI Express channel into memory.
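For anyone checking the memory-channel numbers Rob cites above, the arithmetic works out roughly as follows. This is a back-of-the-envelope sketch; it assumes a DDR4 channel moves 8 bytes per transfer and takes the "reasonable" and "advanced" speeds to be DDR4-2133 and DDR4-3200, grades the episode does not name explicitly.

```python
# Rough check of the per-channel and per-socket memory bandwidth figures.
# Assumption: one DDR4 channel is 8 bytes wide, so peak GB/s ~= MT/s * 8 / 1000.

def channel_gbps(mega_transfers_per_sec, bytes_per_transfer=8):
    """Peak bandwidth of a single memory channel in GB/s."""
    return mega_transfers_per_sec * bytes_per_transfer / 1000

ddr4_2133 = channel_gbps(2133)   # ~17.1 GB/s: the "17 gigabytes per second" per channel
ddr4_3200 = channel_gbps(3200)   # ~25.6 GB/s per channel at DDR4-3200

# Three or four channels per socket, per the discussion:
print(3 * ddr4_2133, 4 * ddr4_2133)   # ~51 and ~68 GB/s: the "50, 60 gigabytes per second" range
print(4 * ddr4_3200)                  # ~102 GB/s peak, so approaching 90 GB/s delivered is plausible
```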
Starting point is 00:21:27 So we take advantage of the fastest resource we have, and not having to move all those raw bits is helpful, and not having to persist all those raw bits is very, very helpful, especially if you consider the size of persistent memory today, relatively small. So my question, I thought you were a storage device. You seem like you're more of a new cache for the CPU. Well, I'll tell you what, it's not a cache, but it is, Ray, computationally defined storage. So is it a storage device? Sure. We store bits. What's behind the persistent memory? Is there another tier behind the persistent memory?
Starting point is 00:22:08 Well, indeed there is. And in fact, if you look at the offering that we call Iris Store, it is a, you could call it a hyper-converged machine if you like. Again, we use a server because a server is a very convenient platform to instantiate this on. And if we need the space, or the user wants to use, say, an NVMe SSD directly, we will use it in Symbolic IO's code as tier-two persistent storage.
Starting point is 00:22:33 And again, we will persist the symbolic bit markers on that storage, which is much more efficient than storing raw data, and read and write those through the bus into memory. And we know we have markers.
Starting point is 00:22:45 And then again, we let the CPU do its thing and put the bits back together again. Well, lo and behold, now the application has its data in memory as it expects. So we use that, and it may sound crazy to storage folks, and believe me, I am one of them: the NVMe SSD is the fastest persistent storage peripheral that we have right now, but it in turn pales in comparison to using a Symbolic IO store module. This is what we call our NVDIMM technology, and it's aptly named because it is literally persistent memory, unlike an NVMe SSD, which is a classic storage device sitting on a bus, with controllers, with firmware, and it's block oriented. Remember, persistent memory, store modules in particular, is byte addressable. So this is another big, big difference between that and a storage device, as nice as NVMe SSDs are. I mean, at Micron, we oversaw the production of many different types of those SSDs, and they're
Starting point is 00:23:43 great. They're very fast. They're relatively high capacity now, you know, ones of terabytes in size, as opposed to tens of terabytes, but they will get to tens of terabytes in size here pretty quick, I think, especially with 3D NAND. But again, they're block-oriented devices, and the CPU can't deal with the block. It only deals in byte addressability. It knows what to do addressing memory in terms of bytes. And that's where store modules come in very, very handy because all we're doing is CPU load stores to fetch and, you know, fetch and store the data in persistent memory. And then there's all the benefits, of course, of the actual persistence of the memory, what happens literally when the power goes out. And with store module
Starting point is 00:24:26 technology, of course, we are using store modules, which are part DRAM and part NAND. The DRAM flushes to the NAND side on power loss. Again, we put in a lot of great engineering work to make that happen, you know, very consistently. And then on power up, the reverse happens and we repopulate volatile memory with the marker table, and off you go. And this all takes, you know, tens of seconds to do. So it's not only a very fast technique, but it's a very efficient way to make sure that what you write into memory, you know, in a CPU, actually is persistent. So we really, you know, this is kind of the inflection point of persistent memory. And again, Brian's algorithms and work around the encoding of data, you put those two together. And yes, it is a storage device in some sense. In fact,
Starting point is 00:25:17 Symbolic IO, when it first started four years ago, their original product target was a storage array. And, you know, using the marker technique to store raw binary, you know, explode the raw binary, construct markers out of it, and then store the markers extremely efficiently, much more so than dedupe or compression ever can. And the math is there to prove it, by the way. And then we were going to go out on the market with a storage array that had this wonderful technique embedded in it. And then we realized that, well, storage arrays are one thing, but what is the effect? You know, what if? What is the effect on actual compute? Because now, not only the storage space, you know, the persistent space is very efficient. What's
Starting point is 00:26:02 actually in RAM now is very efficient. And we let the CPU deal with taking the markers and reconstructing the user data out of that. So now RAM becomes a very efficient resource instead of this area where lots and lots of applications have to reserve lots and lots of RAM because they're going to suck raw user binary into it. So that leads into the next series of questions I had. Sure. You know, accepting the backend part. Yep. What's the programming model? You know, are you guys introducing a shim into Linux so that you look like memory? Well, here's the deal. The programming model is as you would program today. Now, we considered, you know, things like, well, do we make the user go through and, you know, use the pmem.io libraries and reconstruct their programs and have to recompile just to use the persistent memory? And the answer is absolutely not. We have shielded the applications from that. And we wrote both user space, a lot of user space
Starting point is 00:27:11 code, you know, daemon processes, and also kernel space code that goes inside of an Iris. So you can think of SymCE, capital C, capital E, Computational Engine, as the operating system, if you like. Now, it turns out that we are based on Linux, as many other operating systems are, but SymCE is an OS by itself, if you like, and that's what the user space applications talk to when they do IO. Again, read, write, open, close, ioctl, they're talking to SymCE. And then we do all that handling on their behalf. In other words, you don't have to change one bit, you know, pun intended, of an application. So you present as a file system that way?
Starting point is 00:28:01 Well, your applications can use file systems as they would normally. We actually present what we call a LEM. You could think of it like a LUN. It's an area of persistent storage. The operating system presents it as a volume, just like it would anything else in any other Linux system. And you can use that as a block device, again, but you don't have to have a special block IO driver.
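For readers wondering what that looks like from the application side, here is a minimal sketch using ordinary POSIX calls from Python. The device path is hypothetical and nothing below is a Symbolic IO API; the point being made is precisely that there is no new API for the application to learn.

```python
# Minimal sketch: an application doing plain block I/O against the volume a LEM is
# presented as. The path is hypothetical; the application neither knows nor cares
# that the bits it writes are being turned into symbolic bit markers underneath.
import os

fd = os.open("/dev/lem0", os.O_RDWR)              # hypothetical LEM block volume
os.pwrite(fd, b"ordinary data, no recompile", 0)  # plain POSIX write
data = os.pread(fd, 27, 0)                        # plain POSIX read, reconstructed by the CPU
os.close(fd)
print(data)
```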
Starting point is 00:28:25 SymCE is handling all of that. So if you do a block read or a block write, you think you're talking to a block device and SymCE is doing all the bit marker handling for you. So that's a wonderful thing, but you just get a hellaciously fast block device because you're doing it at CPU speed. And then the other aspect of it is we have an infused hypervisor that we offer in Iris Vault and Iris Store, as it turns out. Iris Compute doesn't have the infused hypervisor. It's more of a bare metal offering. One user used the hypervisor, and they had a garden variety Microsoft Windows 2012 environment running SQL Server 2012. They didn't have to change a lick of code. They ran the application. And you're running like a huge old RAM disk. Well, it's not a RAM disk per se.
Starting point is 00:29:19 Again, you're doing block IO. We'll accept your block IO calls, and we will use CPU load store instructions into Symbolic IO store modules to handle and persist your data. It is literally that simple. Rob, it is a RAM disk. There's just a whole lot going on between the RAM and the disk. Well, you're right. It's not the stupid device that term implies, and you have persistence. That's right. You have persistence.
Starting point is 00:29:47 So it's an NVRAM disk. Well, it is a technique where users can do I.O. in and out of persistent memory without a coding change. So you mentioned Iris Vault, Iris Store, and Iris Compute. Are those three separate packages? Those are three separate instantiations, Ray. Again, we chose to instantiate in a classic kind of 2U server platform. That's what Iris is. You know, a two-socket server, a certain amount of DRAM, and then, you know, as much persistent
Starting point is 00:30:18 memory as we can cram into the thing, which is great. You know, we're up to several hundred gigabytes of persistent memory per Iris. And Iris Compute is kind of the bare metal approach, again, without the hypervisor part, without techniques like Symbolic IO Blink. And Blink is very interesting. You guys know what data snapshots are. They've been around for a long time, and copy on write and redirect on write and all the various methods that snapshots get done. Symbolic IO Blink is actually not only taking snapshots, or what you think is a snapshot, but it's doing that to the marker table. And it's also snapping the entire configuration of the machine. We call it a blink. Again, that's a very appropriate term. We take a blink of the entire machine,
Starting point is 00:31:01 so all the IP addresses and DNS settings and active directory connections and LDAP and all of the configuration information, which by the way, is the bugaboo of sysadmins trying to do restores. You know, the data is one thing, restoring the data is one thing, but restoring whole configurations takes a lot of time. Well, with a blink, you get to capture all of that. Literally, you can write it to what we call a blink card, which is a persistent storage inside of Iris. Not memory, but more like a regular SSD. Or you can send it to an external device. You can restore the blink to another Iris.
Starting point is 00:31:35 And literally, within a few minutes, that Iris now takes on the entire personality of the Iris contained in the blink. And the blinks are quite small. They might be tens of gigabytes in size, representing the entire state of the iris contained in the blink. And the blinks are quite small. They might be tens of gigabytes in size, representing the entire state of the machine. So you also get that with Vault, and you don't get that with Compute. So again, Compute is the bare metal approach using a symbolic IO iris server with store modules included, no SSDs, no hard drives of any form. And if you're willing to write code, and some people might be willing to do that, to use, for example, the pmem.io libraries, you know, God bless you, have a good time with that. You can do that. And some people will choose to do that.
Starting point is 00:32:17 But again, we think that many more people can use Vault. Well, there are exciting things coming from there. That's right. So we think many more people will use Vault. And then some users that want to combine storage, you know, classic block-level storage, SSDs in particular, NVMe in particular, with the Symbolic IO SymCE functionality, will choose to add disk to their server. And that's what we call Iris Store. So you might envision, you know, 24 2.5-inch U.2 SSDs, four of which can be NVMe capable. And then you can also use NVMe half-length cards inside the Iris. There's a number of slots for that. So you might be able to populate the Iris with 400, maybe even 500 terabytes raw of block storage, in addition to the persistent memory storage, and then use the Symbolic IO software to store it highly efficiently. We think you might get up
Starting point is 00:33:14 to two petabytes in 2U, and that's just for starters at today's densities. And again, you think about, you know, I've always kind of had a goal as a storage guy to get a petabyte-per-U density in there. And we think we have it now, at Symbolic IO. Again, a lot more work to do. I don't know. Always is a long time, Robert. Always is a long time.
Starting point is 00:33:35 It wasn't that long ago you and I were talking about petabyte customers as being unicorns. That's exactly right. And in the year 2000, you know, 16 years ago, the petabyte customer was a rarity. Now they are quite garden variety, as we all know. And now it's the exabyte customer. Yeah, well, now you see more than a petabyte per U. So that's a really interesting thing. Again, persisting, you know, literally, and the classic, you know, analogy, Library of Congress, right? How many Library of Congresses can you store in one rack? That sort of thing. So it is a very interesting approach. And I think it's Libraries of Congress, but that's a whole other thing.
Starting point is 00:34:24 There you go. Is the symbol table different depending on the data that's being encoded? Well, the table itself really isn't. And again, I don't want to go too far into the weeds here. But again, we're representing all the patterns, regardless of file system, regardless of application that we see. And by the way, as time passes, you know, the iris has been up for days, weeks, months, years. We constantly reevaluate the table because as one of the encoding techniques, we want to represent the longest or the most frequent pattern with the shortest possible bitmarker or the most efficient bitmarker, both in terms of size and computational speed. So we constantly look through that.
Starting point is 00:35:06 And as a factor of that, you might actually see what we call amplification grow over time. And then new data is ingested. Either the application is generating it and doing POSIX writes, they're storing data, or data might be coming in off a network read, and it has to land in memory, and then it gets persisted. So new data is always coming in all the time. So we're constantly looking at the table, making it as efficient as possible. All the while, it's being persisted in persistent memory. And again, we can do an awful lot. The math that Brian has worked out and the patented algorithms are really a sight to behold.
Starting point is 00:35:50 And that's really the net effect of the machine: not only very dense storage, but also now very dense areas in memory, in DRAM, which, by the way, saves you a ton of effort when you're thinking about buffer caches in applications and buffer caches in systems. We've had those for decades because the devices on the other end were so slow that we had to cache data in RAM, because RAM was the only thing that the CPU could actually read in a byte addressable manner. But now, if we persist the entire marker table in persistent memory, the CPU always has at its beck and call the representation of the user data that has been seen in the past. The new user data, of course, will construct a new marker. We'll add it to the table.
Starting point is 00:36:30 If one already isn't present, and again, as more time goes by, chances are we've seen that pattern before. So again, without going too much into the weeds, it's a really efficient way to do things. And it saves a ton of RAM space. We've seen customers, for example, run a given application, for example, a SQL Server database, using four times fewer cores and 20 times less RAM. So what they could do in 80 gigabytes of RAM, they can now do in four, because of the encoding. Run that by me again. 20 times less RAM and four times fewer cores to do the same application. For the same application, only much, much faster, because now we're not bound by the I.O. speeds of any device, any rotating media, any SSD media.
Starting point is 00:37:21 We are always inside the CPU. Now, that's a high watermark. You know, we might go higher than that someday. You might go lower than that someday. But we've seen that in an actual customer and what they allocated. And they actually, believe it or not, they went down to one core and one gig of RAM. Now, the application took three times as long as it did with four cores and four gigs of RAM because there was swapping going on inside the virtual machine. But you knew that. Again, the effect of what you store in DRAM is really, really important. And that's the genius behind the work.
Starting point is 00:37:54 Yeah, no, the shocking part to me is that you're not trading additional CPU utilization for all of this. It's the reduction in CPU utilization. That's right. That's exactly right. That shocks me. Yep. It is amazing to see when you have, you know, and again, when we go fully GA, the world will see this. You see an Iris performing at full blast, and its CPUs might be going around at 7% or 8%.
Starting point is 00:38:26 But, you know, the CPU utilization is going to come anytime you do a write from cache to memory or a read from memory to cache. That's where the symbolic transformation occurs. Once it's in cache, it's in bit form and all that. Sure. I mean, that's what the user has to deal with, but if you read a bit marker into a cache line along with its instructions, the CPU goes, oh, I know what to do with that marker. I'll do certain Boolean operations on it. And lo and behold, now in a register, I have what the user really wanted. So again, it's a very, very interesting technique and one that was sorely needed, quite frankly. Again, we can chase faster media, we can chase faster buses, but our approach, as radical as it sounds, and it kind of is, is
Starting point is 00:39:11 think about the way you use binary, not just on classic storage. You guys are really making my life difficult, Rob. Well, I'll tell you what, I think that's great. It took me two years to figure out how to write a benchmark to handle compression and deduplication. And now you guys are bringing me... Something different. A third way to do data reduction that I've got to worry about. Well, I'll tell you what. You actually don't have to worry too terribly much. Again, we're application consistent.
Starting point is 00:39:42 You don't have to recode anything. You just enjoy the benefits of computationally defined storage. I think his benchmark is trying to control for compression or control for deduplication. And so he's going to have to control for the symbolic expansion factor or compression factor. I have to create synthetic data that reduces like real data. And I just figured out how to do that for compression and deduplication. And now I got to reverse engineer what you do so I can generate data that reduces like real data for what you do. Uh-huh. Well, good luck with that, Howard. I wish you well, my friend.
Starting point is 00:40:15 Yeah. Thanks. That was my point. I'll tell you what, us graybeards, Howard, we can learn new things. So it's all good. Wow. Yeah. Gosh, we haven't talked about NVMe over Fabrics, or we've talked a little bit about the NVDIMMs and stuff like that. You've been using NVDIMMs that are available today? We are. They're available from several, actually several different manufacturers, and we have used them all. Believe me, if you look at our lab, we've got every persistent memory NVDIMM known on the planet, in all different shades and flavors and colors and you name it. So we've tried it. And the stuff that's coming from like Micron and others, 3D XPoint, ReRAM, all the PRAM, all that stuff, that's also
Starting point is 00:40:57 potentially pluggable into this environment. It's just a question of when those are available, right? I think that's exactly right. And, you know, the fact that it's persistent, which is great for us, because now we finally have a persistent media that is byte addressable. And again, our store modules, NVDIMMs, are the first instantiation of that. You know, when the platforms of the day start to embed other forms of persistent memory, the key for us, again, is byte addressable, right? Block addressable is one thing, but we've been living in that world for a long time and it's all good. But byte addressability is key for us. So when we get to the point where we have XPoint down on the board, byte addressable, with DDR4 or, who knows, 5, 6, whatever memory channels going to it. If we have other forms of memory with byte addressable memory channels, sure.
Starting point is 00:41:51 And I'm sure Brian would tell you that too. Bring it all on. We'll use it. Test it. Even then we get into the situation where we need two-tier memory, because I want the code to run out of the DRAM NVDIMM, but I want the data tables to be stored in the XPoint. You are exactly correct, my friend Howard. And now we have come full circle back to the memory hierarchy. Yeah. Okay. Before we move on to other interesting things. So you've got these three
Starting point is 00:42:18 appliances. Is there HA between appliances? Does it scale out? Or are you crawling before you walk and saying, look, this is really cool, but it just does what it does, and resiliency is an application level problem for now? Well, what we envision, but haven't done yet, is to have multiple Irises talking to each other. We had a name for this before. We're in the midst of renaming it and doing some new marketing. But using Blink technology and being able to replicate blinks, we envision forms of HA around Iris. Again, not fully fleshed out yet, and I don't want to reveal NDA roadmap details, but the answer is yes, we will have the ability to put several Irises together and use them. I wouldn't call it a clustered file system because it's not. I wouldn't call it a bunch of clustered block devices because it's not. You know, it's a new entity. So rather than trying to describe it in old terms, I'll just leave it for what it is. And then, again, the use of Blink is very, very important, and the ability for Irises to live-replicate Blinks between each
Starting point is 00:43:32 other in a mesh fashion, if you like, and keeping each other's bitmarker tables, again, which is the heart of what we do, keeping them all up to date. So there's some interesting technologies going around. You can call it replication if you like. And in fact, it is replicating blinks between one machine and another. So we've got work to do on that. But we envision folks both using the infused hypervisor version, constructing virtual machines, and homing those on iris in addition to, or maybe additive to, using iris as a bare metal approach as well. And then, of course, the whole world of IRIS store, which may be able to replace,
Starting point is 00:44:11 you know, many, many rack units of, you know, garden variety block storage much more efficiently as well. So there's a number of things there; you'll see how people are using Iris. In one sense, Iris is really a general purpose server. We don't want to pigeonhole it into hyperconverged, although it does hyperconverged workloads and hyperconverged methodology quite nicely with the infused hypervisor. But we don't want to pigeonhole it into that because it's very useful, for example, in certain types of HPC workloads or certain types of database workloads. We have multiple POCs going as we speak. There are more to come, upcoming here in November and
Starting point is 00:44:52 December. And then again, when the first of the year hits, you'll see a lot more exposed from Symbolic IO in terms of what Iris is doing in the customer base and what Iris could do both now and in the future. Yeah, I hesitate to say this, but we're running out of time here, gents. This has been great, Rob. Is there any final question from you, Howard, that you might want to ask Rob? Oh, no. I'm at the stage where I get it well enough to understand that it might actually be possible, and now I'll only believe it works when I get my hot little hands on it.
Starting point is 00:45:25 So like everybody else, I've got to wait for it to go GA. Sounds extremely familiar to me from somewhere in my past. Rob, is there anything you'd like to say to our listener audience? Well,
Starting point is 00:45:37 number one, if you're listening to Greybeards on Storage, you're arguably listening to the best podcast going. Thank you, Rob. At least for storage. Kind of a mutual back-slapping society here, because I'm a graybeard too, if I dare to. Well, thank you, Howard. But no, I mean, the funny thing is, after, you know, again, approaching 40 years, we've seen a lot, we've done a lot, we've coded a lot, but yet there are still just
Starting point is 00:46:03 absolutely fascinating frontiers in this crazy world that we call compute and storage. I'm blessed enough to be right smack in the middle of one, looking at new ways to treat binary data. And I think there's going to be a lot of progress in that field as well going forward, because you can't argue with the math. That's the interesting thing. So, but I love to see all the other... My problem is you've got me thinking about, you know, where this is 20 years from now, when all the math's just built directly into the processor. Yeah, there you go. And then when we're really graybeards and we're sitting back and enjoying all those younger people enjoying the fruits of that labor. Right. That'll be great. But I love to see all the new media and the new buses. That's all great, you know, pursuing all of that. But I think we're on to something here at Symbolic IO in treating binary data in a different way, in a much more efficient way. So anyway,
Starting point is 00:46:56 to all the folks on the podcast, you know, stay tuned. Keep thinking about more efficient ways to do storage and compute, especially compute, because at the end of the day, everybody knows that, you know, we can make all sorts of interesting storage devices. But if you can't get the data in and out so that the CPU can use it efficiently, that's something. That's exactly right. So. All right, gents. Well, this has been great. It's been a pleasure to have Rob with us here on our podcast. Thank you, Ray. Next month, we'll talk to another startup storage technology person. Any questions you want us to ask, please let us know.
Starting point is 00:47:32 That's it for now. Bye, Howard. Bye, Ray. Until next time. Thanks again, Rob. You bet. Thank you.
