Grey Beards on Systems - 097: GreyBeards talk open source S3 object store with AB Periasamy, CEO MinIO

Episode Date: February 7, 2020

Ray was at SFD19 a few weeks ago and the last session of the week (usually dead) was with MinIO and they just blew us away (see videos of MinIO's session here). Ray thought Anand Babu (AB) Periasamy (@ABPeriasamy), CEO MinIO, who was the main presenter at the session, would be a great invite for the show.

Transcript
Starting point is 00:00:00 Hey everybody, Ray Lucchesi here with Keith Townsend. Welcome to the next episode of the Greybeards on Storage podcast, a show where we get Greybeards storage bloggers to talk with system vendors to discuss upcoming products, technologies and trends affecting the data center today. This Greybeards on Storage episode was recorded on February 3rd, 2020. We have with us here today A.B. Periasamy, CEO of MinIO. So A.B., why don't you tell us a little bit about yourself and what's new with MinIO?
Starting point is 00:00:37 Hello, everyone. This is A.B., A.B. Periasamy, and I'm one of the co-founders and CEO of MinIO. And MinIO is an object storage. It's exactly like Amazon S3, API compatible with Amazon S3. And the primary purpose behind MinIO is this: the world is going to produce an amazing amount of data, and the bulk of that data is going to be outside of Amazon AWS. And if you are inside AWS, you have Amazon S3.
Starting point is 00:01:09 But what if you are outside AWS? That's where MinIO was born. It's an object storage that you can run on bare metal, VMs, containers, pretty much anywhere. It's really easy to do. And it gives you a complete end-to-end object storage stack. We were at Storage Field Day, or at least I was at Storage Field Day 19, here a week or so ago, and we talked with MinIO there, and I was pretty impressed with the 100% open source version. This is actually all open source, right? Yeah, it is 100% open source.
Starting point is 00:01:40 So I would like to call it free software. You know the difference, right? Yeah, there is a difference there. So, S3 compatible. I mean, a lot of companies say they're S3 compatible. How do you verify something like that? The easiest way to verify is to run your applications, right? Because at the end of the day, standards and compatibility don't matter on paper. What matters is: does it work? And often we have heard from a few of the product guys outside of our company, why don't you talk about all these great features you have? And we often tell them that it doesn't matter. There is only one thing that
Starting point is 00:02:19 matters: that it just works. You would hear that very commonly across our community, and you would know why. That it just works also means that it is simple. At the end of the day, if it doesn't work, everything else is up for debate. Now, the S3 compatibility itself: Amazon published a REST spec, but there are many interpretations of it, and with those interpretations, even their own SDKs interpret the S3 spec differently, and different versions of the same, say, Java SDK would have different implementations. And the Amazon service is very forgiving, because they have made incremental changes continuously over many years and have accommodated all those changes. How do you get this right? The only way you can get this right is you do only one thing. We stuck with the S3 API, and we always said no to anything else, like the Swift API, and no file API. We will do one thing and one thing really well. And this is
Starting point is 00:03:18 where we even went to the extent of writing the clients and SDKs, because if you look, no other object storage actually has clients and SDKs; they thought everyone would just use the AWS SDK. MinIO actually works with the AWS SDKs as well. Really? Isn't that kind of unusual, to actually work with the interface provider's SDK? I mean, yeah, that's how we test our code, and we test our SDKs with the Amazon service. And here the important detail is this, right? Amazon, because of the variations of their implementations, the only way you can get it right is you have to have an amazing amount of installed base. And that's where the open source comes in to help us. If we break something, we actually will hear within 30 minutes from our
Starting point is 00:04:05 community, new GitHub issues being raised, and we will go fix it. Actually, we have seen on multiple occasions that Amazon, when they roll out new updates, they break tools. All software has bugs, right? And there are a whole bunch of applications using our SDKs to talk to Amazon, and Amazon would actually go ahead and fix their versions too, to be compatible with our SDK. So overall, the best way to get compatibility is to show that it's working. Wait, wait, wait. AB, so you've got an SDK, Amazon's got an SDK, and there are people using your SDK to talk to Amazon S3 storage. Yeah.
Starting point is 00:04:46 And if Amazon S3 storage rolls out a functional enhancement that breaks your SDK? Yeah. They fix it. You would be surprised, right? Actually, I was surprised too. That's, you know, actually, I have tremendous respect for Amazon. You would actually hear this commonly, right?
Starting point is 00:05:03 Not just that; it's not only a surprise whether Amazon will fix it, but also how nice they have been to us. I've heard their VP of R&D saying that he personally likes MinIO, right? We have seen on multiple occasions how nice they were. There were times where the community would ask, what if Amazon comes after us for the S3 API, because of the Google-Oracle case, right? They have actually always volunteered; their guys would come in and say that they are actually willing to work with the community. It shows how bold and how confident they are. And they have been nice to us and the community overall when you reason with them. Quite simply, AWS is a company run by engineers. If you
Starting point is 00:05:46 show them, look, there is a bug in their system, and here it is, they don't question whether the MinIO SDK is a competitor. They don't even see us as competition, maybe because we are small or we are in a different zone, but they have been pretty nice and very reasonable, not only about us. Delight the customer, that's how they see it. It's more than pleasing us or anything. They have always been right by their community. And that's how issues get fixed and we move forward. I think from their perspective,
Starting point is 00:06:16 any data in S3 is good data, whether it's their S3 or not, you know? Because it's more applications using the interface. It's more users using that storage. And, you know, over time, if they want to use the cloud, they can use the cloud. If not, they can use MinIO. It works for them. Yeah.
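That SDK interchangeability is easy to try for yourself. Here is a minimal sketch (ours, not from the episode) that points the stock AWS SDK for Go at a MinIO server; the localhost endpoint and the default minioadmin credentials are assumptions for illustration.

```go
package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// The only MinIO-specific parts are the endpoint and path-style addressing;
	// everything else is exactly what an app would do against Amazon S3.
	sess, err := session.NewSession(&aws.Config{
		Endpoint:         aws.String("http://localhost:9000"), // assumed MinIO endpoint
		Region:           aws.String("us-east-1"),
		Credentials:      credentials.NewStaticCredentials("minioadmin", "minioadmin", ""),
		S3ForcePathStyle: aws.Bool(true), // MinIO typically serves buckets path-style
	})
	if err != nil {
		log.Fatal(err)
	}
	svc := s3.New(sess)

	// Same PutObject call the application would make against Amazon S3.
	_, err = svc.PutObject(&s3.PutObjectInput{
		Bucket: aws.String("test-bucket"),
		Key:    aws.String("hello.txt"),
		Body:   strings.NewReader("hello object storage"),
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("stored via the AWS SDK, served by MinIO")
}
```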
Starting point is 00:06:35 Like their Snowball, for example. Our community actually likes our tools for dealing with Snowball, because you often want to be able to move data between two object storage systems, and their tools won't do that. We have seen on a number of occasions that they have been very nice to the community. So Snowball, is that, you know, effectively an external drive or something like that that you can use to load up data into S3?
Starting point is 00:07:02 Like if you want to move data into S3, or move data out, right? And the problem is that your internet bandwidth is limited. The best bandwidth you can get, don't ask about latency, the best bandwidth is FedEx or UPS. They actually load up all your data in a box. It's really a portable 2U-server type of thing, right? Actually, it's not 2U, it's a slightly different form factor, a tower model, right? They ship
Starting point is 00:07:29 the box to you, and then it actually looks like an S3-compatible object storage with some variations. So all we have to do is make our tool recognize those variations, so it looks like a proper object storage. And now you can move the data in and out of the box. So essentially it's a very portable way to move data from the edge, where the data is getting generated, or your on-prem data center. Yeah, I've done a few Snowballs, or at least I've supported the ingesting of Snowballs. So basically it is either... I think there's a USB interface and there's a 10 gigabit interface on the device itself. I'm correct with that, right, AB? Yeah, exactly.
Starting point is 00:08:11 Yeah. And then it's just, it's literally an external drive. You copy the data to the external drive. But if you want to use it in a way that you would naturally use your application, it oddly enough doesn't come with an S3 interface. Yeah, that's well said. It's a giant external drive the size of a server. Yeah, I've done some transfers, bulk transfers, in the past to Azure and Amazon, but it was pre-Snowball, so I guess I'm dating myself. So there's plenty of functionality here behind S3. I mean, A, there's multi-zone replication. B, there's encryption. C, it handles a gazillion buckets and, I don't know, a quintillion objects per bucket.
Starting point is 00:08:55 I mean, does MinIO handle all this stuff? Yep. You know, actually, when you say it like that, it sounds monumental, right? It is monumental. But if you go into the data centers and see how they actually work, and it's not only Amazon, it's Google, Facebook, everywhere you look around, how do they really build giant systems? You will find one common pattern between MinIO and them.
Starting point is 00:09:22 They actually don't build giant systems. They build many smaller systems where each system, when you say small, it's not necessarily small, but it's actually the size of a failure domain that you can handle and tolerate. So essentially, they are building lots of smaller systems endlessly. So your thousandth cluster deployment is just no different from your very first cluster. It's just many separate clusters. You turn your scalability into a provisioning problem.
Starting point is 00:09:51 Provisioning is much better understood and easier to handle than making your algorithms and data structures handle trillions and trillions of changes at a given time. That's the old-school enterprise thinking. So you mentioned the fact that you can handle clusters and failure domains and stuff like that. Is a failure domain something like a rack? Would that be a failure domain? So it varies from customer to customer. We have seen in the financial space that they want to keep a failure domain across four to six racks so they can do cross-rack erasure coding, so a total rack can go down. In some cases now I'm seeing they actually want erasure coding across data centers too, so a total data center can go down. But in general we find that a rack is actually a good unit. You have a switch at the top of the rack, and many nodes can fail inside
Starting point is 00:10:45 the rack. You choose the erasure code parity accordingly. But that way, whenever you want to scale, you no longer look at adding one drive or one JBOD unit. Those days are gone. You always add in sets of servers. Those are simply failure domains. So can we combine two concepts? One is the AWS API, and then this idea of failure domains. One common challenge that I'll see is that people want the power of S3 storage on-prem from a replication perspective, because S3 is a great protocol for data replication in general. Are you guys seeing use cases where they're using MinIO to basically expand or create a different failure domain,
Starting point is 00:11:36 but that failure domain is basically on-prem and out of the AWS data center? So that's what I actually assumed, but it didn't exactly happen like that. What I initially thought was maybe there are customers who would keep one leg on public cloud, one leg on private cloud, and mirror between both.
Starting point is 00:11:57 So they always have everything safe. But in reality, right, people actually are fairly confident about AWS not losing their data. So once you go to public cloud, you don't really think about making a copy of AWS; literally backing up or doing DR for AWS S3 on-prem sounds ridiculous to many of the users. And I would actually agree with that, right?
Starting point is 00:12:20 Because AWS S3 is pretty rock solid. The reasons they actually come to MinIO are quite different. There are two types I see. One, they don't come out. They actually like S3. They stay there, but then they put up a MinIO S3 caching gateway. They actually have an edge caching storage, the same server. It's actually the same product; instead of using the built-in erasure code, you can point it to a remote storage. And the remote storage could be S3 or Azure Blob. Wait, wait, wait, wait, wait, wait.
Starting point is 00:12:54 So not only do you have an open source S3 storage solution, but now you also have a caching solution for not only S3 storage, but Azure Blob? Yeah, actually, it's not just that. There are actually like two dozen adapters out there floating around. I will come to this and we can talk more about the gateway. But the other thing is, MinIO can act like a gateway to non-S3-compatible storage, or even to Amazon S3 itself. The primary reason people use us in front of the public cloud is to act as a cache, like an edge cache at the private cloud. That way,
Starting point is 00:13:34 the bulk of your dollars spent on AWS... like for these guys, when the application is outside the public cloud, they actually spend on bandwidth, not storage. And in between, acting like a caching storage at the edge makes it faster and makes it cheaper. That's one use case, right? The other use case I'm finding is that when they decide to come to MinIO, they actually come out of Amazon S3. They see that beyond a few petabytes, it becomes expensive. It's like staying in a hotel for a long time. And that's when they take MinIO, go to a colo, or go to some leased data center. And they want to keep control over the data and the software stack. You're talking a few petabytes?
Starting point is 00:14:19 It's like it's nothing. Actually, that's the case. That's what we are finding. I was surprised. I thought maybe when they grow past 10 petabytes, right? I was surprised to see startups coming out of Amazon to MinIO. I'm like, why would you do that? For you, time is the most precious commodity; you want to focus on productivity, right? But they always say that, for a startup, spending even 50K or 100K a month or a quarter is more expensive for them than it is for a big company.
Starting point is 00:14:53 So this is a pretty simple concept that, you know, I ran into a bit in my desire to support, like, pharmaceuticals and people researching extremely large data sets. When you're sharing data across organizations and you want the same data set, ingesting that data into a new platform is the largest expense. I mean, I've spent millions of dollars of companies' money, you know, doing connectivity to either I2 or 10-gigabit DirectConnect, et cetera, and the real cost is ingesting this data across platforms that are
Starting point is 00:15:49 best suited for the type of analytics work you want to run against it. So a couple of petabytes of data is literally nothing, because we're transferring that kind of data via Snowball. The problem is constantly ingesting and sharing that data across organizations. Yeah. So, yeah, I always thought that with Amazon S3, you know, getting the data in was free, but getting the data out was costly. But then even getting the data in, there's a bandwidth cost there, I guess, right? Well, there's a bandwidth to provision it, blah, blah, blah. Two weeks later, it's there. And the fun part there is, right, we would think, how can you possibly ingest all that data into the cloud?
Starting point is 00:16:39 And what you find is that IT, in most cases, is even slower, because they'd have to buy a hardware appliance, set it up, and go through the process. Most of the time, these are application teams or shadow IT teams. They actually don't want to ask someone and go through procurement and the whole process. They just go to public cloud for productivity reasons. And that's a bigger problem than even ingest. But it always starts small. It's not like one day they wake up and say, I want to ingest two petabytes to the public cloud now. They start with 100 terabytes or so, and then it starts to snowball, literally.
Starting point is 00:17:16 Well, it literally doesn't even start with 100 terabytes. It could be just a terabyte of data. And then, before you know it, the word gets around: hey, you know what, I was able to answer this question way quicker than I could going through IT and people. It becomes, you know, the next factor up of file sharing. Yeah. And the other thing you mentioned at Storage Field Day was that people are using MinIO in Amazon. It's bizarre. It is bizarre, and I still don't recommend it, right? And people do it. When you are open source, you
Starting point is 00:17:49 can't control it. All you can do is advise them why they're losing money doing that, putting us on top of EBS. EBS is three times more expensive than S3. Some of them are just, like... it came with GitLab, it came bundled with one of their application
Starting point is 00:18:07 stacks. Or the other reason: they say it's multi-cloud portability. But I think the bigger reason is that convenience beats security, cost, everything else in the end. I still try to educate them: go use S3. Yeah, you can't fix a process issue. If I'm conditioned to manage block storage in my operations, and this application needs object, it's literally easier or less expensive from an operations perspective to front-end it with MinIO than it is to change my operating model to support object. So, yeah, back to the functionality perspective.
Starting point is 00:18:49 A, does MinIO support, you know, object encryption? Yes. In fact, we had to support even stronger encryption than what Amazon published, because most of our deployments are growing in the enterprise space, and the early adopters came from the financial institutions. They have to be paranoid about security and encryption, and they came to MinIO because of those reasons,
Starting point is 00:19:16 the private cloud, right? And we not only did the same as Amazon, Amazon's AES-256-class algorithm, we also added ChaCha20-Poly1305, which also gave a tamper-proofing capability. We actually went full scale on encryption. And the key here is, whenever you put any security mechanism in place, if it actually causes friction, they are not going to adopt it. Anytime it is harder or slower, security again becomes a second thought. That's where we went to great lengths to improve the speed. You can do inline encryption
Starting point is 00:19:58 at high speed and still choke a 100 gigabit Ethernet switch, and without any SSL accelerators or add-on GPUs; standard Xeon CPUs are just what you need to do high-speed inline encryption. And every object is independently encrypted with a separate key.
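To make that per-object key model concrete, here is a hedged sketch using the minio-go v7 SDK's SSE-C support, where the client supplies a distinct 32-byte key for each object. The endpoint and credentials are assumptions for illustration, and note that MinIO requires TLS on the connection before it will accept SSE-C requests.

```go
package main

import (
	"bytes"
	"context"
	"crypto/rand"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
	"github.com/minio/minio-go/v7/pkg/encrypt"
)

func main() {
	// Assumed TLS endpoint and credentials; SSE-C is refused over plain HTTP.
	client, err := minio.New("minio.example.net", &minio.Options{
		Creds:  credentials.NewStaticV4("minioadmin", "minioadmin", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatal(err)
	}

	// A fresh 32-byte key for this one object.
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		log.Fatal(err)
	}
	sse, err := encrypt.NewSSEC(key) // SSE-C: client-provided per-object key
	if err != nil {
		log.Fatal(err)
	}

	data := []byte("contents of statement-2020-02.pdf")
	_, err = client.PutObject(context.Background(), "bank-records", "statement.pdf",
		bytes.NewReader(data), int64(len(data)),
		minio.PutObjectOptions{ServerSideEncryption: sse}) // encrypted server-side
	if err != nil {
		log.Fatal(err)
	}
	// Reads must present the same key via GetObjectOptions.ServerSideEncryption.
	log.Println("object stored under its own encryption key")
}
```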
Starting point is 00:20:29 And you support, like, KMIP things and that sort of thing? Actually, there we support Vault and the standard ones. You could even use Amazon's KMS to bootstrap the system. But here's the thing, right? We didn't stop at just supporting a KMS. The problem is that for every single object coming in, if we have to make a request to a Vault or something for each and every object to get a new key, at the speeds at which we are taking in objects, no KMS can handle the load. So we actually wrote our own KMS bridge. It acts like a stateless distributed key management service, and it
Starting point is 00:21:06 can handle very high load. But that system simply bootstraps itself with an external, corporate-controlled or public cloud KMS. We're talking about performance, too, but before we get to that: replication. So you could replicate objects across failure domains? So there is erasure code within the failure domain, and then there is replication across failure domains. There are two types of replication. The one that everybody is using today is basically a tool called mc mirror. And mirror is like rsync,
Starting point is 00:21:47 but it's built for large scale and continuous operation. So you just start it (probably most of the time they run it like a container) and it can mirror between object storage and file system, or object storage and object storage. It even works between two file systems. And it actually subscribes to change notifications. In MinIO, we support a Lambda compute notification; in file systems, they have something like inotify.
Starting point is 00:22:10 It looks for these changes, and that's how it knows to keep two or more sites in sync. That's what everybody is using today. We are actually pretty close to releasing a new product that's also open source. It's called RADIO. And RADIO can keep multiple sites in sync with synchronous replication, or synchronous erasure code across data centers. Wait, wait. Synchronous? Synchronous means that I don't even copy an object.
Starting point is 00:22:42 I can't tell that the object has been stored until it's actually stored in both locations, right? Correct. That is exactly right. RADIO? Yeah. Is that what it's called? It is called RADIO. We looked at a lot of names, but there is an explanation written about RADIO on the site.
Starting point is 00:23:02 It stands for redundant array of independent distributed object stores, but that's just for fun. You need a simple name. And also, RADIO feels like, like how all the receivers receive at the same time. So it has a nice theme to it. Maybe one of the things that I can't quite wrap my hands around, head around, is this concept of compatibility and
Starting point is 00:23:26 people using MinIO inside of AWS. I get that. But you keyed on an idea or concept that I can't see how you guys can get around, which is AWS alerting. When I write an object to S3, I can create an event and trigger Lambda. How are customers getting around that inability to do traditional AWS triggers, to trigger stuff like Lambda, et cetera? Yeah, good point here. So let's talk about details, right? I'll go into Lambda specifically, but starting from the S3 API itself: most people don't know, we were the first to implement the S3 V4 API, and everybody else copied our code or copied our product itself. Before us, occasionally they would talk about an S3 API, but it was really a partial
Starting point is 00:24:27 S3 V2. They would have some get-object, put-object. Mostly people supported Swift. We saw that Amazon would convince the rest of the industry to adopt S3, so we knew that that's the API to get right. And we went to great lengths: from Lambda compute to S3 Select, like a SQL API for object storage, end to end, we have all these implementations. Anything that we did not do is because Amazon got it wrong. Like, say, for example, bucket policies: Amazon also introduced ACLs. We knew that it was redundant and wrongly done, and eventually Amazon deprecated it. Now let's talk about the Lambda compute notification. We saw that that was very
Starting point is 00:25:12 powerful. In fact, before Amazon introduced it, we actually had a different implementation, because we had our own reasons; customers kept asking that they needed some kind of notification. Later on, between Amazon and us, the product guys actually exchanged notes, and we have a very similar story of why the Lambda compute notification was born and how it was born. But let's talk details here. Amazon actually sends the notification through Amazon SNS or SQS, the notification service or the queuing service. But if we are in the private cloud, we can't write to Amazon SNS or SQS. For us to be compatible, it has to be exactly like Amazon, but we have to deliver the notification to what?
Starting point is 00:25:57 And this is where, in the private cloud, customers actually use Kafka, like from Confluent, or AMQP. You know, I was surprised that one of the community members wrote MQTT, the IoT protocol, like AMQP but used in the IoT world. And with MQTT, we actually have paying customers using this in the medical space. We support a whole bunch of these message delivery systems, as well as webhooks. We can even store these events in a database, so you can query, say, show me all the checks deposited in this ATM center or this region from this time to this time. We can store these events in a database, or send them through notification systems like Kafka. We have a more sophisticated,
Starting point is 00:26:49 more flexible implementation than even what Amazon has, but it looks exactly like Amazon. You use the Amazon SDK to set up these notifications and you would just be fine. So I get the same functionality, practically, and then I can implement something like an OpenFaaS on my side to take advantage of some of the same types of serverless functions that I look for in Lambda. Yeah, actually, the community brought OpenFaaS.
Starting point is 00:27:14 They actually did a pretty nice job of tying together your Kubernetes environment with MinIO and application containers. Actually, I see that a whole bunch of code can be written as just Lambda functions, and OpenFaaS made it really easy by integrating MinIO's notification mechanism. From an application developer's point of view,
Starting point is 00:27:35 you shouldn't be building the infrastructure. You should be writing useful application code, and OpenFaaS actually receives these webhook notifications. We worked with the project from early on, and we actually have a lot of people using it.
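For a sense of the mechanics, here is a hedged sketch that consumes those bucket events directly from Go with minio-go's ListenBucketNotification, roughly what a webhook or OpenFaaS hookup gives you, but inline. The bucket name, endpoint, and credentials are invented for illustration.

```go
package main

import (
	"context"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	client, err := minio.New("localhost:9000", &minio.Options{ // assumed endpoint
		Creds: credentials.NewStaticV4("minioadmin", "minioadmin", ""),
	})
	if err != nil {
		log.Fatal(err)
	}

	// Subscribe to object-created events, e.g. "every check image deposited",
	// the ATM example from the conversation.
	events := []string{"s3:ObjectCreated:*"}
	for info := range client.ListenBucketNotification(
		context.Background(), "atm-deposits", "", "", events) {
		if info.Err != nil {
			log.Fatal(info.Err)
		}
		for _, e := range info.Records {
			// Hand off to a Lambda-style function, a Kafka producer, etc.
			log.Printf("%s: %s/%s", e.EventName, e.S3.Bucket.Name, e.S3.Object.Key)
		}
	}
}
```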
Starting point is 00:27:58 So, yeah: Lambda is there, replication is there, encryption is there. And you mentioned erasure code. So within a failure domain, what's the erasure code level? Is it a two-node failure or a two-device failure? So when we started, we sounded radical, and it was almost a little hard to digest for many. We used erasure code, and until we started doing it, everybody else was doing erasure code offline. So we were the first to do inline erasure code at
Starting point is 00:28:25 high speed. But the thing was not about inline erasure code or bitrot. The thing was this: the default, till today, is N/2 data and N/2 parity. Meaning, say, if you have 16 servers, you could lose eight servers, all the drives in them, and you wouldn't have lost data. That is very high redundancy. My point was, look, everybody else is doing replication, three copies or more, and even with this level of redundancy you're only talking about the equivalent of two copies. And to me, people's time is more valuable than server cost or drive cost. You deploy these machines and forget it; don't go troubleshooting drives and servers. That's why I left that as the default. Initially it was hard to digest for many. It's just the
Starting point is 00:29:20 perception of it, right? They did not realize that the alternative was making three copies. They were okay with HDFS or anything else making three full copies, and not okay with just the two-copies equivalent of erasure code. But now everybody understands that. I would say that for servers that are attended once in a few months, you can actually do 12 data, 4 parity. That means at any time you can lose four servers, or four drives, and you won't lose availability or data. Yeah. Well, I would think 12 plus 4 would be more than adequate for most environments.
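To see what that 12-data/4-parity scheme buys you, here is a hedged sketch using the klauspost/reedsolomon library (the Reed-Solomon implementation MinIO's erasure code builds on); the object contents and which four shards fail are invented for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// 12 data shards + 4 parity shards: any 4 drives/servers can be lost.
	enc, err := reedsolomon.New(12, 4)
	if err != nil {
		log.Fatal(err)
	}

	object := bytes.Repeat([]byte("greybeards"), 1200) // stand-in for an object

	// Split into 16 shards (12 data + 4 empty parity), then fill the parity.
	shards, err := enc.Split(object)
	if err != nil {
		log.Fatal(err)
	}
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	// Simulate a failure-domain outage: four shards gone.
	shards[0], shards[3], shards[7], shards[14] = nil, nil, nil, nil

	// Rebuild the missing shards from the twelve that survive.
	if err := enc.Reconstruct(shards); err != nil {
		log.Fatal(err)
	}
	ok, err := enc.Verify(shards)
	fmt.Println("all shards consistent after losing four:", ok, err)
}
```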
Starting point is 00:30:08 Can you talk a little bit about performance? I mean, one of the challenges to some extent with S3 is that performance is not necessarily a given. It's not very predictable. It depends on the load on Amazon, that sort of stuff. Is that a correct understanding? Maybe I should start there. So for the most part, that's actually false, right? AWS is actually fast. It was a perception created by the enterprise storage vendors who added an object storage play. They simply took traditional content-addressable storage or some kind of archival storage, and because they were slow, they simply
Starting point is 00:30:31 positioned object storage for archival, and somehow the industry bought that idea, because the enterprise IT school was basically backing up to the public cloud. So the perception was that object storage is meant for archival. But that's not true at all. If you are inside AWS, if you talk to the AWS customers, they know. Like Snowflake: imagine building a product like Snowflake on top of an archival product. It would not work, right? So what is Snowflake? I mean, since we've started there. Snowflake is essentially a data warehouse play; Amazon has an equivalent, Redshift. It's basically SQL on top of unstructured data stored on Amazon S3, in a disaggregated storage-and-compute play. In the past they relied on SAN-like systems, but it did not scale, because databases were good at querying and indexing the data, but not at storage. This is where they realized that by disaggregating storage and compute, they can become stateless, they can scale elastically and be more resilient,
Starting point is 00:31:37 and the object storage would deal with petabytes and petabytes of data. So Snowflake is actually an ultra-modern implementation of what a modern data warehouse has to look like. Now I see everybody, from Teradata to Splunk, caught up with that idea. It's the new trend in the overall data processing world. And so Snowflake is effectively a database using S3 as backend storage. Yeah, for unstructured data. It's not meant for DB2, SQL Server, or Oracle transactional data; it is meant for analytics. But SQL and unstructured data
Starting point is 00:32:12 don't go together, in my mind. It's not an oxymoron; it's worse. It's in conflict, in my mind. SQL means it's structured. That is what I also thought. Because in the early days, I would say two years before, one of my friends from Intel, probably you know him, Dave Cohen, he was the first to bring it to my notice. Intel was working with us closely, and he came from the financial space. He wanted databases to run on object storage, and I thought that was nuts, right? I was trying
Starting point is 00:32:52 to convince him that... object storage? That's crazy, right? Because a database is all about mutations and IOPS; it's a latency play. Object storage is all about throughput; it's a bandwidth play, blobs and immutability. They are completely like oil and water; they don't mix at all. And I was arguing with him that I would not make object storage an IOPS play, and that it would be bad for object storage, because it would lose its very strength. And what he argued, he was the first to convince me on the subject, and then I saw within the next two years the entire industry change in that direction. What he actually meant, and he was the one to educate me on this subject, is that object storage should stay exactly as object storage, what it
Starting point is 00:33:42 is good at. What he means is, the modern databases would pull the blob into memory, and then you do all mutations in memory. When you're ready to commit, you write the table segments back to object storage. That's actually a throughput play. So the databases will actually store the table segments, like extents, on the object storage, and all mutations are on the client side. That makes the database even faster. And from there, Amazon started realizing that the table segments are often in some JSON or CSV type format; actually, most of the enterprise bulk data is just CSV or JSON, for semi-structured or even structured data.
Starting point is 00:34:25 Then Amazon brought in predicate pushdown, where the database layer can even push the SQL query all the way down to object storage. It made a lot of sense. And we now actually see that as the killer commercial enterprise application for object storage. I'm about to go back and listen to this podcast and learn some stuff. It's pretty good stuff.
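Here is a hedged sketch of what that pushdown looks like from a client, using minio-go's SelectObjectContent (MinIO's implementation of the S3 Select API); the bucket, object, and CSV schema are invented for illustration. The SQL filter runs inside the object store, and only matching rows come back over the wire.

```go
package main

import (
	"context"
	"io"
	"log"
	"os"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	client, err := minio.New("localhost:9000", &minio.Options{ // assumed endpoint
		Creds: credentials.NewStaticV4("minioadmin", "minioadmin", ""),
	})
	if err != nil {
		log.Fatal(err)
	}

	opts := minio.SelectObjectOptions{
		// The predicate is evaluated server-side, next to the data.
		Expression:     "SELECT s.region, s.amount FROM S3Object s WHERE s.amount > 100",
		ExpressionType: minio.QueryExpressionTypeSQL,
		InputSerialization: minio.SelectObjectInputSerialization{
			CompressionType: minio.SelectCompressionNONE,
			CSV: &minio.CSVInputOptions{
				FileHeaderInfo: minio.CSVFileHeaderInfoUse, // column names from header row
			},
		},
		OutputSerialization: minio.SelectObjectOutputSerialization{
			CSV: &minio.CSVOutputOptions{},
		},
	}
	res, err := client.SelectObjectContent(context.Background(), "warehouse", "sales.csv", opts)
	if err != nil {
		log.Fatal(err)
	}
	defer res.Close()
	io.Copy(os.Stdout, res) // stream just the filtered rows
}
```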
Starting point is 00:34:48 Yeah, I'm still trying to get my understanding around it. So yeah, it's like an in-memory database that only gets loaded out, or sent, or destaged rather, to the object store when all the mutations are done. It's brought in, it's processed in memory, and then it's destaged. Here's the important point, right?
Starting point is 00:35:08 Why it connected the dots for me: previously my job was building Gluster, right? And Gluster was a file system. So I understood these problems, and I hate complexity, right? Complexity is the thing that kills scalability, everything. If a product is complex, you cannot build a business around it, right? And the part I hated the most was the POSIX API. And when I saw the S3 API and Amazon pushing it, I knew that that was the right move. Technically, I could see that Amazon got it right.
Starting point is 00:35:46 And I wished Amazon would succeed. And I knew Amazon would succeed, right? They removed the friction by actually solving one very fundamental problem in all storage systems. This is the heart of the problem, and it explains why the database guys are finally understanding this correctly. Storage systems in the past somehow took an API like POSIX, which was never designed for network storage, not even for block, right?
Starting point is 00:36:12 Why would you want to make mutations across the network? If we both are changing the data across machines, across the network, the storage system has no idea how to coordinate us and keep us in sync. The APIs, while they look so rich, are actually quite crude, and they don't have the sophistication that is needed. And if you want to do it correctly, you have to do all operations across the network synchronously, and these are all very chatty, small operations. You just cannot solve this problem. Any SAN or NAS vendor, if they claim to be fast, it's a bunch of hacks, and it just does not scale at the scale that modern applications
Starting point is 00:36:52 need. The only way to solve that problem is to reduce the functionality, the scope of what a storage system should do. Storage systems should only focus on keeping the data durable, ultra durable, and on scaling. Keeping it simple is the solution to all of the storage problems, and Amazon understood this really well. But if you keep it simple, how do I then do the things I did in the past? If you have to mutate the data, don't mutate the data on the storage machine. That's the heart of the problem. You have to be willing to rewrite your application, meaning you take the data, mutate it in memory, and then, when you are ready... you know better when the data is ready to be committed.
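A hedged sketch of that pattern (pull the blob down, mutate in memory, write the whole object back when ready) using the minio-go SDK; the bucket, object, and row data are invented for illustration.

```go
package main

import (
	"bytes"
	"context"
	"io"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	ctx := context.Background()
	client, err := minio.New("localhost:9000", &minio.Options{ // assumed endpoint
		Creds: credentials.NewStaticV4("minioadmin", "minioadmin", ""),
	})
	if err != nil {
		log.Fatal(err)
	}

	// 1. Pull the immutable blob (a table segment, say) into memory.
	obj, err := client.GetObject(ctx, "warehouse", "segment-0001", minio.GetObjectOptions{})
	if err != nil {
		log.Fatal(err)
	}
	segment, err := io.ReadAll(obj)
	if err != nil {
		log.Fatal(err)
	}

	// 2. All mutations happen client-side; the store never sees partial writes.
	segment = append(segment, []byte("\nrow-42,eu-west,117.50")...)

	// 3. Commit by writing the whole segment back: one throughput-friendly PUT.
	_, err = client.PutObject(ctx, "warehouse", "segment-0001",
		bytes.NewReader(segment), int64(len(segment)), minio.PutObjectOptions{})
	if err != nil {
		log.Fatal(err)
	}
}
```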
Starting point is 00:37:31 Okay, so the keys and various indexing and all that stuff. So you mentioned JSON and CSV. So there is some structure on top of this unstructured data. And so they're using, let's say, CSVs and stuff like that and saying, okay, this column is key one and this column is key two, and providing a separate index blob? It varies from database to database.
Starting point is 00:37:54 Let's take Splunk, for example, right? They actually write the indexed, full-text index data. They even have, like, a Bloom filter, and it's exactly all the table segments and the index, everything. They did the compute on the hot tier. They call it a hot tier, but it's essentially a cache, and that storage is local drives attached to the Splunk worker nodes; they call them the indexers, right? But once the indexing is done, the data and the index are actually written as
Starting point is 00:38:27 just segments, a collection of objects, to an object store, and multiple indexers share it. So if you look at the Splunk backend, the data is actually in a Splunk proprietary format. But with Teradata Vantage, for example, customers would actually do schema-on-read, and a lot of customers like that idea. They want to capture the data into MinIO through Kafka or Fluentd, something like that. And then they can have Teradata or Vertica or Spark or Presto; multiple systems can coexist. They pull the data from a common repository. There, they would like to keep the data in a CSV or JSON format.
Starting point is 00:38:59 So MinIO supports, fully supports, Snowflake kind of functionality? If we were able to run Snowflake on-prem? Today, Snowflake is not there on-prem, but actually I find that Presto, Splunk, Vertica, Teradata, everybody has native object storage support. They have been working with us for a while, and there are actually people using it today. Okay, so you don't necessarily need to do Snowflake, but you've got capabilities similar to that. If you actually wanted to use Snowflake in Amazon, you could potentially have
Starting point is 00:39:36 MinIO storage behind it. Yeah, I think that goes back to the expense of using EBS as the backend again. I'm really intrigued with, you know, using this to back, like, a Redis or something like that, where I'm getting cheap and deep storage on the backend, but I'm getting the advantages of my in-memory layer, and I'm using, you know, my persistent layer of RAM, or my persistent storage, to mitigate not having fast disk on the backend. I can see, from an architect's perspective, this is a fun, fun solution to play around with. You actually touched upon an interesting topic,
Starting point is 00:40:19 and I actually see that's where the future is headed. Optane-type memory, like persistent memory, is actually best suited on the client side, acting as a cache, so they can have a giant amount of persistent cache. And Redis APIs can get even simpler now, essentially because of persistent memory. So all the mutations will happen on Optane-type memory, which actually fits very nicely into this modern cloud-native architecture.
Starting point is 00:40:45 Yeah, it makes sense. If you have petabytes of data and it's typically cold, but you want to do a specific analysis against that set, you can load the sets you want off of this cold or near-cold object storage, do it, and then not touch the data for another one or two years. It's kind of the promised land of in-memory databases. This is like Hadoop second generation or something. It's like third-generation Hadoop.
Starting point is 00:41:16 So we were at SFD19, and they were talking about how Hadoop is all dead. I said, well, somebody's got to replace it, and this is what's replacing it. So it's using object storage and a SQL front end. More like Spark and object storage. Yeah, yeah. But there are other solutions. You know, I'm running out of time here, but there's a couple of things I wanted to mention. And you made mention of, I guess, 20,000 stars on your GitHub.
Starting point is 00:41:39 So your open source solution is available as a GitHub repository, I assume, right? Yeah. So what does 20,000 stars mean? I mean, I think I have four stars on one of mine or something like that. There is a way to interpret that, right? The 20,000 stars by themselves mean nothing, right? The important
Starting point is 00:42:05 part is that, say, if I wrote a popular JavaScript library, chances are I would actually get 50,000 stars, right? But what matters is that 20,000 stars for an object storage is probably the first time we are seeing this: object storage getting that kind of traction beyond IT, into the whole application ecosystem itself. And because MinIO is made easy, it's just like how in the early days only a few privileged people could have access to the Unix machines and Windows, and then everybody could install
Starting point is 00:42:40 and run their own web server and application stack. It's happening in the storage industry now: storage is becoming simpler and purely software-defined, and an object storage system is much simpler than a database, right? So what's wrong with that, in terms of getting the adoption? And it's combined with Kubernetes-type players making it easier and easier. So now, back to the GitHub stars: that kind of star count for an object storage is actually high. And to me, it's not the stars alone that matter. The reason I picked stars is that they're very hard to cheat, very hard to fake. Like Docker: someone can actually repeatedly do pulls, so you can fake that, right? What we see is a general trend across all of the metrics,
Starting point is 00:43:27 whether it's Slack members or the number of pull requests; everything, as a general reflection, shows very high numbers. So I tend to believe that the GitHub stars are actually very real for us. But also, it's not the GitHub stars themselves I care about; it's the rate of growth that we care about. And the rate of growth is significant. And so the other thing I wanted to ask is, how often do you release a new version of MinIO? Is it on a yearly basis? It sounds a little scary. We actually make a release every week.
Starting point is 00:44:00 Every week? Every week. Sometimes even multiple. You do realize this is an enterprise storage podcast. Yeah. Yeah, but, you know... so Amazon does it weekly, if not more often, right? Every one of these players in the cloud is developing and incrementally enhancing their solutions all the time. And if our customers are running like the old-school enterprise, they are going to lose out to Amazon, right? We have to show that your on-prem storage can be cheaper and easier to run than public
Starting point is 00:44:30 cloud. And if I can show that, then I win. And we don't have a choice, right? The latest version should be the most stable version. And software is only stable when there are no more users left. I've been there, done that. That's not a fun place to be. All right, well, listen, Keith,
Starting point is 00:44:57 any last questions for AB before we sign off? No, I think I have a bunch of homework to do, starting with the Storage Field Day 19 videos. This is pretty interesting. I was there. I still didn't learn half this stuff. AB, is there anything you'd like to say to our listening audience before we sign off? Actually, maybe we can give you some useful links to blog posts, so listeners can read more later if they're interested. But the point is that some of these things are actually very, very new to many people, because it's happening at such a fast pace. And I have actually seen many storage experts finding these things surprising, right?
Starting point is 00:45:46 Oh, God. I'm a storage expert. I'm very surprised. And there is a lot of confusion in the industry, because everybody tends to read from sources that are mostly driven by commercial marketing, right? And here, these kinds of podcasts are actually pretty useful for people to get educated. The questions were great, because this is not about some vendor comparison or vendor survey, right? You got straight into the heart of the topics that people are curious to learn about.
Starting point is 00:46:22 I enjoyed it. Good, good. Well, this has been great. Thank you very much, A.B., for being on our show today. Thank you for having me. Next time, we will talk to another system storage technology person. Any question you want us to ask, please let us know. And if you enjoy our podcast, tell your friends about it.
Starting point is 00:46:38 And please review us on iTunes and Google Play and Spotify, as this will help get the word out. That's it for now. Bye, Keith. Bye, Ray. And bye, AB. Thanks, Ray.
