Storage Developer Conference - #137: Caching on PMEM: an Iterative Approach
Episode Date: December 17, 2020...
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts. You are listening to SDC Podcast, Episode 137.
Hi, everyone. Today, we are going to talk about how Twitter explores caching with PMEM in an iterative approach.
My name is Juncheng Yang. I'm a third-year PhD student at CMU studying caching and storage systems.
I have been working with Yao at Twitter since earlier this year on how to use PMEM efficiently for caching.
Hi, everyone. My name is Yao. I'm a software engineer, and I've been with Twitter for almost a decade.
And I spend a considerable amount of time working on caching systems.
This project is a collaboration between Intel and Twitter since the end of 2018, and it is still ongoing.
With intro out of the way, let's jump into the talk.
As a software engineer, I really enjoy observing and learning about new hardware trends as they present opportunities to reimagine software design and architecture.
However, writing good software that takes full
advantage of new hardware takes time. It requires software owners with intimate understanding of
their problem domain to be comfortable and confident with their knowledge of the new
hardware and have the opportunity to muse on the many possible combinations of the two sides.
In most businesses, the opportunity to do a
blue sky or clean slate project is rare, while there is almost always room for incremental
improvements. So what we hope to highlight with our case study here is how to create not just a
goal, but also a path to keep evolving the software with the hardware with a solid productionization
plan pretty much from the beginning and never lose sight of the business goals.
Here's how the talk is laid out. First, I want to talk about the basic considerations that decide the shape and the goals of the project. Then we will spend the majority of the time talking about the several iterations we have gone through in putting cache on PMEM. Finally, we will talk about the lessons we have learned.
First, I want to lay down why we wanted to do it, how we think this technology can help Twitter, and the constraints we operate under.
An overview of caching at Twitter is probably helpful to give context.
Twitter has over 300 mostly single-tenant cache clusters in production. In aggregate,
we have tens of thousands of instances, and they span over many thousands of hosts
in multiple data centers. Our jobs are mostly small. We have a few cores and a few gigs of memory per instance,
although there are some notable outliers with much bigger heaps.
The largest single-tenant cluster has a max QPS of 50 million.
And most of our caches do uphold a pretty strict SLO,
which has a P999 under five milliseconds end
to end.
And many of our services actually come to depend on that.
So there are a few things that are worth noting.
Number one is that at Twitter, cache is really mission critical.
Whenever cache is down, there's a good chance that the site is down.
The second thing is, although cache is not the largest service at Twitter, it does take up a fair amount of resources.
So anything that optimizes cache efficiency is likely to reduce cost for the business.
Finally, because we have so many instances, operational burden is relatively high.
So anything that allows us to do operations a little bit easier and faster is a welcome change.
How did we envision PMEM to help cache?
First, the higher storage density per DIMM means that we can put more data into each instance.
So if our clusters are memory bound, which is true for the majority of them,
that means we can reduce TCO by reducing the number of instances in each cluster.
On the flip side, we may choose to cache a larger fraction of the working set and improve the hit rate. And this has some secondary benefits, such as avoiding slower and more expensive requests to the storage backend, as well as improving the scalability
and reliability of the site as a whole. The other aspect of persistent memory is its durability.
And this really provides an opportunity for us to reimagine what cache operations should look like.
Previously, we relied on spacing out cache restarts and on organic traffic to warm up caches. But with durable data, what we can do is achieve graceful shutdown and actively rebuild state after reboot. This means operations can be done much faster, which shortens the maintenance window and improves data availability during maintenance, a window which tends to make the site a little more vulnerable due to the lack of data in cache.
As a mission-critical service, there are a number of strings attached to cache.
One aspect is we want the new changes to be highly maintainable.
We don't really want to fork the existing codebase. We want them to live
in the same codebase. We also want to retain the same high-level APIs so we can minimize the change
exposed to the rest of the logic. The other aspect is we want the change to be highly operable, because we imagine that even after introducing PMEM in the near future, for a long time we will see PMEM-backed and DRAM-backed caches coexist.
This means we want to be able to run either DRAM-backed or PMEM-backed caches based on configuration.
Also, since our customers have come to depend on our published SLO, we would like PMEM-backed caches to have the same predictable performance that we have with DRAM-backed caches. With all the foundations laid out,
we decided to take an iterative approach. Here's the principles that we try to follow.
Number one is to always show progress. You can call it a survival instinct, but big projects that are stalled often get canceled,
and I've seen it with my own eyes. And because we're dealing with something new, we would like
to be flexible. That means we will try to identify issues and maybe modify future plans based on what
we have learned so far. Also, we would like to gain some more confidence as we go along. This means we
would like to verify our various hypotheses based on the results that we have seen and stay within
the constraints as much as possible during all the iterations. And here's the plan. To meet our
constraints, the best option for us really is to use a modular caching framework.
Here in the bottom right corner, I'm showing an architectural diagram of Pelikan, which is a modular cache framework developed at Twitter.
And it has many components, and it can behave as either memcached or Redis based on how the binary is structured. But for our talk today,
really the relevant part is the data store module, which hides all the complexity of the underlying
storage and presents a very simple key value interface to the rest of the system, as you may expect in systems like Memcached or Redis.
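To make that interface concrete, here is a minimal, hedged sketch of the kind of key-value API a data store module might expose, with a toy array-backed store standing in for the real slab or segment storage. The names and structures are illustrative only and are not the actual Pelikan API.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ITEMS 4

struct kv_item { char key[32]; char val[128]; int used; };

/* toy backing store; the real module sits on top of slabs or segments */
static struct kv_item store[MAX_ITEMS];

/* The rest of the server only sees get/set/delete; where the bytes live
 * (DRAM or PMEM) is hidden entirely behind this interface. */
static const char *kv_get(const char *key)
{
    for (int i = 0; i < MAX_ITEMS; i++)
        if (store[i].used && strcmp(store[i].key, key) == 0)
            return store[i].val;
    return NULL;
}

static int kv_set(const char *key, const char *val)
{
    int slot = -1;
    for (int i = 0; i < MAX_ITEMS; i++) {
        if (store[i].used && strcmp(store[i].key, key) == 0) {
            slot = i;                       /* overwrite existing key */
            break;
        }
        if (!store[i].used && slot < 0)
            slot = i;                       /* remember first free slot */
    }
    if (slot < 0)
        return -1;                          /* full; a real cache would evict */
    snprintf(store[slot].key, sizeof(store[slot].key), "%s", key);
    snprintf(store[slot].val, sizeof(store[slot].val), "%s", val);
    store[slot].used = 1;
    return 0;
}

static void kv_delete(const char *key)
{
    for (int i = 0; i < MAX_ITEMS; i++)
        if (store[i].used && strcmp(store[i].key, key) == 0)
            store[i].used = 0;
}

int main(void)
{
    kv_set("user:42", "hello");
    printf("get -> %s\n", kv_get("user:42"));
    kv_delete("user:42");
    printf("after delete -> %s\n", kv_get("user:42") ? "hit" : "(miss)");
    return 0;
}
```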
And we want to do a few rounds of both testing and development. First, we just want to have a
sanity check of, does PMEM work with an unmodified cache at all? After that, we would like to do the
minimum amount of changes needed to really explore the durability aspect of the persistent memory.
And finally, with all the insights and knowledge we have gained along the way, we want to ask: what is the right design for a cache that is PMEM-oriented, or at least PMEM-friendly? We came up with a test design before we started doing anything, and we really
tried hard to make it look like production caches. This is why we have a broad range of object sizes
that are on the smaller side. Also, we have a much higher connection count per instance,
because that reflects what the topology is like in Twitter's production.
And the focus of our tests is on the impact of PMEM versus DRAM in terms of throughput and latency. We were also curious whether memory mode and App Direct mode made a difference other than the fact that memory mode was not durable. And the things that we care most about on top of these are whether things change based on how much of the dataset resides on PMEM, as well as bottleneck analysis, which gives some insight into how the system is behaving and can also point out what future direction we should be pursuing. The first iteration of the test was very simple
because it didn't require any code changes,
and it was very cheap to run,
especially for Twitter,
considering Intel did all the work.
And this test was run on Intel's hardware,
and the notable thing about their configuration
is that they have a fully populated system
with 12 channels of PMEM in a 2-2-2 configuration.
So the way to read this chart on the right,
first we should look at the latencies at P999
and note that throughout the test,
the latency stayed within target,
which is 5 milliseconds at P999.
And the other thing to note is that in terms of throughput,
there is a drop when we go to two kilobyte value size.
And this is largely due to the fact that once you go over one MTU, it takes more than one network packet to transfer the payload.
But within the one kilobyte range of value size, throughput actually remains remarkably consistent between different value sizes as well as different data
set sizes. And remember, in memory mode, the larger the dataset, the larger the proportion of the data we access that actually comes out of PMEM. So this shows excellent scalability in terms of cache performance when it comes to PMEM.
The good results of the first iteration did not come as a surprise, but it was definitely reassuring. With that, we're ready to move into the next phase, where we make minimal changes so that we can use PMEM as a durable store. And for that, we need to modify the module I mentioned earlier, the data store module within Pelikan. We added a new abstraction called datapool, which gives the key-value interface an option to choose which medium it puts the data onto.
One code path preserves the previous logic, where allocation happens in DRAM.
So there is no change if this configuration is used.
In addition, we introduced a new configuration option to go to file-backed PMEM.
For that code path, we build our logic on top of PMDK, which greatly simplifies the amount of code we need to write in the datapool itself. Overall, this change used no more than 300 lines of C code, which is very minimal compared to the overall volume of the code. And because we want to keep the changes small,
we only put slabs onto persistent memory when it is used.
And the hash table, because it's full of pointers to DRAM locations, remains in DRAM and remains
unchanged.
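As a rough illustration of what such a datapool abstraction could look like, here is a hedged sketch built on PMDK's libpmem: configuration (here, whether a file path is given) decides whether the pool comes from malloc or from a file-backed PMEM mapping. The struct and function names are illustrative, not the actual Pelikan code; the file path you would pass is assumed to sit on a DAX-mounted PMEM filesystem. Build with -lpmem.

```c
#include <libpmem.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct datapool {
    void  *base;     /* start of usable memory */
    size_t size;     /* usable bytes */
    int    mapped;   /* 1 if pmem_map_file() was used, 0 if plain malloc() */
    int    is_pmem;  /* set by libpmem: 1 only for true PMEM mappings */
};

/* If path is NULL, fall back to DRAM; otherwise map a file on a
 * PMEM-aware (DAX) filesystem through libpmem. */
static int datapool_open(struct datapool *dp, const char *path, size_t size)
{
    memset(dp, 0, sizeof(*dp));
    if (path == NULL) {
        dp->base = malloc(size);
        dp->size = size;
        return dp->base ? 0 : -1;
    }
    size_t mapped_len = 0;
    dp->base = pmem_map_file(path, size, PMEM_FILE_CREATE, 0600,
                             &mapped_len, &dp->is_pmem);
    if (dp->base == NULL)
        return -1;
    dp->size = mapped_len;
    dp->mapped = 1;
    return 0;
}

static void datapool_close(struct datapool *dp)
{
    if (dp->base == NULL)
        return;
    if (dp->mapped)
        pmem_unmap(dp->base, dp->size);
    else
        free(dp->base);
    dp->base = NULL;
}

int main(void)
{
    struct datapool dp;
    /* pass a PMEM-backed file path instead of NULL to exercise the PMEM path */
    if (datapool_open(&dp, NULL, 1 << 20) != 0) {
        perror("datapool_open");
        return 1;
    }
    strcpy((char *)dp.base, "slab data would live here");
    if (dp.is_pmem)
        pmem_persist(dp.base, 64);  /* flush stores to the persistence domain */
    printf("pool of %zu bytes, is_pmem=%d\n", dp.size, dp.is_pmem);
    datapool_close(&dp);
    return 0;
}
```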
Now that we have the new changes, we went back to testing Intel's equipment again.
This configuration is a little different from the memory mode
configuration with fewer jobs, so the throughput numbers are not directly comparable. But first,
the latency stays within SLO, and second, we see the same very consistent results across a broad
set of value sizes as well as different dataset sizes. So this means that in terms of performance and scalability, App Direct mode checks out for running cache on PMEM.
Now that we're putting PMEM in App Direct mode,
the more interesting question to ask is,
what are we going to do with the durability?
Intel helped us make a few changes,
which allowed the binary to load data files from PMEM
and reconstruct the hash table once it finished booting.
So it could start serving the data that was previously residing on PMEM after the reboot.
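Here is a hedged sketch of what such a rebuild pass might look like: walk the item data still sitting in the PMEM-backed pool and re-index every live item into a fresh DRAM hash table. The on-media record layout and the hashtable_put() stand-in are hypothetical, for illustration only; the point is that keys and values never move, only DRAM pointers are rebuilt.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE 64

struct pmem_item {          /* hypothetical persisted record header */
    uint8_t  occupied;      /* 1 if this chunk holds a live item */
    uint8_t  klen;          /* key length */
    uint16_t vlen;          /* value length */
    char     data[];        /* key bytes followed by value bytes */
};

/* stand-in for the real DRAM hash table insert */
static void hashtable_put(const char *key, uint8_t klen, struct pmem_item *it)
{
    printf("re-indexed key %.*s -> item at %p\n", klen, key, (void *)it);
}

/* Walk the pool in fixed-size chunks and re-index every live item. */
static void rebuild_hashtable(char *pool, size_t pool_size)
{
    for (size_t off = 0; off + CHUNK_SIZE <= pool_size; off += CHUNK_SIZE) {
        struct pmem_item *it = (struct pmem_item *)(pool + off);
        if (!it->occupied)
            continue;
        hashtable_put(it->data, it->klen, it);
    }
}

int main(void)
{
    /* simulate a pool that survived a reboot (DRAM here, PMEM in reality) */
    static char pool[3 * CHUNK_SIZE];
    struct pmem_item *it = (struct pmem_item *)pool;
    it->occupied = 1; it->klen = 7; it->vlen = 5;
    memcpy(it->data, "user:42value", 12);

    rebuild_hashtable(pool, sizeof(pool));
    return 0;
}
```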
And we followed up with two tests.
In the first test, we tried to reconstruct the hash table for about 100 gigabytes of slab data.
And it completed in about four minutes.
In the second test, we were doing 18 such reconstructions concurrently, and the completion
time was only a little bit longer at five minutes. So these results told us two things.
First is, compared to the maintenance schedule we have today, where we wait typically 20 minutes
between restarting two instances of cache
belonging to the same job,
this will allow us to speed up the schedule by one to two orders of magnitude
if we keep the heap at its current size, which is a few gigabytes.
On the other hand, of course, to really take advantage of such a change
requires other architectural changes to come with it.
For example, we will have to have a solution for all the in-flight writes when the binary is finishing its rebuild.
At this point, Intel offered to give us a few samples of PMEM to be tried on our next generation platform developed by the Twitter hardware engineering team.
So we decided we would do more or less the same experiments that Intel did for us so far and observe if the results agree. And we were quite surprised by the numbers we saw in memory
mode running a pretty much unaltered cache, because when the PMEM occupancy rate was high, at 40 or 75%, with large datasets, the tail latency deteriorated really quickly.
Looking a little closer at our configuration and comparing it to Intel's,
we realized the main difference here is that while we have a higher total persistent memory capacity, we installed it as fewer, larger DIMMs.
So here we have four channels as compared to Intel's 12, and that reduction in persistent
memory bandwidth really mattered in our configuration here.
But not all hope is lost.
Next, we tested the new cache in App Direct mode.
And to our relief, the results are very much in line with what we were expecting based on Intel's results,
despite Twitter's platform only having four channels of PMEM compared to 12 and having a higher number of connections.
And you can see that our P999 Max came down more than 10x compared to in memory mode.
So this clearly sends a message about which mode we should operate our future caches on,
even if we only want to use it for the higher memory density.
And that concluded the first two iterations of our attempt to put cache on PMEM.
I want to take a moment to reflect on what we have learned so far.
First is the bottleneck.
Most people who have worked a lot with cache know that cache's bottleneck is primarily
inside the kernel network stack.
But it does seem possible to put that bottleneck on PMEM, especially if the number of channels is very small.
And when it comes to memory mode versus App Direct mode, it's clear that under constrained configurations, App Direct mode offers far more predictable performance when it comes to PMEM access.
Mostly because the programmer is allowed to use the memory bandwidth to PMEM and DRAM more judiciously by separating traffic.
And the code change to do this is quite modest, I think due to two factors.
First is that the modular design of Pelikan really localizes all the related code changes into one module.
The other is by using PMDK, most of the heavy lifting is already done for us
through a few simple APIs. So the amount of change we need to introduce directly into the application
is somewhat limited. Also, based on all the insight we have gained about how the application behaves on PMEM, we think we can improve things a little bit further. One is that a lot of the memory traffic seems to be targeting metadata.
So we should think about how we lay out metadata between PMEM and DRAM and hopefully direct
most of the high traffic area to DRAM.
The other is that it's perfectly reasonable to use pointers when everything is laid out in DRAM. However, the use of pointers is not particularly compatible with durable storage, especially persistent memory. Next, Juncheng will talk about how we introduce much more substantial changes to the storage module to work better with persistent memory.
Thank you, Yao. Now, let me talk about how we design a new storage module for Pelikan.
Before I go into the design,
let me talk about something else.
First, I want to ask a question.
What is PMEM good or bad at?
Think about this for one second.
As has already been pointed out in many studies,
PMEM is good at sequential and large accesses, while small and random accesses are not good for PMEM.
So my second question is,
what is a cache's memory access pattern?
If we look at an in-memory cache closely, you will see that it has a lot of random reads and random writes.
So at first glance, you may say PMEM and cache are not compatible, right?
But wait a second, does this remind you of anything?
It reminds me of the days when spinning hard drives were used for storage.
In order to fully utilize the performance of spinning hard drives, people designed all kinds of new data structures, for example, log-structured file systems and log-structured key-value stores. A famous example is the LSM-tree database, which is also log-structured to utilize the sequential write performance of either spinning disks or SSDs.
Okay, you may ask, can we use the same design here? The answer is not really, because there are multiple sources of random memory accesses, and we cannot directly use these data structures in caching because the requirements of caching are different. So where are these random memory accesses? There are two sources: one is the hash table, and the other is object storage. Now let's look closely at where the random memory accesses are
coming from. The first one is the hash table. In-memory caching uses a hash table to index keys, so key lookups are very fast. Most in-memory caches, like Memcached and Pelikan, use an object-chained hash table, which uses object chaining to resolve hash collisions.
So, for example, we have three objects hashed to the same bucket,
and we chain these three objects into one hash chain.
When we do a lookup, we find this bucket and check each object on the chain to see whether it is the object we are looking for. So it's easy to see there are random memory accesses, I mean random reads, here, right? Because every time you go down
the object chain, it is a random read. So that's why we have random reads.
Then how about writes? We have random writes because when we want to delete an object, say the second object in this bucket, we need to update the pointer in the first object so that it points to the third object. This update is a random write. That's why a hash table has random writes.
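To make those random accesses concrete, here is a hedged sketch of an object-chained hash table in C; it is illustrative, not Memcached's or Pelikan's actual code. Every hop down a chain dereferences a pointer into an arbitrary memory location (a random read), and unlinking an object rewrites the pointer in whichever object precedes it (a random write).

```c
#include <stdio.h>
#include <string.h>

#define NBUCKETS 4

struct item {
    struct item *h_next;    /* next object in this bucket's chain */
    const char  *key;
    const char  *value;
};

static struct item *buckets[NBUCKETS];

static size_t hash(const char *key)        /* toy hash, not production */
{
    size_t h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h % NBUCKETS;
}

/* Lookup: every hop down the chain is one random read. */
static struct item *ht_lookup(const char *key)
{
    for (struct item *it = buckets[hash(key)]; it; it = it->h_next)
        if (strcmp(it->key, key) == 0)
            return it;
    return NULL;
}

static void ht_insert(struct item *it)
{
    size_t b = hash(it->key);
    it->h_next = buckets[b];
    buckets[b] = it;
}

/* Delete: unlinking rewrites the h_next pointer of the predecessor (or the
 * bucket head) -- a random write into whichever object precedes the victim. */
static void ht_delete(const char *key)
{
    for (struct item **pp = &buckets[hash(key)]; *pp; pp = &(*pp)->h_next) {
        if (strcmp((*pp)->key, key) == 0) {
            *pp = (*pp)->h_next;    /* the random write */
            return;                 /* a real cache also recycles the chunk */
        }
    }
}

int main(void)
{
    struct item a = { NULL, "user:1", "v1" }, b = { NULL, "user:2", "v2" };
    ht_insert(&a);
    ht_insert(&b);
    printf("%s\n", ht_lookup("user:2")->value);
    ht_delete("user:2");
    printf("%s\n", ht_lookup("user:2") ? "found" : "gone");
    return 0;
}
```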
Besides the hash table, the other source of random memory accesses is slab-based memory management.
Object writes, expirations, deletions, and evictions all cause random reads and writes.
Now, what is slab-based memory management?
Slab-based memory management cuts the memory into fixed-size slabs, where each slab is a one-megabyte chunk. It further divides each slab into smaller chunks, and objects are stored in these smaller chunks, as we show in this figure. Different slab classes have different chunk sizes; for example, this slab class stores objects up to 96 bytes.
Okay, now let's see how this slab storage causes random writes.
It causes random writes because every time an object is deleted or expired, you need to free its space, that is, the chunk, and add that chunk to the slab free queue so it can be reused later.
Adding it to the slab free queue requires at least two random writes.
First, you need to update the object metadata to mark the space as empty.
Second, you need to update the slab free queue metadata: you need to indicate that this slab has one more free chunk, and the queue needs to point to this item. That's why you have random writes. So that's the main source of random writes.
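Here is a hedged sketch of that per-object bookkeeping, with illustrative structures rather than the actual slab allocator: freeing one chunk touches two scattered locations, the chunk's own metadata and the slab class's free-queue metadata.

```c
#include <stddef.h>
#include <stdint.h>

struct chunk {
    uint8_t       in_use;     /* object metadata: live or free */
    struct chunk *free_next;  /* intrusive link for the free queue */
    /* key/value bytes would follow here */
};

struct slab_class {
    struct chunk *free_head;  /* head of this class's free-chunk queue */
    uint32_t      nfree;      /* free-queue metadata */
};

/* Freeing one chunk means two scattered writes: one into the chunk itself and
 * one into the slab-class metadata. Done per object, on every delete or
 * expiration, this is a steady stream of random writes. */
static void chunk_free(struct slab_class *sc, struct chunk *c)
{
    c->in_use = 0;                 /* random write #1: object metadata */
    c->free_next = sc->free_head;
    sc->free_head = c;             /* random write #2: free-queue metadata */
    sc->nfree++;
}

int main(void)
{
    struct slab_class sc = { NULL, 0 };
    struct chunk c = { 1, NULL };
    chunk_free(&sc, &c);
    return (int)(sc.nfree != 1);   /* exits 0 once the chunk is queued */
}
```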
Now let's see how the Pelikan slab module optimizes for PMEM. Pelikan uses slab-based eviction, which performs batched evictions without updating metadata for each object. Compare this to Memcached, which performs eviction object by object, so each eviction needs to update the object metadata and the slab metadata.
In Pelikan it's batched, so you don't need to update each object each time, which avoids a lot of random writes.
Moreover, since we do slab-based eviction, after we evict one slab, writes to this slab are sequential, which is very important. So slab eviction improves the slab module's performance on PMEM a lot.
However, this is not enough.
This does not resolve the problem of object expirations and object deletions.
When an object expires, we still need to move its chunk into the free queue, and similarly for object deletes.
That's why we have a new design called Segcache.
Segcache is a segment-structured cache that has three components. The first one is TTL buckets. The second one is the object store. The third one is a new hash table.
The TTL buckets are used to facilitate efficient and proactive TTL expiration. This is because we observe that TTLs are widely used in in-memory caching, and being able to remove expired objects from the cache efficiently is very important to a cache's efficiency. That's one important improvement of Segcache, but that part is not related to PMEM.
The second component is the object store, which is where the objects are stored. Instead of slabs, we use segments, where a segment is a small log storing objects written at a similar time with approximately the same TTL. In other words, objects in one segment are similar: they share the TTL and the creation time. And each segment is append-only because it's a log.
The third component is the hash table. Instead of using an object-chained hash table, we use a new technique called bucket chaining. We have eight slots for each hash bucket, inside one cache line, and we store seven items in each hash bucket, because the first slot is a bucket information slot. Okay, so that's the new hash table.
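As one way to picture the bucket-chained table, here is a hedged sketch: each bucket is eight 8-byte slots, the first holding bucket information and the remaining seven holding compact item entries (here a key tag plus a segment id and an offset into it). The bit layout, hash, and helper names are illustrative, not Segcache's real encoding; a real implementation would verify the full key in the segment when a tag matches.

```c
#include <stdint.h>
#include <stdio.h>

#define SLOTS_PER_BUCKET 8
#define NBUCKETS         1024

struct bucket {
    uint64_t info;                        /* slot 0: bucket information */
    uint64_t slot[SLOTS_PER_BUCKET - 1];  /* slots 1..7: item entries */
};

static struct bucket table[NBUCKETS];

/* pack a 16-bit key tag, a 24-bit segment id and a 24-bit offset in one slot */
static uint64_t make_entry(uint16_t tag, uint32_t seg_id, uint32_t offset)
{
    return ((uint64_t)tag << 48) |
           (((uint64_t)seg_id & 0xffffff) << 24) |
           ((uint64_t)offset & 0xffffff);
}

static uint64_t hash64(const char *key)   /* toy FNV-1a, not production */
{
    uint64_t h = 1469598103934665603ull;
    while (*key) { h ^= (unsigned char)*key++; h *= 1099511628211ull; }
    return h;
}

/* Inserts and deletes only touch this DRAM-resident bucket, never the object
 * stored in the PMEM segment -- which is what keeps update and delete traffic
 * off PMEM. */
static int bucket_insert(const char *key, uint32_t seg_id, uint32_t offset)
{
    uint64_t h = hash64(key);
    struct bucket *b = &table[h % NBUCKETS];
    for (int i = 0; i < SLOTS_PER_BUCKET - 1; i++) {
        if (b->slot[i] == 0) {
            b->slot[i] = make_entry((uint16_t)(h >> 48), seg_id, offset);
            return 0;
        }
    }
    return -1;  /* bucket full; a fuller design would chain extra buckets */
}

static int bucket_delete(const char *key)
{
    uint64_t h = hash64(key);
    struct bucket *b = &table[h % NBUCKETS];
    uint16_t tag = (uint16_t)(h >> 48);
    for (int i = 0; i < SLOTS_PER_BUCKET - 1; i++) {
        if (b->slot[i] != 0 && (uint16_t)(b->slot[i] >> 48) == tag) {
            b->slot[i] = 0;  /* one DRAM write, nothing written on PMEM */
            return 0;
        }
    }
    return -1;
}

int main(void)
{
    bucket_insert("user:42", 3, 128);
    printf("delete returned %d\n", bucket_delete("user:42"));
    return 0;
}
```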
Now let me talk about what Segcache has achieved. In terms of optimization for PMEM, Segcache transforms all random PMEM writes into sequential writes; notice that this is all of them. Second, Segcache moves small random metadata reads into DRAM. Third, Segcache uses PMEM only as an object store, so PMEM does not hold any updatable metadata. So for a GET request, Segcache reads PMEM only once most of the time, and it performs no writes. For a SET request, I mean a write request, it writes once, and the write is sequential. All other bookkeeping operations Segcache performs sequentially in batch, which improves both throughput and performance on PMEM.
Okay, that's not the end of Segcache. Besides performance, Segcache also provides better memory efficiency. First, it can efficiently remove all expired objects from the cache immediately after expiration. Second, it has much smaller object metadata. Memcached uses 56 bytes of object metadata per object. The Pelikan slab module has reduced it to 38 bytes, which is a pretty impressive improvement. Now, Segcache further improves on this and reduces the item metadata to 5 bytes, which is a 90% reduction compared to existing in-memory caching. And this is very important for in-memory caching, because a lot of in-memory caching clusters store small objects. Third, Segcache uses a merge-based segment eviction algorithm, which improves Segcache's memory efficiency by reducing the miss ratio. So overall, with these techniques, Segcache reduces the memory footprint of Twitter's largest cache cluster by 60%.
Okay, so that's memory efficiency, which is not the focus of today's talk.
Now let's come back to performance on PMEM.
The first optimization I mentioned earlier is that Segcache transforms all random writes into sequential writes. Let's see how.
The first part is the hash table. Recall that with the object-chained hash table, every time we update or delete an object, we need to update the pointer in the previous object to point to the next object, right? With this new hash table, when we want to delete or update an object, we just change the entry in the hash bucket. Say we have seven slots for items, and you want to change or delete this one. You just change or delete it in the hash table, which is in DRAM. So PMEM is not touched for object updates and deletes. So that's the hash table.
Now let's talk about the object store. Instead of using slabs, we're using segments. Segments are small logs, which are append-only, so we don't have random writes on the segment.
Moreover, because segments store objects of similar creation time and TTL, we store object metadata in the segment, so all the objects in the segment share one copy of metadata. So we don't need to update object metadata anymore.
Then, to remove an object from the cache, we just remove its hash table entry. We do not do anything else. We do not update object metadata.
And for expiration, because objects in the same segment have the same expiration time, we do not expire individual objects; when it's time, the whole segment expires. This is done sequentially in batch, compared to per-object expiration. That's why we don't have metadata updates during expiration, eviction, and deletion.
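Here is a hedged sketch of that append-only write path: new objects go at the tail of the active segment with a plain sequential copy, and the TTL and creation time live once in the segment header rather than per object. The record layout and names are illustrative, not Segcache's actual format.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SEG_SIZE (1 << 20)   /* 1 MB segment, a small log */

struct segment {
    /* DRAM-resident header shared by every object in the segment */
    uint32_t ttl_s;
    uint32_t create_ts;
    uint32_t write_offset;   /* bump-pointer tail */
    uint8_t *data;           /* the PMEM-resident log body */
};

/* Append = copy key+value at the tail and advance the offset; no random
 * writes, no per-object metadata written to PMEM. Returns the object's offset
 * so the DRAM hash table can point at it, or UINT32_MAX if the segment is
 * full (the real system would then seal it and grab a new one). */
static uint32_t segment_append(struct segment *seg,
                               const void *key, uint8_t klen,
                               const void *val, uint32_t vlen)
{
    uint32_t need = 1 + 4 + klen + vlen;   /* klen byte + vlen word + payload */
    if (seg->write_offset + need > SEG_SIZE)
        return UINT32_MAX;

    uint8_t *p = seg->data + seg->write_offset;
    p[0] = klen;
    memcpy(p + 1, &vlen, 4);
    memcpy(p + 5, key, klen);
    memcpy(p + 5 + klen, val, vlen);

    uint32_t obj_offset = seg->write_offset;
    seg->write_offset += need;             /* sequential, append-only */
    return obj_offset;
}

int main(void)
{
    static uint8_t body[SEG_SIZE];         /* stand-in for a PMEM mapping */
    struct segment seg = { 3600, 0, 0, body };
    uint32_t off = segment_append(&seg, "user:42", 7, "hello", 5);
    printf("stored at offset %u, tail now %u\n",
           (unsigned)off, (unsigned)seg.write_offset);
    return 0;
}
```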
Okay, the second optimization: move small random metadata reads and writes into DRAM. As I mentioned earlier, each segment has one copy of metadata, and objects do not have their own metadata. This one copy of segment metadata is small and can be stored in DRAM. So all the updates to this shared segment metadata are done in DRAM, and we don't need to touch PMEM to update the segment header.
Now it's time to show you some results. Here I'm going to show you Segcache microbenchmarks. This microbenchmark is done on Twitter's production fleet, and we are only using one DIMM of PMEM.
On the left, I'm showing the read throughput. On the right, I'm showing the write throughput. Both are using items of 64 bytes.
First, comparing slab storage and Segcache, we see that for read throughput, Segcache improves it by around 100%. So it doubles the read throughput from 8 million QPS to 16 million QPS at a concurrency level of 16.
For writes, at a concurrency level of 8, we see that Segcache has 2.5 times higher throughput compared to slab storage, which is amazing, right?
Comparing reads and writes, we see that write throughput is much lower. This is because the PMEM write bandwidth is lower than the read bandwidth. So those are some preliminary results we have gotten.
So what's next? So far, we have only looked at microbenchmarks. Next, we are planning to look at performance on real workloads with Segcache on PMEM. Because PMEM provides a way to get persistence and faster recovery, we also want to look at how to achieve fast recovery with Segcache.
Besides this, we plan to look at the memory hierarchy for using PMEM for caching. So far, we have only looked at how to use PMEM for caching. Now, since we are pushing PMEM caching to the limit of PMEM, we are starting to look at how we should use PMEM plus DRAM for caching, and what the correct memory hierarchy for PMEM is. So that's our next step.
Yeah, that's it.
Now, let me hand over to Yao.
So that's the journey of putting cache on PMEM at Twitter so far.
At this point, I want to look back, and also look from a higher point of view, and say a few things about the lessons we have learned. The takeaway more specifically for caching is that, in the end, it was an exercise in avoiding turning PMEM into the new bottleneck. As long as we were doing that, everything was going to work out fine. And in this regard, App Direct mode is the clear winner, because it allows us to judiciously balance the more limited memory bandwidth we had for PMEM against the more abundant memory bandwidth we had for DRAM.
This is interesting because it's a somewhat different conclusion from what the author or
the maintainer of the Memcached project ended up concluding in his project, I think mainly because he was doing
most of his experiments on Intel's hardware,
which had much more abundant PMEM bandwidth
in an interleaved mode.
But even for Twitter,
memory mode served its purpose
along the way really well,
because it pointed out where things needed to change
to fit our use cases.
So in this regard, it really is important to do due diligence,
especially when it comes to new hardware,
regardless of how many demos or benchmarks
that have been published,
because nobody else can exactly reproduce
the production environment for you.
So you are the only person who can do it
and gain ultimate confidence in the
final solution. We also try to be really disciplined when it comes to changes, not making changes to the software for the sake of it, but innovating only when there is a clear problem to be solved, based on all the insights and observations we have from previous rounds of testing. Finally, I want to say that with the introduction of persistent memory,
cache is really on the verge of being transformed into a more durable service that is quite different
from what it was before. And the full ramification of this is still something that I'm thinking about
all the time, and I don't fully understand.
But this seems to be a really exciting opportunity, blurring the line between what used to be considered in-memory caching and the more proper full-flash storage solution that's much less expensive and much slower.
I think this is going to be a major undertaking that has a lot of potential, and I'm hoping we can continue to make progress
in that direction using the same iterative approach.
And there are broader takeaways for software adoption of PMEM in general.
As a storage developer, you may wonder, you know, whenever there is a new piece of hardware
technology, where it can be deployed to realize the most gain and
most value. And to answer that question, one actually needs to look beyond just the storage
system. After all, any functional piece of software does many things at once. So to understand whether
a system could benefit from an upgrade from SSD to PMEM, or could use higher-density but lower-throughput PMEM in exchange for DRAM, one really
needs to understand where the software is spending time today at runtime.
And for any adopters, usually there are some business goals they are trying to achieve
with the new change.
For example, had Twitter's business goal been putting more data into memory so we could have a bigger working set, and had we also been able to afford Intel's lab configuration, which has fully populated DIMMs with a lot more channels, we could have been done with memory mode.
We wouldn't necessarily even have needed to go to App Direct mode. And in fact, that was the conclusion drawn by the maintainer of the Memcached project after testing with Intel's lab equipment, because that's what they saw. But because Twitter had different business goals and different business constraints, we went a little bit further and decided we wanted to take advantage of the durability and to use the PMEM bandwidth more judiciously. And another thing that is
interesting is that, often, compared to really, really nice demos or even very nice benchmarks,
putting something in production comes with a lot of baggage in terms of how the
software can be developed or maintained and how the software should be operated. So the sooner we
can take these constraints into consideration, the more likely that the plan we put forward
will have a chance of finally succeeding. And another thing to consider is, you know, for any adopters, it's likely there will be quite some time between they start considering a new technology and they finally committing to it.
So this is a very lossy path because things going wrong along the way can result in the customer or the adopter going away and abandoning the initiative.
So the more we can make this a gentle and easy path with a lot of possible accidents along the way in the shape that is the most suitable and makes the most sense for the
adopter, the more likely that people will get on this path quickly and make the right
decision for themselves, for their business cases.
So while it's often considered that software is more malleable and has a fast turnaround,
which is true a lot of times compared to hardware, which has long cycles and well-defined pacing,
truly transforming software, especially in light of a fundamentally different class of
hardware, really takes time.
So I think it would be helpful if software developers and hardware developers can come together and give the initiative a proper amount of time and consideration, so we can fully unlock the potential of the underlying hardware in the software layer.
So with that, we conclude our talk, and we are happy to answer any questions you may have.
Thanks for listening.
If you have questions about the material presented in this podcast,
be sure and join our developers mailing list by sending an email to developers-subscribe at snia.org.
Here you can ask questions and discuss this topic further with your peers in the storage developer community.
For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.