The Good Tech Companies - When Should You Use a Cache With MongoDB?

Starting point is 00:00:00 This audio is presented by Hacker Noon, where anyone can learn anything about any technology. "When Should You Use a Cache With MongoDB?" by MongoDB. This article was written by Andrew Morgan of MongoDB. From time to time, I'll run a design review for an application being migrated from a relational database onto MongoDB, where the customer shares an architectural diagram showing a caching layer (typically Redis) sitting between the app server and MongoDB. I like to keep the architecture as simple as possible (after all, each layer brings its own complexity and management costs), so I'll ask why the

Starting point is 00:00:36 caching layer is there. Of course, the answer is always that it's there to speed up data access. This reveals a misunderstanding of both the reason why caching layers were created and what MongoDB provides. I've yet to finish a design review without recommending that the cache tier be removed. So, to answer the question in the title of this article, when should you use a cache with MongoDB? The answer is probably never. This article attempts to explain why, but if you get to the end and still think your application needs it, then I'd love to discuss your app with you. Why were caches like Memcached and Redis invented, and why do they thrive? Caching tiers were introduced because it was too slow for applications to read the required

Starting point is 00:01:16 data directly from a relational database. Does this mean there aren't smart developers working on Oracle, DB2, Postgres, MySQL, etc.? Why couldn't those developers make relational databases fast? The answer is that all those databases were written by great developers who included indexes, internal database caches, and other features to make reading a record as fast as possible. The problem is that the application rarely needs to read just a single record from the normalized relational database. Instead, it typically needs to perform multiple joins across many tables to form a single business object. These joins are expensive: they're slow and consume many resources. For this reason, the application doesn't want to incur that cost every time it reads the same business object.
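To make the contrast concrete, here is a small Python sketch. It uses the standard-library sqlite3 module as a stand-in for a normalized relational database and a plain Python dict as a stand-in for a MongoDB document; the table names, field names, and the load_order_relational helper are all hypothetical. Assembling one order business object means joining three tables and then regrouping the flat rows, whereas the document already has the shape the application wants.

```python
import sqlite3

# In-memory SQLite stands in for a normalized RDBMS (illustrative schema only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customers(id));
CREATE TABLE order_lines (order_id INTEGER REFERENCES orders(id), sku TEXT, qty INTEGER);
INSERT INTO customers VALUES (1, 'Ada');
INSERT INTO orders VALUES (100, 1);
INSERT INTO order_lines VALUES (100, 'widget', 2), (100, 'gadget', 1);
""")


def load_order_relational(order_id):
    """Hypothetical helper: build one order object by joining three tables."""
    rows = conn.execute(
        """
        SELECT o.id, c.name, l.sku, l.qty
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        JOIN order_lines l ON l.order_id = o.id
        WHERE o.id = ?
        ORDER BY l.sku
        """,
        (order_id,),
    ).fetchall()
    if not rows:
        return None
    # The join returns one flat row per order line; regroup into a nested object.
    return {
        "_id": rows[0][0],
        "customer": {"name": rows[0][1]},
        "lines": [{"sku": sku, "qty": qty} for _, _, sku, qty in rows],
    }


# The same business object as a single MongoDB-style document: the customer is
# embedded and the order lines live in an array, so no join is needed to read it.
order_document = {
    "_id": 100,
    "customer": {"name": "Ada"},
    "lines": [{"sku": "gadget", "qty": 1}, {"sku": "widget", "qty": 2}],
}

print(load_order_relational(100) == order_document)  # True
```

Against a real deployment, the document version would be fetched with a single lookup by _id (for example, with PyMongo's find_one); that part is not shown here.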

Starting point is 00:02:05 That's where the caching tier adds value. Join the normalized, relational data once and then cache the results so that the application can efficiently fetch the same results many times. There's also the issue of data distribution. Most relational databases were designed 50 years ago, when an enterprise would run the database and any applications in a single data center. Fast forward to today, when enterprises and customers are spread worldwide, with everyone wanting to work with the same data. You don't want globally distributed app servers to suffer the latency and expense of continually fetching the same data from a database located on a different continent. You want a copy of the data located close to every app server that needs it. Relational databases were not designed with this data distribution requirement in mind.
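The "join once, then cache" idea is usually implemented as a cache-aside (lazy-loading) pattern. Below is a minimal Python sketch, assuming a plain dict in place of a cache tier such as Redis and a hypothetical expensive_join function in place of the multi-table relational query. It shows the benefit (repeated reads skip the join) and also the consistency cost this article goes on to raise: once the underlying data changes, the cache keeps serving a stale copy until the application invalidates it.

```python
# Cache-aside sketch (all names hypothetical): `cache` stands in for Redis and
# `database` plus expensive_join() stand in for the relational query.
join_calls = 0
database = {"order:100": {"customer": "Ada", "total": 30}}
cache = {}


def expensive_join(key):
    """Pretend this runs the costly multi-table join in the RDBMS."""
    global join_calls
    join_calls += 1
    return dict(database[key])  # return a copy, as a real query result would be


def get_business_object(key):
    # 1. Try the cache first.
    if key in cache:
        return cache[key]
    # 2. On a miss, run the expensive join once and populate the cache.
    value = expensive_join(key)
    cache[key] = value
    return value


for _ in range(5):
    get_business_object("order:100")
print(join_calls)  # 1: only the first read paid for the join

# The catch: when the database changes, the cache still holds the old copy.
database["order:100"]["total"] = 45
print(get_business_object("order:100")["total"])  # 30 (stale)

# Someone has to write and maintain invalidation logic like this.
cache.pop("order:100", None)
print(get_business_object("order:100")["total"])  # 45
```

The invalidation step is the extra code the application team owns when a cache tier sits in front of the database.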

Starting point is 00:02:47 RDBMS vendors have attempted to bolt on various solutions to work around this, but they're far from optimal. Instead, many enterprises delegate the data distribution to a distributed cache tier. Note that Redis and Memcached are widely used for session handling in web applications where persistence isn't a requirement. In that case, the cache is the only datastore rather than a cache layer between the application and MongoDB. While you can (and people do) use MongoDB for session management, that's beyond the scope of this article. So, what's wrong with having a caching tier?

Starting point is 00:03:23 Introducing a caching layer is often a great solution when your database can't deliver the performance and latency your application needs. However, this extra data tier comes with costs. The obvious ones are the software licenses and hardware required to provide the caching service. Less obvious is the extra load on developers. It's a new query language (and possibly a new programming language) to master. What happens when the data in the RDBMS changes? How are those changes propagated

Starting point is 00:03:51 to your cache tier? So, a cache tier has to pay its way by delivering tangible benefits over having your application access the database directly. What's different with MongoDB? The MongoDB document model. In MongoDB, we want you to store your data structured in a way that makes it efficient to satisfy your application's most frequent queries, or those with the toughest SLAs. MongoDB mirrors the structure of objects by letting a single record (document) contain embedded (nested) objects. Support for arrays allows one-to-many and many-to-many relationships without joining

Starting point is 00:04:26 multiple collections. In many cases, the business object required by the application will map to a single MongoDB document. In other cases, it might require multiple documents that can be fetched with a single index lookup. MongoDB has its own internal LRU (least recently used) cache, so if your document has been accessed recently, chances are it's already in memory. So, as with Redis, MongoDB can satisfy the application's query by fetching a single

Starting point is 00:04:55 document (object) from memory. Note that MongoDB supports joins, but we try to structure your data to minimize their use. The other value-add from a caching layer is data locality in distributed architectures. MongoDB has this built in. A MongoDB replica set has a single primary node that handles all writes, together with up to 49 secondary nodes, each with a copy of the data. For the lowest-latency queries,

Starting point is 00:05:21 you can place secondaries locally at each of your app server locations. MongoDB is responsible for keeping the data in the secondary nodes up to date with the primary, so you don't need to write and maintain any extra synchronization code. What does AI think? The responses created by generative AI are driven by the information that's been published by real people, and so they should represent popular opinion on a topic. I thought it would be interesting to see what an AI has come to understand as conventional wisdom on why people place a cache in front of MongoDB. I asked ChatGPT-4o this question: "Explain why I would use a cache layer, such as Redis, rather than having my

Starting point is 00:06:00 application read data from MongoDB directly." "Improved performance: Redis operates entirely in memory, making it significantly faster than MongoDB, which relies on disk I/O for data retrieval." As described above, MongoDB has its own in-memory cache, so the documents you frequently access will be in memory and no disk access is required. "Reduced load on MongoDB:

Starting point is 00:06:27 Frequently accessing MongoDB directly for the same data increases query load, which can slow down the database, especially under heavy read traffic." MongoDB is scalable. Extra secondary nodes can be added to the replica set to add extra query bandwidth. MongoDB's sharding (partitioning) can scale data capacity or write throughput horizontally. "Handling high read traffic: Applications with high read-to-write ratios (e.g., web apps and APIs) benefit from Redis's ability to serve cached data quickly." MongoDB's database cache provides the same benefits without the

Starting point is 00:07:07 extra developer effort to synchronize data changes. "Faster access to frequently used data: Redis is ideal for caching frequently accessed or hot data (e.g., user sessions, configurations, or product details)." Frequently accessed, hot data will be held in MongoDB's in-memory database cache. "Lower latency for geo-distributed applications: By replicating Redis caches closer to end users,

Starting point is 00:07:37 you can avoid high network latency when querying MongoDB from distant locations." Data locality can be solved by placing replicas near your app server sites. "Support for expiring data (TTL): Redis has a built-in time-to-live (TTL) feature that automatically removes cached data after a specified duration." MongoDB uses an LRU cache, so any documents that are no longer being queried will be removed from memory if the space is needed for more recently queried data. MongoDB also has […] "[…] be resource intensive,

Starting point is 00:08:25 especially with complex queries, leading to increased infrastructure costs." Your MongoDB schema should be designed so that your most important queries don't need to be complex. "Use cases for specialized data structures: Redis supports advanced data structures like lists, sets, sorted sets, hashes, and streams, which MongoDB doesn't provide natively." MongoDB supports lists and sets.
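In MongoDB, a list or a set is simply an array field, and the update operator you choose decides the semantics: $push appends unconditionally (list behavior), while $addToSet only appends a value that isn't already present (set behavior). The Python sketch below is a toy, local emulation of just those two operators on a dict, so it runs without a database; the apply_update helper and the field names are hypothetical, and real MongoDB supports many more options (for example $each and $slice).

```python
# Toy emulation of two real MongoDB update operators on a plain dict.
# apply_update is a hypothetical helper; it ignores modifiers such as $each.
def apply_update(doc, update):
    for op, fields in update.items():
        for field, value in fields.items():
            arr = doc.setdefault(field, [])
            if op == "$push":
                # List semantics: always append, duplicates allowed.
                arr.append(value)
            elif op == "$addToSet":
                # Set semantics: append only if the value is not already present.
                if value not in arr:
                    arr.append(value)
            else:
                raise ValueError(f"unsupported operator: {op}")
    return doc


user = {"_id": 1}
apply_update(user, {"$push": {"recent_views": "p1"}})
apply_update(user, {"$push": {"recent_views": "p1"}})
apply_update(user, {"$addToSet": {"tags": "vip"}})
apply_update(user, {"$addToSet": {"tags": "vip"}})
print(user)  # {'_id': 1, 'recent_views': ['p1', 'p1'], 'tags': ['vip']}
```

Against a real deployment, the same update documents would be passed to a driver call such as PyMongo's update_one, and the server would apply them atomically to the stored document.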

Starting point is 00:08:54 Hashes can be represented in MongoDB as an array of documents containing key-value pairs (the MongoDB attribute pattern). MongoDB time series collections meet the same needs as Redis streams. "Resilience and fault tolerance: A cache layer can serve as a fallback if MongoDB is temporarily unavailable or under heavy load." MongoDB can scale vertically or horizontally to meet any load demands. Scaling can be automated when using MongoDB Atlas.

Starting point is 00:09:26 MongoDB replica sets provide fault tolerance for both reads and writes. "Simplified complex query results: MongoDB can take time to compute complex queries (e.g., aggregations or joins) for frequently requested results." Your MongoDB schema should be designed to avoid the need to run complex queries frequently. Results can be stored (cached) in a MongoDB materialized view, avoiding the need to repeatedly execute the same complex query or aggregation. Note that the response you get from ChatGPT is heavily skewed by the question you ask.

Starting point is 00:10:03 If I change my prompt to "Explain why I shouldn't use a cache layer, such as Redis, rather than having my application read data from MongoDB directly," ChatGPT is happy to dissuade me from adding the cache layer, citing issues such as increased system complexity, data consistency issues, performance for write-heavy workloads,

Starting point is 00:10:22 cost, query flexibility, maintenance and reliability, small datasets (where the active dataset fits in MongoDB's cache), and real-time reporting. Summary. A cache layer can add a lot of value when your RDBMS cannot deliver the query performance your application demands. When using MongoDB, the database of record and the cache functionality are combined in a single layer, saving you money and developer time. A distributed cache can mitigate shortfalls in your RDBMS, but MongoDB has built-in distribution. Respond to this article if you still believe your application would benefit from a cache layer between your application and MongoDB.

Starting point is 00:11:02 I'd love to take a look. Learn more about MongoDB design reviews. Design reviews are a chance for a design expert from MongoDB to advise you on how best to use MongoDB for your application. The reviews are focused on making you successful using MongoDB. It's never too early to request a review. By engaging us early, perhaps before you've even decided to use MongoDB, we can advise you when you have the best opportunity to act on it. This article explained how designing a MongoDB schema that matches how your application works

Starting point is 00:11:33 with data can meet your performance requirements without needing a cache layer. If you want help coming up with that schema, then a design review is how to get that help. Would your application benefit from a review? Schedule your design review. Thank you for listening to this Hacker Noon story, read by Artificial Intelligence. Visit hackernoon.com to read, write, learn and publish.
