FamilyWallpapers is one of the largest online stores for wallpaper and paint in the Nordic region. Majako works in a close partnership with the store owners to provide development and hosting, and FamilyWallpapers has seen incredible growth over the past two years. With a constantly growing product catalogue and five stores serving as many countries, the database now contains:
At the time of writing (2023-01-30), monitoring reports an average of 1390 requests per minute over the past month, typically ranging from a few hundred requests per minute during the night to more than 3500 during peak hours. Google Analytics shows 368 active users in the last 30 minutes: a healthy but representative traffic level for a Monday afternoon.
In order to provide stable service for that many users without letting hosting costs soar needlessly, an automatically scaling hosting environment is a necessity. Somewhat complicating the matter, however, the store admins perform daily product imports, price updates and other administrative work that requires all instances to stay in sync, so that every customer sees the same information. Unfortunately, the preexisting implementation of distributed caching in nopCommerce was, to put it mildly, very slow for the volumes of data involved. The store owners, meanwhile, who rightfully pride themselves on running one of the fastest shopping sites in Sweden, were not willing to compromise on response times.
The solution we ended up developing for FamilyWallpapers has, along with a number of other performance optimisations, recently been submitted by us to the core nopCommerce project and is slated to be included in the 4.70 release later this year. We are proud to say that FamilyWallpapers has been running as a distributed, scalable nopCommerce app for over two months with no loss in performance, and it is our hope that our contribution will enable many more nopCommerce stores to continue growing and truly move onto the cloud with all its benefits.
Our changes to the caching strategy can broadly be summarised in three points:
- `RedisSynchronizedMemory`, a regular fast memory cache that uses Redis events to keep instances in sync

For FamilyWallpapers, these combined improvements resulted in a decrease of, on average,
on a benchmark battery compared to using the distributed cache included in nopCommerce 4.60. Compared to the old memory cache, we observed a reduction of
Lower and more predictable memory usage allows us to host the site on a cheaper server without risking performance dips or downtime, while faster startup and warmup lets us quickly scale to meet increased traffic during peak hours. Faster response times, of course, are crucial to a smooth user experience for the customers.
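As a rough illustration of the idea behind `RedisSynchronizedMemory`, the sketch below keeps all cached data in a plain in-process dictionary and only broadcasts invalidation events to the other instances. This is a simplified Python stand-in, not the actual C# implementation: the `Bus` class here is a hypothetical in-memory substitute for a Redis pub/sub channel, used only so that the example is self-contained.

```python
from typing import Any, Callable

class Bus:
    """Stand-in for a Redis pub/sub channel (hypothetical, for illustration)."""
    def __init__(self) -> None:
        self.subscribers: list[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self.subscribers.append(handler)

    def publish(self, key: str) -> None:
        for handler in self.subscribers:
            handler(key)

class SynchronizedMemoryCache:
    """A fast local memory cache; writes broadcast an invalidation event
    so every other instance evicts its stale copy of the key."""
    def __init__(self, bus: Bus) -> None:
        self.data: dict[str, Any] = {}
        self.bus = bus
        bus.subscribe(self._on_invalidate)

    def _on_invalidate(self, key: str) -> None:
        # Another instance changed this key: drop our local copy.
        self.data.pop(key, None)

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        if key not in self.data:
            self.data[key] = loader()   # reads stay entirely in memory
        return self.data[key]

    def set(self, key: str, value: Any) -> None:
        self.bus.publish(key)           # evicts the key on every instance...
        self.data[key] = value          # ...then stores the new value locally

bus = Bus()
a, b = SynchronizedMemoryCache(bus), SynchronizedMemoryCache(bus)
a.get("price:42", lambda: 100)
b.get("price:42", lambda: 100)
a.set("price:42", 120)                  # instance B's copy is now evicted
print(b.get("price:42", lambda: 120))   # B reloads fresh data: prints 120
```

The important property is that reads never touch Redis: the channel carries only eviction events, so each instance serves data straight from RAM and reloads a key only after it has been invalidated.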
In the comparisons below, we use the latest 4.60 release of nopCommerce as a baseline, compared to the development branch incorporating our optimisations. The database is a copy of the FamilyWallpapers database, and we use a C1 Basic Azure Redis server (1 GB cache size, 1000 client connections, low network performance). "Warmup" refers to making one request each (in sequence) to the main page, two category pages with different category structures, and the cart, in order to populate the cache. Our regular list of warmup URLs also targets a product page, but we decided not to include it here, as product pages require extra plugins to function.
Startup tests are measured from and including application launch to the end of the warmup phase. For the load tests, we make 10 parallel requests at a time, 200 requests in total, against the already warmed-up index page.
We measure the execution time from beginning to end as defined for each benchmark above, together with memory and CPU usage during the process. Each experiment is repeated three times for each setting, and all three runs are plotted together.
Here we compare the old implementation of the distributed cache using Redis with our new Redis-synchronised memory cache.
| Cache implementation | Startup time (avg.) | Max. memory (avg.) |
|---|---|---|
| Unoptimised Redis, first load | 1568.4 s | 3240.6 MB[2] |
| Unoptimised Redis, second load | 157.2 s | 773.3 MB |
| Redis-synchronised memory cache | 23.3 s | 945.3 MB |
During the second startup, Redis is already populated, leading to shorter startup times and less memory usage for the pure distributed cache. The Redis-synchronised memory cache is not affected, as it does not store data on Redis in the first place.
During the load test, although the application is already warmed up, the pure Redis cache still has to retrieve data from Redis for each new request. As this is a rather time-consuming operation, this leads to the CPU spending most of the time idly waiting for data before it can process a request.
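The effect is easy to reproduce in a toy model (a sketch, not a benchmark of the actual stack): a local dictionary lookup costs next to nothing, while a simulated remote fetch pays a network round trip, an assumed 5 ms here, on every single request.

```python
import time

CACHED_VALUE = "rendered page fragment"

def fetch_from_redis(key: str, latency: float = 0.005) -> str:
    """Simulated Redis GET: every call pays a full network round trip.
    The 5 ms latency is an assumption for illustration only."""
    time.sleep(latency)
    return CACHED_VALUE

local_cache = {"index": CACHED_VALUE}

start = time.perf_counter()
for _ in range(100):
    fetch_from_redis("index")
remote_time = time.perf_counter() - start

start = time.perf_counter()
for _ in range(100):
    local_cache["index"]
local_time = time.perf_counter() - start

# 100 remote round trips take ~0.5 s; 100 dictionary lookups take microseconds.
print(remote_time > 100 * local_time)  # prints True
```

In the real application the gap is compounded by deserialising every retrieved object, which is why the pure Redis cache leaves the CPU waiting instead of serving requests.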
| Cache implementation | Response time (avg.) |
|---|---|
| Unoptimised Redis | 20549 ms |
| Redis-synchronised memory cache | 867 ms |
The cache optimisations are not limited to distributed caching: we have also improved the regular memory cache in several ways. Owing to enhancements both in how data is stored in the cache and in what kind of data structures are used to cache large collections, startup time has been greatly reduced.
| Cache implementation | Startup time (avg.) | Max. memory (avg.) |
|---|---|---|
| Unoptimised memory cache | 175.5 s | 1302.2 MB |
| Optimised memory cache | 24.0 s | 953.8 MB |
The main difference between the old and the new memory-cache implementations is that we now avoid duplicating work when a cache miss occurs. On a warmed-up page this makes no difference, but the effect on performance is particularly visible when multiple requests for the same page arrive at the same time and that page has not been warmed up. The same situation arises when a page loads its components in parallel during warmup, even on a single request. Note that this effect can appear whenever part of the cache is cleared for any reason, such as when a product or category is updated during heavy traffic.
Below, we make a single batch of 10 simultaneous requests to a cold page. Four requests are processed simultaneously, but the optimised cache acquires the data only once, while the other three requests wait for that task to finish without using additional resources. The unoptimised cache, on the other hand, starts four identical acquisition tasks that all interfere with each other by competing for the same resources, and much extra memory is required for data that will later be discarded.
| Cache implementation | Total response time (avg.) | Max. memory (avg.) |
|---|---|---|
| Unoptimised memory cache | 547.7 s | 2067.6 MB |
| Optimised memory cache | 18.1 s | 1108.8 MB |
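The miss deduplication described above can be sketched as follows. This is a simplified Python stand-in for the real C# implementation (which shares the pending acquisition as a task): the first request for a cold key runs the loader, and concurrent requests for the same key wait for that single result instead of starting identical acquisitions of their own.

```python
import threading
import time

class DedupCache:
    """On a cache miss, the first caller runs the expensive loader;
    concurrent callers for the same key wait for that one result
    instead of each starting an identical acquisition."""

    def __init__(self):
        self.data = {}
        self.pending = {}              # key -> Event set once the load finishes
        self.lock = threading.Lock()

    def get(self, key, loader):
        while True:
            with self.lock:
                if key in self.data:
                    return self.data[key]
                event = self.pending.get(key)
                if event is None:
                    # First request for this cold key: claim the load.
                    event = self.pending[key] = threading.Event()
                    claimed = True
                else:
                    claimed = False
            if claimed:
                value = loader()       # runs exactly once per cold key
                with self.lock:
                    self.data[key] = value
                    del self.pending[key]
                event.set()
                return value
            event.wait()               # another request is already loading
            # loop back and return the now-cached value

calls = []

def slow_loader():
    """Stands in for an expensive acquisition (e.g. database queries)."""
    calls.append(1)
    time.sleep(0.05)
    return "rendered page"

cache = DedupCache()
threads = [threading.Thread(target=cache.get, args=("index", slow_loader))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # the loader ran once despite 10 simultaneous requests: prints 1
```

With the duplicated acquisitions gone, a cold page costs one load regardless of how many requests hit it at once, which is exactly the behaviour reflected in the memory and response-time numbers above.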
Thanks to several major improvements in the caching strategy, FamilyWallpapers has been able to continue growing with nopCommerce and move from a single virtual machine onto a distributed and scalable modern hosting solution on the cloud, without sacrificing response times. The site can reliably handle increased traffic during peak hours while scaling down to save costs during the night, all while serving content from a huge database at lightning speed.
These improvements have been accepted into the core nopCommerce project and will be available in the next release. Until then, if you are looking to speed up your nopCommerce store or move to distributed hosting, do not hesitate to reach out to Majako via the links below. We also offer plugins for HTML output caching, which further reduces response times and server load, and ElasticSearch integration for fast and improved search and filtering.
Let's keep growing with nopCommerce and continue giving our customers a smooth and enjoyable shopping experience!
- majako.se (Swedish site)
- The benchmark scripts we used for this article
- The pull request for rewriting the cache
- The pull request for optimising large collections
[1] It should be noted that the actual response times on the live site were already much lower on most pages (around 500 ms), thanks to Majako's HTML cache plugin. Please contact us at support@majako.se for more information!
[2] We are not entirely certain why the unoptimised Redis cache uses as much memory as it does. One possibility is that it first stores the actual objects in memory as they are created, then the JSON-serialised strings that are cached on Redis, and then the same JSON strings again as they are retrieved from Redis, before finally deserialising them back into C# objects. In that case, the garbage collector will eventually reclaim most of this memory, but the application still requires an enormous amount of it during startup. Even so, the Redis server reported a maximum of 225 MB, so it is likely that the same data is also being created multiple times on simultaneous cache misses.