Why Session State Should Not Be Stored In A Distributed Cache

Web developers often refer to session state stores and caches interchangeably, while in actuality they serve different purposes.

A cache serves as a caching layer between a web application and an external data source. Caches exist mainly to lighten the load on the external data source, thus improving the performance of the application.

The purpose of a session state store is to store a user’s workspace. Session state stores enable client activity to be persisted consistently across several HTTP requests.

Applications that utilize a cache write to the external data source, typically a database, and read from the cache. This technique can significantly improve the performance of applications that mostly read from the data source.
Caches are designed to be read speedily, simultaneously by many clients and threads. Some cache implementations can link the cached object to the data source, such that if the data source is updated, the cache is invalidated.

Session state stores are not linked to external data sources by design, although it can be consumed by an application to do so. While it may be necessary to store certain parts of a client’s session state to a database, there is usually no need to store all of the session state to a database. In fact, many database-less web applications rely solely on session state to operate.
Session state implementations are designed such that each client has exclusive access to its session data. Even though caches can be consumed by an application to simulate this, exclusivity is not enforced and there is always a chance that a client will be able to access another client’s session data due to either poor design or security flaws.

A distributed cache spreads out an application’s caching layer across many machines, which allow high-traffic web applications to scale out by adding more machines as demand increases. Performance can also be improved by distributing session state data across many machines; therefore, it is worthwhile to examine the requirements and nuances of cached data and session state before deciding on a distributed solution to apply.

Distributed caches, like local caches, are most effective when used to cache data that changes infrequently. They also support simultaneous fast reads of a cached object by many threads. This is where session state sharply differs from cached data.

Session state has little need for speedy multiple-thread access to a single stored resource because a client can only exclusively read or update its session. In addition, the usage pattern of session state is unpredictable. Some applications update session data very frequently while some do not. Session state storage designers normally safely assume that session state is write-heavy.

Caches are configured by default to use either an optimistic concurrency mechanism or no concurrency control at all, to access cached data. This design is driven by the strong requirement to eliminate blocking by any means possible, and works superbly due to the lower proportion of writes to reads.
Session state stores, on the other hand, utilize a pessimistic concurrency mechanism to access stored data. This works effectively because of the exclusive nature of resource access.

The number of concurrent session state accesses to a stored resource can increase if a user opens up several web browser instances of the same application or if the application makes use of numerous AJAX calls. Notwithstanding multiple instances and AJAX-intensive applications, a user’s session cannot have more than a handful of concurrent access attempts.
A pessimistic concurrency mechanism, as used by session state stores, can gracefully handle a few concurrent accesses on a write-heavy resource, and more importantly, provide consistent data to all operations. Inconsistencies in served data can arise, if a cache with no concurrency control is employed to store write-heavy session state. This problem becomes more apparent if the application is AJAX-intensive.

Critical applications that rely on session state require failover and redundancy support. These features are usually built into commercial session state storage solutions.
Caches have no need for failover or redundancy because caches are simply a caching layer: if the requested data cannot be retrieved from the cache, it can always be fetched from the primary source. Therefore, most distributed cache implementations do not support failover or redundancy; issues solution architects seldom remember when moving session storage to a distributed cache.

The conundrum of where to store session state arises when an application needs to scale to accommodate more users.
While there are a few commercial distributed session state storage solutions, there are no free robust alternatives, and the usual consensus is to store session state in freely available distributed cache solutions, or eliminate session state entirely from the application.

Moreover, even when session state is manageably stored in a distributed cache, most often, the same servers that are caching infrequently changing data are used to store session state. Sharing the cache this way leads to performance degradation.
This occurs because whenever the cache server needs to store a new cached object or remove an expired one, it has to momentarily suspend all read operations internally on all other cached objects until the object is added or removed. The overall outcome is sub-optimal reads for cached infrequently changing data.

Developers and architects should carefully weigh the aforementioned issues before moving locally stored session state to a distributed storage and should, whenever possible, opt for a solution that was specifically built for distributed session state storage.

Justin W
10/15/2009 11:46:00 AM | Reply

What would be wrong with using a distributed cache which uses identity hashing to load-balance and only contains a single copy, on exactly one instance, of each record? Or does that no longer fit the accepted definition of distributed cache?

Sunny
10/15/2009 12:26:52 PM | Reply

Justin,

That works fine if those records are read from a database and changes are persisted to the database.
If updates are saved directly to the cache, then you have to make sure the cache is set up to use pessimistic locking or else you might run into problems.