Why Session State Should Not Be Stored In A Distributed Cache

Wednesday, 30 September 2009 21:16 by sunny

Web developers often refer to session state stores and caches interchangeably, but the two serve different purposes.

A cache is a fast intermediate layer between a web application and an external data source. Caches exist mainly to lighten the load on the external data source, thus improving the performance of the application.

The purpose of a session state store is to store a user’s workspace. Session state stores enable client activity to be persisted consistently across several HTTP requests.

Applications that utilize a cache write to the external data source, typically a database, and read from the cache. This technique can significantly improve the performance of applications that mostly read from the data source.
Caches are designed to be read speedily, simultaneously by many clients and threads. Some cache implementations can link the cached object to the data source, such that if the data source is updated, the cache is invalidated.
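The read-from-cache, write-to-source pattern described above can be sketched roughly as follows. This is a minimal illustration, not any particular product's API: `Database`, `ReadThroughCache`, and their methods are hypothetical names, and invalidation is done by the same code path that performs the write, which is only one of several ways a cache can be tied to its data source.

```python
class Database:
    """Hypothetical stand-in for the external data source."""

    def __init__(self):
        self.rows = {}
        self.reads = 0  # counts how often the data source is actually hit

    def read(self, key):
        self.reads += 1
        return self.rows.get(key)

    def write(self, key, value):
        self.rows[key] = value


class ReadThroughCache:
    """Hypothetical cache: reads are served from memory, writes go to the source."""

    def __init__(self, db):
        self.db = db
        self.entries = {}

    def get(self, key):
        # On a miss, fall back to the data source and remember the result.
        if key not in self.entries:
            self.entries[key] = self.db.read(key)
        return self.entries[key]

    def update(self, key, value):
        # Write to the data source, then invalidate the stale cached copy.
        self.db.write(key, value)
        self.entries.pop(key, None)


db = Database()
cache = ReadThroughCache(db)
cache.update("price", 10)
cache.get("price")  # miss: reads the database
cache.get("price")  # hit: served from the cache
```

In this sketch, repeated reads of the same key touch the database only once until the next update invalidates the entry.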

Session state stores are not linked to external data sources by design, although an application can use them that way. While it may be necessary to persist certain parts of a client’s session state to a database, there is usually no need to persist all of it. In fact, many database-less web applications rely solely on session state to operate.
Session state implementations are designed so that each client has exclusive access to its own session data. An application can use a cache to simulate this, but the cache does not enforce exclusivity, so there is always a chance that a client will be able to access another client’s session data because of poor design or a security flaw.

A distributed cache spreads an application’s caching layer across many machines, which allows high-traffic web applications to scale out by adding more machines as demand increases. Performance can also be improved by distributing session state data across many machines; it is therefore worthwhile to examine the requirements and nuances of cached data and session state before deciding which distributed solution to apply.

Distributed caches, like local caches, are most effective when used to cache data that changes infrequently. They also support simultaneous fast reads of a cached object by many threads. This is where session state sharply differs from cached data.

Session state has little need for fast multi-threaded access to a single stored resource, because a client can read or update only its own session, and does so exclusively. In addition, the usage pattern of session state is unpredictable: some applications update session data very frequently while others do not. Designers of session state storage therefore usually take the safe route and assume that session state is write-heavy.

By default, caches are configured to access cached data with either an optimistic concurrency mechanism or no concurrency control at all. This design is driven by the strong requirement to eliminate blocking wherever possible, and it works superbly because writes are far less frequent than reads.
Session state stores, on the other hand, utilize a pessimistic concurrency mechanism to access stored data. This works effectively because of the exclusive nature of resource access.
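To make the two mechanisms concrete, here is a minimal sketch of each. Everything in it is an assumption made for illustration: the class names, the version-number scheme used for the optimistic check, and the per-session lock used for the pessimistic one are not taken from any specific cache or session state product.

```python
import threading


class OptimisticCache:
    """Hypothetical cache: readers never wait; a write succeeds only if
    the entry has not changed since the writer last read it."""

    def __init__(self):
        self._entries = {}  # key -> (version, value)
        self._guard = threading.Lock()  # held only for the brief compare-and-swap

    def read(self, key):
        return self._entries.get(key, (0, None))

    def write_if_unchanged(self, key, expected_version, value):
        with self._guard:
            current_version, _ = self._entries.get(key, (0, None))
            if current_version != expected_version:
                return False  # someone else wrote first; caller must retry
            self._entries[key] = (current_version + 1, value)
            return True


class PessimisticSessionStore:
    """Hypothetical session store: a session is locked for the whole
    read-modify-write cycle, so only one request works on it at a time."""

    def __init__(self):
        self._state = {}
        self._locks = {}
        self._guard = threading.Lock()

    def _lock_for(self, session_id):
        with self._guard:
            return self._locks.setdefault(session_id, threading.Lock())

    def try_acquire(self, session_id):
        # Returns a copy of the session state, or None if it is already locked.
        if not self._lock_for(session_id).acquire(blocking=False):
            return None
        return dict(self._state.get(session_id, {}))

    def save_and_release(self, session_id, new_state):
        self._state[session_id] = dict(new_state)
        self._lock_for(session_id).release()
```

With the optimistic cache, a writer holding a stale version is told to retry; with the pessimistic store, a second request for the same session is turned away (or, in a real server, made to wait) until the first one has saved its changes.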

The number of concurrent accesses to a stored session can increase if a user opens several browser instances of the same application, or if the application makes numerous AJAX calls. Even so, a single user’s session rarely sees more than a handful of concurrent access attempts.
A pessimistic concurrency mechanism, as used by session state stores, can gracefully handle a few concurrent accesses to a write-heavy resource and, more importantly, provide consistent data to all operations. Inconsistencies in served data can arise if a cache with no concurrency control is used to store write-heavy session state. This problem becomes more apparent if the application is AJAX-intensive.
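The kind of inconsistency meant here is the classic lost update. The sketch below replays one possible interleaving of two AJAX requests against the same session, first with no concurrency control and then with exclusive access. The store is a plain dictionary and the function names are made up for the example; the interleaving is written out sequentially so that the outcome is deterministic.

```python
import copy


def run_without_locking(store, session_id):
    # Both requests read the session before either one writes it back.
    seen_by_a = copy.deepcopy(store[session_id])
    seen_by_b = copy.deepcopy(store[session_id])
    seen_by_a["cart"].append("book")
    seen_by_b["cart"].append("pen")
    store[session_id] = seen_by_a
    store[session_id] = seen_by_b  # overwrites request A's update
    return store[session_id]["cart"]


def run_with_exclusive_access(store, session_id):
    # Request B cannot read the session until request A has written it back.
    for item in ("book", "pen"):
        state = copy.deepcopy(store[session_id])
        state["cart"].append(item)
        store[session_id] = state
    return store[session_id]["cart"]


lost = run_without_locking({"s1": {"cart": []}}, "s1")
kept = run_with_exclusive_access({"s1": {"cart": []}}, "s1")
```

In the first run the "book" item silently disappears because the last writer wins; in the second run both items survive.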

Critical applications that rely on session state require failover and redundancy support. These features are usually built into commercial session state storage solutions.
Caches have no need for failover or redundancy because they are simply a caching layer: if the requested data cannot be retrieved from the cache, it can always be fetched from the primary source. Therefore, most distributed cache implementations do not support failover or redundancy, a limitation that solution architects often overlook when moving session storage to a distributed cache.

The conundrum of where to store session state arises when an application needs to scale to accommodate more users.
While there are a few commercial distributed session state storage solutions, there are no free robust alternatives, and the usual consensus is to store session state in freely available distributed cache solutions, or eliminate session state entirely from the application.

Moreover, even when session state can be stored workably in a distributed cache, the servers that cache infrequently changing data are usually the same ones used to store session state. Sharing the cache this way leads to performance degradation.
This occurs because whenever the cache server needs to store a new cached object or remove an expired one, it has to momentarily suspend all read operations internally on all other cached objects until the object is added or removed. The overall outcome is sub-optimal reads for cached infrequently changing data.

Developers and architects should carefully weigh these issues before moving locally stored session state to distributed storage and should, whenever possible, opt for a solution that was built specifically for distributed session state storage.


When Documentation Does Not Match Implementation

Saturday, 19 September 2009 19:39 by sunny

Not too long ago, I needed to piece together the communication protocol between the ASP.NET state server and the web server, because I was working on a peer-to-peer version of the ASP.NET state server.

I made some effort to obtain the protocol as described in this post. Along the way, I found the protocol documentation by Microsoft, and was delighted. I could now design my implementation of the state server based on this information.

While going through the documentation, I noticed that the format of some of the messages did not match what I had observed earlier, so I decided to verify the protocol to be extra sure. Boy, was I in for a surprise.

First, there are bold, wrong statements in the documentation:

From section 3.1.5.3: “Because a client sends a lock-cookie value along with the session state data, the state server MUST store the lock-cookie value. Internally, the state server MUST also store the date and time when the state server received the lock-cookie value. This information is necessary if the state server ever has to send response-locked messages, as specified in sections 2.2.5.2 and 2.2.5.4.”

This statement is wrong and misleading. A LockCookie value is used by a Set Request message to unlock a locked session entry (if it is locked) before storing the new data.
The state server has no use for the client’s LockCookie value. In fact, the state server MUST never store any LockCookie value sent by the client, or in any way let the client influence LockCookie values, as they are generated exclusively by the server.

From section 3.1.5.5: “A client can acquire an exclusive lock on session state by using either a successful GetExclusive_Request or Set_Request message.”

This is another off-the-mark statement. A client can only acquire an exclusive lock with the GetExclusive Request.
Even a cursory look at the SessionStateStoreProviderBase class is sufficient to confirm that this statement is wrong.
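The locking behavior described in the two corrections above can be modeled with a toy in-memory server. This is only a sketch of the semantics as I have described them, not the wire protocol: `ToyStateServer`, its method names, and the status strings are invented for the example, and two details are assumptions the text above does not cover, namely that a GetExclusive on an already locked entry is answered with a "locked" status, and that a Set carrying a non-matching cookie for a locked entry is rejected.

```python
import itertools


class ToyStateServer:
    """Hypothetical model of the lock semantics; not the real protocol."""

    def __init__(self):
        self._entries = {}  # session id -> {"data": ..., "lock_cookie": ...}
        self._cookies = itertools.count(1)  # cookies come only from the server

    def get_exclusive(self, session_id):
        entry = self._entries.get(session_id)
        if entry is None:
            return ("not_found", None, None)
        if entry["lock_cookie"] is not None:
            return ("locked", None, None)  # assumption: no data while locked
        # The only place where a lock is acquired and a cookie is generated.
        entry["lock_cookie"] = next(self._cookies)
        return ("ok", entry["data"], entry["lock_cookie"])

    def set(self, session_id, data, lock_cookie=None):
        entry = self._entries.get(session_id)
        if entry is not None and entry["lock_cookie"] is not None:
            if lock_cookie != entry["lock_cookie"]:
                return "locked"  # assumption: a mismatched cookie is rejected
        # The client's cookie is only compared, never stored. The entry is
        # saved unlocked, so a Set request never acquires a lock.
        self._entries[session_id] = {"data": data, "lock_cookie": None}
        return "ok"


server = ToyStateServer()
server.set("abc", b"v1")
status, data, cookie = server.get_exclusive("abc")  # locks the entry
second_try = server.get_exclusive("abc")  # entry is still locked
server.set("abc", b"v2", lock_cookie=cookie)  # unlocks, then stores
```

After the final Set, the entry holds the new data and is unlocked again, so the next GetExclusive succeeds and receives a fresh server-generated cookie.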

Then, there is important information that is left unstated:

The document does not mention anywhere that the ActionFlags header indicates that the server should store the presented data only if the session id does not already exist. If the ActionFlags header value is set to 1 and the server already has the presented session id, the existing session data will not be replaced with the new data; however, the state server will still reply with an OK response (as if it had stored the data). This behavior is not easily noticed by the casual observer, but it is important for implementing a 100% compatible state server.
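In code, the behavior amounts to a store-if-absent that reports success either way. The function below is a hypothetical sketch: `handle_set` and the dictionary-backed store are made up for illustration, the "OK" string stands in for the protocol's OK response, and only the ActionFlags values 0 and 1 are considered.

```python
def handle_set(store, session_id, data, action_flags=0):
    """Hypothetical handler for a Set request against a dict-backed store."""
    if action_flags == 1 and session_id in store:
        # Existing session data is left untouched, yet the reply is still OK.
        return "OK"
    store[session_id] = data
    return "OK"


store = {}
first = handle_set(store, "abc", b"initial", action_flags=1)   # stored
second = handle_set(store, "abc", b"ignored", action_flags=1)  # not stored
third = handle_set(store, "abc", b"updated")                   # overwrites
```

Both flagged calls return the same reply, which is why the difference is easy to miss when observing the traffic.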

There are other inaccuracies and misleading statements in the documentation that make it virtually impossible for anyone to develop a state server implementation using the Microsoft documentation.
I had to painstakingly piece the protocol together from scratch. I’ve published the correct version of the protocol in PDF format and HTML format for reference purposes.

What's more, if you take a look at the history of the Microsoft document, you’ll notice that it has been edited more than twenty times over the course of almost three years. You’d think that after twenty edits it would be somewhat accurate, but after almost three years of editing the documentation for a major ASP.NET server, Microsoft still manages to get it wrong. It’s safe to say that either Microsoft is doing this intentionally or they do not know how their own technology works.

What’s even more disturbing is that this documentation is for a protocol, not for a piece of software. Do protocols change every other month? Imagine the chaos that would ensue if every developer had to second-guess each statement in RFC-2821 when writing an email client, and then also check every other month that the protocol hasn’t changed.

It seems the only reason Microsoft publishes these specs at all is to pay lip service to the European Union, because there is no point in publishing specs that are inaccurate and do not save developers any time when implementing a technology.
