Sunny Ahuwanya's Blog

Mostly notes on .NET and C#

Probing the ASP.NET State Service Part I

I have been thinking about developing an alternative ASP.NET State Service which can transparently replace the Microsoft-provided state service.

To be successful, the alternative state service has to convince the web server that it is the state service. That means I need to know the high-level communication protocol used between the state service and the web server.
How does the web server talk to the state service? Do they communicate via .NET remoting calls? Is there a proprietary protocol? Or is the state service really a fancy web service?
Let's investigate.

I'll create an echo server that will listen on a port. I can then set the echo server’s address as the state service address for a web application. If the state service protocol runs over TCP, I should be able to see data transmitted by the web application.

Here's the code snippet for a simple Windows socket server that echoes received data to the console:

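int port = 24242; //as opposed to the default 42424 :)

IPEndPoint endPoint = new IPEndPoint(IPAddress.Any, port);
Socket server = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
server.Bind(endPoint); //listen on all interfaces
server.Listen(10);
Socket clientHandler = server.Accept(); //blocks until the web application connects
byte[] transmission = new byte[1024];
int bytesReceived;

//print transmitted data until the web application closes the connection
while ((bytesReceived = clientHandler.Receive(transmission)) > 0)
{
    Console.Write(Encoding.ASCII.GetString(transmission, 0, bytesReceived));
}
(To run the code snippet, you'll need to add using statements to reference the System, System.Net, System.Net.Sockets and System.Text namespaces.)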

I then create a test web application and add the following line in the system.web section of the Web.config file:

<sessionState mode="StateServer" stateConnectionString="tcpip=localhost:24242" cookieless="false" timeout="20"/>

This configures the ASP.NET application to use the state service, except it's actually our echo server.

I also add the following line of code to the web application's Page_Load event handler.

 Session.Add("Test Entry", "Test Data");

Now, I start the echo server (actually a console application) and fire up the web application to see if any transmitted data is captured and displayed in the console window.

Wow. Is that HTTP?
I see a PUT verb followed by some non-standard headers, followed by some binary data.
Well, that certainly makes my task easier: if the service were using .NET remoting calls, I'd have needed to delve into the innards of the .NET Remoting API.
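To work with captured transmissions programmatically, here is a minimal sketch that splits one captured request into its request line, headers and binary body. It assumes ordinary HTTP/1.x framing (CRLF line endings and a blank line before the body), which is only what the capture suggests; the CapturedRequest class is hypothetical and not part of ASP.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: splits one captured state-service request into its
// request line, headers and body, ASSUMING HTTP/1.x-style framing.
public sealed class CapturedRequest
{
    public string Method { get; private set; }
    public string Path { get; private set; }
    public Dictionary<string, string> Headers { get; private set; }
    public byte[] Body { get; private set; }

    public static CapturedRequest Parse(byte[] raw)
    {
        // Locate the CRLF CRLF sequence that ends the header block.
        int headerEnd = -1;
        for (int i = 0; i + 3 < raw.Length; i++)
        {
            if (raw[i] == '\r' && raw[i + 1] == '\n' && raw[i + 2] == '\r' && raw[i + 3] == '\n')
            {
                headerEnd = i;
                break;
            }
        }
        if (headerEnd < 0)
            throw new FormatException("Header terminator not found.");

        // The header block is treated as ASCII text; the body stays as raw bytes.
        string head = Encoding.ASCII.GetString(raw, 0, headerEnd);
        string[] lines = head.Split(new[] { "\r\n" }, StringSplitOptions.None);

        // First line: verb, path and (presumably) a protocol version.
        string[] requestLine = lines[0].Split(' ');

        // Header names are matched case-insensitively, as in HTTP.
        var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        for (int i = 1; i < lines.Length; i++)
        {
            int colon = lines[i].IndexOf(':');
            if (colon > 0)
                headers[lines[i].Substring(0, colon).Trim()] = lines[i].Substring(colon + 1).Trim();
        }

        // Everything after the blank line is the binary payload.
        int bodyStart = headerEnd + 4;
        byte[] body = new byte[raw.Length - bodyStart];
        Array.Copy(raw, bodyStart, body, 0, body.Length);

        return new CapturedRequest
        {
            Method = requestLine[0],
            Path = requestLine.Length > 1 ? requestLine[1] : "",
            Headers = headers,
            Body = body
        };
    }
}
```

Note that a single Receive call is not guaranteed to return the whole request, so a real server would keep reading until it has the full header block plus Content-Length bytes of body before parsing.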

I also notice that the transmitted data is not encrypted. I can clearly see "Test Data" in the transmission. Um, Microsoft -- this is not good.

What about the gibberish-looking tidbit in the PUT line? That must be some kind of identifier.

I add the following line in the system.web section of the Web.config file to see if it will cause the data to be encrypted.

<machineKey validationKey='016E9B3DAA748525DCDBAAC999BB390D63E7E1095F56B737887C10291567085B5A3A2142E6C86F06F07558D77260122C1174419212B6A117B6977B285EA8722B' />

No such luck. However, the encoded data in parentheses on the PUT line changed.
Hmm. Let's try to make some sense of all this data.

The PUT Verb With Encoded Data:

After URL-decoding the data, I get /3e50a960(iE+KOE6bwMI7BuHXun98zlcnkb8=)/miztsjiek5gvzu55km3xun55. The forward slash is probably a delimiter, and the text in parentheses is probably base64-encoded (the trailing "=" padding gave that away).
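The decoding step can be sketched as follows. PutPathDecoder and its method names are hypothetical. The one design choice worth pointing out is Uri.UnescapeDataString: HttpUtility.UrlDecode also turns "+" into a space, which would corrupt any unescaped "+" in the base64 text.

```csharp
using System;

// Hypothetical helpers for decoding the path on the PUT line.
public static class PutPathDecoder
{
    // Undoes %XX escapes only; '+' is left alone because it is a valid
    // base64 character, not an encoded space.
    public static string DecodePath(string rawPath) => Uri.UnescapeDataString(rawPath);

    // Returns the base64-decoded bytes of the text inside the first "(...)".
    public static byte[] DecodeParenthesizedBase64(string decodedPath)
    {
        int open = decodedPath.IndexOf('(');
        int close = open < 0 ? -1 : decodedPath.IndexOf(')', open + 1);
        if (close < 0)
            throw new FormatException("No parenthesized segment found.");
        return Convert.FromBase64String(decodedPath.Substring(open + 1, close - open - 1));
    }
}
```

Applied to the decoded path above, the text in parentheses (28 characters with one "=" of padding) decodes to 20 bytes. That happens to be the size of a 160-bit hash such as SHA-1, which fits the idea that it is derived from a key, but it doesn't prove anything by itself.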

After running and observing the web application a few times, I figured the three parts of the PUT verb line are:

Part A: /3e50a960 is constant (I have no clue what it represents. Application ID? Machine ID? Some kind of magic number? I'll investigate)
Part B: (iE+KOE6bwMI7BuHXun98zlcnkb8=) is derived from MachineKey and is used by the state service to differentiate session data from different machines
Part C: /miztsjiek5gvzu55km3xun55 is the session ID
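Based on these guesses, a parser for the decoded path might look like the sketch below. StatePath and its property names are hypothetical. It deliberately searches for "(" and ")/" instead of splitting on "/", because base64 text can itself contain "/" characters, while ")" never appears in base64.

```csharp
using System;

// Hypothetical representation of the three parts of the decoded PUT path,
// following the Part A / Part B / Part C guesses above.
public sealed class StatePath
{
    public string PartA { get; }      // e.g. "3e50a960" (meaning still unknown)
    public string PartB { get; }      // base64 text inside the parentheses
    public string SessionId { get; }  // Part C

    private StatePath(string partA, string partB, string sessionId)
    {
        PartA = partA;
        PartB = partB;
        SessionId = sessionId;
    }

    // Expects an already URL-decoded path of the form "/<A>(<B>)/<sessionId>".
    public static StatePath Parse(string decodedPath)
    {
        int open = decodedPath.IndexOf('(');
        int close = decodedPath.IndexOf(")/", StringComparison.Ordinal);
        if (decodedPath.Length == 0 || decodedPath[0] != '/' || open < 1 || close < open)
            throw new FormatException("Unexpected path shape: " + decodedPath);

        return new StatePath(
            decodedPath.Substring(1, open - 1),
            decodedPath.Substring(open + 1, close - open - 1),
            decodedPath.Substring(close + 2));
    }
}
```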

The Headers:

Host: The Hostname of the web application.
Timeout: The number of minutes a session can be idle before it is discarded.
Content-Length: The length of the binary data.
ExtraFlags: No clue. I'll investigate.
LockCookie: Looks eerily familiar. I'll investigate.
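A typed view of these headers is a natural next step. This is a hypothetical sketch: it assumes Timeout is a whole number of minutes and Content-Length a byte count, as described above, and it keeps ExtraFlags and LockCookie as raw strings until I know what they mean.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical typed view of the observed headers. Pass a dictionary built
// with StringComparer.OrdinalIgnoreCase, since HTTP header names are case-insensitive.
public sealed class StateHeaders
{
    public string Host { get; private set; }
    public TimeSpan? Timeout { get; private set; }   // header value is in minutes
    public int? ContentLength { get; private set; }  // length of the binary data
    public string ExtraFlags { get; private set; }   // meaning unknown, kept raw
    public string LockCookie { get; private set; }   // meaning unknown, kept raw

    public static StateHeaders From(IDictionary<string, string> headers)
    {
        string value;
        var result = new StateHeaders();
        if (headers.TryGetValue("Host", out value)) result.Host = value;
        if (headers.TryGetValue("Timeout", out value))
            result.Timeout = TimeSpan.FromMinutes(int.Parse(value, CultureInfo.InvariantCulture));
        if (headers.TryGetValue("Content-Length", out value))
            result.ContentLength = int.Parse(value, CultureInfo.InvariantCulture);
        if (headers.TryGetValue("ExtraFlags", out value)) result.ExtraFlags = value;
        if (headers.TryGetValue("LockCookie", out value)) result.LockCookie = value;
        return result;
    }
}
```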

Binary Data:

This could be either information about a single item to be added, updated or removed from the state service, in which case the web application retrieves and updates items as needed, OR it could simply be a serialized list of all the items in the session, in which case the web application reads all items at the beginning of a request and writes them all back at the end. To test my assumptions, I'll add a couple more lines of code to the Page_Load event handler.

Session.Add("test entry 2", "some test data");
Session.Add("test entry 3", "even MORE test data");

I then run the web application to see the transmitted data.

I can see the new items I added in the binary data area. I can now conclude that at the start of a web request, the web application retrieves this list, possibly modifies it and sends it back to the service at the end of the request.

Let me pause right here and think about this for a minute.

Is this model efficient? Is it better to read all items in the session at once and update them all at once, or is it better to read and update items as needed?

Well, items are stored by session ID, and only one user holds a given session ID at any point in time, so the chance that a session item is requested by multiple users or machines at once is negligible. There is therefore no need to retrieve and update items individually to handle concurrent access.

Another advantage of this model is that the state service does not need to know anything about the items being stored. It simply stores whatever data is sent to it, which makes development easier.

Also, retrieving and updating items as needed increases the number of connections to the state service. This can greatly affect the scalability of a web application.

The downside to reading and updating all items at once is bandwidth: if an application stores a lot of information in the session but uses only a few items at a time, the web application will move a lot of unnecessary data to and fro, which may clog the network. This may not be an issue if your web server and state service are physically close or run on the same computer.
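To put rough numbers on this trade-off, the sketch below estimates the bytes moved per request under each model. It is a back-of-the-envelope model with hypothetical names: it ignores HTTP overhead and assumes every item a page touches is read once and written once.

```csharp
using System.Linq;

// Hypothetical traffic estimate for a single request (payload bytes only).
public static class TrafficEstimate
{
    // Whole-session model: the full blob travels to the web server at the
    // start of the request and back at the end (2 round trips in total).
    public static long WholeSessionBytes(long[] itemSizes) => 2 * itemSizes.Sum();

    // Per-item model: only touched items travel, each read once and written once.
    public static long PerItemBytes(long[] itemSizes, int[] touched) =>
        2 * touched.Sum(i => itemSizes[i]);

    // Per-item model: one read and one write round trip per touched item.
    public static int PerItemRoundTrips(int[] touched) => 2 * touched.Length;
}
```

For example, with 50 items of 10,000 bytes each and a page that touches only two of them, the whole-session model moves 1,000,000 bytes in 2 round trips, while the per-item model moves 40,000 bytes in 4 round trips.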

A new question comes up: why is the web application NOT querying the state service for the session data when it starts? Does this mean any information already stored by the state service will be overwritten whenever the web application starts?

I ended up with more questions than I began with, so I still need to investigate further, but at least I have gleaned some basic facts needed to start working on an alternative state service:

1. The ASP.NET web server and the state service communicate via HTTP.
2. The state service stores the binary data sent by the web server (probably in a large dictionary) using the Session ID + a derivative of the machine key as the key.
3. The state service does not need to itemize the data to be stored because all items are read and updated at once.
4. The transmitted data is not encrypted (at least not by default). The state service is not concerned about this since it simply stores data.
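Putting facts 2 and 3 together, the core of an alternative state service could be little more than a dictionary of opaque blobs. This is a minimal sketch under my own assumptions: SessionBlobStore is a hypothetical name, the key combines the part of the path before the session ID with the session ID itself, and the Timeout header drives a sliding expiry (each read pushes the expiry forward, matching "idle before it is discarded"). It is not hardened for concurrent requests on the same session.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical blob store: keeps whatever bytes the web server sends and never
// inspects them. Sliding expiry is an ASSUMPTION based on the Timeout header.
public sealed class SessionBlobStore
{
    private sealed class Entry
    {
        public byte[] Data;
        public TimeSpan Timeout;
        public DateTime ExpiresUtc;
    }

    private readonly ConcurrentDictionary<string, Entry> _entries =
        new ConcurrentDictionary<string, Entry>(StringComparer.Ordinal);

    // Key = application/machine-derived prefix + session ID.
    private static string Key(string prefix, string sessionId) => prefix + "/" + sessionId;

    // Stores (or replaces) the whole serialized session, as sent in a PUT body.
    public void Put(string prefix, string sessionId, byte[] data, TimeSpan timeout, DateTime nowUtc)
    {
        _entries[Key(prefix, sessionId)] = new Entry
        {
            Data = data,
            Timeout = timeout,
            ExpiresUtc = nowUtc + timeout
        };
    }

    // Returns the stored blob, or null if it is missing or has expired.
    // A successful read extends the expiry (sliding timeout).
    public byte[] Get(string prefix, string sessionId, DateTime nowUtc)
    {
        string key = Key(prefix, sessionId);
        Entry entry;
        if (!_entries.TryGetValue(key, out entry)) return null;
        if (nowUtc >= entry.ExpiresUtc)
        {
            _entries.TryRemove(key, out entry);
            return null;
        }
        entry.ExpiresUtc = nowUtc + entry.Timeout;
        return entry.Data;
    }
}
```

Passing the current time in explicitly, rather than reading DateTime.UtcNow inside the class, keeps the expiry logic easy to test.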