Supporting Cache Controls in node.js

I am a big fan of Node.js and am happy to see more instances of it across the internet. However, a trend I have noticed is the misuse or absence of cache control headers, so I am going to show you a quick sample of how to enable support for ETags and the If-None-Match cache validation method.

Before we get into the code let’s first make sure we understand the ETag and If-None-Match HTTP Request and Response Headers.

The ETag header is part of the HTTP protocol. It is one of several mechanisms that HTTP provides for web cache validation, and it allows a client to make conditional requests. This makes caches more efficient and saves bandwidth, because a web server does not need to send a full response if the content has not changed.

The If-None-Match header is sent by clients and contains the last ETag they received for the object. If the file has not changed, the ETag the server computes and the If-None-Match request header will match, so the server can respond with a 304 Not Modified containing no payload, and the requested object is loaded from the client's cache. This saves server resources, since you are not spending disk or network IO to read and send the response, and it saves bandwidth, since the response is very small compared to returning the object.
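
As a rough illustration of the comparison described above, the sketch below checks an If-None-Match value against the current ETag. The helper name etagMatches is made up for this example, and it assumes entity tags contain no commas (true for the size-and-time tags used in this post). A full implementation would follow the parsing rules in the HTTP specification.

```javascript
// Hypothetical helper: returns true when the client's cached copy is current,
// i.e. when the server may answer 304 Not Modified.
function etagMatches(ifNoneMatch, currentEtag) {
  if (!ifNoneMatch) {
    return false; // no header: this is an unconditional request
  }
  if (ifNoneMatch.trim() === '*') {
    return true; // "*" matches any existing representation
  }
  // If-None-Match uses weak comparison, so ignore a leading W/ prefix.
  function normalize(tag) {
    return tag.trim().replace(/^W\//, '');
  }
  // Assumption: tags contain no commas, so a plain split is good enough here.
  var candidates = ifNoneMatch.split(',').map(normalize);
  return candidates.indexOf(normalize(currentEtag)) !== -1;
}

console.log(etagMatches('"1024-1364219026000"', '"1024-1364219026000"')); // true
console.log(etagMatches('"old", W/"1024-1364219026000"', '"1024-1364219026000"')); // true
console.log(etagMatches(undefined, '"1024-1364219026000"')); // false
```

Clients may send several tags in one header, which is why the sketch compares against a list and not a single string.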

So now that we know and understand these values what do they mean to you and how can you use them?

var http = require('http'),
  fs = require('fs');

http.createServer(function (request, response) {
  fs.readFile('.' + request.url, function (err, data) {
    if (err) {
      // ENOENT means the file does not exist; anything else is a server error
      if (err.code === 'ENOENT') {
        response.statusCode = 404;
      }
      else {
        response.statusCode = 500;
      }
      response.end();
    }
    else {
      fs.stat('.' + request.url, function (err, stat) {
        if (err) {
          response.statusCode = 500;
          response.end();
        }
        else {
          // Entity tags are quoted strings; build ours from size and modified time
          var etag = '"' + stat.size + '-' + Date.parse(stat.mtime) + '"';
          // HTTP dates use the GMT format, e.g. Mon, 25 Mar 2013 13:43:46 GMT
          response.setHeader('Last-Modified', stat.mtime.toUTCString());

          if (request.headers['if-none-match'] === etag) {
            response.statusCode = 304;
            response.end();
          }
          else {
            response.setHeader('Content-Length', data.length);
            response.setHeader('ETag', etag);
            response.statusCode = 200;
            response.end(data);
          }
        }
      });
    }
  });
}).listen(8080);

Now let's take the above example and examine the parts related to ETag support.

fs.readFile('.' + request.url, function (err, data) {
  if (err) {
    // ENOENT means the file does not exist; anything else is a server error
    if (err.code === 'ENOENT') {
      response.statusCode = 404;
    }
    else {
      response.statusCode = 500;
    }
    response.end();
  }

Here we read the file the user requested. We return a 404 Not Found if the file does not exist, return a 500 Internal Server Error if any other error occurs while reading it, or otherwise receive the contents of the file in the variable data.

fs.stat('.' + request.url, function(err,stat) {
  if(err) {
    response.statusCode = 500;
    response.end();

This next part is where we start to build our ETag. Using fs.stat, we gather some data about the requested file. If an error occurs reading the file information, we return a 500 Internal Server Error to the end user.

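// Entity tags are quoted strings; build ours from size and modified time
var etag = '"' + stat.size + '-' + Date.parse(stat.mtime) + '"';

// HTTP dates use the GMT format, e.g. Mon, 25 Mar 2013 13:43:46 GMT
response.setHeader('Last-Modified', stat.mtime.toUTCString());

if (request.headers['if-none-match'] === etag) {
  response.statusCode = 304;
  response.end();
}
else {
  response.setHeader('Content-Length', data.length);
  response.setHeader('ETag', etag);
  response.statusCode = 200;
  response.end(data);
}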

Now that we are done with the basic error handling, let's get to the part that handles the request and responds to the client. We do this using the stat object returned by the fs.stat call above.

The first thing we want to do is create the ETag. Typically this is crafted from the inode, the file size, and the file modified time. That is not the best solution, however: in a cluster of servers the inode will differ across machines, so we take it out of the equation here and use only the file size and last modified time.
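
A minimal sketch of that idea, using a hand-built object in place of a real fs.stat result so it runs anywhere. The function name buildEtag is hypothetical. As assumptions of this sketch, it reads the time with getTime() and wraps the value in double quotes, which is how HTTP expects entity tags to be written.

```javascript
// Hypothetical helper: build an entity tag from file size and modified time only,
// leaving the inode out so every server in a cluster produces the same value.
function buildEtag(stat) {
  return '"' + stat.size + '-' + stat.mtime.getTime() + '"';
}

// Stand-in for the object fs.stat() passes to its callback (values are made up).
var fakeStat = { size: 1024, mtime: new Date(1364219026000) };

console.log(buildEtag(fakeStat)); // "1024-1364219026000"
```

Two servers holding the same file with the same size and modified time would produce the same tag, which is the property the inode-based tag lacks.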

Once we have our etag data for the requested file we set the common value Last-Modified which will be used by both response types.

Now we check whether the request contains the If-None-Match header and whether it matches our ETag. If they match, we respond with a 304 Not Modified and that is it. If they do not match, we set the Content-Length and ETag headers as well as the HTTP status code 200, and send the requested file contents as the payload.
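
One possible refinement, sketched below under a few assumptions: because the ETag only needs fs.stat, the file contents do not have to be read at all when the tags match. The names planResponse and handle are invented for this sketch, and it is a variation on the script above, not a replacement for it.

```javascript
var fs = require('fs');

// Hypothetical pure function: given the request headers and an fs.Stats-like
// object, decide the status code and headers without touching the file body.
function planResponse(requestHeaders, stat) {
  var etag = '"' + stat.size + '-' + stat.mtime.getTime() + '"';
  var headers = {
    'ETag': etag,
    'Last-Modified': stat.mtime.toUTCString()
  };
  if (requestHeaders['if-none-match'] === etag) {
    return { statusCode: 304, headers: headers, sendBody: false };
  }
  headers['Content-Length'] = stat.size;
  return { statusCode: 200, headers: headers, sendBody: true };
}

// Hypothetical request handler that only opens the file on a cache miss.
function handle(request, response) {
  var path = '.' + request.url; // same naive path mapping as the script above
  fs.stat(path, function (err, stat) {
    if (err || !stat.isFile()) {
      // Missing files and directories get a 404; other failures get a 500.
      response.statusCode = (!err || err.code === 'ENOENT') ? 404 : 500;
      return response.end();
    }
    var plan = planResponse(request.headers, stat);
    response.writeHead(plan.statusCode, plan.headers);
    if (plan.sendBody) {
      fs.createReadStream(path).pipe(response);
    } else {
      response.end();
    }
  });
}

var sampleStat = { size: 1024, mtime: new Date(1364219026000) };
console.log(planResponse({}, sampleStat).statusCode); // 200
console.log(planResponse({ 'if-none-match': '"1024-1364219026000"' }, sampleStat).statusCode); // 304
```

The handler could be wired in with http.createServer(handle).listen(8080). Keeping the decision in a pure function makes it easy to check without starting a server.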

Now let’s do some testing. Grab a copy of the script above and create a test file such as test.css, start up the example script and open your browser to http://localhost:8080/test.css

If you are using Google Chrome or Firefox, open the Network tab in the Developer Tools and notice that the first request returned a status code of 200 and the contents of your file. Now reload the page. You should get back a 304 Not Modified response, and if you look at the bytes transferred you will notice it was much smaller than your first request.

Remember, just because you use a reverse proxy in front of your Node.js application does not mean you are sending these headers to your clients. Do some testing on your own to see the difference in performance and server resources once you enable support for ETags in your applications.

The sample provided does not cover all of your error handling, security, or performance needs. It is meant as an example only and should be secured before being used in a production environment.

5 thoughts on “Supporting Cache Controls in node.js”

  1. In my case the server evaluates the request and returns a 304 automatically, so my server function does not run. I have to press F5 to force a refresh and get the new state of the page.
    How can I solve this problem?

    • If this is on a page you need content to always be fresh you could set no-cache headers.

      It would depend on the application, page and what you were doing for how you would want to set your caching.

      If this is an API or Ajax call I would set no-cache headers so you are always returning ‘fresh’ data and do the caching on the back-end using something like Memcached. If this is on static content and changes to .CSS or .JS are not being reflected you could use a versioning scheme and increment the file name or pass a parameter such as file.css?v=1, file.css?v=2 or file-1.1.css, file-1.2.css.
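
For the API or Ajax case, here is a sketch of what "no-cache headers" might look like in a plain Node.js handler. The helper name setNoCacheHeaders is invented here, and the exact directives are one common choice, not the only one.

```javascript
// Hypothetical helper: mark a response as not cacheable by browsers or proxies.
function setNoCacheHeaders(response) {
  response.setHeader('Cache-Control', 'no-cache, no-store, must-revalidate');
  response.setHeader('Pragma', 'no-cache'); // for old HTTP/1.0 caches
  response.setHeader('Expires', '0');       // an invalid date is treated as already expired
}

// Minimal stand-in for http.ServerResponse so the sketch runs without a server.
var recorded = {};
var fakeResponse = {
  setHeader: function (name, value) { recorded[name] = value; }
};

setNoCacheHeaders(fakeResponse);
console.log(recorded['Cache-Control']); // no-cache, no-store, must-revalidate
```

In a real handler the helper would be called with the response object before the body is written.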

  2. Just curious, does it make sense to use ETag and If-None-Match for static content?

    I use Nginx as a reverse proxy and configure it to serve static content and relay everything else to Node: http://skovalyov.blogspot.dk/2012/10/deploy-multiple-node-applications-on.html

    And Nginx sends Last-Modified header. Like this:

    Last-Modified:Mon, 25 Mar 2013 13:43:46 GMT

    Then the browser sends If-Modified-Since header. Like this:

    If-Modified-Since:Mon, 25 Mar 2013 13:43:46 GMT

    I find this a bit more informative for static content, and prefer to use ETag and If-None-Match for RPC calls, for example when sending large, rarely changing responses that depend on a slow third-party API.
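
The Last-Modified / If-Modified-Since exchange described here can be sketched in Node.js as follows. The function name is hypothetical, and the sample date is the one from the headers above.

```javascript
// Hypothetical helper: true when the file has not changed since the date the
// client sent in If-Modified-Since, so a 304 Not Modified can be returned.
function isNotModifiedSince(ifModifiedSince, mtime) {
  var since = Date.parse(ifModifiedSince);
  if (isNaN(since)) {
    return false; // missing or unparseable header: send the full response
  }
  // HTTP dates have one-second resolution, so drop the milliseconds from mtime.
  var mtimeSeconds = Math.floor(mtime.getTime() / 1000) * 1000;
  return mtimeSeconds <= since;
}

var header = 'Mon, 25 Mar 2013 13:43:46 GMT';
console.log(isNotModifiedSince(header, new Date(1364219026500))); // true
console.log(isNotModifiedSince(header, new Date(1364219027000))); // false
console.log(isNotModifiedSince(undefined, new Date(1364219026500))); // false
```

Unlike an ETag comparison, this is a date comparison, so it cannot detect a change that happens within the same second.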

  3. There is a lot of debate over ETags in general, and depending on how you use or configure them I agree they can have a negative impact. A common way to generate the ETag is to use the inode information, which is not consistent across multiple servers. In a load balancing situation the ETag can change just based on which server you hit. In my example code above I used the file size and modified time, as those should be consistent in a load balanced or multiserver environment. In Apache you can set FileETag -INode to stop Apache from using inodes when generating the ETag. In NGiNX you can only enable or disable ETag support, and I have not really dug into how it generates ETags, so I am not sure whether it is susceptible to this problem.

    So ETags aside my preferred approach for Static Content is as follows.

    Create 2-3 virtual hosts for static content (depending on your application), for example static.foo.com, static1.foo.com, and static2.foo.com. These can be simple CNAMEs to your main domain. I set expires max; and point root at the root of my static content.

    I then create a virtual host for foo.com, which is the reverse proxy into my application server.

    Sample NGiNX Configuration


    server {
      listen 80;
      server_name static.foo.com static1.foo.com static2.foo.com;
      root /path/to/app/public;
      access_log off;
      expires max;
      add_header Cache-Control "public, max-age=315360000";
    }

    server {
      listen 80;
      server_name foo.com;
      root /path/to/app/public;

      location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_pass http://localhost:3000;
      }
    }

    You gain a few things here: first, parallel downloads of static content, and second, far-future expiration headers on your static content, managed by NGiNX.

    The HTTP 1.1 spec (RFC 2616) recommends that a client keep no more than 2 connections per host name at a time. For HTTP 1.0, oddly, the usual limit is 4 per host. Most browsers follow these values when downloading objects in parallel.

    Years ago Yahoo ran a series of tests and recommended 2 hosts for static content, whereas Gomez, running the same test, recommended 3. This is somewhat dependent on your application. If, for example, you have a large product catalog with lots of thumbnails, 3 hosts might be better than 2.
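
If static assets are spread across several host names like this, each asset should always map to the same host, otherwise the browser ends up caching the same file once per host. Below is a small sketch of one way to do that. The function name, the hash, and the use of plain http URLs are assumptions for illustration.

```javascript
// Host names taken from the sample configuration above.
var STATIC_HOSTS = ['static.foo.com', 'static1.foo.com', 'static2.foo.com'];

// Hypothetical helper: pick a host deterministically from the asset path.
function staticUrlFor(path) {
  var hash = 0;
  for (var i = 0; i < path.length; i++) {
    hash = (hash * 31 + path.charCodeAt(i)) % 1000003; // keep the number small
  }
  return 'http://' + STATIC_HOSTS[hash % STATIC_HOSTS.length] + path;
}

// The same path always yields the same URL, so cached copies are reused.
console.log(staticUrlFor('/css/main.css') === staticUrlFor('/css/main.css')); // true
```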

    So all of this is great, but here comes the problem. Far-future expiration headers on, say, main.css might have negative effects if the browser loads the cached version instead of the live one. One way to combat this is with a versioning scheme. With each release you increment the version number of the file, so main.css becomes main.css?v=1, main.css?v=2, and so forth. This forces the browser to fetch a new version with each change or release.
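
A sketch of that versioning scheme as a tiny helper. The name versionedUrl and the idea of passing the release number in are assumptions. In practice the number might come from a build step.

```javascript
// Hypothetical helper: append the release version so each release gets a new URL.
function versionedUrl(path, version) {
  var separator = path.indexOf('?') === -1 ? '?' : '&';
  return path + separator + 'v=' + encodeURIComponent(version);
}

console.log(versionedUrl('main.css', 1)); // main.css?v=1
console.log(versionedUrl('main.css', 2)); // main.css?v=2
```

Because the URL changes with every release, the far-future expiration on the old URL no longer matters.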
