Supporting Cache Controls in Node.js

I am a big fan of Node.js and am happy to see more instances of it across the internet. However, one trend I have noticed is the misuse or complete lack of cache control headers, so I am going to show you a quick sample of how to support ETags and the If-None-Match cache validation method.

Before we get into the code let's first make sure we understand the ETag and If-None-Match HTTP Request and Response Headers.

The ETag header is part of the HTTP protocol. It is one of several mechanisms HTTP provides for web cache validation, and it allows a client to make conditional requests. This makes caches more efficient and saves bandwidth, because a web server does not need to send a full response if the content has not changed.

The If-None-Match header is sent by clients and carries the last ETag they received for the object. If the file has not changed, the ETag the server computes will match the If-None-Match request header, so the server can respond with a 304 Not Modified containing no payload, and the requested object is loaded from the client's cache. This saves server resources, since you are not spending disk or network IO to read and send the response, and saves bandwidth, since the response is very small compared to returning the object.

So now that we know and understand these values what do they mean to you and how can you use them?

var http = require('http'),
  fs = require('fs');

http.createServer(function (request, response) {
  fs.readFile('.' + request.url, function (err, data) {
    if (err) {
      // ENOENT means the file does not exist; err.code is stable across Node.js versions
      if (err.code === 'ENOENT') {
        response.statusCode = 404;
      }
      else {
        response.statusCode = 500;
      }
      response.end()
    }
    else {
      fs.stat('.' + request.url, function (err, stat) {
        if (err) {
          response.statusCode = 500;
          response.end()
        }
        else {
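          // HTTP requires the ETag value to be a quoted string
          var etag = '"' + stat.size + '-' + Date.parse(stat.mtime) + '"';
          // Last-Modified must be sent as an HTTP date string
          response.setHeader('Last-Modified', stat.mtime.toUTCString());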

          if (request.headers['if-none-match'] === etag) {
            response.statusCode = 304;
            response.end();
          }
          else {
            response.setHeader('Content-Length', data.length);
            response.setHeader('ETag', etag);
            response.statusCode = 200;
            response.end(data);
          }
        }
      })
    }
  })
}).listen(8080);

Now let's take the above example and examine the parts related to ETag support.

fs.readFile('.' + request.url, function(err,data) {     
  if(err) {
    // ENOENT means the file does not exist
    if(err.code === 'ENOENT') {
      response.statusCode = 404;
    }
    else {
      response.statusCode = 500;
    }
    response.end();
  }

Here we read the file the user requested. We return a 404 Not Found if the file does not exist, a 500 Internal Server Error if any other error occurs while reading it, or otherwise receive the contents of the file in the variable data.

fs.stat('.' + request.url, function(err,stat) {
  if(err) {
    response.statusCode = 500;
    response.end()

This next part is where we start to build our ETag. Using fs.stat we gather some data about the requested file. If an error occurs reading the file information, we return a 500 Internal Server Error to the end user.

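// HTTP requires the ETag value to be a quoted string
var etag = '"' + stat.size + '-' + Date.parse(stat.mtime) + '"';

// Last-Modified must be sent as an HTTP date string
response.setHeader('Last-Modified', stat.mtime.toUTCString());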

if(request.headers['if-none-match'] === etag) {
  response.statusCode = 304;
  response.end();
}
else {
  response.setHeader('Content-Length', data.length);
  response.setHeader('ETag', etag);
  response.statusCode = 200;
  response.end(data);
}

OK, now that we are done with our basic error handling, let's get to the part that actually handles the request and responds to the client. We accomplish this using the stat object returned from the fs.stat call above.

The first thing we want to do is create the ETag. Typically this is crafted from inode details, file size, and file modified time. That is not the best solution, however: if you have a cluster of servers, the inode will differ across machines, so we take it out of the equation here and use only the file size and last modified time.
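As a minimal sketch of that idea, the helper below builds the tag from size and modified time only. The name makeEtag and the sample values are my own for illustration, and it assumes every server in the cluster holds the file with the same modified time (for example because your deploy process preserves mtimes).

```javascript
// Build an ETag from file size and modification time only, so that
// every server holding an identical copy produces the same value.
// `stat` is anything shaped like fs.Stats (it needs `size` and `mtime`).
function makeEtag(stat) {
  // HTTP requires the entity tag to be wrapped in double quotes.
  return '"' + stat.size + '-' + stat.mtime.getTime() + '"';
}

// Hypothetical values, for illustration only.
var sample = { size: 1024, mtime: new Date(0) };
console.log(makeEtag(sample)); // "1024-0" (including the quotes)
```

Leaving the inode out means two machines only disagree when the file really differs in size or modified time.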

Once we have the ETag for the requested file, we set the Last-Modified header, which is common to both response types.

Now we check whether the request contains the If-None-Match header and whether it matches our ETag. If they match, we respond with a 304 Not Modified and that is it. If they do not match, we set the Content-Length and ETag headers, set the HTTP status code to 200, and send the requested file contents as the payload.
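The sample compares the header with a plain ===, which is fine for a single tag. Clients and caches may also send several tags separated by commas, a weak tag prefixed with W/, or a bare *. The sketch below is one hedged way to handle those cases. The function name matchesIfNoneMatch is my own, and splitting on commas assumes your tags never contain a comma, which holds for the size-mtime tags used here.

```javascript
// Returns true when the If-None-Match request header matches `etag`,
// meaning a 304 Not Modified can be sent instead of the full body.
function matchesIfNoneMatch(headerValue, etag) {
  if (!headerValue) {
    return false;                 // header absent: send the full response
  }
  if (headerValue.trim() === '*') {
    return true;                  // "*" matches any current representation
  }
  // Weak comparison: ignore a leading W/ on either side.
  var strip = function (tag) {
    return tag.trim().replace(/^W\//, '');
  };
  var wanted = strip(etag);
  // Simplification: assumes no tag contains a comma.
  return headerValue.split(',').some(function (candidate) {
    return strip(candidate) === wanted;
  });
}

console.log(matchesIfNoneMatch('"old", W/"10-5"', '"10-5"')); // true
console.log(matchesIfNoneMatch('"old"', '"10-5"'));           // false
```

In the handler you would pass request.headers['if-none-match'] as the first argument and the freshly built tag as the second.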

Now let's do some testing. Grab a copy of the script above and create a test file such as test.css, then start up the example script and open your browser to http://localhost:8080/test.css.

If you are using Google Chrome or Firefox, open the Network tab in the Developer Tools and notice that the first request returned a status code of 200 and the contents of your file. Now reload the page. You should get back a 304 Not Modified response, and if you look at the bytes transferred you will notice it was much smaller than your first request.

Remember, just because you use a reverse proxy in front of your Node.js application does not mean you are sending these headers to your clients. Do some testing of your own to see the difference in performance and server resources once you enable support for ETags in your applications.

The sample provided does not cover all of your error handling, security or performance needs. It is meant as an example only and should be secured before being used in a production environment.
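One concrete gap: the sample passes '.' + request.url straight to fs.readFile, so a request containing ../ can reach files outside the served folder. Below is a minimal sketch of one way to guard against that. The function name resolveInsideRoot and the /srv/www path are hypothetical, and it assumes the served folder is not the filesystem root.

```javascript
var path = require('path');

// Map a request URL onto a file inside `root`, or return null when the
// URL would escape it (for example "/../etc/passwd").
function resolveInsideRoot(root, requestUrl) {
  var pathname;
  try {
    // Drop the query string and decode %xx escapes.
    pathname = decodeURIComponent(requestUrl.split('?')[0]);
  } catch (e) {
    return null;                  // malformed escape sequence
  }
  if (pathname.indexOf('\0') !== -1) {
    return null;                  // reject embedded NUL bytes
  }
  var base = path.resolve(root);
  var target = path.resolve(base, '.' + pathname);
  // The resolved path must be the root itself or sit below it.
  if (target !== base && target.indexOf(base + path.sep) !== 0) {
    return null;
  }
  return target;
}

console.log(resolveInsideRoot('/srv/www', '/../etc/passwd')); // null
```

In the handler you would respond with a 403 or 404 when this returns null, and otherwise pass the returned path to fs.readFile and fs.stat.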

Decommissioning Servers

We're currently undertaking an internal project to identify, consolidate and decommission aging equipment on our network. It's funny, but I always find it hard to turn off equipment that has been running for 1.5 years without an issue.

srv01# uptime
 5:31PM  up 530 days, 10:11, 1 user, load averages: 0.46, 0.23, 0.09

people# uptime
 5:53PM  up 530 days, 14:30, 2 users, load averages: 0.00, 0.00, 0.00

inetops# uptime
 5:54PM  up 531 days, 11:45, 2 users, load averages: 0.00, 0.00, 0.00

378 Thousand Prefixes, 196 MB of RAM

That's how big the IPv4 routing table is now...

BGP table version is 111517930, main routing table version 111517930
378326 network entries using 45777446 bytes of memory
2328831 path entries using 121099212 bytes of memory
422707/66873 BGP path/bestpath attribute entries using 32125732 bytes of memory
189870 BGP AS-PATH entries using 5090228 bytes of memory
12651 BGP community entries using 1048178 bytes of memory
2 BGP extended community entries using 48 bytes of memory
29 BGP route-map cache entries using 928 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 205141772 total bytes of memory
1130746 received paths for inbound soft reconfiguration
BGP activity 4351207/3965251 prefixes, 55077482/52711615 paths, scan interval 60 secs
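For anyone checking the headline number, here is a quick sketch of the arithmetic using only the byte counts printed above. The per-prefix figure is just the total divided by the number of network entries, so treat it as a rough average, not something the router reports.

```javascript
// Per-category byte counts copied from the output above.
var bytes = [45777446, 121099212, 32125732, 5090228, 1048178, 48, 928, 0];
var total = bytes.reduce(function (sum, n) { return sum + n; }, 0);
var prefixes = 378326;

console.log(total);                               // 205141772
console.log((total / (1024 * 1024)).toFixed(1));  // 195.6
console.log(Math.round(total / prefixes));        // 542
```

The categories add up exactly to the reported total, which works out to roughly 196 MB when counted in units of 1024 x 1024 bytes.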

Apache Mod Proxy Configuration

Setting up mod_proxy is becoming a more common task, especially with recent enhancements to the module such as mod_proxy_ajp. Putting a front-end Apache server in front of, say, your Tomcat application server offers security and performance benefits. However, it is quite easy to make a small mistake and turn your web site into an open web proxy that anyone on the Internet can use anonymously.

Apache can be configured in both a forward and reverse proxy (also known as gateway) mode.

A typical usage of a forward proxy is to provide Internet access to internal clients that are otherwise restricted by a firewall. The forward proxy can also use caching (as provided by mod_cache) to reduce network usage.

The forward proxy is activated using the ProxyRequests directive. Because forward proxies allow clients to access arbitrary sites through your server and to hide their true origin, it is essential that you secure your server so that only authorized clients can access the proxy before activating a forward proxy.

A reverse proxy (or gateway), by contrast, appears to the client just like an ordinary web server. No special configuration on the client is necessary. The client makes ordinary requests for content in the name-space of the reverse proxy. The reverse proxy then decides where to send those requests, and returns the content as if it was itself the origin.

A typical usage of a reverse proxy is to provide Internet users access to a server that is behind a firewall. Reverse proxies can also be used to balance load among several back-end servers, or to provide caching for a slower back-end server. In addition, reverse proxies can be used simply to bring several servers into the same URL space.

Below is a sample configuration using Apache as a reverse proxy in front of a Tomcat Application Server using the Tomcat HTTP Connector.

    ProxyRequests Off
    ProxyPass / http://localhost:8080/
    ProxyPassReverse / http://localhost:8080/ 

The key here, which is sometimes misunderstood, is ProxyRequests Off. Some people think you need to turn this flag on to use mod_proxy at all. This is the mistake: by turning that flag on you are telling your web server to act as a forward proxy and fetch whatever resource a user requests.

A reverse proxy is activated using the ProxyPass directive or the [P] flag to the RewriteRule directive. It is not necessary to turn ProxyRequests on in order to configure a reverse proxy.

Want to test this? Turn ProxyRequests On and restart your Apache web server. Now configure your web browser to use a proxy server, with your website as the address and port 80. Open http://blog.phyber.com and you will see the site load as expected. Check the access logs on your web server to confirm that it is being used as a proxy server, then turn ProxyRequests back Off.

Other useful mod_proxy examples include load balancing to multiple backend hosts. In this example we will be proxying traffic to two backend servers using the cookie JSESSIONID for sticky sessions.

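<Proxy balancer://mycluster>
    # route values are examples; each must match the jvmRoute set on that Tomcat instance
    BalancerMember ajp://server01:8009 route=server01
    BalancerMember ajp://server02:8009 route=server02
</Proxy>

# stickysession is a balancer parameter, so it is set here, not on BalancerMember
ProxyPass /webapp balancer://mycluster/webapp stickysession=JSESSIONID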

Note in this configuration using mod_proxy_ajp a ProxyPassReverse is not needed as the AJP request includes the original host given to the proxy, and the application server can respond properly so no rewriting of the host request is required.

In the next example we will make use of mod_disk_cache to serve content from the cache and, if it is not found there, pass the request to a remote server behind the firewall. The exception is static content such as JavaScript, stylesheets and images, which is served locally by Apache itself.

<VirtualHost *:80>
    ServerName www.phyber.com
    DocumentRoot /opt/phyber/html

    CacheEnable disk /
    CacheRoot /var/cache/phyber
    CacheDefaultExpire 60
    CacheMaxExpire 3600

    ProxyPass /images !
    ProxyPass /stylesheets !
    ProxyPass /javascripts !
    ProxyPass / http://remote.phyber.com/
    ProxyPassReverse / http://remote.phyber.com/
</VirtualHost>

Note the line ProxyPass /images ! in this configuration. The exclamation point tells mod_proxy not to proxy requests for that path, so all requests for http://www.phyber.com/images/ are served directly by the local server from the folder /opt/phyber/html/images/. mod_disk_cache is consulted first; if the requested object is not found in the cache, the request is forwarded to remote.phyber.com, unless the object is in one of the excluded static content directories.

Look for future articles on common mod_proxy and mod_cache usage here at http://blog.phyber.com

Apache Name Based SSL Virtual Hosting using Server Name Indication (SNI)

One difficulty with hosting multiple websites on a single server has always been SSL. When you open a website in your browser, the request includes an HTTP Host header naming the requested host. A web server reads this header and directs traffic to the Virtual Host that matches it. With SSL, however, the request is still encrypted when it first reaches the web server, and the server has to pick a certificate before it can decrypt anything. By the time the Host header can be read it is too late in the process, so the default Virtual Host replies.

The way to get around this in the past has been to assign each SSL host its own IP address on sub-interfaces so the web server can distinguish between the various sites. This approach works, but it becomes quite a maintenance nightmare in your Apache configuration as well as your server's network configuration.

The solution to this problem is an extension to the SSL protocol named Server Name Indication (SNI), defined in RFC 4366. It allows the client to include the requested hostname in the first message of the SSL handshake, so the web server can determine which named Virtual Host should handle the request and set up the connection accordingly.

Starting with Apache 2.2.12 you can take advantage of SNI for your SSL Virtual Hosting. There are some requirements for SNI to work properly.

  • You must be running OpenSSL 0.9.8f or later
  • You must configure OpenSSL with the TLS Extensions Enabled
  • Apache must be built with OpenSSL Support (--with-ssl)

Another requirement is the Client must support SNI. Below is a list of Browsers supporting SNI

  • Mozilla Firefox 2.0 or later
  • Opera 8.0 or later (with TLS 1.1 enabled)
  • Internet Explorer 7.0 or later (on Vista, not XP)
  • Google Chrome
  • Safari 3.2.1 on Mac OS X 10.5.6

If the client does not support SNI and strict host checking is enabled (the SSLStrictSNIVHostCheck directive), Apache returns a 403 error response to the user. With strict host checking disabled, the default Virtual Host responds as before whenever a request does not match a defined Virtual Host. This is because, once again, the server cannot read the Host header until it has completed the SSL negotiation and can decrypt the data.

To enable Name-Based Virtual Hosting for SSL you have to make a few changes to your existing Apache configuration. Below is a sample configuration for two different domains running SSL.

 

NameVirtualHost *:443

<VirtualHost *:443>
    ServerName www.domain-one.com
    DocumentRoot  /var/www/domain-one.com

    SSLEngine on
    SSLProtocol all -SSLv2
    SSLCipherSuite ALL:!ADH:!EXPORT:!SSLv2:RC4+RSA:+HIGH:+MEDIUM:+LOW
    SSLCertificateFile conf.d/www.domain-one.com.crt
    SSLCertificateKeyFile conf.d/www.domain-one.com.key
    ErrorLog logs/www.domain-one.com_error_log
    # CustomLog needs a format; "combined" assumes the stock LogFormat nickname is defined
    CustomLog logs/www.domain-one.com_access_log combined
</VirtualHost>

<VirtualHost *:443>
    ServerName www.domain-two.com
    DocumentRoot  /var/www/domain-two.com

    SSLEngine on
    SSLProtocol all -SSLv2
    SSLCipherSuite ALL:!ADH:!EXPORT:!SSLv2:RC4+RSA:+HIGH:+MEDIUM:+LOW
    SSLCertificateFile conf.d/www.domain-two.com.crt
    SSLCertificateKeyFile conf.d/www.domain-two.com.key
    ErrorLog logs/www.domain-two.com_error_log
    # CustomLog needs a format; "combined" assumes the stock LogFormat nickname is defined
    CustomLog logs/www.domain-two.com_access_log combined
</VirtualHost>

Level3 Outage - Likely Juniper Bug

Information is still limited at this time - at approximately 14:15 UTC an event occurred causing our BGP sessions with Level3 to reset and re-establish five minutes later.

There are confirmed reports that this event within Level3 has affected most of North America. This outage has affected other networks running Juniper routers with the majority of them seeing their devices core dump and reload.

At this point the speculation is that a new BGP Update Bug within specific Juniper version trains was just discovered/triggered.

Phyber's engineering staff will be monitoring the situation closely and will disable our Level3 peerings if necessary to maintain stability of the network.

One Wilshire Power Notification

Date: Thursday, October 13, 2011 

Location of Event: One Wilshire, 624 Grand Ave, Los Angeles, California

Critical Load: On Generator Power

Risk to Operation: Possible disruption in powered services.

Summary Description of Event:  The Data Center has experienced a utility power disruption. While power in the data center is protected, CoreSite facilities staff is on site and verifying proper functionality of the facility systems.