Practical HTTP caching
Browsers need to be told explicitly by the server how to cache everything they receive; otherwise the browser stores files in its cache and guesses when the stored items need to be updated. If you maintain the server configuration or develop the scripts running on the server, that means you must tell the browser explicitly how to cache.
If you do not tell the browser how to cache, it will guess. Usually the browser's guesses aren't too bad, which is exactly what makes the problem so insidious. Because browsers usually (but not always) guess well, developers don't realize they need to control caching carefully until this happens: long after development is complete, a developer makes a small change to a .js file. Then, for no obvious reason, many users' pages start crashing. It turns out that their browsers never re-downloaded the changed .js file and kept using the old copy from their caches. Troubleshooting this is a nightmare. A similar but less catastrophic story plays out when a developer changes the contents of an image file, but the site's everyday users never see the new image because their browsers keep serving the old one.
Two aspects of caching
This section explains the two basic aspects of caching that every server must handle. HTTP has many more cache-related headers, but these are the minimum needed, and they cover most situations.
Caching always has two aspects that need to be handled. The first is governed by an HTTP header, Cache-Control: max-age=<seconds>, which is sent by the server. (Also see the note on "Expires" below, which serves the same purpose.) Max-age tells the browser how long, in seconds, it may blindly re-use the cached version before asking the server whether a newer copy is available. Usually you don't want this to be 0 seconds, which means 'always ask if the file has changed', because users may be moving quickly from page to page within your site, and there is no sense in re-checking things that are only a minute or so old.
You also don't want it to be a very large number (which browsers sometimes default to, as in the .js file example above). Set it case by case, depending on how often you intend to change the resource. I tend to use 60 seconds for really important things like .html, .php, .js, and .css, and 1 hour for images. In theory, a server's default expiration should always be max-age=0 or some other low number, so that by default the browser either re-checks with the server every time or caches only for a short time. The idea is that it's better for the user to see the latest data by default. Whatever you decide for yourself, the server needs some default max-age; otherwise the browser's own default is used, and that could be anything.
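As a rough illustration of the case-by-case policy above, the following Python sketch maps a request path to a Cache-Control value. The function name cache_control_for, the list of image extensions, and the extension-based lookup itself are assumptions made for the example; the 60-second, 1-hour, and 0-second values are the ones suggested in the text. A real server would often set this in its configuration rather than in code.

```python
from pathlib import PurePosixPath

# File types that change often and matter most: cache for 60 seconds.
SHORT_LIVED = {".html", ".php", ".js", ".css"}
# Image types (an assumed, incomplete list): cache for 1 hour.
IMAGES = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp"}
# Anything else falls back to "always ask the server".
DEFAULT_MAX_AGE = 0


def cache_control_for(path: str) -> str:
    """Return a Cache-Control header value for the requested path."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in SHORT_LIVED:
        max_age = 60
    elif suffix in IMAGES:
        max_age = 3600
    else:
        max_age = DEFAULT_MAX_AGE
    return f"max-age={max_age}"
```

With these assumptions, a request for /js/app.js gets "max-age=60", an image gets "max-age=3600", and any unlisted type gets "max-age=0".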
Note that the Expires: <date> header is an alternative to Cache-Control: max-age=<seconds>, except that it takes a specific date and time instead of a number of seconds into the future. Generally, it's easier to think of things expiring some number of seconds or hours from now than at a specific time, so I don't tend to use "Expires".
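To make the relationship between the two headers concrete, here is a small Python sketch that computes the Expires value corresponding to a given max-age. The helper name expires_header is hypothetical; formatdate comes from Python's standard email.utils module and, with usegmt=True, produces a date ending in "GMT".

```python
from email.utils import formatdate


def expires_header(max_age_seconds: int, now: float) -> str:
    """Return the Expires date that is max_age_seconds after `now`.

    `now` is a Unix timestamp (seconds since 1970-01-01 UTC).
    """
    return formatdate(now + max_age_seconds, usegmt=True)
```

For example, expires_header(60, 1651305000) returns "Sat, 30 Apr 2022 07:51:00 GMT", which is one minute after the given timestamp.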
The second aspect of caching is If-Modified-Since: <date>, a header sent by the browser when it asks the server for a file. (The browser takes this date from the Last-Modified header that the server sent with the original response.) It lets the server report that the copy in the browser's cache hasn't changed on the server, which saves network bandwidth by avoiding a re-send of the whole file. The server should simply check the date and time of the file in its file system and send the file only if that date is newer than the date passed in the If-Modified-Since header. If the file hasn't changed, the server should send a short 304 Not Modified response. For scripts, a simple file date-time check may not be enough. For example, a script should check the dates of all the files it imports, includes, or otherwise relies upon, and return a 304 only if none of those files has changed since the date passed in the If-Modified-Since header.
A note about <date>: in the If-Modified-Since header, it is formatted like this:
<day-name>, <day> <month> <year> <hour>:<minute>:<second> GMT
For example:
If-Modified-Since: Sat, 30 Apr 2022 07:51:00 GMT
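The If-Modified-Since check described above can be sketched in Python roughly as follows. The function names latest_mtime and is_not_modified are hypothetical, and the sketch assumes the script already knows the list of files it depends on; parsedate_to_datetime is from the standard email.utils module. File times are truncated to whole seconds because the header's date format carries no finer resolution.

```python
import os
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def latest_mtime(paths):
    """Newest modification time (whole seconds, UTC) among the given files."""
    newest = max(os.path.getmtime(p) for p in paths)
    return datetime.fromtimestamp(int(newest), tz=timezone.utc)


def is_not_modified(if_modified_since, last_modified):
    """Return True when a 304 Not Modified response is appropriate.

    if_modified_since: raw header value from the browser, or None.
    last_modified: timezone-aware datetime of the newest relevant file.
    """
    if not if_modified_since:
        return False  # no header: send the full response
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date: send the full response
    if since.tzinfo is None:
        # Treat a date without a usable zone as GMT (an assumption).
        since = since.replace(tzinfo=timezone.utc)
    # Nothing on the server is newer than the browser's copy.
    return last_modified <= since
```

A request handler would call is_not_modified(header_value, latest_mtime(paths)) and send 304 Not Modified when it returns True, or the full file otherwise.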