When was the last time you obtained a 10x (ten times, 1000%) performance gain with a single improvement?
Not recently, I bet. Most optimizations work incrementally, eking out 3% here, 2% there, and only achieve an observable effect by iterating many such optimization steps. Even algorithmic improvements, such as replacing an O(n²) algorithm by an O(n·log(n)) one, typically get you on the order of 3 or 4 times performance improvement, at least on the data sizes in typical use at the time the improvement is made. So let me tell you how I improved performance of JPS, my web app to apply IPS patches, tenfold.
Once upon a time…
Soon after the initial public version of JPS, I started working on support of another format that (among other processing) requires the CRC32 of the whole file to be obtained, which is best done in blocks of, say, 1024 bytes rather than reading from the file byte by byte. Given my prior experiences, I dreaded the performance penalty from having to (re-)visit every single byte of the file, but it turned out to perform surprisingly well. Why couldn’t I get the same performance when processing IPS files?
From now on, my longer posts will have random photos from my various trips inserted to serve as breathers. This is the Mont Blanc, lighted by sunset light.
And now I turn on the cache, and measure the performance improvement… and files that used to take Chrome 50 seconds to process now take 5 seconds! (Chrome being my reference browser for development of JPS). And the 10x factor is consistent, applying over various source files, often turning the processing time into “too short to measure”, and over various platforms: the same files which took around 200 seconds on Chrome for Android now take 20 (and the behavior of desktop Chrome on Windows was the same as on Mac OS X). Similar improvements could be observed with desktop Firefox, with processing times going from 20 seconds to 2 seconds.
I reported these findings on the Chromium discussion forums (Chrome being the worst offender), because surely that meant something was wrong with Chrome somewhere. However, not much came out of it, so I decided to productize the cache so as to deploy these performance improvements in production.
From proof of concept to production-worthy code
The proof of concept assumed that, for every read operation except from the first it could just append a new read request from the client to its todo list, and once control would bubble up back to the cache code, the request could be served there if it was in cache. That worked in most cases, at least enough to get performance measurements; but in some cases, a new request would be logged from code that was not called from a callback from our cache, so it would never bubble up back to our code and never be served, and the pump would drain.
A young ibex.
Easy enough, I thought, I will get rid of the todo list and instead I will always defer processing by calling
But it was slow. Even slower than without the cache.
Turns out, the overhead of calling
setTimeout(,O) and getting called back by it was killing this solution. What to do, what to do, what to do? Back to the drawing board, I came up with the solution: reinstate the todo list, and use it, but only if we can tell for sure that we are within code that is being called by cache code — which entails keeping track of that information. If we are not within code that is being called by cache code, only then use
setTimeout(,O). That managed to both work in all cases and with good performance.
And then I also had to support aborting requests, adding a number of unit tests, fix a few bugs… and then it was done.
The Grandes Jorasses.
What have we learned?
- Don’t diss CS or the CS curriculum. You never know when what you learn there might turn out to be useful.
- Sometimes the obvious solution is the right one.
- The source of slowness isn’t reading files, per se, but rather the shocking overhead of calling a Web API and getting called back by it (whether it be
setTimeout(,O)), which by my estimates is around 2 ms for each such operation with Chrome on a modern desktop machine. This is crazy. Other browsers (with the exception of Internet Explorer/Edge, which I have not been able to test) fare better, but still have enough overhead that you have to wonder what is going on in there.
Get the code
I set up a specific project for the cache code: you can get the code on BitBucket, and I also published it on NPM as simple-file-cache. It is free to use and modify (under the terms of the BSD license). If you find it useful, I request that you consider donating to the ACLU and the UNHCR, however.
P.S.: While I’ve got your attention, I’m happy to report that JPS will soon support Safari, as this browser is finally about to get support for the download attribute and downloading blobs, normally as part of Safari 10.1, which is meant to arrive with Mac OS X 10.12.4. Being able to be used on a stock install of Mac OS X will be a huge milestone for JPS and the viability of web apps in general as a way to circumvent Developer ID and Gatekeeper.