What will the long-term solutions be to Meltdown and Spectre?

It’s hard to believe it has now been more than one year since the disclosure of Meltdown and Spectre. There was so much frenzy in the first days and weeks that it has perhaps obscured the fact that any solutions we currently have are temporary, barely secure, spackle-everywhere stopgap mitigations, and now that the dust has settled, I thought I’d look at what researchers and other contributors have come up with in the last year to provide secure processors – without, of course, requiring all of us to rewrite all our software from scratch.

Context

Do I need to remind you of Meltdown and Spectre? No, of course not; even if you’re reading this 20 years from now you will have no trouble finding good intro material on these. So as we discuss solutions, my only contribution would be this: it is important to realize that designers were not lazy. For instance, they did not “forget” the caches as part of undoing speculative work in the processor, because you can’t “undo” the effect of speculation on the caches: for one, how would you reload the data that was evicted (necessary in order to be a real undo)? You can’t really have checkpoints in the cache that you roll back to, either: SafeSpec explores that approach, and besides still leaking state, it more importantly precludes any kind of multi-core or multi-socket configuration (SafeSpec is incompatible with cache coherency protocols), a non-starter in this day and age (undoing cache state is also problematic in multi-core setups, as the cache state resulting from speculative execution would be transitorily visible to other cores).

It is also important to realize that preventing aliasing in branch prediction tracking slots would not fundamentally solve anything: even if this were done, attackers could still poison the BHS and possibly the BTB by coercing the kernel into taking (resp. not taking) the attacked branch through the execution of ordinary syscalls, and then use speculative execution driven by that to leak data through the caches.

Besides information specific to Meltdown and Spectre, my recommended reading before we begin is Ulrich Drepper on the modern computer memory architecture, still current, and Dan Luu on branch prediction: these will tell you about the myriad places where processors generate and store implicit information needed for modern performance.

The goal

As opposed to the current mitigations, we need systemic protection against whole classes of attacks, not just the current ones: it’s not just that hardware cannot be patched, but it also has dramatically longer design cycles, which means that protecting only against the risks known at the start of a project would make the protections obsolete by the time the hardware ships. And even if patching were a possibility, it’s not like the patch treadmill is desirable anyway (in particular, adding fences, etc. around manually identified vulnerable sequences feels completely insane to me and amounts to a dangerous game of whack-a-vulnerability: vulnerable sequences will end up being added to kernel code faster than security-conscious people can figure them out). Take, for instance, the Intel doc which described the Spectre (and Meltdown) vulnerability as a variant of the “confused deputy”; this is its correct classification, but I feel this deputy is confused because he has been given responsibility for the effects of speculative execution of his code paths, a staggering responsibility he never asked for in the first place! No, we need to attack these kinds of vulnerabilities at the root, such that they cannot spawn new heads, and the two techniques below do so.

DAWG

First is DAWG. The fundamental idea is very intriguing: it is designed to close off any kind of cache side channel¹, not merely the tag state (that is, whether a value is present in the cache or not), and designed to close off data leaks regardless of which phenomenon would feed such a side channel: it is not limited to speculative execution. How does it ensure that? DAWG does so by having the OS dynamically partition all cache levels, and then assign the partitions, in a fashion similar to PCID.

This means that even with a single processor core, there are multiple caches at each level, one per trust domain, each separate from its siblings and having a proportional fraction of the size and associativity of the corresponding physical cache of that level (cache line size and cache way size are unaffected). This piggybacks on recent developments (Intel CAT) in managing the cache as a resource to provision, but CAT is intended for QoS and offers only limited security guarantees.

As long as data stays within its trust domain, that is all there is to it. When cross-partition data transfer is necessary, however, the kernel performs it by first setting up a mixed context where reads are to be considered as belonging to one domain, but writes to another, then performing the reads and writes themselves: this affords the best possible cache usage during and after the transfer.

Such an organization raises a number of sub-problems, but they seem to have done a good job of addressing those. For instance, since each cache level is effectively partitioned, the same cache line may be in multiple places in the same physical cache level, in different domains, which is not traditional and requires some care in the implementation. The kernel has access to separate controls for where evictions can happen and where hits can happen; this is necessary for a transition period whenever the partitions are resized. DAWG integrates with cache coherency protocols by having each cache partition behave mostly, but not exactly, like a logically separate cache for cache coherency purposes: one particularly important limitation we will come back to is that DAWG cannot handle a trust domain attempting to load a line for writing when a different domain already owns that line for writing.

In terms of functional insertion, they have a clever design where they interpose in a limited number of cache operations so as not to insert themselves in the most latency-critical parts (tag detection, hit multiplexing, etc.). It requires some integration with the cache replacement algorithm, and they show how to do so with tree-PLRU (Pseudo Least Recently Used) and NRU (Not Recently Used).

In terms of features, DAWG allows sharing memory read-only, CoW (Copy on Write), and one-way shared memory where only one trust domain can have write access. DAWG only features a modest efficiency hit compared to the insecure baseline, though it depends on policy (CAT has similar policy-dependent behavior).

On the other hand, there are a few, though manageable, limitations.

  • DAWG disallows sharing physical memory between different trust domains where more than one domain has write access, due to the impossibility of managing cache coherence when more than one domain wants to write to two cache lines corresponding to the same physical address. I feel this is manageable: such a technique is probably extremely hard to secure anyway, given the possibility of a side channel through cache coherency management state, as MeltdownPrime and SpectrePrime have demonstrated, so we would need an overview of the main cases where such memory sharing happens; off the top of my head, the typical use is the framebuffer used for IPC with WindowServer/X11, in which case the need in the first place is only for one-way transfer, and the solution here would be to change permissions so as to restrict write rights to one side only.
  • DAWG provides no solution for transfer in/out of shared physical memory between different trust domains where neither is the kernel. But as we just saw, the allocation of such a thing need only be done by specific processes (perhaps those marked with a specific sandbox permission?), and transfer could be performed by the kernel on behalf of the allocating domain through a new syscall.
  • Hot data outside the kernel such as oft-called functions in shared libraries (think objc_msgSend()), while residing in a single place in physical memory, would end up being copied in every cache partition, thus reducing effective capacity of all physical caches (hot data from the kernel would only need to be present in the kernel partition, regardless of which process makes the syscall).
  • Efficient operation relies on the kernel managing the partitioning according to the needs of each trust domain, which is not trivial: partition ID management could be done in a fashion similar to PCID; however, that still leaves the determination of partition sizes, keeping in mind that the cache at every level needs to be partitioned, including those shared between different cores, which therefore have more clients and thus require more partitions, and that the granularity is limited and depends on the level: a 16-way set-associative cache may be partitioned in increments of 1/16th of its capacity, but a 4-way set-associative cache only in fourths of its capacity. Easy.
  • DAWG guards between explicit trust domains, so it cannot protect against an attacker in the same process. This could be mitigated by everyone adopting the Chrome method: sorry Robert, but maybe “mixing code with different trust labels in the same address space” needs to become a black art.

InvisiSpec

The basic idea of InvisiSpec corresponds to the avenue I suggested back then, which is that speculative loads only bring data to the processor without affecting cache state (whether by bringing that data to a cache level where it wasn’t, modifying the cache replacement policy, or updating other metadata), with the cache being updated only when the load is validated.

Well, that’s it, good job everyone? Of course not, the devil is in the details, including some I never began to suspect: validation cannot happen just any which way. InvisiSpec details how this is done in practice, the main technique being special loads performed solely for validation purposes: once loaded, the processor only uses this data, if ever, to compare it against the speculatively loaded data kept in a speculation buffer, and if the values match, processing can proceed; and while you would think that could raise ABA issues, it does not, as we’re going to see.

Overall, InvisiSpec proposes a very interesting model of a modern processor. First, a speculative engine that performs computations while “playing pretend”: it doesn’t matter at that point whether the data is correct (of course, it needs to be correct most of the time to serve any purpose). Then the reorder buffer, which can be seen as the “real” processing that executes according to the sequential model of the processor, except that it uses results already computed by the speculative engine, when they exist; in fact, if these results don’t exist (e.g. the data was invalidated), the reorder buffer has the speculative engine retry and waits for it to be done: it does not execute the calculations (ALU, etc.) inline. A third part makes sure to quickly (i.e. with low latency) feed the speculative engine with data that is right most of the time, and to do so invisibly: loads performed by the speculative engine can fetch from the caches but do not populate any cache, and are instead stored in the speculation buffer in order to remember that any results were obtained from these inputs.

This model piggybacks on existing infrastructure of current out of order processors: the reorder buffer is already the part in charge of making sure instructions “retire”, i.e. commit their effect, in order; in particular, on x86 processors the reorder buffer is responsible for invalidating loads executed out of order, including instructions after those, when it detects cache coherence traffic that invalidates the corresponding cache line. Ever wondered how x86 processors could maintain a strongly ordered memory model while executing instructions out of order? Now you know.

InvisiSpec has to do much more, however, as it cannot rely on cache coherence traffic: since the initial load is invisible, by design, other caches are allowed to think they have exclusive access (Modified/Exclusive/Shared/Invalid, or MESI, model) and won’t externally signal any change. Therefore, if the memory ordering model stipulates that loads must appear to occur in order, then it is necessary for the reorder buffer to perform a full validation: not only must it perform a fresh, new, non-speculative load as if the load were executed for the first time (thus allowing the caches to be populated), but it then has to wait for that load to complete and compare the loaded data with the speculatively loaded one. If they are equal, then the results precomputed by the speculative engine for the downstream computations are correct as well, and the reorder buffer can proceed with these instructions: it does not even matter if A compared equal to A but the memory cell held the value B in between, as the only thing that matters is whether the downstream computation is valid for value A, which is true if and only if the speculative engine was fed an equal value A when it executed.
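
To restate that last point in code terms (a toy illustration only, with no relation to actual hardware structures): the precomputed downstream result depends solely on the value that was fed in, not on the history of the memory cell, so an “A, then B, then A again” sequence is harmless.

// Toy illustration of the ABA argument: the downstream work is a pure function of its input.
function downstream(x) { return x * 2 + 1; }   // stand-in for the speculated computation

const speculativeValue = 5;                    // what the speculative engine loaded
const speculativeResult = downstream(speculativeValue);
// ...the memory cell may have held 7, then 5 again, in the meantime ("ABA")...
const validationValue = 5;                     // what the validation load reads back
if (validationValue === speculativeValue) {
  // Safe to commit: downstream(5) is downstream(5), whatever the cell held in between.
  console.log(speculativeResult);
}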

This leads to a much more tractable model for security: as far as leaking state is concerned, security researchers only need to look at the operation of the reorder buffer; performance engineers, on the other hand, will mostly look at the upstream parts, to make sure speculation is invalidated as rarely as possible, but will still look at the reorder buffer to make sure validation latencies are covered as far as possible.

Notably, InvisiSpec protects against attackers living in the same address space or trust boundary, and since it is cache-agnostic, it does not restrict memory sharing in any way.

The following limitations can be noted in InvisiSpec:

  • InvisiSpec only protects against speculation-related attacks, not other kinds of attacks that also use the cache as a side channel. Additional techniques will be needed for those.
  • InvisiSpec adds a significant efficiency hit compared to the insecure baseline, both in execution time (22% to 80% increase on SPEC benchmarks, lower is better) and cache traffic (34% to 60% increase on SPEC benchmarks, lower is better), the latter of which is one of the main drivers of power usage. That will need to be improved before people switch to a secure processor, otherwise they will keep using “good enough” mitigations; more work is needed in that area. My analysis would be that most of that efficiency hit is related to the requirement to protect against an attacker in the same address space: any pair of loads could be an attacker/victim pair! The result is that pipelining is mostly defeated to the extent it is used to hide load latencies. I am skeptical of their suggestion for the processor to disable interrupts after a load has committed, and until the next load gets to commit, so as to allow the latter to start validation early (disabling interrupts serves to remove the last potential source of events that could prevent the latter load from committing): this would add an important constraint to interrupt management, which furthermore is unlikely to compose well with similar constraints.

The future

This isn’t the last we will hear of work needed to secure processors post-Meltdown and Spectre; I am sure novel techniques will be proposed. At any rate, we in the computing industry as a whole need to start demanding of Intel and others what systemic protections they are putting in their processors, be they DAWG or InvisiSpec or something else, which will ensure whole classes of attacks become impossible.


  1. At least on the digital side: DAWG does not guard against power usage or electromagnetic radiation leaks, or rowhammer-like attacks.

Software Reenchantment

I’ve had Nikitonsky’s Software Disenchantment post in my mind ever since it was posted… which was four months ago. It’s fair to say I’m obsessed and need to get the following thoughts out of my system.

First because it resonated with me, of course. I recognize many of the situations he describes, and I share many of his concerns. There is no doubt that many evolutions in software development seem incongruous and at odds with recommendations for writing reliable software. The increasing complexity of the software stack, in particular, is undoubtedly a recipe for bugs to thrive, able to hide in that complexity.

Yet some of that added complexity, even when controversial, nevertheless represents progress. The example that comes to mind is Chrome, and more specifically its architecture of running each tab (HTML rendering, JavaScript, etc.) in its own process for reliability and security, and the related decision to develop a high-performance JavaScript engine, V8, that dynamically compiles JavaScript to native code and runs that (if you need a refresher, Scott McCloud’s comic is still relevant). Yes, this makes Chrome a resource hog, and initially I was skeptical about the need: the JavaScript engine controls the generated code, so if it did its work correctly, it would properly enforce same-origin and other restrictions, without the need for the per-tab process architecture and its overhead from the creation, memory occupation, etc. of many shell processes.

But later on I started seeing things differently. It is clear that browser developers have for the last few years been engaged in a competition for performance, features, etc., even if they don’t all favor the same benchmarks. In that fast-paced environment, there is a hard dilemma between going for features and performance at the risk of bugs, especially security vulnerabilities, slipping through the cracks, and moving at a more careful pace, at the risk of being left behind by more innovative browsers and being marginalized; and even if your competitor’s vulnerabilities end up catching up with them in the long term, that still leaves enough time for your browser to be so marginalized that it cannot recover. We’re not far from a variant of the prisoner’s dilemma. Chrome resolved that dilemma by going for performance and features, and at the same time investing up front in an architecture that provides a safety net, so that a single vulnerability doesn’t yet mean the attacker can escape the jail, and bugs of other kinds are mitigated. This frees the developers working on most of the browser code, in particular on the JavaScript engine, from having to worry excessively about security and bugs, while the few people with the most expertise on that work instead on the sandbox architecture of the browser.

So that’s good for the browser ecosystem, but the benefits extend beyond that: the one-upmanship from this competition will also democratize software development. Look, C/C++ is my whole career, I love system languages, there are many things you can do only using them even in the application space (e.g. real-time code such as for A/V applications), and I intend to work in system languages as long as I possibly can. But I realize these languages, C/C++ in particular, have an unforgiving, macho, “it’s your fault you failed, you should have been more careful” attitude that makes them unsuitable for most people. Chrome, and the other high-performance browsers that the others have become since then, vastly extend the opportunities of JavaScript development, with it now starting to be credible for many kinds of desktop-like applications. JavaScript has many faults, but it is also vastly more forgiving than C/C++, if only by virtue of providing memory safety and garbage collection. And most web users can discover JavaScript by themselves with “view source”. Even if C/C++ isn’t the only game in town for application development (Java and C# are somewhat more approachable, for instance), this nevertheless removes quite a hurdle to starting application development, and that can only be a good thing.

And of course, the per-tab process architecture of Chrome really means it ends up piggybacking on the well-understood process separation mechanism of the OS, itself relying on the privilege separation feature of the processor, and after Meltdown and Spectre it would seem this bedrock is more fragile than we thought… but process separation still looks like a smart choice even in this context, as a long-term solution will be provided in newer processors for code running in different address spaces (at the cost of more address space separation, itself mitigated by features such as PCID), while running untrusted code in the same address space will have no such solution and is going to become more and more of a black art.

So I hope that, if you considered Chrome to be bloated, you now realize it’s not so clear-cut: more complexity can be a good thing. On the other hand, I have an inkling that the piling on of dependencies in the npm world in general, and in web development specifically, is soon going to be unsustainable, but I’d love to be shown wrong. We need to take a long, hard look at that area in general.

So yes, it’s becoming harder to even tell if we software engineers can be proud of the current state of our discipline or not. So what can be done to make the situation clearer, and if necessary, improve it?

First, we need to treat software engineering (and processor engineering, while we’re at it) just like any other engineering discipline, by requiring software developers to be licensed in order to work on actual software. Just like a public works engineer needs to be licensed before he can design bridges, a software engineer would need to be licensed before he can design software that handles personal data, with the requirement repeating down to the dependencies: for this purpose, only library software that has itself been developed by licensed software engineers could be used. We would need to grandfather in existing software, of course, but such licensing is necessary: software mistakes are (generally) not directly lethal, but they can be just as disruptive to society as a whole when personal data massively leaks. Making software development require a license would in particular provide some protection against pressure from the hierarchy and other stakeholders such as marketing, a necessary prerequisite enabling designers to say “No” to unethical requests.

Second, we need philosophers (whether coming from our ranks or from outside) taking a long hard look at the way we do things and trying to make sense of it, if only to figure out the questions that need asking, so that we can be better informed about what we as a discipline need to work on. I don’t know of anyone right now doing this very important job.

These, to me, are the first prerequisites to an eventual software reenchantment.

What benefits does iOS 11 get from being 64-bit only?

By now you have proooooooooobably heard that iOS 11 will consider apps that still have not been ported to 64-bit mode as obsolete. In practice, by refusing to run them.

Now, this post is not about how to port to 64-bit (I mean, if that is your concern, Apple has been encouraging you to do so for years now…), but rather about why Apple did so. Why obsolete perfectly good 32-bit code and apps? I do not have all the answers, but I have a few. Let us first see why 64-bit is the better choice if we have to choose between the two, and why Apple chose not to maintain both.

Why is 64-bit only better than 32-bit only?

That one is an open and shut case: in this earlier post I already presented how the then-new iPhone 5S 64-bit environment was overall a benefit; and the benefits have only grown since then (as I wrote: “native 64-bit math is a plus for some specialized tasks and the future”), so there really is no question. If Apple had to drop one, it had to be the 32-bit environment.

Why not both?

hardware savings

In theory, Apple could save silicon area in their post-iOS 11 hardware designs (iPhone 8, 8+, and X) by omitting the parts of their processor design that serve only ARM/A32 mode execution (which they have the power to do; remember they design their own ARM CPUs now). Indeed, while the cleanup from ARM/A32 to ARM64 was not nearly as dramatic as x86 to x86-64, some instructions and instruction semantics were dropped, though how much this could save in terms of execution units is way beyond my expertise; more important, probably, are the savings in the instruction decode circuits: not only is there no need to support Thumb, but the instruction formats were completely overhauled between ARM/A32 and ARM64, with the former being quite convoluted (plenty of non-uniform formats, one-off cases, and split fields).

In practice, I wonder if this is worth the trouble. I think ARM processors are meant to start up in 32-bit mode before being raised to 64-bit, anyway, and there may be additional compatibility constraints (e.g. with drivers, hypervisors). Even if they did take advantage of this, this is not the main driver.

software savings

That is the part where the real savings are. Through the equivalent of app thinning, Apple could already eliminate the 32-bit parts from their kernel and built-in applications, but they would still have had to provide the 32-bit slice of the library stack (everything from libSystem to AppKit) so that 32-bit apps could keep running. And that does take up some space on your iPhone or iPad storage (which I have not measured, to be honest)… but more importantly, this slice would take up space in RAM, next to its 64-bit equivalent (always present, since built-in apps use it), as soon as, and for as long as, any 32-bit app was running.

This is the message that Apple had already been not-so-subtly sending users when it warned that running 32-bit apps would slow down the device: iOS devices have traditionally been quite RAM-constrained, and even if that has eased a bit in recent years, any RAM savings are worth taking: they allow more tabs to remain active without having to be reloaded, more apps to remain frozen and only have to be (quickly) thawed instead of having to be relaunched, etc., improving the overall experience. So keeping the 32-bit library stack loaded in RAM on most iOS devices, right next to the 64-bit library stack, was starting to look like a waste of precious resources.

Was it worth it?

Heck if I know. I do not think I will be too affected through the apps I own, but I am always worried about such obsolescence, especially from a digital preservation perspective. That being said, for the purposes of saving such history it is best to rely on a historical device (such as one that can’t be updated to iOS 11), because there are many other reasons why historical iOS software just stops running anyway. I keep my old iPhone 3GS for that purpose, and it is already loaded with a number of apps that simply don’t run any more on my iPhone 5S running iOS 10.

APFS’s “Bag of Bytes” Filenames (Michael Tsai – Blog)

I have sooooooooo many questions. I mean, first I have the same ones as Michael, but on top of that:

  • “bag of bytes”, but I hope at least that the file name, even if not normalized, is guaranteed to be valid UTF-8, right? Right? Right?
  • In some circumstances, it is possible for the user to type the beginning of a file name to select or at least winnow the file selection; is there going to be guidance on how to perform this?
  • Sorting file names for display. Oh, the fun we shall have with sorting. Again, will guidance/a standard function be provided?
  • Normally this should result in fewer issues for software that writes a file name as any valid UTF-8 string, then expects a file with that exact name to be in the directory listing, as that will now be the case more often (I must admit I don’t fully understand the issue that led to the Apple response in the first place, though I understand the Apple response even less). However, when performing manipulations with NSString/NSURL/Swift String, do those preserve composition enough that developers can rely on them for that? (The sketch just after this list shows the kind of mismatch I mean.)
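
To make the composition question concrete, here is a small JavaScript illustration (hypothetical, and nothing to do with Apple’s APIs): two strings that display identically can differ in composition, so a name written in one form will not compare equal to the “same” name handed back in another form unless someone normalizes along the way.

// Hypothetical illustration; nothing here is an Apple API.
const written = "caf\u00E9";    // "café" with a precomposed é (NFC)
const listed = "cafe\u0301";    // "café" as "e" plus a combining acute accent (NFD)

console.log(written === listed);                                     // false: different code points
console.log(written.normalize("NFC") === listed.normalize("NFC"));   // true: same text once normalized

// An HFS+-style filesystem normalizes on creation, so the name it lists back may not be
// byte-identical to the one that was written; an APFS-style "bag of bytes" hands back
// exactly what was written, but then two visually identical names can coexist.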

Now, granted, I know two people this will make happy (or, OK, less unhappy)…

EDIT: One additional data point about this, is that in a similar situation, even Apple doesn’t get it right (coincidentally, fixed in Safari 10.1 and iOS 10.3). Let me tell you, this issue was a bear to isolate.

I admit:

  • I have no idea where this was in Safari, though it is safe to say Apple has responsibility for that code,
  • Safari is already compensating for invalid data, the URL should be properly escaped in the first place, and
  • this is when using HTTP, not the filesystem.

Nevertheless, this shows Apple themselves sometimes get it wrong and normalize strings in a way that causes issues because the underlying namespace has a dumb byte string for a key. So if they can get it wrong, then third-party developers will need all the help they can get to get it right.

EDIT: New info, in that there will be a case-insensitive variant for the Mac, which will also behave differently for normalization.

I think “normalization-preserving, but not normalization-sensitive” means that (like HFS+ on the Mac, unlike APFS on iOS) you cannot have multiple files whose names differ only in normalization. And you can look up a file using the “wrong” normalization and still find it. Additionally, beyond what HFS+ offers, if you create a file and then read the directory contents, you’ll see the filename listed using the same normalization that you used.

This is my interpretation as well.
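
A toy model of that behavior, for illustration only (this is not how APFS is implemented): key lookups by a normalized form of the name, so that either spelling finds the file and colliding normalizations are rejected, but store and list the exact name that was given at creation.

// Toy, purely illustrative model of "normalization-preserving, but not normalization-sensitive".
class ToyDirectory {
  constructor() {
    this.entries = new Map();   // key: normalized name, value: the name exactly as created
  }
  create(name) {
    const key = name.normalize("NFD");
    if (this.entries.has(key)) throw new Error("name already taken (up to normalization)");
    this.entries.set(key, name);
  }
  lookup(name) {
    return this.entries.get(name.normalize("NFD"));   // either spelling finds the file
  }
  list() {
    return [...this.entries.values()];                // names come back exactly as created
  }
}

const dir = new ToyDirectory();
dir.create("caf\u00E9");                  // created with the precomposed spelling
console.log(dir.lookup("cafe\u0301"));    // found via the decomposed spelling
console.log(dir.list());                  // lists the precomposed spelling, as created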

Curtain update

I took advantage of the recent update to JPS to experiment a bit with Curtain. I significantly retooled it towards one goal: separate the generation of the deployment package from the deployment itself.

While the initial version of Curtain benefitted from many influences, one I completely forgot to take into account was Alex Papadimoulis’ teachings, more specifically those about release management and database changes; especially the commandments that builds be immutable and that what gets deployed to production be the same thing that was deployed to the earlier environments.

When I recently re-read those two articles for inspiration at work, I thought: “Uh, oh.”

Indeed, with Curtain the deployment process is a function not only of the revision that we ultimately want there, but also of what was previously there, in order to support proper rollover of resources (itself necessary because of offline support). And as originally designed, Curtain would just adapt its deployment to what was previously there, which means that, if I wasn’t careful and did not double check that staging was properly rolled back to what is present in production (which, let’s admit, we’ve all done at some point), then the Curtain deployment to staging would not be representative of the eventual deployment to production. Oops.

So Curtain has been updated to, rather than perform the deployment itself, instead generate a package containing the generated files; this package doubles as a Python script which, when invoked, will perform all the deployment steps on the target of choice. The script itself is dumb and makes no decisions, such that it can be invoked multiple times and always perform the same job, but it also checks, prior to operating, that the data previously present corresponds to the expectations it was generated with. That way, we can use the same script multiple times, once on staging and once on production, and be certain that the two deployments will be the same. And Alex will be happy.
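
For the shape of the idea only (the real Curtain package is a Python script, and every name below is invented): the generated script embeds both the steps to apply and the state it expects to find on the target, refuses to run if that expectation isn’t met, and otherwise always does the exact same thing.

// Invented sketch of a self-checking deployment script; the real Curtain package is a Python script.
const fs = require("fs");
const path = require("path");

const EXPECTED_PREVIOUS = "assets-v41";   // baked in when the package was generated
const STEPS = [                           // likewise baked in: the script always applies the same steps
  { file: "assets-v42/app.css", contents: "/* generated stylesheet */" },
  { file: "current.txt", contents: "assets-v42" },
];

function deploy(targetDir) {
  const previous = fs.readFileSync(path.join(targetDir, "current.txt"), "utf8").trim();
  if (previous !== EXPECTED_PREVIOUS) {
    // The target is not in the state this package was generated for: refuse to guess.
    throw new Error("target is at " + previous + ", this package expects " + EXPECTED_PREVIOUS);
  }
  for (const step of STEPS) {
    const destination = path.join(targetDir, step.file);
    fs.mkdirSync(path.dirname(destination), { recursive: true });
    fs.writeFileSync(destination, step.contents);
  }
}

deploy(process.argv[2]);   // run once against staging, later once against production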

One more thing. In my initial post, I also completely forgot to mention another influence: Deployinator. Many aspects of Curtain come from Deployinator: deployment as a single operation, deploying assets as a layer separate from code, and versioning these assets as part of the URL, etc. The lessons from Deployinator were so obvious to me that it did not even occur to me to mention where they came from. That omission has now been repaired.

Simple File Cache: improve the performance of FileReader in the browser

When was the last time you obtained a 10x (ten times, 1000%) performance gain with a single improvement?

Not recently, I bet. Most optimizations work incrementally, eking out 3% here, 2% there, and only achieve an observable effect by iterating many such optimization steps. Even algorithmic improvements, such as replacing an O(n²) algorithm by an O(n·log(n)) one, typically get you on the order of 3 or 4 times performance improvement, at least on the data sizes in typical use at the time the improvement is made. So let me tell you how I improved performance of JPS, my web app to apply IPS patches, tenfold.

Once upon a time…

Soon after the initial public version of JPS, I started working on support of another format that (among other processing) requires the CRC32 of the whole file to be obtained, which is best done in blocks of, say, 1024 bytes rather than reading from the file byte by byte. Given my prior experiences, I dreaded the performance penalty from having to (re-)visit every single byte of the file, but it turned out to perform surprisingly well. Why couldn’t I get the same performance when processing IPS files?

So as a proof of concept I started developing a layer that would read from the file in blocks of 4096 bytes, then serve read requests from the loaded data whenever possible, entirely in JavaScript. In other words, a cache. Writing a cache is something you always end up learning in any Computer Science curriculum, and you always wonder why, given that it seems so simple and obvious that it need not be taught, and is simultaneously something the platform will provide anyway (especially as modern caches tend to be very complex beasts, what with replacement policies, cache invalidation, and so forth). And Mac OS X, on which I develop, already aggressively caches filesystem reads at every level. Writing my own cache for file reads seemed too obvious to be something worth doing.

Photo of the Mont Blanc, lighted with sunset light

From now on, my longer posts will have random photos from my various trips inserted to serve as breathers. This is the Mont Blanc, lighted by sunset light.

As a way to test this anyway, I wrote the dumbest file cache you could possibly imagine: there is only one cache bucket, and it can only be loaded from whole block-aligned ranges in the file, with the result that a number of requests, e.g. those that cross block-aligned boundaries, or those that load from the remainder of the file that can’t form a whole block, have to sidestep the cache and be served from the file separately. Furthermore, JavaScript Blobs are supposed to be immutable, so I did not need to worry about invalidating my cache when the underlying storage changed. Even then, this was not a trivial thing: the asynchronous nature of the browser file reading API meant the cache had to provide an asynchronous API itself and maintain a “todo list” of read operations being processed.
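
Here is a minimal sketch of that single-bucket idea (illustrative only; this is not the actual simple-file-cache code): requests that fit entirely within one whole block are served from the bucket, and everything else falls through to a plain FileReader read.

// Illustrative single-bucket read cache over a Blob/File; not the real simple-file-cache code.
const BLOCK_SIZE = 4096;

function readSlice(blob, start, end, callback) {
  // A plain, uncached FileReader read.
  const reader = new FileReader();
  reader.onload = () => callback(new Uint8Array(reader.result));
  reader.readAsArrayBuffer(blob.slice(start, end));
}

function makeCachedReader(blob) {
  let bucketStart = -1;
  let bucketData = null;   // the one and only cache bucket
  return function read(start, end, callback) {
    const blockStart = Math.floor(start / BLOCK_SIZE) * BLOCK_SIZE;
    const fitsInOneBlock = end <= blockStart + BLOCK_SIZE;
    const wholeBlockExists = blockStart + BLOCK_SIZE <= blob.size;
    if (!fitsInOneBlock || !wholeBlockExists) {
      // Crosses a block boundary, or lies in the tail that can't form a whole block:
      // sidestep the cache entirely.
      readSlice(blob, start, end, callback);
      return;
    }
    if (blockStart === bucketStart) {
      // Cache hit; note that the callback runs synchronously here, which matters later.
      callback(bucketData.subarray(start - blockStart, end - blockStart));
      return;
    }
    // Cache miss: (re)load the bucket from a whole block-aligned range, then serve.
    readSlice(blob, blockStart, blockStart + BLOCK_SIZE, (data) => {
      bucketStart = blockStart;
      bucketData = data;
      callback(data.subarray(start - blockStart, end - blockStart));
    });
  };
}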

And now I turn on the cache, and measure the performance improvement… and files that used to take Chrome 50 seconds to process now take 5 seconds! (Chrome being my reference browser for development of JPS). And the 10x factor is consistent, applying over various source files, often turning the processing time into “too short to measure”, and over various platforms: the same files which took around 200 seconds on Chrome for Android now take 20 (and the behavior of desktop Chrome on Windows was the same as on Mac OS X). Similar improvements could be observed with desktop Firefox, with processing times going from 20 seconds to 2 seconds.

Wow.

I reported these findings on the Chromium discussion forums (Chrome being the worst offender), because surely that meant something was wrong with Chrome somewhere. However, not much came out of it, so I decided to productize the cache so as to deploy these performance improvements in production.

From proof of concept to production-worthy code

The proof of concept assumed that, for every read operation except the first, it could just append a new read request from the client to its todo list, and once control bubbled back up to the cache code, the request could be served there if it was in cache. That worked in most cases, at least enough to get performance measurements; but in some cases, a new request would be logged from code that was not called from a callback of our cache, so it would never bubble back up to our code and never be served, and the pump would drain.

Photo of a young Ibex

A young ibex.

Easy enough, I thought: I will get rid of the todo list, and instead always defer processing by calling setTimeout(…, 0).

That worked.

But it was slow. Even slower than without the cache.

Turns out, the overhead of calling setTimeout(…, 0) and getting called back by it was killing this solution. What to do, what to do, what to do? Back to the drawing board, I came up with the solution: reinstate the todo list and use it, but only if we can tell for sure that we are within code that is being called by cache code — which entails keeping track of that information. If we are not within code that is being called by cache code, only then use setTimeout(…, 0). That managed to both work in all cases and perform well.
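
Boiled down to its essence, the dispatch trick looks something like the following (a simplified, invented sketch, not the production code): requests scheduled while we are inside cache-driven code go on the todo list and are drained there, and only requests arriving from anywhere else pay the setTimeout(…, 0) toll.

// Invented sketch of the dispatch logic; the real cache applies the same wrapping
// around its own FileReader callbacks as well.
let todo = [];
let inCacheCode = false;

function schedule(request) {
  todo.push(request);
  if (!inCacheCode) {
    // Not called from our own code: defer via the event loop and pay the overhead once.
    setTimeout(drain, 0);
  }
  // Otherwise the drain loop below is already running and will pick the request up for free.
}

function drain() {
  inCacheCode = true;
  while (todo.length > 0) {
    const request = todo.shift();
    serve(request);   // client callbacks run in here; any schedule() they do stays cheap
  }
  inCacheCode = false;
}

function serve(request) {
  // Stand-in for the actual cache read logic.
  request.callback(request);
}

schedule({ callback: () => console.log("served") });   // first request: one setTimeout, then the loop handles the rest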

And then I also had to support aborting requests, add a number of unit tests, fix a few bugs… and then it was done.

Photo of the Grandes Jorasses

The Grandes Jorasses.

What have we learned?

  • Don’t diss CS or the CS curriculum. You never know when what you learn there might turn out to be useful.
  • Sometimes the obvious solution is the right one.
  • The source of slowness isn’t reading files per se, but rather the shocking overhead of calling a Web API and getting called back by it (whether it be FileReader or setTimeout(…, 0)), which by my estimates is around 2 ms for each such operation with Chrome on a modern desktop machine. This is crazy. Other browsers (with the exception of Internet Explorer/Edge, which I have not been able to test) fare better, but still have enough overhead that you have to wonder what is going on in there. (A rough way to estimate this overhead yourself is sketched right after this list.)
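
If you want to ballpark that overhead on your own machine, something like the following (a rough, illustrative measurement to paste into a browser console; numbers will vary by browser and hardware) chains FileReader round trips on a one-byte Blob so that almost all of the measured time is API overhead rather than I/O:

// Rough, illustrative measurement of the per-call FileReader round-trip overhead.
const ITERATIONS = 1000;
const tinyBlob = new Blob(["x"]);   // one byte, so actual I/O cost is negligible
let remaining = ITERATIONS;
const startTime = performance.now();

function readOnce() {
  const reader = new FileReader();
  reader.onload = () => {
    if (--remaining > 0) {
      readOnce();
    } else {
      const perCall = (performance.now() - startTime) / ITERATIONS;
      console.log("about " + perCall.toFixed(2) + " ms per FileReader round trip");
    }
  };
  reader.readAsArrayBuffer(tinyBlob);
}
readOnce();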

Get the code

I set up a specific project for the cache code: you can get the code on BitBucket, and I also published it on NPM as simple-file-cache. It is free to use and modify (under the terms of the BSD license). If you find it useful, I request that you consider donating to the ACLU and the UNHCR, however.


P.S.: While I’ve got your attention, I’m happy to report that JPS will soon support Safari, as this browser is finally about to get support for the download attribute and downloading blobs, normally as part of Safari 10.1, which is meant to arrive with Mac OS X 10.12.4. Being able to be used on a stock install of Mac OS X will be a huge milestone for JPS and the viability of web apps in general as a way to circumvent Developer ID and Gatekeeper.

Looking back on WWDC 2016

Now that the most important Apple release of WWDC has been dealt with, we can cover everything else. I haven’t followed as closely as previous years (hence no keynote reactions on Twitter), but to me here is what stands out.

The Apple App Stores policy announcements

As seen at Daring Fireball for instance, Apple briefed the press on many current and coming improvements to the Apple App Stores (iOS, Mac, tvOS, watchOS). This actually happened ahead of WWDC, but is part of the package. There are a lot of good things, such as the first acknowledgment (with the 85/15 split, instead of 70/30, for subscriptions after the first year) that Apple isn’t entitled, over the whole lifetime of an app, to 30% of every purchase whose buying intent originated from the app. However, none of this solves the lack of free trials: if only subscription apps can have free trials, then thanks, but no thanks. I want to both try before I buy and avoid renting my software, and I don’t think subscriptions make sense for every app anyway, so improvements and clarifications (e.g. an indication of whether the app is “pay once and play”, “shareware”, or “coin-op machine”) for apps using non-recurring payment options would be welcome (more on that in a later post). Also, while those improvements apply to the Mac App Store as well, that store will need more specific improvements to regain credibility. I don’t have much of an opinion on the new search ad system.

The new Apple File System (APFS for short)

Apple announced a new filesystem, and to say that it has, over the years, accumulated a lot of pent-up expectations to fulfill would be the understatement of the year. I can’t speak for everyone, but each year N after the loss of ZFS my reaction was “Well, they did not announce anything this year; it’s likely because they only started in year N-1 and can’t announce it yet, because they can’t develop such a piece of software in a yearly release cycle, so there is no use complaining about it as it could be already started, and it will show up in year N+1.” Repeat every year. So while I can scarcely believe the news that development of APFS only started in 2014, at the same time I’m not really surprised by it.

I haven’t been able to try it out, unfortunately, but from published information these are the highlights. The comparison is with ZFS, because ZFS is the reference the Mac community studied extensively back when Apple was working on a ZFS port in the open.

What we’ll get from APFS that we hoped to have with ZFS:

  • A modern, copy-on-write filesystem. By itself, this doesn’t do much, but this is the indispensable basis for everything else:
  • Snapshots, or if you prefer, read-only clones of the filesystem as a whole. Probably the most important feature; it alone would justify the investment in a new filesystem to replace HFS+.

    While the obvious use case is backups, particularly with Time Machine, it is not necessarily in the way you think. Currently, when Time Machine backs up a volume, it has to contend with the volume being in use, and potentially being modified, while it is being backed up; if it were required to freeze a volume while backing it up, you wouldn’t be able to use it during that time and, as a result, you would back up much less often, which would defeat most of the purpose of Time Machine. So Time Machine has no choice but to read a volume while it is being modified, and as a result may not capture a consistent view of the filesystem! Indeed, if two files are modified at the same time, but one was read by Time Machine before the modification and the other after, the saved filesystem on the backup will have one file without the modification and the other with it, which was not the state of the filesystem you intended to back up at any point in time. In fact, if you have to reload from that backup, this may mean the data is lost, in case neither half can work with the other as a result.

    Instead, with APFS the backup application will be able to create a snapshot, which is a constant time operation (i.e. does not depend on how much data the volume contains) and results in no additional space being taken, at least initially, then can copy from that snapshot, while the filesystem is in use and being modified, and be confident that it is capturing a consistent view of the filesystem, regardless of where the data is being saved (it could be to an HFS+ drive!). Once the copy is over, the snapshot can be harvested to make sure no additional space is used beyond that needed by the live data. Of course, this will also allow, by using multiple snapshots, to more efficiently determine what changed from last time, and with APFS on the backup drive as well the backup application will be able to save space on the backup drive, in particular not taking up space for redundancies the source APFS drive knows about already. But snapshots on the APFS source drive will mean that, after 10 years, Time Machine will finally be safe: this is a correctness improvement, not merely a performance (faster backups and/or taking less space) one.

  • Real protection in the face of crashes and power loss events. HFS+ had some of that with its journal, but it only protected metadata and came with a number of costs. APFS will make sure its writes and other filesystem updates are “crash-safe”.
  • I/O prioritization. A filesystem does not exist merely as a layout of the data on disk, but also as a kernel module that has in-memory state (mostly cache) that processes filesystem requests, and the two are generally tied. I/O prioritization, some level of it at least, will allow some more urgent requests (to load data for an interactive action for instance) to “jump the queue” ahead of background actions (e.g. reads by a backup utility), all the while keeping the filesystem view consistent (e.g. a read after a write to the same file has to see the file as modified, so it can’t just naively jump over the corresponding write).
  • Multithreading. In the same vein of improvements to the tied filesystem kernel module, this will make it possible to better serve different processes or threads that read and write independent parts of the filesystem, especially if multiple cores are involved. HFS+, having been designed in the era of single-processor, single-threaded machines, requires centralized locks that become bottlenecks and is inefficient for multithreaded use cases.
  • File and directory hierarchy clones. Contrary to snapshots, clones are writable and are copied to another place in the directory hierarchy (while snapshots are filesystem-wide and exist in a namespace above the filesystem root). The direct usefulness is less clear, but it could be massively useful as infrastructure for specialized apps, notably version control (both for work areas and repositories).
  • Logical volume management. Apple calls this “space sharing”, but it’s really the possibility of making “super folders” by making them their own filesystem in the same partition, which allows such a super folder to have, for instance, different backup behavior.
  • Sparse files. Might as well have that, too.

What APFS will provide beyond ZFS, btrfs, etc. features:

  • Encryption as a first class feature. Full disk and per-file encryption will be integrated in the filesystem and provided by a common encryption codebase, not as layers above or below the filesystem and with two separate implementations. This also means files that are encrypted per-file will be able to be cloned, snapshotted, etc. without distinction from their unencrypted brethren.
  • Scalability down to the watch. ZFS never scaled down very well, in particular when it comes to small RAM amounts.

What we hoped to have with ZFS, but won’t get from APFS:

  • Crazy ZFS-like scalability. For instance, APFS has 64-bit nodes, not 128-bit. This is probably not unreasonable on Apple’s part.
  • RAID integration as part of the filesystem. APFS can work atop a software or hardware RAID in traditional RAID configurations (RAID-0, RAID-1, RAID-10, RAID-5, etc.), but always as a separate layer. APFS does not provide anything like RAID-Z or any other solution to the RAID-5 write hole. That is worth a mention, though I have no idea whether this is a need Apple should fulfill.
  • Deduplication. This is more generally useful to save space than clones or sparse files, but is also probably only really useful for enterprise storage arrays.

What is unclear at this point, either from the current state or because Apple may or may not add it by the time it ships:

  • Whether APFS will checksum data, and thus guarantee end-to-end data integrity. Currently it seems it doesn’t, but it checksums metadata, and has extensible data structures such that the code could trivially be extended to checksum all data while remaining backwards compatible. I don’t know why Apple does not have that turned on, but I beg them to do so, given the ever-increasing amounts of data we store on disks and SSD and their decreasing reliability (e.g. I have heard of TLC flash being used in Apple devices); we need to know when data becomes bad rather than blindly using it, which is the first step to try and improve storage reliability.
  • Whether APFS is completely transaction-based and always consistent on-disk. Copy-on-write filesystems generally are, but being copy-on-write is not sufficient by itself, and the existence of a fsck_apfs suggests that APFS isn’t always consistent on-disk, because otherwise it would not need a FileSystem Consistency checK. Apple claims writes and other filesystem updates will be “crash-safe”, but the guarantees may be lower than a fully transactional FS.
  • Whether APFS containers will be able to be extended after the fact with an additional partition (from another disk, typically), possibly even while the volumes in it are mounted. APFS support for JBOD, and the fact APFS lazily initializes its data structures (saving initialization time when formatting large disks), suggest it, and it would be undeniably useful, but it is still unknown at this time.
  • Whether APFS will be composition-preserving when it comes to file names. It will, certainly, be insensitive to composition differences in file names, like HFS+; however HFS+ goes one step further and normalizes the composition of file names, which ends up making the returned file name byte string different from what was provided at file creation, which itself subtly trips up some software like version control (via Eric Sink), and which is probably the specific behavior that led Linux founder Linus Torvalds to proclaim that HFS+ was “complete and utter crap”; see also this (latter via the Accidental Tech Podcast guys, who had the same Unicode thoughts as I did). Won’t you make Linus happy now by at least preserving composition, Apple? This is your opportunity!
  • Whether APFS uses B+trees. I know, this is an implementation detail, but it’d be neat if Apple could claim to have continuously been using B-/+trees of either kind for their storage for the last 30 years and counting.

For a more in-depth look at what we know so far about APFS, the best source by all accounts is Adam Leventhal’s series of posts.

Apple Filing Protocol deprecation

Along with APFS, Apple announced that it would not be able to be served over AFP, only SMB (Windows file sharing), and that AFP was thus deprecated. This raises the question of whether SMB is at parity with AFP: last I checked (though it was some time ago), AFP was still superior when it came to:

  • metadata and
  • searching

But I have no doubt that, whatever feature gap is left between SMB and AFP (if there is even one left), Apple will make sure it is closed before APFS ships, just like Apple made sure Bonjour had feature parity with AppleTalk before stopping support for AppleTalk.

Playgrounds on iOS

I’m of two minds about this one. I’ve always found Swift playgrounds to be a great idea. To give you an idea, back in the day when the only computer in the house was an Apple ][e, I did not yet know how to code, but I knew enough syntax that my father had set up a program that would, in a loop, plot the result of an expression over a two-axis system, and I would only have to change the line containing the expression, with the input variable conveniently being x, and the output, y; e.g. to plot the result of squaring x, I would only have to enter¹:

60 y = x*x

run the program, and away I went. It was an interesting lesson when, due to my limited understanding of expressions, specifically that they are not equations, I once wrote:

60 2y = x+4

Which resulted in the same thing as I previously plotted, because this command actually modified line 602 (beyond the end of the loop)… good times.

Anyway, Swift playgrounds, which automatically plot the outcome of expressions run multiple times in a loop, for instance, and even more so on iPad where you have the draggable loop templates and other control structure templates, provide the necessary infrastructure out of the box, and learners will be able to experiment and visualize what they are doing on their own.

These playgrounds will be able to be shared, but when I hear some people compare this to the possibilities of Hypercard stacks, I don’t buy it. There is nothing for a user to do with these playgrounds: the graphic aspect is only a visualization (and why does it need to be so elaborate? This is basically Logo; you don’t need to make it look like a Monument Valley that would not even be minimalistic); even if the user can enter simple commands, it always has to start back from the beginning when you change the code (which is not a bad thing, mind you, but shows even the command area isn’t an interactive interface). You can’t interact with these creations. Sharing these is like sharing elaborate Rube Goldberg contraptions created in The Incredible Machine: it’s fun, and it’s not entirely closed since the recipient can try and improve on it, but apart from watching it play there is nothing for the recipient to do without first understanding the workings of the machine.

Contrast that with Hypercard, in which not only did you set up an actual interface, but what you coded were handlers for actions coming from the interface, not a non-interactive automaton. This also means it was much less of a jump to go from there to an actual app, especially one using Cocoa: it’s fundamentally just a bunch of handlers attached to a user interface. It’s a much bigger jump when all you’re familiar with is playgrounds or even command-line programs, because it’s far from obvious how to go from there to something interactive. Seriously, I’m completely done with teaching programming by starting with command-line apps. It needs to die. What I’d like to see Apple try on the iPad is something inspired by the old Currency Converter tutorial (unfortunately gone now), where you’d create a simple but functional app that anyone could interact with.

Stricter Gatekeeper

…speaking of sharing your programming creations. I’m hardly surprised. This shows web apps are definitely the future of tinkerer apps.


  1. In Apple II Basic, you’d enter a line number then a statement, and that would replace the line in the saved program by the one you just entered. Code editors have improved a bit since then.

Review: App Review Guidelines: The Comic Book

The review for this Wednesday is for an unexpected, shall we say, release: it doesn’t appear to have been solicited through Diamond¹ beforehand, and so the first comic book coming from Apple Inc. as a publisher, at least the first in recent history, came as a complete surprise to everyone. It was released at the same time as a lot of other Apple news, so it took me a bit of time to notice it, then get to it.

Before we begin, if you’ve followed this blog for a bit, you might have noticed I have a bit of a thing for comics, be it in previous posts or the comicroll and the pull list in the sidebar; or maybe you’ve been following some of my other endeavors or follow me on Twitter and have been left with little doubt that I do read and enjoy comics very much. So this is where I’m coming from on comics in general.

I also have a lot of appreciation more specifically for comics as teaching aids: to me it is a very suitable medium for teaching, and there is a lot of unjustified prejudice against this art form as being not for serious purposes, whatever that means. This is completely wrong, as shown by the generally cheesy, but not bad, teaching comics I read as a child, and it goes for grownups too, as the cartoons from Larry Gonick show (a nice trove of which can be found here, thanks Jeff), or more recently those Dante Shepherd is commissioning with a dedicated grant: 1, 2, 3, 4, 5 and 6 (so far); hat tip to Fleen. So this comic from Apple could, if well done, help with general understanding of what they are trying to accomplish with these guidelines.

I also understand that, as a developer who has followed Apple’s policies relatively well and has some expertise in interpreting them, and who reads a few specialists in Apple kremlinology, I may not actually be in the target audience. I have little doubt that the app review team and DTS interact daily with many, many developers who discover the guidelines when their app gets rejected for violating them, and/or have a very incomplete picture of the whole of the guidelines, and/or are very stubborn about what they think their “rights” are; this comic is probably intended for them. Lastly, the link to this comic was provided to me by people I trust, and it is hosted in a CDN domain that Apple uses for a variety of developer-related resources (e.g. Swift blog post images), so I have little reason to doubt its authenticity.

Get on with it!

Ok, ok. This comic is actually sort of an anthology, split into five parts, the first of which is:

Safety

With art by Mark Simmons. In a setting and style reminiscent of Jack Kirby’s cosmic works (New Gods in particular), we find the hidden son of Flash and the Silver Surfer as the hero of this story, in which he has to cruise through space, avoiding a number of hazards, after he encounters some sort of Galactus-like planet eater. Will he succeed in time?

I found the story rather hard to follow, no doubt due to the unfamiliar setting, and had to reread it a few times to make sure I hadn’t missed anything; beyond that, the art serves its purpose, but unfortunately the text clearly isn’t there to support it.

Performance

With art by Ile Wolf and Luján Fernández. In a more playful style, two schoolchildren in uniform are battling using Pokémon/Digimon/kaiju (circle as appropriate), and the battle appears to have grown out of control. The situation is dramatic, and it’s not clear anything can stop them.

At least here any ambiguity as to the situation is intentional, but even then it’s hard to take it seriously when the text (speech or narration) takes you out of the climax; not everyone can be a Stan Lee and add text after-the-fact that works well with such a story. And while the conclusion of “Safety” in part explains its title, I can’t help but think its hero would have been more appropriate to star in the “Performance” section.

Business

With art by Shari Chankhamma. A more intimate setting with interesting art where we follow the growth of a boy through times good and bad, but always in the same place: the barbershop he patronizes.

Maybe the most interesting of the stories in this anthology, and it’s too bad they couldn’t come up with text that was up to that level: either do away with it, or hire better writers! Who edited this stuff?

Design

With art by Ben Jelter. Foraging in a post-apocalyptic wasteland, with an art style to match, a boy locates and manages to repair a robot who may or may not be related to Wall-e and Eve.

It’s a section for which developers for Apple platforms have understandably high expectations, but I don’t know if they’ll be met with the robot design, or with the art in general, which is nothing special. The less said about the text, the better.

Legal

With art by Malcolm Johnson. A noir/private eye story, all in greyscale, and interestingly starring a woman.

The art style is surprising in a good way for such a story, but it does not do a very good job of carrying the story, and as we’ve seen, no point in counting on the text for that either. At least this one has more relationship with its claimed subject matter than the others do.

Conclusion

What… in… the… ever-loving… frick? This comic may have the dimensions and approximate page count of a comic issue, but it is, to be blunt, a crushing disappointment. Its only point, it turns out, is to put pictures which tell their own stories around the exact words of the official document, without any attempt at adaptation, or even just, say, recontextualization of the guidelines as an exchange between two characters. These words don’t benefit in any way from being told there. Meanwhile, the pictures just follow their own scenarios and tell their own stories without any consideration for what is supposedly spoken in the bubbles: there is no correspondence, either thematic or in pace, between the events depicted and the words you can read. There is no teaching benefit whatsoever to these comics, and no way I can see anyone at any knowledge level benefiting from reading it, let alone being enlightened as to the profound meaning of the guidelines. It’s as if the bubbles were randomly placed, linked so that each would overflow into the next, and then the text of the guidelines was just dumped into them. This shows better than anything I have previously seen that sequential art is more than the sum of pictures and text.

Verdict: download it, but don’t read it, and only use it in a few years to remind your interlocutor who works at Apple that this has been a real thing that Apple has released, in order to embarrass him.

App Review Guidelines: The Comic Book
Price: 0¢ (digital only)
Publisher: Apple
Words: Apple
Cover illustration: Dailen Ogden
Illustrations: Mark Simmons, Ile Wolf, Luján Fernández, Shari Chankhamma, Ben Jelter, and Malcolm Johnson


  1. Diamond is the only distributor to comic book stores in North America, and comics appear in its catalog a few months before being available, in case you’re not familiar with that aspect of the comics industry.

Application Cache was fired for his douchebaggery

To all of you who enquired about the whereabouts of Application Cache, I regret that I have to inform you that he is no longer with our company. This was not an easy decision to make, but we believe it was the right one.

While it has been no secret for some time that Application Cache was a douchebag, this was not necessarily apparent at first. Application Cache promised so much, and we believed him because he could prove his claims to a large extent. However, his way of working was so much at odds with the way other web components work (especially long-time pillar of web infrastructure HTTP cache) that his core value proposition was harder to take advantage of than it should have been (with many unfortunate pitfalls, as Jake Archibald documented); and worse, his more advanced promises, while working in basic scenarios, had some ancillary troubles, which unexpectedly turned out to be intractable no matter how hard we tried, and so these promises never came to fruition.

Because he was useful despite the issues, we tried to work with him on these, with many counseling sessions with HR; however, Application Cache was adamant that this was his fundamental mode of operation and he could not work any other way, and that others would have to adapt to him. This, of course, was not remotely acceptable, but we could not find any way to make him change either, so little progress was made. There was some, as we did manage to make him more transparent; some claimed that made him no longer a douchebag, but in truth he remained one.

Still, we believed it could be worth keeping him just for his core value proposition of using web apps while offline. But as time went on, it became clear that even that was not going to be worth the bother, again as a consequence of his fundamentally different way of working. Things came to a head when we tried to solve the race conditions resulting from the possibility that a user loads the initial HTML page before the web app is updated, and its dependencies (including the manifest) after it is updated: the manifest has to be updated at the same URL (it acts as a fixed entry point of sorts for users who already have the web app in Application Cache), so we could not rely on the HTML pointing to a new manifest URL so that updating the entry point would atomically result in updating the web app. Even with the provision that the manifest be redownloaded after the entry point, and checked against the manifest downloaded before in the case of an app already in Application Cache (so as to try to have the manifest always loaded after the entry point, at least conceptually), we were stuck.

Some solutions were found, though limited to ideal situations; there was no solution available for the case of a serving infrastructure, such as content distribution networks, that offers only “eventually consistent” or other weak guarantees, and there was no solution either if even minimal use of FALLBACK: was required. Moreover, even in ideal situations those solutions put a lot of burden on the web developer, too much considering that offline web apps ought to work correctly in the face of these race conditions by default, or at least with minimal care. In the end, Application Cache was let go a few months ago.
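For the record, this is roughly the dance a page had to perform just to pick up an update once Application Cache was involved; consider it an illustrative sketch using the standard (now deprecated) AppCache API rather than code from any particular project, and note that even this does nothing about the deployment race described above, since the entry point, the manifest and the listed resources may still come from different versions of the web app.

    // Illustrative sketch of client-side Application Cache update handling.
    var appCache = window.applicationCache;

    appCache.addEventListener('updateready', function () {
      if (appCache.status === appCache.UPDATEREADY) {
        // A newer version of the cached resources has been downloaded...
        appCache.swapCache();
        // ...but the page currently displayed was built from the old version,
        // so only a reload actually puts the new one in front of the user.
        window.location.reload();
      }
    });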

If you were relying on the services provided by Application Cache, don’t worry. While there will be no future evolution (in particular, don’t expect bugs to get fixed), a new guy was hired to perform the tasks of Application Cache exactly as the latter did them. This new guy, Service Worker, will also provide a new service allowing web apps to work offline, this time in harmony with the other web components: for instance, out of the box he makes it possible to throttle checks for updated versions simply by setting a cache control header on the service worker (the period being a day at most); something which was exceedingly hard, if not impossible, with Application Cache due to his bad interactions with HTTP cache. He was already available in Chrome, and with the recently released Firefox 44, two independent, non-experimental implementations have now shipped, so you should take the time to make his acquaintance.
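To give you an idea of how the new guy operates, here is a minimal sketch; the file names and the cache name are made up for illustration. The page registers a service worker, which pre-caches a few resources at install time and answers requests from that cache, falling back to the network.

    // In the page: register the service worker (assumed to be served at /sw.js).
    if ('serviceWorker' in navigator) {
      navigator.serviceWorker.register('/sw.js');
    }

    // In sw.js: pre-cache the entry point and its dependencies at install time...
    self.addEventListener('install', function (event) {
      event.waitUntil(
        caches.open('app-v1').then(function (cache) {
          return cache.addAll(['/', '/app.js', '/app.css']);
        })
      );
    });

    // ...and answer fetches from the cache, falling back to the network.
    self.addEventListener('fetch', function (event) {
      event.respondWith(
        caches.match(event.request).then(function (cached) {
          return cached || fetch(event.request);
        })
      );
    });

Updating the app then amounts to deploying a modified sw.js (say, one that opens “app-v2”), and the Cache-Control header on sw.js itself is what throttles how often the browser checks for that new version, with the one-day cap mentioned above.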

New software release: JPS

JPS (which stands for JavaScript Patching System) is a web app that applies binary patches, currently IPS patches. Usage is simple enough: you provide the reference file and the patch file, and once patching is done you recover the patched file just as you would download a file from a server, except everything happens on your local machine. Moreover, JPS works while offline, thanks to Curtain, which was in fact developed for the needs of JPS.

JPS works on any reasonably recent version of Firefox or Chrome (both of which update automatically anyway), as well as any version of Opera starting with Opera 15. Unfortunately, some of the features used (download of locally-generated files in particular) are not universally supported yet, which means that, despite my efforts, Safari (rdar://problem/23550189, OpenRadar) and Internet Explorer are not supported; as a Safari user myself, this bothers me, but I could not find any way around the issue: you will have to wait for a version of Safari that supports the download attribute.
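For reference, the technique in question looks roughly like this (the function name and usage are made up for illustration, this is not the actual JPS code): the generated bytes are wrapped in a Blob, an object URL is created for it, and a link carrying the download attribute, the very attribute Safari was missing at the time, is clicked programmatically.

    // Offer a locally-generated file for download (illustrative sketch).
    function offerDownload(bytes, fileName) {
      var blob = new Blob([bytes], { type: 'application/octet-stream' });
      var url = URL.createObjectURL(blob);
      var link = document.createElement('a');
      link.href = url;
      link.download = fileName; // ignored by browsers lacking the attribute
      document.body.appendChild(link);
      link.click();
      document.body.removeChild(link);
      // URL.revokeObjectURL(url) can be called later to release the memory.
    }

    // For instance: offerDownload(patchedData, 'game (patched).sfc');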

Some background…

My motivation for writing JPS came from two events:

Indeed, when I learned of Zelda Starring Zelda I wanted to play it (A+++ would play again, currently playing the second installment), but realized the IPS patcher I previously used no longer ran (it was built for PowerPC), and while I was able to download and use a different patcher, I thought there had to be a better way than each platform using a different program, a program also susceptible to becoming unsupported in turn. This connected with my thoughts from the time Gatekeeper and Developer ID were announced, when I wondered whether we couldn’t circumvent that Apple restriction using web apps. So I decided I would develop a web app to apply IPS patches.

While most of the difficulties were encountered when developing the Curtain engine, the browser features used by JPS itself, namely client-side file manipulation and download, led to some challenges as well. One fun aspect was taking a format, IPS, which embeds many assumptions, some undocumented, about C-like file manipulation APIs (e.g. writing to a mutable FILE*-like object, and automatic zero filling when writing past the end of file), and making it work with the functional Blob APIs, based on slicing and concatenating arrays and immutable Blob objects. There were a few interesting surprises: for instance, early versions of JPS could, on some input files, cause Firefox to crash, taking down JPS and all the other Firefox tabs! Worse, resolving this required a significant rewrite of the patching engine, which led me to develop automated tests before performing the rewrite, to ensure it would not regress in any way (it didn’t).
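To make the contrast concrete, here is roughly what applying a single IPS-style write looks like in that functional style; this is an illustrative sketch (the function name is made up, and the real JPS engine is organized differently): instead of seeking and writing into a mutable file, each write produces a brand new Blob assembled from slices of the previous one, with a zero-filled gap inserted when the write starts past the current end of file.

    // Apply one write of `data` (a Uint8Array) at `offset` to the immutable
    // Blob `file`, returning a new Blob (illustrative sketch).
    function applyWrite(file, offset, data) {
      var parts = [];
      if (offset <= file.size) {
        parts.push(file.slice(0, offset));              // unchanged prefix
      } else {
        parts.push(file);
        parts.push(new Uint8Array(offset - file.size)); // automatic zero fill
      }
      parts.push(data);                                  // the patched bytes
      var end = offset + data.byteLength;
      if (end < file.size) {
        parts.push(file.slice(end));                     // unchanged suffix
      }
      return new Blob(parts);
    }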

JPS has been extensively tested prior to this release: I myself tested about a hundred patches, with only one patch not working, and only when running on Firefox (bug report), and it has been in open beta for some time without any other problematic patch having been reported.

The JPS source code is available under a BSD license; the source release contains all the needed code to deploy it with Curtain (which has to be downloaded separately), as well as test vectors for the IPS file format and a test harness to automatically test JPS using these files.

A few more words

While I would have liked to support Safari so that JPS could run out of the box on Mac OS X, I deem this proof of concept of a desktop-like web app to be good enough for at least a subset of desktop use cases; enough so for me to put the Gatekeeper and Developer ID concerns behind me. I can now reveal that, because of these concerns, I did not update to Mac OS X Mountain Lion or any later version until today; yes, up until yesterday I was still running Lion on my main machine.

Now that JPS and Curtain have been released, I can’t wait to see what will be done with this easy (well, OK, easier) way to develop small desktop-like tinkerer tools using the web!