APFS’s “Bag of Bytes” Filenames (Michael Tsai – Blog)

I have sooooooooo many questions. I mean, first I have the same ones as Michael, but on top of that:

  • “bag of bytes”, but I hope at least that the file name, even if not normalized, is guaranteed to be valid UTF-8, right? Right? Right?
  • In some circumstances, it is possible for the user to type the beginning of a file name to select or at least winnow the file selection; is there going to be guidance on how to perform this?
  • Sorting file names for display. Oh, the fun we shall have with sorting. Again, will guidance/a standard function be provided?
  • Normally this should result in less issues for software that wrote a file name with any valid UTF-8 string, then expects a file with that exact name to be in the directory listing, as it will be the case at least more often (I must admit I don’t fully understand the issue that led to the Apple response in the first place, though I understand even less the Apple response). However, when performing manipulations with NSString/NSURL/Swift String, do those preserve composition enough that developers can rely on them for that?

Now, granted, I know two people this will make happy (or, OK, less unhappy)…

EDIT: One additional data point about this, is that in a similar situation, even Apple doesn’t get it right (coincidentally, fixed in Safari 10.1 and iOS 10.3). Let me tell you, this issue was a bear to isolate.

I admit:

  • I have no idea where this was in Safari, though it is safe to say Apple has responsibility for that code,
  • Safari is already compensating for invalid data, the URL should be properly escaped in the first place, and
  • this is when using HTTP, not the filesystem.

Nevertheless, this shows Apple themselves sometimes get it wrong and normalize strings in a way that causes issues because the underlying namespace has a dumb byte string for key. So if they can get it wrong, then third-party developers will need all the help they can get to get it right.

EDIT: New info, in that there will be a case-insensitive variant for the Mac, which will also behave differently for normalization.

I think “normalization-preserving, but not normalization-sensitive” means that (like HFS+ on the Mac, unlike APFS on iOS) you cannot have multiple files whose names differ only in normalization. And you can look up a file using the “wrong” normalization and still find it. Additionally, beyond what HFS+ offers, if you create a file and then read the directory contents, you’ll see the filename listed using the same normalization that you used.

This is my interpretation as well.

Curtain update

I took advantage of the recent update to JPS to experiment a bit with Curtain. I significantly retooled it towards one goal: separate the generation of the deployment package from the deployment itself.

While the initial version of Curtain benefitted from many influences, one I completely forgot to take into account was Alex Papadimoulis’ teachings, more specifically those about release management and database changes. Especially the commandment that builds be immutable and to make sure that what gets deployed on production is the same thing that got deployed on the earlier environments.

When I recently re-read those two articles for inspiration at work, I thought: “Uh, oh.”

Indeed, with Curtain the deployment process is not only a function of the revision that we ultimately want there, but also of what was previously there, in order to support proper rollover of resources (itself necessary because of offline support). And as originally designed, Curtain would just adapt its deployment to what was previously there, which means that, if I wasn’t careful and did not double check that staging was properly rolled back to what is present in production (which let’s admit, we’ve all done at some point), then the Curtain deployment to staging would not be representative of the eventual deployment to production. Oops.

So Curtain has been updated to, rather than perform the deployment itself, instead generate a package containing the generated files; this package doubles as a Python script which, when invoked, will perform all the deployment steps to the target of choice. The script itself is dumb and takes no decisions, such that it can be invoked multiple times and perform always the same job, but it also checks prior to operating that the data previously present corresponds to the expectations it was generated with. That way, we can use the same script multiple times, once on staging and once on production, and be certain that the two deployments will be the same. And Alex will be happy.

One more thing. In my initial post, I also completely forgot to mention another influence: Deployinator. Many aspects of Curtain come from Deployinator: deployment as a single operation, deploying assets as a layer separate from code, and versioning these assets as part of the URL, etc. The lessons from Deployinator were so obvious to me that it did not even occur to me to mention where they came from. That omission has now been repaired.

Simple File Cache: improve the performance of FileReader in the browser

When was the last time you obtained a 10x (ten times, 1000%) performance gain with a single improvement?

Not recently, I bet. Most optimizations work incrementally, eking out 3% here, 2% there, and only achieve an observable effect by iterating many such optimization steps. Even algorithmic improvements, such as replacing an O(n²) algorithm by an O(n·log(n)) one, typically get you on the order of 3 or 4 times performance improvement, at least on the data sizes in typical use at the time the improvement is made. So let me tell you how I improved performance of JPS, my web app to apply IPS patches, tenfold.

Once upon a time…

Soon after the initial public version of JPS, I started working on support of another format that (among other processing) requires the CRC32 of the whole file to be obtained, which is best done in blocks of, say, 1024 bytes rather than reading from the file byte by byte. Given my prior experiences, I dreaded the performance penalty from having to (re-)visit every single byte of the file, but it turned out to perform surprisingly well. Why couldn’t I get the same performance when processing IPS files?

So as a proof of concept I started developing a layer that would read from the file in blocks of 4096 bytes, then serve read requests from the loaded data whenever possible, entirely in JavaScript. In other words, a cache. Writing a cache is something you always end up learning in any Computer Science curriculum, and you always wonder why, given that it seems so simple and obvious it need not be taught, and simultaneously is something the platform will provide anyway (especially as modern caches tend to be very complex beasts, what with replacement policies, cache invalidation, and so forth). And Mac OS X, on which I develop, aggressively caches filesystem reads at every level already. Writing my own cache for file reads seemed too obvious to be something worth doing.

Photo of the Mont Blanc, lighted with sunset light

From now on, my longer posts will have random photos from my various trips inserted to serve as breathers. This is the Mont Blanc, lighted by sunset light.

As a way to test this anyway, I wrote the dumbest file cache you could possibly imagine: there is only one cache bucket, it can only be loaded from whole block-aligned ranges in the file, with the result that a number of requests, e.g. those that cross block-aligned boundaries, or those that load from the remainder of the file that can’t form a whole block, have to sidestep the cache and be served from the file separately. Furthermore, JavaScript Blobs are supposed to be immutable, so I did not need to worry about invalidating my cache when the underlying storage changed. Even then, this was a not a trivial thing: the asynchronous nature of the browser file reading API meant the cache had to provide an asynchronous API itself and maintain a “todo list” of read operations being processed.

And now I turn on the cache, and measure the performance improvement… and files that used to take Chrome 50 seconds to process now take 5 seconds! (Chrome being my reference browser for development of JPS). And the 10x factor is consistent, applying over various source files, often turning the processing time into “too short to measure”, and over various platforms: the same files which took around 200 seconds on Chrome for Android now take 20 (and the behavior of desktop Chrome on Windows was the same as on Mac OS X). Similar improvements could be observed with desktop Firefox, with processing times going from 20 seconds to 2 seconds.

Wow.

I reported these findings on the Chromium discussion forums (Chrome being the worst offender), because surely that meant something was wrong with Chrome somewhere. However, not much came out of it, so I decided to productize the cache so as to deploy these performance improvements in production.

From proof of concept to production-worthy code

The proof of concept assumed that, for every read operation except from the first it could just append a new read request from the client to its todo list, and once control would bubble up back to the cache code, the request could be served there if it was in cache. That worked in most cases, at least enough to get performance measurements; but in some cases, a new request would be logged from code that was not called from a callback from our cache, so it would never bubble up back to our code and never be served, and the pump would drain.

Photo of a young Ibex

A young ibex.

Easy enough, I thought, I will get rid of the todo list and instead I will always defer processing by calling setTimeout(,O).

That worked.

But it was slow. Even slower than without the cache.

Turns out, the overhead of calling setTimeout(,O) and getting called back by it was killing this solution. What to do, what to do, what to do? Back to the drawing board, I came up with the solution: reinstate the todo list, and use it, but only if we can tell for sure that we are within code that is being called by cache code — which entails keeping track of that information. If we are not within code that is being called by cache code, only then use setTimeout(,O). That managed to both work in all cases and with good performance.

And then I also had to support aborting requests, adding a number of unit tests, fix a few bugs… and then it was done.

Photo of the Grandes Jorasses

The Grandes Jorasses.

What have we learned?

  • Don’t diss CS or the CS curriculum. You never know when what you learn there might turn out to be useful.
  • Sometimes the obvious solution is the right one.
  • The source of slowness isn’t reading files, per se, but rather the shocking overhead of calling a Web API and getting called back by it (whether it be FileReader or setTimeout(,O)), which by my estimates is around 2 ms for each such operation with Chrome on a modern desktop machine. This is crazy. Other browsers (with the exception of Internet Explorer/Edge, which I have not been able to test) fare better, but still have enough overhead that you have to wonder what is going on in there.

Get the code

I set up a specific project for the cache code: you can get the code on BitBucket, and I also published it on NPM as simple-file-cache. It is free to use and modify (under the terms of the BSD license). If you find it useful, I request that you consider donating to the ACLU and the UNHCR, however.


P.S.: While I’ve got your attention, I’m happy to report that JPS will soon support Safari, as this browser is finally about to get support for the download attribute and downloading blobs, normally as part of Safari 10.1, which is meant to arrive with Mac OS X 10.12.4. Being able to be used on a stock install of Mac OS X will be a huge milestone for JPS and the viability of web apps in general as a way to circumvent Developer ID and Gatekeeper.

Looking back on WWDC 2016

Now that the most important Apple release of WWDC has been dealt with, we can cover everything else. I haven’t followed as closely as previous years (hence no keynote reactions on Twitter), but to me here is what stands out.

The Apple App Stores policy announcements

As seen at Daring Fireball for instance, Apple briefed the press on many current and coming improvements to the Apple App Stores (iOS, Mac, tvOS, watchOS). This actually happened ahead of WWDC, but is part of the package. There are a lot of good things, such as for instance the first acceptance that Apple isn’t entitled over the whole lifetime of an app to 30% of any purchase where the buying intent originated from the app with the 85/15 split instead of 70/30 for subscriptions after the first year. However, none of this solves the lack of free trials: if only subscription apps can have free trials, then thanks, but no thanks. I want to both try before I buy and avoid renting my software, and I don’t think subscriptions make sense for every app anyway, so improvements and clarifications (e.g. indication of whether the app is “pay once and play” or ”shareware” or ”coin-op machine”) to apps using non-recurring payment options would be welcome (more on that in a later post). Also, while those apply to the Mac App Store as well, this one will need more specific improvements to regain credibility. I don’t have much of an opinion on the new search ad system.

The new Apple File System (APFS for short)

Apple announced a new filesystem, and to say that it has, over the years, accumulated a lot of pent-up expectations to fulfill would be the understatement of the year. I can’t speak for everyone, but each year N after the loss of ZFS my reaction was “Well, they did not announce anything this year, it’s likely because they only started on year N-1 and can’t announce it yet because they can’t develop such a piece of software in a yearly release cycle, so there is not use complaining about it as it could be already started, and will show up for year N+1.” Repeat every year. So while I can scarcely believe the news that development of APFS only started in 2014, at the same time I’m not really surprised by it.

I haven’t been able to try it out, unfortunately, but from published information these are the highlights. This is as compared to ZFS because ZFS is the reference that the Mac community has studied extensively back when Apple was working on a ZFS port in the open.

What we’ll get from APFS that we hoped to have with ZFS:

  • A modern, copy-on-write filesystem. By itself, this doesn’t do much, but this is the indispensable basis for everything else:
  • Snapshots, or if you prefer, read-only clones of the filesystem as a whole. Probably the most important feature, by itself it alone would justify the investment of a new filesystem to replace HFS+.

    While the obvious use case is backups, particularly with Time Machine, it is not necessarily in the way you think. Currently, when Time Machine backs up a volume, it has to contend with it being in use, and potentially being modified, while it is being backed up; if it was required to freeze a volume while backing it up, you wouldn’t be able to use it during that time and, as a result, you would back up much less often and that would defeat most of the purpose of Time Machine. So Time Machine has no choice but to read a volume while it is being modified, and as a result may not capture a consistent view of the filesystem! Indeed, if two files are modified at the same time, but one was read by Time Machine before the modification and the other after, on the backup the saved filesystem will have one file without the modification and the other with, which has not been the state of the filesystem you intended to back up at any point in time. In fact, this may mean the data is lost if you have to reload from that backup in case neither half can work with the other as a result.

    Instead, with APFS the backup application will be able to create a snapshot, which is a constant time operation (i.e. does not depend on how much data the volume contains) and results in no additional space being taken, at least initially, then can copy from that snapshot, while the filesystem is in use and being modified, and be confident that it is capturing a consistent view of the filesystem, regardless of where the data is being saved (it could be to an HFS+ drive!). Once the copy is over, the snapshot can be harvested to make sure no additional space is used beyond that needed by the live data. Of course, this will also allow, by using multiple snapshots, to more efficiently determine what changed from last time, and with APFS on the backup drive as well the backup application will be able to save space on the backup drive, in particular not taking up space for redundancies the source APFS drive knows about already. But snapshots on the APFS source drive will mean that, after 10 years, Time Machine will finally be safe: this is a correctness improvement, not merely a performance (faster backups and/or taking less space) one.

  • Real protection in the face of crashes and power loss events. HFS+ had some of that with its journal, but it only protected metadata and came with a number of costs. APFS will make sure its writes and other filesystem updates are “crash-safe”.
  • I/O prioritization. A filesystem does not exist merely as a layout of the data on disk, but also as a kernel module that has in-memory state (mostly cache) that processes filesystem requests, and the two are generally tied. I/O prioritization, some level of it at least, will allow some more urgent requests (to load data for an interactive action for instance) to “jump the queue” ahead of background actions (e.g. reads by a backup utility), all the while keeping the filesystem view consistent (e.g. a read after a write to the same file has to see the file as modified, so it can’t just naively jump over the corresponding write).
  • Multithreaded. In the same vein of improvements to the tied filesystem kernel module, this will allow to better serve different processes or threads that read and write from independent parts of the filesystem, especially if multiple cores are involved. HFS+, having been designed at the time of single-processor, single-threaded machines, requires centralized, bottleneck locks and is inefficient for multithreaded use cases.
  • File and directory hierarchy clones. Contrary to snapshots, clones are writable and are copied to another place in the directory hierarchy (while snapshots are filesystem-wide and exist in a namespace above the filesystem root). The direct usefulness is less clear, but it could be massively useful as infrastructure used by specialized apps, version control notably (both for work areas and repositories).
  • Logical volume management. Apple calls this “space sharing”, but it’s really the possibility to make “super folders” by making them their own filesystem in the same partition, and allows this super folder to have different backup behavior for instance.
  • Sparse files. Might as well have that, too.

What APFS will provide beyond ZFS, btrfs, etc. features:

  • Encryption as a first class feature. Full disk and per-file encryption will be integrated in the filesystem and provided by a common encryption codebase, not as layers above or below the filesystem and with two separate implementations. This also means files that are encrypted per-file will be able to be cloned, snapshotted, etc. without distinction from their unencrypted brethren.
  • Scalability down to the watch. ZFS never scaled down very well, in particular when it comes to small RAM amounts.

What we hoped to have with ZFS, but won’t get from APFS:

  • Crazy ZFS-like scalability. For instance, APFS has 64-bit nodes, not 128-bit. This is probably not unreasonable on Apple’s part.
  • RAID integration as part of the filesystem. APFS can work atop a software or hardware RAID in traditional RAID configurations (RAID-0, RAID-1, RAID-10, RAID-5, etc.), but always as a separate layer. APFS does not provide anything like RAID-Z or any other solution to the RAID-5 write hole. That is worth a mention, though I have no idea whether this is a need Apple should fulfill.
  • Deduplication. This is more generally useful to save space than clones or sparse files, but is also probably only really useful for enterprise storage arrays.

What is unclear at this point, either from the current state or because Apple may or may not add it by the time it ships:

  • Whether APFS will checksum data, and thus guarantee end-to-end data integrity. Currently it seems it doesn’t, but it checksums metadata, and has extensible data structures such that the code could trivially be extended to checksum all data while remaining backwards compatible. I don’t know why Apple does not have that turned on, but I beg them to do so, given the ever-increasing amounts of data we store on disks and SSD and their decreasing reliability (e.g. I have heard of TLC flash being used in Apple devices); we need to know when data becomes bad rather than blindly using it, which is the first step to try and improve storage reliability.
  • Whether APFS is completely transaction-based and always consistent on-disk. Copy-on-write filesystems generally are, but being copy-on-write is not sufficient by itself, and the existence of a fsck_apfs suggests that APFS isn’t always consistent on-disk, because otherwise it would not need a FileSystem Consistency checK. Apple claims writes and other filesystem updates will be “crash-safe”, but the guarantees may be lower than a fully transactional FS.
  • Whether APFS containers will be able to be extended after the fact with an additional partition (from another disk, typically), possibly even while the volumes in it are mounted. APFS support for JBOD, and the fact APFS lazily initializes its data structures (saving initialization time when formatting large disks), suggest it, and it would be undeniably useful, but it is still unknown at this time.
  • Whether APFS will be composition-preserving when it comes to file names. It will, certainly, be insensitive to composition differences in file names, like HFS+; however HFS+ goes one step further and normalizes the composition of file names, which ends up making the returned file name byte string different from what was provided at file creation, which itself subtly trips up some software like version control (via Eric Sink), and which is probably the specific behavior that led Linux founder Linus Torvalds to proclaim that HFS+ was “complete and utter crap”; see also this (latter via the Accidental Tech Podcast guys, who had the same Unicode thoughts as I did). Won’t you make Linus happy now by at least preserving composition, Apple? This is your opportunity!
  • Whether APFS uses B+trees. I know, this is an implementation detail, but it’d be neat if Apple could claim to have continuously been using B-/+trees of either kind for their storage for the last 30 years and counting.

For a more in-depth look at what we know so far about APFS, the best source by all accounts is Adam Leventhal’s series of posts.

Apple File Protocol deprecation

Along with APFS, Apple announced it would not be able to be served over AFP, only SMB (Windows file sharing), and AFP was thus deprecated. This raises the question over whether SMB is at parity with AFP: last I checked (but it was some time ago), AFP was still superior when it came to:

  • metadata and
  • searching

But I have no doubt that, whatever feature gap is left between SMB and AFP (if there is even one left), Apple will make sure it is closed before APFS ships, just like Apple made sure Bonjour had feature parity with AppleTalk before stopping support for AppleTalk.

Playgrounds on iOS

I’m of two minds about this one. I’ve always found Swift playgrounds to be a great idea. To give you an idea, back in the day when the only computer in the house was an Apple ][e, I did not yet know how to code, but I knew enough syntax that my father had set up a program that would, in a loop, plot the result of an expression over a two-axis system, and I would only have to change the line containing the expression, with the input variable being conveniently x, and the output, y; e.g. to plot the result of squaring x, I would only have to enter1:

60 y = x*x

run the program, and away I went. It was an interesting lesson when, due to my limited understanding of expressions, specifically that they are not equations, I once wrote:

60 2y = x+4

Which resulted in the same thing as I previously plotted, because this command actually modified line 602 (beyond the end of the loop)… good times.

Anyway, Swift playgrounds, which automatically plot the outcome of expressions run multiple times in a loop for instance, and even more so on iPad where you have the draggable loop templates and other control structure templates, provide the necessary infrastructure program out of the box, and learners will be able to experiment and visualize what they are doing in autonomy.

These playgrounds will be able to be shared, but when I hear some people compare this to the possibilities of Hypercard stacks, I don’t buy it. There is nothing for a user to do with these playgrounds, the graphic aspect is only a visualization (and why does it need to be so elaborate? This is basically Logo, you don’t need to make it look like a Monument Valley that would not even be minimalistic); even if the user can enter simple commands, it always has to start back from the beginning when you change the code (which is not a bad thing mind you, but shows even the command area isn’t an interactive interface). You can’t interact with these creations. Sharing these is like sharing elaborate Rube Goldberg constructions created in The Incredible Machine: it’s fun, and it’s not entirely closed as the recipient can try and improve on it, but except watching it play there is nothing for the recipient to do without understanding the working of the machine first.

Contrast that with Hypercard, in which not only you set up an actual interface, but what you’d code was handlers for actions coming from the interface, and not a non-interactive automaton. This also means that it was much less of a jump to go from there to an actual app, especially one using Cocoa: it’s fundamentally just a bunch of handlers attached to a user interface. It’s a much bigger jump when all you’re familiar with is playgrounds or even command-line programs, because it’s far from obvious how to go from there to something interactive. Seriously, I’m completely done with teaching programming by starting with command-line apps. It needs to die. What I’d like to see Apple try on the iPad is something inspired by the old Currency Converter tutorial (unfortunately gone now), where you’d create a simple but functional app that anyone could interact with.

Stricter Gatekeeper

…speaking of sharing your programming creations. I’m hardly surprised. This shows web apps is definitely the future of tinkerer apps.


  1. In Apple II Basic, you’d enter a line number then a statement, and that would replace the line in the saved program by the one you just entered. Code editors have improved a bit since then.

Review: App Review Guidelines: The Comic Book

The review for this Wednesday is for an unexpected, shall we say, release: it doesn’t appear to have been solicited through Diamond1 beforehand, and so the first comic book coming from Apple Inc. as a publisher, at least first in recent history, came as a complete surprise to everyone. It was released at the same time as many news from Apple, so it took me a bit of time to notice it, then get to it.

Before we begin, if you’ve followed this blog for a bit, you might have noticed I have a bit of a thing for comics, be it in previous posts or the comicroll and the pull list in the sidebar; or maybe you’ve been following some of my other endeavors or follow me on Twitter and have been left with little doubt that I do read and enjoy comics very much. So this is where I’m coming from on comics in general.

I also have a lot of appreciation more specifically for comics as teaching aids: it is to me a very suitable medium for teaching, and there is a lot of unjustified prejudice against this art form as being not for serious purposes, whatever that means. This is completely wrong, as can show the generally cheesy, but not bad teaching comics I read as a child, and it goes for grownups too, as the cartoons from Larry Gonick show (a nice trove of which can be found here, thanks Jeff), or more recently those Dante Shepherd is commissioning with a dedicated grant: 1, 2, 3, 4, 5 and 6 (so far); hat tip to Fleen. So this comic from Apple could, if well done, help with general understanding of what they are trying to accomplish with these guidelines.

I also understand that, as a developer who has followed Apple’s policies relatively well and have some expertise in interpreting them, and who reads a few specialists in Apple kremlinology, I may not actually be in the target audience. I have little doubt that the app review team and DTS are interacting daily with many, many developers who discover the guidelines when their app gets rejected for violating them and/or have a very incomplete picture of the whole of the guidelines and/or are are very stubborn about what they think their “rights” are; this comic is probably intended for them. Lastly, the link to this comic has been provided to me by people I trust, and it is hosted in a CDN domain that Apple uses for a variety of developer-related resources (e.g. Swift blog post images), so I have little reason to doubt its authenticity.

Get on with it!

Ok, ok. This comic is actually sort of an anthology, split in five parts, and the first is:

Safety

With art by Mark Simmons. In a setting and style reminiscent of Jack Kirby’s cosmic works (New Gods in particular), we find the hidden son of Flash and the Silver Surfer as the hero of this story, in which he has to cruise through space, avoiding a number of hazards, after he encounters some sort of Galactus-like planet eater. Will he succeed in time?

I found the story rather hard to follow, no doubt due to the unfamiliar setting, and had to reread it a few times to make sure I hadn’t missed anything; beyond that, the art serves its purpose, unfortunately the tests clearly isn’t here to support it.

Performance

With art by Ile Wolf and Luján Fernández. In a more playful style, two schoolchildren in uniform are battling using Pokémon/Digimon/kaiju (circle as appropriate), and the battle appears to have grown out of control. The situation is dramatic, and it’s not sure there is anything that can stop them.

At least here any ambiguity as to the situation is intentional, but even then it’s hard to take it seriously when the text (speech or narration) takes you out of the climax; not everyone can be a Stan Lee and add text after-the-fact that works well with such a story. And while the conclusion of “Safety” in part explains its title, I can’t help but think its hero would have been more appropriate to star in the “Performance” section.

Business

With art by Shari Chankhamma. A more intimate setting with interesting art where we follow the growth of a boy through times good and bad, but always in the same place: the barbershop he patronizes.

Maybe the most interesting of the stories in this anthology, and it’s too bad they couldn’t come up with text that was to the level: either do away with it, or hire better writers! Who edited this stuff?

Design

With art by Ben Jelter. Foraging in a post-apocalyptic wasteland, with an art style to match, a boy locates and manages to repair a robot who may or may not be related to Wall-e and Eve.

It’s a section for which developers for Apple platforms have understandably high expectations, but I don’t know if they’ll be met with the robot design, or with the art in general, which is nothing special. The less said about the text, the better.

Legal

With art by Malcolm Johnson. A noir/private eye story, all in greyscale, and interestingly starring a woman.

The art style is surprising in a good way for such a story, but it does not do a very good job of carrying the story, and as we’ve seen, no point in counting on the text for that either. At least this one has more relationship with its claimed subject matter than the others do.

Conclusion

What… in… the… ever-loving… frick? This comic may have the dimensions and approximate page count of a comic issue, but is, to be blunt, a crushing disappointment. Its only point, it turns out, is to put pictures which tell their own stories around the exact words of the official document, without any attempt at adaptation, or even just, say, recontextualization of the guidelines as an exchange between two characters. These words don’t benefit in any way from being told there. Meanwhile, the pictures just follow their own scenarios and tell their own stories without any consideration for what is supposedly spoken in the bubbles: there is no correspondence either thematic or in pace between the events depicted and the words you can read. There is no teaching benefit whatsoever to these comics, and no way I see anyone at any knowledge level benefit from reading it, let alone be enlightened as to the profound meaning of the guidelines. It’s as if bubbles were randomly placed, linked so that each would overflow into the next, then the text of the guidelines was just dumped into them. This shows better than anything I have previously seen that sequential art is more than the sum of pictures and text.

Verdict: download it, but don’t read it, and only use it in a few years to remind your interlocutor who works at Apple that this has been a real thing that Apple has released, in order to embarrass him.

App Review Guidelines: The Comic Book
Price: 0¢ (digital only)
Publisher: Apple
Words: Apple
Cover illustration: Dailen Ogden
Illustrations: Mark Simmons, Ile Wolf, Luján Fernández, Shari Chankhamma, Ben Jelter, and Malcolm Johnson


  1. Diamond is the only distributor to comic book stores in North America, and comics appear in its catalog a few months before being available, in case you’re not familiar with that aspect of the comics industry.

Application Cache was fired for his douchebaggery

To all of you who enquired about the whereabouts of Application Cache, I regret that I have to inform you that he is no longer with our company. This was not an easy decision to take, but we believe it was the right one.

While it has been no secret for some time that Application Cache was a douchebag, this was not necessarily apparent at first. Application Cache promised so much, and we believed him because he could prove his claims to a large extent. However, his way of working was so much at odds with the way other web components work (especially long-time pillar of web infrastructure HTTP cache) that his core value proposition was harder to exploit than it should have been (with many unfortunate pitfalls, as Jake Archibald documented); and worse, his more advanced promises, while working in basic scenarios, had some ancillary troubles, which unexpectedly turned out to be intractable no matter how hard we tried, and so these promises never came to light.

Because he was useful despite the issues, we tried to work with him on these, with many counseling sessions with HR; however, Application Cache was adamant that this was his fundamental mode of operation and he could not work any other way, and that others would have to adapt to him. This, of course, was not remotely acceptable, but we could not find any way to make him change either, so little progress was made. There was some, as we did manage to make him more transparent; some claimed that made him no longer a douchebag, but in truth he remained one.

Still, we believed that it could still be worth keeping him just for his core value proposition of using web apps while offline. But as time went on, it became clear that even that was not going to be worth the bother, again as a consequence of his fundamentally different way of working. Things came to a head when we tried to solve race conditions resulting from the possibility that a user load the initial HTML page before the web app is updated, and its dependencies (including the manifest) after the web app is updated: the manifest has to be updated at the same URL (it acts as a fixed entry point of sorts for users who already have the web app in Application cache), so we could not rely on the HTML pointing to a new manifest URL so that the update of the entry point would atomically result in the update of the web app. Even with the provision that the manifest be redownloaded after the entry point, and checked against the manifest downloaded before in the case of an app already in Application Cache (so as to try to have the manifest always loaded after the entry point, at least conceptually), we were stuck.

Some solutions were found, though limited to ideal situations; there was no solution available for the case of a serving infrastructure, such as content distribution networks, with only “eventually consistent” or other weak guarantees, and there was no solution either if even minimal use of FALLBACK: was required. Moreover, even in ideal situations those solutions bring a lot of burden on the web developer, too much burden considering that offline web apps ought to work correctly in the face of these race conditions by default, or at least with minimal care. In the end, Application Cache was let go a few months ago.

If you were relying on the services provided by Application Cache, don’t worry. While there will be no future evolution (in particular, don’t expect bugs to get fixed), a new guy was hired to perform the tasks of Application Cache exactly as the latter did them. This new guy, Service Worker, will also provide a new service allowing web apps to work offline, this time in harmony with the other web components: for instance, out of the box he makes it possible to throttle checks for updated versions simply by setting a cache control header on the service worker (the period being a day at most); something which was exceedingly hard, if not impossible, with Application Cache due to his bad interactions with HTTP cache. He was already available in Chrome, and with the recently released Firefox 44, two independent, non-experimental implementations have now shipped, so you should take the time to make his acquaintance.

New software release: JPS

JPS (stands for Javascript Patching System) is a web app that applies binary patches, currently IPS patches. Usage is simple enough: you provide the reference file and the patch file, and once patching is done you recover the patched file just as you would download a file from a server, except everything happens on your local machine. Moreover, JPS works while offline, thanks to Curtain, which was in fact developed for the needs of JPS.

JPS works on any reasonably recent version of Firefox or Chrome (both of which update automatically anyway), as well as any version of Opera starting with Opera 15. Unfortunately, some of the features used (download of locally-generated files in particular) are not universally supported yet, which means that, regardless of my efforts, Safari (rdar://problem/23550189, OpenRadar) and Internet Explorer are not supported; as a Safari user myself, this bothers me, but I could not find any way around this issue, you will have to wait for a version of Safari that supports the download attribute.

Some background…

My motivation for writing JPS came from two events:

Indeed, when I learned of Zelda Starring Zelda I wanted to play it (A+++ would play again, currently playing the second installment), but realized the IPS patcher I previously used no longer ran (it was built for PowerPC), and while I was able to download and use a different patcher I thought there had to be a better way than each platform using a different program, program also susceptible to becoming unsupported. And this joined my thoughts from the time when Gatekeeper and Developer ID were announced, where I wondered if we couldn’t circumvent this Apple restriction using web apps. So I decided I would develop a web app to apply IPS patches.

While most of the difficulties were encountered when developing the Curtain engine, the browser features used by JPS itself, namely client-side file manipulation and download, led to some challenges as well. One fun aspect was taking a format, IPS, which embeds many assumptions, some undocumented, on C-like manipulation APIs (e.g. writing to a mutable FILE*-like object, and performing automatic zero filling when writing past the end of file), and making it work using the functional Blob APIs, based on slicing and concatenation of arrays and immutable Blob objects. There were a few interesting surprises, for instance early versions of JPS could, on some input files, cause Firefox to crash, taking down JPS and all the other Firefox tabs! Worse, resolving this required a significant rewrite to the patching engine, which led me to develop automated tests to catch any regression before performing this rewrite, to ensure that the rewrite would not regress in any way (it didn’t).

JPS has been extensively tested prior to this release; I tested myself about a hundred patches, with only one patch not working while running on Firefox (bug report), and it has been in open beta for some time without any other problematic patch having been reported.

The JPS source code is available under a BSD license; the source release contains all the needed code to deploy it with Curtain (which has to be downloaded separately), as well as test vectors for the IPS file format and a test harness to automatically test JPS using these files.

A few more words

While I would have liked to support Safari so that JPS could run out of the box on Mac OS X, I deem this proof of concept of a desktop-like web app to be good enough for at least a subset of desktop use cases; enough so for me to put the Gatekeeper and Developer ID concerns behind me. I can now reveal that, because of these concerns, I did not update to Mac OS X Mountain Lion or any later version until today; yes, up until yesterday I was still running Lion on my main machine.

Now that JPS and Curtain have been released, I can’t wait to see what will be done with this easy (well, OK, easier) way to develop small desktop-like tinkerer tools using the web!

Introducing Curtain

I’m excited to introduce Curtain to you today. Curtain is a packaging and deployment engine for desktop-like web apps; Curtain handles the business of generating the app from source files and deploying it on the server such that it supports offline use.

Curtain can be cloned from BitBucket, and it has a sample app, both under the BSD license. Rather than repeating the Readme found there, I would like here to provide some background.

Some background…

Offline support

I wanted to use Application Cache for a project; as you know, Application Cache is a douchebag, but even that article did not prepare me for how much of a douchebag it is. In particular, you want web apps to be able to be updated, if only because the first version inevitably has bugs. Remember that, even if the list of files in the manifest does not change, the manifest has to change whenever the app changes, otherwise users won’t get the updated version. So how to update the manifest and app?

  • If the app is updated in this manner:

    1. manifest is updated
    2. remainder is updated

    or even if the two are updated at the same time, then you could run into the following scenario:

    1. user does not have the app in cache, and fetches the HTML resource
    2. manifest is updated
    3. remainder is updated
    4. due to a network hiccup on her side, user only now fetches the manifest

    Now the user has the manifest for the updated version, but is really running the previous version of the web app. Even if the list of cached files is still correct, now whenever the user agent checks for an updated manifest it will find it to be bit-for-bit identical, and the user agent will not update the version the user uses, which is out of date, until a second update occurs. This is obviously not acceptable, and if the list of cached files is incorrect for the version it will be even worse.

  • Now imagine the web app is updated in this manner:

    1. remainder is updated
    2. manifest is updated 30 seconds (one network timeout) later

    In this case, the scenario in the previous case cannot occur: if the user fetched the HTML resource prior to the update, the user agent will either succeed before the manifest is updated, or will give up at its network timeout. However, another scenario can now occur:

    1. remainder is updated
    2. user loads the app from the server (either initial install or because he still had a version prior to the one before the update), both app files an manifest
    3. manifest is updated

    In that case, the user has the updated app but the manifest for the previous version. Even if the list of cached files is correct, the versions are inconsistent which is an issue if the new version turns out to have a showstopping issue (which sometimes only becomes apparent after public deployment, due to the enormous variety of user agents in the wild) and we decide to roll back to the previous version: in that case, whenever the user agent checks for an updated manifest, it will find it hasn’t changed and the user will keep using the version of the app that has a showstopping issue. When performing the rollback, we could decide to modify the manifest so that it is different from both versions, but this is dangerous: when rolling back you want to deploy exactly what you deployed before, in order to avoid running into further issues. And I don’t need to tell you how problematic having inconsistent app and manifest would be if the list of resources to cache changed during the update.

So how does Curtain solve this problem?

By updating the manifest twice:

  1. manifest is updated with intermediate contents
  2. remainder is updated
  3. manifest is updated again 30 seconds (one network timeout) later

If the list of resources to cache changes during the update, the manifest contains the union of the files needed by the previous version and the files needed by the updated version; and in all cases, the intermediate manifest contains in a comment two version numbers: the one for the app prior to the update, and the one for the app after the update. That way the manifest is suitable in both cases, and his method of update avoid all the issues associated with the previous methods.

Of course, that would be tedious and error-prone to handle by hand, so Curtain generates both intermediate and updated manifests from a script.

Versioned resources

I enjoy reading Clients from Hell; even though I don’t design web sites for a living I relate strongly to these horror stories. Except for one kind: those where the client complains he should not have to clear the cache/do a hard reload/etc. to see the fully updated site. Sorry, but for those, I side completely and unquestioningly with the client. Even in a development iteration context, it is up to the developer to show he can change the site and have the changes propagate without the user needing to do anything more than a soft reload (which invalidates the initial HTML resource if necessary, but nothing else), because such changes will need to happen in a post-deployment context. And don’t get me started on the number of site redesigns where the previous versions of all assets (icons, previous/next arrows, etc.) are still visible, and the announcement post starts with the caveat that you may have to reload manually in order for the redesign to be fully in effect… and even then, it has to be done again on a second page, because the main page does not have a “next” arrow for instance.

Yes, clearly you want resources and image assets, in particular, to be far-expire in order to save on bandwidth. But this means they must also be immutable: they might disappear, but may never, ever change; and if a different resource is needed, then it must have a different URL. Period.

Obviously, changing the resource name by hand, especially if you need to do so for every development iteration, is tedious and error-prone. When I read in web development tutorials, including some Application Cache ones, the suggestion to use, say, script-v2.js and increment version numbers that way, I can’t help but think “Isn’t that the cutest thing? Thinking you can do so flawlessly without ever forgetting to so do whenever a resource changes? Awwww…” because that is a recipe for failure, even if you only change these resources as part of a deploy.

Such inconsistency issues are even worse for offline web apps. Indeed, if your web app cannot work offline, you can just assume that, if your web app works incorrectly because of an inconsistent set of resources, the user will just reload and she will eventually get consistent resources. But in the case of an offline web app, once the user is back to her new home for which DSL hasn’t been installed yet (I’m getting tired of the airplane example) she has no opportunity for reload.

Even worse, even if the user checked while she was online that the web app was working correctly (which is asking a lot of her already), it may in fact be the previous version that was reloaded from the cache, while an inconsistently updated version is being downloaded, and when she relaunches it while at home she will get the inconsistent version. You can’t afford to be careless with offline web apps.

Curtain resolves this issue by relying on a version control system. On the build machine, all resources must be under a version-controlled work area, and Curtain will query the version control system for the ID of the version where the resource was last updated, and will generate a resource name by appending this version ID. Note that by doing it this way, Curtain will avoid changing the URL of the resource (which would invalidate it in the cache) even if everything else has changed, as long as the resource itself hasn’t changed. Curtain will process your HTML to replace references to the resource by references to the versioned resource name, and upload the result, and upload the resources themselves so that they have the versioned name on the server.

Curtain will also assign a version to the app as a whole, this is in particular put in a comment in the manifest (see above): this version is simply the current version of the version control system work area. Curtain itself must be in such a work area, so that if Curtain is updated but the source files are not, the version number is changed.

As part of these tasks, Curtain will manage the Cache-Control headers by synthesizing the necessary .htaccess file, which is especially important when using Application Cache; since it has to deal with .htaccess anyway, Curtain will also directly manage the MIME type of these resources, to avoid relying on the default Apache behavior (based on file extensions).

No progressive rendering

I have always found progressive rendering to be unsightly. It was necessary in the first days of the web, what with images taking seconds to download, it is largely necessary on mobile to this day, and it is still desirable on desktop for online apps. But for offline, desktop-like web apps? No way.

Curtain opts out of progressive rendering by downloading all dependent resources through XMLHttpRequest and explicitly loading the content, for instance for image resources by generating a URL for the downloaded Blob and assigning it through code to the src attribute of the img tag; this means Curtain-deployed web apps depend on XHR2 and Blob as a XHR responseType. Curtain will hide the interface until all resources have been loaded and assigned, assuming that the user will retry loading the app if no interface appears after a time; it is safe to assume the user is online at that time, because if he is offline, this means all the files listed in the Application Cache manifest are locally available and so will not fail to load.

If JavaScript is disabled or the browser does not have the necessary support for Curtain, we want to be able to show a message to that effect, and we want to do it in the context of the “usual” interface, so that she recognizes the web app. So the entirety of the interface is put in a div belonging to a CSS class called curtain. A small bit of JavaScript code before the interface hides this div: if JavaScript is disabled, the interface simply won’t be hidden. Then code after the interface will check everything necessary for the Curtain runtime to perform its job (using Modernizr in particular); if not everything is available, then the message will be changed and the div will be made visible.

The HTTP URLs of the images are put in the src attributes of the img tags in the initially downloaded HTML. However, this is only a provision for the above two error cases; in normal usage they will have been replaced by the blob URLs prior to the interface becoming visible.

Language

First, Curtain generates static sites, and does not depend on any server programming language or any kind of server processing. Second, while early versions of the build and upload script were written as a shell script, Curtain is written in Python so as to be as portable as possible (it was that or Perl; I chose Python), though I have not been able to test it on Windows or Linux yet.

Third, Curtain embeds a bit of JavaScript code along with your app, and it expects your app to be written in JavaScript. However, Curtain makes no pretense at bringing JavaScript framework features; you should be able to use it with any JavaScript framework, including Vanilla JS.

Stay tuned…

Stay tuned, because tomorrow I will present you the sample app for Curtain, and its justification.

Various comments on the iPad pro, among other Apple announcements

As someone who regularly expresses himself on how the iPad is (or how it currently falls short of being) the future of computing, I guess I should comment on the announcements Apple made this September, in particular of course the iPad pro. I think it’s great news, of course. But… where to begin? More so than the iPhone, it is software applications, and specifically third-party software, that will make or break the iPad, even more so for the iPad pro, considering that most of its hardware improvements will only really be exploited with third-party apps: it does not appear that Apple will provide a built-in drawing app to go with the impressive tablet drawing capabilities of the iPad pro, for instance.

And so, what corresponding iOS platform news did we get from Apple this September? Err, none. From a policy standpoint, iOS is still as developer-unfriendly, by not supporting software trials for instance, even though this is a fundamental commercial practice; in fact, this appear to be backfiring on Apple, as this resulted in potential buyers going for the already established brands1 when it comes to professional iPad software, and in Apple coming to bringing Adobe and Microsoft on stage in the presentation to prove the professional potential of the iPad pro; those two companies are probably the ones Apple wishes to be least dependent upon, and yet here we are.

And what about iOS, iOS 9 specifically? When it was announced at WWDC, my thoughts were, on the timing: “It’s about time!”, and on the feature set: “This is a good start.”; and this was in the context of the 10′ iPad, so with the iPad pro announcement I was expecting more multitasking features to be announced along with it, but nope, that was it, what I saw at WWDC and thought would be catching up, meant for the then-current iPads, was in fact a stealth preview of software features meant to be sufficient for the iPad pro. Don’t get me wrong, on-screen multitasking on the iPad is awesome (I’ve been playing with it this summer with the developer betas), but the iPad pro will need more than that. Much more, such as, I don’t know, drag and drop between apps? Or a system where one app could ask a service of another other app (bringing it on the screen by the first app’s side if it wasn’t present), similar to extensions except it would better use the screen real estate allowed by the iPad and iPad pro?

So yes, most definitely count me in with those who think Apple is solving only one part of the problem of the iPad platform with the iPad pro. I’m afraid the iPad has too little direct competition for Apple to take this problem seriously right now, and later it may be too late.

All that having been said, I am very impressed with what they’ve shown of the iPad pro’s capabilities as a graphic tablet, even if I will probably never use these capabilities myself; I think Wacom should indeed be worried. One thing that will matter a lot, but Apple hasn’t talked about or posted specs on, is the thinness of the so-called air gap between the screen and the drawing surface, which (along with latency, which they did mention in the keynote) makes all the difference in making the user feel like he is actually drawing on a surface, rather than awkwardly interacting with a machine; Piro of Megatokyo posted about this a while ago. This is something Apple has been good at, at least when compared with other handset/tablet makers. At any rate, the iPad, even the iPad pro, has to support the needs of everyone, so Wacom and other graphic tablet makers may still be able to differentiate themselves by providing possibilities that a tablet that still must support finger input and other usage patterns than drawing could not provide.

Lastly, I am interested in the iPad pro as a comic reader. The 10′ iPad family is generally great for reading manga and U.S. comics, as that screen size is large enough to contain a single page of them (in the case of U.S. comics, pages are actually slightly scaled down, but that does not impair reading), however it falls short when needing to show a double page spread, such as the (outstanding) one found in the latest issue of Princeless (if you’re not, you should be reading Princeless. Trust me.) or the not less outstanding ones found in Girl Genius. The iPad pro has the perfect size and aspect ratio for those, being able of showing two U.S. comics pages side by side at the same scale a 10′ iPad is able of showing a single page. A 10′ iPad is also too small to reasonably display a single page of comics of the French-Belgian tradition (aka “bandes dessinées”), while the iPad pro would be up to the task with only a minor downscale; I’m not about to give up buying my French-Belgian comics on paper any time soon, but it would be a wonderful way to read them for someone overseas who is far away from any store that distributes such comics, especially for less well-known ones.2

As for the other announcements… I don’t know what the new Apple TV will be capable of outside the U.S., so most of its appeal is hard to get for me. I have little to comment on with regard to its capabilities as an app platform, except that I find the notion of Universal iOS/tvOS games (even more so saved game handoff) to be completely ludicrous: the two have completely different interaction mechanisms, so the very notion that the “same game” could be running on both is absurd.

As for the iPhone 6S/6S+, what is to me most interesting about them is 3D touch and live photos. Those two are very dependent on the concept catching on with developers (well, Vine is ready for live photos, of course…): there is little point in being able to capture live photos if you can’t share them as such on Facebook/Flickr/Dropbox/etc., and it remains to be seen which developers will add 3D touch actions to their apps, and what for. So will they catch on? We’ll see.


  1. And they, in turn, have the branding power to pull off alternate payment schemes such as subscriptions, allowing them to thrive on the iPad more than developers dependent on race-to-the-bottom iOS App Store revenue do.
  2. What about double page spreads in French-Belgian comics? Those are rare, but they exist, most notably in the œuvre of Cabu (yes, that Cabu). However, I am not sure the 17′ handheld tablet that would be necessary to properly display these would be very desirable.

Maybe Android did have the right idea with activities

One of the features that will land as part of iOS 9 is a new way to display web content within an app by way of a new view controller called SFSafariViewController, which is nicely covered by Federico Viticci in this article on MacStories. I’ve always been of the (apparently minority) opinion that iOS apps ought to open links in the default browser (which currently is necessarily Mobile Safari), but the Safari view controller might change my mind, because it pretty much removes any drawback the in-app UI/WKWebView-based browsers had: with the Safari view controller I will get a familiar and consistent interface (minus the editable address bar, of course), the same cookies as when I usually browse, my visits will be registered in the history, plus added security (and performance, though WKWebView already had that), etc. Come November, maybe I will stop long-tapping any and all links in my Twitter feed (in Twiterrific, of course) in order to bring the share sheet and open these links in Safari.

Part of my reluctance to use the in-app browsers was that I considered myself grown up enough to switch apps, then switch back to Twitter/Tumblr/whatever when I was done with reading the page: I have never considered app switching to be an inconvenience in the post-iOS 4 world, especially with the iOS 7+ multitasking interface. But I understand the motivation behind wanting to make sure the user goes back to whatever he was doing in the app once he is done reading the web content; and no, it is not about the app developers making sure the user does not stray far from the app (okay, it is not just about that): the user himself may fail to remember that he actually was engaged in a non-web-browsing activity and thus is better served, in my opinion, by a Safari experience contained in-app than he is by having to switch to Safari. For instance, if he is reading a book he would appreciate resuming reading the book as soon as he is done reading the web page given as reference, rather than risk forgetting he was reading a book in the first place and realize it the following day as he reopens the book app (after all, there is no reason why only John Siracusa ebooks should have web links).

And most interestingly, in order to deliver on all these features, the Safari view controller will run in its own process (even if, for app switching purposes for instance, it will still belong to the host app); this is a completely bespoke tech, to the best of my knowledge there is no framework on iOS for executing part of an app in a specific process as part of a different host app (notwithstanding app extensions, which provide a specific service to the host app, rather than the ability to run a part of the containing app). And this reminded me of Android activities.

For those of you not familiar with the Android app architecture, Android applications are organized around activities, each activity representing a level of full-screen interaction, e.g. the screen displaying the scrollable list of tweets complete with navigation bars at the top and/or bottom, with the activity changing when drilling down/back up the app interface, e.g. when tapping a tweet the timeline activity slides out and the tweet details activity slides in its place, with activities organized in a hierarchy. Android activities are roughly equivalent to view controllers on iOS, with a fundamental difference: installed apps can “vend” activities, that other apps can use as part of their activity hierarchy. For instance, a video player app can start with an activity listing the files found in its dedicated folder, then once such a file is tapped it starts the activity to play a single video, all the while also allowing any app, such as an email client or the file browser, that wants to play a video the ability to directly invoke the video-playing activity from within the host app.

I’ve always had reservations with this system: I thought it introduced user confusion as to which app the user “really” is in, which matters for app switching and app state management purposes, for instance. And I still do think so to an extent, but it is clear the approach has a lot of merit when it comes to the browser, given how pervasive visiting web links is even as part of otherwise very app-centric interactions. And maybe it could be worth generalizing and creating on iOS an activity-like system; sure, the web browser is pretty much the killer feature for such a system (I am more guarded about the merits when applying it to file viewing/editing, as in my video player example), but other reasonable applications could be considered.

So I’m asking you, would you like, as a user, to have activity-like features on iOS? Would you like, as a developer, to be able to provide some of your app’s view controllers to other apps, and/or to be able to invoke view controllers from other apps as part of your app?