APFS’s “Bag of Bytes” Filenames (Michael Tsai – Blog)

I have sooooooooo many questions. I mean, first I have the same ones as Michael, but on top of that:

  • “bag of bytes”, but I hope at least that the file name, even if not normalized, is guaranteed to be valid UTF-8, right? Right? Right?
  • In some circumstances, it is possible for the user to type the beginning of a file name to select or at least winnow the file selection; is there going to be guidance on how to perform this?
  • Sorting file names for display. Oh, the fun we shall have with sorting. Again, will guidance/a standard function be provided?
  • Normally this should result in less issues for software that wrote a file name with any valid UTF-8 string, then expects a file with that exact name to be in the directory listing, as it will be the case at least more often (I must admit I don’t fully understand the issue that led to the Apple response in the first place, though I understand even less the Apple response). However, when performing manipulations with NSString/NSURL/Swift String, do those preserve composition enough that developers can rely on them for that?

Now, granted, I know two people this will make happy (or, OK, less unhappy)…

EDIT: One additional data point about this, is that in a similar situation, even Apple doesn’t get it right (coincidentally, fixed in Safari 10.1 and iOS 10.3). Let me tell you, this issue was a bear to isolate.

I admit:

  • I have no idea where this was in Safari, though it is safe to say Apple has responsibility for that code,
  • Safari is already compensating for invalid data, the URL should be properly escaped in the first place, and
  • this is when using HTTP, not the filesystem.

Nevertheless, this shows Apple themselves sometimes get it wrong and normalize strings in a way that causes issues because the underlying namespace has a dumb byte string for key. So if they can get it wrong, then third-party developers will need all the help they can get to get it right.

EDIT: New info, in that there will be a case-insensitive variant for the Mac, which will also behave differently for normalization.

I think “normalization-preserving, but not normalization-sensitive” means that (like HFS+ on the Mac, unlike APFS on iOS) you cannot have multiple files whose names differ only in normalization. And you can look up a file using the “wrong” normalization and still find it. Additionally, beyond what HFS+ offers, if you create a file and then read the directory contents, you’ll see the filename listed using the same normalization that you used.

This is my interpretation as well.

Leave a Reply

Name *
Email *