“Character”-by-“character” string processing is hard, people

I bet you did not believe me when I wrote in Swift thoughts about how it is hard to properly process strings when treating them as a sequence of Unicode code points, and that as a result text is better thought of as a media flow and strings better handled through the few primitives I mentioned, which never treat strings using any individual entity (be this entity the byte, the UTF-16 “character”, the Unicode code point, or the grapheme cluster). I am exaggerating, of course, some of you probably did believe me, but given how I still see string processing being discussed between software developers, this is true enough.

So go ahead and read the latest post in the Swift blog, about how they changed the String type in Swift 2 so that it is no longer considered a collection (it no longer conforms to the CollectionType protocol). The reason: a collection where appending an element (a combining acute accent, or “´”) not only does not result in that element being considered part of the collection, but also causes an element previously considered part of it (the letter “e”) to no longer be, is a pretty questionable collection type. Oops.

But that is not the (most) interesting aspect of that blog post.

Look at the table towards the end, which is supposed to correspond to a string “comprised of the decomposed characters [ c, a, f, e ] and [ ´ ]”, and which I am reproducing here, as an actual HTML table as Tim Berners-Lee intended, for your benefit (and because I am pretty certain they are going to correct it after I post this):

Character: c (LATIN SMALL LETTER C, U+0063) | a (LATIN SMALL LETTER A, U+0061) | f (LATIN SMALL LETTER F, U+0066) | é (LATIN SMALL LETTER E WITH ACUTE, U+00E9)
Unicode Scalar Value: c | a | f | e | ´
UTF-8 Code Unit: 99 | 97 | 102 | 101 | 204
UTF-16 Code Unit: 99 | 97 | 102 | 769

The first thing you will notice is the last element of the Character view, the whole row in fact. Why are they described by a Unicode code point each? Indeed, each of these elements is an instance of the Swift Character type, i.e. a grapheme cluster, which can be made up of multiple code points, and this is particularly absurd in the case of the last one, which corresponds to two Unicode code points. True, it would compare equal with a Swift Character containing a single LATIN SMALL LETTER E WITH ACUTE, but that is not what it contains. And yet, this is only the start of the problems…

If we take the third row, its last element is incorrect. Indeed, 204, or 0xCC ($CC for the 68k assembly fans in the audience) is only the first byte of the UTF-8 serialization of U+0301 (COMBINING ACUTE ACCENT) that you see in the previous row (which is correct, amazingly), the second being $81.
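For those who would rather check the bytes than take my word for it, here is a minimal sketch in C of the UTF-8 encoding rule applied to a single scalar value. The function name utf8_encode is mine, not from any library, and this is an illustration rather than a vetted implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: encode one Unicode scalar value as UTF-8.
// Writes 1 to 4 bytes into out and returns the number of bytes written,
// or 0 if the value is not a scalar (a surrogate, or above U+10FFFF).
size_t utf8_encode(uint32_t scalar, uint8_t out[4])
{
    assert(out != NULL);
    if (scalar < 0x80) {                      // 1 byte: 0xxxxxxx
        out[0] = (uint8_t)scalar;
        return 1;
    }
    if (scalar < 0x800) {                     // 2 bytes: 110xxxxx 10xxxxxx
        out[0] = (uint8_t)(0xC0 | (scalar >> 6));
        out[1] = (uint8_t)(0x80 | (scalar & 0x3F));
        return 2;
    }
    if (scalar < 0x10000) {                   // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        if (scalar >= 0xD800 && scalar <= 0xDFFF)
            return 0;                         // surrogates are not scalar values
        out[0] = (uint8_t)(0xE0 | (scalar >> 12));
        out[1] = (uint8_t)(0x80 | ((scalar >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (scalar & 0x3F));
        return 3;
    }
    if (scalar <= 0x10FFFF) {                 // 4 bytes: 11110xxx then three 10xxxxxx
        out[0] = (uint8_t)(0xF0 | (scalar >> 18));
        out[1] = (uint8_t)(0x80 | ((scalar >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((scalar >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (scalar & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0x0301, buffer) yields two bytes, $CC then $81 (204 then 129), which is why a lone 204 cannot stand for the combining acute accent.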

And lastly, if the last two columns are two separate Unicode scalar values, how could they possibly be represented by a single UTF-16 code unit? Of course, they can’t: 769 is $0301, our friend the combining acute accent. “e” is simply gone.

So out of 4 rows, 3 are wrong *slow clap*. So here is the correct table:

Character: c | a | f | é
Unicode Scalar Value: c (LATIN SMALL LETTER C, U+0063) | a (LATIN SMALL LETTER A, U+0061) | f (LATIN SMALL LETTER F, U+0066) | e (LATIN SMALL LETTER E, U+0065) | ´ (COMBINING ACUTE ACCENT, U+0301)
UTF-8 Code Unit: $63 | $61 | $66 | $65 | $CC $81
UTF-16 Code Unit: $0063 | $0061 | $0066 | $0065 | $0301

Note that with the example given, Unicode scalar values match one for one with the UTF-16 code units in the sequence. For a counterexample to be provided, the string would have to include Unicode code points beyond the Basic Multilingual Plane, each of which takes two UTF-16 code units (a surrogate pair). That plane is a land populated by scripts no longer in general usage (hieroglyphs, Byzantine musical notations, etc.), extra compatibility ideographs, invented languages, and other esoteric entities; that place, by the way, is where emoji were (logically) put in Unicode.
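To make the counterexample concrete, here is a small sketch in C of the UTF-16 encoding rule. As before, utf16_encode is a name I made up for illustration, and the code is a sketch rather than production material.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: encode one Unicode scalar value as UTF-16.
// Returns the number of 16-bit code units written (1 or 2),
// or 0 if the value is a surrogate or lies above U+10FFFF.
size_t utf16_encode(uint32_t scalar, uint16_t out[2])
{
    assert(out != NULL);
    if (scalar >= 0xD800 && scalar <= 0xDFFF)
        return 0;                              // surrogates are not scalar values
    if (scalar < 0x10000) {                    // inside the BMP: one code unit
        out[0] = (uint16_t)scalar;
        return 1;
    }
    if (scalar <= 0x10FFFF) {                  // beyond the BMP: surrogate pair
        uint32_t v = scalar - 0x10000;         // 20 bits left to distribute
        out[0] = (uint16_t)(0xD800 | (v >> 10));    // high surrogate, top 10 bits
        out[1] = (uint16_t)(0xDC00 | (v & 0x3FF));  // low surrogate, bottom 10 bits
        return 2;
    }
    return 0;
}
```

With this, U+0301 comes out as the single code unit $0301, while an emoji such as U+1F600 comes out as the pair $D83D $DE00: one scalar value, two code units.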

Conclusion

If Apple can’t get its “Characters”, UTF-16 code units, and bytes of a seemingly simple string such as “café” straight in a blog post designed to show these very views of that string, what hope could you possibly have of getting “character”-wise text processing right?

Treat text as a media flow, by only using string processing primitives without ever directly caring about the individual constituents of these strings.

Thank you, Mr. Siracusa

Today, I learned that John Siracusa had retired from his role of writing the review of each new Mac OS X release for Ars Technica. Well, review is not quite the right word: as I’ve previously written when I had the audacity to review one of his reviews, what are ostensibly articles reviewing Mac OS X are, to my mind, better thought of as book-length essays that aim to popularize the progress made in each release of Mac OS X. They will be missed.

It would be hard for me to overstate the influence that John Siracusa’s “reviews” have had on my understanding of Mac OS X and on my writing; you only have to see the various references to John or his reviews I made over the years on this blog (including this bit…). In fact, the very existence of this blog was inspired in part by John: when I wrote him with some additional information in reaction to his Mac OS X Snow Leopard review, he concluded his answer with:

You should actually massage your whole email into a blog post [of] your own.  I’d definitely tweet a link to it! :)

to which my reaction was:

Blog? Which blog? On the other hand, it’d be a good way to start one
Hmm

Merely 4 months later, for this reason and others, this blog started (I finally managed to drop the information alluded to in 2012; still waiting for that tweet ;) ).

And I’ll add that his podcasting output may dwarf his blogging in volume, but, besides the fact I don’t listen to podcasts much, I don’t think they really compare, mostly because podcasts lack the reference aspect of his Mac OS X masterpieces due to the inherent limitations of podcasts (not indexed, hard to link to a specific part, not possible to listen in every context, etc.). But, ultimately, it was his call; as someone, if I remember well, commented on the video of this (the actual video has since gone the way of the dodo): “Dear John, no pressure. Love, the Internet”. Let us not mourn, but rather celebrate, from the Mac OS X developer preview write-ups to the Mac OS X 10.10 Yosemite review, the magnum opus he brought to the world. Thank you, Mr. Siracusa.

Unconventional iOS app extension idea: internal thumbnail generator

The arrival (along with similar features) of extensions in iOS 8, even if it does not solve all problems with the platform’s inclusiveness, represents a sea change in what is possible for third-party developers with iOS, enabling many previously unviable apps such as Transmit iOS. But, even with the ostensibly specific scenarios (document provider extensions, share extensions, etc.) app extensions are allowed to hook themselves to, I feel we have only barely begun to realize the potential of extensions. Today I would like to present a less expected problem extensions could solve: fail-safe thumbnail generation.

The problem is one we encountered back in the day when developing CineXPlayer. I describe the use case in a radar (rdar://problem/9115604), but the tl;dr version is we wanted to generate thumbnails for the videos the user loaded in the app, and were afraid of crashing at launch as a result of doing this processing (likely resulting in the user leaving a one-star “review”), so we wanted to do so in a separate process to preserve the app, but the sandbox on iOS does not allow it.

But now in iOS 8 there may be a way to use extensions to get the same result. Remember that extensions are run in their own process, separate from both the host app process and the containing app process; so the idea would be to embed an action extension for a custom type of content that in practice only our app provides, expose the videos loaded in the app to extensions under that type, and use the ability of action extensions to send content back to the host to return the generated thumbnail; if our code crashes while generating the thumbnail, we only lose the extension process, and the app remains fine.

This would not be ideal, of course, as the user would have to perform an explicit action on each and every file (I haven’t checked to see whether there would be sneaky ways to process all files with one extension invocation), but I think it would be worth trying if I were still working on CineXPlayer; and if after deployment Apple eventually wises up to it, well, I would answer them that it’s only up to them to provide better ways to solve this issue.

MPW on Mac OS X

From Steven Troughton-Smith (via both Michael Tsai and John Gruber) comes the news of an MPW compatibility layer project and how to use it to build code targeting Classic Mac OS and even Carbonized code from a Mac OS X host, including Yosemite (10.10). This is quite clever, and awesome news, as doing so was becoming more and more complicated, and in practice required keeping one or more old Macs around.

Back in the days of Mac OS X 10.2-10.4, I toyed with backporting some of my programming projects, originally developed in Carbon with Project Builder, to MacOS 9, and downloaded MPW (since it was free, and CodeWarrior was not) to do so. The Macintosh Programmer’s Workshop was Apple’s own development environment for developing Mac apps, tracing its lineage from the Lisa Programmer’s Workshop, which was originally the only way to develop Mac apps (yes, in 1984 you could not develop Mac software on the Mac itself). If I recall correctly, Apple originally had MPW for sale, before they made it free when it could no longer compete with CodeWarrior. You can still find elements from MPW in the form of a few tools in today’s Xcode — mostly Rez, DeRez, GetFileInfo and SetFile. As a result, I do have some advice when backporting code from Mac OS X to MacOS 9 (and possibly earlier, as Steven demonstrated).

First, you of course have to forget about Objective-C, forget about any modern Carbon (e.g. HIObject, though the Carbon Event Manager is OK), forget about Quartz (hello QuickDraw), forget about most of Unix, though if I recall correctly the C standard library included with MPW (whose name escapes me at the moment) does have some support beyond the standard C library, such as open(), read(), write() and close(). Don’t even think about preemptive threads (or at least, ones you would want to use). In fact, depending on how far back you want to go, you may not have support for what you would not even consider niceties, but were actually nicer than what came before; for instance, before Carbon, a Mac app would call WaitNextEvent() in a loop to sleep until the next event that needed processing, and then the app would have to manually dispatch it to the right target, including switching on the event type, performing hit testing, etc.: no callback-based event handling! But WaitNextEvent() itself did not appear until System 7, if I recall correctly, so if you want to target System 6 and earlier, you have to poll for events while remembering to yield processing time from time to time to drivers, to QuickTime (if you were using it), etc. The same way, if you want to target anything before MacOS 8 you cannot use Navigation Services and instead have to get yourself acquainted with the Standard File Package… FSRefs are not usable before MacOS 9, as another example.

When running in MacOS 9 and earlier, the responsibilities of your code also considerably increase. For instance, you have to be mindful of your memory usage much more than you would have to in Mac OS X, as even when running with virtual memory in MacOS 9 (something many users disabled anyway) your application only has access to a small slice of address space called the memory partition of the application (specified in the 'SIZE' resource and that the user can change): there is only one address space in the system which is partitioned between the running apps; as a result memory fragmentation becomes a much more pressing concern, requiring in practice the use of movable memory blocks and a number of assorted things (move high, locking the block, preallocating master pointers, etc.). Another example is that you must be careful to leave processor time for background apps, even if you are a fullscreen game: otherwise, for instance if iTunes is playing music in the background, it will keep playing (thanks to a trick known as “interrupt time”)… until the end of the track, and become silent from then on. Oh, and did I mention that (at least before Carbon and the Carbon Event Manager) menu handling runs in a closed event handling loop (speaking of interrupt time) that does not yield any processing time to your other tasks? Fun times.

Also, depending again on how far back you want to go, you might have difficulty using the same code in MacOS 9 and Mac OS X, even with Carbon and CarbonLib (the backport of most of the Carbon APIs to MacOS 9 as a library, in order to support the same binary and even the same slice running on both MacOS 9 and Mac OS X). For instance, if you use FSSpec instead of FSRef in order to run on MacOS 8, your app will have issues on Mac OS X with file names longer than were possible on MacOS 9; they are not fatal, but will cause your app to report the file name as something like Thisisaverylongfilena#17678A… not very user-friendly. And the Standard File Package is not supported at all in Carbon, so you will have to split your code at compile time (so that the references to the Standard File Package are not even present when compiling for Carbon) and diverge at runtime so that when running in System 7 the app uses the Standard File Package, and when running in MacOS 8 and later it uses Navigation Services, plus the assorted packaging headaches (e.g. using a solution like FatCarbon to have two slices, one ppc that links to InterfaceLib, the pre-Carbon system library, linking weakly to the Navigation Services symbols, and one ppc that links to CarbonLib and only runs on Mac OS X).

You think I’m done? Of course not, don’t be silly. The runtime environment in MacOS 9 is in general less conducive to development than that of Mac OS X: the lack of memory protection not only means that, when your app crashes, it is safer to just reboot the Mac since it may have corrupted the other applications, but also means you typically do not even know when your code, say, follows a NULL pointer, since that action typically doesn’t fault. Cooperative multitasking also means that a hang from your app hangs the whole Mac (only the pointer is still moving), though that can normally be solved by a good command-alt-escape… after which it’s best to reboot anyway. As for MacsBug, your friendly neighborhood debugger… well, for one, it is disassembly only, no source. But you can handle that, right?

It’s not that bad!

But don’t let these things discourage you from toying with Classic MacOS development! Indeed, doing so is not as bad as you could imagine from the preceding descriptions: none of those things matter when programming trivial, for-fun stuff, and even if you program slightly-less-than-trivial stuff, your app will merely require a 128 MB memory partition where it ought to only take 32 MB, which doesn’t matter in this day and age.

And in fact, it is a very interesting exercise because it allows a better understanding of what makes the Macintosh the Macintosh, by seeing how it was originally programmed for. So I encourage you all to try and play with it.

For this, I do have some specific advice about MPW. For one, I remember MrC, the PowerPC compiler, being quite anal-retentive about certain casts, which it just refuses to do implicitly: for instance, the following code will cause an error (not just a warning):

SInt16** sndHand;
sndHand = NewHandle(sampleNb * sizeof(SInt16));

You need to explicitly cast:

SInt16** sndHand;
sndHand = (SInt16**)NewHandle(sampleNb * sizeof(SInt16));

It is less demanding when it comes to simple casts between pointers. Also, even though it makes exactly no difference in PowerPC code, it will check that functions that are supposed to have a pascal attribute (supposed to mark the function as being called using the calling conventions for Pascal, which makes a difference in 68k code), typically callbacks, do have it, and will refuse to compile if this is not the case.

If you go as far back as 68k, if I remember correctly int is 16 bits wide in the Mac 68k environment (this is why SInt32 was long up until 64-bit arrived: in __LP64__ mode SInt32 is int), but became 32 bits wide when ppc arrived, so be careful: it’s better not to use int in general.

QuickDraw is, in some respects, more approachable than Quartz (e.g. no object to keep track of and deallocate at the end), but on the other hand the Carbon transition added some hoops to jump through that make it harder to just get started with it; for instance something as basic as getting the black pattern, used to ensure your drawing is a flat color, is described in most docs as using the black global variable, but those docs should have been changed for Carbon: with Carbon GetQDGlobalsBlack(&blackPat); must be used to merely get that value. Another aspect which complicates initial understanding is that pre-Carbon you would just directly cast between a WindowPtr, (C)GrafPtr, offscreen GWorldPtr, etc., but when compiling for Carbon you have to use conversion functions, for instance GetWindowPort() to get the port for a given window… but only for some of those conversions, the others just being done with casts, and it is hard to know at a glance which are which.

When it came to packaging, I think I got an app building for classic MacOS relatively easily with MPW, but when I made it link to CarbonLib I got various issues related to the standard C library, in particular the standard streams (stdin, stdout and stderr), and I think I had to download an updated version of some library or some headers before it would work and I could get a single binary that ran both in MacOS 9 and natively on Mac OS X.

Also, while an empty 'carb' resource with ID 0 does work to mark the application as being carbonized and make it run natively on Mac OS X, you are supposed to instead use a 'plst' resource with ID 0 and put in there what you would put in the Info.plist if the app were in a package. Also, it is not safe to use __i386__ to know whether to use framework includes (#include <Carbon/Carbon.h>) or “flat” includes (#include <Carbon.h>); typically you’d use something like WATEVER_USE_FRAMEWORK_INCLUDES, which you then set in your Makefile depending on the target.

Lastly, don’t make the same mistake I originally did: when an API asks for a Handle, it doesn’t just mean a pointer to pointer to something, it means something that was specifically allocated with NewHandle() (possibly indirectly, e.g. with GetResource() and loaded if necessary), so make sure that is what you give it.

I also have a few practical tips for dealing with Macs running ancient system software (be they physical or emulated). Mac OS X removed support for writing to an HFS (as opposed to HFS+) filesystem starting with Mac OS X 10.6, and HFS is the only thing MacOS 8 and earlier can read. However, you can still for instance write pre-made HFS disk images to floppy disks with Disk Utility (and any emulator worth its salt will allow you to mount disk images inside the emulated system), so your best bet is to use a pre-made image to load some essential tools, then if you can, set up a network connection (either real or emulated) and transfer files that way, making sure to encode them in MacBinary before transfer (which I generally prefer to BinHex); unless you know the transfer method is Mac-friendly the whole way, always decode from MacBinary as the last step, directly from the target. Alternatively, you can keep a Mac running Leopard around to directly write to HFS floppies, as I do.

Okay, exercise time.

If you are cheap, you could get away with only providing a 68k build and a Mac OS X Intel build (except neither of these can run on Leopard running on PowerPC…). So the exercise is to, on the contrary, successfully build the same code (modulo #ifdefs, etc.) for 68k, CFM-PPC linking to InterfaceLib, CFM-PPC linking to CarbonLib, Mach-o Intel, Mach-o 64-bit PPC, and Mach-o 64-bit Intel (a Cocoa UI will be tolerated for those two) for optimal performance everywhere (ARM being excluded here, obviously). Bonus points for Mach-o PPC (all three variants) and CFM-68k. More bonus points for gathering all or at least most of those in a single obese package.

Second exercise: figure out the APIs which were present in System 1.0 and are supported in 64-bit on Mac OS X. It’s a short list, but I know for sure it is not empty.

References

Macintosh C Carbon: besides the old Inside Mac books (most of which can still be found here), this is how I learned Carbon programming back in the day.

Gwynne Raskind presents the Mac toolbox for contemporary audiences in two twin articles, reminding you in particular never to neglect error handling: you can’t get away with it when using the toolbox APIs.

And on a lighter note about Swift…

One more thought on the matter of Swift, which wasn’t suitable for my previous post (and is too long for Twitter)

2014 – Chris Lattner invents Swift. Swift is an admittedly relatively concise, automatically reference counted, but otherwise class based, statically typed, single dispatch, object oriented language with single implementation inheritance and multiple interface inheritance. Apple loudly heralds Swift’s novelty.

(With apologies to James Iry, and to William Ting, who beat me to it except he mischaracterized Swift as being garbage collected.)

Swift Thoughts

Here are my thoughts on Swift, the new application programming language Apple announced at WWDC 2014, based on my reading of The Swift Programming Language (iTunes link, iOS Developer Library version), with a few experiments (you can get my code if you want to reproduce them), all run on the release version of Xcode 6, to clarify behavior that was unclear from the book description: my thoughts are entirely based on the language semantics and the consequences they impose on any implementation, and will hopefully remain valid whichever the implementation, they are not based on any aspect specific to the current implementation (such as how, say, protocols and passing objects supporting multiple protocols is implemented, though that would be interesting too). These thoughts do not come in any particular order: this post is something of an NSSet of my impressions.

First:

on the book itself, I have to mention numerous widows, that is the first line of a paragraph, or even sometimes a section header, appearing at the end of a page with the remainder of the paragraph on the next page (e.g. : “Use the for-in loop with an array to iterate over its items” at the end of a page about more traditional for loops, “variadic parameters”, etc.). If they’re going to publish it on the iBookstore, they ought to watch for that kind of stuff (and yes, even if it is not a static layout as the text is able to reflow, when for instance the text size is changed, there are ways to guard against this happening).

The meta-problem with Swift:

the Apple developer community had all of about three months (from WWDC 2014 to the language GM) to give feedback on Swift. And while I do believe that Swift has been refined internally for much longer than that, I cannot help but notice the number of fundamental changes in Swift from June to August 2014 (documented forever in the document revision history), with for instance Array changing to have full value semantics, or the changes to the String (and Character) type. This is not so much the biggest problem with Swift as one that compounds the other issues found in Swift: if a design issue in Swift only became clear from feedback from the larger Apple developer community, and the feedback came too late or there was no time to fix it in the three (northern hemisphere summer) months, then too bad, it is now part of the language. I think there could have been better ways to handle this.

I might have to temper that a bit, though: even though Apple is allowing and encouraging you to use Swift in shipping apps, it appears that they are reserving the possibility to break source compatibility (something I admit I did not realize at first, hat tip to, who else, John Siracusa). But I wonder whether Apple will be able to actually exercise that possibility in the future: even in the pessimistic case where Swift only becomes modestly popular at first, there will be significant pushback against such an incompatible change happening, even if Apple provides conversion tools. We’ll see.

The (only) very bad idea:

block comment markers that supposedly nest, so that they can also serve to disable code. For, you see, what is inside the block comment markers is in all likelihood not going to be parsed as code (and this is, in fact, the behavior as I write this post, as of the Xcode 6 release), therefore nested comment markers are simply searched for in the raw text, resulting in the following not working:

/*
println("The basic operators are +-*/%");
*/

The only alternative is to parse text after the block comment start marker as code or at least as tokens… in which case guess what would happen in the following case:

/*
And here I’d like to thank my parents for introducing me to
computers at an early age ":-)
*/

Nested block comments do not work. They cannot be made to work (for those who care, I filed this as rdar://problem/18138958/, visible on Open Radar; it was closed with status “Behaves correctly”). That is why the inside of an #if 0 / #endif pair in C must still be composed of valid preprocessing tokens. “Commenting out” code is a worthy technique, but it should never have been given that name. Instead, in Swift disable code by using #if false / #endif, which is supported but oddly enough only documented in Using Swift and Cocoa with Objective-C.
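To see the failure mechanically, here is a sketch in C of what I assume a nesting-aware comment skipper does: count opening and closing markers in the raw characters, with no tokenizing. The function nested_comment_end is hypothetical and mine; I have not looked at how the Swift compiler actually implements this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical scanner: text must start with a block comment opener.
// Returns the offset just past the marker that closes that comment when
// markers are allowed to nest, or -1 if the comment is never closed.
// It looks at raw characters only: string literals get no special treatment.
long nested_comment_end(const char *text)
{
    assert(text != NULL);
    int depth = 0;
    size_t i = 0;
    while (text[i] != '\0') {
        if (text[i] == '/' && text[i + 1] == '*') {
            depth++;                 // one more level of nesting
            i += 2;
        } else if (text[i] == '*' && text[i + 1] == '/') {
            depth--;                 // one level closed
            i += 2;
            if (depth <= 0)
                return (long)i;      // the outermost comment ends here
        } else {
            i++;
        }
    }
    return -1;                       // ran off the end while still inside
}
```

Fed the println example above, this scanner reports that the comment ends right after the closing marker it finds inside the string literal, leaving the rest of that line and the final marker to be compiled as code. The alternative, tokenizing the contents so that string literals are respected, is what breaks on the lone double quote of the second example.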

I don’t like:

the fact that many elements from C have not been challenged. Since programmers coming from C will have many of their habits challenged and will have to unlearn what they have learned anyway, why keep anything from C without justification? For instance, Swift has break; to exit from looping constructs AND to exit from a switch block (even though a switch in Swift is so much more than a switch in C as to be almost a different thing), which forces us to label the looping construct just in order to use a switch as the condition system to exit the loop:

var n=27;

topWhile: while (true)
{
    switch (n)
    {
    case 1:
        break topWhile;
        
    case let foo where foo%2 == 0:
        n = n/2;
        
    default:
        n = n*3 + 1;
    }
}

println(n);

If exiting from a switch had been given a different keyword, uselessly labeling the loop in this case would have been avoided.
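For comparison, here is the same loop sketched in C, where this double duty of break comes from. C has no labeled break at all, so the usual escapes are a flag variable or, as here, a goto. The function name collatz_steps and the step counter are my additions for the sake of having something to return.

```c
#include <assert.h>

// Count how many Collatz steps it takes to get from n down to 1 (n >= 1).
unsigned collatz_steps(unsigned long n)
{
    assert(n >= 1);
    unsigned steps = 0;
    for (;;) {
        switch (n) {
        case 1:
            goto finished;      // a plain break would only leave the switch
        default:
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            steps++;
            break;              // leaves the switch, then the loop goes around
        }
    }
finished:
    return steps;
}
```

Starting from 27, as in the Swift code, this takes 111 steps to reach 1.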

I like:

Avoiding the most egregious C flaws. In my opinion, C has a number of flaws that its designers should have avoided even given the stated goals and purposes C was originally meant for. There are many further flaws in C, but many of those make sense as tradeoffs given what the designers of C were aiming for (e.g. programmers were expected to keep track of everything); the following flaws, on the other hand, don’t. Those are: the dependency import model which is simply a textual include (precluding many optimizations to compilation time and harming diagnostics), no (mandatory) keyword to introduce variable declarations (such as let, var, etc. in Swift) which hurts compilation time (given that the compiler has to figure out which tokens are valid types before it can determine whether a statement is a variable declaration or an expression), aliasing rules which are both too restrictive for the compiler (two arrays of the same type may always alias each other, preventing many optimizations; no one uses restrict in practice, and even fewer people could tell you the precise semantics of that keyword) and too restrictive for the developer (who is not supposed to write to a pointer to UInt32 and read from the same pointer as pointing to float). A further flaw becomes glaring if we further consider C as a language for implementing only bit-twiddling and real-time sub-components called from a different higher level language: the lack of any mechanism for tracking scope (initialization, copies, deletion) of heap-bound variables: those are simply handled in C as byte array blocks which get interpreted as the intended structure type by cast; this is what prevents pointers to Objective-C objects from being stored in C structures in ARC mode, for instance. This is one thing that C++ got right and why Objective-C++ can be a worthwhile way to integrate bit-twiddling and real-time code with Objective-C code. Swift, thankfully, avoids all of these flaws, and many others.
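As an illustration of the developer side of the aliasing complaint, here is a sketch of the way you are supposed to read a float’s bits as an integer in C: copy the bytes instead of reinterpreting the pointer. The function name float_bits is mine, and the bit patterns mentioned afterwards assume IEEE 754 single precision, which is what essentially every current platform uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Return the bit pattern of a float as a 32-bit unsigned integer.
// Writing *(uint32_t *)&value instead would read a float object through
// a pointer to an incompatible type, which the aliasing rules forbid.
uint32_t float_bits(float value)
{
    uint32_t bits;
    assert(sizeof bits == sizeof value);   // both must be 4 bytes wide
    memcpy(&bits, &value, sizeof bits);    // byte copy: always well defined
    return bits;
}
```

Under that IEEE 754 assumption, float_bits(1.0f) is $3F800000. Compilers typically turn the memcpy into a plain register move, so the detour costs nothing; it is just one more thing you have to know.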

I don’t like:

the method call binding model. Right after watching the keynote, in reaction to the proclamation that Swift uses the same runtime as Objective-C I remarked this had to mean that the messaging semantics had to be the same; I meant it to rule out already the possibility of Swift being even more dynamic than Objective-C. Little did I know that not only are Swift method calls not more dynamic than Objective-C method calls, but they in fact don’t use objc_msgSend() at all by default! Look, objc_msgSend() (and friends) is the whole point of the Objective-C runtime. Period. Everything else is bookkeeping in support of objc_msgSend(). Swift can call into objc_msgSend() when calling Objective-C methods and Swift methods marked @objc. But using this to proclaim that Swift “uses the same runtime as Objective-C” amounts to saying that Python uses the same runtime as Objective-C because of the Python-Cocoa bridge and NSObject-derived Python objects. Apple is trying to convince us of the Objective-C-minus-the-C-part lineage of Swift, but the truth is that Swift has very little to do with that, and much more to do, semantically, with C++. This would never have happened had Avie Tevanian still been working at Apple.

My theory as for why Swift works that way is as follows. On the one hand, the people in charge probably think that vtables are dynamic enough, and on the other hand, they may have decided that way first in order to enable Swift to be used in (almost — Swift looks unsuitable for code running at interrupt time) all the places C can be used, including in very demanding, real-time environments such as low-latency audio, drivers, and all the dependencies of these two cases (though for these cases any allocation will have to be avoided, which means not bringing any object or any non-trivial or non-built-in structure in scope); and second in order to allow more optimization opportunities. Indeed, the whole principle of the Smalltalk model that ObjC inherited is that method calls are never bound to the implementation until exactly at the last possible time: right as the method is called, and almost all of the source information is still available at that point for the runtime to decide the binding, in particular the full name of the method in ASCII and parameter metadata (allowing such feats as forwarding, packaging the call in an invocation object, but also method swizzling, isa swizzling, etc.). Meanwhile, with LLVM and Clang Apple has an impressive compilation infrastructure that can realize potentially very useful optimizations, particularly across procedure calls (propagating constants, suppressing useless parameters, hoisting invariants out of loops, etc.). But these interprocedural optimizations cannot occur across Objective-C method calls: the compiler cannot make any assumption about the binding between the call site and the implementation (even when it ends up at run time that the same implementation is always called), which is necessary before the compiler can perform any optimization across the call site.

The problem here may be not so much the cost of objc_msgSend() itself (which can indeed often be reduced for a limited number of hot call sites by careful application of IMP caching) as the diffuse cost of the unexploited optimization opportunities across every single ObjC method call, especially if most or all subroutine calls end up being Objective-C method calls. And the combination of the two has likely prevented Objective-C from being significantly used for the implementation of complex infrastructural code where some dynamism is required (and some resistance to reverse-engineering may be welcome…), such as HTML rendering engines, database engines, game engines, media playback and processing engines, etc., where C++ reigns unchallenged. With Swift, Apple has a language that can reasonably be used for the whole infrastructural part of any application down to the most real-time and performance-sensitive tasks you could reasonably want to perform on a general-purpose computer or server, not just (as is currently mostly the case with Objective-C) for the MVC organization at the top, with anything below model objects not necessarily being written in the same language as the high-level MVC code.

One way Apple could have had both Smalltalk-style dynamism and optimization across method calls (including the cost itself of binding) would have been to use a virtual machine and use incremental, dynamic optimization techniques, such as those developed for JavaScript in Safari, but Apple decided against it; probably for better integration with existing C code and the Cocoa frameworks, but also maybe because of the reputation of virtual machines for inferior performance. In Smalltalk, precisely, the virtual machine was allowed to inline and in general apply optimizations to (a<b) ifThen: foo else: toto (yes, flow control in Smalltalk was implemented in terms of messages to an object); in Objective-C, the compiler cannot do the equivalent, and such an optimization cannot happen at runtime either given that the program is already frozen as machine code. It is also worth mentioning that the virtual machine approach, while allowing a combination of late binding and whole program optimizations, would not have enabled Swift to both have Smalltalk messaging semantics and be suitable for real-time code: the Smalltalk and Objective-C messaging model is basically lazy binding, and laziness is fundamentally incompatible with real-time.

I like:

the transaction-like aspect of tying variables (typically constant ones) with control flow constructs. Very few variables actually do need to vary; most of them are either calculation intermediates, or fixtures which are computed once and then keep the same value for as long as they are valid. And when such a fixture is necessary in a scope, it is for a reason almost always tied to the control flow construct that introduces the scope itself: dereferencing a pointer depends on a prior if statement, for instance. In the same way, I like the system (and the switch-case variable tying system that results) that allows tying a dependent data structure to enum values, though making that (at least syntactically) an extension of an enumerated type feels odd to me; I rather consider such a thing a tagged union. In fact, I think they should have gone further, and allowed tying a new variable to the current value of the loop induction variable in case of a break, rather than allow access to the loop induction variable outside the loop by declaring it before the loop.
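To make the point concrete, here is a minimal sketch of both kinds of tying, written in present-day Swift syntax rather than the 2014 syntax used elsewhere in this post. The names LookupResult, lookup and describe are made up for illustration; they are not from the book.

```swift
// A tagged union: each case carries its own dependent data.
enum LookupResult {
    case found(value: Int)
    case missing(reason: String)
}

// Hypothetical helper, for illustration only.
func lookup(_ key: String, in table: [String: Int]) -> LookupResult {
    // `value` only exists inside the branch where the lookup succeeded.
    if let value = table[key] {
        return .found(value: value)
    }
    return .missing(reason: "no entry for \(key)")
}

func describe(_ result: LookupResult) -> String {
    // Each case ties a constant to the data that is only valid for that case.
    switch result {
    case .found(let value):
        return "found \(value)"
    case .missing(let reason):
        return "missing: \(reason)"
    }
}

let table = ["a": 1]
let hit = describe(lookup("a", in: table))
let miss = describe(lookup("b", in: table))
print(hit)
print(miss)
```

Neither value nor reason can be reached outside the branch that makes it meaningful, which is the whole appeal.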

I don’t like:

the kitchen-sink-like aspect, which reminds me a bit too much of C++. This may be the flip side of the previous point, but nevertheless: do we need an exceedingly versatile, “unified” function declaration syntax? Not to mention we are never clearly told in the book which functions are considered to have the same identifier and will collide if used in the same program; this is not an implementation detail, code will break if two functions which did not collide start doing so with a newer version of the Swift compiler. By contrast, Objective-C, even with the recent additions such as number, array and dictionary literals, is a simple language, defining only what it needs to define.

I don’t like:

the pretense at being a script-like language when actually compiling down to native code. Since Swift compiles down to native code, this means it inherits the linking model of languages that compile to native code, but in order to claim “approachable scripting language” brownie points, Swift makes top level code the entry point of the process… that is, as long as you write that code in a file called “main.swift” (top level code is otherwise forbidden). Sure, “you don’t need a main function”, but if (unless you are working in a playground) you need to name the file containing the main code “main.swift”, what has been gained is unclear to me.

I have reservations on:

the optional semicolon. I was afraid it would be of the form “semicolons are inserted at the points where leaving it out would end up causing an error”, but it is more subtle than that, avoiding the most obvious pitfalls thanks to a few other rules. Indeed, Swift governs where whitespace can go around operators more strictly than C and other mainstream languages do: in principle (there are exceptions), whitespace is not allowed after prefix and before postfix operators, and infix operators can either have whitespace on both sides, or whitespace on neither side; no mix is allowed. As a result, this code:

infix operator *~* {}
func *~* (left: Int, right:Int) -> Int
{
    return left*right;
}

postfix operator *~* {}
postfix func *~* (val: Int) -> Int
{
    return val+42;
}

var bar = 4, foo = 2;
var toto = 0;

toto = bar*~*
foo++;

foo

will result in this execution:

But add one space before the operator, and what happens?

So the outcome here is unambiguous thanks to these operators and whitespace rules, the worst has been avoided. That being said, I remain very skeptical of the optional semicolon feature, to my mind it’s just not necessary while bringing the risk of subtle pitfalls (of which I admit I have not found any so far). Also, I admit my objection is in part because it encourages (in particular with the simplified closure-as-last-function-parameter syntax) the “Egyptian” braces style, which I simply do not like.
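For reference, here is a sketch of the same experiment in present-day Swift syntax, under the assumption that the whitespace rules described above still apply as stated (the operator declarations no longer take braces, and ++ is gone, so this is an adaptation and not the code shown above). The constant names asPostfix and asInfix are mine.

```swift
// Declare the same symbol both as an infix and as a postfix operator.
infix operator *~*
postfix operator *~*

func *~* (left: Int, right: Int) -> Int {
    return left * right
}

postfix func *~* (value: Int) -> Int {
    return value + 42
}

let bar = 4
let foo = 2

// No whitespace on the left, a line break on the right: parsed as postfix.
let asPostfix = bar*~*

// Whitespace on both sides (the line break counts): parsed as infix,
// so the expression continues on the next line.
let asInfix = bar *~*
    foo

print(asPostfix, asInfix)
```

One space is all that separates the two readings, which is exactly why I remain wary even though the rules themselves are unambiguous.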

I have big reservations on:

custom operator definition. Swift does not just have operator overloading, where one can declare a function that will be called when one uses an operator such as * with at least one variable of a type of one’s creation, say so that mat1 * mat2 actually performs matrix multiplication; Swift also allows one to define custom operators using unused combination of operator symbols, such as *~*. And I don’t really see the point. Indeed, operator overloading in the first place only really makes sense when one needs to perform calculations on types that are algebraic in nature: matrices, polynomials, Complex or Hamiltonian numbers, etc., where it allows the code to be naturally and concisely expressed as mathematical expressions, rather than having to use a function call for every single product or addition; outside of this situation, the potential for confusion and abuse is just too great for operator overloading to make sense. So custom operators would only really make sense in situations when one operates within an algebraic system but with operations that can not be assimilated to addition and multiplication; while I am certain such situations exist (I can’t think of any off the top of my head), this strikes me as extremely specialized tasks that could be implemented in a specialized language, where they would be better served anyway. So the benefit of custom operators is very limited, while the potential cost in abuse and other drawbacks (such as the compiler reporting an unknown operator rather than a syntax error when it meets a nonsensical combination of operators due to a typo) is much greater, so I have big reservations about the custom operators feature of Swift.
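As an illustration of the one case where I find operator overloading justified, here is a minimal sketch of an algebraic type in present-day Swift syntax. The Complex type below is hypothetical and deliberately bare-bones; it is not a library type.

```swift
// A deliberately small complex number type, for illustration only.
struct Complex: Equatable {
    var re: Double
    var im: Double

    static func + (lhs: Complex, rhs: Complex) -> Complex {
        return Complex(re: lhs.re + rhs.re, im: lhs.im + rhs.im)
    }

    static func * (lhs: Complex, rhs: Complex) -> Complex {
        return Complex(re: lhs.re * rhs.re - lhs.im * rhs.im,
                       im: lhs.re * rhs.im + lhs.im * rhs.re)
    }
}

let i = Complex(re: 0, im: 1)
let one = Complex(re: 1, im: 0)

// Reads like the mathematical expression it stands for: i × i + 1.
let z = i * i + one
print(z)
```

Here the existing + and * symbols carry their usual algebraic meaning, so nothing new has to be invented; that is the situation custom operators would have to improve upon.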

I like:

the relatively strict typing (including for widening integer types) and the accompanying type inference. I think C’s typing is too loose for today’s programming tasks, so I welcome the discipline found in Swift (especially with regard to optional types). It does make it necessary to introduce quite a bit of infrastructure, such as generics and tagged unions (mistakenly labeled as enumerations with associated values), but those make the programmer intentions clearer. And Swift allows looser typing when it matters: with class instances and the AnyObject type, such as when doing UI work, where Swift does keep a strength of Objective-C.
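A short sketch of what that discipline looks like in practice, in present-day Swift syntax; the constant names are mine and the values are arbitrary.

```swift
let small: Int32 = 1_000
let big: Int64 = 5_000_000_000

// let sum = small + big       // does not compile: Int32 and Int64 do not mix
let sum = Int64(small) + big   // even widening has to be spelled out

// Narrowing can fail, and the failable initializer makes that explicit.
let narrowed = Int32(exactly: big)   // nil, since the value does not fit in 32 bits

// Type inference: no annotation needed, the literal makes this a Double.
let inferred = 3.5

print(sum, narrowed as Any, type(of: inferred))
```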

I have reservations on:

string interpolation. It’s quite clever, as far as I can tell being syntactically unambiguous (a closing paren is unambiguously one terminating the expression or not simply by counting parens), however I am wondering if such a major feature is warranted if the usefulness is limited to debugging purposes, as indeed for any other purpose the string will need to be localized, which as far as I can tell precludes the use of this feature.

I am very intrigued about:

the full power of switch. I have a feeling it may be going a bit too far in completeness, but the whole principle of having more complex selection and having the first criterion that applies in case two overlap will allow much more natural expression of complex requirements requiring classification of a situation according to a criterion for each case, but where later criteria must not be applied if one applies already.
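Here is a minimal sketch of the kind of classification I have in mind, in present-day Swift syntax, with deliberately overlapping cases. The classify function and its categories are invented for the example.

```swift
// Hypothetical classifier: several cases overlap, the first match wins.
func classify(_ point: (x: Int, y: Int)) -> String {
    switch point {
    case (0, 0):
        return "origin"
    case (_, 0):
        return "on the x axis"
    case (0, _):
        return "on the y axis"
    case (-10...10, -10...10):
        return "near the origin"
    case let (x, y) where x == y:
        return "on the diagonal"
    default:
        return "elsewhere"
    }
}

print(classify((0, 0)))     // also on both axes, but "origin" comes first
print(classify((5, 5)))     // on the diagonal too, but "near the origin" comes first
print(classify((20, 20)))
```

Expressing the same thing with if/else chains would require each later test to repeat the negation of the earlier ones.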

I have reservations on:

tuple variables and individual element access (other than through decomposition). If you need a tuple enough that you need to keep it in a variable, then you should define a structure instead; same goes for individual element access. Tuple constants might be useful; other than that, tuple types should only be transitorily used as function return and parameters (in case you want to directly use a returned tuple as a parameter for that function), and should have to be composed and decomposed (including when accessing them inside a function that has a tuple parameter) for any other use.

I have reservations on:

tuple type conversions. This is one place where Swift actually does use duck typing, but with subtle rules that can trip you up, let us see what happens when we put this code:

func tupleuser(toto : (min: Int, max: Int)) -> Int
{
    return toto.max - toto.min;
}

func tupleprovider(a :Int, b: Int) -> (max: Int, min: Int)
{
    return (a - b/2 + b, a - b/2);
}

func filter(item: (Int, Int)) -> (Int, Int)
{
    return item;
}

func filter2(item: (min: Int, max: Int)) -> (min: Int, max: Int)
{
    return item;
}


tupleuser(filter2(tupleprovider(100, 9)));

// I tried to use a generic function instead of "filter2", but
// the compiler complained with "Cannot convert the expression's type
// '(max: Int, min: Int)' to type ’T’", it seems that when the
// parameter type and the expected return type disagree, the Swift
// compiler would rather not infer at all.

in a playground:

The code above in a playground, with in the playground margin 105 and 96 being inverted between tupleprovider and filter2, and the final result being 9

But then let us change the intermediate function:

Same code as above in a playground, except filter2 has been replaced by filter in the last line, and as a result 105 and 96 are no longer inverted between tupleprovider and filter, and the final result is -9

Uh?! That’s right: when a tuple value gets passed between two tuple types (here, from function result to function parameter) where at least one of the tuple types has unnamed fields, then tuple fields keep their position. However, when both tuple types have named fields, then tuple fields are matched by name (the names of course have to match) and can change position! Something to keep in mind, at the very least.

I like:

closures, class extensions. Of course they have to be in.

I have reservations on:

all the possible syntax simplifications for anonymous closures. In particular, the possibility of putting the closure passed as the last parameter to a function outside that function’s parentheses is a bit misleading about whether that code is part or not of the caller of that function, so programmers may make the mistake of putting a return in the closure expecting to exit from the caller function, while this will only exit from the closure.
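A minimal sketch of that pitfall, in present-day Swift syntax; the function name countUntilNegative is made up, and deliberately misleading, to show what a programmer might expect.

```swift
// Looks like it stops at the first negative value; it does not.
func countUntilNegative(_ values: [Int]) -> Int {
    var visited = 0
    values.forEach { value in
        visited += 1
        if value < 0 {
            return   // leaves the closure only: forEach moves on to the next element
        }
    }
    return visited
}

let total = countUntilNegative([1, -2, 3, 4])
print(total)
```

With the closure sitting outside the parentheses, the body reads just like the body of a for loop, where the same return would have exited the function.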

I have reservations on:

structure and enumeration methods. Structure methods are already a superfluous feature taken from C++, but enumeration methods just take the cake. What reasonable purpose could this serve? Is it so hard to write TypeDoStuff(value) rather than value.doStuff()? Because remember, inheritance is only for classes, so there is no purpose for non-class methods other than the use of the method invocation syntax.

I have big reservations on:

the Character type. I am resolutely of the opinion (informed by having seen way too many permutations of issues that appear when leaving the comfortable world of ASCII) that ordinary programmers should never concern themselves with the elementary constituents of a string. Never. When manipulating sound, do you ever consider it a sequence of phonemes or notes that can be manipulated individually? Of course not: you consider it a continuous flow; even when it needs to be processed as blocks or samples, you apply the same processing (maybe with time-dependent inputs, but the same processing nonetheless) to all of them. So the same way, strings and text should be processed as a media flow. Python has the right idea: there is no character type, merely very short strings when one does character-like processing, though I think Python does not go far enough. The only string primitives ordinary programmers should ever need are:

  • defining literal ASCII strings (typically for dictionary keys and debugging)
  • reading and writing strings from byte arrays with a specified encoding
  • printing the value of variables to a string, possibly under the control of a format and locale
  • attempting to interpret the contents of a string as an integer or floating-point number, possibly under the control of a format and locale
  • concatenating strings
  • hashing strings (with an implementation of hashing that takes into account the fact strings that only vary in character composition are considered equal and so must have equal hashes)
  • searching within a string with appropriate options (regular expression or not, case sensitive or not, anchored or not, etc.) and getting the first match (which may compare equal while not being the exact same Unicode sequence as the searched string), the part before that match, and the part after that match, or nothing if the search turned empty.
  • comparing strings for equality and sorting with appropriate options (similar to that of searching, plus specific options such as numeric sort, that is "1" < "2" < "100")
  • and for very specific purposes, a few text transformations: mostly convert to lowercase, convert to uppercase, and capitalize words.

That’s it. Every other operation ordinary programmers perform can be expressed as a combination of those (and provided as convenience functions): search and replace is simply searching, then either returning the input string if the search turned empty, or concatenating the part before the match, the replacement, and the result of a recursive search and replace on the part after the match; parsing is merely finding the next token (from a list) in the string, or advancing until the regular expression can no longer advance (e.g. stopping once the input is no longer a digit) and then further parsing or interpreting the separated parts; finding out whether a file has file extension “avi” in a case-insensitive way? Do a case-insensitive, anchored, reverse, locale-independent search for ".avi" in the file name string. Etc.
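As a rough sketch of how two of these conveniences fall out of the primitives, here is a version in present-day Swift syntax that leans on Foundation's range(of:options:) as the search primitive. The function names replaceAll and hasExtension are mine, and the choice of a case-insensitive search in replaceAll is just an assumption for the example.

```swift
import Foundation

// Search and replace built only from "search" and "concatenate".
func replaceAll(_ needle: String, with replacement: String, in text: String) -> String {
    guard let match = text.range(of: needle, options: [.caseInsensitive]) else {
        return text   // the search turned up empty: return the input unchanged
    }
    let before = String(text[..<match.lowerBound])
    let after = String(text[match.upperBound...])
    // Part before the match, the replacement, then recurse on the part after.
    return before + replacement + replaceAll(needle, with: replacement, in: after)
}

// Case-insensitive, anchored, reverse search for the extension.
func hasExtension(_ ext: String, fileName: String) -> Bool {
    return fileName.range(of: "." + ext,
                          options: [.caseInsensitive, .anchored, .backwards]) != nil
}

print(replaceAll("cat", with: "dog", in: "Cat, cat, concat"))
print(hasExtension("avi", fileName: "Holiday.AVI"))
print(hasExtension("avi", fileName: "avi.txt"))
```

At no point does either function look at an individual element of the string; the only indices involved are the opaque bounds of the match handed back by the search.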

None of those purposes necessitate breaking up a string into its constituent Unicode code points, or into its constituent grapheme clusters, or into its constituent UTF-8 bytes, or into its constituent whatevers. Where better access is needed is for very specific purposes such as text editing, typesetting, and rendering, implemented by specialists in specialized libraries that ordinary programmers use through an API, and these specialists will need access down to the individual Unicode code points, with the Character Swift type being in all likelihood useless for them. So I think Swift should do away with the Character type; yes, this means you would not be able to use the example of “reversing” a string (whatever that means when you have, say, Hangul syllables) to demonstrate how to do string processing in the language, but to be honest this is the only real purpose I can think of for which the Character type is “useful”.
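To show how little the "constituents" agree with each other, here is a small sketch in present-day Swift syntax that counts the same text four different ways; the constant names are mine.

```swift
let decomposed = "cafe\u{301}"   // c, a, f, e, then COMBINING ACUTE ACCENT
let precomposed = "caf\u{E9}"    // c, a, f, then LATIN SMALL LETTER E WITH ACUTE

print(decomposed.count)                  // grapheme clusters
print(decomposed.unicodeScalars.count)   // Unicode code points
print(decomposed.utf16.count)            // UTF-16 code units
print(decomposed.utf8.count)             // UTF-8 bytes

// Equal as strings, and therefore equal hashes, even though
// the underlying sequences differ.
print(decomposed == precomposed)
print(decomposed.hashValue == precomposed.hashValue)
```

Four views, three different answers for the "length", and the only operation here an ordinary programmer actually needs is the comparison at the end.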

I don’t like:

the assumption across the book that we are necessarily writing a Mac OS X/iOS app in Xcode. For instance, runtime errors (integer overflow, array subscript out of bounds, etc.) are described as causing the app to exit. Does this mean Swift cannot be used for command-line tools or XPC services, for instance? I suppose that is not the case, or Swift would be unnecessarily limited, so Swift ought to be described in more general terms (in terms of processes, OS interaction, etc.).

I have reservations on:

the Int and UInt type having different width depending on whether the code is running on a 32-bit or 64-bit environment. Except for item count, array offset, or other types that need to or benefit from scaling with memory size and potential count magnitudes (hash values come to mind), it is better for integer types to be predictable and have fixed width. The result of indiscriminately using Int and UInt will be behavior that is unnecessarily different between the same code running on a 32-bit environment and a 64-bit environment.
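A small sketch of the difference, in present-day Swift syntax; the value 3,000,000,000 is simply one that fits in 64 bits but not in 32, and the constant names are mine.

```swift
// Int follows the platform word size; the sized types do not.
print(Int.bitWidth)     // 32 or 64, depending on the environment
print(Int32.bitWidth)
print(Int64.bitWidth)

// A value that fits in Int on a 64-bit environment but not on a 32-bit one.
let big: Int64 = 3_000_000_000
let fits = Int(exactly: big) != nil
print(fits)
```

The same source line therefore succeeds or fails depending on where it runs, which is the kind of unnecessary difference I am talking about.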

I don’t like:

a lot of ambiguities in the language description. For instance, do the range operators ... and ..< return values of an actual type which I could manipulate if I wanted to, or are they an optional part of the for and case statements syntax, only valid there? And why this note about capturing that tells “Swift determines what should be captured by reference and what should be copied by value”? This makes no sense, whether variables are captured by reference or by value is part of the language semantics, it is not an implementation detail. What it should tell is that variables are captured by reference, but when possible the implementation will optimize away the reference and the closure will directly keep the value around (the same way that they do describe that Strings are value types and thus are copied in principle, but the compiler will optimize away the copy whenever possible).
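For what it is worth, here is how both questions play out when tried in present-day Swift, which may or may not match what the book intended to say in 2014; the names below are mine.

```swift
// Ranges are ordinary values of ordinary types.
let closed = 1...3       // ClosedRange<Int>
let halfOpen = 1..<3     // Range<Int>
print(closed.contains(3), halfOpen.contains(3))

// By default a closure captures the variable itself, not its value.
var counter = 0
let increment = { counter += 1 }
increment()
increment()
print(counter)

// A capture list asks for a copy of the value at closure creation time.
let snapshot = { [counter] in counter }
counter = 10
print(snapshot())
```

So capture by reference is the semantics, and copying is something the programmer requests explicitly, which is how I would have wanted the book to put it.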

I don’t understand:

how lazy stored properties are useful. Either the initializer for lazy stored properties may depend on instance stored properties, in which case I’d love to know under which conditions (if I had to guess, I’d say only let stored properties could be used as parameters of this initializer, which would in turn justify the usefulness of let stored properties), or it can’t, in which case why pay for the expensive object for multiple instances, as they are all just going to create the same one, so the expensive object could just be a global.
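Trying it out in present-day Swift suggests the first branch of that alternative: the initializer of a lazy property can refer to self. A minimal sketch, where the Report class and its computations counter are invented for the purpose of observing when the initializer runs:

```swift
final class Report {
    let rows: [Int]
    private(set) var computations = 0

    // Not evaluated until first read; it may use the instance's other properties.
    lazy var total: Int = {
        self.computations += 1
        return self.rows.reduce(0, +)
    }()

    init(rows: [Int]) {
        self.rows = rows
    }
}

let report = Report(rows: [1, 2, 3])
let before = report.computations   // the initializer has not run yet
let first = report.total           // runs it
let second = report.total          // reuses the stored result
print(before, first, second, report.computations)
```

In this sketch the value depends on per-instance data, so a global would not do; whether that covers the cases the book had in mind is a separate question.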

I don’t understand:

why so many words are expended to specify the remainder operator behavior, while leaving unanswered the behavior of the integer division operator in the same cases. Look, in any reasonable language, the two expressions a/b and a%b are integers satisfying the following equations:

1: a = (a/b) × b + a%b
2: (a%b) × (a%b) < b × b

with the only remaining ambiguity being the sign of a%b; as a corollary, the values of a, b and a%b necessarily determine the value of a/b in a reasonable language. Fortunately, Swift is a reasonable language, so when delving into the behavior of a%b (answer: it is either 0 or has the same sign as a) the book should specify the tied behavior of a/b along with it. Speaking of which: Swift allows using the remainder operator on floating-point numbers, but how do I get the corresponding Euclidean division of these same floating-point numbers? I guess I could do trunc(a/b), but I’m sure there are subtleties I haven’t accounted for.
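The first equation is easy enough to check by experiment. The sketch below does so in present-day Swift, where, as far as I can tell, the floating-point remainder is spelled truncatingRemainder(dividingBy:) instead of %. The truncatedDivision helper is hypothetical, and pairing the remainder with a quotient rounded towards zero is my assumption; it does not address the rounding subtleties mentioned above.

```swift
// Integer case: check a = (a/b) × b + a%b for every sign combination.
for (a, b) in [(7, 2), (-7, 2), (7, -2), (-7, -2)] {
    let q = a / b
    let r = a % b
    print(a, b, q, r, q * b + r == a)
}

// Floating-point case (assumption: truncation towards zero is the wanted pairing).
func truncatedDivision(_ a: Double, by b: Double) -> (quotient: Double, remainder: Double) {
    return ((a / b).rounded(.towardZero), a.truncatingRemainder(dividingBy: b))
}

let result = truncatedDivision(-7.5, by: 2.0)
print(result.quotient, result.remainder)
```

For these small, exactly representable values the two parts recombine exactly; I would not bet on that for arbitrary inputs.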

I don’t like:

the lack of any information on a threading model. Hello? It’s 2014. All available Mac and iOS devices are multi-core, and have been for at least the past year. And except for spawning multiple processes from a single app (which as far as I know is still not possible on iOS, anyway), threads and thread-based infrastructure, such as Grand Central Dispatch, are the only way to exploit the parallelism of our current multi-core hardware. So while not all apps necessarily need to be explicitly threaded, this is an important enough feature that I find it very odd that there is no description or documentation of threading in Swift. And yes, I know you can spawn threads using the Objective-C APIs and then try and run Swift code inside that thread; that’s not the point. The point is: as soon as I share any object between two threads running Swift code, what happens? Which synchronization primitives are available, and what happens if an object is used by two threads without synchronization, is there a possibility of undefined behavior (so far there is none in Swift), or is a fault the worst that could happen? Is it even supported to just use Swift code in two different threads, without sharing any object? This is not documented. I’m not asking for much, even an official admission that there is no currently defined threading model, that they are working on one, and that Swift should only be used on the main thread for now would be enough, and allow us to plan for the future (while allowing to reject contributor suggestions that would end up causing Swift code to be used in an unsafe way). But we don’t get even that, as far as I can tell.
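To be clear about what can be done today through the libraries, as opposed to what the language itself promises, here is a sketch that serializes access to shared state with a Grand Central Dispatch queue, in present-day Swift syntax. The Counter class and the queue label are made up; this shows one library-level answer and says nothing about what the language guarantees for unsynchronized access.

```swift
import Dispatch

// Shared state whose every access goes through one serial queue.
final class Counter {
    private var value = 0
    private let queue = DispatchQueue(label: "counter.serial")  // hypothetical label

    func increment() {
        queue.sync { value += 1 }
    }

    var current: Int {
        return queue.sync { value }
    }
}

let counter = Counter()

// Run the increments in parallel across the available cores.
DispatchQueue.concurrentPerform(iterations: 1_000) { _ in
    counter.increment()
}
print(counter.current)
```

What I am asking for is the part this sketch cannot tell me: what happens if the queue is left out.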

I like:

the support for named parameters. Yes, Swift has named parameters, in the sense that you can omit any externally named parameter that has a default value in whichever way you like, it’s not just the N last parameters that can be omitted as in C++, just as long as these optional parameters have different external names; the only other (minor) restriction is that the parameters that are given must be provided in order. On that subject, it is important to note that two functions or methods can differ merely in the optional parameters part and yet not collide, but doing so will force invocations to specify some optional parameters in order to disambiguate between the two (and therefore make these parameters no longer optional in practice), otherwise a compilation error will occur, as seen in this code:

func joinString(a: String, andString b: String = " ",
                andString c: String = ".") -> String
{
    return a + b + c;
}

func joinString(var a: String, andString b: String = " ",
                numTimes i: Int = 1) -> String
{
    for _ in 0..<i
    {
        a = a + b;
    }
    
    return a;
}


joinString("toto", andString: "s", numTimes:3);

which normally executes as follows:

The code above in a playground, with the final result being totosss

But what if we remove numTimes:?

So make sure that the function name combined with the external names of mandatory parameters is enough to provide the function with a unique signature.
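A sketch of that advice in present-day Swift syntax (which differs from the 2014 syntax above: the first parameter is explicitly unlabeled, and var parameters are gone). The join functions are a made-up variation on the example above, with the distinguishing parameter made mandatory.

```swift
func join(_ a: String, separator: String = " ", terminator: String = ".") -> String {
    return a + separator + terminator
}

// `times` has no default value, so it always disambiguates this overload.
func join(_ a: String, separator: String = " ", times: Int) -> String {
    var result = a
    for _ in 0..<times {
        result += separator
    }
    return result
}

let plain = join("toto")                      // only the first overload applies
let ended = join("toto", terminator: "!")     // skips `separator`, not just the tail
let repeated = join("toto", separator: "s", times: 3)
print(plain, ended, repeated)
```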

On a related note:

external parameter names are part of the function type, such that if you assign a function with external parameter names (with default values or not) to a variable, the inferred type of the variable includes the external names; as a result, when the function is invoked through the variable, the external parameter names have to be provided, as can be seen in this code:

func modifyint(var base: Int, byScalingBy shift: Int) -> Int
{
    for _ in 0..<shift
    {
        base *= 10;
    }
    
    return base;
}

var combinerfunc = modifyint;

combinerfunc(3, 5)

which will result in an error, as seen here:

You need to add the external parameter name even for this kind of invocation:

Same code as above, except the external parameter name has been added as recommended in the last line, and the result in the playground margin is 300,000

In practice this means functions and closures that are to be called through a variable should not have externally named parameters.

I have reservations on:

seemingly simple statements that cause non-obvious activity. For instance, how does stuff.structtype.field = foo; work? Let us see with this code:

struct Simpler
{
    var a: Int;
    var b: Int;
}

var watcher = 0;

class Complex
{
    var prop : Simpler = Simpler(a: 0, b: 0)
    {
        willSet(newSimpler)
        {
            watcher++;
        }
    }
}

let frobz = Complex();

frobz.prop.b = 4;
frobz.prop.a = 6;

watcher;

println("\(frobz.prop.a), \(frobz.prop.b)");

Which executes as follows:

The code above in a playground, with the result of watcher in the line before last being 2

So yes, a stuff.structtype.field = foo statement, while it looks like a simple assignment, actually causes a read-modify-write of the structure in the class; this is actually a reasonable behavior, otherwise the property observers would not be able to do their job.

I don’t like:

some language features are not documented before the “language reference” part (honestly, who is going to spontaneously read that section from start to finish?), such as dynamicType; this is all the more puzzling as overriding class methods (which is very much described as part of class features in the “language guide”) is useless without dynamicType.

On a related note:

dynamicType cannot be called on self until self is satisfactorily initialized (at least when I tried doing so), as if dynamicType was an ordinary method, even though it is not an ordinary method: after all, dynamicType only gives you access to the type and its type methods, which do not rely on any instance, why would the state of this particular instance matter? This makes dynamicType and overridable class methods that much less useful to control early instance initialization behavior.

I have reservations on:

subscripting on programmer-defined classes and structures. Basically, the questions I have for supporting custom operators are the same I have for support of subscripting: I just don’t see the need in a general-purpose language.

On a related note:

the correct subscript method between the different ones a class can support is chosen according to the (inferred, if necessary) type of the subscript, which sounds like C++’s strictly type (data shape) based overloading, and it is, but it is acceptable in this instance.
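A minimal sketch of such type-based selection, in present-day Swift syntax; the Inventory type is invented for the example.

```swift
struct Inventory {
    private let names: [String]
    private let counts: [String: Int]

    init(_ items: [(name: String, count: Int)]) {
        names = items.map { $0.name }
        counts = Dictionary(uniqueKeysWithValues: items.map { ($0.name, $0.count) })
    }

    // Chosen when the subscript is an Int: positional access.
    subscript(index: Int) -> String {
        return names[index]
    }

    // Chosen when the subscript is a String: lookup by name.
    subscript(name: String) -> Int? {
        return counts[name]
    }
}

let inventory = Inventory([(name: "apple", count: 3), (name: "pear", count: 5)])
let second = inventory[1]
let pears = inventory["pear"]
print(second, pears as Any)
```

It is acceptable here because the two index types are about as far apart as types can be; two subscripts taking Int and UInt would be another story.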

I have reservations on:

computed property setters. Modifying a computed property modifies, by definition, at least one stored property, but there is no language feature to document the interdependency, and this absence is going to be felt (just as the lack of any way to mark designated initializers in Objective-C was felt until recently).

I have reservations on:

allowing running a closure for setting the default value of a property. Is it really a good idea?

I like:

the good case examples for the code samples in the book. Each time it is clear why the code construct just introduced is the appropriate way to treat the practical problem.

I don’t like:

the lack of a narrative, or at least of a progression, in the book. Where is the rationale for some of the less obvious features? Where is the equivalent of Object-Oriented Programming with Objective-C (formerly the first half of “Object-Oriented Programming and the Objective-C Programming Language”)? This matters: we can’t just give developers a bunch of tools and expect them to figure out which tool is for which purpose, or at least not in a consistent way. Providing a rationale for the features is part of a programming language as well.

I like:

the declaration syntax. While compared to C we no longer have the principle that declaration mimics usage, I think it’s worth on the other hand getting rid of this:

char* foo, bar, **baz;

which in C declares foo as a pointer to char, baz as a pointer to pointer to char, but bar as a char, not a pointer to char… In fact, in Swift when you combine the type declaration syntax (colon then type name after the variable/parameter name), function declaration syntax, top level code being the entry point, and nested functions, you get at times a very Pascalian feel… In 2014, Apple languages have gone full circle from 1984 (for the younguns among you, Pascal was the first high level programming language Apple supported for Mac development, and remained the dominant language for Mac application development until the arrival of PowerPC in 1993).

I don’t like:

the lack of any portability information. I guess that it’s a bit early for any kind of cross-platform availability, right now Apple concentrates on making the language run and shine on Apple platforms, I get that. But I’d like some kind of information, even just a rough intent (and the steps they are taking towards it, e.g. working towards standardization? Or making sure Swift support is part of the open-source LLVM releases maybe?) in that area, so that I can know whether I can invest in Swift and eventually leverage this work on another platform, as I can today with, say, C(++). Sorry, but I’m not going to encode my thoughts (at least not for many of my projects) in a format if I do not know whether this format will stay locked to Apple platforms or not. On a related note, some information on which source changes will maintain ABI compatibility and which will not would be appreciated. But this information is not provided. I know that Apple does not guarantee any binary compatibility at this time, but even if it is not implemented yet they have some idea of what will be binary compatible and what will not, and knowing this would inform my API design, for instance.

I like:

the few cases where implicit conversion is provided (that is, where it makes sense). For instance, you might have noticed that, if foo is an optional Int (that is, Int?), you never need to write foo = Some(4);, but simply foo = 4;. This is appreciated when you may or may not do a given action at the end of the function, but if you do a value is necessarily provided, for instance an error code: in that case, you track the need to do this action eventually with an optional of the value’s type, and you have plenty of spots where this optional variable is set, so any simplification is appreciated.
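A short sketch of the error code scenario, in present-day Swift syntax, where the explicit form is spelled .some(4); the variable names are mine.

```swift
var errorCode: Int? = nil

// Both assignments store the same thing; the first relies on the implicit conversion.
errorCode = 4
let implicit = errorCode
errorCode = .some(4)
let explicit = errorCode

print(implicit == explicit)

// Deferred action: only report if some branch recorded a code.
if let code = errorCode {
    print("failed with code \(code)")
}
```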

My pessimistic conclusion

Swift seems to go counter to all historical programming language trends: it is statically typed when most of the language work seems to trend towards more loosely typed semantics and even duck typing, it compiles down to machine code and has a design optimized for that purpose when most new languages these days run in virtual machines, it goes for total safety when most new languages have abandoned it. I wonder if Swift won’t end up on the wrong side of history eventually.

My optimistic conclusion

Swift, with its type safety, safe semantics and the possibility to tie variables as part of control flow constructs (if let, etc.), promises to capture programmer intent better than any language that I know of, which ought to ease maintenance and merge operations; this should also help observability, at least in principle (I haven’t investigated Swift’s support for DTrace), and might eventually lead to an old dream of mine: formally defined semantics for the language, which would allow writing proofs (that the compiler could verify) that for instance the code I just wrote could not possibly crash.

Post-scriptum:

let me add a few words of comment on the current state of the toolchain: it still has a way to go in terms of maturity and stability. Most of the time when you make a mistake the error message from the compiler is inscrutable, and I managed to crash the background compilation process of the playground on multiple occasions while researching this post. Nevertheless, as you can see in the illustrations the playground concept has been very useful to experiment with the language, much faster and more enjoyable than with, say, an interactive interpreter interface (as in Python for instance), so it wasn’t a bad experience overall.

I, for one, welcome our new, more inclusive Apple

In case you have not been following closely, at this year’s WWDC Apple introduced a number of technologies that reverse many long-standing policies on what iOS apps were, or to be more accurate, were not allowed to do: technologies such as app extensions, third-party keyboards, Touch ID APIs, manual camera controls, CloudKit, or simply the ability to sell apps in bundles on the iOS App Store. I would be remiss if I did not mention a few of my pet peeves that apparently remain unaddressed, such as searching inside third-party apps from the iOS Springboard, real support for trials on the iOS App Store and the Mac App Store (more on that in a later post), any way to distribute iOS apps outside the iOS App Store, or the fact that many of the changes in Mac OS X Yosemite are either better integration with iOS, or Lion and Mountain Lion-style “iOS-ification”, both of which would be better solved by transitioning the Mac to iOS, etc.

But in the end, the attitude change from Apple matters more than the specifics of what will come in iOS 8. And it was (as Matt Drance wrote) not just the announcements themselves: for instance with the video shown at the start of the keynote where iPhone and iPad users praise apps and the developers who made them, Apple wants us to know that they care for us developers and want us to succeed, which is a welcome change from the lack of visible consideration developers were treated with so far (with the limitation that this video frames the situation as developers directly providing their wares to users: don’t expect any change to how Apple sees middleware suppliers).

So I welcome this attitude change from Apple, and like Matt Drance, I am glad this seems to be coming from a place of confidence rather than concession (indeed, while the Google Play Store is much more inclusive¹, the limited willingness of Android users to pay for apps means Apple probably does not feel much pressure in this area), which means that it’s likely only the first step: what we did not get at this WWDC, we can always hope to get in iOS 9, and at least the situation evolves in the right direction. I do not know where this change of heart comes from, I do not think any obvious event triggered it, I am just thankful that the Powers That Be at Apple decided to be pragmatic and cling less tightly to principles that, while potentially justified five years ago, were these days holding back the platform.

A caveat, though, is that I see one case where a new iOS 8 functionality, rather than giving me hope for the future, will actually hamper future improvements: iCloud Drive. While that feature may appear to address one of my longstanding pet peeves, anyone who thinks we were clamoring for merely a return to the traditional files and folders organization hasn’t really read what I or others have written on the matter; but this is exactly what iCloud Drive proposes (even if only documents are present in there, and even for just the files shared between different iOS apps, we expected better than that). Besides not improving on the current desktop status quo, the issue is that shipping it as such will create compatibility constraints (both from a user interface and API standpoint) which will make it hard for Apple to improve on it in the future, whereas Apple could have taken advantage of its experience and of the hindsight coming from having been without that feature for all this time to propose a better fundamental organization paradigm.

For instance, off the top of my head I can think of two ways to improve the experience of working on the same document from different apps:

  • Instead of (or on top of) “open in…”, have “also open in…”, which would also work by selecting an app among the ones supporting that document type. After that command, the document would appear in a specific section of the document picker of the first app, a section which would be marked with the icon of the second app: in other words, this section would contain all documents shared between the first and second app. The same would go in the second app: the shared document would appear in a section marked with the icon of the first app. That way some sort of intuitive organization would be automatically provided. A document shared between more than two apps could appear in two sections at the same time, or could be put in the area where documents are available to all apps.
  • Introduce see-through folders. A paradox of hierarchical filing is that, as you start creating folders to organize your documents so as to more easily find something in the future, you may make documents harder to locate because they become “hidden” behind a folder. With see-through folders any folder you create would start with being just a roundrect drawn around the documents it would contain (say up to 4 contained documents), with the documents still being visible in their full size from the top level view, except there would be this roundrect around them. Then as the folder starts containing more and more documents, these documents would appear smaller and smaller from the top level view, so in practice you would have to “focus” on the folder by tapping on the folder name, so as to list only the documents contained in that folder, in full size, in order to select one document. When you have more than one level of folders, this would allow quickly scanning subfolders that contain only a few documents, since these documents would appear at full size when browsing the parent of these subfolders, so the document could either quickly be found in there, or else we would know it is in the “big” subfolder.

There are of course many other ways this could be improved, such as document tagging, or other metadata-based innovations. There are so many ways hierarchical document storage could be improved that Apple announcing they would merely go with pretty much the status quo for multi-app document collaboration tells me that in all likelihood no one who matters at Apple really cares about document management, which I find sad: even if not all such concocted improvements are actually viable, there are bound to be some that are and that they could have used instead.

(As for Swift, it is a subject with a very different scope that is deserving of its own post.)

But overall, these new developments seen at WWDC 2014 make me optimistic for the future of the Apple platforms and Apple in general. Even if it is not necessarily everything we wanted, change always starts with first steps like these.


  1. “Open” implies a binary situation, where a platform would be either “open” or “closed”; but situations are clearly more nuanced, with a whole continuum of “openness” between different cases such as game consoles, the iOS platform, the Android platform, or Windows. So I refer to platforms as being “more inclusive” or “less inclusive”, which allows for a range of, well, inclusiveness, rather than use “open” and the absolutes it implies.

Porting to the NEON intrinsics from experience

Hey you. Yes, you. Did you, inspired by my introduction to NEON on iPhone, write ARM NEON code, or are you maintaining ARM NEON code in some way? Is this NEON code written as ARM32 assembly? If you answered yes to both questions, then I hope you realize that any app that has your NEON code as a dependency is currently unable to take advantage of ARM64 on supported hardware (now there may or may not be any real benefit for the app from doing so, but that is beside the point). ARM64, at the very least, is the future, so you will have to do something about that code so that it can run in ARM64 mode, but porting it to ARM64 assembly is not going to be straightforward, as the structure of the NEON register file has changed in ARM64 mode. Rather, I propose here porting your NEON ARM32 assembly algorithms to NEON intrinsics which can compile to both ARM32 and ARM64, and present here the outcome of my experience doing such a port, so that you can learn from it.

An introduction to the ARM NEON intrinsic support

The good thing about ARM NEON intrinsics is that they apply equally well in ARM32 and ARM64 mode, in fact you don’t have to follow any specific rule to support both with the same intrinsics source file: correct NEON intrinsics code that works on ARM32 will also work on ARM64 for free. At the most fundamental level, NEON intrinsics code is simply a C source file that includes <arm_neon.h> and uses a number of specific functions and types. The documentation for the ARM NEON intrinsics can be found here, on the ARM Information Center. This documentation ostensibly covers ARM DS-5, but in fact for iOS clang implements the same support; if you target other platforms in addition to or instead of iOS, you will have to check your toolchain compiler documentation, but if it supports any ARM NEON intrinsics at all it ought to have the same support as ARM DS-5.

Unfortunately, this document pretty much only documents the intrinsic function names and the types: for documentation on the operations these functions perform, it is still necessary to refer to the NEON instructions descriptions in the ARM instruction set document (don’t worry about the “registered ARM customers” part, you only need to create an account and agree to a license in order to download the PDF); furthermore, most material online (including my introduction to NEON on iPhone, if you need to get up to speed with NEON) will discuss NEON in terms of the instruction names rather than in terms of the C intrinsics, so it is a good idea to get used to locating the intrinsic function that corresponds to a given instruction; the most straightforward way is to open arm_neon.h in Xcode (add it as an include, compile once to refresh the index, then open it as one of this file’s includes in the “Related Files” menu), and just do a search for the instruction name: this will turn up the different intrinsic function variants that implement the instruction’s functionality, as the intrinsic function name is based on the instruction name. There is a tricky situation, however: for some instructions there is no matching intrinsic; these cases are documented here, with what you should do to get the equivalent functionality.

The converse also exists, where some intrinsics provide a functionality not provided by a particular instruction, or where the name does not match any instruction, such as:

In particular, the last two are what you will use in replacement of the parts of your ARM32 NEON algorithm where you would put results in, say, d6 and d7, and then the next operation would use q3, which is aliased to these two D registers. Indeed, it is important to realize (in particular if you are coming from NEON assembly coding) that these intrinsics work functionally, rather than procedurally over a register file; notably, the input variables are never modified. So stop worrying about placement and just write your NEON intrinsic code in functional fashion: factor_vec = vrsqrteq_f32(vmlaq_f32(vmulq_f32(x_vec, x_vec), y_vec, y_vec)); (assuming the initial reciprocal square root estimate is enough for your purposes). Things should come naturally once you integrate this way of thinking.
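To make the functional style concrete, here is a minimal sketch of the inner part of the expression above (without the reciprocal square root estimate, so that results are exact and easy to check). The function name zp_sum_of_squares and the ZP_HAVE_NEON macro are hypothetical names of mine; the scalar loop is there only so that the file also builds on machines without NEON, and it doubles as the tail handler when the element count is not a multiple of 4. Loads and stores go through vld1q_f32 and vst1q_f32 here, which is one option among others.

```c
#include <stddef.h>

/* Assumption: the compiler defines __ARM_NEON (or the older __ARM_NEON__)
   when NEON intrinsics are usable; ZP_HAVE_NEON is a name made up for this sketch. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define ZP_HAVE_NEON 1
#else
#define ZP_HAVE_NEON 0
#endif

/* Hypothetical helper: out[i] = x[i]*x[i] + y[i]*y[i] for count elements. */
void zp_sum_of_squares(const float *x, const float *y, float *out, size_t count)
{
    size_t i = 0;
#if ZP_HAVE_NEON
    /* Four elements at a time; x_vec and y_vec are named because each is used twice. */
    for (; i + 4 <= count; i += 4) {
        float32x4_t x_vec = vld1q_f32(x + i);
        float32x4_t y_vec = vld1q_f32(y + i);
        /* vmlaq_f32(a, b, c) computes a + b*c; the inputs are left unmodified,
           so the intermediate product needs no variable of its own. */
        vst1q_f32(out + i, vmlaq_f32(vmulq_f32(x_vec, x_vec), y_vec, y_vec));
    }
#endif
    /* Scalar path: handles the leftover elements, or everything without NEON. */
    for (; i < count; i++) {
        out[i] = x[i] * x[i] + y[i] * y[i];
    }
}
```

Notice that nothing in there says which register holds what: that is now the compiler’s job.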

Variables should be reserved for results that you want to use more than once. Those need to be typed correctly, as the whole system is typed, with such fun variable type names as uint8x16_t; this explains the various vcombine_tnn variants, from vcombine_s8 to vcombine_p16, which in fact all come down to the same thing: the sole purpose of the variants is to preserve the correct element typing between the inputs and the output. I personally welcome the discipline: even if you think you know what you are doing, it’s so easy to get it subtly wrong in the middle of your algorithm, and you are left wondering at the end where you wrongly took a left turn (it was at Albuquerque. It is always at Albuquerque).

Less pleasant to use are the types that represent an array of vectors, of the form uint8x16x4_t for instance. Indeed, some intrinsics rely on these types, such as the transposition ones, but also the deinterleaving loads and stores vld#/vst# (I presented them in my introduction to NEON on iPhone), which are just as indispensable when using intrinsics as they are when programming in assembly, and so when using these intrinsics you have to contend with these variables that represent multiple vectors at once (and that you of course cannot directly use as the input of another intrinsic); fortunately taking the individual vectors of those (for further calculations) is done using normal C array subscripting: coords_vec_arr.val[1], but this makes expressions less straightforward and elegant than they could otherwise have been.

Note that loading and storing vectors to memory without deinterleaving does not require a dedicated intrinsic (though the vld1/vst1 intrinsics, such as vld1q_f32 and vst1q_f32, exist if you prefer them): you can simply cast the pointer (typically one to your input/output element arrays) to a pointer to the correct vector type, and dereference that; this will result in the correct vector loads and stores being generated.

In practice

I am not going to share the code I ported or the actual benchmark results, but I can share the experience of porting a non-trivial NEON algorithm from ARM32 assembly to NEON intrinsics.

First, if the assembly code is competently commented (in particular with a clear register map), porting it is just a matter of following the algorithm main sequence and is rather straightforward, translating instructions one by one, with the addition of the occasional vcombine when two D vectors become a Q vector; your activity will mostly consist in finding the correct name for the intrinsic function for the given input element type, and finding variable names for these previously unnamed intermediate results (again, for these intermediate results which are only used once, save yourself the trouble of defining a variable and directly use the intrinsic output as the input for the next intrinsic). This was completed quickly.

But this is only the start. The next order of business is running the original algorithm and the new one on test inputs, and comparing the results. For integer-only algorithms such as the one I ported, the results must match bit for bit between the original algorithm, the new one compiled as ARM32, and the new one compiled as ARM64; in my case they did. For algorithms that involve floating-point calculations they might not match bit for bit because of the different rounding control in ARM64, so compare within a tolerance that is appropriate for your purposes.
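The two kinds of checks described above could be sketched as follows; this is not the harness I used, just a minimal illustration, and the function names (zp_bytes_identical, zp_floats_within_tolerance) as well as the choice of an absolute tolerance are assumptions of mine. Depending on your algorithm a relative tolerance may be more appropriate.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Integer-only algorithms: the outputs must match bit for bit.
   Returns 1 when both buffers hold the same count bytes, 0 otherwise. */
int zp_bytes_identical(const uint8_t *expected, const uint8_t *actual, size_t count)
{
    return memcmp(expected, actual, count) == 0;
}

/* Floating-point algorithms: accept an absolute difference of at most
   tolerance on every element. Returns 1 on match, 0 otherwise. */
int zp_floats_within_tolerance(const float *expected, const float *actual,
                               size_t count, float tolerance)
{
    for (size_t i = 0; i < count; i++) {
        float difference = expected[i] - actual[i];
        if (difference < 0.0f) {
            difference = -difference;
        }
        /* Written as a negated <= so that a NaN on either side counts as a mismatch. */
        if (!(difference <= tolerance)) {
            return 0;
        }
    }
    return 1;
}
```

Run it three ways: original against the ARM32 build of the port, original against the ARM64 build, and the two builds of the port against each other.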

Once this check is done, you might wish to take a look at the assembly code generated from your intrinsics. In my case I discovered the ARM32 compiled version needed more vector storage than there are available registers, and as a result was performing various extra vector loads and stores from memory at various points in the algorithm. The reason for this is that the automatic register allocation clang performed (at least in this case) just could not compare with my elaborate work in the original ARM32 NEON assembly code to tightly squeeze the necessary work data to never take more than 12 Q vectors at any given time (even avoiding the use of q4-q7, saving the trouble of having to preserve and restore them); also, it appears that, with clang, the intrinsics that use a scalar as one input do not actually generate the scalar-using instruction, but instead require the scalar to be splat on a vector register, harming register usage.

I have not been able to improve the situation by changing the way the intrinsic code was written; it seems it is the compiler which is going to have to improve. However, the ARM64 compiled version had no need for temporary storage beyond the NEON registers: twice as many vector registers are available in this mode, easing the pressure on the compiler register allocator.

But in the end what really matters is the actual performance of the code, so even if you take a look at the compiled code it is only by benchmarking the code (again, comparing between the original algorithm, the new version compiled as ARM32, and the new version compiled as ARM64) that you can reasonably decide which improvements are necessary. Don’t skimp on that part, you could be surprised. In my case, it turned out that the “inefficient”, ARM32 compiled version of the ported algorithm performed just as well as the original NEON ARM32 assembly. The probable reason is that my algorithm (and likely yours too) is in fact memory bandwidth constrained, and taking more time to perform the computations does not really matter when you then have to wait for the memory transfers to or from the level 3 cache or main memory to complete anyway.

As a result, in my case I could just replace the original algorithm by the new one without any performance regression. But that might not always be the case, and so if doing so would result in a performance regression, one course of action would be to keep using the original NEON assembly version in ARM32 mode, and use the new intrinsic-based algorithm only in ARM64 mode; use conditional compilation to select which code is used in each mode (I have a preprocessor macro defined for this purpose in the Xcode build settings, whose value depends on an architecture-dependent build setting). Fortunately, given the number of NEON registers available in ARM64, you should never see a performance regression on ARM64 capable hardware between the original ARM32 NEON assembly algorithm and the new one compiled as ARM64.
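Such a conditional compilation setup could look like the sketch below. In my project the macro comes from the Xcode build settings; here, as an assumption made so the example stands alone, ZP_USE_INTRINSICS (a hypothetical name) falls back to the compiler-predefined __aarch64__ and __arm64__ macros when the build does not define it. The saturating add is only a stand-in workload, and the plain C branch stands in for the call to the existing ARM32 assembly routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the build settings may define ZP_USE_INTRINSICS themselves;
   otherwise select the intrinsics version only when compiling for ARM64. */
#ifndef ZP_USE_INTRINSICS
#  if defined(__aarch64__) || defined(__arm64__)
#    define ZP_USE_INTRINSICS 1
#  else
#    define ZP_USE_INTRINSICS 0
#  endif
#endif

#if ZP_USE_INTRINSICS
#include <arm_neon.h>

/* ARM64 mode: the new intrinsics-based version. */
void zp_saturating_add(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t count)
{
    size_t i = 0;
    for (; i + 16 <= count; i += 16) {
        vst1q_u8(out + i, vqaddq_u8(vld1q_u8(a + i), vld1q_u8(b + i)));
    }
    for (; i < count; i++) {  /* leftover elements */
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        out[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
#else

/* Other modes: in a real project this is where the original ARM32 NEON
   assembly routine would be used; plain C stands in for it in this sketch. */
void zp_saturating_add(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        out[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
#endif
```

Callers only ever see zp_saturating_add, so the choice of implementation stays a build-time detail.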

It worked

So your mileage may vary, certainly. But in my experience porting a NEON algorithm from ARM32 assembly to C intrinsics gave an adequate result, and was a quick and straightforward process, while writing an ARM64 assembly version would have been much more time consuming and would have required maintaining both versions in the future. And remember, no app that depends on your NEON algorithms can ship as a 64-bit capable app as long as you only have an ARM32 assembly version of these algorithms; if they haven’t been ported already, by now you’d better get started.

By the way, I should mention that today I also updated Introduction to NEON on iPhone and A few things iOS developers ought to know about the ARM architecture to take into account ARM64 and the changes it introduces; please have a look.

Besides fused multiply-add, what is the point of ARMv7s?

This post is part of a series on why an iOS developer would want to add a new architecture target to his iOS app project. The posts in this series so far cover ARMv7, ARMv7s (this one), ARM64 (soon!).

You probably remember the kerfuffle when, at the same time the iPhone 5 was announced (it was not even shipping yet), Apple added ARMv7s as a default architecture in Xcode without warning. But just what is it that ARMv7s brings, and why would you want to take advantage of it?

One thing that ARMv7s definitely brings is support for the VFPv4 and VFPv3 half-precision extensions, which consists of the following: fused floating-point multiply-add, and half-precision floating-point values (only for converting to and from single precision, no other operation supports the half-precision format), as well as the vector versions of these operations. Both of these have potential applications, even if they are not universally useful, and therefore it was indispensable for Apple to define an ARM architecture version so that apps could make use of them in practice if they desired: had Apple not defined ARMv7s, even if the iPhone 5 hardware would have been able to run these instructions, no one could have used them in practice as there would have been no way to safely include them in an iOS app (that is, in a way that does not cause the app to crash when run on earlier devices).

So we have determined that it was necessary for Apple to define ARMv7s, or this new functionality of the iPhone 5 processor would have been added for nothing, got it. But what if you are not taking advantage of these new floating-point instructions? It is important to realize that you are not taking advantage of these new floating-point instructions unless you full well know you do: indeed, the compiler will never generate these instructions, so the only way to benefit from this functionality is if your project includes algorithms that were specifically developed to take advantage of these instructions. And if it is not actually the case, then as far as I can tell using ARMv7s… is simply pointless. That is, there is no tangible benefit.

First, let us remember that adding an ARMv7s slice will almost double the executable binary size compared to shipping a binary with only ARMv7, which may or may not be a significant cost depending on whether other data (art assets, other resources) already dominates the executable binary size, but remains something to pay attention to. So already the decision to include an ARMv7s slice starts in the red.

Go forth and divide

So let us see what other benefits we can find. The other major improvement of ARMv7s is integer division in hardware. So let us try and see how much it improves things.

int ZPDivisions(void* context)
{
    uint32_t i, accum = 0;
    uint32_t iterations = *((uint32_t*)context);
    
    for (i = 0; i < 4*iterations; i+=1)
    {
        accum += (4*iterations)/(i+1);
    }
    
    return accum;
}

OK, let us measure how much time it takes to execute (iterations equals 1000000, running on an iPhone 5S, averaged over three runs):

                      ARMv7        ARMv7s
divisions             24.951 ms    25.028 ms

…No difference. That can’t be?! The ARMv7 version includes this call to __udivsi3, which should be slower, let us see in the debugger what happens when this is called:

libcompiler_rt.dylib`__udivsi3:
0x3b767740:  tst    r1, r1
0x3b767744:  beq    0x3b767750                ; __udivsi3 + 16
0x3b767748:  udiv   r0, r0, r1
0x3b76774c:  bx     lr
0x3b767750:  mov    r0, #0
0x3b767754:  bx     lr

D’oh! When run on an ARMv7s device, this runtime function simply uses the hardware instruction to perform the division. Indeed, on other platforms such a function may be provided in a library statically linked to your app, whose code is then frozen. But that is not how Apple rolls. On iOS, even such a seemingly trivial functionality is actually provided by the OS, and the call to __udivsi3 is a dynamic library call which is actually resolved at runtime, and uses the most efficient implementation for the hardware, even if your binary only has ARMv7. In other words, on devices which would run your ARMv7s slice, you already benefit from the hardware integer division even without providing an ARMv7s slice.

Take 2

But wait, surely this dynamic library function call has some overhead compared to directly using the instruction, could we not reveal this by improving the test? We need to go deeper. Let’s find out, by performing four divisions during each loop, which will reduce the looping overhead:

int ZPUnrolledDivisions(void* context)
{
    uint32_t i, accum = 0;
    uint32_t iterations = *((uint32_t*)context);
    
    for (i = 0; i < 4*iterations; i+=4)
    {
        accum += (4*iterations)/(i+1) + (4*iterations)/(i+2) + (4*iterations)/(i+3) + (4*iterations)/(i+4);
    }
    
    return accum;
}

And the results are… drum roll… (iterations equals 1000000, running on an iPhone 5S, averaged over three runs):

                      ARMv7        ARMv7s
unrolled divisions    24.832 ms    24.930 ms

“Come on now, you’re messing with me here, right?” Nope. There is actually a simple explanation for this: even in hardware, integer division is very expensive. A good rule of thumb for the respective costs of execution for the elementary mathematical operations on integers is this:

operation (on 32-bit integers) + × ÷
cost in cycles 1 1 3 or 4 20+

This approximation remains valid across many different processors. And the amortized cost of a dynamic library function call is pretty low (only slightly more than a regular function call), so it is dwarfed by the execution time of the division instruction itself.

Take 3

I had one last idea of where we could look to observe a penalty when function calls are involved. We need to go even deeper: having these calls to __udivsi3 forces the compiler to put the input variables into the same hardware registers before each division, so the processor is not going to be able to run the divisions in parallel; so let us modify the code so that, in ARMv7s, the divisions could actually run in parallel:

int ZPParallelDivisions(void* context)
{
    uint32_t i, accum1 = 0, accum2 = 0, accum3 = 0, accum4 = 0;
    uint32_t iterations = *((uint32_t*)context);
    
    for (i = 0; i < 4*iterations; i+=4)
    {
        accum1 += (4*iterations)/(i+1);
        accum2 += (4*iterations)/(i+2);
        accum3 += (4*iterations)/(i+3);
        accum4 += (4*iterations)/(i+4);
    }
    
    return accum1 + accum2 + accum3 + accum4;
}

(iterations equals 1000000, running on an iPhone 5S, averaged over three runs):

                      ARMv7        ARMv7s
parallel divisions    25.353 ms    24.977 ms

…I give up (the difference has no statistical significance).

There might be other benefits to avoiding a function call for each integer division, such as the compiler not needing to consider the values stored in caller-saved registers as being lost across the call, but honestly I do not see these effects as having any measurable impact on real-world code.

If you want to reproduce my results, you can get the source for these tests on Bitbucket.

What else?

We have already looked pretty far in trying to find benefits in directly using the new integer division instruction, what if we set that aside for now and try and see what else ARMv7s brings? Technically, nothing else: ARMv7s brings VFPv4 and VFPv3-HP, their vector counterparts, integer division in hardware, and that’s it for unprivileged instructions as far as anyone can tell.

However, when compiling an ARMv7s slice, Clang will apparently take advantage of this to optimize the code specifically for the Swift core (according to these patches, via Stack Overflow). These optimizations are of the tuning variety, so do not expect that much from them, but the main limitation with those is that not that many iOS devices run on Swift, in the grand scheme of things. If you check the awesome iOS Support Matrix (ARMv7s column), you will see for instance that no iPod Touch model runs it, and that the iPad mini skipped it entirely (going directly from ARMv7 to ARM64). So is it worth optimizing specifically for the 4th generation iPad, the iPhone 5, and the iPhone 5C? Maybe not.

What compiling for ARMv7s won’t bring you

And now it’s time for our regular segment, “let us dispel some misconceptions about what a new ARM architecture version really brings”. Adding an ARMv7s slice will not:

  • make your code run more efficiently on ARMv7 devices, since those will still be running the ARMv7 compiled code; this means it could potentially improve your code only on devices where your app already runs faster.
  • improve performance of the Apple frameworks and libraries: those are already optimized for the device they are running on, even if your code is compiled only for ARMv7 (we saw this effect earlier with __udivsi3).
  • There are a few cases where ARMv7s devices run code less efficiently than ARMv7 ones; this will happen on these devices even if you only compile for ARMv7, so adding (or replacing by) an ARMv7s slice will not help or hurt this in any way.
  • If you have third-party dependencies with libraries that provide only an ARMv7 slice (you can check with otool -vf <library name>), the code of this dependency won’t become more efficient if you compile for ARMv7s (if they do provide an ARMv7s slice, compiling for ARMv7s will allow you to use it, maybe making it more efficient).

I need you

Seems clear-cut, right? Not so fast. Sure, we explored some places where we thought direct use of hardware integer division could have improved things, but maybe there are actual improvements in places I did not explore, places with a more complex mix between integer division and other operations. Maybe tuning for Swift does improve things for Cyclone too (which is represented by the ARM64 column devices in iOS Support Matrix), and maybe it could be worth it. Maybe I am wrong and Clang can take advantage of fused multiply-add without you needing to do a thing about it. Maybe I completely missed some other instructions that ARMv7s brings.

And most of all, I have not actually run any real benchmark here, for one good reason: I have little idea of the kind of algorithms iOS apps spend significant CPU time on (outside of the frameworks), so I do not know what kind of benchmark to run in the first place (as for Geekbench, I do not think it really represents tasks commonly done in iOS apps, and in addition I am wary of CPU benchmarks I cannot see the source code of). A good benchmark would avoid us missing the forest for the trees, in case that is what is happening here.

So I need you. I need you to run your app with, and without, an ARMv7s slice on a Swift device (as well as a Cyclone device, if you are so inclined), and report the outcome (such as increased performance or decreased processor usage, the latter is important for battery life). Failing that, I need you to tell me the improvements you remember seeing on Swift devices when you added an ARMv7s slice, or what were the conclusions of the evaluation to add an ARMv7s slice, what you can share of it at least. I need you to tell me if I missed something.

And that is why I am exceptionally going to allow comments on this post. In fact, they should appear immediately without having to go through moderation, in order to facilitate the conversation. But first, be wary of Akismet: if your comment is flagged as spam, try and rework it a bit and post again. Second, comments with nothing to do with the matter at hand will be subject to instant vaporization.

So have at it:

What benefits does the iPhone 5S get from being 64-bit?

Every fall, a new iPhone. The schedule of Apple’s mobile hardware releases has become pretty predictable by now, but they more than make up for this timing predictability by the sheer unpredictability of what they introduce each year, and this fall they outdid themselves. Between TouchID and the M7 coprocessor, the iPhone 5S had plenty of surprises, but what most intrigued and excited many people in the development community was its new, 64-bit processor. But many have openly wondered what the point of this feature was, exactly; including some of the same people excited by it, there is no contradiction in that. So I set out to collect answers, and here is what I found.

Before we begin, I strongly recommend you read Friday Q&A 2013-09-27: ARM64 and You, then get the ARMv8-A Reference Manual (don’t worry about the “registered ARM customers” part, you only need to create an account and agree to a license in order to download the PDF): have it on hand to refer to whenever I mention an instruction or architectural feature.

Some Context

iOS devices have always been based on ARM architecture processors. So far, ARM processors have been strictly 32 bit machines: 32-bit general registers, addresses, most calculations, etc.; but in 2011, ARM Holdings announced ARMv8, the first version of the ARM architecture to allow ARM native 64-bit processing and in particular 64-bit addresses in programs. It was clearly, and by their own admission, done quite ahead of the time where it would actually be needed, so that the whole ecosystem would have time to adopt it (board vendors, debug tools, ISVs, open-source projects, etc.), and in fact ARM did not announce at the time any of their own processor implementations of the new architecture (which they also do ahead of time), leaving in fact some of their partners, server processor ones in particular, the honor of releasing the first ARMv8 processor implementations. I’m not sure any device using an ARMv8 processor design from ARM Holdings has even shipped yet.

All that to say that while many people knew about 64-bit ARM for some time, almost no one expected Apple to release an ARMv8-based consumer device so far ahead of when everyone thought such a thing would actually be needed; Apple was not merely first to market with an ARMv8 handheld, but lapped every other handset maker and their processor suppliers in that regard. But that naturally raises the question of what exactly Apple gets from having a 64-bit ARM processor in the iPhone 5S, since after all none of its competitors or their suppliers saw what benefit would justify rushing to ARMv8 as Apple did. And this is a legitimate question, so let us see what benefits we can identify.

First Candidate: a Larger Address Space

Let us start with the obvious: the ability for a single program to address more than 4 GB of address space. Or rather, let us start by killing the notion that this is only about devices with more than 4 GB of physical RAM. You do not, absolutely not, need a 64-bit processor to build a device with more than 4 GB of RAM; for instance, the Cortex A15, which implements ARMv7-A with a few extensions, is capable of using up to 1 TB of physical RAM, even though it is a decidedly 32-bit processor. What is true is that, with a 32-bit processor, no single program is capable of using more than 4 GB of that RAM at once, so while such an arrangement is very useful for server or desktop multitasking scenarios, its benefits are more limited on a mobile device, where you don’t typically expect background programs to keep using a lot of memory. So handsets and tablets will likely need to go with a 64-bit processor when they start packing more than 4 GB of RAM, so that the frontmost program can actually make use of that RAM.

However, that does not mean the benefits of a large virtual address space are limited to that situation. Indeed, using virtual memory an iOS program can very well use large datasets that do not actually fit in RAM, using mmap(2) to map the files containing this data into virtual memory, and letting the virtual memory subsystem handle the RAM as a cache. Currently, it is problematic on iOS to map files more than a few hundred megabytes in size, because the program address space is limited to 4 GB (and what is left is not necessarily in a big contiguous chunk you can map a single file in).
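To make the mmap(2) approach concrete, here is a minimal sketch in C of mapping a whole file read-only. The helper name map_file_readonly is made up for this illustration, and error handling is kept to the bare minimum. In a 32-bit process, the mmap call is the one I would expect to fail for a large file when no contiguous range of addresses is left.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not an iOS API): maps a whole file read-only.
 * Returns the base address of the mapping, or NULL on failure.
 * On success the file size in bytes is stored in *out_len. */
const unsigned char *map_file_readonly(const char *path, size_t *out_len)
{
    assert(path != NULL && out_len != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return NULL;
    }

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    /* The kernel must find st.st_size bytes of contiguous *virtual*
     * address space here; no physical RAM is committed up front. */
    void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (base == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }

    *out_len = (size_t)st.st_size;
    return base;
}
```

The returned pointer can then be read like an ordinary array, with pages being faulted in from the file on demand; the mapping is released with munmap(2) once it is no longer needed.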

That being said, in my opinion the usefulness of being able to map gigabyte-scale files on 64-bit iOS devices will currently be limited to a few niche applications. While these files won’t necessarily have to fit in RAM, they will have to fit on the device’s Flash storage in the first place; the largest capacity you can get on an iOS device at the time of this writing is 128 GB, with most people settling for less, so you’d better not have too many such applications installed at once. Still, for those applications which need it, likely in vertical markets mostly, the larger address space alone makes 64-bit ARM a godsend.

One benefit sometimes cited for Apple already pushing ordinary applications (those which don’t map big files) to adopt 64-bit ARM is that, the day Apple releases a device featuring more than 4 GB of RAM, these applications will benefit without needing to be updated. However, I don’t buy it. It is in the best interest of iOS applications to not spontaneously occupy too much memory, for instance so that they do not get killed first when they are in the background, so in order to take advantage of an iOS device with more RAM than you can shake a stick at, they would have to change behavior and be updated anyway, so…

Verdict: inconclusive for most apps, a godsend for some niche apps.

Second Candidate: the ARMv8 AArch64 A64 ARM64 Instruction Set

The ability of doing 64-bit processing on ARM comes as part of a new instruction set, called…

  • Well, it’s not called ARMv8: not only does ARMv8 also bring improvements to the ARM and Thumb instruction sets (more on that later), which for the occasion have been renamed to A32 and T32, respectively; but ARMv9, whenever it comes out, will also feature 64-bit processing, so we can’t refer to 64-bit ARM as ARMv8.
  • AArch64 is the name of the execution mode of the processor where you can perform 64-bit processing, and it’s a mouthful so I won’t be using the term.
  • A64 is the name ARM Holdings gives to the new instruction set, a peer to A32 and T32; but this term does not make it clear we’re talking about ARM so I won’t be using it.
  • Apple in Xcode uses ARM64 to designate the new instruction set, so that’s the name we will be using.

The instruction set is quite a change from ARM/A32; for one, the instruction encodings are completely different, and a number of features, most notably pervasive conditional execution, have been dropped. On the other hand, you get access to 31 general purpose registers that are of course 64-bit wide (as opposed to the 14 general purpose registers which you get from ARM/A32, once you remove the PC and SP), some instructions can do more, you get access to much more 64-bit math, and what does remain supported has the same semantics as in ARM/A32 (for instance the classic NZCV flags are still here and behave the exact same way), so it’s not completely unfamiliar territory.

ARM64 could be the subject of an entire blog post, so let us stick to some highlights:

CRC32

Ah, let’s get that one out of the way first. You might have noticed this little guy in the ARMv8-A Reference Manual (page 449). Bad news, folks: support for this instruction is optional, and the iPhone 5S processor does not in fact support it (trust me, I tried). Maybe next time.

More Registers

ARM64 features 31 general purpose registers, as opposed to 14, which helps the compiler avoid running out of registers and having to “spill” variables to memory, which can be a costly operation in the middle of a loop. If you remember, the expansion of 8 to 16 registers for the x86 64-bit transition on the Mac back in the day measurably improved performance; however here the impact will be less, as 14 was already enough for most tasks, and ARM already had a register-based parameter passing convention in 32-bit mode. Your mileage will vary.

64-bit Math

While you could do some 64-bit operations in previous versions of the ARM architecture, this was limited and slow; in ARM64, 64-bit math is natively and efficiently supported, so programs using 64-bit math will definitely benefit from ARM64. But which programs are those? After all, most of the amounts a program ever needs to track do not go over a billion, and therefore fit in a 32-bit integer.
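To give an idea of what “limited and slow” means, here is a sketch in C of a 64-bit multiplication assembled from 32-bit pieces, next to the native version. The function names are mine, and the decomposition only illustrates the kind of work a compiler has to arrange when the registers are 32 bits wide (I would expect it to use a widening multiply plus a couple of multiply-accumulates on ARMv7, though I have not checked the generated code); on ARM64 the native version should boil down to a single multiply instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Low 64 bits of a * b, built only from 32-bit halves.
 * Illustrative decomposition, not actual compiler output. */
uint64_t mul64_from_32bit_parts(uint64_t a, uint64_t b)
{
    uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);
    uint32_t bl = (uint32_t)b, bh = (uint32_t)(b >> 32);

    /* 32 x 32 -> 64 widening multiply of the low halves. */
    uint64_t low = (uint64_t)al * bl;

    /* Cross terms land in the upper word, so only their low 32 bits
     * survive; ah * bh would be shifted out of the result entirely. */
    uint32_t cross = ah * bl + al * bh;

    return low + ((uint64_t)cross << 32);
}

/* What you simply write when 64-bit math is native. */
uint64_t mul64_native(uint64_t a, uint64_t b)
{
    return a * b;
}

/* Sanity check: both routes must agree for any inputs. */
void check_mul64(uint64_t a, uint64_t b)
{
    assert(mul64_from_32bit_parts(a, b) == mul64_native(a, b));
}
```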

Besides some specialized tasks, two notable examples of tasks that do make use of 64-bit math are MP3 processing and most cryptography calculations. So those tasks will benefit from ARM64 on the iPhone 5S.

But on the other hand, all iPhones have always had dedicated MP3 processing hardware (which is relatively straightforward to use with Audio Queues and CoreAudio), which in particular is more power efficient to use than the main processor for this task, and ARMv8 also introduces dedicated AES and SHA accelerator instructions, which are theoretically available from ARM/A32 mode, so ARM64 was not necessary to improve those either.

But on the other other hand, there are other tasks that use 64-bit math and do not have dedicated accelerators. Moreover, standards evolve. Some will be phased out and others appear, and a perfect example is the recently announced SHA-3 standard, based on Keccak. Such new standards generally take time to make their way as dedicated accelerators, and obviously such accelerators cannot be introduced to devices released before the standardization. But software has no such limitations, and it just so happens that Keccak benefits from a 64-bit processor, for instance. 64-bit math matters for emerging and future standards which could benefit from it, even if they are specialized enough to warrant their own dedicated hardware, as software will always be necessary to deploy them after the fact.
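To see why Keccak likes a 64-bit processor, consider that its state is 25 “lanes” of 64 bits each, and that its steps are made of XORs and rotations of whole lanes. Here is a sketch in C of just one of those steps, theta, written from my reading of the Keccak specification; this is only a fragment for illustration, not a usable hash function. In scalar code on a 32-bit processor, every 64-bit XOR or rotation below has to be split across a pair of registers.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit lane left by n bits, 0 < n < 64. */
static uint64_t rotl64(uint64_t v, unsigned n)
{
    assert(n > 0 && n < 64);
    return (v << n) | (v >> (64 - n));
}

/* Theta step of Keccak-f[1600] (one step only, not the full permutation).
 * The lane at coordinates (x, y) is stored at a[x + 5 * y]. */
void keccak_theta(uint64_t a[25])
{
    uint64_t c[5], d[5];

    /* Parity of each of the five columns. */
    for (int x = 0; x < 5; x++)
        c[x] = a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20];

    /* Combine the two neighbouring column parities, one of them rotated. */
    for (int x = 0; x < 5; x++)
        d[x] = c[(x + 4) % 5] ^ rotl64(c[(x + 1) % 5], 1);

    /* Fold the result back into every lane of the matching column. */
    for (int i = 0; i < 25; i++)
        a[i] ^= d[i % 5];
}
```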

NEON Improvements

ARMv8 also brings improvements to NEON, and while some of the new instructions are also theoretically available in 32-bit mode, such as VRINT, surprisingly some improvements are exclusive to ARM64, for instance the ability to operate on double-precision floating point data, and interesting new instructions to accumulate unsigned values to an accumulator which will saturate as a signed amount, and conversely (contrast with x86, where all vector extensions so far, including future ones like AVX-512, are equally available in 32-bit and 64-bit mode even though for 32-bit mode this requires incredible contortions, given how saturated the 32-bit x86 instruction encoding map is). Moreover, in ARM64 the number of 128-bit vector registers increases from 16 to 32, which is much more useful than the similar increase in the number of general-purpose registers, as SIMD calculations typically involve many vectors. I will talk about this some more in a future update to my NEON post.

Pointer size doubling

It has to be mentioned: a tradeoff of ARM64 is that pointers are twice as large, taking more space in memory and the caches. Still, iOS programs are more media-heavy than pointer-heavy, so it shouldn’t be too bad (just make sure to monitor the effect when you start building for ARM64).
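If you want a feel for the effect on a pointer-heavy structure, a sketch like the following (in C, with type names invented for the example) is enough. I would expect the doubly-linked node to take 12 bytes when compiled for 32-bit ARM and 24 bytes for ARM64 once alignment padding is counted, while the index-based variant stays at 12 bytes either way; these figures are assumptions to check with sizeof on your own build, not measurements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pointer-heavy: two links per node, so its size follows the pointer size. */
struct node {
    struct node *prev;
    struct node *next;
    int32_t value;
};

/* Same information, but the links are 32-bit indices into an array of
 * nodes, so the size does not depend on the pointer size. */
struct compact_node {
    uint32_t prev;
    uint32_t next;
    int32_t value;
};

/* Memory taken by `count` pointer-based nodes on the current target. */
size_t node_array_bytes(size_t count)
{
    assert(count <= SIZE_MAX / sizeof(struct node));
    return count * sizeof(struct node);
}

/* Memory taken by `count` index-based nodes on the current target. */
size_t compact_node_array_bytes(size_t count)
{
    assert(count <= SIZE_MAX / sizeof(struct compact_node));
    return count * sizeof(struct compact_node);
}
```

Comparing node_array_bytes(n) between the ARMv7 and ARM64 builds of the same program gives a first idea of what the transition costs for a given data structure.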

Verdict: a nice win on NEON, native 64-bit math is a plus for some specialized tasks and the future, other factors are inconclusive: in my limited testing I have not observed performance changes from just switching (non-NEON, non-64-bit math) code from ARMv7 compilation to ARM64 compilation and running it on the same hardware (iPhone 5S).

Non-Candidate: Unrelated Processor Implementation Improvements

Speaking of which. Among the many dubious “evaluations” on the web of the iPhone 5S 64-bit feature, I saw some try to isolate the effect of ARM64 by comparing the run of an iOS App Store benchmark app (that had been updated to have an ARM64 slice) on an iPhone 5S to a run of that same app… on an iPhone 5. Facepalm. As if the processor and SoC designers at Apple had not been able to work on anything other than implementing ARMv8 in the past year. As a result, what was reported as being ARM64 improvements were in fact mostly the effect of unrelated improvements such as better caches, faster memory, improved microarchitecture, etc. Honestly, I’ve been running micro benchmarks of my devising on my iPhone 5S, and as far as I can tell the “Cyclone” processor core of the A7 is smoking hot (so to speak), including when running 32-bit ARMv7 code, so completely independently of ARMv8 and ARM64. The Swift core of the A6 was already impressive for a first public release, but here Cyclone knocks my socks off, my hat is off to the Apple semiconductor architecture guys.

Third Candidate: Opportunistic Apple Changes

Mike Ash talked about those, and I have not attempted to measure their effect, so I will defer to him. I will just comment that to an extent, these improvements can also be seen as Apple being stuck with inefficiencies they cannot get rid of for application compatibility reasons on ARM/A32, and they found a solution to not have these inefficiencies in the first place, but only for ARM64 (and therefore, ARM64 is better ;). I mean, we in the Apple development community are quick to point and laugh at Windows being saddled with a huge number of application compatibility hacks and a culture of fear of breaking anything that caused this OS to become prematurely fossilized (and I mean, I’m guilty as charged too), and I think it’s only fair we don’t blindly give Apple a pass on these things.

So, remember the non-fragile Objective-C ABI? The iPhone has always had it (though we did not necessarily realize as the Simulator did not have it at first), so why can’t Apple use it to add an inline retain count to NSObject in 32-bit ARM? I’m willing to bet that for such a fundamental Cocoa object, indirect effects start playing a role whenever attempting to make even such an ostensibly simple change; I’m sure for instance that some shipping iOS apps allocate an enormous number of small objects, and would therefore run out of memory if Apple added 4 bytes of inline retain count to each NSObject, and therefore to each such object. Mind you, the non-fragile ABI has likely been useful elsewhere on derivable Apple classes that are under less app compatibility pressure, but it was not enough to solve the inline retain count problem by itself.

Verdict: certainly a win, but should we give credit to ARM64 itself?

Fourth Candidate: the Floating-Point ABI

This one is a fun one. It could also be considered an inefficiency Apple is stuck with on 32-bit ARM, but since ARM touts that a hard float ABI is a feature of ARM64, I’m willing to cut Apple some slack here.

When the iPhone SDK was released, all iOS devices had an ARM11 processor which supported floating-point in hardware; however, Apple allowed and even set by default Thumb to be used for iOS apps, and Thumb on the ARM11 could not access the floating-point hardware, not even to put a floating-point value in a floating-point register. And in order to allow Thumb code to call the APIs, the APIs had to take their floating-point parameters from the general-purpose registers, and return their floating-point result to a general-purpose register, and in fact all function calls, including between ARM functions, had to behave that way, because they could always potentially be called by Thumb code: this is called the soft-float ABI. And when with the iPhone 3GS Thumb gained the ability to use floating-point hardware, it had to remain compatible with existing code, and therefore had to forward floating-point parameters in general-purpose registers. Today on 32-bit ARM parameters are still passed that way for compatibility with the original usages.

This can represent a performance penalty, as transferring from the floating-point register file to the general-purpose register file is often expensive (and sometimes the converse too). It is often small in comparison to the time needed for the called function to execute, but not always. ARM64 does not allow such a soft-float ABI, and I wanted to see if I could make the overhead visible, and then if switching to ARM64 would eliminate the overhead.

I created a small function that adds successively 1.0f, 2.0f, 3.0f, etc. up to (1<<20)*1.0f to a single-precision floating-point accumulator that starts at 0.0, and another function which does the same thing except it calls another function to perform the addition, through a function pointer to decrease the risk of it being inlined. Then I compiled the code to ARMv7, ran it on the iPhone 5S, and measured and compared the time taken by each function; then the same process, except the code was also compiled for ARM64. Here are the results:

                            ARMv7        ARM64
Inlined addition            6.103 ms     5.885 ms
Addition by function call   12.999 ms    6.920 ms

Yup, we have managed to make the effect observable and evaluate the penalty, and when trying on ARM64 the penalty is decimated; there is some overhead left, probably the overhead of the function call itself, which would be negligible in a real situation.

Of course, this was a contrived example designed to isolate the effect, where the least possible work is done for each forced function call, so the effect won’t be so obvious in real code, but it’s worth trying to build for ARM64 and see if improvements can be seen in code that passes around floating-point values.

Verdict: A win, possibly a nice win, in the right situations.

Non-Candidate: Apple-Enforced Limitations

Ah, yes, I have to mention that before I go. Remember when I mentioned that some ARMv8 features were theoretically available in 32-bit mode? That’s because Apple won’t let you use them in practice: you cannot create a program slice for 32-bit ARM that will only run on Cyclone and later (with the other devices using another slice). It is simply impossible (and I have even tried dirty tricks to do so, to no avail). If you want to take advantage of ARMv8 features like the AES and SHA accelerator instructions, you have to port your code to ARM64. Period. So Sayeth Apple.

That means, for instance, that Anand’s comparison, on the same hardware, of two versions of Geekbench (the one just before and the one just after the addition of ARM64 support), while clever and the best he could do, is not really a fair comparison of ARM64v8 and ARM32v8, but in fact a comparison between ARM64v8 and ARMv7. When you remove the impressive AES and SHA1 advantages from the comparison table, you end up with something that may be fair, though it’s still hard to know for sure.

So these Apple-enforced limitations end up making us conflate the difference between ARM32v8 to ARM64v8 with the jump from ARMv7 to ARM64v8. Now mind you, Apple (and ARM, to an extent) gets full credit for both of them, but it is important to realize what we are talking about.

Plus, just a few paragraphs earlier I was chastising Apple for their constraining legacy, and what they did here, by not only shipping a full 64-bit ARMv8 processor, but also immediately a full 64-bit OS, is say: No. Stop trying to split hairs and try and use these new processor features while staying in 32-bit. Go ARM64, or bust. You have no excuse, the 64-bit environment was available at the same time as the new processor. And that way, it’s one less “ARM32v8” slice type in the wild, so one less legacy to worry about.

Conclusion

Well… No. I’m not going to conclude one way or the other: neither that the 64-bit aspect of the iPhone 5S is a marketing gimmick, nor that it is everything Apple implied it would be. I won’t enter this game. Because what I do see here is the result of awesome work from the processor side, the OS side, and the toolchain side at Apple to seamlessly get us a full 64-bit ARM environment (with 32-bit compatibility) all at once, without us having to second-guess the next increment. The shorter the transition, the better off we’ll all be, and Apple couldn’t do shorter than that. For comparison, on the Mac the first 64-bit machine, the Power Mac G5, shipped in 2003; and while Leopard ostensibly added support for 64-bit graphical applications, iTunes actually ended up requiring Lion, released in 2011, to run as a 64-bit process. So overall on the Mac the same transition took 8 years. ARM64 is the future, and the iPhone 5S + iOS 7 is a clear investment in that future. Plus, with the iPhone 5S I was able to tinker on ARMv8 way ahead of when I thought I would be able to do so, so I must thank Apple for that.