Don’t underestimate what just happened: Epic remotely enabling, in Fortnite on mobile, a way around the platforms’ in-app purchase systems is an opening of hostilities on par with the bombing of Pearl Harbor. Or the storming of the Bastille by the people of Paris, depending on your point of view. This isn’t just another Hey saga: the relationship between Epic and Apple (and Epic and Google) can never be the same after this.

What’s at stake? Just the whole principle of tight control of mobile platforms by their stewards, that’s pretty much it. Indeed, you can’t keep the conversation on the “dirty percent” without then reviewing the agency model and the exclusivity (or overwhelming dominance, in the case of Android) of the platform’s application store, which brings app review into question, which leads us to Apple’s meddling and misguidance and pushing around and self-dealing and feeding us garbage and capriciousness (too many incidents to list) and inconsistency and preferential treatment (again, too many) and dual personality and overpowering control and petrification of developers (worth blowing one NYT free read).

Any single one of those is a fundamental injustice, in the sense that each of them represents a damned-if-you-do (Apple could catch you), damned-if-you-don’t (you have to take user criticism/you get punished by the market for being non-competitive/etc.) choice for the developer. And yet, with few exceptions (coverage by Charles Arthur at The Guardian comes to mind), this never reaches the mainstream. But Epic has accomplished that: the news has spread everywhere by now, leading to unprecedented scrutiny of Apple’s actions, and it’s not about to stop. For instance, we’ve always had to assume the best possible intentions from Apple whenever they introduced some restriction, because conspiratorial thinking leads nowhere; but, as Steve Troughton-Smith points out, what we could learn through the lawsuit about their reasoning when introducing these restrictions could be very damaging. Yes, it’s a stunt. Of course it’s a stunt. Of course it’s self-serving. But many significant actions were initially dismissed as stunts; what matters is whether it brings us closer to justice.

I think Epic’s message is rather awkward and preaches too much to the choir, and that their targeting the Play Store as well (which you could always circumvent through Android side-loading) muddies the message. But not only did it reach the mainstream, as mentioned, Epic is also making an important point that, through sheer saturation of the airwaves, has a good chance of reaching the rank-and-file at Apple (and Google): didn’t you become that which you fought? Such targeting is something I feel has been missing from the tech conversation. That’s the second thing they’re doing right.

I also hear that they are a big faceless company, and I myself wouldn’t trust them any farther than I can throw them, but I can’t answer this argument better than mcc did: given the sheer size of Apple and Google, how could an indie possibly hope to meaningfully take them on? Still, that leaves the question of what they will actually do if they win… but, as it turns out, that doesn’t matter. It’s not a succession war with a zero-sum outcome that merely determines who will lord over us: Epic can’t dethrone Apple as the platform steward. The best they can hope for is having their own store as a peer to the Apple-managed iOS App Store, and if that happens, guess what? Valve will be able to do so as well, and will instantly bring Steam to iOS. Activision Blizzard will probably want in on the action as well, etc. And now you have healthy competition between app stores on the platform, such that none of them has veto power over any developer. That is the third and most important reason why their campaign deserves to be supported.

Look, it can simultaneously be true that Epic is full of shit when they “pass on” the savings for cosmetic add-ons for which they have full pricing power and zero fixed cost, and that they are right to take on this Apple policy, because that policy is abusive in the many, many cases where there are fixed costs and pricing power is more limited, so it is abusive in general. And while I think that Epic has little chance of winning their lawsuit under current U.S. law (or elsewhere, should they try in France or the E.U.), I also think they should eventually win this fight. So, yeah, #FreeFortnite.

Dear Mr. Cédric O

This post is also available in French

Dear Mr. Cédric O,

Small aside for my regular readers: Cédric O is in charge of digital matters in the French government. Of France. No, I am not reducing his name to its initial, this is indeed his name. So while I’ve got your attention, allow me to state that when requested to add name validation that mandates a minimum number of characters, the only ethical solution for a software developer is to refuse to implement that. Back to the matter at hand.

First, thank you for your recent public statements (of which iMore provides good English-language coverage, in my opinion), which give a theme to this post I had been intending to write: that of the open letter.

Indeed, I want to react to the direction you want to steer the StopCovid project towards, which is that of a direct standoff with Apple. Now, I will readily admit to a bias in favor of Apple, as the coverage of this blog shows: a lot of it centers on Apple technologies, with a focus on iOS. But I hope to convince you anyway that entering a public standoff with Apple would be futile, even if you wished to see Apple pilloried.


It was basically an excellent idea to start and develop the StopCovid project, don’t get me wrong, but I think it’s useful to go over exactly why. Indeed, some observers (some of whom I have a lot of respect for, such as Bruce Schneier) have dismissed this kind of computer-based contact tracing infrastructure as tech solutionism, and these accusations deserve closer scrutiny. I don’t feel the need to go over the principle and usefulness of contact tracing, not only to warn those a newly diagnosed individual could have infected without his knowledge, but also to attempt to figure out who infected the newly diagnosed individual, if not known already, and thereby potentially walk back the contamination chain.

However, I do feel it is useful to restate the main use case, which is containing reemerging outbreaks, be it upon the easing of lockdown or imported from endemic areas. As a result, we can assume we are dealing with a breach that needs plugging, which means, for example, that resources that see little use during the observation phase, such as testing asymptomatic people, can suddenly be deployed on a large scale as soon as a case is confirmed. One of the aims of the lockdown (along with easing the pressure on hospital care) is precisely to get back to a situation where plugging the breaches is feasible anew, as opposed to a situation where there are more breaches than fingers available to plug them.

But as we saw when looking for the index case in the Oise (where I’m told it is still being searched for; more than two months after the fact, the expected benefit seems rather thin to me), current techniques appear to be outmatched in the case of Covid-19. And research coming from Oxford University confirms that assessment, as it suggests the inferred epidemiological characteristics of Covid-19 squash any hope of efficient contact tracing by traditional techniques, validating the need for an automated contact recording solution as part of a more general effort.

That qualification is useful to make, as no application will be a panacea, but rather a link in a chain where all elements must be present. Such as widely available testing (as France appears ready to provide as May begins) that also has quick turnaround: if the swab result does not come back until three days later, then we reach the contacts, and their results don’t come back until three days after that, it is obvious that even if the application allowed for instant contact tracing, the disease would nevertheless outpace us. As a result, a waiting list for testing must be avoided at all costs (PCR itself taking only a few hours), and we must ensure the swabs can be quickly taken to PCR sites. And no application can ever replace actual tracing teams, if only to trace the contacts it does not cover, for instance those where one of the two people is not equipped with a compatible mobile phone, or any mobile phone at all. That is why it makes sense to report tracing capacity at the level of the département, rather than at the regional level.

How: the physical layer

All that having been said, we can now start going over the different methods for automatically establishing that a contact has occurred. Geolocation can be dismissed outright: it is simultaneously not accurate enough for the need (as GPS is not available everywhere or instantly, smartphones often fall back to cell tower or Wi-Fi hotspot geolocation) and too invasive, for obvious reasons. Bluetooth, on the other hand, is designed to allow direct transmission between digital devices, including peripherals, and its Low Energy variant has even been designed to support very constrained environments, such as being integrated in a key fob so it can be found by proximity detection. This notion of proximity is essential: for determining potential contamination, we don’t so much want to compute an exact distance as to narrow testing down to a list of potential contacts, erring on the side of caution, rather than having to test everyone within a 100-mile radius of a confirmed case.

How: Data processing

OK, so we will retrospectively determine a contact by the fact the mobile phones of the parties in question have been able to exchange (or “exchange strongly enough” by measure of signal intensity among other means) data through Bluetooth. But how should this data be kept (and which data), and where should that computation be performed when the time comes?

Any mobile phone already broadcasts Bluetooth messages to any interested party for various reasons (that is what allows it to appear in the list of reachable devices in the device selection interface of a personal computer, for instance). So a first idea would be to set up passive listening for broadcast Bluetooth messages and locally store the Bluetooth identifier of the emitter. But that quickly runs into some issues: for privacy reasons, as it happens, that identifier rotates at regular intervals in implementations of recent versions of the standard, so that once forgotten by its own emitter, it becomes useless; furthermore, many mobile phones will throttle if not stop broadcasting when they are not in active use so as to save battery life, which has never ceased to be an important constraint on mobile phone design.

So it seems necessary to introduce a change of behavior on both sides of the contact, which shifts the problem space: now both sides have to be modified for either side to benefit from the system. As a result, it’s kind of like masks: if the source of the contamination did not previously install the application, the person they contaminated will get no benefit from having installed it, so reaching a sufficient density of participants is paramount. That could lead us to consider providing smart watches (which are harder to forget at home) to those who, as is their right, do not own a compatible mobile phone.

Now that we can freely design the software on both sides of the interaction, the design space is greatly expanded, too much to explore here. However, one significant choice concerns the processing that determines whether a contact previously occurred with a newly diagnosed individual, processing which therefore needs access to all the information necessary for that purpose: should it occur on the mobile phone (either one: the two now being “hermaphroditic”, whatever one can determine, the other can symmetrically determine as well), or on a central server?

In the second case, since the aim is to warn the bearer of the phone involved in the contact, that would by all appearances entail that the server has the means to contact any user of the system (directly or indirectly: if every user must regularly query the server by asking “I am the user with pseudonym x, is there anything new for me? Send the answer to IP address”, that is equivalent to being reachable).
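To make the reachability argument concrete, here is a minimal Python sketch of the centralized variant. It is neither ROBERT nor any real server: the class, its methods and the pseudonyms are all hypothetical, and the sketch only models the point that the server decides who is at risk and must then answer each user individually.

```python
from collections import defaultdict


class CentralServer:
    """Hypothetical centralized matcher: the server, not the phone, decides who is at risk."""

    def __init__(self):
        # pseudonym -> number of diagnosed users who reported having met it
        self.exposures = defaultdict(int)

    def report_diagnosis(self, contact_pseudonyms):
        """A diagnosed user uploads the pseudonyms their phone recorded nearby."""
        for pseudonym in set(contact_pseudonyms):
            self.exposures[pseudonym] += 1

    def poll(self, pseudonym):
        """'I am the user with pseudonym x, is there anything new for me?'

        Replying means routing an answer back to whoever asked, which is
        equivalent to the server being able to reach that user.
        """
        return self.exposures.get(pseudonym, 0) > 0


server = CentralServer()
server.report_diagnosis(["p-alice", "p-bob", "p-alice"])
print(server.poll("p-alice"), server.poll("p-carol"))  # True False
```

Whatever the pseudonymization scheme, the shape stays the same: the risk computation and the knowledge of who is at risk both live on the server.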

In the first case, however, it is impossible to prevent the user from figuring out on which occasion he was in contact with a newly diagnosed person (though this still falls short of directly revealing that person’s identity): even if the processing that determines contact were a hard-to-reverse cryptographic function, it would suffice for him to run that function on subsets of his contact database, by bisection if need be, until he finds the single event sufficient to make the function return a positive.
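The bisection mentioned above takes only a handful of queries. Here is a minimal Python sketch, assuming the hard-to-reverse check is exposed as an opaque function (the hypothetical `opaque_check` below) that the user can call on any subset of his own records, and that it returns a positive whenever the subset contains a triggering event.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def isolate_contact(events: Sequence[T], is_exposed: Callable[[Sequence[T]], bool]) -> T:
    """Find one event that makes the opaque check positive, by bisection.

    Assumes the check is positive for a subset whenever that subset
    contains a triggering event, and negative otherwise.
    """
    if not is_exposed(events):
        raise ValueError("the full contact database does not trigger the check")
    lo, hi = 0, len(events)  # invariant: events[lo:hi] contains a triggering event
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_exposed(events[lo:mid]):
            hi = mid  # the trigger is in the left half
        else:
            lo = mid  # otherwise it must be in the right half
    return events[lo]


# Hypothetical example: 1,000 recorded encounters, one of them with a diagnosed person.
encounters = [f"encounter-{i}" for i in range(1000)]
calls = []


def opaque_check(subset):
    calls.append(len(subset))  # count how many times the check is queried
    return "encounter-737" in subset


print(isolate_contact(encounters, opaque_check), len(calls))
```

With a thousand recorded encounters, the loop needs at most ten calls beyond the initial one, which is why hiding the matching function behind cryptography does not hide the moment of contact.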

The Apple and Google system

That having been established, let us look at the choices made by Apple and Google, as well as the API they provide, on which applications will be built.

To begin with, “Exposure Notification” is a service as far as Bluetooth Low Energy is concerned, that is to say a protocol relying on BLE for over-the-air data exchange, just as HTTP is a protocol relying on TCP for data transmission over the Internet (usually). It is documented here as of this writing; as such, the consortium managing Bluetooth has assigned a protocol identifier specifically for its use. The format as it appears on the wire (so to speak) is simple: beyond the mandatory Bluetooth framing, it’s mostly the rotating proximity identifier, but it comes with a metadata field whose only purpose so far (beyond versioning the protocol, which allows confirming implementation compatibility) is to improve signal attenuation computations.
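As I understand it, the point of that metadata is that the emitter advertises its transmit power, so the receiver can estimate how much signal was lost over the air rather than relying on the raw received strength alone, since transmit power differs between phone models. A minimal Python sketch of that arithmetic follows; the threshold and minimum sample count are made-up values, not anything Apple or Google specified.

```python
def attenuation_db(reported_tx_power_dbm, rssi_dbm):
    """Signal lost over the air: advertised transmit power minus received strength."""
    return reported_tx_power_dbm - rssi_dbm


def is_close_contact(samples, max_attenuation_db=55, min_samples=3):
    """Flag a contact if enough scans show low attenuation (i.e. likely proximity).

    `samples` holds (reported_tx_power_dbm, rssi_dbm) pairs from successive scans.
    The default thresholds are hypothetical; a real deployment would tune them,
    erring on the side of caution.
    """
    close = [s for s in samples if attenuation_db(*s) <= max_attenuation_db]
    return len(close) >= min_samples


scans = [(-10, -60), (-10, -62), (-10, -64), (-10, -90)]
print([attenuation_db(*s) for s in scans])  # [50, 52, 54, 80]
print(is_close_contact(scans))              # True
```

Lower attenuation means the phones were likely closer; the decision is deliberately coarse, in line with the goal of building a list of potential contacts rather than measuring an exact distance.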

As the name suggests, the rotating proximity identifier rotates more or less regularly: more or less, because if rotation were too regular, that would make it easier to trace people and render these changes useless, so rotation occurs at intervals of between roughly ten and twenty minutes. All this is properly detailed in the crypto document, which describes how the data sent by the protocol mentioned above is generated, how data sent and received is stored, and in particular how to determine that a potentially contaminating contact has occurred.

The most important property of the “Exposure Notification” system is that this determination is performed locally. The system assumes the presence of at least one server, but the latter only broadcasts non-nominative data collected (with permission) from diagnosed individuals so as to enable the recipient to make this determination: nothing is uploaded for users that have not been positively diagnosed yet. Even the data that does get uploaded reveals little, to the extent that it amounts to revealing the randomly generated data that was used for sending the rotating proximity identifiers, without any personal data, least of all identifying data, being involved.
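To illustrate why nothing needs to be uploaded for undiagnosed users, here is a minimal Python sketch of the idea: each phone draws a random daily key, derives its rotating identifiers from it, and matching happens locally by regenerating identifiers from the published keys. It is a simplified stand-in, not the published algorithm: it uses only HKDF (RFC 5869) and HMAC-SHA256 so it runs on the standard library, whereas the crypto document specifies its own derivation (which, as far as I can tell, uses AES-128 for the final step); the `demo-rpik` label, the 10-minute slot, and all function names are my assumptions.

```python
import hashlib
import hmac
from typing import Iterable, List, Set

SLOT_SECONDS = 10 * 60                          # assumption: one identifier per 10-minute slot
SLOTS_PER_DAY = 24 * 60 * 60 // SLOT_SECONDS    # 144


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 16) -> bytes:
    """HKDF (RFC 5869) with HMAC-SHA256 and an empty salt."""
    prk = hmac.new(b"\x00" * 32, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                      # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def identifiers_for_day(daily_key: bytes) -> List[bytes]:
    """All rotating identifiers a phone would broadcast on the day `daily_key` covers."""
    rpi_key = hkdf_sha256(daily_key, b"demo-rpik")  # hypothetical label
    return [
        # simplified stand-in for the spec's derivation: truncated HMAC of the slot number
        hmac.new(rpi_key, slot.to_bytes(4, "little"), hashlib.sha256).digest()[:16]
        for slot in range(SLOTS_PER_DAY)
    ]


def exposed(heard: Set[bytes], published_keys: Iterable[bytes]) -> bool:
    """Local check: regenerate identifiers from each published key, look for one we heard.

    Only diagnosed users' daily keys are ever downloaded; nothing about us is uploaded.
    """
    return any(rpi in heard for key in published_keys for rpi in identifiers_for_day(key))


alice_key = bytes(range(16))        # random in practice; fixed here for reproducibility
bob_key = bytes(range(16, 32))
heard_by_carol = {identifiers_for_day(alice_key)[42]}  # Carol was near Alice in slot 42
print(exposed(heard_by_carol, [bob_key]), exposed(heard_by_carol, [alice_key]))
```

Note what the published key reveals: only the random seed behind one day of identifiers, which is meaningless to anyone who did not hear one of them.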

The system credibly claims other properties: for instance, it would appear to preclude collecting information emitted by others, only to later broadcast it by passing it off as one’s own (innocents won’t be the only people positively diagnosed with Covid-19; you have to assume adversaries will be too, and as a result minimize their nuisance potential).

That being said, the system does not ensure by itself that only positively diagnosed individuals will be able to upload their alleged contamination information: I have a hard time seeing how it could provide any control in this area, therefore it relies on the health authorities for such a filter.

That is apparent in the third and last document, which describes the API an application (here for an Apple device running iOS) will have to use to interface with the service. The API manages in the background everything related to the Bluetooth service and data storage, but does not involve itself with network interactions, which are the responsibility of the application proper: parts of the API consist of explicitly taking in this data, or providing data to be sent; this is more explicit in the API documentation for a Google device running Android, which otherwise describes the same API, apart from the use of the Java language, as required by Android.

Aside from that, the API in general is unsurprising: it is modern Objective-C, with blocks used as callbacks when data processing completes, for example. It seems to provide good coverage of the usage scenarios of future applications: an ENManager class for interacting with the system as a whole, such as displaying the current status (allowed or not, mostly) and retrieving data to be uploaded in case of a positive test result, and an ENExposureDetectionSession to be used when checking whether the latest batch of uploaded data would trigger an exposure when combined with the internally stored data. The only surprise is Objective-C: we would have expected Swift instead, given how trendy the language is at Apple, but that does not affect the interface functionally, and it is even likely that it can be used directly from Swift.

The API reveals one final intriguing feature, which is the possibility of declaring your risk level to the system as soon as any contamination is suspected, even before a formal positive test result, so as to recursively warn contacts, with a lower intensity, of course. That will nevertheless have to go through the health authorities, so it remains to be seen what they will use this for.

The Apple and Google system: the unsaid

As we see, while the documentation mentions the role played by the server, it remains silent as to how many there will be, how it or they will be managed, and by whom. Multiplying the servers would be unfortunate to the extent it would force the installation of as many apps as there would be applicable servers in order to be properly covered (as much to receive the information broadcast by each server, as to be able to send to each of these servers should it become necessary); that could be acceptable (though we would rather do without) in specific cases, such as that of commuters who cross a border, it is however completely unacceptable for the average user.

Both companies intend to severely restrict access to this API; quoting from their common document gathering answers to frequently asked questions:

10. How will apps get approval to use this system?

Apps will receive approval based on a specific set of criteria designed to ensure they are only administered in conjunction with public health authorities, meet our privacy requirements, and protect user data.

The criteria are detailed separately in agreements that developers enter into to use the API, and are organized around the principles of functionality and user privacy. There will be restrictions on the data that apps can collect when using the API, including not being able to request access to location services, and restrictions on how data can be used.

Apple in particular has the means to enforce these restrictions: an application in general not only cannot be distributed (beyond a handful of copies) without their permission, but it is certain that a sandbox entitlement (the sandbox being the environment in which an iOS app runs, restricting its access to the operating system to the necessary services, with some exceptions only granted with a signed waiver from Apple) will be necessary, with very few entities being able to claim such an entitlement (state actors, mostly); sorry for those who would like to play with it: com.apple.developer.exposure-notification will be the most restrictive entitlement ever available for the iPhone… It is a sure bet that Apple will not hesitate to invalidate an entitlement that has leaked or been abused.

Given the arbiter position that at least Apple holds, I therefore wonder about the lack of any rule, or even recommendation, on the multiplication of servers. I can conceive that neither Apple nor Google wants to expose themselves even more to charges that they are placing themselves above states, but a confirmation that at most one server would be allowed per country (defined as an entity with diplomatic representation or equivalent) would be desirable, while still allowing each country to have multiple apps if needed, as long as they are all based on the national server. I of course hope that the EU will set up a common server that all member states will adopt, and ideally you would only need one per continent, but it would seem difficult to designate, for each continent, the political entity that could be put in charge of it (as for a single global server, if that were viable politically as well as technically, Apple and Google would already have handled that).

Additionally, the documentation mentions a second phase where the system will be able to be activated out of the box through the operating system without requiring any app to be installed, with the suggestion to install one appearing once the system has determined a potentially contaminating contact; but if the system can determine that out of the box, that implies the ability to recover data from a server without any previously installed app, so which one is used?

And for that matter, there is no guidance on the role of health authorities in ensuring the reliability of the data their servers would broadcast. I recall an incident that was reported to me while I was working in the telecommunication industry: wireless networks used by police, firefighters, EMTs, etc. provide a “distress call” functionality these customers take seriously, as you can understand. In practice, it is a voice call, but flagged so as to trigger some specific processing (in relation to priority, in particular) and raise alarms at many levels. And when one such system was initially interconnected with the system covering a neighboring district, it did not go exactly as planned. Indeed, even though the interconnection protocol did specify how to transmit the distress status of the call, it left a lot of leeway in how to react to such a status; and as it happens, at least one of the systems considered that distress status to matter so much as to make it sticky for the channel in question, until explicitly cleared by a supervisor. Which by itself can make sense, but when the channel belonged to the other interconnected system, that one, having a different configuration, would never send such a clearance, so the channel would perpetually be considered in distress after that, even for ordinary calls. Pretty quickly, all channels in this situation ended up flagged as in distress without any way to clear them, and when all calls are distress calls, well, none are any more. They had to be supplied with an emergency patch.
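The failure mode is easier to see as two small state machines. Here is a hypothetical Python model of it, written from the description above rather than from any actual radio system: one system keeps the distress flag until it receives an explicit clear, while the neighboring system never sends one.

```python
class RadioSystem:
    """Hypothetical model of how one system treats the distress flag on incoming calls."""

    def __init__(self, sticky_distress, sends_clear):
        self.sticky_distress = sticky_distress  # keep the flag until an explicit clear arrives
        self.sends_clear = sends_clear          # whether this system ever emits that clear
        self.flagged = set()                    # channels currently considered in distress

    def receive_call(self, channel, distress):
        """Return whether this call is handled as a distress call."""
        if distress:
            self.flagged.add(channel)
        elif not self.sticky_distress:
            self.flagged.discard(channel)
        return channel in self.flagged

    def receive_clear(self, channel):
        self.flagged.discard(channel)


def replay(calls, receiver, owner):
    """Replay calls on channels belonging to `owner`, as handled by `receiver`."""
    handled = []
    for channel, distress in calls:
        handled.append(receiver.receive_call(channel, distress))
        if distress and owner.sends_clear:
            receiver.receive_clear(channel)  # the owner's supervisor clears it afterwards
    return handled


calls = [(1, False), (1, True), (1, False), (2, False), (2, True), (2, False), (1, False)]
sticky = RadioSystem(sticky_distress=True, sends_clear=True)
neighbor = RadioSystem(sticky_distress=False, sends_clear=False)
print(replay(calls, receiver=sticky, owner=neighbor))
# [False, True, True, False, True, True, True]: ordinary calls stay flagged forever
```

Each system is reasonable on its own; the bug only exists in the combination, because the protocol said how to transmit the flag but not what the receiver should do with it.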

So it would appear risky to invest resources (engineering, quality assurance, etc.) setting up a system without derisking the possibility that it be for naught if it ends up giving back too many false positives to remain usable. I can’t imagine doing without rules to ensure the reliability of information broadcast by the server or servers.

Finally, still on the geographical front, there is a risk (raised by Moxie Marlinspike) that the database to download could become too heavy (in particular if the epidemic flares back up) to be practical, such that it would become necessary to partition it according to location… thus reintroducing geolocation for this purpose and ruining part of the assurances offered. As with server multiplication, I think this is a matter on which Apple and Google should state their intentions.
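A back-of-the-envelope Python calculation shows how the download grows with the number of new diagnoses. Every figure here is an assumption for illustration: 16 bytes per daily key, a made-up 12 bytes of per-key metadata, 14 days of keys uploaded per diagnosed person, and the 144 ten-minute identifiers each key must be expanded into on the phone.

```python
KEY_BYTES = 16        # one random daily key
OVERHEAD_BYTES = 12   # hypothetical per-key metadata (validity date, risk level, ...)
DAYS_UPLOADED = 14    # assumption: keys covering two weeks of possible contagiousness
SLOTS_PER_DAY = 144   # ten-minute identifiers each key expands into on the phone


def daily_download_bytes(new_diagnoses_per_day):
    """Bytes every phone would download per day from one national server."""
    return new_diagnoses_per_day * DAYS_UPLOADED * (KEY_BYTES + OVERHEAD_BYTES)


def daily_derivations(new_diagnoses_per_day):
    """Identifiers every phone would have to regenerate per day to check for matches."""
    return new_diagnoses_per_day * DAYS_UPLOADED * SLOTS_PER_DAY


for n in (1_000, 50_000):
    print(n, daily_download_bytes(n), daily_derivations(n))
# 1000 392000 2016000
# 50000 19600000 100800000
```

Under those assumptions, 1,000 new diagnoses a day mean roughly 392 kB to download daily, while 50,000 mean about 19.6 MB and some 100 million identifier derivations per phone per day, which is the kind of pressure that would push towards partitioning by location.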

The standoff

StopCovid, as with many other projects, was started before the Apple/Google initiative was made public; the project has followed different principles, a process which I respect; the protocol, called ROBERT, is documented. The choice was notably made of an architecture where contamination determination is centralized, with the benefits and drawbacks this entails, I won’t go over them again.

As for the matter of server multiplication, we could already question the necessity of protocol multiplication: will the average user need to install one application per protocol? But that is not where the issue currently lies: since the ROBERT protocol relies on Bluetooth as expected, its implementation on iPhone runs into well-established restrictions by Apple on the use of Bluetooth by an app that is not in active use; these restrictions are documented by Apple, even if in vague terms, and I have no doubt the project was able to assess them first hand. They aim at preserving battery life and protecting privacy. They drastically reduce the viability of the solution.

Technologically, it was important to try, if only in order to be credible and not depend on a partner chosen for lack of any alternative. The StopCovid project, notably through its experiments on (approximate) distance measurement using Bluetooth, has already made advances that will be more generally useful, and as a project it could go forward within the framework of a more general protocol: adopting another protocol would not mean the end of the project.

Because let’s be clear: I can hear media mention systems that could be “made compatible”, but the only way I know of to make two systems communicate is to have them implement the same protocol. It can consist of two different implementations, but they must be of the same protocol.

For when confronted with these longstanding restrictions, you have chosen the path of the standoff with Apple: you express surprise that Apple would not lift these restrictions on demand, and you insist that the project be delivered in production according to its current principles, invoking a matter of technological sovereignty.

Such a standoff is futile and even harmful, for at least two reasons.

Futile for now

Beyond these restrictions, Apple has more generally asserted the principle that they control the software that can be distributed on the mobile devices showing their brand. As time went on this control sat less and less well with me, such that I have looked for alternatives, in particular through the web, to be able to distribute my personal projects. This is becoming viable for some applications, such as file processing, but when it comes to Bluetooth there is no alternative to the “native” SDK.

So even if I personally object to some of Apple’s policies, do you seriously believe that after 12 years of dictating their conditions as they saw fit for iPhone software, without exception, except those they defined themselves (there are a few more, less well documented), you were going to be able to just walk in and convince them or force their hand just like that? That is delusional.

It is all the more delusional as in other situations where they were largely more exposed (I am referring in particular to a decryption request from the FBI), Apple did not cave in to pressure, and they were proud of it and they still are. That is a matter among others of professional pride, as much in relation to the preservation of battery life as it is of privacy protection. Do you really think big words will be enough to make them change their minds?

If at least you were to invoke some argumentation, such as on the potential advantages for privacy of a centralized solution like ROBERT when compared to their solution, that could make them think twice about it. But instead, only denunciation of their financial health (insolent for sure, but what is the relationship with the matter at hand?) or invocation of technological sovereignty is brought.

You could get the upper hand through legal constraints, but it is certain that it will take time, a lot of time. So defending the technological sovereignty of France could make sense… in other circumstances. Because, I don’t know if you noticed, but France is in the middle of a pandemic right now. And France will only be able to eventually get rid of it through herd immunity; and I don’t know about you, but I’d rather acquire it through a vaccine, or failing that as late as possible. But by the time you’d have forced Apple’s hand through legal means, I think a vaccine will have been found.

Therefore, your insistence on technological sovereignty tells me that attempting to enforce it in the immediate situation is more important for you than having an efficient contact tracing solution, able to save lives. These priorities are backwards. Technological sovereignty matters, but there will be opportunities to enforce it later, or in other ways.

Maybe it’s unfair for Apple to be dictating their terms in such a way. Maybe it ought to be otherwise. But in the meantime, in the here and now, they hold the keys to the system.

Futile in the long run

Let us now assume Apple has given you the keys. Your troubles are not over, as the reasons for which they set up these restrictions in the first place have not gone away, especially that of battery preservation.

So what is to say your solution will not excessively drain the battery? In particular, you will have a hard time finding developers on the market skilled in reasonable usage of Bluetooth once unshackled from these restrictions: even those who are currently doing so on Android may be in for a few surprises on iPhone.

This is particularly important as some iPhones remain in active use for far longer than Android devices, and thus often with a degraded battery. I personally still haven’t replaced my 6-year-old iPhone 5S, which still works fine, except its battery life is no longer what it once was, and I don’t think I am alone. The matter isn’t merely that I will have to pay more attention to my battery level once StopCovid is installed; the matter is how the general public will react to an application that, once installed, causes the battery to drain more quickly, leading it to run out entirely if not carefully watched. A fraction at least will disable the application. And didn’t we establish earlier how important sufficient penetration was for the system to function?

Once again, the established situation matters, and the established situation is that Apple keeps maintaining many devices in the wild that others would consider obsolete (but which do feature Bluetooth Low Energy), and any shift in this equilibrium will be noticed. For instance, do you really want to expose yourself to accusations of trying to drive premature device renewal when the additional battery drain of StopCovid is noticed, thus potentially forcing some to renew hardware that was previously functional enough?

You could respond that Apple itself will encounter the same challenges and risk increasing battery drain when they implement their own system. That may be the case, but I would rather trust them in that domain than the developers of StopCovid, no offense meant.


I refuse the false dichotomy brought by some commenters, who would reduce the choice to the entity to which I would have to confide myself. With the right protocol, the right architecture, we can avoid having to trust any particular party more than we are comfortable with, and reconcile seemingly opposite requirements.

While Germany initially had its own project, it has recently announced joining the Apple and Google system, but that will not prevent it from proposing its own application based on that system. What is to prevent France from following the same path? We will all benefit by avoiding balkanization of solutions, and France here is not in a position of strength.

Dear Mr. Cédric O

This post is also available in English

Dear Mr. Cédric O,

A quick aside for my regular readers: Cédric O is the French government’s Secretary of State for Digital Affairs. Of France. No, I am not abbreviating his name to an initial; that really is his name. Which gives me the opportunity to say that if you are ever asked to implement a validation rule requiring a name of at least N characters, the only ethical option for a developer is to refuse to implement it. End of aside.

First, thank you for your recent statements, which give a theme to this post I had been meaning to write: the open letter.

I would like to respond to the direction you want the StopCovid project to take: confrontation with Apple. I readily admit a bias towards Apple, as the topics covered on this blog show, since they revolve a lot around Apple technologies, iOS in particular. But I still hope to convince you that going into confrontation with Apple would be futile, even if you held Apple in utter contempt.


Starting and developing the StopCovid project was an excellent idea to begin with, let’s be clear, but I think it is worth explaining why. Some observers (including some I greatly respect, such as Bruce Schneier) have denounced this kind of computerized contact tracing as technological solutionism, so that accusation deserves to be examined. There is no need to recall here the principle and importance of contact tracing: not only to warn those whom a newly diagnosed individual may have unknowingly infected, but also to try to find the individual who infected the newly diagnosed one when that person is unknown, and thus potentially trace chains of transmission back upstream.

However, it is worth restating the main use case, which is to contain resurgences of the epidemic, whether upon exiting lockdown or imported from endemic areas. We can therefore assume we are dealing with a breach that has to be plugged, which means, for example, that resources little used during the observation phase, such as testing asymptomatic people, can suddenly be mobilized on a large scale as soon as the first infection is diagnosed. One of the goals of lockdown (along with relieving the pressure on hospitals) is precisely to get back to a situation where plugging breaches becomes manageable again, instead of one where there are more breaches than fingers available to plug them.

But as we saw with the search for patient zero in the Oise (where apparently they are still looking; more than two months later, the usefulness seems limited to me), current methods appear to fall short in the case of Covid-19. This is confirmed by research from the University of Oxford, which suggests that the presumed epidemic characteristics of Covid-19 make it futile to hope for tracing by traditional methods, confirming the need for automated tracing as part of a broader system.

This clarification matters, because no app will be a panacea; rather, it will be one link in a chain whose every element must be present. Starting with massive testing (which France seems able to provide as of early May), but testing with a fast turnaround: if the result of a swab only comes back three days later, then the contacts are notified, and their own results only come back three days after that, it is obvious that even if the app identified contacts instantly, the disease would outpace us. Waiting lists for tests must therefore be avoided at all costs (the PCR itself only takes a few hours), and samples must be brought to PCR sites quickly. And an app is less than ever a substitute for teams of contact tracers, if only to trace the contacts it does not cover, for example when one of the two people does not have a compatible phone, or no phone at all. This is also why it is not absurd to report tracing capacity département by département, rather than region by region.

How: the physical layer

That being said, we can now consider the various ways to record contacts automatically. Let’s rule out geolocation right away: it is both not precise enough for the purpose (GPS is not available everywhere or instantly, so smartphones often fall back on cell tower or Wi-Fi positioning) and too invasive, for obvious reasons. Bluetooth, on the other hand, is designed for direct communication between digital devices, including peripherals, and its Low Energy variant was even designed to support very constrained applications such as a keychain tag, whose proximity can then be determined. This notion of proximity is essential: when looking for potential infections, the aim is less to determine the exact distance than to narrow testing down to a list of potential contacts, even if it contains a few too many, rather than having to test everyone within 100 km in case of a confirmed infection.

How: processing the data

OK, so a contact will be determined retrospectively by the fact that the phones of the people involved were able to exchange data over Bluetooth (or to exchange it “strongly enough”, as measured by signal strength among other things). But how should this data be stored (and which data), and where should the computation take place when the time comes?

Any mobile phone already broadcasts Bluetooth messages to anyone listening, for various reasons (this is what lets it appear in the list of reachable devices on a desktop computer’s pairing screen, for example). So a first idea is to listen for these broadcast messages and store the sender’s Bluetooth identifier locally. But this quickly shows its limits. Precisely for privacy reasons, this identifier changes regularly in implementations of recent versions of the standard, so once its own sender has forgotten it, it becomes useless. Moreover, many phones limit, if not stop, these broadcasts when they are not actively being used, in order to save battery, which has never stopped being a major constraint on mobile phone design.

So it seems essential to change the behavior of both sides of the contact, which changes the nature of the problem: both sides now have to be modified for even one side to benefit from the system. As with masks, if the person who infected someone did not have the app installed, the infected person gets no benefit from having installed it, so reaching a sufficient density of participants matters a great deal. This would lead one to consider providing connected watches (which are harder to forget at home) to those who, as is their right, do not have a compatible mobile phone.
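To make the density argument concrete, here is a back-of-the-envelope calculation under a simplifying assumption of mine: each person independently has the app with probability $p$ (in reality adoption is clustered by age, region and social circle, so this is only indicative). A contact is recorded only if both phones run the app:

```latex
P(\text{contact recorded}) = p \times p = p^{2},
\qquad p = 0.6 \;\Rightarrow\; p^{2} = 0.36,
\qquad p = 0.2 \;\Rightarrow\; p^{2} = 0.04
```

In other words, coverage of contacts degrades quadratically with adoption: even 60% adoption would capture only about a third of contacts.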

Now that we are free to design the software on both sides of the interaction, the design possibilities grow considerably, too many to go over here. One significant choice, however, is where to run the processing that determines whether contact with a newly diagnosed individual occurred, since that processing needs access to the data required for the determination: should it run on the phone (either one: since both now play both roles, whatever one can determine, the other can symmetrically determine as well), or on a central server?

In the second case, since the goal is to warn the owner of the phone that was in contact, this implies a priori that the server has a way to reach any user of the system, directly or indirectly. If each user has to query the server at regular intervals with “I am the user with pseudonym x, is there anything new for me? Send the answer to this IP address”, that is equivalent to being locatable.

In the first case, on the other hand, it is impossible to prevent users from determining on which occasion they were in contact with a newly diagnosed person (even if they are not told that person’s identity directly). Even if the determination were a cryptographic function that is hard to invert, they would only need to run it on subsets of their contact database, by bisection if needed, until they find the single event that is enough for the processing to report a potential infection.
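The bisection attack described above can be sketched as follows, assuming (as the paragraph does) that a single recorded event is enough to trigger the alert. `triggers` is a hypothetical stand-in for whatever opaque on-device check reports a potential exposure; the user only needs to be able to call it on chosen subsets of their own log.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def isolate_trigger(contacts: Sequence[T],
                    triggers: Callable[[Sequence[T]], bool]) -> T:
    """Find the one logged contact that makes the opaque check fire.

    Assumes exactly one record is sufficient to trigger the alert, so that
    whenever a range triggers, the culprit lies in one of its two halves.
    """
    if not triggers(contacts):
        raise ValueError("the full contact log does not trigger an alert")
    lo, hi = 0, len(contacts)            # invariant: contacts[lo:hi] triggers
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if triggers(contacts[lo:mid]):   # culprit in the left half
            hi = mid
        else:                            # otherwise it must be in the right half
            lo = mid
    return contacts[lo]
```

With 1,000 logged encounters, about ten calls suffice to single out the one that matters, which is why on-device matching cannot hide when the exposure happened, however the check itself is implemented.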

The Apple and Google system

With that established, let us now examine the choices made by Apple and Google, as well as the API they provide, on top of which an app can be built.

To begin with, from the point of view of Bluetooth Low Energy, “Exposure Notification” is a service: a protocol that uses BLE for the radio transmission of data, in the same way that HTTP is a protocol that uses TCP to transmit data over (usually) the Internet. It is documented here at the time of writing; as such, it has been registered with the consortium that manages Bluetooth, with its own identifier. The transmitted format is simple: besides the mandatory Bluetooth framing, it is mostly the current rolling proximity identifier, accompanied by a metadata field whose only current use (besides versioning the protocol, which makes it possible to confirm that implementations are compatible) is to refine the computation of signal attenuation.
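As I read the published Bluetooth specification for the service, the advertised service data is 20 bytes: a 16-byte Rolling Proximity Identifier followed by 4 bytes of encrypted metadata, whose decrypted form carries a version byte and the sender’s transmit power. The sketch below parses that layout and computes the attenuation (transmit power minus received signal strength) that the metadata is meant to refine. Decryption itself (AES-CTR in the specification) is left out because Python’s standard library has no AES, and the byte layout should be checked against the current version of the specification rather than taken from here.

```python
import struct
from dataclasses import dataclass

EN_SERVICE_UUID16 = 0xFD6F  # 16-bit UUID registered for Exposure Notification (per the spec)

@dataclass
class ENAdvertisement:
    rolling_proximity_id: bytes  # 16 bytes, rotates together with the BLE address
    encrypted_metadata: bytes    # 4 bytes, readable only once the sender's key is published

def parse_service_data(data: bytes) -> ENAdvertisement:
    """Split the service data into identifier and encrypted metadata."""
    if len(data) != 20:
        raise ValueError("expected 16-byte identifier + 4-byte metadata")
    return ENAdvertisement(data[:16], data[16:])

def parse_metadata(plain: bytes) -> tuple[int, int, int]:
    """Decode *decrypted* metadata: (major version, minor version, tx power in dBm)."""
    version, tx_power = struct.unpack("<Bb2x", plain)  # 2 trailing reserved bytes
    return version >> 6, (version >> 4) & 0x3, tx_power

def attenuation_db(tx_power_dbm: int, rssi_dbm: int) -> int:
    """Signal loss between sender and receiver; larger means (roughly) farther away."""
    return tx_power_dbm - rssi_dbm
```

Comparing attenuation against thresholds, rather than converting it to meters, matches the point made earlier: the goal is a shortlist of plausible contacts, not an exact distance.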

As its name indicates, the rolling proximity identifier changes at more or less regular intervals. Only more or less, because a change that was too regular would make tracking easier and render the changes useless: the identifier changes no sooner than every 10 minutes and no later than every 20 minutes. All of this is described more precisely in the cryptography documentation, which explains how the data sent by the protocol above is generated, how received and sent data is stored, and above all how to determine whether a potential infection occurred.

The most important property of the “Exposure Notification” system is that this determination is performed locally. The system assumes the existence of at least one server, but that server only redistributes non-identifying data collected (on a voluntary basis) from diagnosed individuals, so that recipients can make the determination themselves: nothing is uploaded for users unless they test positive. Even the uploaded data reveals little, since it amounts to revealing the randomly generated material used to produce the broadcast rolling proximity identifiers, without involving any other personal data, in particular identifying data.
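The local determination can be sketched as follows. It follows the shape of the cryptography specification as I understand it: a random daily Temporary Exposure Key, a Rolling Proximity Identifier Key derived from it with HKDF-SHA256 and the info string "EN-RPIK", and one 16-byte identifier per 10-minute interval number (Unix time divided by 600). One deliberate substitution: the specification derives each identifier with AES-128, which Python’s standard library lacks, so this sketch uses truncated HMAC-SHA256 instead. It is therefore not interoperable with real implementations; it only illustrates the data flow.

```python
import hashlib
import hmac

INTERVAL_SECONDS = 600   # one interval number per 10 minutes
INTERVALS_PER_KEY = 144  # a daily key covers 144 intervals

def interval_number(unix_time: int) -> int:
    return unix_time // INTERVAL_SECONDS

def hkdf_sha256(ikm: bytes, info: bytes, length: int, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand to `length` bytes."""
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def rolling_ids(tek: bytes, start_interval: int,
                count: int = INTERVALS_PER_KEY) -> dict[bytes, int]:
    """Map each identifier derived from one daily key to its interval number."""
    rpik = hkdf_sha256(tek, b"EN-RPIK", 16)
    ids = {}
    for interval in range(start_interval, start_interval + count):
        padded = b"EN-RPI" + bytes(6) + interval.to_bytes(4, "little")
        # STAND-IN: the specification encrypts `padded` with AES-128 under rpik;
        # truncated HMAC-SHA256 keeps this sketch within the standard library.
        ids[hmac.new(rpik, padded, hashlib.sha256).digest()[:16]] = interval
    return ids

def find_exposures(diagnosis_keys: list[tuple[bytes, int]],
                   observed: set[bytes]) -> list[int]:
    """Runs on the phone: regenerate identifiers from published keys, compare with what was heard."""
    hits = []
    for tek, start_interval in diagnosis_keys:
        for rpi, interval in rolling_ids(tek, start_interval).items():
            if rpi in observed:
                hits.append(interval)
    return sorted(hits)
```

Only the keys of people who chose to upload after a diagnosis ever leave a phone; every other device just downloads and compares. This also illustrates the property claimed in the following paragraph: hearing someone’s identifiers over the air does not yield the daily key needed to regenerate them, so an adversary cannot credibly upload them as their own.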

The system credibly claims other properties. For example, it appears impossible to collect information broadcast by other people and then, by passing it off as one’s own, get those people reported as having been infectious. (Innocent people are not the only ones who will test positive for Covid-19; we have to assume adversaries will too, and minimize their capacity for harm accordingly.)

That said, the system does not by itself guarantee that only individuals who tested positive will be able to upload their information. I hardly see how it could control anything in that area, so it relies on health authorities to perform that filtering.

This is apparent in the third and last document, which describes the API that an app (here for Apple devices running iOS) has to use to interface with the service. The API handles all of the Bluetooth protocol and data storage behind the scenes, but performs no network interaction: the app must do that itself, and part of the API consists of functions for explicitly handing it the downloaded data, or for obtaining the data to be uploaded. This is more explicit in the API documentation for Google devices running Android, which otherwise describes the same API, apart from being in Java, as befits Android.

Apart from that, the API holds no surprises: it is modern Objective-C, with blocks used, for example, so the app can be called back once data processing is complete. It seems to cover the use cases of future apps well: an ENManager class for interactions with the system in general, such as displaying its status (mainly whether it is authorized) and retrieving the data to upload to the server after a diagnosis, and an ENExposureDetectionSession class to use when checking whether the latest batch of diagnosis information, combined with the internally recorded data, would trigger an exposure. The only surprise is the use of Objective-C: one would have expected Swift, given how fashionable the language is at Apple, but this does not change what the interface does, and it can most likely be used directly from Swift.

The API reveals one last interesting feature: the ability to declare one’s risk to the system upon suspected infection, even before a formal diagnosis, in order to warn contacts in a cascade, with a lower intensity, of course. This will still have to go through the health authorities; it remains to be seen what they will do with it.

The Apple and Google system: what goes unsaid

As we can see, while the system documentation mentions the role of the server, it says nothing about how many there will be, how it or they will be managed, or by whom. Multiplying servers would be harmful, since it would force users to install as many apps as there are servers in operation to get proper coverage (both to retrieve the information distributed by each server, and to upload to each of them when needed). That might be acceptable (though it could be avoided) in special cases, such as cross-border workers, but it is completely unacceptable for the average user.

Both companies have stated their intention to severely restrict access to these APIs. I quote their joint document answering frequently asked questions:

10. How will apps get approval to use this system?

Apps will receive approval based on a specific set of criteria designed to ensure that they are only administered in conjunction with public health authorities, meet our privacy requirements, and protect user data.

These criteria are detailed separately in agreements that developers must accept in order to use the APIs, and are organized around the principles of functionality and privacy. There will be restrictions on the data that apps can collect when using the API, including not being able to access location services, as well as restrictions on how the data can be used.

Apple in particular has the means to enforce this. Not only can an app generally not be distributed (beyond a handful of copies) without their approval, but an entitlement will certainly be required to access the service from the sandbox (the environment that hosts apps in iOS and restricts their access to the operating system to the services they need, barring exceptions that are only honored if signed by Apple), with very few entities eligible for it (mostly governmental ones). Sorry for those who would like to play with it: com.apple.developer.exposure-notification will be the most restrictively distributed of all the entitlements available on the iPhone… It is a safe bet that Apple will not hesitate for a moment to revoke an entitlement that has leaked or been abused.

Given that Apple, at least, is in the position of arbiter, the absence of any rule or even recommendation about multiplying servers puzzles me. I can understand that neither Apple nor Google wants to expose itself even further to accusations of placing itself above states, but it would be desirable to have confirmation that no more than one server will be allowed per country (defined as an entity with diplomatic representation or equivalent), without this hindering each country’s ability to have several apps as needed, as long as they rely on the national server. Of course, I hope the EU will set up a common server that all its countries will adopt, and ideally there would be only one per continent, but it seems difficult to designate for each continent the political entity that could take this on (as for a single global server, if that were viable both politically and technically, Apple and Google would already have set it up).

Furthermore, the documentation mentions a second phase in which the system could be enabled at the operating system level without any app installed, suggesting the installation of one of the apps if the system detects a potentially infectious contact. But if this built-in system can determine that, it implies retrieving data from a server without any app having been installed, so which server is used?

Also, nothing is said about the role of health authorities in ensuring the reliability of the information distributed by their server. I am reminded of an incident that was reported to me when I worked in telecoms. Wireless communication networks for police and firefighters have a “distress call” feature that these customers take very seriously, understandably. In practice, it is a voice call, but with a flag that triggers specific handling (priority in particular) and raises alarms all over the place. And during a first interconnection with a district’s system supplied by a competitor, things did not go as planned. Even though the interconnection standard specified how to convey that a call was a distress call, it left a lot of freedom in how to act on that information; and it turned out that at least one of the two systems considered distress status important enough to be remembered for the channel in question, until a supervisor explicitly cleared it. That is not absurd in itself, but when the channel belonged to the other system in the interconnection, that system, not being configured the same way, never sent such a clearance, so the channel was from then on considered permanently in distress, even for ordinary calls. Fairly quickly, every channel in that situation ended up in distress with no way of bringing it back down, and when every call is a distress call, well, none of them is anymore. An emergency fix had to be shipped.
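The failure in this anecdote comes down to two systems agreeing on the wire format of the flag but not on its lifetime. A toy model, with hypothetical class and method names that are not taken from any real standard, makes the difference visible:

```python
class LatchingSystem:
    """Remembers distress per channel until a supervisor explicitly clears it."""

    def __init__(self) -> None:
        self.channels_in_distress: set[str] = set()

    def handle_call(self, channel: str, distress: bool) -> bool:
        """Return whether this call is treated as a distress call."""
        if distress:
            self.channels_in_distress.add(channel)  # latched until cleared
        return channel in self.channels_in_distress

    def supervisor_clear(self, channel: str) -> None:
        self.channels_in_distress.discard(channel)


class PerCallSystem:
    """Treats distress as a property of the individual call only."""

    def handle_call(self, channel: str, distress: bool) -> bool:
        return distress


# One distress call followed by ordinary calls on a channel owned by the peer:
calls = [("north", True), ("north", False), ("north", False)]
latching, per_call = LatchingSystem(), PerCallSystem()
print([latching.handle_call(c, d) for c, d in calls])
print([per_call.handle_call(c, d) for c, d in calls])
```

Since the per-call side has no notion of a clearing message, it never sends one, and the latching side’s view of that channel can never return to normal.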

So it seems risky to invest resources (engineering, validation, etc.) in deploying a system without ruling out the risk that it all goes to waste if it ends up producing too many false positives to remain usable. Rules ensuring the reliability of the information distributed by the server or servers seem essential to me.

Finally, staying with the geographic question, one risk (raised by Moxie Marlinspike) is that the database could grow too large (especially if the epidemic flares up again) to be practical to download, so that it would have to be partitioned by location… reintroducing geolocation to do so, and ruining some of the guarantees offered. As with the multiplication of servers, I think this is something on which Apple and Google need to state their intentions.
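This concern is easy to quantify roughly. Assuming each diagnosed person uploads one key per day for the previous 14 days, and counting only the 16-byte key plus its 4-byte starting interval number (signatures, file containers and any other per-key fields ignored), the daily download grows linearly with new cases. The case counts below are hypothetical inputs, not figures for any country, and the 14-day defaults are my assumptions.

```python
TEK_BYTES = 16            # one Temporary Exposure Key
START_INTERVAL_BYTES = 4  # its rolling start interval number (uint32)

def daily_download_bytes(new_cases_per_day: int, keys_per_case: int = 14,
                         bytes_per_key: int = TEK_BYTES + START_INTERVAL_BYTES) -> int:
    """Raw key material every phone must fetch per day (format overhead ignored)."""
    return new_cases_per_day * keys_per_case * bytes_per_key

def retained_bytes(new_cases_per_day: int, retention_days: int = 14) -> int:
    """Key material held if the server keeps `retention_days` worth of uploads."""
    return daily_download_bytes(new_cases_per_day) * retention_days

for cases in (1_000, 10_000, 100_000):  # hypothetical daily diagnosis counts
    print(f"{cases:>7} cases/day -> {daily_download_bytes(cases) / 1e6:.1f} MB/day")
```

At a hypothetical 100,000 new diagnoses a day, every phone would have to fetch about 28 MB daily before any overhead, and a two-week window would hold about 392 MB, which is where partitioning by region starts to look tempting.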

The confrontation

StopCovid, like many other projects, was started before Apple and Google announced their initiative. The project set out on different principles, which I respect as such; its protocol, called ROBERT, is documented. In particular, the choice was made of an architecture where infection determination is centralized, with the advantages and drawbacks that entails, which I will not go over again.

To begin with, as with the multiplication of servers, one may wonder about the multiplication of protocols: will everyone have to install one app per protocol? But that is not the immediate sticking point. Since the ROBERT protocol uses Bluetooth, as expected, its implementation on iPhone runs into long-standing restrictions set by Apple on how an app running in the background can use Bluetooth. These restrictions are documented by Apple, if imprecisely; I have no doubt the project was able to measure them for itself. Their purpose is to preserve battery life and protect privacy. They drastically reduce the viability of the solution.

Technologically, it was important to try, if only to be credible and not depend on a partner chosen for lack of an alternative. The StopCovid project, notably through its experiments in (approximate) distance measurement over Bluetooth, has already achieved advances that will be useful more broadly, and as a project it could continue within the framework of a more general protocol: adopting another protocol would not mean the end of the project.

Because let’s be clear: the media talk about systems that could be “made compatible”, but the only way I know of to make two systems communicate is for them to implement the same protocol. They can be two different implementations, but it has to be the same protocol.

For the choice you have made, faced with these restrictions, which are not new, is confrontation with Apple: you express surprise that Apple does not lift these restrictions on demand, and you insist that the project be able to go into production on its current principles, invoking technological sovereignty.

Such a confrontation is futile and even harmful, for at least two reasons.

Futile in the short term

Beyond these restrictions, Apple has more generally established the principle that it controls which software can be distributed on its mobile devices. Over the years I have liked this control less and less, to the point that I am looking for alternatives, notably the web, to distribute my personal work. This is becoming viable for some applications, such as file processing, but in the case of Bluetooth there is no alternative to the “native” SDK.

So even though I myself object to some of Apple’s policies, do you seriously believe that after 12 years of dictating their terms as they saw fit for iPhone software, without exception apart from those whose terms they defined themselves (there are a few others, less documented), you would be able to convince them or force their hand so quickly? That is an illusion.

It is all the more illusory given that in other situations where they were far more exposed (I am thinking in particular of a decryption request from the FBI), Apple resisted pressure to make them give in, and they were proud of it and still are. It is, among other things, a matter of professional pride, regarding both battery preservation and privacy. Do you think some posturing will be enough to change their minds?

If at least you were putting forward arguments, such as the potential privacy advantages of a centralized solution like ROBERT over theirs, that is something they might be receptive to. But no, it is only a matter of denouncing their financial health (indecently good, granted, but how is that relevant?) or invoking technological sovereignty.

You might win your case through some legal constraint or other, but it will certainly take time, a lot of time. Defending France’s technological sovereignty could therefore make sense… in other circumstances. Because I don’t know if you’ve noticed, but there is a pandemic currently affecting France. And France will only get rid of it in the long run by achieving herd immunity; and I don’t know about you, but I would rather acquire that immunity through an eventual vaccine, or, failing that, as late as possible. And by the time you have compelled Apple through legal means, I think the vaccine will have been found.

Your insistance on technological sovereignty therefore tells me that asserting it right now is, for you, a more important goal than having an effective contact tracing solution that could save lives. These priorities are in the wrong order. Technological sovereignty is important, but there will always be time to assert it later, or in other ways.

It may be unfair that Apple dictates its terms this way. Perhaps it should be otherwise. But in the meantime, right here and right now, they hold the keys to the system.

Futile in the long run

Now suppose Apple hands you the keys. Your troubles are not over, because the reasons they put these restrictions in place to begin with have not gone away, battery preservation in particular.

So what will ensure that your solution does not drain the battery too quickly? In particular, you will have a hard time finding developers on the market skilled at using Bluetooth reasonably once these restrictions are lifted: even those who already do so on Android may be in for surprises on the iPhone.

This is particularly important because iPhones stay in service much longer than Android devices, and therefore often have a degraded battery. I myself still haven’t replaced my iPhone 5S from more than six years ago, which works very well, except that its battery life is no longer what it was, and I don’t think I am the only one. The issue is not just that I will have to pay more attention to my battery once StopCovid is installed; the issue is how the public will react to an app that, once installed, drains their battery faster, possibly until it runs flat if they are not careful. At least some of them will disable the app. And didn’t we say that sufficient adoption was essential for the system to work?

Once again, the existing situation matters, and the existing situation is that Apple keeps supporting many devices in the wild that others would consider obsolete (but which do have Bluetooth Low Energy), and any change in this balance will be noticed. Do you, for example, want to expose yourself to accusations of encouraging premature device replacement, once people notice the battery drain caused by StopCovid, which could force them to replace hardware that used to serve them better?

You could tell me that Apple itself risks increasing battery drain as well when implementing its system. That is possible, but I trust them more in that domain than I trust the developers of StopCovid, no offense meant.


I reject the false dichotomy put forward by some commentators, who would reduce the choice to which entity one agrees to entrust oneself to. With the right protocol and the right architecture, we can avoid having to trust anyone more than we would like, and reconcile seemingly opposed requirements.

Germany, after starting out on its own, has announced that it is joining the Apple and Google system, but that will not prevent it from offering its own app based on that system. What prevents France from doing the same? It is in everyone’s interest to avoid a balkanization of solutions, and France is not in a position of strength here.

Pope Cancels April The First

In an unexpected decree entitled Inter Gravissimas, Pope Francis enacted a calendar reform containing but a single change: the skipping of April 1st, 2020. March 31st 2020 is to be followed directly by April 2nd, 2020. Weekdays, however, are to follow their usual succession: as a result, April 2nd 2020, following Tuesday March 31st, is to be a Wednesday rather than the expected Thursday (Editor’s note: this blog publishing software could not be corrected in time for publication, so it is erroneously going to show April 1st at first. We apologize for the confusion).

“It has come to our attention that the celebration of Easter is drifting out of sync with the cycle of seasons”, wrote Pope Francis, explaining the motives for the change. “To avoid the error becoming any greater, we are correcting the calendar in this way while there is still time to do so before Easter 2020.” This decision has baffled all observers: while the current calendar does have some systematic error, it was not expected to have accumulated to a full day, and the urgency is all the more questionable given that Easter can fall on any of 35 days in the calendar, making a shift of a few hours relative to the cycle of seasons hardly noticeable.
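For the record, the range of possible Easter dates can be checked with the anonymous Gregorian computus (the Meeus/Jones/Butcher algorithm), sketched below. It yields Western Easter dates from 22 March to 25 April inclusive, that is, 35 possible days.

```python
def gregorian_easter(year: int) -> tuple[int, int]:
    """Return (month, day) of Western Easter for a Gregorian year (Meeus/Jones/Butcher)."""
    a = year % 19                          # position in the 19-year lunar cycle
    b, c = divmod(year, 100)               # century, year within century
    d, e = divmod(b, 4)                    # leap-century corrections
    f = (b + 8) // 25
    g = (b - f + 1) // 3                   # lunar correction per century
    h = (19 * a + b - d - g + 15) % 30     # roughly: days from 21 March to the Paschal full moon
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7   # days from that full moon to the next Sunday
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return month, day + 1

print(gregorian_easter(2020))
```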

Keen observers remark that this may be an attempt by Vatican City to get rid of a celebration it doesn’t otherwise control. While the papal authorities obviously cannot repeat this every year or otherwise eliminate the day from the calendar going forward, it is rumored they hope that cancelling this year’s celebrations will disrupt observance of the event and discourage attempts to reintroduce it.

“This calendaring decision is entirely within the Pope’s prerogatives”, a Vatican City spokesperson was quoted as saying. That may be true inside the Vatican and the Catholic church in general, where the Pope concentrates both legislative and executive powers in his hands. “And who do you think is in charge of the World’s calendars? Julius Caesar? Get real.”, the spokesperson continued.

Meanwhile, all the major software vendors appeared to be scrambling to develop an emergency update to take this change into account, as evidenced by the banners appearing on their web sites apologizing for the erroneous dates being currently shown in their interfaces. However, none of Microsoft, Google, Apple, YouTube, Amazon, Blizzard, Stack Overflow, or Facebook were available for comment, and they have seemingly scrapped all their other plans for the day.

What are the very long-term solutions to Meltdown and Spectre going to look like?

While the long-term solutions to Meltdown and Spectre we saw earlier can and probably will be made to work on mainstream hardware, their inefficiencies call into question the fundamental operation of our processors: rather than try and contain speculation, why not avoid the need for speculation in the first place, for instance? Or, on the contrary, embrace it? But revisiting the design space of processors requires taking a long look at why they work that way in the first place, and whether these assumptions are still suitable.

Inspired by the fact that many solutions to Spectre rely on a model of the speculative execution behavior of the processor, e.g. “Modern CPUs do not speculate on bit masking” (which is fine for short-term mitigations: we can’t really throw away every single computer and computing device to start over from scratch and first principles at this point), some suggest we should officially take the speculative behavior of the processor into account when developing/compiling software, typically by having the processor expose it as part of its model. This is completely untenable long-term: as processors evolve and keep relying on speculation to improve performance, they will end up being constrained by the model of speculative execution they initially exposed and that existing programs started relying on, i.e. future processors will have to refrain from speculating even an iota more than they currently do (but still speculate, such that we will keep having to maintain the mitigations). Unless they do speculate more, in which case we will have to reanalyze all code and update/add mitigations every time a new processor that improves speculation comes out (not to mention the inherent window of vulnerability). Possibly both. The story of delay slots already taught us how that ends.

In case you’re new to the concept of delay slots, they were a technique introduced by Berkeley RISC to allow pipelining instructions while avoiding bubbles every time a branch was encountered: the instruction(s) following the branch would be executed no matter what (slide 156 out of 378), and any jump would only occur after that. Some RISC architectures, such as MIPS and SPARC, used them, and it worked well. That is, until it was time to create the successor design, with a longer pipeline. You can’t add a delay slot, since that would break compatibility with all existing code, so you instead compensate in other ways (branch prediction, etc.), such that delay slots are no longer useful, but you still have to support the existing delay slots: doing otherwise would also break compatibility. In the end, every future version of your processor has to simulate the original pipeline because you encoded it as part of the ISA. Oh, and when I said delay slots initially worked well, that was a lie: if your processor takes an interrupt just after a branch and before its delay slot, how can the work be resumed? Returning from the interrupt cannot simply be a jump back to the address of the instruction after the branch, as the pending branch must also be taken care of; solutions to these issues were found, obviously, but they considerably complicate interrupt handling.
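To make the interrupt problem concrete, here is a minimal sketch of a hypothetical toy ISA with one delay slot after every branch. The usual fix (SPARC, for instance, exposes it as a separate nPC register) is to keep both the current and the next program counter as architectural state, and save both on interrupt entry. Everything here (`struct cpu`, `step`, `run`, the opcodes) is invented for illustration; `run` takes an "interrupt" after a given number of steps and either restores both counters or, as a naive design would, only the first one.

```c
#include <stdbool.h>

/* Hypothetical toy ISA: exactly one delay slot after every branch. */
enum op { OP_ADD, OP_BR, OP_HALT };

struct insn { enum op op; int reg; int imm; };  /* imm: addend, or branch target */

struct cpu {
    int pc;       /* instruction to execute next */
    int npc;      /* the one after it: differs from pc + 1 right after a branch */
    int regs[3];
};

/* Execute one instruction; returns false once HALT is reached. */
static bool step(struct cpu *c, const struct insn *prog)
{
    const struct insn *i = &prog[c->pc];
    int next_pc = c->npc;          /* the delay slot (or plain successor) runs next */
    int next_npc = c->npc + 1;
    switch (i->op) {
    case OP_ADD:  c->regs[i->reg] += i->imm; break;
    case OP_BR:   next_npc = i->imm; break;   /* jump happens after the delay slot */
    case OP_HALT: return false;
    }
    c->pc = next_pc;
    c->npc = next_npc;
    return true;
}

/* Runs prog, taking an interrupt after `irq_after` steps. If save_npc is
   false, only pc survives the interrupt and npc is recomputed as pc + 1. */
static void run(struct cpu *c, const struct insn *prog, int irq_after, bool save_npc)
{
    *c = (struct cpu){ .pc = 0, .npc = 1 };
    for (int n = 0; ; n++) {
        if (n == irq_after) {
            int saved_pc = c->pc, saved_npc = c->npc;   /* interrupt entry */
            /* ... the interrupt handler would run here ... */
            c->pc = saved_pc;                           /* interrupt return */
            c->npc = save_npc ? saved_npc : saved_pc + 1;
        }
        if (!step(c, prog))
            break;
    }
}

static const struct insn demo[] = {
    { OP_ADD, 0, 1 },     /* 0: r0 += 1 */
    { OP_BR,  0, 4 },     /* 1: branch to 4 (delayed) */
    { OP_ADD, 1, 10 },    /* 2: delay slot, always executed */
    { OP_ADD, 2, 100 },   /* 3: must be skipped */
    { OP_HALT, 0, 0 },    /* 4 */
};
```

With the interrupt taken right after the branch (`run(&c, demo, 2, true)`), the program ends with r2 still 0; if only pc is saved, execution falls through into instruction 3 and r2 becomes 100.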

A separate insight of RISC in that area is that, if we could selectively inhibit which instructions write to flags, then we could prepare flags well ahead of when they are needed, allowing the processor to know ahead of time whether the branch will be taken or not, early enough that the next instruction could be fetched without any stall, removing the need for either delay slots or branch prediction. That is often useful for “runtime configurable” code, and sometimes for more dynamic situations; however, in many cases the compiler has few or no instructions to put between the test and the branch. So while it can be a useful tool (compared to delay slots, it does not have the exception-related issues, the compiler can provision as many instructions in there as the program allows rather than being limited by the architecturally defined delay slot size, and conversely, if it can’t, it does not have to waste memory filling the slots with nops), it also shares many of the issues of delay slots: as pipelines get deeper, there will be more and more cases where processors will have to resort to branch prediction and speculative execution to keep themselves fed when running existing code, using knowledge that is only available at runtime, as the processor has access to context the compiler does not have. Furthermore, the internal RISC engine of current x86 processors actually goes as far as fusing conditional branches together with the preceding test or comparison instruction, suggesting such a fused instruction is more amenable to dynamic scheduling.

RISC-V has an interesting approach: it does away with flags entirely (not even multiple condition registers, as in PowerPC with cr0 to cr7), instead using such fused compare-and-branch instructions… but it is still possible to compute the result of a test well ahead of the branch that requires it, simply by setting a regular register to 0 or 1 depending on the test outcome, then having the fused branch compare this register with 0, and presumably implementations will be able to take advantage of this.

Generally, there is an unavoidable tension due to the straitjacket of sequential instruction execution, a straitjacket which is itself unavoidable due to the need to be able to suspend processing, then resume where it left off. How could we better express our computations in such a way that hardware can execute a lot of them at once, in parallel, while still formally laying them out as a sequence of instructions? For instance, while it could be desirable to have vectors of arbitrary lengths rather than fixed-size ones (as in MMX, AltiVec, SSE, NEON, etc.), doing so raises important interruptibility challenges: the Seymour Cray designs, whether at CDC or at Cray, did not support interrupts or demand-paged virtual memory! If we give up on those, we give up on the basis of preemptive multitasking and memory protection, so we’d end up with a system that would be worse than MacOS 9; and while MacOS 9 went to heroic lengths to make do with a cooperative multitasking, unprotected memory model (and even that OS supported interrupts), no one who has known MacOS 9 ever wants to go back to it: it is dead for a reason (from the story of Audion). Alternatively, we could imagine fundamentally different ways of providing these high-level features, but then you still have to solve the issue of “what if the software developer made a mistake and specified an excruciatingly long vector, such that it will take 10 seconds to complete processing?” So either way, there would need to be some way to pause vector processing to do something else, then resume where it left off, which imposes severe constraints on the instruction set: RISC-V is going to attempt that, but I do not know of anyone else.
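One way to reconcile arbitrary-length vectors with interruptibility is to process them in bounded chunks and make the progress itself part of the saved state. This is loosely modeled on the strip-mining style that the RISC-V vector extension encourages, where each pass asks the hardware how many elements it may process. The sketch below is plain C standing in for vector instructions. `VLMAX`, `struct vadd_state` and the `budget` parameter (the number of chunks to run before "being interrupted") are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical hardware limit: maximum number of elements a single vector
   instruction processes. Real machines choose their own value. */
#define VLMAX 8

/* Progress state: in this sketch, the only thing an interrupt handler
   needs to save in order to resume the whole operation later. */
struct vadd_state { size_t done; };

/* dst = a + b over n elements, in chunks of at most VLMAX elements.
   Stops after `budget` chunks, as if interrupted; returns 1 once the
   whole vector has been processed, 0 if it was interrupted. */
static int vadd_resumable(float *dst, const float *a, const float *b, size_t n,
                          struct vadd_state *st, int budget)
{
    while (st->done < n) {
        if (budget-- == 0)
            return 0;                      /* interrupted: st->done records progress */
        size_t vl = n - st->done;          /* akin to setting a vector length register */
        if (vl > VLMAX)
            vl = VLMAX;
        for (size_t i = 0; i < vl; i++)    /* one bounded-length "vector instruction" */
            dst[st->done + i] = a[st->done + i] + b[st->done + i];
        st->done += vl;                    /* commit progress only after the chunk */
    }
    return 1;
}
```

Calling it once with a small budget and then again with the same `struct vadd_state` produces the same result as a single uninterrupted call, and no single step takes longer than one chunk, which bounds interrupt latency regardless of the vector length the developer asked for.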

One assumption we could challenge is whether we need to be writing system software in C and judging processors according to how well they execute that software (via). To my mind, Spectre and Meltdown are as much the result of having to support Unix or systems with Unix-ish semantics, or possibly even just fast interrupts, as they are the result of having to support C: flat memory; context switching/preemptive multitasking hacked onto absolute minimum hardware support (OSes have to manually read, then replace, every bit of architectural state individually!), which itself implies limited architectural state, which results in lots of implicit micro-architectural state to compensate; traps also hacked onto absolute minimum hardware support, in particular no quick way to switch to a supervisor context to provide services short of a crude memory-range-based protection (and now we see how that turned out)¹; no collection types in the syscall interface (ever looked at select()?), thus forcing all interactions to go through the kernel reading a block of memory from the calling process address space even for non-scalar syscalls; mmap(); etc. However, especially in the domain of traps, it will be necessary to carefully follow the end-to-end principle: the history of computing is littered with hardware features that ended up unused for failing to provide a good match with the high-level, end-to-end need. This is visible for instance in the x86 instruction set: neither the dedicated device I/O instructions (IN/OUT) nor protection rings 1 and 2 (user-level code uses ring 3, and kernels use ring 0) are ever used in general-purpose computing (except maybe on Windows for compatibility reasons).

But Spectre and Meltdown are also indeed the result of having to support C; in particular, memory synchronization primitives are not very good matches for common higher-level language operations, such as reference assignment (whether in a GC environment or an ARC environment). Unfortunately, a lot of research in that area ends up falling back to C…
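To illustrate the mismatch, here is a minimal sketch of what a single reference assignment to a shared slot costs under reference counting when expressed with C11 atomics. The `struct obj` type and the `obj_*`/`assign_ref` functions are hypothetical; this is not how any particular ARC runtime implements it, only the general shape of the problem.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted object, in the style of ARC. */
struct obj {
    atomic_int refcount;
    int payload;
};

static struct obj *obj_new(int payload)
{
    struct obj *o = malloc(sizeof *o);
    if (!o)
        abort();
    atomic_init(&o->refcount, 1);
    o->payload = payload;
    return o;
}

static void obj_retain(struct obj *o)
{
    if (o)
        atomic_fetch_add_explicit(&o->refcount, 1, memory_order_relaxed);
}

/* Drops one reference; returns 1 if that freed the object. */
static int obj_release(struct obj *o)
{
    if (o && atomic_fetch_sub_explicit(&o->refcount, 1, memory_order_acq_rel) == 1) {
        free(o);
        return 1;
    }
    return 0;
}

/* The high-level `slot = value;` becomes three atomic read-modify-write
   operations. Even so, this is not enough if another thread can load from
   the slot and retain the result concurrently: it could end up retaining
   an object that this thread has just freed. */
static void assign_ref(_Atomic(struct obj *) *slot, struct obj *value)
{
    obj_retain(value);
    struct obj *old = atomic_exchange(slot, value);
    obj_release(old);
}
```

The remaining race in the comment above is exactly the kind of gap that runtimes have to close with extra machinery (locks, deferred reclamation, etc.), because the processor offers no primitive matching "atomically load a reference and retain it".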

Revisiting the assumption of C and Unix is no coincidence. While our computers are in many ways descendants of microcomputers, the general design common to current microprocessor ISAs is inherited from minicomputers, and most of it specifically from the PDP-11, where both C and Unix were developed; this includes memory-mapped and interrupt-based device I/O rather than channel I/O, and demand-paged virtual memory (the latter two of which imply the need to fault at any time), both byte and word addressing, etc. This in turn shapes higher-level features and their implementation: preemptive multitasking, memory protection, IPC, etc. Alternatives such as Lisp machines, the Intel 432, Itanium, or Sun Rock did not really pan out; but did these failures disprove their fundamental ideas? Hard to tell. And some choices from the minicomputer era, most of which were related to offloading responsibilities from hardware to software, ended up being useful and/or sticking around for reasons sometimes different from the original ones, which could most often be summarized as: we need to make do with the fewest possible transistors/memory cells (a parallel with evolutionary biology: an adaptation that was developed at one time may find itself repurposed for a completely different need). For instance, the PDP-11 supported immutable code (while some other archs would assume instructions could be explicitly or sometimes even implicitly modified, e.g. as late as 1982 the Burroughs B4900 stored the tracking for branch prediction directly in the branch instruction opcode itself, found out while researching the history of branch prediction…), which was immediately useful for having programs stored directly in ROM instead of having to be loaded from mass storage and occupy precious RAM, was also useful because it enabled code sharing (the PDP-11 also supported PIC), and today is indispensable for security purposes: it’s most definitely safer to have mutable data be marked as no-exec, and therefore have executable code be immutable. Similarly, minicomputers eschewed channel I/O to avoid an additional processing unit dedicated to device I/O, thus saving costs and enabling them to fit in a refrigerator-sized enclosure rather than require a dedicated room with raised flooring, but nowadays the CPU’s ability to interrupt its processing (and later resume it) is mandatory for preemptive multitasking and for real-time purposes such as low-latency audio (think Skype). As a result, we cannot decide to do without feature X of processors just because its original purpose is no longer current: feature X may have been repurposed in the meantime. In particular, virtual memory faults are used everywhere to provide just-in-time services, saving resources even today in the process (physical pages not allocated until actually used, CoW memory, etc.). Careful analysis, and even trial and error (can we build a performant system on this idea?), are necessary. As a significant example, we must not neglect how moving responsibilities from hardware to software enabled rapid developer iteration on the UX of computer monitor functionality (job control, I/O, supervision, auditing, debugging, etc.). Moving any of this responsibility back to hardware would almost certainly cause the corresponding UX to regress, regardless of the efforts of the hardware designers: the lack of rapid iteration would mean any shortcoming would remain unfixable.

Now that I think about it, one avenue of exploration would be to try and build on a high-level memory model that reflects the performance characteristics of the modern memory hierarchy. Indeed, it is important to realize uniform, randomly addressable memory hasn’t been the “reality” for some time: for instance, row selection in JEDEC protocols (DDR, etc.) means it’s faster to read contiguous blocks than random ones, and besides that we have NUMA, and caches of course (need a refresher? Ulrich Drepper’s got you covered). That being said, the Von Neumann abstraction will remain true at some level. So the game here would be not so much to abandon it as to build a more structured abstraction above it that better matches the underlying performance characteristics.

As you can see, this is still very much an open research space. In the meantime, we will have to make do with what we have.

¹e.g. we could imagine a system similar to fasttraps (themselves typically used for floating-point assistance), where a new syscall instruction would take as arguments the address and size in bytes of the memory to copy (even if more may end up being necessary, that would still allow processing to start), and would automatically switch to the kernel address space, so that the processor would know best how to manage the caches (in relation to address translation caching, for instance) instead of second-guessing what the program is attempting to do.

How to design an architectural processor feature, anyway?

Before I present what could be the very long term solutions to Meltdown and Spectre, I thought it would be interesting to look at a case study in how to (and how not to) implement processor features.

So, imagine you’re in charge of designing a potential replacement for the 6809, and you read this article, with the takeaway that, amazing as it is, that hack would quickly become insufficient given the increase in screen size and resolution (VGA is just around the corner, after all) that is going to outpace processor clock speed.

Of course, one of the first solutions for this issue would be to have better dedicated graphics capabilities, but your new processor may be used in computers where there is zero such hardware, or even if there is, there are always going to be use cases that fall through the cracks and are not well supported by such hardware, use cases where programmers are going to rely on your processor instead. In fact, you don’t want to get too tied to that particular scenario, and should instead think of it as merely the exemplar of a general need: that of an efficient, user-level memory copy of arbitrary length between arbitrary addresses that integrates well with other features, interrupts in particular. With that established, let us look at prospective solutions.

Repeat next instruction(s) a fixed number of times

That one seems obvious: a new instruction that indicates the next instruction, or possibly the next few instructions, is to be executed a given number of times, not just once; the point being to pay the instruction overhead (decoding, in particular) only once, then have it perform its work N times at full efficiency. This isn’t a new idea; in fact, programmers have long been requesting the possibility for an instruction to “stick” so it could operate more than once. How long? Well, how about for as long as there have been programmers?

However, that is not going to work that simply, not with interrupts in play. Let us look at a fictional instruction sequence:

IP -> 0000 REP 4
      0001 MOV X++, Y++
      0002 RTS

SP -> 0100 XXXX (don’t care)
      0102 XXXX

But in the middle of the copy, an interrupt is received after the MOV instruction has executed two times, with two more executions remaining. Now does our state (as we enter the interrupt handler) look like this:

      0000 REP 4
      0001 MOV X++, Y++
      0002 RTS

      0100 0001 (saved IP)
SP -> 0102 XXXX

In which case, when we return from the interrupt, the MOV will only be executed once more, for a total of 3 executions rather than the expected 4, wreaking havoc in the program; so can we provide this instead:

      0000 REP 4
      0001 MOV X++, Y++
      0002 RTS

      0100 0000 (saved IP)
SP -> 0102 XXXX

Well, no, since then upon return from the interrupt execution will resume at the REP instruction… in which case the MOV instruction will be executed 4 times, even though it has already executed twice, meaning it will execute 2 extra times and 6 times in total.
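Both failure modes can be reproduced with a tiny interpreter for the three-instruction listing. This is a sketch under an assumption the text implies: the remaining repeat count lives only in hidden, non-architectural state, so the interrupt preserves nothing but the IP. The addresses match the listing; `run_rep_n` and the resume-address parameter are invented for illustration.

```c
/* Toy interpreter for: 0000 REP 4 / 0001 MOV X++, Y++ / 0002 RTS.
   The remaining repeat count is hidden state that does not survive
   an interrupt: only the IP is saved. */
enum { REP_ADDR = 0, MOV_ADDR = 1, RTS_ADDR = 2 };

/* Takes an interrupt once `irq_after_movs` MOVs have executed, then resumes
   at `resume_ip`. Returns the total number of MOV executions. */
static int run_rep_n(int n, int irq_after_movs, int resume_ip)
{
    int ip = REP_ADDR, rep_left = 1, movs = 0, interrupted = 0;
    while (ip != RTS_ADDR) {
        if (!interrupted && movs == irq_after_movs) {
            interrupted = 1;
            ip = resume_ip;      /* only the IP survives the interrupt */
            rep_left = 1;        /* the hidden count is lost: plain execution */
            continue;
        }
        if (ip == REP_ADDR) {
            rep_left = n;        /* decode REP N: repeat the next instruction N times */
            ip = MOV_ADDR;
        } else {                 /* ip == MOV_ADDR */
            movs++;
            if (--rep_left == 0)
                ip = RTS_ADDR;
        }
    }
    return movs;
}
```

With no interrupt, `run_rep_n(4, -1, REP_ADDR)` gives 4; interrupted after two MOVs, resuming at the MOV gives 3 and resuming at the REP gives 6, the two outcomes described above.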

It’s not possible to modify the REP instruction, since your processor has to support executing code directly from ROM given the price of RAM (and making code read-only is valuable for other reasons, such as being more secure or allowing it to be shared between different processes). How about resetting X and Y to their starting values and restarting all iterations on return? Except operation of the whole loop is not idempotent if the two buffers overlap, and there is no reason not to allow that (e.g. memmove allows it), so restarting the whole sequence is not going to be transparent. What about delaying interrupts until all iterations are completed? With four iterations that might be acceptable, but given your processor clock speed, as few as 16 iterations could cause serious interrupt latency issues, such that real-time deadlines would be missed and sound would be choppy.
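The non-idempotence of a restarted overlapping copy is easy to check with a sketch like the following, where each loop iteration stands for one `MOV X++, Y++`, and `copy_with_restart` models the proposed fix: run `done` iterations, get interrupted, reset X and Y, and redo all `n` iterations. Both function names are made up for this illustration.

```c
/* One "MOV X++, Y++" per element, as in the REP sequence above. */
static void copy_forward(int *y, const int *x, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = x[i];
}

/* Interrupt after `done` iterations, then restart from the original X and Y. */
static void copy_with_restart(int *y, const int *x, int n, int done)
{
    copy_forward(y, x, done);   /* partial execution before the interrupt */
    copy_forward(y, x, n);      /* full restart after the interrupt */
}
```

Starting from {1, 2, 3, 4, 5} and copying 4 elements from index 1 down to index 0, an uninterrupted copy yields {2, 3, 4, 5, 5}, but interrupting after 2 iterations and restarting yields {3, 3, 4, 5, 5}: the first pass already overwrote source elements that the restart reads again.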

Whichever way we look at it, this is not going to work. What will?

Potential inspiration: the effect on the architectural state of the Thumb (T32) If-Then (IT) instruction

Conditional execution (or perhaps better said, predicated execution) is pervasive in ARM, and it is possible in Thumb too, but that latter case requires the If-Then instruction:

(any instruction that sets the EQ condition code)
IP -> 00000100 ITTE EQ
      00000102 ADD R0, R0, R1
      00000104 ST R5, [R3]
      00000106 XOR R0, R0, R0

SP -> 00000200 XXXXXXXX (don’t care)
      00000204 XXXXXXXX

And as if by magic, the ADD and ST instructions only execute if the EQ condition code is set, and XOR, corresponding to the E (for else) in the IT instruction, only executes if the EQ condition code is *not* set, as if you had written this:

(any instruction that sets the EQ condition code)
IP -> 00000100 ADD.EQ R0, R0, R1
      00000102 ST.EQ R5, [R3]
      00000104 XOR.NE R0, R0, R0

That might appear to raise interruptibility challenges as well: what happens if an interrupt has to be handled just after the ADD instruction, or when the ST instruction raises a page fault because the address at R3 must be paged back in? Because when execution resumes at ST, what is to stop XOR from being unconditionally executed?

The answer is ITSTATE, which we can think of as a 4-bit register that is part of the architectural state (the actual ARMv7 IT field is 8 bits wide, as it also encodes the base condition, but this simplified model is enough here). What the IT instruction actually does is:

  • take its immediate bits (here, 110), and combine them using an exclusive-NOR with the repeated condition code bit (we’re going to assume the condition holds, so that is 111)
  • set ITSTATE to the result (here, 110), padding missing bits with ones (the final result here being 1101)

And that’s it. What then happens is that nearly every T32 instruction (BKPT being a notable exception) starts operation by shifting out the first bit from ITSTATE (shifting in a 1 from the other side), and does not perform any work if the shifted-out bit was 0.

This means you never need to reference ITSTATE explicitly, but it is very much there, and in particular it is saved upon interrupt entry (which ARM calls an exception) and restored upon exception return, such that predicated processing can resume as if control had never been hijacked: upon exception return to the ST instruction, ST will execute, then XOR will not, since it will shift out a 0 from ITSTATE, the latter having been restored on exception return.
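The simplified 4-bit model described above translates directly into code. The sketch below follows that model, not the real 8-bit ARMv7 encoding. `it_setup` builds ITSTATE from the IT mask bits and the condition outcome, and `it_advance` is what each subsequent instruction does first. Both names are invented for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified 4-bit ITSTATE, following the model above (not the real
   8-bit ARMv7 encoding). mask_bits holds one bit per instruction in the
   IT block, T = 1 and E = 0, first instruction in the most significant bit. */
static uint8_t it_setup(uint8_t mask_bits, int n_bits, bool cond)
{
    uint8_t all = (uint8_t)((1u << n_bits) - 1);
    uint8_t rep = cond ? all : 0;                      /* repeated condition bit */
    uint8_t bits = (uint8_t)(~(mask_bits ^ rep) & all); /* XNOR: 1 means "execute" */
    int pad = 4 - n_bits;
    return (uint8_t)((bits << pad) | ((1u << pad) - 1)); /* pad with ones */
}

/* Each instruction first shifts the top bit out of ITSTATE (shifting a 1 in
   at the bottom); it does no work if that bit was 0. */
static bool it_advance(uint8_t *itstate)
{
    bool execute = (*itstate >> 3) & 1;
    *itstate = (uint8_t)(((*itstate << 1) | 1) & 0xF);
    return execute;
}
```

For `ITTE EQ` with EQ set, `it_setup(0x6, 3, true)` yields 1101 in binary. If an exception is taken after the ADD, saving and later restoring the then-current ITSTATE value is all it takes for ST to execute and XOR to be skipped, exactly as if nothing had happened.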

The lesson is: any extra behavior we want to endow the processor with needs to be expressible as state, so that taking an interrupt, saving the state, and later restoring the state and resuming from a given instruction results in the desired behavior being maintained despite the interrupt.

Repeat next instruction(s) a number of times given by state

Instead of having REP N, let us have a counter register C, and a REP instruction which repeats the next instruction the number of times indicated in the register (we need two instructions for this, as we’re going to see):

IP -> 0000 MOV 4, C
      0001 REP C
      0002 MOV X++, Y++
      0003 RTS

SP -> 0100 XXXX (don’t care)
      0102 XXXX

Now if an interrupt occurs after two iterations, the state is simply going to be:

      0000 MOV 4, C
      0001 REP C
      0002 MOV X++, Y++
      0003 RTS

      0100 0001 (saved IP)
SP -> 0102 XXXX

With C equal to 2. Notice the saved IP points after the MOV to the counter register, but before the REP C; that way, when execution resumes on the REP instruction, the memory-to-memory MOV instruction is executed twice, and the end state will be the same as if all four iterations had occurred in sequence without any interrupt.
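The same kind of toy interpreter as before shows why this works. The only difference is that the remaining count is now the architectural register C, saved and restored along with the IP, so the handler is free to clobber it. The names (`struct rep_cpu`, `run_rep_c`) are hypothetical, and in this model the MOV at 0002 is executed from within the REP rather than being reached through the IP.

```c
/* Toy interpreter for: 0000 MOV 4, C / 0001 REP C / 0002 MOV X++, Y++ / 0003 RTS.
   The remaining count is the architectural register C, saved with the IP. */
struct rep_cpu { int ip; int c; };

enum { SETC_ADDR = 0, REPC_ADDR = 1, RET_ADDR = 3 };

/* Takes an interrupt once `irq_after_movs` MOVs have executed.
   Returns the total number of MOV executions. */
static int run_rep_c(int n, int irq_after_movs)
{
    struct rep_cpu cpu = { SETC_ADDR, 0 };
    int movs = 0, interrupted = 0;
    while (cpu.ip != RET_ADDR) {
        switch (cpu.ip) {
        case SETC_ADDR:                       /* MOV n, C */
            cpu.c = n;
            cpu.ip = REPC_ADDR;
            break;
        case REPC_ADDR:                       /* REP C, applied to the MOV at 0002 */
            if (cpu.c == 0) {
                cpu.ip = RET_ADDR;            /* done: skip past the MOV */
                break;
            }
            if (!interrupted && movs == irq_after_movs) {
                interrupted = 1;
                struct rep_cpu saved = cpu;   /* interrupt entry: IP (at REP) and C */
                cpu.c = 12345;                /* the handler may clobber C... */
                cpu = saved;                  /* ...since interrupt return restores it */
            }
            movs++;                           /* one MOV X++, Y++ */
            cpu.c--;
            break;
        }
    }
    return movs;
}
```

Whether the interrupt arrives before the first iteration, after two, or never, the MOV executes exactly 4 times.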

Applied in: the 8086 and all subsequent x86 processors, where REP is called the REP prefix and is hardwired to the CX register (later ECX), and you can use it for memory copy by prepending the MOVS instruction with it (instruction which is itself hardwired to SI (later ESI) for its source, and DI (later EDI) for its destination).

Load/store multiple

The REP C instruction/prefix system has a number of drawbacks: in particular, in order to play well with interrupts, as we just saw, it requires the processor to recognize, when handling the interrupt, that it is in a special mode, and then to create the conditions necessary for properly resuming execution. It also requires the elementary memory copy to be feasible as a single instruction, which is incompatible with RISC-style load-store architectures where an instruction can only load or store memory, not both.

We can observe that the REP C prefix, since it only applies to a single instruction, will not serve many use cases anyway, so why not instead dedicate a handful of instructions to the memory copy use case and scale up the PUL/PSH system with more registers?

That is the principle of the load and store multiple instructions. They take a list of registers on one side, and a register containing an address on the other, with additional adjustment modes (predecrement, postincrement, leave unchanged) so as to be less constrained than in the PUL/PSH case. Such a system requires increasing the number of registers in the architectural state so as to amortize the instruction decoding cost, an increase which is going to add to context switching costs, but we were going to do that anyway with RISC.

So now our fictional instruction sequence can look like this:

IP -> 0000 LOADM X++, A-B-C-D
      0001 STOM A-B-C-D, Y++
      0002 RTS

SP -> 0100 XXXX (don’t care)
      0102 XXXX

We still have to promptly handle interrupts, but for the load/store multiple system the solution is simple, if radical: if an interrupt occurs while such an instruction is partially executed, execution is abandoned in the middle, and it will resume at the start of the instruction when interrupt processing is done. This is OK, since these loads and stores are idempotent: a restarted instruction is not affected by whatever its previous partial execution left behind (of course, the update to the address register, if any, is done as the last step, so that no interrupt can cause the instruction to be restarted once it is done).
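A sketch of the restart-from-scratch strategy for a `LOADM X++, A-B-C-D`. Memory is only read, and the address register is written back last, so abandoning the instruction partway and re-executing it from the beginning gives the same final state. The types and names are hypothetical, addresses are word indices for simplicity, and the sketch assumes X is not itself part of the register list.

```c
#include <stdint.h>

/* Toy state for LOADM X++, A-B-C-D. X is a word index into mem, for
   simplicity, and is assumed not to be in the register list. */
struct ldm_cpu {
    uint32_t x;        /* address register */
    uint32_t r[4];     /* A, B, C, D */
};

/* If fault_after < 4, the instruction is abandoned after that many loads,
   as if an interrupt or page fault had arrived mid-instruction.
   Returns 1 if the instruction completed. */
static int loadm_postinc(struct ldm_cpu *cpu, const uint32_t *mem, int fault_after)
{
    for (int i = 0; i < 4; i++) {
        if (i == fault_after)
            return 0;                       /* abandoned: X still unchanged */
        cpu->r[i] = mem[cpu->x + (uint32_t)i];
    }
    cpu->x += 4;                            /* postincrement written back last */
    return 1;
}
```

An instance abandoned after two loads leaves X untouched, so re-executing it produces the same registers and the same final X as an uninterrupted run. If X were in the register list, the abandoned attempt could overwrite it and the restart would then load from the wrong address: one of the subtleties real architectures have to specify.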

Well, this is only mostly OK. For instance, some of the loaded registers may have relationships with one another, such as the stack pointer (in the ABI if not architecturally), and naively loading such a register with a load multiple instruction may violate the constraint if the instruction is restarted. Similar potentially deadly subtleties may exist, such as in relation to virtual memory page faults, where the operating system may have to emulate the operation of the instruction… or may omit to do so, in which case load/store multiple instructions are not safe to use even if the processor supports them! I think that was the case for the PowerPC lmw/stmw instructions in Mac OS X.

Sidebar: how do you, as a software developer, know whether it is safe to use the load and store multiple instructions of a processor if it has them? An important principle of assembly programming is that you can’t go wrong by imitating the system C compiler, so compile this (or a variant) at -Os, or whichever option optimizes for code size, to asm:

#include <stddef.h>

struct package {
    size_t a, b, c;
};

void packcopy(struct package *src, struct package *dst)
{
    *dst = *src;
}

If this compiles to a load multiple followed by a store multiple, then those instructions are safe to use on your system.

Applied in: the 68000 (MOVEM), PowerPC, ARM (where their use is pervasive, at least pre-ARM64), etc.

Decrement and branch if not zero

One observation you could make about the REP C system is that it is best seen as implicitly branching back to the start of the instruction each time it is done executing, so why not express that as a plain old branch located after the instruction rather than as a prefix? Of course, that branch would handle counter management so that it could still function as a repetition system contained in a single instruction, but now repetition can be handled with the more common test+branch mechanisms, which simplifies processor implementation, especially as it relates to interrupt management, and generalizes to loops involving more than one instruction, meaning there is no need for the elementary copy to be a single instruction:

IP -> 0000 MOV 4, C
      0001 loopstart: LOAD X++, A
      0002 STO A, Y++
      0003 DBNZ C, loopstart
      0004 RTS
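For reference, the same loop can be written in C. A `do`/`while` loop with a pre-decremented counter has the same shape as the listing: body first, then decrement and branch back if the counter is not zero. Whether a given compiler actually emits a DBNZ-style instruction for it depends on the target and compiler, so treat this as an illustration of the shape only. `copy_dbnz` is a made-up name, and like the listing it assumes a count of at least 1 (with a count of 0, the unsigned counter would wrap around instead of meaning "no iterations").

```c
#include <stdint.h>

/* C equivalent of the DBNZ listing above. Assumes count >= 1. */
static void copy_dbnz(uint16_t *y, const uint16_t *x, unsigned count)
{
    do {
        uint16_t a = *x++;   /* LOAD X++, A */
        *y++ = a;            /* STO A, Y++ */
    } while (--count != 0);  /* DBNZ C, loopstart */
}
```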

With that framework, you can design the processor to try and recognize cases where the body of the loop is only 1 or 2 instructions long, and handle those specially by no longer fetching or decoding instructions while the loop is ongoing: it instead repeats the operation of the looped instructions. It still needs to handle exiting that mode in case of an interrupt, but at least it can decide by itself whether it can afford to enter that mode: for instance, depending on the type of looped instruction, it could decide it would not be able to exit cleanly in case of interrupt and execute the loop the traditional way instead.

The drawback is that it is a bit all-or-nothing: the loop is either executed fully optimized or not optimized at all, with the analysis becoming less and less trivial as we increase the number of looped instructions supported. Regardless of the size of the loop, if there is a single instruction in the loop body, or a single instruction transition, that the engine would fail to set up to loop in a way where it can handle interrupts, then the whole loop is executed slowly. That being said, it does handle our target use case as specified.

Applied in: the 68010 and later 68k variants such as CPU32-based microcontrollers, where the DBRA/DBcc instruction could trigger a fast loop feature where instruction fetches are halted and operation of the looped instruction is repeated according to the counter.

Instruction caches, pipelining, and branch prediction

You could look at the complexity of implementing interrupt processing in any of these features and consider that you could almost as easily implement a proper pipeline, including handling interrupts while instructions are in flight, and end up supporting the use case just as efficiently, along with much more general speedups. After all, the speed of memory copy is going to be constrained by the interface to the memory bus; your only contribution is to minimize instruction fetching and decoding overhead, which is achieved if fetching and decoding an instruction happens in parallel with the memory copy performed by the previous one. Accomplishing that also requires a dedicated instruction cache so instructions can be fetched in parallel with data, but integrating a small amount of memory cells on your processor die is getting cheaper by the day. And keeping the pipeline fed when branches are involved, as here with loops, will require you to add non-trivial branch prediction, but you can at least get loops right with a simple “backwards branches are predicted to be taken” approach. And it turns out that simple branch predictors work well in real-life code for branches beyond loops, compensating for the effects of pipelining elsewhere (and if you make the predictor just a little bit more complex you can predict even better, and then a little more complexity will improve performance still, etc.; there is always a performance gain to be had).
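The “backwards branches are predicted to be taken” rule (often called backward-taken/forward-not-taken, or BTFN) fits in a few lines. This sketch replays a hypothetical trace of branch outcomes and counts mispredictions. The `struct branch_event` trace format and the function names are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* One dynamic execution of a conditional branch. */
struct branch_event {
    unsigned pc;       /* address of the branch */
    unsigned target;   /* destination if taken */
    bool taken;        /* actual outcome */
};

/* Static BTFN prediction: a branch to a lower address is assumed to
   close a loop, so it is predicted taken; forward branches are not. */
static bool predict_btfn(const struct branch_event *b)
{
    return b->target < b->pc;
}

static size_t count_mispredictions(const struct branch_event *trace, size_t n)
{
    size_t misses = 0;
    for (size_t i = 0; i < n; i++)
        if (predict_btfn(&trace[i]) != trace[i].taken)
            misses++;
    return misses;
}
```

For the DBNZ loop above (branch at 0003 back to 0001) run three times with a count of 4, the trace is "taken, taken, taken, not taken" three times over, and the predictor misses only the three loop exits.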

Applied in: all microprocessors used in general-purpose computing by now, starting in the early 1990s. For instance, x86 processors have an instruction implementing the decrement and branch if not zero functionality (LOOP), but its use is now discouraged (see 3.5.1 Instruction Selection in the Intel optimization reference manual), and modern Intel processors recognise loops even when they use generic instructions and optimize for them, except for the loop exit, which keeps being mispredicted.

With all that in mind, next time we’re going to be able to look at how to redesign our processors to avoid the situation that led us to rampant, insecure speculation in the first place.

What will the long-term solutions be to Meltdown and Spectre?

It’s hard to believe it has now been more than one year since the disclosure of Meltdown and Spectre. There was so much frenzy in the first days and weeks that it has perhaps obscured the fact any solutions we currently have are temporary, barely secure, spackle-everywhere stopgap mitigations, and now that the dust has settled on that, I thought I’d look at what researchers and other contributors have come up with in the last year to provide secure processors – without of course requiring all of us to rewrite all our software from scratch.


Do I need to remind you of Meltdown and Spectre? No, of course not; even if you’re reading this 20 years from now you will have no trouble finding good intro material to these. So as we discuss solutions, my only contribution would be this: it is important to realize designers were not lazy. For instance, they did not “forget” the caches as part of undoing speculative work in the processor, as you can’t “undo” the effect of speculation on the caches: for one, how would you reload the data that was evicted (necessary in order to be a real undo)? You can’t really have checkpoints in the cache that you roll back to, either: SafeSpec explores that, and besides still leaking state, more importantly it precludes any kind of multi-core or multi-socket configuration (SafeSpec is incompatible with cache coherency protocols), a non-starter in this day and age (undoing cache state is also problematic in multi-core setups, as the cache state resulting from speculative execution would be transitorily visible to other cores).

It is also important to realize preventing aliasing in branch prediction tracking slots would not fundamentally solve anything: even if this was done, attackers could still poison BHS and possibly BTB by coercing the kernel into taking (resp. not taking) the attacked branch, through the execution of ordinary syscalls, and then use speculative execution driven by that to leak data through the caches.

Besides information specific to Meltdown and Spectre, my recommended reading before we begin is Ulrich Drepper on the modern computer memory architecture, still current, and Dan Luu on branch prediction: these will show you the myriad places where processors generate and store implicit information needed for modern performance.

The goal

As opposed to the current mitigations, we need systemic protection against whole classes of attacks, not just the current ones: it’s not just that hardware cannot be patched, but it also has dramatically longer design cycles, which means protecting only against known risks at the start of a project would make the protections obsolete by the time the hardware ships. And even if patching were a possibility, it’s not like the patch treadmill is desirable, anyway (in particular, adding fences, etc. around manually identified vulnerable sequences feels completely insane to me and amounts to a dangerous game of whack-a-vulnerability: vulnerable sequences will end up being added to kernel code faster than security-conscious people will figure them out). Take, for instance, the Intel doc which described the Spectre (and Meltdown) vulnerability as being a variant of the “confused deputy”; this is its correct classification, but I feel this deputy is confused because he has been given responsibility for the effects of speculative executions of his code paths, a staggering responsibility he never requested in the first place! No, we need to attack these kinds of vulnerabilities at the root, such that they cannot spawn new heads, and the two techniques that follow do so.


First is DAWG. The fundamental idea is very intriguing: it is designed to close off every kind of cache side-channel state¹, not merely tag state (that is, whether a value is present in the cache or not), and to close off data leaks regardless of which phenomenon would feed such a side channel: it is not limited to speculative execution. How do they ensure that? DAWG has the OS dynamically partition all cache levels and then assign the partitions to trust domains, in a fashion similar to PCID.

This means that even with a single processor core, there are multiple caches at each level, one per trust domain, each separate from its siblings and having a proportional fraction of the size and associativity of the physical cache of that level (cache line size and cache way size are unaffected). This piggybacks on recent developments (Intel CAT) to manage the cache as a resource to provision, but CAT is intended for QoS and offers limited security guarantees.

As long as data stays within its trust domain, that is all there is to it. When cross-partition data transfer is necessary, however, the kernel performs it by first setting up a mixed context where reads are considered as belonging to one domain but writes to another, then performing the reads and writes themselves: this affords the best possible cache usage during and after the transfer.
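The essential mechanism can be illustrated with a toy model; this is a sketch under my own assumptions, not the DAWG paper's actual hardware. Each trust domain is restricted to a subset of the ways of every set, both for hits and for fills, so one domain's misses can neither evict another domain's lines nor observe them. `DawgCache`, `assign`, `access` and `copy` are hypothetical names, and `copy` mimics the mixed context in which reads are attributed to one domain and writes to another (only cache state is modeled, not the data itself).

```python
class DawgCache:
    """Toy set-associative cache with DAWG-style way partitioning (a sketch)."""

    def __init__(self, num_sets, num_ways, line_size=64):
        self.num_sets, self.num_ways, self.line_size = num_sets, num_ways, line_size
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        self.last_use = [[0] * num_ways for _ in range(num_sets)]  # for LRU
        self.clock = 0
        self.hit_ways = {}   # domain -> ways where its lookups may hit
        self.fill_ways = {}  # domain -> ways where its misses may evict and fill

    def assign(self, domain, ways, fill_ways=None):
        # Hit and fill ways only need to differ while a partition is resized.
        self.hit_ways[domain] = sorted(ways)
        self.fill_ways[domain] = sorted(ways if fill_ways is None else fill_ways)

    def access(self, domain, addr):
        """Look up addr on behalf of domain; return True on a hit."""
        line = addr // self.line_size
        index, tag = line % self.num_sets, line // self.num_sets
        self.clock += 1
        for way in self.hit_ways[domain]:
            if self.tags[index][way] == tag:
                self.last_use[index][way] = self.clock
                return True
        # Miss: the victim is the LRU way *within this domain's fill ways*, so
        # other domains' lines and replacement state are never touched.
        victim = min(self.fill_ways[domain], key=lambda w: self.last_use[index][w])
        self.tags[index][victim] = tag
        self.last_use[index][victim] = self.clock
        return False

    def copy(self, read_domain, write_domain, src, dst, size):
        """Kernel-mediated transfer in a mixed context: reads are attributed
        to read_domain and writes (write-allocate) to write_domain."""
        for offset in range(0, size, self.line_size):
            self.access(read_domain, src + offset)
            self.access(write_domain, dst + offset)
```

With two domains on ways 0-1 and 2-3 of a 4-way cache, any amount of misses from one domain to a given set leaves the other domain's lines in that set in place, and the same address ends up cached twice, once per domain, which is precisely the unusual situation the implementation has to handle.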

Such an organization raises a number of sub-problems, but they seem to have done a good job of addressing them. For instance, since each cache level is effectively partitioned, the same cache line may be present in multiple places in the same physical cache level, in different domains, which is unusual and requires some care in the implementation. The kernel has separate controls for where evictions can happen and where hits can happen, which is necessary during a transition period whenever the partitions are resized. DAWG integrates with cache coherence protocols by having each cache partition behave mostly, but not exactly, like a logically separate cache for coherence purposes: one particularly important limitation we will come back to is that DAWG cannot handle a trust domain attempting to load a line for writing when a different domain already owns that line for writing.

As for where it inserts itself in the cache, they have a clever design that interposes on a limited number of cache operations so as to stay out of the most latency-critical parts (tag detection, hit multiplexing, etc.). It requires some integration with the cache replacement algorithm, and they show how to do so with tree-PLRU (Pseudo Least Recently Used) and NRU (Not Recently Used).
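As an illustration of what integrating with the replacement algorithm involves, here is a minimal sketch of NRU restricted to a partition, assuming one "recently used" bit per way, stored as a plain list per cache set. The function names are hypothetical and this is not taken from the paper; the tree-PLRU adaptation is not shown.

```python
def nru_touch(nru_bits, way):
    """Mark a way as recently used (on a hit or a fill)."""
    nru_bits[way] = 1


def nru_victim(nru_bits, fill_ways):
    """Choose a victim among fill_ways with Not Recently Used replacement.

    Only the bits of ways in fill_ways are read or cleared, so one domain's
    misses never alter the replacement state another domain could observe.
    """
    ways = sorted(fill_ways)
    for way in ways:
        if nru_bits[way] == 0:
            return way
    # Every way of this partition was recently used: age only this partition,
    # rather than the whole set as an unpartitioned NRU would.
    for way in ways:
        nru_bits[way] = 0
    return ways[0]
```

The key design point is in the aging step: an unpartitioned NRU clears every bit of the set, which would let one domain's activity reshape another domain's eviction order.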

In terms of features, DAWG allows sharing memory read-only, CoW (Copy on Write), and one-way shared memory where only one trust domain has write access. DAWG incurs only a modest efficiency hit compared to the insecure baseline, though this depends on the partitioning policy (CAT has similar policy-dependent behavior).

On the other hand, there are a few, though manageable, limitations.

  • DAWG disallows sharing physical memory between different trust domains when more than one of them has write access, because cache coherence cannot be managed when more than one domain wants to write to cache lines corresponding to the same physical address. I feel this is manageable: such sharing is probably extremely hard to secure anyway, given the possibility of a side channel through cache coherence management state, as MeltdownPrime and SpectrePrime have demonstrated. We would need an overview of the main places where such memory sharing happens; off the top of my head, the typical use is the framebuffer used for IPC with WindowServer/X11, where only one-way transfer is needed in the first place, so the solution there would be to change permissions to restrict write access to one side only.
  • DAWG provides no solution for transfers into or out of physical memory shared between different trust domains when neither of them is the kernel. But as we just saw, such memory need only be allocated by specific processes (perhaps those marked with a specific sandbox permission?), and transfers could be performed by the kernel on behalf of the allocating domain through a new syscall.
  • Hot data outside the kernel, such as oft-called functions in shared libraries (think objc_msgSend()), while residing in a single place in physical memory, would end up being copied into every cache partition, thus reducing the effective capacity of all physical caches (hot data from the kernel would only need to be present in the kernel partition, regardless of which process makes the syscall).
  • Efficient operation relies on the kernel managing the partitioning according to the needs of each trust domain, which is not trivial. Partition ID management could be done in a fashion similar to PCID, but that still leaves determining partition sizes, keeping in mind that the cache at every level needs to be partitioned, including caches shared between cores, which therefore have more clients and require more partitioning, all with limited granularity that depends on the level: a 16-way set-associative cache may be partitioned in increments of 1/16th of its capacity, but a 4-way set-associative cache only in fourths of its capacity. Easy.
  • DAWG guards between explicit trust domains, so it cannot protect against an attacker in the same process. This could be mitigated by everyone adopting the Chrome method: sorry Robert, but maybe “mixing code with different trust labels in the same address space” needs to become a black art.
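To get a feel for the granularity problem, here is a sketch of one possible sizing policy for a single cache level; the policy and the `split_ways` name are hypothetical, not something DAWG prescribes. Each trust domain gets at least one way and the remaining ways are shared in proportion to a weight, using the largest-remainder method for rounding. Whatever the policy, each domain's capacity comes in multiples of cache size / number of ways (128 KiB for a 2 MiB 16-way cache, for instance), and a level cannot isolate more domains than it has ways.

```python
def split_ways(num_ways, weights):
    """Divide the ways of one cache level among trust domains (hypothetical policy).

    Every domain gets at least one way; the remaining ways are shared in
    proportion to weights with the largest-remainder method, so the result
    always sums to num_ways.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("need at least one domain, with positive weights")
    if len(weights) > num_ways:
        raise ValueError("more trust domains than ways: cannot isolate them all")
    spare = num_ways - len(weights)
    total = sum(weights)
    exact = [spare * w / total for w in weights]
    alloc = [1 + int(share) for share in exact]
    # Hand out the ways lost to rounding down, largest fractional part first.
    leftover = num_ways - sum(alloc)
    by_remainder = sorted(range(len(weights)),
                          key=lambda i: exact[i] - int(exact[i]), reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc
```

On a 16-way level, weights of 5, 3 and 1 yield 8, 5 and 3 ways; on a 4-way level, even two domains weighted 3 to 1 can only be split 3 and 1, and a fifth domain does not fit at all.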


The basic idea of InvisiSpec corresponds to the avenue I evoked back then: speculative loads only bring data to the processor without affecting cache state (neither bringing that data into a cache level where it wasn’t, nor modifying the cache replacement state or other metadata), with the cache being updated only once the load is validated.

Well, that’s it, good job everyone? Of course not, the devil is in the details, including some I never would have suspected: validation cannot happen just any which way. InvisiSpec details how this is done in practice, the main technique being special loads performed solely for validation purposes: once loaded, the processor only uses this data, if at all, to compare it against the speculatively loaded data kept in a speculation buffer, and if the values match, processing can proceed; and while you would think that could raise ABA issues, it does not, as we’re going to see.

Overall, InvisiSpec proposes a very interesting model of a modern processor. First, a speculation part performs computations while “playing pretend”: it doesn’t matter at that point whether the data is correct (of course, it needs to be correct most of the time to serve any purpose). Then the reorder buffer part can be seen as the “real” processing, executing according to the sequential model of the processor, except that it uses results already computed by the speculative engine when they exist. If these results don’t exist (e.g. the data was invalidated), the reorder buffer has the speculative engine retry and waits for it to be done: it does not execute the calculations (ALU, etc.) inline. A third part quickly (i.e. with low latency) feeds the speculative engine with data that is right most of the time, and does so invisibly: loads performed by the speculative engine can fetch from the caches but do not populate any cache, and the loaded values are instead stored in the speculation buffer in order to record which inputs any results were obtained from.

This model piggybacks on the existing infrastructure of current out-of-order processors: the reorder buffer is already the part in charge of making sure instructions “retire”, i.e. commit their effects, in order; in particular, on x86 processors the reorder buffer is responsible for invalidating loads executed out of order, along with the instructions after them, when it detects cache coherence traffic that invalidates the corresponding cache line. Ever wondered how x86 processors could maintain a strongly ordered memory model while executing instructions out of order? Now you know.
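Here is a toy sketch of that mechanism, under simplifying assumptions of mine: real x86 cores track more state, and may for instance skip the squash when no older load is still outstanding. `LoadQueue`, `execute`, `retire_oldest` and `snoop_invalidate` are hypothetical names. Loads may execute out of program order but wait in the queue, kept in program order, until they retire; when coherence traffic invalidates a line that a pending load already read, that load and everything younger are squashed and execution restarts from it.

```python
from dataclasses import dataclass


@dataclass
class Load:
    seq: int   # position in program order
    line: int  # cache line address the load has already read from


class LoadQueue:
    """Loads executed (possibly out of order) but not yet retired, oldest first."""

    def __init__(self):
        self.pending = []

    def execute(self, seq, line):
        self.pending.append(Load(seq, line))
        self.pending.sort(key=lambda load: load.seq)  # keep program order

    def retire_oldest(self):
        return self.pending.pop(0).seq

    def snoop_invalidate(self, line):
        """Another core takes ownership of `line` in order to write it.

        A pending load that read it may have observed a value out of order:
        squash the oldest such load and everything younger, and return where
        to restart execution, or None if no pending load is affected.
        """
        for load in self.pending:  # oldest first
            if load.line == line:
                self.pending = [other for other in self.pending if other.seq < load.seq]
                return load.seq
        return None
```

Because the squashed load is simply re-executed after the writer's update, no other core can ever observe the loads as having taken effect out of order.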

InvisiSpec has to do much more, however, as it cannot rely on cache coherence traffic: since the initial load is invisible by design, other caches are allowed to believe they have exclusive access (in the Modified/Exclusive/Shared/Invalid, or MESI, model) and won’t signal any change externally. Therefore, if the memory ordering model stipulates that loads must appear to occur in order, the reorder buffer has to perform a full validation: not only must it perform a fresh, non-speculative load as if the load was executed for the first time (thus allowing the caches to be populated), but it then has to wait for it to complete and compare the loaded data with the speculatively loaded data. If they are equal, then the results precomputed by the speculative engine for the downstream computations are correct as well, and the reorder buffer can proceed with these instructions. It does not even matter if A compared equal to A while the memory cell held the value B in between, as the only thing that matters is whether the downstream computation is valid for value A, which is true if and only if the speculative engine was fed an equal value A when it executed.
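The following toy model, an assumption of mine rather than InvisiSpec's actual interface, shows invisible speculative loads, validation at retirement, and why ABA is harmless. `ToyInvisiSpecCore`, `speculative_load` and `validate` are hypothetical names; memory is a dictionary of values, and the cache is reduced to the set of addresses it holds.

```python
class ToyInvisiSpecCore:
    """Toy model of invisible speculative loads plus validation (a sketch)."""

    def __init__(self, memory):
        self.memory = memory          # address -> value, shared with "other cores"
        self.cache = set()            # addresses whose lines are cached
        self.speculation_buffer = {}  # load id -> (address, value seen)

    def speculative_load(self, load_id, addr):
        # Fetch the value without filling any cache or touching replacement state.
        value = self.memory[addr]
        self.speculation_buffer[load_id] = (addr, value)
        return value

    def validate(self, load_id):
        """At retirement, redo the load visibly and compare it with the value
        the speculative engine used. True: keep the precomputed downstream
        results; False: have the speculative engine redo them."""
        addr, speculative_value = self.speculation_buffer.pop(load_id)
        self.cache.add(addr)  # the validation load is allowed to fill the cache
        return self.memory[addr] == speculative_value
```

If another core writes 9 and then 7 again to a cell speculatively read as 7, validation still succeeds, and rightly so: a downstream result computed from 7 is exactly what in-order execution would have produced. Only a value that differs at validation time forces a redo.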

This leads to a much more tractable model for security: as far as leaking state is concerned, security researchers only need to look at the operation of the reorder buffer; performance engineers, on the other hand, will mostly look at the upstream parts, to make sure speculation is invalidated as rarely as possible, but will still look at the reorder buffer to make sure validation latencies are hidden as much as possible.

Notably, InvisiSpec protects against attackers living in the same address space or trust boundary, and since it is cache-agnostic, it does not restrict memory sharing in any way.

The following limitations can be noted in InvisiSpec:

  • InvisiSpec only protects against speculation-related attacks, not other kinds of attacks that also use the cache as a side channel. Additional techniques will be needed for those.
  • InvisiSpec adds a significant efficiency hit compared to the insecure baseline, both in execution time (22% to 80% increase on SPEC benchmarks, lower is better) and cache traffic (34% to 60% increase on SPEC benchmarks, lower is better), the latter being one of the main drivers of power usage. That will need to be improved before people switch to a secure processor, otherwise they will keep using “good enough” mitigations; more work is needed in that area. My analysis is that most of that efficiency hit comes from the requirement to protect against an attacker in the same address space: any pair of loads could be an attacker/victim pair! The result is that pipelining is mostly defeated to the extent it is used to hide load latencies. I am skeptical of their suggestion for the processor to disable interrupts from the time a load has committed until the next load commits, so as to allow the latter to start validation early (disabling interrupts removes the last potential source of events that could prevent the latter load from committing): this would add a significant constraint to interrupt management, which furthermore is unlikely to compose well with similar constraints.

The future

This isn’t the last we will hear of the work needed to secure processors post-Meltdown and Spectre; I am sure novel techniques will be proposed. At any rate, we in the computing industry as a whole need to start demanding that Intel and others tell us which systemic protections they are putting in their processors, be they DAWG or InvisiSpec or something else, to ensure whole classes of attacks become impossible.

  1. At least on the digital side: DAWG does not guard against power usage or electromagnetic radiation leaks, or rowhammer-like attacks.

Software Reenchantment

I’ve had Nikitonsky’s Software Disenchantment post in my mind ever since it was posted… which was four months ago. It’s fair to say I’m obsessed and need to get the following thoughts out of my system.

First because it resonated with me, of course. I recognize many of the situations he describes, and I share many of his concerns. There is no doubt that many trends in software development seem incongruous and at odds with recommendations for writing reliable software. The increasing complexity of the software stack, in particular, is undoubtedly a recipe for bugs to thrive, hidden in that complexity.

Yet some of that complexification, controversial as it may be, is nevertheless progress. The example that comes to mind is Chrome, and more specifically its architecture of running each tab (HTML rendering, JavaScript, etc.) in its own process for reliability and security, and the related decision to develop a high-performance JavaScript engine, V8, that dynamically compiles JavaScript to native code and runs that (if you need a refresher, Scott McCloud’s comic is still relevant). Yes, this makes Chrome a resource hog, and initially I was skeptical of the need: the JavaScript engine controls the generated code, so if it did its work correctly, it would properly enforce same-origin and other restrictions without the need for the per-tab process architecture and its overhead in creation, memory occupation, etc. of many shell processes.

But later on I started seeing things differently. It is clear that browser developers have for the last few years been engaged in a competition for performance, features, etc., even if they don’t all favor the same benchmarks. In that fast-paced environment, there would be a hard dilemma between going for features and performance at the risk of bugs, especially security vulnerabilities, slipping through the cracks, and moving at a more careful pace at the risk of being left behind by more innovative browsers and marginalized; and even if your competitor’s vulnerabilities end up catching up with him in the long term, that still leaves enough time for your browser to become so marginalized that it cannot recover. We’re not far from a variant of the prisoner’s dilemma. Chrome resolved that dilemma by going for performance and features while investing up front in an architecture that provides a safety net, so that a single vulnerability is not enough for the attacker to escape the jail, and bugs of other kinds are mitigated. This frees the developers working on most of the browser code, in particular on the JavaScript engine, from having to worry excessively about security and bugs, while the few people with the most expertise in that area work on the sandbox architecture of the browser instead.

So that’s good for the browser ecosystem, but the benefits extend beyond it: indeed, the one-upmanship from this competition will also democratize software development. Look, C/C++ is my whole career, I love system languages, there are many things you can only do with them even in the application space (e.g. real-time code such as for A/V applications), and I intend to work in system languages as long as I possibly can. But I realize these languages, C/C++ in particular, have an unforgiving, macho “it’s your fault you failed, you should have been more careful” attitude that makes them unsuitable for most people. Chrome, and the other browsers that have since become high-performance too, vastly extend the opportunities of JavaScript development, which is now starting to be credible for many kinds of desktop-like applications. JavaScript has many faults, but it is also vastly more forgiving than C/C++, if only by virtue of providing memory safety and garbage collection. And most web users can discover JavaScript by themselves with “view source”. Even if C/C++ isn’t the only game in town for application development (Java and C# are somewhat more approachable, for instance), this nevertheless removes quite a hurdle to getting started in application development, and that can only be a good thing.

And of course, the per-tab process architecture of Chrome means it ends up piggybacking on the well-understood process separation mechanism of the OS, itself relying on the privilege separation feature of the processor, and after Meltdown and Spectre it would seem this bedrock is more fragile than we thought… But process separation still looks like a smart choice even in this context, as a long-term solution will be provided in newer processors for code running in different address spaces (at the cost of more address space separation, itself mitigated by features such as PCID), while running untrusted code in the same address space will have no such solution and is going to become more and more of a black art.

So I hope that, if you considered Chrome to be bloated, you now realize it’s not so clear-cut: more complexity can be a good thing. On the other hand, I have an inkling that the piling on of dependencies in the npm world in general, and in web development specifically, is soon going to become unsustainable, but I’d love to be proven wrong. We need to take a long, hard look at that area in general.

So yes, it’s becoming harder to even tell if we software engineers can be proud of the current state of our discipline or not. So what can be done to make the situation clearer, and if necessary, improve it?

First, we need to treat software engineering (and processor engineering, while we’re at it) just like any other engineering discipline, by requiring software developers to be licensed in order to work on actual software. Just like a public works engineer needs to be licensed before designing bridges, a software engineer would need to be licensed before designing software that handles personal data, with the requirement applying all the way down the dependencies: for this purpose, only library software that has itself been developed by licensed software engineers could be used. We would need to grandfather in existing software, of course, but licensing is necessary because, while software mistakes are (generally) not directly lethal, they can be just as disruptive to society as a whole when personal data massively leaks. Requiring a license for software development would in particular provide some protection against pressure from the hierarchy and other stakeholders such as marketing, a necessary prerequisite for enabling designers to say “No” to unethical requests.

Second, we need philosophers (either coming from our ranks or outsiders) taking a long, hard look at the way we do things and trying to make sense of it, if only to figure out which questions need asking, so that we can be better informed about what we as a discipline need to work on. I don’t know of anyone doing this very important job right now.

These, to me, are the first prerequisites to an eventual software reenchantment.

Factory Hiro on iPad is the real deal

I tested Factory Hiro for you: it is the latest attempt at remaking Factory: The Industrial Devolution for iPad (and iPhone, though I tested it on iPad). After the previous attempt (if it could even be called one), caution was certainly warranted. So is it worth it?

The short answer: yes. Factory Hiro is the real deal, buy and download it without fear, it is a worthy remake of the original, finally available with a touch interface where it can shine.

The longer answer: compared to the original, graphics have obviously been remade, but everything in the gameplay will otherwise be familiar. Redirection boxes that you tap to toggle between vertical and horizontal direction, the trash destination and the recycle stream (and of course the default final destination: the delivery truck), assembling steps that you sometimes have to turn on or off, etc. It’s all there. In particular, the regulation of assembly line speed so you get your quota done in time for the end of the day, but not so fast that you end up being overwhelmed by the oncoming components to manage, is still the fundamental challenge of the game.

A nicety was added, though: when some trash gets generated (“Oh No!”), the speed will automatically switch to slowest for you, so it is much easier to manage these crises when they come. Other differences exist, but are less significant.

In non-gameplay aspects, the story was changed as well; these days, of course, it is told through cutscenes (created by KC Green) depicting the titular Hiro proving to his superiors that yes, he can get the job done. And you, will you succeed?

Factory Hiro is available on the iOS App Store, as well as on the Google Play store for the Android version, and on PC and Mac through Steam (I got it for 3.49€ on the French iOS App Store; pricing will depend on your region); it was reviewed on an iPad Air 2 running iOS 11.4.1.

P.S.: In similar nostalgia-inspired discoveries, I should mention that I can’t believe I went such a long time without being made aware of Contraption Maker, from the creators of The Incredible Machine; this matters beyond nostalgia, as The Incredible Machine is one of the best pedagogical tools disguised as a game that I have ever come across, and Contraption Maker is a worthy successor, improving on it in many respects such as the addition of rotation physics (want to make a cat flap? Now you can.) or more digital logic elements for laser computing than you can shake a stick at.

And the last such discovery is Two Point Hospital, in which you will find everything you liked in the original Theme Hospital; I haven’t played it yet, but if you’ve played the original and the trailer doesn’t convince you, I don’t know what will.

Copland 2020

Five years ago, I predicted that Apple would port iOS to the desktop, with a compatibility mode to run existing Mac OS X programs; we are now at the deadline I later put for this prediction, so did Apple announce such a thing at WWDC 2018?

Big No on WWDC 2018 Keynote stage, with Craig Federighi on the left

Now I think it is worth going into what did come to pass. Apple is definitely going to port UIKit, an important part of iOS, to the desktop for wide usage; and these last few years have seen improved possibilities for distributing iOS apps outside the iOS App Store or Apple review, though they remain limited.

But beyond that? I got it all wrong. The norms established by the current Mac application base will remain, with apps ported from the mobile world being only side additions, there will still be no encouragement for sandboxing except for the clout of the Mac App Store, pointing with pixel accuracy will remain the expectation, most of iOS will remain unported, etc. You are all free to point and laugh at me.

And I can’t help but think: what a missed opportunity.

For instance, in an attempt to revitalize the Mac App Store, Apple announced expanded sandboxing entitlements, with developers on board pledging to put their apps on the store. Besides the fact that some aspects of the store make me wary of it in general, I can’t help but note that this sandboxing thing has been dragging on for years, such that it ought to have been handled like a transition in the first place; it might have been handled as such from a technical standpoint, but definitely not from a platform management standpoint (I mean, can you tell whether any given app you run is sandboxed? I can’t), even though that could be a differentiator for Apple in this era of generalized privacy violations. Oh, and that is assuming the announced apps will actually manage to be sandboxed this time, which is far from certain: I still remember the saga, back in 2012, of apps that were worked on to be sandboxed, only for their developers to eventually give up…

I mean, Apple could have done it: the user base was in the right mindset (I did not see a lot of negative reactions when the press broke the news of the unified desktop user experience a few months ago, which was in fact Marzipan, the initiative to run UIKit on the desktop), developers would obviously have been less enthusiastic but could have been convinced with a “Benefit from being one of the first native apps!” incentive, influential users could have been convinced by selling it as a privacy improvement (remember: in Mac OS X, unsandboxed apps can still read all the data of sandboxed apps), etc. But this year they explicitly raised the question in order to answer it in the negative, meaning no such thing is going to happen in the next few years, and they are in fact investing in alternative solutions like Marzipan instead. Sorry Apple, but I can’t help but think of Copland 2020.