Patents

Patents and their application to software have been in the news lately: Lodsys and other entities that seem to have been created from whole cloth for that sole purpose are suing various software companies for patent infringement; Android is under attack (directly or indirectly) from historical operating system actors Apple, Microsoft and Oracle (as owner of Sun) for infringing their patents; and web video is standardized but the codec is left unspecified, as the W3C will only standardize on freely-licensable technologies while any remotely modern video compression technique is patented (even the ostensibly patent-free WebM codec is heading towards having a patent pool formed around it).

Many in the software industry consider it obvious not only that reform is needed, but that software patents should be banned entirely given their unacceptable effects; however, I haven’t seen much justification of why they should be banned, as the article, blog post or editorial defending this position often takes it for granted. Well, it is certainly obvious to the author as a practitioner of software, and obvious to me as the same, but it is not obvious to others, and I wouldn’t want engineers of other trades to see software developers as prima donnas who think they should be exempted from the obligations related to patents for no reason other than the fact it inconveniences them. So here I am going to lay out why I consider that software patents actually discourage innovation, and in fact discourage any activity, in the software industry.

Why the current situation is untenable

Let’s start with the basics. A patent is an exclusive right over an invention, granted to its inventor in exchange for registering the invention in a public office (which includes paying a fee). Of course, he can share that right by licensing the patent to others, or he can sell the patent altogether. Anyone else using the invention (and that includes an end user) is said to be infringing the patent and is in the wrong, even if he came up with it independently. That seems quite outlandish, but it’s a tradeoff that we as a society have made: we penalize parallel inventors acting in good faith in order to better protect the original inventor (e.g. to avoid copyists getting away with their copying by pretending they were unaware of the original invention). Of course, if the parallel inventor is not found to have been aware of the original patent, he is penalized less than if he had been, but he is penalized nonetheless. The aim is to give practitioners in a given domain an incentive to keep abreast of the state of the art in various ways, including by reading the patents published by the patent office in their domain. In fields where the conditions are right, I hear it works pretty well.

And it is here we see the first issue with software patents: the notorious incompetence of the USPTO (United States Patent and Trademark Office) [1], which has been very lax and inconsistent when it comes to software patents, and has granted a number of dubious ones; and I hear it’s not much better in other countries where software patents are granted (European countries thankfully do not grant patents on software, for the most part). One of the criteria when deciding whether an invention can be patented is whether it is obvious to a practitioner aware of the state of the art, and for reasonably competent software developers the patents at the center of some lawsuits are downright obvious inventions. The result is that staying current with the software patents that are granted is such a waste of time that it would sink most software companies faster than any patent suit.

Now, it is entirely possible that the USPTO is overworked with a flood of patent claims which they’re doing their best to evaluate given their means, and the bogus patents that end up being granted are rare exceptions. I personally believe the ones we’ve seen so far are but the tip of the iceberg (most are probably resting along with more valid patents in the patent portfolios of big companies), but even if we accept they are an exception, it doesn’t matter because of a compounding issue with software patents: litigation is very expensive. To be more specific, the U.S. patent litigation system seems calibrated for traditional brick and mortar companies that produce physical goods at the industrial scale; calibrated in the sense of how much scrutiny is given to the patents and the potential infringement, the number of technicalities that have to be dealt with before the court gets to the core of the matter, how long the various stages of litigation last, etc. Remember that in the meantime, the lawyers and patent attorneys gotta get paid. What are expensive but sustainable litigation expenses for these companies simply put most software companies, which operate at a smaller scale, out of business.

Worse yet, even getting to the point where the patent and the infringement are looked at seriously is too expensive for most companies. As a result, attackers only need to have the beginning of a case to start threatening software developers with a patent infringement lawsuit if they don’t take a license; it doesn’t matter if the attacker’s case is weak and likely to lose in court eventually, as these attackers know that the companies they’re threatening do not have the means to fight to get to that point. And there is no provision for the loser to have to pay for the legal fees of the winner. So the choice for these companies is either to back off and pay up, or spend at least an arm and a leg that they will never recover defending themselves. This is extortion, plain and simple.

So even if bogus patents are the exception, it is more than enough for a few of them to end up in the wild and be used as bludgeons to waylay software companies pretty much at will, so the impact is out of proportion to the number of bogus patents, especially when you consider that the assailants cannot be targeted back since they do not make products.

But at the very least, these issues appear to be fixable. The patent litigation system could be scaled back (possibly only for software patents), and, who knows, the USPTO could change and do a proper job of evaluating software patents, especially if there are disincentives in place (like a higher patent submission fee) to curb the number of submissions and allow the USPTO to do a better job. And one could even postulate a world where software developers “get with the program”, follow patent activity, and avoid patented techniques (or license them, as appropriate) such that software development is no longer a minefield. But I am convinced this will not work, especially the latter, and that software (with possible exceptions) should not be patentable, for reasons I am going to lay out.

Why software patents themselves are not sustainable

The first reason is that contrary to, say, mechanical engineers, or biologists, or even chip designers, the software development community is dispersed, heterogeneous, and loosely connected, if at all. An employee in IT writing network management scripts is a software practitioner; an iOS application developer is a software practitioner; a web front-end developer writing HTML and JavaScript is a software practitioner; a Java programmer writing line of business applications internal to the company is a software practitioner; an embedded programmer writing the control program for a washing machine is a software practitioner; a video game designer scripting a dialog tree is a software practitioner; a Linux kernel programmer is a software practitioner; an embedded programmer writing critical avionics software is a software practitioner; an expert writing weather simulation algorithms is a software practitioner; a security researcher writing cryptographic algorithms is a software practitioner. More significantly, every company past a certain size, regardless of its field, will employ software practitioners, if only in IT, and probably to write internal software related to its field. Software development is not limited to companies in one or a few fields: software practitioners are employed by companies in every sector, industrial or not. So I don’t see software developers ever forming a coherent enough “community” for patents to work as intended.

The second reason, which compounds the first, is that software patents cannot be reliably indexed, contrary to, say, the chemical patents used in the pharmaceutical industry [2]. If an engineer working in pharmacology wants to know whether the molecule he intends to work on is already patented, there are databases that, based on the formal description of the molecule, make it possible to find any and all patents covering that molecule, or to conclude with a reasonably high degree of confidence that the molecule is not patented yet if the search turns up no result. No such thing exists (and likely no such thing can exist) for software patents, where there is at best keyword search; this is less accurate, and in particular cannot give confidence that an algorithm we want to clear is not patented, as a keyword search may miss patents that would apply. It appears that the only way to ensure a piece of software does not infringe patents is to read all software patents (every single one!) as they are issued, to see whether one of them covers the piece of software we want to clear; given that every company that produces software would need to do so, and remembering the compounding factor that this includes every company past a certain size, this raises some scalability challenges, to put it lightly.

This is itself compounded by the fact that you do not need a lot of resources available, or to spend a lot of resources or time, to develop and validate a software invention. To figure out whether a drug is worth patenting (to say nothing of producing it in the first place), you need a lab, in which you run experiments that take time and money to pay for the biological materials, the qualified technicians tending to the experiments, etc. And these may not work, in which case you have to start over; one success has to bear the cost of likely an order of magnitude more failures. To figure out whether a mechanical invention is worth patenting, you need to build it and spend a lot of materials (ones constitutive of the machine because it broke catastrophically, or ones the machine is supposed to process, like wood or plastic granules) iterating on the invention until it runs, and even then it may not pan out in the end. But validating a software invention only requires running it on a computer that can be had for $500, eating a handful of kilojoules (kilojoules! Not kWhs, kilojoules, or said another way, kilowatt-seconds) of electrical energy, and no worker time at all except waiting for the outcome, since everything in running software is automated. With current hardware and compilers, the outcome of whether a software invention works or not can be had in mere seconds, so there is little cost to the failure of an invention. As a result, developing a software invention comparable in complexity to an invention described in a non-software patent has a much, much lower barrier to entry and requires multiple orders of magnitude fewer resources; everyone can be a software inventor. Now there is still of course the patent filing fee, but in software you’ve got inventions that are easier to come up with, so many more of them will be filed, while they impact many more companies… Hmm…

Of course, don’t get me wrong, I do not mean here that software development is easy or cheap, because software development is about creating products, not inventions per se; developing a product involves a lot more (like user interface design, getting code by different people to run together, figuring out how the product should behave and what users want in the product, and many other things, to say nothing of non-programming work like art assets, etc.) than simply the inventions contained inside, and developing that takes a lot of time and resources.

Now let us add the possibility of a company getting a software patent so universal and unavoidable that the company is thus granted a monopoly on a whole class of software. This has historically happened in other domains, perhaps most famously with Xerox, which for a long time held a monopoly on copying machines by having the patent on the only viable technique for doing so at the time. But granting Xerox a monopoly on the only viable copying technique did not impact other markets, as this invention was unavoidable for making copying machines and… well, maybe integrated fax/printer/copying machine gizmos which are not much more than the sum of their parts, but that was it. On the other hand, a software invention is always a building block for more complex software, so an algorithmic patent could have an unpredictable reach. Let us take the hash table, for instance. It is a container that makes it possible to quickly (in a sense formally defined) determine whether it already contains an object with a given name, and where, while still making it possible to quickly add a new object; something computer memories by themselves are not able to do. Its performance advantages do not merely make programs that use it faster, they allow many programs, which otherwise would be infeasibly slow, to exist. The hash table enables a staggering amount of software; for instance, using a hash table you can figure out in a reasonable time from survey results the list of different answers given in a free-form field of that survey, and for each such answer the average age of respondents who picked that answer (as an example). Most often the various uses of hash tables are even further removed from user functionality, but are no less useful, each one providing its services to another software component which itself provides services to another, etc. in order to provide the desired user functionality. Thanks to the universal and infinitely composable nature of software there is no telling where else, in the immensity of software, a software invention could be useful.
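To make the survey example concrete, here is a minimal sketch in Python, whose built-in dict is a hash table. The function name and the survey data are made up for illustration; the point is only that each answer is looked up or added in roughly constant time, whatever the number of distinct answers already seen.

```python
from collections import defaultdict

def average_age_by_answer(responses):
    """Group (answer, age) pairs by answer and return {answer: average age}."""
    # totals is a hash table: answer -> [sum of ages, number of respondents].
    # Finding the entry for an answer, or creating it, does not require
    # scanning the answers seen so far.
    totals = defaultdict(lambda: [0, 0])
    for answer, age in responses:
        entry = totals[answer]
        entry[0] += age
        entry[1] += 1
    return {answer: total / count for answer, (total, count) in totals.items()}

# Hypothetical survey data, invented for this example.
survey = [("tea", 30), ("coffee", 40), ("tea", 50), ("water", 20)]
averages = average_age_by_answer(survey)
```

The keys of the result are the list of different answers, and the values are the average ages; without the hash table, each new response would have to be compared against every distinct answer collected so far.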

Back when it was invented, the hash table was hardly obvious. Had it been patented, everyone would have had to find alternative ways to accomplish more or less the same purpose (given the universal usefulness it has), such as trees, but those would themselves have become patented until there was no solution left, as there are only so many ways to accomplish that goal (given that in software development you cannot endlessly vary materials, chemical formulas, or environmental conditions); at that point software development would have become frozen in an oligopoly of patent-holding companies, which would have taken advantage of being the only ones able to develop software to file more patents and maintain that advantage indefinitely.

Even today, software development is still very young compared to other engineering fields, even to what they were around the start of the nineteenth century when patent systems were introduced. And its fundamentals, such as the hardware it runs on and its capabilities, change all the time, such that there is always a need to reinvent some of its building blocks; therefore patenting techniques currently being developed risks having enormous impact on future software.

But what if algorithmic inventions that are not complex by software standards were not allowed patent protection, and only complex (by software standards) algorithms were, to compensate for the relative cheapness of developing an algorithmic invention of complexity comparable to a non-algorithmic invention, and avoid the issue of simple inventions with too important a reach? The issue is, with rare exceptions complex software does not constitute an invention bigger than the sum of individual inventions. Indeed, complex software is developed to solve user needs, which are not one big technical problem, but rather a collection of technical problems the software needs to solve, such that the complex software is more than the sum of its parts only to the extent these parts work together to solve a more broadly defined, non-technical problem (that is, the user needs). However this complex software is not a more complex invention solving a new technical problem its individual inventions do not already solve, so patenting this complex software would be pointless.

Exceptions (if they are possible)

This does leave open the possibility of some algorithmic techniques for which I would support making an exception and allowing them patent protection while denying it to algorithms in general, contingent on a caveat I will get into afterwards.

First of these are audio and video compression techniques: while they come down to algorithms in the end, they operate on real-world data (music, live action footage, voice, etc.) and have been shown to be efficient at compressing this real-world data, so they have more than just mathematical properties. But more importantly, these techniques compress data by discarding information that will end up not being noticed as missing by the consumer of the media once uncompressed, and this has to be determined by experimentation, trial and error, large user trials, etc. that take resources comparable to a non-algorithmic invention. As a result, the economics of developing these techniques are not at all similar to those of software, and application of these techniques is limited to some, and not all, software applications, so it is worth considering keeping patent protection for these techniques.

Other techniques which, in my opinion, are worth patenting even though they are mostly implemented in software are some encryption/security systems. I am not necessarily talking here of encryption building blocks like AES or SHA, but rather of setups such as PGP. Indeed these setups have provable properties as a whole, so they are more than just the sum of their parts; furthermore, as with all security software, the validation that such techniques work cannot be done by merely running the code [3], but only by proving (a non-trivial job) that they are secure, again bringing the economics more in line with those of non-algorithm patents; therefore having these techniques in the patent system should be beneficial.

So it could be worthwhile to try and carve an exception and allow patents for these techniques and others sharing the same patent-system-friendly characteristics, but if this is attempted, extreme care will have to be taken when specifying such an exception. Indeed, even in the U.S.A. algorithm patents are formally banned, but accumulated litigation ended up with court decisions that progressively eroded this ban, first allowing algorithms on condition they were intimately connected to some physical process, then easing that qualification more and more until it became meaningless; software patents must still pretend to be about something other than software or algorithms, typically being titled some variation of “method and apparatus”, but in practice the ban on algorithm patents is well and truly gone, having been loopholed to death. So it is a safe bet that any exception granted to an otherwise general ban on software patents, should one happen in the future, will be subject to numerous attempts to exploit it for loopholes to allow software in general to be patented again, especially given the considerable pressure from big software companies to keep software patents valid.

So if there is any doubt as to the validity and solidity of a proposed exception to a general ban on software patents, then it is better to avoid general software patents coming back through a back door, and therefore better to forgo the exception. Sometimes we can’t have nice things.

Other Proposals

Nilay Patel argues that software patents should be allowed, officially even. He mentions mechanical systems and a tap patent in particular, arguing that since the system can be entirely modeled using physical equations, fluid mechanics in particular, the entire invention comes down to math in the end, as with software, so why should software patents be treated differently and banned? But the key difference here, to take again the example of the tap patent he mentions, is that the part of the math which is an external constraint, the fluid mechanics, is an immutable constant of nature. On the other hand, with algorithm patents all the algorithms involved are the work of man; even if there are external constraining algorithms in a given patent, due to legacy constraints for instance, these were the work of man too. In fact, granting a patent because an invention is remarkable due to the legacy constraints it has to work with and how it solves them would indirectly encourage the development and diffusion of such constraining legacy! We certainly don’t want the patent system encouraging that.

The EFF proposes, among other things, allowing independent invention as a valid defense against software patent infringement liabilities. If this is allowed, we might as well save costs and abolish software patents in the first place: a patent system relies on independent infringement being an infringement nonetheless in order to avoid abuses rendering the whole system meaningless, and I do not see software being any different in that regard.

I cannot remember where, but I heard the idea, especially with regard to media compression patents, of allowing software implementations to use patented algorithm inventions without infringing, so that software publishers would not have to get a license, while hardware implementations would require getting a license. But an issue is that “hardware” implementations are sometimes in fact DSPs which run code actually implementing the codec algorithms, so with this scheme the invention could be argued to be implemented in software; therefore OEMs would just have to switch to such a scheme if they weren’t already, qualify the implementation as software, and not have to pay for any license, so it would be equivalent to abolishing algorithm patents entirely.


  1. I do not comment on the internal affairs of foreign countries in this blog, but I have to make an exception in the case of the software patent situation in the U.S.A., which is so appalling that it ought to be considered a trade impediment.

  2. I learned that indexability was a very useful property that, in contrast to software patents, some patent domains did have, and the specific example of the pharmaceutical industry as such a domain, from an article on the web which I unfortunately cannot find at the moment; a search on the web did not allow me to find it but turned up other references for this fact.

  3. It’s like a lock: you do not determine that a lock you built is fit for fulfilling its purpose by checking that it closes and that using the key opens it; you determine it by making sure there is no other way to open it.

PSA: Do not release ARMv7s code until you have tested it

If you are using a third-party SDK in your iOS app, you may encounter a problem when linking with the current Xcode release: in that case the linker errors with the following line in the build log:

ld: file is universal (2 slices) but does not contain a(n) armv7s slice: libexample.a for architecture armv7s

(with some libraries, the linker will instead output the following when it errors, but it’s the same general problem:)

ld: warning: ignoring file libexample.a, file was built for archive which is not the architecture being linked (armv7s): libexample.a
Undefined symbols for architecture armv7s:

One solution to this issue is to get an updated version of the SDK that has a library with an ARMv7s slice (provided such an update exists, of course; otherwise you have no choice but to apply the second solution). However, you should do so only if you have an iPhone 5 to test on; otherwise, I strongly recommend you apply the second solution: go to your project settings (or target settings, if they are overridden at the target level), and edit the Architectures setting from “armv7 armv7s” to just “armv7”.

Why? Well, only the iPhone 5 can run the variant of your app code (called a slice) compiled for ARMv7s, so if you build and eventually release an update to your app that includes ARMv7s support, you would be releasing code you have not tested yourself, which is a big no-no. Don’t do it. In fact even if you can test on an iPhone 5, there is likely no need to rush and add ARMv7s support in your app as the benefits are incremental at best, as far as I can tell (but do not take my word for it, measure!); I really can’t understand why Apple added ARMv7s support in such a way that existing projects start using it right away by default.
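To see ahead of time which slices a third-party library contains, one option is to look at what lipo -info reports for it. The following Python sketch only parses such a line; the function names are hypothetical, and the two output formats shown in the comments are assumptions about how lipo words its report, so check them against what your toolchain actually prints.

```python
def slices_from_lipo_info(info_line):
    """Return the architecture names listed in one line of `lipo -info` output."""
    # Assumed output formats (not taken from the post above):
    #   Architectures in the fat file: libexample.a are: armv7 armv7s
    #   Non-fat file: libexample.a is architecture: armv7
    # In both cases the architecture names follow the last colon.
    return info_line.rsplit(":", 1)[-1].split()

def has_slice(info_line, arch="armv7s"):
    """Tell whether the library described by info_line contains the given slice."""
    return arch in slices_from_lipo_info(info_line)

# Hypothetical report for a library with two slices, neither of them armv7s.
report = "Architectures in the fat file: libexample.a are: armv7 i386"
needs_sdk_update = not has_slice(report)
```

If needs_sdk_update comes out true, you are in the situation described above: either obtain an updated SDK or restrict the Architectures setting to armv7.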

Look forward to a post detailing the benefits (if any) to adding an ARMv7s slice to your app, as well as updates to my existing posts, in the coming weeks.

This post was initially published with somewhat different contents as at the time the iPhone 5 was not actually available. In the interest of historical preservation, the original content has been moved here.

New section: comicroll

The absence of a blogroll on this site is very much intentional. For me the main issue with a blogroll is that it provides no context: it does not tell why the author of the blog reads the linked blogs, and more importantly does not tell why the author of the blog thinks his audience would benefit from reading the linked blogs. For instance, it might surprise a few of you to learn that I enjoy reading the Old New Thing, Raymond Chen’s blog on the history of Windows and Win32 arcana. A great French humorist once said: “Are you going to remain simplistic (literally, primary) anticommunists your whole life, while it is enough to read Marx to become straight away thoughtful (literally, secondary) anticommunists?” I have not read The Capital, but I read the Old New Thing for a similar purpose. This is very important context that you would not have if I simply were to put a link to the Old New Thing in a blogroll.

When I link to other blogs in one of my posts, on the other hand, it does provide you with the context to help you decide if you want to read that post, and potentially if you want to read more of that blog. However, there exists content that most of the time is not practical to drop a link to in a blog post, such as webcomics. This is why I am pleased to introduce a comicroll to this site. I religiously follow all works listed in this comicroll, never missing an update, and I recommend them all heartily. For the benefit of those of you following at home on an RSS reader, they are, at the time of this writing:

Note that a few of them are in fact over or on hiatus, but they are too good to pass up. In particular, the Daily Victim (which by the way is not exactly a comic, but rather illustrated humor prose) was a GameSpy feature that used to be available here, and this is in fact an archive saved by a fellow fan and made available on the web (by the way, if he happens to be reading this, I would like to contact him about some of the images in this archive. Never mind, the issue has been fixed, it’s getting better all the time).

So please enjoy this comicroll; know that it is not going to be set in stone, I will be changing it now and again, so be sure to check it out from time to time.

Developer command-line tools setup

After my previous post, Gregory Pakosz wondered why I was using xcrun, as he did not need to (turns out he had chosen to install the command-line developer tools in the Xcode prefs, so otool and friends were in /usr/bin). That got me thinking a bit about the setup for accessing command-line tools we can assume another developer has.

Starting from the olden days of Mac OS X and up until recently, the Developer Tools were a system-spanning install, most definitely not self-contained. In particular, either as standard or as an option offered at install, you could install command-line tools like the compiler and more in /usr/bin, directly accessible to your shell; and even if some disabled the option, they would add /Developer/usr/bin to their $PATH. As a result, you could just assume otool, libtool, gcc, etc. would be directly accessible in the environment of a fellow developer, and just give command lines directly using them when conversing with them by email, Twitter, blog posts, etc.

Then the iPhone SDK happened, and our numbers grew immensely (waving at you guys!), but habits were generally kept. And then Xcode 4 happened and emphasized being a one stop shop for your entire development process. And with Xcode 4.3 Xcode truly became self-contained, with no longer any installation per se; these days it is possible, and I suspect common, to develop and submit iPhone apps without needing to be aware of the Terminal at all (not that this is a bad thing, mind you).

In my case I initially missed installing the command-line tools: instead of going through the customization step of the install process and enabling everything except WebObjects, I now simply had Xcode as an application in /Applications, and never thought to look for the additional installs in the Xcode preferences.

I have now enabled that preference and installed the tools in /usr/bin, but still I missed it initially, so who knows how many others do. Also, it occurs to me more and more build systems and other scripts in the Mac/iOS development world are now aware of the Xcode hierarchy, and rely on xcode-select -print-path and/or an environment variable to locate the developer directory and use the tools in there directly; as a result, the Command Line Tools install is now mostly for the benefit of open source/Unix stuff which simply assumes cc/gcc is in the path, and maybe we should start thinking about it as being for that specific purpose.

So my question is, should we simply assume command-line developer tools are in the path and write our blog posts, emails, and Twitter messages as usual (maybe with a reminder to “make sure you have the Command Line Tools installed in the Xcode prefs.”)? Or should we start considering that these tools do not necessarily belong in /usr/bin and instead instruct our readers to use xcode-select -switch then xcrun, or even instruct them to directly add the Developer/usr/bin folder inside Xcode to the $PATH (which is likely to break from time to time as the paths change)? What do you think? As usual, write me at wanderingcoder@sfr.fr.

Okay, feedback is not unanimous in either direction. For now I will keep putting xcrun, with the assumption that it will help those who have not installed the Command Line Tools, while those who have will know enough to remove it from the command line before use. — August 8, 2012
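For scripts, as opposed to command lines given in a blog post, the two setups can be reconciled mechanically. Here is a minimal Python sketch, with a hypothetical helper name, that uses a tool directly when it is found in the path and otherwise goes through xcrun; it assumes xcrun itself is reachable, and it only builds the argument list without running anything.

```python
import shutil

def tool_command(tool, *args, which=shutil.which):
    """Build the argument list to invoke a developer tool such as otool.

    The `which` parameter exists only so the lookup can be replaced in tests.
    """
    if which(tool) is not None:
        # The Command Line Tools (or an edited $PATH) make the tool visible.
        return [tool, *args]
    # Otherwise let xcrun locate the tool inside the selected Xcode.
    return ["xcrun", tool, *args]

# Pretend nothing is installed in the path, to show the fallback.
fallback = tool_command("otool", "-L", "a.out", which=lambda name: None)
```

The resulting list can be handed to subprocess.run as is.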

Dealing with multiply defined symbols

ld: duplicate symbol _DoStuff in OneFile.o and CompletelyDifferentFile.o for architecture armv7

☠!#✺⁂☆⚡❕☭✹✊✨#❗

If you are like me, this is probably your spontaneous reaction upon getting this delightful error message when trying to build your app. This is a signal that at the very least some more work will be needed on the code you just integrated before your app will work with it. But it could be even worse than you think, as the Mac OS X linker (as of Xcode 4.3.2) will only report duplicate symbols one. at. a. time. So there could in fact be 50 colliding symbols, and the linker will only tell you about the next one once you’ve fixed the previous one, forcing you through 50 prefix-compile-link cycles! Today I will show you how to efficiently assess the damage first thing, then show you different methods to fix the issue, depending on the situation.

The Problem

The issue here, at its most fundamental level, is that there exist two Objective-C classes with the same name, C functions with the same name, or C++ functions with the same name and signature (or possibly two global variables with the same name; it can happen) such that the linker cannot resolve references to this symbol, as it does not know which to pick. One could think the linker could pick either definition, but this would be an incredibly dangerous thing to do, as some references likely expect the other definition of the code, and so would end up calling completely (or subtly) different code than what they expected, and you would end up with a mysterious, impossible-to-debug issue at runtime (and that’s if you’re lucky).
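The collision can be modelled in a few lines, which also shows that listing all the duplicates at once is not hard in principle. This Python sketch is an illustration only, not the process described in this post: it assumes you have already obtained, by whatever means (a symbol listing tool, for instance), the names each object file defines, and the function name and sample data are hypothetical.

```python
from collections import defaultdict

def find_duplicate_symbols(defined_symbols):
    """Map each symbol defined by more than one object file to those files.

    defined_symbols: {object file name: iterable of symbols it defines}.
    """
    owners = defaultdict(list)
    for object_file, symbols in defined_symbols.items():
        # set() so that a file listing a symbol twice is counted once.
        for symbol in set(symbols):
            owners[symbol].append(object_file)
    return {symbol: sorted(files)
            for symbol, files in owners.items() if len(files) > 1}

# Hypothetical input mirroring the error message at the top of this post.
defined = {
    "OneFile.o": ["_DoStuff", "_HelperA"],
    "CompletelyDifferentFile.o": ["_DoStuff", "_HelperB"],
}
duplicates = find_duplicate_symbols(defined)
```

Every entry in duplicates is a symbol for which the linker would have no way of choosing between definitions.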

This kind of problem typically occurs at a time in your project which may not be the best: when integrating a separate body of code (for instance, a third-party library). It may occur for various reasons. The most common one is that the code you just integrated contains utility functions it uses, but your code already contains utility functions of the same name, because these utility functions were copy-pasted one way or the other, and then likely modified. It is also possible to encounter external code which contains unprefixed functions or classes (always put your two or three letter organization prefix in front of any class or function which has visibility outside the current source file, people!), and one of those collides with one of yours, or one in another library with unprefixed symbols. Sometimes a whole module may be included in two different libraries you are using, and you mistakenly built both libraries with this module included, forgetting you were going to use them together. And in some cases the code you just added may internally include its own version of an open-source library like SQLite, which will collide with the SQLite system framework if you are using it.

Depending on the reason and the circumstances, you may have different constraints, so it is important to recognize which situation you are in in order to apply the most appropriate solution; you don’t want to prefix 100 functions on a deadline when there is a better way.

Assessing the damage

The first order of business is to figure out how many colliding symbols there are. But how to do it if the linker is going to only report one such symbol before giving up? I don’t have it down to a single script that would do all the steps, but here is the process I followed when I found myself in this situation:

Disclaimer: these instructions and Terminal commands come with no warranty, I cannot be held responsible if you hose your computer following them; caveat emptor.

  1. I arranged to be able to use commands that come as part of the Xcode tools package; I did so with xcode-select -switch /Applications/Xcode.app/Contents/Developer, and by prefixing all developer tool commands with xcrun; I could also have added the tools folder to my path.

  2. In one case I came across, the colliding symbols were already in the same static library before the final link (it was an intermediate static library generated by a subtarget) and I could start with step 3.

  3. I generated a static library with the object files that would be linked, by taking from the linker invocation that failed (in the Xcode build log) the -filelist option with its parameter (a file ending in .LinkFileList), as well as any static library parameter (the files ending in .a), and putting them right after “xcrun libtool -arch_only armv7 -o stuff.a ” (with a space after stuff.a) in a Terminal window, to generate a library named stuff.a

  4. Then I ran this wonderful command (as one line):

    xcrun otool -vS stuff.a | LANG=C sed -n "/^object *symbol name\$/,\$p" | LANG=C sed "1d" | LANG=C sed "s/^.* //g" | LANG=C sort | LANG=C uniq -d > duplist

    (it may be useful to understand what this does: otool -vS reads the table of contents of the library, which lists the symbols and the object file where the symbol can be found; the first two sed commands extract the relevant part from the output; the third removes the file name part of each line; sort, then uniq -d extract the lines which appear more than once)
    At this point, duplist contained the list of duplicate symbols, one per line.

  5. I ran this command next:

    xcrun otool -vS stuff.a | LANG=C grep -f duplist > dupreport

    At this point, dupreport listed the object files containing one of the duplicate symbols, as well as which of the duplicate symbols they contain.

Now I had a count of the duplicate symbols (there were only a handful; had that not been the case, I would have used wc -l duplist to count them), the list of all such duplicate symbols, and which object files they occur in.

Note that this won’t list symbols that collide with those in frameworks; you will want to handle that case specifically, anyway.
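For what it’s worth, steps 4 and 5 could probably be folded into a single pass. What follows is only a minimal sketch in Python, not the process I followed: it assumes the output of xcrun otool -vS stuff.a has been saved as text, and that every line after the “object symbol name” header ends with the symbol name, preceded by the name of the object file (the same assumption the sed commands make). The function name and the sample text are invented for illustration.

```python
from collections import defaultdict


def find_duplicate_symbols(toc_text):
    """Map each symbol defined more than once to the object files defining it."""
    definitions = defaultdict(list)
    in_table = False
    for line in toc_text.splitlines():
        fields = line.split()
        if fields == ["object", "symbol", "name"]:
            # Header row: the table of contents proper starts on the next line.
            in_table = True
            continue
        if not in_table or len(fields) < 2:
            continue
        # Like the third sed command, take the last field as the symbol name;
        # whatever precedes it names the object file.
        definitions[fields[-1]].append(" ".join(fields[:-1]))
    return {sym: objs for sym, objs in definitions.items() if len(objs) > 1}


# Invented sample standing in for real otool output.
SAMPLE_TOC = """\
(preamble lines printed before the table)
object           symbol name
OneFile.o        _DoStuff
OneFile.o        _PREPackBits
CompletelyDifferentFile.o _DoStuff
Other.o          _Unique
"""

duplicates = find_duplicate_symbols(SAMPLE_TOC)
for symbol, objects in sorted(duplicates.items()):
    print(symbol, "is defined in", ", ".join(objects))
```

Unlike grep -f, which treats each line of duplist as a pattern, this compares whole symbol names, so a duplicate _DoStuff would not also drag in an unrelated _DoStuffFaster.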

Fixing the damage

We are getting there: now that we know the situation, we can fix it in one fell swoop. But we must avoid making non-trivial changes to the code in doing so, or we risk introducing bugs, when we were merely supposed to integrate already working code together.

If there are only a handful of duplicate symbols, simply prefix the functions and all the places they are called with a different prefix depending on the side; e.g. instead of PREPackBits, you would have PREReaderPackBits and PREWriterPackBits. Unless you only have the code for one side, it’s preferable to prefix both sides (that way any reference that you forgot to prefix will cause an error when linking, rather than silently resolving to the wrong version). Note that even if you control both sides, it is not a good idea to try and merge the function implementations so that both sides call a single function which would satisfy them both: even if the function was copy-pasted from one side to another, it was likely modified, and at this point you are trying to integrate two bodies of code that work well separately; it is not a good time to make semantic changes to the code. If the code duplication bothers you, it will always be possible to refactor later. If you have the code for neither side, then you are in trouble, though you may be able to apply the partial link technique described later.

If the duplicate symbols are all the functions in a submodule used by both sides, then you made a mistake when building the libraries and included that submodule in both libraries; since on both sides the same source files with the same compile options are presumably used, then you should build and include that submodule in one library only, and leave it out of the second one: code in the second library which relies on the submodule API will simply use the one from the first library.

If the symbol collision involves a system framework and appears to be a limited coincidence (e.g. an unprefixed function name which is also used in a system framework), then just prefix the function on your side with your organization prefix. If, however, the collision is not a coincidence, for instance because you embed in your app an open-source project which is also available as a system framework, such as SQLite, then it is likely not practical to prefix all the functions in your internal copy of the open-source project, and neither is it practical to use the system framework and remove your internal version of the open-source code (which may be a different version, have some customizations, and what not). What I did in such a case, at least as a first step, was to do a partial link on the component of my app which made use of SQLite. A partial link forces references to be resolved, but produces an object file that can then be subject to further linking; this is done with ld -r -ObjC, and the linker invocation has to be given (with the -exported_symbols_list option) a file containing the names of the functions one would like the generated object to export, that is, the API of the component. That way, references to SQLite functions coming from inside the component get resolved correctly, and from then on these functions are no longer visible outside the component, such that the final link can proceed without problem.

This is a bit of an extreme solution, as it forces you to handle that component specially (there is no support for this in Xcode), and you will have to redo this handling every time you need to rebuild the component, but it got me out of a bind without having to make changes to my code. In the long run however, you should either prefix or get rid of that internal version of the open source project.
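To make the partial link a little more concrete, here is a rough sketch in Python that prepares the two things it needs: the text of the exported symbols file and the argument list for the linker. It only builds them and runs nothing. The component API names and file names are hypothetical, the -o option is the usual way to name the output, and I am assuming the exported functions are plain C functions, whose symbols carry a leading underscore as in the _DoStuff error above (Objective-C classes would need their own symbol names in the list).

```python
def exported_symbols_file(api_functions):
    """Text of the exports file: one symbol per line, with the leading
    underscore that C function symbols carry at link time."""
    return "".join("_%s\n" % name for name in api_functions)


def partial_link_command(object_files, exports_path, output_path):
    """Argument list for the partial link; this builds it but does not run it."""
    return ["xcrun", "ld", "-r", "-ObjC",
            "-exported_symbols_list", exports_path,
            "-o", output_path] + list(object_files)


# Hypothetical component API and file names, for illustration only.
exports_text = exported_symbols_file(["PREStoreOpen", "PREStoreClose"])
command = partial_link_command(["Store.o", "sqlite3.o"],
                               "StoreExports.txt", "StoreComponent.o")

print(exports_text, end="")
print(" ".join(command))
```

The resulting object file (StoreComponent.o here) would then take the place of the component’s individual object files in the final link.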

Finishing your work

Now complete the integration of the body of code you were intending to add (fixing runtime issues, etc.). Once your app is running satisfactorily again, it is probably time to apply a definitive solution to the original duplicate symbol issue: for instance, you may find it unsustainable to have parts of your code use a modified but mostly duplicate function and would much prefer there to be a single, unified copy. Now is a good time to do it, or maybe later; what matters is that this refactoring not be done before you get the app running again, as then you would have no idea whether bugs were due to the newly integrated code, or due to the changes you made in order to integrate it. To sum it up: first get the newly integrated code running while making as few changes as possible, and only then refactor the code to your liking.

No, Apple does not pay developers

Enough is enough. For a few years already Apple has been boasting at WWDC and various other events (earnings calls, etc.) about paying however many billion dollars to developers; more recently, they have put up this page where they proudly proclaim their contribution to the (U.S.) economy with, among others, the “App Economy” (they insist there: “And Apple has paid more than $4 billion in royalties to developers through the App Store”). If Apple has been criticized for this claim, I have not come across such criticism, which is a shame because Apple is not paying iOS or Mac application developers in any reasonable sense of the word “pay”. This is money that the developers have earned from users, and Apple is but an intermediary in this transaction.

All definitions and usages of the word “pay” (at least when one entity is subject, as opposed to say, an investment) that I could find imply either an opposite movement of goods or services, or an existing debt the payment exists to settle; as far as I can tell neither of these is the case: is Apple borrowing money from developers, then paying them back? Is Apple placing bulk orders of apps that it then redistributes? Is Apple commissioning the development of apps? No, no, and no, and quite the opposite in fact in the last case, as not only do developers figure out for themselves what to produce, but Apple has been known to reject the outcome, after it was done, in ways that have sometimes lacked transparency. In the context Apple uses it, “pay” implies there is a supplier/buyer business relationship, but Apple has none of the characteristics of a buyer: it is not the one that needs to be courted, it is not the one making the buying decision, it is not the one deciding how much to buy, it is not the one providing the money. The definition with which I could most charitably interpret their usage of “pay” would be “hand over or transfer the amount due of (a debt, wages, etc.) to someone” (New Oxford American Dictionary), in which case the amount due would be the remainder after commission that Apple owes developers, and transfers at the end of the month, which is not something particularly worthy to boast about.

Apple cannot even technically be said to be paying developers. The business and legal relationship iOS and Mac App Store developers have with Apple is, to sum it up, that Apple represents the developers in front of the customers: it acts as the developer’s agent (hence the agency model) and among other things collects customer payments on behalf of the developer and then forwards the payment to the developer (minus the commission Apple takes). The end user is the customer in this transaction, and she pays the seller, that is the developer; Apple does not. Apple merely transfers money that the developer earned from customers, nothing more. So it makes sense to say Apple transfers money to developers, amounting to more than $4 billion; it doesn’t make sense to say Apple “pays” developers, even less so to speak of “royalties”.

Even worse yet, what kind of organization would boast about creating an “industry” of 210,000 unstable, underpaid jobs? Indeed, these jobs have no long term business visibility as any one of them could be the subject of a random Apple edict in the next few years; and clearly not everyone is earning a wage from the iOS App Store, as even if you assume all $4 billion have gone to the 210,000 U.S. developers, this works out at about $20,000 on average per such “job” since the introduction of the iPhone in 2007…
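For those who want to check the division, here it is spelled out as a small sketch in Python. The two totals are Apple’s own figures as quoted above; the five-year span used for the per-year figure is purely my assumption for illustration, so substitute whatever period you consider fair.

```python
TOTAL_TRANSFERRED = 4_000_000_000  # dollars, Apple's own figure
JOBS = 210_000                     # U.S. "App Economy" jobs, also Apple's figure

# Generous assumption: every dollar went to one of these U.S. jobs.
average_per_job = TOTAL_TRANSFERRED / JOBS


def average_per_job_per_year(years):
    """Spread the per-job average over a number of years (an assumed span)."""
    return average_per_job / years


print(round(average_per_job))              # total per "job" since the iPhone launch
print(round(average_per_job_per_year(5)))  # per year, if spread over five years
```

That comes to a little over $19,000 per “job” in total, hence the “about $20,000” above.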

To me, this is symptomatic of a level of contempt towards third party developers not seen since the contempt of Nintendo towards third party developers in the heyday of the NES. This may or may not be anything new, but Apple should take care. iOS developers are a proud, independent bunch, and it is hard enough by itself to create apps that users will buy; telling them that it is Apple that pays them is not a good tactic. One day or another, Apple will meet its PlayStation, and then everything will hinge on the attitude of developers.

Don’t get me wrong, Apple has done a lot of good things with iOS and enabled a lot of possibilities, and I am thankful for that. It’s just that “paying developers” is not something Apple does; this is money from end users that developers have earned.

New favicon (and updates)

Photo of an old Mac (screen, main unit, keyboard and mouse) displaying a picture editor program with the just completed site favicon (itself a monochrome pixel stick figure carrying a bundle)

my Mac IIsi, running MacPaint 2.1 F

Short of a Mac 128k running the original MacPaint, this is possibly the most authentic way to draw monochrome pixel art.

In other news, I have made a number of updates to previous posts; most of those were to fix spelling mistakes, but I should mention a few major corrections; for one, I realized that Skype very much is household-name commercial software distributed mostly digitally, so I amended First Impressions of the Mac App Store in consequence; another thing I realized is that I discovered the Iconfactory site in 1999, and not 1996; In support of the Lodsys patent lawsuit defendants has been fixed as such. In Developer ID might not seem restrictive, but it is, I implicitly assumed that if Apple was going to require code to be signed with a certificate it provides, developers were necessarily going to have to pay for that certificate, but I realized Apple was providing certificates for signing Safari extensions for free, so I adjusted my oath to specify that point. Lastly, the ARM architecture documentation PDF keeps moving in Xcode, so I again had to update Introduction to NEON on iPhone with up-to-date instructions for locating it…

Beware of ARMv6-only iOS libraries

“What the f…¹ is going on?”

There I was, one day at work, trying to figure out why this iOS project wouldn’t link, giving the error “Undefined symbols for architecture armv7:” with the missing symbols being the entry points of a third-party library that was just added. Which made no sense at all, as the library did define these symbols (I checked using otool -vS), and the library was properly pulled in by a subproject, which was itself properly pulled in by the app (I checked with the log of commands executed by Xcode). I had already tried cleaning all projects three times, nuking all the caches I could think of, and removing and reinserting the external library, to no avail. I was at a loss for ideas.

This is an actual issue that I want to warn you about because it could affect any iOS developer, so I’ll cut to the chase: the issue was that the third-party library was only built for ARMv6 (or only built for the ARM_ALL subtype, the same issue occurs in the end) while I was trying to build an app with an ARMv7 slice.

This can happen with libraries that haven’t been updated recently, or that work fine in ARMv6 for most people (I’ll explain why in a minute) so are not built for both ARMv6 and ARMv7. If this happens to you (you can check with otool -vhf <library> | less; notice that in this context, “ALL” does NOT mean “both ARMv6 and ARMv7”), demand from the supplier of the library that you be provided a library that has both ARMv6 and ARMv7 slices; and if you have no choice but to take matters into your own hands (as I had to do), it’s possible to build a simple tool (everything you need is in mach-o/loader.h and mach/machine.h) that will manipulate the ARMv6 object files so that they become ARMv7 (the code in fact remains unchanged) so that you can craft a suitable library with both ARMv6 and ARMv7 slices.
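To give an idea of what such a tool involves, here is a minimal sketch in Python rather than the tool I actually wrote. It handles a single thin, little-endian 32-bit object file held in memory and only rewrites the cpusubtype field of its header; extracting the object files from the library and putting them back together afterwards is left out. The constants are the ones defined in mach-o/loader.h and mach/machine.h, while the function name and the fake header used to exercise it are made up for illustration.

```python
import struct

MH_MAGIC = 0xFEEDFACE     # 32-bit Mach-O magic number (mach-o/loader.h)
CPU_TYPE_ARM = 12         # the values below are from mach/machine.h
CPU_SUBTYPE_ARM_ALL = 0
CPU_SUBTYPE_ARM_V6 = 6
CPU_SUBTYPE_ARM_V7 = 9


def retag_as_armv7(object_bytes):
    """Return a copy of a thin 32-bit ARM object file with its header marked ARMv7."""
    # struct mach_header starts with magic, cputype, cpusubtype (4 bytes each).
    magic, cputype, cpusubtype = struct.unpack_from("<Iii", object_bytes, 0)
    if magic != MH_MAGIC or cputype != CPU_TYPE_ARM:
        raise ValueError("not a little-endian 32-bit ARM Mach-O file")
    if cpusubtype not in (CPU_SUBTYPE_ARM_ALL, CPU_SUBTYPE_ARM_V6):
        raise ValueError("unexpected CPU subtype %d" % cpusubtype)
    patched = bytearray(object_bytes)
    # cpusubtype sits at byte offset 8; nothing else in the file is touched.
    struct.pack_into("<i", patched, 8, CPU_SUBTYPE_ARM_V7)
    return bytes(patched)


# Fake 28-byte header (filetype 1 is MH_OBJECT, no load commands), not a real object file.
fake_header = struct.pack("<IiiIIII", MH_MAGIC, CPU_TYPE_ARM,
                          CPU_SUBTYPE_ARM_V6, 1, 0, 0, 0)
patched_header = retag_as_armv7(fake_header)
print(struct.unpack_from("<Iii", patched_header, 0))
```

Refusing anything that is not already ARM_ALL or ARMv6 is deliberate: the point is to relabel code known to run on ARMv7, not to blindly stamp arbitrary files.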

What’s interesting is that the issue only becomes apparent in a specific configuration: if the external library is referenced not by the app target, but by a library target which is itself used by the app. iOS app projects often have only one target, as iOS apps are generally simpler than Mac apps, but it’s not outlandish for more complex iOS projects to have multiple targets which depend on each other, in which case you could encounter the issue.

What is happening here is that ARMv6 and ARMv7 are treated separately enough by the iOS toolchain that they are in practice treated as different architectures altogether, a bit like x86 and ppc are when building Universal Binaries (and this is no fluke, there are plenty of ARM-specific patches in the source to enforce this behavior). What this means among other things is that libtool (the tool which builds libraries) will refuse to mix together ARMv6 and ARMv7 object files; instead, it will combine them separately, then put the results side by side in a fat library.

So in our case what happens is that the object files from the third-party library, being ARMv6, are put together with the ARMv6 object files (of which there may be none, if you’re building ARMv7-only) in the ARMv6 slice of the library target, while the ARMv7 object files are put together in the ARMv7 slice, which therefore does not include anything from the third-party library. Then when the app target is built, more precisely when its ARMv7 slice is built, the linker will find the ARMv7 slice of the library from the target, and will not look in the ARMv6 slice (as it is expected to define the same symbols, so the linker would find duplicate definitions if it tried); so the third-party library object files are never found, referenced symbols are undefined, and you wonder what is going on.

This does not happen if the third-party library is referenced directly by the app target, as then the linker will find this library with only an ARMv6 slice, in which case there is an exception to the segregation rules: since there is no ARMv7 version, these ARMv6 object files will be searched for symbols and linked in to satisfy the dependencies instead, even though the linker is building for ARMv7; the linker is the only one allowed to mix together ARMv6 and ARMv7 object files. So the ARMv6-only library is working in this case, which may lead the supplier of the library to believe the library does not need to be updated to have an ARMv7 slice, while in fact it does need to, otherwise it will cause problems for people who have more complex project structures.


  1. Do not think I am too prudish to spell out the f-word here; I am just saving f-bombs on this blog for when they are really worth it.

A few things I would have liked to read about in John Siracusa’s Lion review

Yesterday we saw a few things that John Siracusa didn’t mention in his Snow Leopard review but that I think could have been talked about in there. Today we will do the same with his Lion review.

The same disclaimer applies: John can’t know everything or mention everything, so do not construe anything I say here as being any sort of criticism of his work, much to the contrary in fact.

First, since Lion requires a 64-bit Mac, I was wondering whether built-in executables actually were 64-bit only, as it would cut short any attempt to run this release of Mac OS X on anything older than its baseline requirements (some earlier releases of Mac OS X could be massaged to do so, to an extent). As it turns out, most executables do have a 32-bit slice, with the exception however of the Finder, which therefore prevents Lion from meaningfully running on a 32-bit machine.
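Tools such as lipo -info will tell you which slices an executable contains, but the check is simple enough to sketch by hand. The following Python sketch is an illustration, not what I ran: it looks at the first bytes of a file and reports whether a 32-bit slice is present, using the magic numbers from mach-o/fat.h and mach-o/loader.h. It assumes thin files are little-endian (true of Intel Macs), and the function name and the fake headers at the end are made up for the example.

```python
import struct

FAT_MAGIC = 0xCAFEBABE       # mach-o/fat.h; the fat header is stored big-endian
MH_MAGIC = 0xFEEDFACE        # mach-o/loader.h; thin 32-bit file
MH_MAGIC_64 = 0xFEEDFACF     # mach-o/loader.h; thin 64-bit file
CPU_ARCH_ABI64 = 0x01000000  # mach/machine.h; set in cputype for 64-bit architectures
CPU_TYPE_X86 = 7


def has_32_bit_slice(data):
    """Tell whether the start of a Mach-O file describes at least one 32-bit slice."""
    (magic,) = struct.unpack_from(">I", data, 0)
    if magic == FAT_MAGIC:
        (count,) = struct.unpack_from(">I", data, 4)
        for index in range(count):
            # Each fat_arch entry is five 32-bit fields; cputype comes first.
            (cputype,) = struct.unpack_from(">i", data, 8 + 20 * index)
            if not cputype & CPU_ARCH_ABI64:
                return True
        return False
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic == MH_MAGIC:
        return True
    if magic == MH_MAGIC_64:
        return False
    raise ValueError("not a Mach-O file this sketch understands")


# Fake headers for illustration: a fat file with x86_64 and i386 slices,
# and a thin 64-bit-only file.
fake_fat = (struct.pack(">II", FAT_MAGIC, 2)
            + struct.pack(">iiIII", CPU_TYPE_X86 | CPU_ARCH_ABI64, 3, 4096, 100, 12)
            + struct.pack(">iiIII", CPU_TYPE_X86, 3, 8192, 100, 12))
fake_thin_64 = struct.pack("<I", MH_MAGIC_64)

print(has_32_bit_slice(fake_fat))
print(has_32_bit_slice(fake_thin_64))
```

To survey real executables you would read the first few hundred bytes of each file and pass them to the function.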

Then, the specific system requirements for AirDrop made me wonder what was the deal here, and whether there was any relationship with Wi-Fi Direct. Unfortunately, I do not know much here (besides that there is indeed a relation), so John covering this would have been all the more welcome. Ah well.

The addition of the AVFoundation APIs (originally released on iOS) to Mac OS X raises an important question that I haven’t seen addressed: what about QTKit? Does it turn out not to be the way forward (even though QTKit is pretty recent), or are they meant to integrate, or is QTKit more for a certain class of needs and AVFoundation for others, or something else altogether? I have no idea, and that’s one of the first things (admittedly, as an iOS multimedia app developer, I am a bit biased) I wanted to know. Maybe in the next review John will make a return to covering video technologies at length…

Lastly, I’ve been very intrigued by the rise of SSDs (though I don’t use one myself) and the impact on how storage devices are abstracted, and so a word about Lion support for TRIM would have been nice. Apparently Apple is quietly and selectively enabling it for some drives, but this comes from sources which aren’t very authoritative, while on the other hand with John this would have been coming, not from the horse’s mouth, but pretty much the next best thing.

Make your site shine in lynx (for some value of “shine”)

Yesterday I presented my findings about the Wii browser. Today we shall do the same with Lynx.

Lynx is a text-mode browser which renders on terminals. While it will work on a VT-100, it will greatly benefit from a color terminal. I have found Lynx to be useful when browsing the web from low-bandwidth and/or high-latency settings. For instance, from my student room on a few occasions getting at the wider web would fail, but I could connect to school machines just fine, so I logged in by ssh to my school account. Unfortunately, while I could tunnel an X window connection, a graphical browser used in such a way was pretty much unusable, but Lynx worked very well.

At this point I need to note that I am not doing web design by any stretch, rather I try to make sure my writing can be read without hassle on such a target.

So first, Lynx has a number of limitations coming from the display device. For one, there is only one font, determined by the terminal or terminal emulator, and it’s not possible for Lynx to change it. Furthermore, font rendering is pretty primitive: don’t expect proportional text, varying font sizes or (God forbid) kerning or ligatures. There are no italics either, though Lynx will try and make sure italics and bold are highlighted in some way through the use of color; the same goes for <code>. In the same way, in order to emphasize <h#>, Lynx renders them flush to the left, while paragraphs are indented.

Lynx is also Unicode-aware and will understand and convert between the different encodings and render to UTF-8 if the terminal supports it (which is common for terminal emulators nowadays). So don’t get the impression that text-mode and terminal mean straight quotes and simple hyphens. All in all, despite the limitations Lynx will render your text and at least the meaning of your text formatting as best as it can.

Besides your text itself, however, it is important to remember that Lynx is a text-mode browser. Besides no images (so don’t forget the alt text!), this means it will not attempt to interpret your site layout and try and render it with box drawing characters or anything of the sort; instead, your text will be rendered in the order it appears in the HTML source, so if you have a side column (like navigation) whose content is after the main writing in the HTML source, it will appear after your writing in Lynx (which may be what you want). This also means that you should avoid having too much navigation boilerplate at the top: what appears as an unobtrusive 30 pixel high row of buttons/menus in a graphical browser will show as a laundry list of links, with menus unrolled, that the user has to scroll through at the start of every single page.
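Since a missing alt text is easy to overlook when you only ever look at your pages in a graphical browser, here is a minimal sketch, in Python with the standard html.parser module, of a check that lists the images in a page that have no alt attribute at all. The class name and the sample markup are made up for illustration; an empty alt="" is let through on purpose, on the assumption that it marks an image as purely decorative.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that carries no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.missing.append(attributes.get("src", "(no src)"))


# Invented sample page fragment.
SAMPLE_HTML = """
<p>Main writing comes first in the source.
<img src="favicon-large.png" alt="Pixel stick figure carrying a bundle">
<img src="screenshot.png"></p>
<ul><li><a href="/archive">Archive</a></li></ul>
"""

finder = MissingAltFinder()
finder.feed(SAMPLE_HTML)
print(finder.missing)
```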

Also, Lynx will render your content as white text on black, and will not try to apply your color scheme. Lynx seems to ignore any styling information, which means you need true separation of content and presentation; for instance the boxes I use around additions I make later on to a post do not render at all and this information is lost (it’s not too bad in this case), so try and avoid such constructs. Lynx does not support JavaScript either.

The user interacts entirely with the keyboard; it’s pretty spartan, but efficient: base commands are space/b to scroll down/up a screenful of text from a page, up/down arrows to navigate links, right arrow to follow a link, left arrow for back (a development tip: you do ctrl-R to reload and refresh the page). Some site navigation options, especially if they are after the content, may not be as discoverable for the user as on a graphical browser. On the other hand, users will gladly search, so make sure you have a search box at the top, clearly marked (with text!); they will take it from there.

In the end, don’t neglect Lynx: you might not think of it as useful, until the day you need to check the content a site serves when seen from a machine on which you happen to have an ssh account…