Dealing with multiply defined symbols

ld: duplicate symbol _DoStuff in OneFile.o and CompletelyDifferentFile.o for architecture armv7

☠!#✺⁂☆⚡❕☭✹✊✨#❗

Like me, this is probably your spontaneous reaction upon getting this delightful error message when trying to build your app. It signals that, at the very least, the code you just integrated will need more work before your app can use it. But it could be even worse than you think, as the Mac OS X linker (as of Xcode 4.3.2) will only report duplicate symbols one. at. a. time. So there could in fact be 50 colliding symbols, and the linker will tell you about the next one only once you’ve fixed the previous one, costing you 50 prefix-compile-link cycles! Today I will show you how to efficiently assess the damage up front, then show you different methods to fix the issue, depending on the situation.

The Problem

The issue here, at its most fundamental level, is that there exist two Objective-C classes with the same name, two C functions with the same name, or two C++ functions with the same name and signature (or possibly two global variables with the same name; it can happen), so the linker cannot resolve references to this symbol, as it does not know which definition to pick. One could think the linker could pick either definition, but this would be incredibly dangerous: some references likely expect the other definition, and so would end up calling completely (or subtly) different code than they expected, and you would end up with a mysterious, impossible-to-debug issue at runtime (and that’s if you’re lucky).

This kind of problem typically occurs at a time in your project which may not be the best: when integrating a separate body of code (for instance, a third-party library). It may occur for various reasons. The most common one is that the code you just integrated contains utility functions it uses, but your code already contains utility functions of the same name, because these utility functions were copy-pasted one way or the other and then likely modified. It is also possible to encounter external code which contains unprefixed functions or classes (always put your two- or three-letter organization prefix in front of any class or function which is visible outside the current source file, people!), one of which collides with one of yours, or with one in another library with unprefixed symbols. Sometimes a whole module may be included in two different libraries you are using, and you mistakenly built both libraries with this module included, forgetting you were going to use them together. And in some cases the code you just added may internally include its own version of an open-source library like SQLite, which will collide with the SQLite system framework if you are using it.

Depending on the reason and the circumstances, you may have different constraints, so it is important to recognize which situation you are in so that you can apply the most appropriate solution; you don’t want to prefix 100 functions on a deadline when there is a better way.

Assessing the damage

The first order of business is to figure out how many colliding symbols there are. But how to do it if the linker is going to only report one such symbol before giving up? I don’t have it down to a single script that would do all the steps, but here is the process I followed when I found myself in this situation:

Disclaimer: these instructions and Terminal commands come with no warranty, I cannot be held responsible if you hose your computer following them; caveat emptor.

  1. I arranged to be able to use the commands that come with the Xcode tools package; I did so with xcode-select -switch /Applications/Xcode.app/Contents/Developer, and by prefixing all developer tool commands with xcrun; I could also have added the tools folder to my path.

  2. In one case I came across, the colliding symbols were already in the same static library before the final link (it was an intermediate static library generated by a subtarget) and I could start with step 3.

  3. I generated a static library with the object files that would be linked, by taking from the linker invocation that failed (in the Xcode build log) the -filelist option with its parameter (a file ending in .LinkFileList), as well as any static library parameter (the files ending in .a), and putting them right after “xcrun libtool -arch_only armv7 -o stuff.a ” (with a space after stuff.a) in a Terminal window, to generate a library named stuff.a

  4. Then I ran this wonderful command (as one line):

    xcrun otool -vS stuff.a | LANG=C sed -n "/^object *symbol name\$/,\$p" | LANG=C sed "1d" | LANG=C sed "s/^.* //g" | LANG=C sort | LANG=C uniq -d > duplist

    (it may be useful to understand what this does: otool -vS reads the table of contents of the library, which lists the symbols and the object file where the symbol can be found; the first two sed commands extract the relevant part from the output; the third removes the file name part of each line; sort, then uniq -d extract the lines which appear more than once)
    At this point, duplist contained the list of duplicate symbols, one per line.

  5. I ran this command next:

    xcrun otool -vS stuff.a | LANG=C grep -f duplist > dupreport

    At this point, dupreport listed the object files containing one of the duplicate symbols, as well as which of the duplicate symbols they contain.
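
To see what each stage of the step 4 pipeline does without needing a real library, here is a small sketch that runs the same stages on hand-written text. The sample lines are an assumption: they only imitate the shape of the table of contents that otool -vS prints (a header line ending in "symbol name", then one object/symbol pair per line), and the function names sample_toc and list_duplicates are made up for this illustration.

```shell
# Sample text shaped like the table of contents printed by `otool -vS`
# (an assumption for illustration; real output has more header lines).
sample_toc() {
  printf '%s\n' \
    'Table of contents from: stuff.a(__.SYMDEF SORTED)' \
    'object           symbol name' \
    'OneFile.o        _DoStuff' \
    'OneFile.o        _PREPackBits' \
    'Different.o      _DoStuff' \
    'Reader.o         _PREUnpackBits' \
    'Writer.o         _PREPackBits'
}

# Same stages as the step 4 command, reading the table of contents on stdin.
list_duplicates() {
  LANG=C sed -n '/^object *symbol name$/,$p' |  # keep from the header line onward
    LANG=C sed '1d' |                            # drop the header line itself
    LANG=C sed 's/^.* //' |                      # keep only the symbol name
    LANG=C sort |
    LANG=C uniq -d                               # print each name seen more than once
}

sample_toc | list_duplicates
```

On this sample, the function prints _DoStuff and _PREPackBits, the two names that appear under more than one object file.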

Now I had a count of the duplicate symbols (there were only a handful; had that not been the case, I would have used wc -l duplist to count them), the list of all such duplicate symbols, and the object files they occur in.
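
One caveat about step 5: grep -f treats each line of duplist as a pattern that may match anywhere in a line, so a duplicate named _DoStuff would also pull in a line for a longer symbol that merely starts with it. Below is a sketch of an exact-match variant using awk, which compares the last field of each line against the list. The function name report_duplicates and the sample data are made up for illustration, and it assumes the symbol is the last whitespace-separated field of each line.

```shell
# report_duplicates DUPLIST
# Reads table-of-contents lines on stdin and prints those whose last field
# is exactly one of the names in DUPLIST (one name per line).
report_duplicates() {
  awk 'NR == FNR { dup[$0] = 1; next } ($NF in dup)' "$1" -
}

# Sample data (made up for illustration).
dups=$(mktemp)
printf '%s\n' '_DoStuff' > "$dups"

printf '%s\n' \
  'OneFile.o        _DoStuff' \
  'OneFile.o        _DoStuffLater' \
  'Different.o      _DoStuff' |
  report_duplicates "$dups"
```

With the real files, this would be used as xcrun otool -vS stuff.a | report_duplicates duplist > dupreport.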

Note that this won’t list symbols that collide with a framework; you will want to handle that case specifically, anyway.

Fixing the damage

We are getting there: now that we know the situation, we can fix it in one fell swoop. But we must avoid making non-trivial changes to the code in doing so, or we risk introducing bugs when we were merely supposed to integrate already working code.

If there are only a handful of duplicate symbols, simply rename the functions, and update all the places they are called, with a different prefix depending on the side; e.g. instead of PREPackBits, you would have PREReaderPackBits and PREWriterPackBits. Unless you only have the code for one side, it’s preferable to prefix both sides (that way any reference that you forgot to prefix will cause an error when linking, rather than silently resolving to the wrong version). Note that even if you control both sides, it is not a good idea to try and merge the function implementations so that both sides call a single function which would satisfy them both: even if the function was copy-pasted from one side to the other, it was likely modified, and when you are trying to integrate two bodies of code that work well separately, it is not a good time to make semantic changes to the code. If the code duplication bothers you, it will always be possible to refactor later. If you have the code for neither side, then you are in trouble, though you may be able to apply the partial link technique described later.
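
When prefixing both sides, it can save a link cycle to search the sources for references that still use the old name before building. Here is a minimal sketch using grep with whole-word matching; the function name find_unprefixed, the demo file names and their contents are all made up for illustration.

```shell
# find_unprefixed NAME DIR
# Prints file:line:text for every line under DIR that still mentions NAME
# as a whole word (a longer identifier that merely contains NAME is skipped).
find_unprefixed() {
  grep -rnw -- "$1" "$2"
}

# Throwaway demo tree (file names and contents are made up).
src=$(mktemp -d)
printf '%s\n' 'n = PREReaderPackBits(buf);' > "$src/Reader.c"
printf '%s\n' 'n = PREPackBits(buf);' > "$src/Writer.c"

find_unprefixed PREPackBits "$src"
```

Here only the line in Writer.c is reported, since Reader.c already uses the prefixed name.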

If the duplicate symbols are all the functions in a submodule used by both sides, then you made a mistake when building the libraries and included that submodule in both libraries; since on both sides the same source files with the same compile options are presumably used, then you should build and include that submodule in one library only, and leave it out of the second one: code in the second library which relies on the submodule API will simply use the one from the first library.

If the symbol collision involves a system framework and appears to be a limited coincidence (e.g. an unprefixed function name which is also used in a system framework), then just prefix the function on your side with your organization prefix. If, however, the collision is not a coincidence, for instance because you embed in your app an open-source project which is also available as a system framework, such as SQLite, then it is likely not practical to prefix all the functions in your internal copy of the open-source project, nor is it practical to use the system framework and remove your internal version of the open-source code (which may be a different version, have some customizations, and whatnot). What I did in such a case, at least as a first step, was to do a partial link on the component of my app which made use of SQLite. A partial link forces references to be resolved, but produces an object file that can then be subject to further linking. This is done with ld -r -ObjC, and the linker invocation has to be given (with the -exported_symbols_list option) a file containing the names of the functions you would like the generated object to export; this would be the API of the component. That way, references to SQLite functions coming from inside the component get resolved correctly, and from then on these functions are no longer visible outside the component, so the final link can proceed without problem.

This is a bit of an extreme solution, as it forces you to handle that component specially (there is no support for this in Xcode), and you will have to redo this handling every time you need to rebuild the component, but it got me out of a bind without having to make changes to my code. In the long run, however, you should either prefix or get rid of that internal version of the open-source project.
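
As a rough sketch of the partial link described above, the export list can be generated from the component’s API function names, and the linker invocation then looks something like the command printed at the end. The -r, -ObjC and -exported_symbols_list options are the ones mentioned above; the -arch and -o options, the object file names, the function names and the write_exports helper are assumptions for illustration. The sketch also assumes plain C functions, whose linker-level names carry a leading underscore (as in _DoStuff in the error message).

```shell
# write_exports FILE name...
# Writes one linker-level symbol per line: each C function name gets the
# leading underscore the linker sees.
write_exports() {
  out=$1
  shift
  printf '_%s\n' "$@" > "$out"
}

# Hypothetical component API (made-up names).
exports=$(mktemp)
write_exports "$exports" PREComponentOpen PREComponentClose
cat "$exports"

# The partial link itself, printed rather than run here: it needs the
# Mac OS X linker and real object files (these file names are placeholders).
echo "xcrun ld -r -ObjC -arch armv7 -exported_symbols_list $exports -o Component.o Parser.o Storage.o sqlite3.o"
```

The resulting Component.o would then take the place of the component’s individual object files in the final link.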

Finishing your work

Now complete the integration of the body of code you were intending to add (fixing runtime issues, etc.). Once your app is running satisfactorily again, it is probably time to apply a definitive solution to the original duplicate symbol issue: for instance, you may find it unsustainable to have parts of your code use a modified but mostly duplicate function, and would much prefer there to be a single, unified copy. Now is a good time to do it, or maybe later; what matters is that this refactoring not be done before you get the app running again, as then you would have no idea whether bugs were due to the newly integrated code or to the changes you made in order to integrate it. To sum it up: first get the newly integrated code running while making as few changes as possible, and only then refactor the code to your liking.
