Dynamic symbol table duel: ELF vs Mach-O, round 2

The intention of this post is to continue highlighting some of the similarities and differences between ELF and Mach-O that I encountered while building memprof. The previous post in this series can be found here.
What is a symbol table?
A symbol table is simply a list of names in an object. The names in the list may be names of functions, initialized/uninitialized memory regions, or other things depending on the object format. The symbol table does not need to be mapped into a running process and is only useful for debugging. The symbol table (and other sections) may be removed from an object when you use strip.
Symbol tables in ELF objects
An entry in the symbol table in an ELF object can best be described by the following struct from /usr/include/elf.h:
typedef struct
{
  Elf64_Word    st_name;   /* Symbol name (string tbl index) */
  unsigned char st_info;   /* Symbol type and binding */
  unsigned char st_other;  /* Symbol visibility */
  Elf64_Section st_shndx;  /* Section index */
  Elf64_Addr    st_value;  /* Symbol value */
  Elf64_Xword   st_size;   /* Symbol size */
} Elf64_Sym;
In most cases, this structure is used to map a symbol name to the address where it lives, although some symbol types (specified by st_info) map symbols to other data.
The st_name field is an index into a section called .strtab, which is just a table of strings.
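To make those two fields a bit more concrete, here is a minimal sketch in C. It assumes the string table has already been loaded into memory. The struct is redeclared locally under a made-up name (demo_elf64_sym) so the snippet builds without /usr/include/elf.h, and the two macros do the same arithmetic as ELF64_ST_BIND and ELF64_ST_TYPE from that header. The helper name demo_sym_name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of Elf64_Sym (hypothetical name) so the sketch builds
 * without <elf.h>; field order and sizes follow the struct above. */
typedef struct {
    uint32_t      st_name;   /* byte offset of the name in the string table */
    unsigned char st_info;   /* symbol type and binding */
    unsigned char st_other;  /* symbol visibility */
    uint16_t      st_shndx;  /* section index */
    uint64_t      st_value;  /* symbol value (usually an address) */
    uint64_t      st_size;   /* symbol size in bytes */
} demo_elf64_sym;

/* Same arithmetic as ELF64_ST_BIND / ELF64_ST_TYPE in <elf.h>:
 * the binding is the high nibble, the type is the low nibble. */
#define DEMO_ST_BIND(info) ((info) >> 4)
#define DEMO_ST_TYPE(info) ((info) & 0xf)

/* Return the symbol's name, or NULL if st_name points outside the
 * string table. strtab/strtab_size describe the table in memory. */
static const char *demo_sym_name(const demo_elf64_sym *sym,
                                 const char *strtab, size_t strtab_size)
{
    if (sym->st_name >= strtab_size)
        return NULL;
    return strtab + sym->st_name;
}
```

Note that the "index" in st_name is a byte offset into the string table, not an entry number, which is why the lookup is plain pointer arithmetic.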
Symbol tables in Mach-O objects
Let’s take a look at the struct for a symbol table entry in a Mach-O object from /usr/include/mach-o/nlist.h:
struct nlist_64 {
    union {
        uint32_t n_strx;  /* index into the string table */
    } n_un;
    uint8_t  n_type;      /* type flag */
    uint8_t  n_sect;      /* section number or NO_SECT */
    uint16_t n_desc;      /* see <mach-o/stab.h> */
    uint64_t n_value;     /* value of this symbol (or stab offset) */
};
It looks very similar. The immediately noticeable difference from ELF:
- lack of a size field: The size field in ELF objects describes the number of bytes occupied by the symbol. This is actually pretty useful, especially for memprof. The lack of this field in Mach-O was a source of frustration for Jake when he was implementing Mach-O support.
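One common workaround for the missing size field is to guess a symbol's size from the address of its nearest neighbor. The sketch below shows that heuristic in C. It is only an illustration and not necessarily what memprof does. The struct is a local mirror of nlist_64 with the union flattened, and the names demo_nlist64, demo_cmp and demo_guess_sizes are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of struct nlist_64 (hypothetical name, union flattened). */
typedef struct {
    uint32_t n_strx;   /* index into the string table */
    uint8_t  n_type;   /* type flag */
    uint8_t  n_sect;   /* section number or NO_SECT */
    uint16_t n_desc;
    uint64_t n_value;  /* value of this symbol */
} demo_nlist64;

/* Order symbols by section first, then by address. */
static int demo_cmp(const void *a, const void *b)
{
    const demo_nlist64 *x = (const demo_nlist64 *)a;
    const demo_nlist64 *y = (const demo_nlist64 *)b;
    if (x->n_sect != y->n_sect)
        return (x->n_sect > y->n_sect) - (x->n_sect < y->n_sect);
    return (x->n_value > y->n_value) - (x->n_value < y->n_value);
}

/* Sort syms in place, then guess the size of syms[i] as the distance
 * to the next symbol in the same section. A guess of 0 means unknown
 * (last symbol in its section, or two names for the same address).
 * The caller is expected to pass only symbols defined in a section. */
static void demo_guess_sizes(demo_nlist64 *syms, size_t n, uint64_t *sizes)
{
    size_t i;
    qsort(syms, n, sizeof *syms, demo_cmp);
    for (i = 0; i < n; i++) {
        sizes[i] = 0;
        if (i + 1 < n && syms[i + 1].n_sect == syms[i].n_sect)
            sizes[i] = syms[i + 1].n_value - syms[i].n_value;
    }
}
```

The guess is an upper bound at best: padding between symbols is counted as part of the earlier one, and the last symbol in each section needs the section's end address to be bounded at all.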
What is a dynamic symbol table?
Shared objects in both Mach-O and ELF have a symbol table listing only functions that are exported by the object.
This table is used during dynamic linking and is mapped into the process’ address space when the object is loaded, unlike the symbol table which is just used for debugging.
The dynamic symbol table is a subset of the symbol table.
Dynamic symbol table in ELF objects
The dynamic symbol table in ELF objects is stored in a section named .dynsym. The indexes stored in the st_name field (from the structure listed above) are indexes into the string table in a section named .dynstr, a string table specifically for entries in the dynamic symbol table.
If you know the symbol you care about, you can simply calculate a hash of the symbol name to find the symbol table entry for that symbol. Unfortunately, there is not very much documentation about the hash function that is to be used.
Your two options are:
- read the source for binutils, or
- check out a useful post on a mailing list.
The sections storing the hash table data for an object are called .hash and .gnu.hash.
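For reference, here is a minimal sketch in C of the two hash functions and of a lookup through a .hash table. The first function follows the one given in the System V ABI for .hash. The second is the multiply-by-33 hash used by .gnu.hash. The lookup assumes the classic .hash layout (nbucket, nchain, then the bucket and chain arrays) and, to keep the example short, takes an array of st_name offsets instead of full Elf64_Sym entries. All demo_ names are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hash used by the .hash section, as given in the System V ABI. */
static uint32_t demo_sysv_hash(const char *name)
{
    uint32_t h = 0, g;
    while (*name) {
        h = (h << 4) + (unsigned char)*name++;
        g = h & 0xf0000000u;
        if (g)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* Hash used by the .gnu.hash section: h = h * 33 + c, seeded with 5381. */
static uint32_t demo_gnu_hash(const char *name)
{
    uint32_t h = 5381;
    while (*name)
        h = (h << 5) + h + (unsigned char)*name++;
    return h;
}

/* Walk a .hash table. words[0] is nbucket, words[1] is nchain, followed
 * by nbucket bucket entries and nchain chain entries. name_off[i] stands
 * in for the st_name of dynamic symbol i (a simplification). Returns the
 * symbol index, or 0 (STN_UNDEF) if the name is not present. */
static uint32_t demo_hash_lookup(const uint32_t *words,
                                 const uint32_t *name_off,
                                 const char *dynstr, const char *name)
{
    uint32_t nbucket = words[0];
    const uint32_t *bucket = words + 2;
    const uint32_t *chain = bucket + nbucket;
    uint32_t i;

    if (nbucket == 0)
        return 0;
    for (i = bucket[demo_sysv_hash(name) % nbucket]; i != 0; i = chain[i])
        if (strcmp(dynstr + name_off[i], name) == 0)
            return i;
    return 0;
}
```

As a sanity check, "printf" hashes to 0x077905a6 with the first function and 0x156b2bb8 with the second. A real .gnu.hash lookup also involves a Bloom filter and a different chain layout, which this sketch leaves out.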
Dynamic symbol table in Mach-O objects
Finding the dynamic symbol table in a Mach-O object is a bit complicated. The pieces to the puzzle are found across different structures and the documentation on how it all works is sparse.
Mach-O objects have a load command called LC_DYSYMTAB which describes the dynamic symbol table.
I’ve shortened the structure definition, as it is quite large and contains documentation about stuff that is not directly relevant to this post. From /usr/include/mach-o/loader.h:
struct dysymtab_command {
    uint32_t cmd;            /* LC_DYSYMTAB */
    uint32_t cmdsize;        /* sizeof(struct dysymtab_command) */
    /* .... */
    /*
     * The sections that contain "symbol pointers" and "routine stubs" have
     * indexes and (implied counts based on the size of the section and fixed
     * size of the entry) into the "indirect symbol" table for each pointer
     * and stub. For every section of these two types the index into the
     * indirect symbol table is stored in the section header in the field
     * reserved1. An indirect symbol table entry is simply a 32bit index into
     * the symbol table to the symbol that the pointer or stub is referring to.
     * The indirect symbol table is ordered to match the entries in the section.
     */
    uint32_t indirectsymoff; /* file offset to the indirect symbol table */
    uint32_t nindirectsyms;  /* number of indirect symbol table entries */
    /* .... */
};
The LC_DYSYMTAB load command provides the fields indirectsymoff and nindirectsyms, which describe the offset into the file where the indirect symbol table lives and the number of entries in the table, respectively.
The dynamic symbol table in Mach-O is surprisingly simple. Each entry in the table is just a 32bit index into the symbol table. The dynamic symbol table is just a list of indexes and nothing else.
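Reading an entry out of that table can be sketched in a few lines of C. This assumes the whole Mach-O file is already in memory and that the file has the same byte order as the host. The function name demo_indirect_entry is made up for this example. The two DEMO_ constants mirror INDIRECT_SYMBOL_LOCAL and INDIRECT_SYMBOL_ABS from /usr/include/mach-o/loader.h, which mark entries that do not refer to a symbol table index.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors of INDIRECT_SYMBOL_LOCAL / INDIRECT_SYMBOL_ABS from loader.h:
 * entries carrying these bits are not symbol table indexes. */
#define DEMO_INDIRECT_SYMBOL_LOCAL 0x80000000u
#define DEMO_INDIRECT_SYMBOL_ABS   0x40000000u

/* Fetch entry i of the indirect symbol table from a Mach-O file image.
 * indirectsymoff and nindirectsyms come from the LC_DYSYMTAB command.
 * On success returns 0 and stores the 32-bit entry in *out; returns -1
 * if i is past the end of the table or the table runs off the file. */
static int demo_indirect_entry(const uint8_t *file, size_t file_size,
                               uint32_t indirectsymoff,
                               uint32_t nindirectsyms,
                               uint32_t i, uint32_t *out)
{
    uint64_t off = (uint64_t)indirectsymoff + (uint64_t)i * sizeof(uint32_t);

    if (i >= nindirectsyms || off + sizeof(uint32_t) > file_size)
        return -1;
    memcpy(out, file + off, sizeof(uint32_t)); /* avoids unaligned reads */
    return 0;
}

/* True if an entry is a plain index into the symbol table. */
static int demo_is_symtab_index(uint32_t entry)
{
    return (entry & (DEMO_INDIRECT_SYMBOL_LOCAL |
                     DEMO_INDIRECT_SYMBOL_ABS)) == 0;
}
```

An entry that passes demo_is_symtab_index can then be used to pick an nlist_64 out of the regular symbol table.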
It turns out there are a few more pieces to the puzzle.
Take a look at the definition for a Mach-O section:
struct section_64 { /* for 64-bit architectures */
    char     sectname[16]; /* name of this section */
    char     segname[16];  /* segment this section goes in */
    uint64_t addr;         /* memory address of this section */
    uint64_t size;         /* size in bytes of this section */
    uint32_t offset;       /* file offset of this section */
    uint32_t align;        /* section alignment (power of 2) */
    uint32_t reloff;       /* file offset of relocation entries */
    uint32_t nreloc;       /* number of relocation entries */
    uint32_t flags;        /* flags (section type and attributes)*/
    uint32_t reserved1;    /* reserved (for offset or index) */
    uint32_t reserved2;    /* reserved (for count or sizeof) */
    uint32_t reserved3;    /* reserved */
};
It turns out that the fields reserved1 and reserved2 are useful too.
If a section_64 structure describes a symbol_stub or __la_symbol_ptr section (read the previous post to learn about these sections), then the reserved1 field holds the index into the dynamic symbol table at which that section's entries begin.
symbol_stub sections also make use of the reserved2 field: the size of a single stub entry is stored in reserved2. For other sections, the field is set to 0.
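Putting reserved1 and reserved2 together, the bookkeeping can be sketched like this in C. The struct only carries the three section_64 fields the calculation needs. Its name and the function names are made up for this example. It also assumes a 64-bit object, where each __la_symbol_ptr entry is one 8-byte pointer.

```c
#include <stdint.h>

/* The three section_64 fields used below (hypothetical local struct). */
typedef struct {
    uint64_t size;       /* size in bytes of the section */
    uint32_t reserved1;  /* index of the section's first table entry */
    uint32_t reserved2;  /* stub size for symbol_stub sections, else 0 */
} demo_section_info;

/* Number of entries in the section: stub sections divide by the stub
 * size in reserved2, pointer sections by the 8-byte pointer size
 * (an assumption that only holds for 64-bit objects). */
static uint64_t demo_entry_count(const demo_section_info *s)
{
    uint64_t entry_size = s->reserved2 ? s->reserved2 : sizeof(uint64_t);
    return s->size / entry_size;
}

/* Position in the indirect symbol table of the section's n-th entry.
 * The table is ordered to match the section, so it is just an offset
 * from reserved1. */
static uint32_t demo_indirect_index(const demo_section_info *s, uint32_t n)
{
    return s->reserved1 + n;
}
```

For example, a 60-byte stub section with 6-byte stubs has 10 entries, and if its reserved1 is 0 those entries occupy slots 0 through 9 of the table.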
Two notable differences between the dynamic symbol tables
- There is an explicit section in ELF that contains Elf64_Sym entries. On Mach-O it's just a list of 32-bit indexes.
- ELF provides a .hash section and/or .gnu.hash section to speed up symbol lookup. Mach-O does not.
What happens when you run strip?
Let’s use strip with no options (other than the filename).
On ELF:
- All .debug_* sections are removed. These sections contain extra debugging information that helps debuggers figure out more precisely what went wrong.
- .symtab section is removed.
- .strtab section is removed.
On Mach-O:
- Only undefined symbols and dynamic symbols are left in the symbol table. Everything else is removed.
How to strip so I can debug later (linux only)
If you decide to strip your binary please be considerate to future hackers who may need to debug your app for some reason.
You can be considerate by following the directions in strip(1):
1. Link the executable as normal. Assuming that it is called "foo" then...
2. Run "objcopy --only-keep-debug foo foo.dbg" to create a file containing the debugging info.
3. Run "objcopy --strip-debug foo" to create a stripped executable.
4. Run "objcopy --add-gnu-debuglink=foo.dbg foo" to add a link to the debugging info into the stripped executable.
And don’t forget to put your debugging information somewhere easily accessible and googleable.
If you do this: you are cool. If you don’t…
Conclusion
- I like the way ELF does dynamic symbol tables, the gnu_debuglink section, and the lookup hash table for dynamic symbols. All of these pieces are really useful and I am glad they exist.
- The indirect symbol table was a bit of a pain to track down on Mach-O as the information is hard to parse on the first pass. To be fair, it is all there if you google around a bit and put the pieces together.
- On Linux, if you strip, please add a gnu_debuglink section and put the debug information somewhere I can find it.
Thanks for reading and don’t forget to subscribe (via RSS or e-mail) and follow me on twitter.

