<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>time to bleed by Joe Damato &#187; osx</title>
	<atom:link href="http://timetobleed.com/category/osx/feed/" rel="self" type="application/rss+xml" />
	<link>http://timetobleed.com</link>
	<description>technical ramblings from a wanna-be unix dinosaur</description>
	<lastBuildDate>Tue, 05 Jul 2011 13:00:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>The Broken Promises of MRI/REE/YARV</title>
		<link>http://timetobleed.com/the-broken-promises-of-mrireeyarv/</link>
		<comments>http://timetobleed.com/the-broken-promises-of-mrireeyarv/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 13:00:09 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[osx]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=2087</guid>
		<description><![CDATA[If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter. tl;dr This post is going to explain a serious design flaw of the object system used in MRI/REE/YARV. This flaw causes seemingly random segfaults and other hard to track corruption. One popular incarnation of this bug is the &#8220;rake aborted! [...]]]></description>
			<content:encoded><![CDATA[<p><center><img src="http://timetobleed.com/images/nukez.jpg" alt="" width="400" height="300" /></center><br />
If you enjoy this article, <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>tl;dr</h2>
<p>This post is going to explain a serious design flaw of the object system used in MRI/REE/YARV. This flaw causes seemingly random segfaults and other hard to track corruption. One popular incarnation of this bug is the &#8220;rake aborted! not in gzip format.&#8221;</p>
<h2>theme song</h2>
<p>This blog post was inspired by one of my favorite Papoose verses. If you don&#8217;t listen to this while reading, you probably won&#8217;t understand what I&#8217;m talking about: <a href ="http://www.infinitelooper.com/?v=JMJRfNFfGJw&#038;p=n#/106;210">get in the zone.</a></p>
<h2>rake aborted! not in gzip format<br />
[BUG] Segmentation fault</h2>
<p>If you&#8217;ve seen either of these error messages you are hitting a <i>fundamental flaw of the object model in MRI/YARV</i>. An example of a fix for <i>a single instance of this bug</i> can be seen in <a href="https://github.com/ruby/ruby/commit/1887f60a8540f64f5c7bb14d57c0be70506941b8">this patch</a>. Let&#8217;s examine this specific patch so that we can gain some understanding of the general case.</p>
<p>FACT: What you are about to read <b>is absolutely not a compiler bug</b>.</p>
<h2>A small, but important piece of background information</h2>
<p>The amd64 ABI<sup>1</sup> states that some registers are caller saved, while others are callee saved. In particular, the register <code>rax</code> is caller saved. The callee is free to overwrite this register (it also stores its return value there), so if the caller cares about the value in <code>rax</code>, it must copy it somewhere safe before making a function call.</p>
<h2>stare into the abyss part 1</h2>
<p>
Let&#8217;s look at the C code for <code>gzfile_read_raw_ensure</code> <b>WITHOUT</b> the fix from above:</p>
<pre class="prettyprint">
#define zstream_append_input2(z,v)\
    zstream_append_input((z), (Bytef*)RSTRING_PTR(v), RSTRING_LEN(v))

static int
gzfile_read_raw_ensure(struct gzfile *gz, int size)
{
    VALUE str;

    while (NIL_P(gz->z.input) || RSTRING_LEN(gz->z.input) < size) {
	str = gzfile_read_raw(gz);
	if (NIL_P(str)) return Qfalse;
	zstream_append_input2(&#038;gz->z, str);
    }
    return Qtrue;
}
</pre>
</p>
<p>
It looks relatively sane at first glance, but to understand this bug we&#8217;ll need to examine the assembly generated for it. I&#8217;m going to rearrange the assembly a bit to make it easier to follow and add a few comments along the way.
</p>
<p>First, the code begins by setting the stage:</p>
<pre class="prettyprint">
  push   %rbp
  movslq %esi,%rbp    # sign extend "size" into rbp
  push   %rbx
  mov    %rdi,%rbx    # rbx = gz
  sub    $0x8,%rsp    # make room on the stack for "str"
</pre>
</p>
<p>
The above is pretty basic; it is your typical amd64 prologue. After everything is set up, it is time to enter the <code>while</code> loop in the C code above:</p>
<pre class="prettyprint">
  jmp    1180 [gzfile_read_raw_ensure+0x20] # JUMP IN to the loop
</pre>
</p>
<p>
Next comes the <code>NIL_P(gz->z.input)</code> portion of the <code>while</code>-loop condition:</p>
<pre class="prettyprint">
  mov    0x18(%rbx),%rax    # rax = gz->z.input
  cmp    $0x4,%rax          # in Ruby, nil is represented as 4.
  je     1190 [gzfile_read_raw_ensure+0x30]  # if gz->z.input is nil, enter the loop
</pre>
</p>
<p>
Now the <code>RSTRING_LEN(gz->z.input) < size</code> portion:</p>
<pre class="prettyprint">
  cmp    %rbp,0x10(%rax)        # compare size and gz->z.input->len
  jge    11b0 [gzfile_read_raw_ensure+0x50]  # jump out of loop
                                             # if  gz->z.input->len is >= size
</pre>
</p>
<p>
Next comes the call to <code>gzfile_read_raw</code> and the <code>NIL_P(str)</code> check. If <code>str</code> is nil, the code falls through, exits the loop, and returns <code>Qfalse</code>:</p>
<pre class="prettyprint">
 mov    %rbx,%rdi            # rdi = gz, rdi holds the first argument to a function.
 callq  1090 [gzfile_read_raw]  # call gzfile_read_raw
 cmp    $0x4,%rax   # compare return value (%rax) to nil
 jne    1170 [gzfile_read_raw_ensure+0x10] # if it is NOT nil jump to the good stuff
</pre>
</p>
<p>The return value of <code>gzfile_read_raw</code> (the address of a ruby object) is stored in <code>rax</code>.</p>
<p>
And finally, the good stuff. The call to <code>zstream_append_input</code>:</p>
<pre class="prettyprint">
  mov    0x10(%rax),%rdx # RSTRING_LEN(v) as 3rd arg
  mov    0x18(%rax),%rsi # RSTRING_PTR(v) as 2nd arg
  mov    %rbx,%rdi       # set gz->z as the 1st arg
  callq  10e0 [zstream_append_input]  # let it rip
</pre>
</p>
<p>Note that the arguments to <code>zstream_append_input</code> are loaded into registers using offsets from <code>rax</code>, and that when <code>zstream_append_input</code> is called, the <b>ruby object returned from <code>gzfile_read_raw</code> is still stored only in <code>rax</code></b>. It is not written to its slot on the stack because, as far as the compiler can tell, the extra write is unnecessary.</p>
<h2>stare into the abyss part 2</h2>
<p>
Alright, so the patch changes the <code>zstream_append_input2</code> macro to this:</p>
<pre class="prettyprint">
#define zstream_append_input2(z,v)\
    RB_GC_GUARD(v),\
    zstream_append_input((z), (Bytef*)RSTRING_PTR(v), RSTRING_LEN(v))
</pre>
</p>
<p>
And, <code>RB_GC_GUARD</code> is <code>define</code>d as:</p>
<pre class="prettyprint">
#define RB_GC_GUARD_PTR(ptr) \
    __extension__ ({volatile VALUE *rb_gc_guarded_ptr = (ptr); rb_gc_guarded_ptr;})

#define RB_GC_GUARD(v) (*RB_GC_GUARD_PTR(&#038;(v)))
</pre>
</p>
<p>
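That code is just a hack to access the memory location holding <code>v</code> through a pointer with the <code>volatile</code> type qualifier. Because the macro takes the address of <code>v</code>, <code>v</code> must live in memory at that point rather than only in a register. The <code>volatile</code> qualifier tells the compiler that the memory backing <code>v</code> behaves in ways the compiler is too stupid to understand, so it must ensure that reads and writes to this location are not optimized out.</p>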
<p>A common usage of this qualifier is for memory mapped registers. Reads from memory mapped registers should not be optimized away since a hardware device may update the value stored at that location. The compiler wouldn't know when these updates could happen so it must make sure to re-read the value from this memory location when it is needed. Similarly, writes to memory mapped registers may modify the state of a hardware device and should not be optimized away.</p>
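<p>To make the memory-mapped register case concrete, here is a minimal sketch. The <code>STATUS_READY</code> bit and both functions are hypothetical; on real hardware the pointer would be a fixed device address, but in this sketch any <code>volatile</code> variable can stand in for the register so the code runs anywhere.</p>
<pre class="prettyprint">
#include &lt;stdint.h&gt;

#define STATUS_READY 0x1u  /* hypothetical "device ready" bit */

/* Poll a (hypothetical) status register until the READY bit is set.
   Because reg points to volatile memory, the compiler must re-load *reg on
   every iteration instead of hoisting a single read out of the loop.
   Returns the number of reads performed, or 0 if the bit never appeared. */
static unsigned poll_until_ready(const volatile uint32_t *reg, unsigned max_reads)
{
    for (unsigned i = 1; i &lt;= max_reads; i++) {
        if (*reg &amp; STATUS_READY)
            return i;
    }
    return 0;
}

/* Write a command to a (hypothetical) control register. The volatile store
   cannot be dropped, even though this program never reads the value back. */
static void send_command(volatile uint32_t *reg, uint32_t cmd)
{
    *reg = cmd;
}
</pre>
<p>Without <code>volatile</code>, the compiler would be allowed to read <code>*reg</code> once and spin forever on a stale value, or to delete the store in <code>send_command</code> as dead code.</p>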
<p>Most of the code generated with the patch applied is the same as without except for a few slight differences before <code>zstream_append_input</code> is called. Let's take a look:</p>
<pre class="prettyprint">
  mov    %rax,-0x18(%rbp)    # write str to the stack
  mov    -0x18(%rbp),%rax    # read the value in str back to rax
  mov    0x10(%rcx),%rdx      # RSTRING_LEN(v)
  mov    0x18(%rcx),%rsi       # RSTRING_PTR(v)
  mov    %rbx,%rdi                # z
  callq  1f60 [_zstream_append_input]
</pre>
<p><b>The key difference</b> is that the return value of <code>gzfile_read_raw</code> is <i>written back to its memory location</i> (which, in this case, happens to be on the stack and is called <code>str</code>).</p>
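<p>The guard can be reproduced outside of Ruby to see the mechanism in isolation. The sketch below copies the shape of <code>RB_GC_GUARD</code> under the hypothetical name <code>MY_GC_GUARD</code>, uses a stand-in <code>VALUE</code> type, and uses a hypothetical <code>append_input</code> in place of <code>zstream_append_input</code>, mirroring the comma expression in the patched macro. It assumes GCC or Clang, since it relies on statement expressions. Compiling it with <code>gcc -O2 -S</code> with and without the guard is one way to go looking for the extra store to the stack.</p>
<pre class="prettyprint">
#include &lt;stdint.h&gt;

/* Hypothetical stand-in for Ruby's VALUE: an integer wide enough
   to hold an object's address. */
typedef uintptr_t VALUE;

/* Same shape as RB_GC_GUARD_PTR / RB_GC_GUARD, renamed so it cannot clash
   with ruby.h. Taking &amp;(v) gives v an address, and reading it through a
   volatile pointer means the compiler must keep v's current value in that
   memory slot at this point. */
#define MY_GC_GUARD_PTR(ptr) \
    __extension__ ({ volatile VALUE *guarded_ptr_ = (ptr); guarded_ptr_; })
#define MY_GC_GUARD(v) (*MY_GC_GUARD_PTR(&amp;(v)))

/* Hypothetical callee standing in for zstream_append_input; in the real VM
   a function like this may allocate and therefore trigger GC. */
static VALUE append_input(VALUE obj)
{
    return obj + 1;
}

/* Mirrors the patched macro: the guard is evaluated first (comma operator),
   then the call is made. */
#define APPEND_INPUT2(v) (MY_GC_GUARD(v), append_input(v))

static VALUE read_then_append(VALUE str)
{
    return APPEND_INPUT2(str);
}
</pre>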
<h2>the bug</h2>
<p>The bug is triggered because:</p>
<ol>
<li>The address of the ruby object str is stored in a caller saved register, <code>rax</code>.</li>
<li>The callee (<code>zstream_append_input</code>) does not save the value of <code>rax</code> (it is not required to) and <code>rax</code> is overwritten in the function, leaving <b>no references</b> to the ruby object returned by <code>gzfile_read_raw</code>.</li>
<li>The callee (<code>zstream_append_input</code>) eventually calls <code>rb_newobj</code>. <code>rb_newobj</code> <i>may</i> trigger a GC run, if there are no available objects on the freelist.</li>
<li>The GC run finds the object returned by <code>gzfile_read_raw</code> but <i>sees no references to it</i> and frees the memory associated with it.</li>
<li>The freed object is used as if it were valid, and the resulting memory corruption causes the VM to explode.</li>
</ol>
<p>The patch prevents this bug from happening because:</p>
<ol>
<li>The address of the ruby object str is stored in a caller saved register, <code>rax</code>.</li>
<li>The <code>volatile</code> type qualifier causes the compiler to generate code which writes the return value back into its memory location on the stack.</li>
<li>The callee (<code>zstream_append_input</code>) eventually calls <code>rb_newobj</code>. <code>rb_newobj</code> <i>may</i> trigger a GC run, if there are no available objects on the freelist.</li>
<li>The GC run finds the object returned by <code>gzfile_read_raw</code> and <i>finds a reference to it</i> and therefore does not free it.</li>
<li>Everyone is happy.</li>
</ol>
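<p>The difference between the two lists comes down to whether a conservative collector can <i>see</i> the object. Here is a toy mark-and-sweep sketch (not MRI&#8217;s actual collector; all names are hypothetical) in which the machine stack is modeled as an array of words: in the patched case one word still holds the object&#8217;s address, while in the unpatched case that address only lived in <code>rax</code>, which the callee has already overwritten.</p>
<pre class="prettyprint">
#include &lt;stdbool.h&gt;
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

#define HEAP_SLOTS 4

/* Toy heap object: just enough state for mark-and-sweep. */
struct toy_obj {
    bool live;
    bool marked;
};

static struct toy_obj heap[HEAP_SLOTS];

/* Conservative marking: any root word that happens to equal the address
   of a heap slot is treated as a reference to that object. */
static void mark_from_roots(const uintptr_t *roots, size_t nroots)
{
    for (size_t r = 0; r &lt; nroots; r++)
        for (size_t i = 0; i &lt; HEAP_SLOTS; i++)
            if (roots[r] == (uintptr_t)&amp;heap[i])
                heap[i].marked = true;
}

/* Free every live object that was not marked and clear the marks.
   Returns the number of objects freed. */
static size_t sweep(void)
{
    size_t freed = 0;
    for (size_t i = 0; i &lt; HEAP_SLOTS; i++) {
        if (heap[i].live &amp;&amp; !heap[i].marked) {
            heap[i].live = false;
            freed++;
        }
        heap[i].marked = false;
    }
    return freed;
}

/* One GC run over a simulated stack of root words. */
static size_t collect(const uintptr_t *stack, size_t nwords)
{
    mark_from_roots(stack, nwords);
    return sweep();
}
</pre>
<p>If one simulated stack word holds <code>(uintptr_t)&amp;heap[0]</code>, the object survives <code>collect</code>; if no word does, it is freed even though the caller still intends to use it.</p>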
<h2>The general case</h2>
<p>Given valid C code, <code>gcc</code> will generate machine instructions that correctly do what you want. Of course, there are bugs in <code>gcc</code> just like any other piece of software. The problem in this case is not <code>gcc</code>. The problem is that the object and garbage collection implementations in REE/MRI/YARV are not valid C code, so it is not possible for <code>gcc</code> to generate machine instructions that do the right thing. In other words, Ruby's object and GC implementations are breaking their contract with <code>gcc</code>.</p>
<p>The end result is the need for shit like <code>RB_GC_GUARD</code> in REE/MRI/YARV and <b>also in Ruby gems</b> to selectively paper over valid <code>gcc</code> optimizations. Having an API that might cause the Ruby VM to fucking explode unless you proactively mark things with <code>RB_GC_GUARD</code> is not on the path of least resistance toward building a maintainable, safe, and performant system. Very few people out there know that the <code>volatile</code> type qualifier exists, let alone what it does. Essentially, this means that authors of Ruby gems must understand how GC works in the VM to prevent their gems from causing GC to break the universe.</p>
<p>That is fucking beyond stupid.</p>
<h2>How to detect this bug class</h2>
<p>This could be detected by building a simple static analysis tool. You won't catch 100% of cases, and you will definitely have false positives, but it is better than nothing. Something like this should work:</p>
<ol>
<li>Build a call <a href="http://en.wikipedia.org/wiki/Directed_graph">digraph</a> of the VM and/or the set of gems you care about.</li>
<li>Find all <a href="http://en.wikipedia.org/wiki/Path_(graph_theory)">paths</a> leading to the <code>rb_newobj</code> sink.</li>
<li>Find all paths which call <code>rb_newobj</code>, but do not save <code>rax</code> prior to making another function call which is also on a path to <code>rb_newobj</code>.</li>
<li>The functions found are very likely to be causing corruption. A human will need to examine the found cases to weed out false positives and to fix the code.</li>
</ol>
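<p>Steps 1 and 2 amount to a reachability query on the call graph. A minimal sketch, assuming the call edges have already been extracted by some other means (from the compiler or from disassembly), runs a depth-first search over the reversed edges starting at the <code>rb_newobj</code> sink. The <code>callgraph</code> type and the helper functions are hypothetical, and the register check from step 3 is left out entirely.</p>
<pre class="prettyprint">
#include &lt;assert.h&gt;
#include &lt;stdbool.h&gt;
#include &lt;stddef.h&gt;
#include &lt;string.h&gt;

#define MAX_FUNCS 16

/* Hypothetical call digraph: calls[a][b] is true when function a calls b. */
struct callgraph {
    size_t n;
    const char *name[MAX_FUNCS];
    bool calls[MAX_FUNCS][MAX_FUNCS];
};

/* Add a function node and return its index. */
static size_t cg_add(struct callgraph *g, const char *name)
{
    assert(g->n &lt; MAX_FUNCS);
    g->name[g->n] = name;
    return g->n++;
}

/* Record that function `from` calls function `to`. */
static void cg_edge(struct callgraph *g, size_t from, size_t to)
{
    g->calls[from][to] = true;
}

/* Depth-first search over reversed edges, starting at the sink. On return,
   reaches[f] is true iff some chain of calls leads from f to the sink. */
static void find_paths_to_sink(const struct callgraph *g, size_t sink,
                               bool reaches[MAX_FUNCS])
{
    size_t stack[MAX_FUNCS];  /* each node is pushed at most once */
    size_t top = 0;

    memset(reaches, 0, MAX_FUNCS * sizeof reaches[0]);
    reaches[sink] = true;
    stack[top++] = sink;

    while (top > 0) {
        size_t callee = stack[--top];
        for (size_t caller = 0; caller &lt; g->n; caller++) {
            if (g->calls[caller][callee] &amp;&amp; !reaches[caller]) {
                reaches[caller] = true;
                stack[top++] = caller;
            }
        }
    }
}
</pre>
<p>Every function flagged this way is a candidate for step 3; functions outside the set cannot reach <code>rb_newobj</code> through any direct call recorded in the graph (indirect calls through function pointers would need extra handling).</p>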
<p>If you have found yourself wondering <i>who the fuck would write such a test?</i>, it is important for you to note that <code>rtld</code> in Linux does not save the SSE registers (which are supposed to be caller saved) prior to entering the fixup function. <b>However</b>, to ensure that this optimization does not cause the fucking universe to come crashing down, a test ships with the code that runs <code>objdump</code> on the built binary and greps the output for any instructions which might modify the SSE registers. As long as no one touches the SSE registers, there is no need to save and restore them.</p>
<p>If Ruby's object and GC subsystems want to prevent the universe from exploding, they <b>must</b> supply an equivalent test to ensure that corruption is impossible.</p>
<h2>Conclusion</h2>
<ul>
<li>MRI/YARV/REE are inherently fatally flawed.</li>
<li>I'm never writing another Ruby-related blog post.</li>
<li>I'm not a Ruby programmer.</li>
</ul>
<h2>No comments</h2>
<p>I'm taking a page from the book of <a href="http://twitter.com/coda">coda</a> and disabling comments. If you got something to say, write a blog post.</p>
<p>
If you enjoyed this article, <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>References</h2>
<ol class="footnotes"><li id="footnote_0_2087" class="footnote"> <a href="http://www.x86-64.org/documentation/abi-0.99.pdf">System V Application Binary Interface: AMD64 Architecture Processor Supplement</a> </li></ol>]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/the-broken-promises-of-mrireeyarv/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Slides from Defcon 18: Function hooking for OSX and Linux</title>
		<link>http://timetobleed.com/slides-from-defcon-18-function-hooking-for-osx-and-linux/</link>
		<comments>http://timetobleed.com/slides-from-defcon-18-function-hooking-for-osx-and-linux/#comments</comments>
		<pubDate>Sun, 01 Aug 2010 18:24:35 +0000</pubDate>
		<dc:creator>Aman Gupta</dc:creator>
				<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[osx]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[profiling]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1928</guid>
		<description><![CDATA[Video from Def Con 18 Defcon 18: Function hooking for OSX and Linux from Daniel Hückmann on Vimeo. Slides Function hooking for OSX and Linux]]></description>
			<content:encoded><![CDATA[<h2>Video from Def Con 18</h2>
<p><iframe src="http://player.vimeo.com/video/14951625" width="400" height="200" frameborder="0"></iframe>
<p><a href="http://vimeo.com/14951625">Defcon 18: Function hooking for OSX and Linux</a> from <a href="http://vimeo.com/user4726540">Daniel Hückmann</a> on <a href="http://vimeo.com">Vimeo</a>.</p>
<h2>Slides</h2>
<p>
<a title="View Function hooking for OSX and Linux on Scribd" href="http://www.scribd.com/doc/35191054/Function-hooking-for-OSX-and-Linux" style="margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; -x-system-font: none; display: block; text-decoration: underline;">Function hooking for OSX and Linux</a> <object id="doc_42930970869868" name="doc_42930970869868" height="500" width="100%" type="application/x-shockwave-flash" data="http://d1.scribdassets.com/ScribdViewer.swf" style="outline:none;" rel="media:presentation" resource="http://d1.scribdassets.com/ScribdViewer.swf?document_id=35191054&#038;access_key=key-1ffxxqbbglaccfa347qr&#038;page=1&#038;viewMode=slideshow" xmlns:media="http://search.yahoo.com/searchmonkey/media/" xmlns:dc="http://purl.org/dc/terms/" ><param name="movie" value="http://d1.scribdassets.com/ScribdViewer.swf"><param name="wmode" value="opaque"><param name="bgcolor" value="#ffffff"><param name="allowFullScreen" value="true"><param name="allowScriptAccess" value="always"><param name="FlashVars" value="document_id=35191054&#038;access_key=key-1ffxxqbbglaccfa347qr&#038;page=1&#038;viewMode=slideshow"><embed id="doc_42930970869868" name="doc_42930970869868" src="http://d1.scribdassets.com/ScribdViewer.swf?document_id=35191054&#038;access_key=key-1ffxxqbbglaccfa347qr&#038;page=1&#038;viewMode=slideshow" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="500" width="100%" wmode="opaque" bgcolor="#ffffff"></embed></object> </p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/slides-from-defcon-18-function-hooking-for-osx-and-linux/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Dynamic symbol table duel: ELF vs Mach-O, round 2</title>
		<link>http://timetobleed.com/dynamic-symbol-table-duel-elf-vs-mach-o-round-2/</link>
		<comments>http://timetobleed.com/dynamic-symbol-table-duel-elf-vs-mach-o-round-2/#comments</comments>
		<pubDate>Tue, 01 Jun 2010 12:59:46 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[osx]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[elf]]></category>
		<category><![CDATA[mach-o]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1668</guid>
		<description><![CDATA[If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter. The intention of this post is to continue highlighting some of the similarities and differences between ELF and Mach-O that I encountered while building memprof. The previous post in this series can be found here. What is a symbol table? [...]]]></description>
			<content:encoded><![CDATA[<p><center><img src="http://timetobleed.com/images/duel.jpg" alt="" width="300" height="400" /></center><br />
If you enjoy this article, <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<p>The intention of this post is to continue highlighting <b>some</b> of the similarities and differences between <code>ELF</code> and <code>Mach-O</code> that I encountered while building <a href="http://github.com/ice799/memprof">memprof</a>. The previous post in this series can be found <a href="http://timetobleed.com/dynamic-linking-elf-vs-mach-o/">here</a>.</p>
<h2>What is a symbol table?</h2>
<p>A <b>symbol table</b> is simply a list of names in an object. The names in the list may be names of functions, initialized/uninitialized memory regions, or other things depending on the object format. The <b>symbol table</b> does <b>not</b> need to be mapped into a running process; in a fully linked executable or shared library it is mainly useful for debugging. The <b>symbol table</b> (and other sections) may be removed from an object when you use <code>strip</code>.</p>
<h2>Symbol tables in <code>ELF</code> objects</h2>
<p>An entry in the symbol table in an <b>ELF</b> object can best be described by the following <code>struct</code> from <code>/usr/include/elf.h</code>:</p>
<pre class="prettyprint">
typedef struct
{
  Elf64_Word    st_name;                /* Symbol name (string tbl index) */
  unsigned char st_info;                /* Symbol type and binding */
  unsigned char st_other;               /* Symbol visibility */
  Elf64_Section st_shndx;               /* Section index */
  Elf64_Addr    st_value;               /* Symbol value */
  Elf64_Xword   st_size;                /* Symbol size */
} Elf64_Sym;
</pre>
<p></p>
<p>In most cases, this structure is used to map a symbol name to the address where the symbol lives. However, other symbol types (encoded in <code>st_info</code>) map symbols to other kinds of data.</p>
<p>The <code>st_name</code> field is a byte offset into a string table section (for the regular symbol table, <code>.strtab</code>), which is just a sequence of NUL-terminated strings.</p>
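<p>Putting these fields together, a linear lookup by name might look like the following sketch. It assumes a glibc system that provides <code>&lt;elf.h&gt;</code>, and it takes the symbol table and string table as in-memory arrays instead of reading them out of a real file; <code>find_func</code> is a hypothetical helper.</p>
<pre class="prettyprint">
#include &lt;elf.h&gt;
#include &lt;stddef.h&gt;
#include &lt;string.h&gt;

/* Linear search for a defined function symbol by name. In a real tool,
   symtab/nsyms and strtab would come from the .symtab and .strtab
   sections of a mapped ELF file; here they are plain arrays. */
static const Elf64_Sym *find_func(const Elf64_Sym *symtab, size_t nsyms,
                                  const char *strtab, const char *name)
{
    for (size_t i = 0; i &lt; nsyms; i++) {
        const Elf64_Sym *s = &amp;symtab[i];

        if (ELF64_ST_TYPE(s->st_info) != STT_FUNC)
            continue;   /* skip data objects, sections, files, ... */
        if (s->st_shndx == SHN_UNDEF)
            continue;   /* undefined: a reference, not a definition */
        if (strcmp(strtab + s->st_name, name) == 0)
            return s;   /* st_value is the address, st_size the length */
    }
    return NULL;
}
</pre>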
<h2>Symbol tables in <code>Mach-O</code> objects</h2>
<p>Let&#8217;s take a look at the <code>struct</code> for a symbol table entry in a <b>Mach-O</b> object from <code>/usr/include/mach-o/nlist.h</code>:</p>
<pre class = "prettyprint">
struct nlist_64 {
    union {
        uint32_t  n_strx; /* index into the string table */
    } n_un;
    uint8_t n_type;        /* type flag */
    uint8_t n_sect;        /* section number or NO_SECT */
    uint16_t n_desc;       /* see &lt;mach-o/stab.h&gt; */
    uint64_t n_value;      /* value of this symbol (or stab offset) */
};
</pre>
<p></p>
<p>It looks very similar. The most noticeable difference from <code>ELF</code> is:</p>
<ul>
<li><b>lack of a <code>size</code> field</b> &#8211; The size field in <b>ELF</b> objects describes the number of bytes occupied by the symbol. This is actually pretty useful, especially for <a href="http://github.com/ice799/memprof">memprof</a>. The <i>lack</i> of this field in <b>Mach-O</b> was a source of frustration for <a href="http://twitter.com/jakedouglas">Jake</a> when he was implementing Mach-O support.</li>
</ul>
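<p>One common workaround for the missing size field, assumed here and not necessarily what memprof does, is to sort the symbols of a section by address and treat the gap to the next symbol as the size. The sketch uses a hypothetical, trimmed <code>struct sym</code> holding just the <code>n_value</code> address, so it builds on non-Apple systems too.</p>
<pre class="prettyprint">
#include &lt;stdint.h&gt;
#include &lt;stdlib.h&gt;

/* Hypothetical, trimmed symbol record: just the nlist_64 address
   plus a slot for the size we infer. */
struct sym {
    uint64_t n_value;   /* symbol address, as in nlist_64.n_value */
    uint64_t size;      /* filled in by infer_sizes() */
};

/* qsort comparator: ascending by address, without overflow. */
static int by_address(const void *a, const void *b)
{
    uint64_t x = ((const struct sym *)a)->n_value;
    uint64_t y = ((const struct sym *)b)->n_value;
    return (x > y) - (x &lt; y);
}

/* Approximate each symbol's size as the distance to the next symbol.
   Assumes all symbols belong to one section ending at section_end. */
static void infer_sizes(struct sym *syms, size_t n, uint64_t section_end)
{
    qsort(syms, n, sizeof syms[0], by_address);
    for (size_t i = 0; i &lt; n; i++) {
        uint64_t next = (i + 1 &lt; n) ? syms[i + 1].n_value : section_end;
        syms[i].size = next - syms[i].n_value;
    }
}
</pre>
<p>The inferred size includes any alignment padding after a symbol, and when two symbols share an address (aliases) one of them ends up with a size of 0, so this is an approximation rather than a substitute for a real size field.</p>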
<h2>What is a <i>dynamic</i> symbol table?</h2>
<p>Shared objects in both <code>Mach-O</code> and <code>ELF</code> also have a <i>dynamic</i> symbol table, which lists <i>only</i> the symbols involved in dynamic linking, such as the functions an object exports or imports.</p>
<p>This table is used during dynamic linking and is mapped into the process&#8217; address space when the object is loaded, unlike the symbol table which is just used for debugging.</p>
<p>The <b>dynamic symbol table</b> is a <i>subset</i> of the <b>symbol table</b>. </p>
<h2>Dynamic symbol table in ELF objects</h2>
<p>The dynamic symbol table in ELF objects is stored in a section named <code>.dynsym</code>. The values stored in the <code>st_name</code> field (from the structure listed above) are offsets into a string table in a section named <code>.dynstr</code>, a string table specifically for entries in the dynamic symbol table.</p>
<p>If you know the symbol you care about, you can calculate a hash of the symbol name and use the hash table to find the symbol table entry for that symbol instead of scanning every entry. Unfortunately, there is not very much documentation about the hash functions that are used.</p>
<p>Your two options are:</p>
<ul>
<li>read the source for <a href="http://www.gnu.org/software/binutils/">binutils</a>, or</li>
<li>check out a useful post on a <a href="http://sourceware.org/ml/binutils/2006-10/msg00377.html">mailing list</a>.</li>
</ul>
<p>The sections storing the hash table data for an object are called <code>.hash</code> and <code>.gnu.hash</code>.</p>
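<p>For reference, here is a sketch of the two hash functions as they are commonly implemented: the System V hash used by <code>.hash</code>, and the Bernstein-style hash (<code>h * 33 + c</code>, seeded with 5381) used by <code>.gnu.hash</code>. The function names are hypothetical. The hash only selects a bucket; a lookup still has to compare candidate names in <code>.dynstr</code> against the symbol you want.</p>
<pre class="prettyprint">
#include &lt;stdint.h&gt;

/* System V ELF hash, as used by the .hash section. */
static uint32_t elf_sysv_hash(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;
    uint32_t h = 0, g;

    while (*p) {
        h = (h &lt;&lt; 4) + *p++;
        g = h &amp; 0xf0000000u;   /* top four bits */
        if (g)
            h ^= g >> 24;
        h &amp;= ~g;
    }
    return h;
}

/* GNU hash, as used by the .gnu.hash section. */
static uint32_t elf_gnu_hash(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;
    uint32_t h = 5381;

    while (*p)
        h = h * 33 + *p++;
    return h;
}
</pre>
<p>For example, <code>elf_sysv_hash("exit")</code> is <code>0x6cf04</code> and <code>elf_gnu_hash("exit")</code> is <code>0x7c967e3f</code>.</p>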
<h2>Dynamic symbol table in Mach-O objects</h2>
<p>Finding the dynamic symbol table in a Mach-O object is a bit complicated. The pieces to the puzzle are found across different structures and the documentation on how it all works is sparse.</p>
<p><code>Mach-O</code> objects have a load command called <code>LC_DYSYMTAB</code> which describes information about the dynamic symbol table in <code>Mach-O</code> objects.</p>
<p>I&#8217;ve shortened the structure definition, as it is quite large and contains documentation about stuff that is not directly relevant to this post. From <code>/usr/include/mach-o/loader.h</code>:</p>
<pre class="prettyprint">
struct dysymtab_command {
    uint32_t cmd; /* LC_DYSYMTAB */
    uint32_t cmdsize; /* sizeof(struct dysymtab_command) */

    /* .... */

    /*
     * The sections that contain "symbol pointers" and "routine stubs" have
     * indexes and (implied counts based on the size of the section and fixed
     * size of the entry) into the "indirect symbol" table for each pointer
     * and stub.  For every section of these two types the index into the
     * indirect symbol table is stored in the section header in the field
     * reserved1.  An indirect symbol table entry is simply a 32bit index into
     * the symbol table to the symbol that the pointer or stub is referring to.
     * The indirect symbol table is ordered to match the entries in the section.
     */
    uint32_t indirectsymoff; /* file offset to the indirect symbol table */
    uint32_t nindirectsyms;  /* number of indirect symbol table entries */

    /* .... */
};
</pre>
<p></p>
<p>The <code>LC_DYSYMTAB</code> load command provides the fields <code>indirectsymoff</code> and <code>nindirectsyms</code>, which describe the offset into the file where the indirect symbol table lives and the number of entries in the table, respectively.</p>
<p>The dynamic symbol table in <code>Mach-O</code> is surprisingly simple. Each entry in the table is just a 32-bit index into the symbol table. The dynamic symbol table is just a list of indexes and nothing else.</p>
<p>It turns out there are a few more pieces to the puzzle.</p>
<p>Take a look at the definition for a <code>Mach-O</code> section:</p>
<pre class="prettyprint">
struct section_64 { /* for 64-bit architectures */
  char    sectname[16]; /* name of this section */
  char    segname[16];  /* segment this section goes in */
  uint64_t  addr;   /* memory address of this section */
  uint64_t  size;   /* size in bytes of this section */
  uint32_t  offset;   /* file offset of this section */
  uint32_t  align;    /* section alignment (power of 2) */
  uint32_t  reloff;   /* file offset of relocation entries */
  uint32_t  nreloc;   /* number of relocation entries */
  uint32_t  flags;    /* flags (section type and attributes)*/
  uint32_t  reserved1;  /* reserved (for offset or index) */
  uint32_t  reserved2;  /* reserved (for count or sizeof) */
  uint32_t  reserved3;  /* reserved */
};
</pre>
</p>
<p>It turns out that the fields <code>reserved1</code> and <code>reserved2</code> are useful too.</p>
<p>If a <code>section_64</code> structure describes a <code>symbol_stub</code> or <code>__la_symbol_ptr</code> section (read the <a href="http://timetobleed.com/dynamic-linking-elf-vs-mach-o/">previous post</a> to learn about these sections), then its <code>reserved1</code> field holds the <i>index into the dynamic symbol table</i> of the section&#8217;s first entry; the section&#8217;s entries map, in order, to consecutive entries in that table.</p>
<p><code>symbol_stub</code> sections also make use of the <code>reserved2</code> field: it stores the size of a single stub entry. For other sections, the field is set to 0.</p>
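<p>Putting <code>reserved1</code>, <code>reserved2</code>, and the indirect symbol table together, mapping a stub (or lazy pointer) address back to a symbol name might look like this sketch. The structs are trimmed, hypothetical mirrors of <code>section_64</code> and <code>nlist_64</code>; the two <code>INDIRECT_SYMBOL_*</code> values are copied from <code>&lt;mach-o/loader.h&gt;</code>, where they mark slots with no symbol-table entry; and pointer-sized (8-byte) entries are assumed when <code>reserved2</code> is 0.</p>
<pre class="prettyprint">
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Values from &lt;mach-o/loader.h&gt;: indirect entries with no symbol. */
#define INDIRECT_SYMBOL_LOCAL 0x80000000u
#define INDIRECT_SYMBOL_ABS   0x40000000u

/* Trimmed, hypothetical mirror of the section_64 fields used here. */
struct sect {
    uint64_t addr;       /* memory address of the section */
    uint64_t size;       /* size in bytes of the section */
    uint32_t reserved1;  /* index of the section's first indirect entry */
    uint32_t reserved2;  /* stub size for stub sections, 0 otherwise */
};

/* Trimmed, hypothetical mirror of nlist_64. */
struct sym64 {
    uint32_t n_strx;     /* offset into the string table */
    uint64_t n_value;    /* symbol value */
};

/* Return the name of the symbol used by the stub or lazy pointer at
   entry_addr, or NULL if there is none. */
static const char *entry_symbol_name(const struct sect *s, uint64_t entry_addr,
                                     const uint32_t *indirect, uint32_t nindirect,
                                     const struct sym64 *syms, uint32_t nsyms,
                                     const char *strtab)
{
    /* Assumption: reserved2 == 0 means 8-byte pointer entries. */
    uint64_t entry_size = s->reserved2 ? s->reserved2 : 8;

    if (entry_addr &lt; s->addr || entry_addr >= s->addr + s->size)
        return NULL;

    uint64_t idx = s->reserved1 + (entry_addr - s->addr) / entry_size;
    if (idx >= nindirect)
        return NULL;

    uint32_t symidx = indirect[idx];
    if (symidx &amp; (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS))
        return NULL;     /* slot has no symbol-table entry */
    if (symidx >= nsyms)
        return NULL;
    return strtab + syms[symidx].n_strx;
}
</pre>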
<h2>Two notable differences between the dynamic symbol tables</h2>
<ul>
<li>There is an explicit section in <code>ELF</code> that contains <code>Elf64_Sym</code> entries. On <code>Mach-O</code> it&#8217;s just a list of 32-bit indexes into the symbol table.</li>
<li><code>ELF</code> provides a <code>.hash</code> section and/or <code>.gnu.hash</code> section to speed up symbol lookup. <code>Mach-O</code> does not.</li>
</ul>
<h2>What happens when you run <code>strip</code>?</h2>
<p>Let&#8217;s use <code>strip</code> with no options (other than the filename).</p>
<p>On <code>ELF</code>:</p>
<ul>
<li>All <code>.debug_*</code> sections are removed. These sections contain extra debugging information that helps debuggers figure out more precisely what went wrong.</li>
<li><code>.symtab</code> section is removed.</li>
<li><code>.strtab</code> section is removed.</li>
</ul>
<p>On <code>Mach-O</code>:</p>
<ul>
<li>Only undefined symbols and dynamic symbols are left in the symbol table. Everything else is removed.</li>
</ul>
<h2>How to <code>strip</code> so I can debug later (linux only)</h2>
<p>If you decide to <code>strip</code> your binary please be considerate to future hackers who may need to debug your app for some reason.</p>
<p>You can be considerate by following the directions in <code>strip(1)</code>:</p>
<blockquote><p>
           1. Link the executable as normal.  Assuming that it is called<br />
               &#8220;foo&#8221; then&#8230;</p>
<p>           2. Run &#8220;objcopy --only-keep-debug foo foo.dbg&#8221; to<br />
               create a file containing the debugging info.</p>
<p>           3. Run &#8220;objcopy --strip-debug foo&#8221; to create a<br />
               stripped executable.</p>
<p>           4. Run &#8220;objcopy --add-gnu-debuglink=foo.dbg foo&#8221;<br />
               to add a link to the debugging info into the stripped executable.
</p></blockquote>
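<p>Step 4 stores a small record in the stripped binary. As described in GDB&#8217;s documentation on separate debug files, the <code>.gnu_debuglink</code> section holds the debug file&#8217;s name, a NUL byte, zero padding up to a 4-byte boundary, and a CRC-32 checksum of the debug file. The following sketch builds that layout by hand. The function names are hypothetical, and the CRC is written little-endian, which assumes a little-endian target such as x86-64 (the checksum is stored in the executable&#8217;s own byte order).</p>
<pre class="prettyprint">
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), the same checksum
   zlib's crc32() computes. Slow, but short. */
static uint32_t crc32_ieee(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xffffffffu;
    for (size_t i = 0; i &lt; len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit &lt; 8; bit++)
            crc = (crc >> 1) ^ (0xedb88320u &amp; (0u - (crc &amp; 1u)));
    }
    return crc ^ 0xffffffffu;
}

/* Lay out .gnu_debuglink contents: name, NUL, zero padding to a 4-byte
   boundary, then the CRC (little-endian here). Returns the number of
   bytes written, or 0 if out is too small. */
static size_t build_debuglink(const char *debug_name, uint32_t crc,
                              unsigned char *out, size_t out_len)
{
    size_t name_len = strlen(debug_name) + 1;       /* include the NUL */
    size_t padded = (name_len + 3) &amp; ~(size_t)3;  /* round up to 4 */
    size_t total = padded + 4;

    if (total > out_len)
        return 0;
    memset(out, 0, padded);
    memcpy(out, debug_name, name_len);
    for (int i = 0; i &lt; 4; i++)
        out[padded + i] = (unsigned char)(crc >> (8 * i));
    return total;
}
</pre>
<p>In practice the <code>crc</code> argument would be <code>crc32_ieee</code> computed over the full contents of <code>foo.dbg</code>; a debugger recomputes it to check that the debug file it found actually matches the stripped binary.</p>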
<p>And don&#8217;t forget to put your debugging information somewhere easily accessible and googleable.</p>
<p>If you do this: <b>you are cool</b>. If you don&#8217;t&#8230;</p>
<h2>Conclusion</h2>
<ol>
<li>I like the way ELF does dynamic symbol tables, the <code>.gnu_debuglink</code> section, and the lookup hash table for dynamic symbols. All of these pieces are really useful and I am glad they exist.</li>
<li>The indirect symbol table was a bit of a pain to track down on <code>Mach-O</code> as the information is hard to parse on the first pass. To be fair, it is all there if you google around a bit and put the pieces together.</li>
<li>On Linux, if you strip, please add a <code>.gnu_debuglink</code> section and put the debug information somewhere I can find it.</li>
</ol>
<p>
Thanks for reading and don&#8217;t forget to <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/dynamic-symbol-table-duel-elf-vs-mach-o-round-2/feed/</wfw:commentRss>
		<slash:comments>27</slash:comments>
		</item>
		<item>
		<title>Dynamic Linking: ELF vs. Mach-O</title>
		<link>http://timetobleed.com/dynamic-linking-elf-vs-mach-o/</link>
		<comments>http://timetobleed.com/dynamic-linking-elf-vs-mach-o/#comments</comments>
		<pubDate>Wed, 12 May 2010 14:00:09 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[osx]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[dynamic linking]]></category>
		<category><![CDATA[elf]]></category>
		<category><![CDATA[mach-o]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1613</guid>
		<description><![CDATA[If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter. The intention of this post is to highlight some of the similarities and differences between ELF and Mach-O dynamic linking that I encountered while building memprof. I hope to write more posts about similarities and differences in other aspects of [...]]]></description>
			<content:encoded><![CDATA[<p><center><img src="http://timetobleed.com/images/linking.jpg" alt="" width="400" height="300" /></center><br />
If you enjoy this article, <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<p>The intention of this post is to highlight <b>some</b> of the similarities and differences between <code>ELF</code> and <code>Mach-O</code> dynamic linking that I encountered while building <a href="http://github.com/ice799/memprof">memprof</a>.</p>
<p>I hope to write <b>more posts about similarities and differences in other aspects of Mach-O and ELF</b> that I stumbled across, to shed some light on what goes on down there and to provide (in some cases) the only documentation.</p>
<h2>Procedure Linkage Table</h2>
<p>The procedure linkage table (PLT) is used to determine the absolute address of a function in a shared library at runtime. Both Mach-O and ELF objects have PLTs, which are generated when the object is linked. Initially, each PLT entry simply invokes the dynamic linker, which finds the symbol you want. At a high level this works very similarly in ELF and Mach-O, but there are some implementation differences that I thought were worth mentioning.</p>
<h2>Mach-O PLT arrangement</h2>
<p>Mach-O objects spread the pieces that make up the PLT entry for a specific symbol across several sections in different <i>segments</i>.</p>
<p>Consider the following assembly stub which calls out to the PLT entry for <code>malloc</code>:</p>
<pre class="prettyprint">
# MACH-O calling a PLT entry (ELF is nearly identical)
0x000000010008c504 [str_new+52]:	callq  0x10009ebbc [dyld_stub_malloc]
</pre>
<p>
<p>The <code>dyld_stub</code> prefix is added by GDB to let the user know that the <code>callq</code> instruction is calling a PLT entry and not <code>malloc</code> itself. The address <code>0x10009ebbc</code> is the first instruction of <code>malloc</code>&#8217;s PLT entry in this Mach-O object. In Mach-O terminology, the instruction at <code>0x10009ebbc</code> is called a <b>symbol stub</b>. Symbol stubs in Mach-O objects are found in the <code>__symbol_stub1</code> section of the <code>__TEXT</code> segment.</p>
<p>Let&#8217;s examine some instructions at the symbol stub address above:</p>
<pre class="prettyprint">
# MACH-O "symbol stubs" for malloc and other functions
0x10009ebbc [dyld_stub_malloc]:	  jmpq   *0x3ae46(%rip)        # 0x1000d9a08
0x10009ebc2 [dyld_stub_realloc]:  jmpq   *0x3ae48(%rip)        # 0x1000d9a10
0x10009ebc8 [dyld_stub_seekdir$INODE64]:	jmpq   *0x3ae4c(%rip)  # 0x1000d9a20
. . . .
</pre>
<p></p>
<p>Each Mach-O <b>symbol stub</b> is just a single <code>jmpq</code> instruction. That <code>jmpq</code> instruction jumps <i>via</i> an entry in a table, and either:</p>
<ul>
<li>invokes the dynamic linker to find the symbol and transfer execution there, <b><u>OR</u></b></li>
<li>transfers execution directly to the function, if it has already been resolved.</li>
</ul>
<p>In the example above, GDB is telling us that the address of the table entry for <code>malloc</code> is <code>0x1000d9a08</code>. This table entry is stored in the <code>__la_symbol_ptr</code> section of the <code>__DATA</code> segment.</p>
<p>Before <code>malloc</code> has been resolved, the address in that table entry points to a helper function which (eventually) invokes the dynamic linker to find <code>malloc</code> and fill in its address in the table entry.</p>
<p>Here is what a few of the helper function entries look like:</p>
<pre class="prettyprint">
# MACH-O stub helpers
0x1000a08d4 [stub helpers+6986]:	pushq  $0x3b73
0x1000a08d9 [stub helpers+6991]:	jmpq   0x10009ed8a [stub helpers]
0x1000a08de [stub helpers+6996]:	pushq  $0x3b88
0x1000a08e3 [stub helpers+7001]:	jmpq   0x10009ed8a [stub helpers]
0x1000a08e8 [stub helpers+7006]:	pushq  $0x3b9e
0x1000a08ed [stub helpers+7011]:	jmpq   0x10009ed8a [stub helpers]
. . . .
</pre>
</p>
<p>Each symbol that has a PLT entry gets a pair of instructions like the ones above: a <code>pushq</code> and a <code>jmpq</code>. The <code>pushq</code> sets an ID for the desired function, and the <code>jmpq</code> goes to common code at the start of the stub helpers (<code>0x10009ed8a</code>), which invokes the dynamic linker. The dynamic linker looks up this ID so it knows which function it should be looking for.</p>
<h2>ELF PLT arrangement</h2>
<p>ELF objects use the same mechanism, but keep each PLT entry together in a single chunk instead of splicing it out across different sections. Let&#8217;s take a look at a PLT entry for <code>malloc</code> in an ELF object:</p>
<pre class="prettyprint">
# ELF complete PLT entry for malloc
0x40f3d0 [malloc@plt]:	jmpq   *0x2c91fa(%rip)        # 0x6d85d0
0x40f3d6 [malloc@plt+6]:	pushq  $0x2f
0x40f3db [malloc@plt+11]:	jmpq   0x40f0d0
. . . .
</pre>
<p></p>
<p>Much like a Mach-O object, an ELF object uses a table entry to direct the flow of execution to either invoke the dynamic linker or transfer directly to the desired function if it has already been resolved.</p>
<p>Two differences to point out here: </p>
<ol>
<li>ELF puts the entire PLT entry together in a nicely named section called <code>.plt</code> instead of splicing it out across multiple sections.</li>
<li>The table entries that the initial <code>jmpq</code> instruction jumps through are stored in a section named <code>.got.plt</code>.</li>
</ol>
<h2>Both invoke an assembly trampoline&#8230;</h2>
<p>Both Mach-O and ELF objects are set up to invoke the runtime dynamic linker, and both need an assembly trampoline to bridge the gap between the application and the linker. On 64-bit Intel-based systems, the linkers on both platforms must comply with the same Application Binary Interface (ABI).</p>
<p><b>Strangely enough</b>, the two linkers <b>have slightly different assembly trampolines even though they share the same calling convention<sup>1</sup>  <sup>2</sup>.</b></p>
<p>Both trampolines ensure that the program stack is 16-byte aligned to comply with the amd64 ABI&#8217;s calling convention. Both trampolines also take care to save the &#8220;general purpose&#8221; caller-saved registers before invoking the dynamic linker, but the trampoline in Linux <b>does not save or restore the SSE registers.</b> Because <code>%xmm0 - %xmm7</code> carry floating-point arguments under this ABI, they must reach the resolved function intact. This &#8220;shouldn&#8217;t&#8221; matter, so long as glibc takes care not to use any of those registers in the dynamic linker. OSX takes a more conservative approach and saves the SSE registers before calling out to the dynamic linker and restores them afterward.</p>
<p>I&#8217;ve included a snippet from the two trampolines below and some comments so you can see the differences up close.</p>
<h2>Different trampolines for the same ABI</h2>
<p>The OSX trampoline:</p>
<pre class="prettyprint">
dyld_stub_binder:
  pushq   %rbp
  movq    %rsp,%rbp
  subq    $STACK_SIZE,%rsp  # at this point stack is 16-byte aligned because two meta-parameters were pushed
  movq    %rdi,RDI_SAVE(%rsp) # save registers that might be used as parameters
  movq    %rsi,RSI_SAVE(%rsp)
  movq    %rdx,RDX_SAVE(%rsp)
  movq    %rcx,RCX_SAVE(%rsp)
  movq    %r8,R8_SAVE(%rsp)
  movq    %r9,R9_SAVE(%rsp)
  movq    %rax,RAX_SAVE(%rsp)
  movdqa    %xmm0,XMMM0_SAVE(%rsp)
  movdqa    %xmm1,XMMM1_SAVE(%rsp)
  movdqa    %xmm2,XMMM2_SAVE(%rsp)
  movdqa    %xmm3,XMMM3_SAVE(%rsp)
  movdqa    %xmm4,XMMM4_SAVE(%rsp)
  movdqa    %xmm5,XMMM5_SAVE(%rsp)
  movdqa    %xmm6,XMMM6_SAVE(%rsp)
  movdqa    %xmm7,XMMM7_SAVE(%rsp)
  movq    MH_PARAM_BP(%rbp),%rdi  # call fastBindLazySymbol(loadercache, lazyinfo)
  movq    LP_PARAM_BP(%rbp),%rsi
  call    __Z21_dyld_fast_stub_entryPvl
</pre>
</p>
<p>The OSX trampoline saves <code>%rdi</code>, <code>%rsi</code>, <code>%rdx</code>, <code>%rcx</code>, <code>%r8</code>, <code>%r9</code> and <code>%rax</code> <b>as well as</b> the <code>%xmm0 - %xmm7</code> registers before invoking the dynamic linker with that last <code>call</code> instruction. These registers are all restored after the call, but I left that out for the sake of brevity.</p>
<p>The Linux trampoline:</p>
<pre class="prettyprint">
_dl_runtime_resolve:
  subq $56,%rsp
  cfi_adjust_cfa_offset(72) # Incorporate PLT
  movq %rax,(%rsp)  # Preserve registers otherwise clobbered.
  movq %rcx, 8(%rsp)
  movq %rdx, 16(%rsp)
  movq %rsi, 24(%rsp)
  movq %rdi, 32(%rsp)
  movq %r8, 40(%rsp)
  movq %r9, 48(%rsp)
  movq 64(%rsp), %rsi # Copy args pushed by PLT in register.
  movq %rsi, %r11   # Multiply by 24
  addq %r11, %rsi
  addq %r11, %rsi
  shlq $3, %rsi
  movq 56(%rsp), %rdi # %rdi: link_map, %rsi: reloc_offset
  call _dl_fixup    # Call resolver.
</pre>
</p>
<p>The Linux trampoline doesn&#8217;t touch the SSE registers because it assumes that the dynamic linker will not modify them, which lets it skip a save and restore.</p>
<h2>Conclusion</h2>
<ul>
<li>Tracing program execution from call site to the dynamic linker is pretty interesting and there is a lot to learn along the way.</li>
<li>glibc not saving and restoring <code>%xmm0-%xmm7</code> kind of scares me, but glibc includes a unit test that disassembles the built ld.so and searches it to make sure that those registers are never touched. It is still a bit frightening.</li>
<li>Stay tuned for more posts explaining other interesting similarities and differences between Mach-O and ELF coming soon.</li>
</ul>
<p>Thanks for reading and don&#8217;t forget to <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>References</h2>
<ol class="footnotes"><li id="footnote_0_1613" class="footnote"><a href="http://developer.apple.com/mac/library/documentation/DeveloperTools/Conceptual/LowLevelABI/140-x86-64_Function_Calling_Conventions/x86_64.html#//apple_ref/doc/uid/TP40005035-SW1">http://developer.apple.com/mac/library/documentation/DeveloperTools/Conceptual/LowLevelABI/140-x86-64_Function_Calling_Conventions/x86_64.html#//apple_ref/doc/uid/TP40005035-SW1</a></li><li id="footnote_1_1613" class="footnote"><a href="http://www.x86-64.org/documentation/abi.pdf">http://www.x86-64.org/documentation/abi.pdf</a></li></ol>]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/dynamic-linking-elf-vs-mach-o/feed/</wfw:commentRss>
		<slash:comments>28</slash:comments>
		</item>
	</channel>
</rss>

