<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>time to bleed by Joe Damato &#187; debugging</title>
	<atom:link href="http://timetobleed.com/category/debugging/feed/" rel="self" type="application/rss+xml" />
	<link>http://timetobleed.com</link>
	<description>technical ramblings from a wanna-be unix dinosaur</description>
	<lastBuildDate>Tue, 05 Jul 2011 13:00:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>The Broken Promises of MRI/REE/YARV</title>
		<link>http://timetobleed.com/the-broken-promises-of-mrireeyarv/</link>
		<comments>http://timetobleed.com/the-broken-promises-of-mrireeyarv/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 13:00:09 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[osx]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=2087</guid>
		<description><![CDATA[If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter. tl;dr This post is going to explain a serious design flaw of the object system used in MRI/REE/YARV. This flaw causes seemingly random segfaults and other hard to track corruption. One popular incarnation of this bug is the &#8220;rake aborted! [...]]]></description>
			<content:encoded><![CDATA[<p><center><img src="http://timetobleed.com/images/nukez.jpg" alt="" width="400" height="300" /></center><br />
If you enjoy this article, <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>tl;dr</h2>
<p>This post is going to explain a serious design flaw of the object system used in MRI/REE/YARV. This flaw causes seemingly random segfaults and other hard to track corruption. One popular incarnation of this bug is the &#8220;rake aborted! not in gzip format.&#8221;</p>
<h2>theme song</h2>
<p>This blog post was inspired by one of my favorite Papoose verses. If you don&#8217;t listen to this while reading, you probably won&#8217;t understand what I&#8217;m talking about: <a href="http://www.infinitelooper.com/?v=JMJRfNFfGJw&#038;p=n#/106;210">get in the zone.</a></p>
<h2>rake aborted! not in gzip format<br />
[BUG] Segmentation fault<br />
</h2>
<p>If you&#8217;ve seen either of these error messages you are hitting a <i>fundamental flaw of the object model in MRI/YARV</i>. An example of a fix for <i>a single instance of this bug</i> can be seen in <a href="https://github.com/ruby/ruby/commit/1887f60a8540f64f5c7bb14d57c0be70506941b8">this patch</a>. Let&#8217;s examine this specific patch so that we can gain some understanding of the general case.</p>
<p>FACT: What you are about to read <b>is absolutely not a compiler bug</b>.</p>
<h2>A small, but important piece of background information</h2>
<p>The amd64 ABI<sup>1</sup> states that some registers are caller saved, while others are callee saved. In particular, the register <code>rax</code> is caller saved. The callee will overwrite the value in this register to store its return value for the caller so if the caller cares about what is stored in this register, it must be copied prior to a function call.</p>
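<p>To make the caller saved rule concrete, here is a minimal sketch in C. The helper names <code>first</code> and <code>second</code> are made up for illustration. The only point is that both results come back in <code>rax</code>, so the compiler has to move the first result somewhere safe before it makes the second call (assuming the calls are not inlined away).</p>
<pre class="prettyprint">
```c
#include <assert.h>

/* Hypothetical helpers, named only for this sketch. */
static long first(void)  { return 40; }
static long second(void) { return 2; }

/* Both calls hand their result back in rax on amd64. The compiler must
 * copy the result of first() somewhere else (a callee saved register or
 * a stack slot) before calling second(), because second() is free to
 * overwrite rax with its own return value. */
long add_results(void)
{
    long a = first();   /* a arrives in rax */
    long b = second();  /* rax now holds b */
    return a + b;
}
```
</pre>
<p>Compiling this with <code>gcc -O0 -S</code> and reading the output is an easy way to see where the first result gets parked. The exact register or stack slot depends on the compiler and flags.</p>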
<h2>stare into the abyss part 1</h2>
<p>
Let&#8217;s look at the C code for <code>gzfile_read_raw_ensure</code> <b>WITHOUT</b> the fix from above:</p>
<pre class="prettyprint">
#define zstream_append_input2(z,v)\
    zstream_append_input((z), (Bytef*)RSTRING_PTR(v), RSTRING_LEN(v))

static int
gzfile_read_raw_ensure(struct gzfile *gz, int size)
{
    VALUE str;

    while (NIL_P(gz->z.input) || RSTRING_LEN(gz->z.input) < size) {
	str = gzfile_read_raw(gz);
	if (NIL_P(str)) return Qfalse;
	zstream_append_input2(&#038;gz->z, str);
    }
    return Qtrue;
}
</pre>
</p>
<p>
It looks relatively sane at first glance, but to understand this bug we&#8217;ll need to examine the assembly generated for it. I&#8217;m going to rearrange the assembly a bit to make it easier to follow and add a few comments along the way.
</p>
<p>First, the code begins by setting the stage:</p>
<pre class="prettyprint">
  push   %rbp
  movslq %esi,%rbp    # sign extend "size" into rbp
  push   %rbx
  mov    %rdi,%rbx    # rbx = gz
  sub    $0x8,%rsp    # make room on the stack for "str"
</pre>
</p>
<p>
The above is pretty basic. It is your typical amd64 prologue. Once everything is set up, it is time to enter the <code>while</code> loop in the C code above:</p>
<pre class="prettyprint">
  jmp    1180 [gzfile_read_raw_ensure+0x20] # JUMP IN to the loop
</pre>
</p>
<p>
Next comes the <code>NIL_P(gz->z.input)</code> portion of the <code>while</code>-loop condition:</p>
<pre class="prettyprint">
  mov    0x18(%rbx),%rax    # rax = gz->z.input
  cmp    $0x4,%rax          # in Ruby, nil is represented as 4.
  je     1190 [gzfile_read_raw_ensure+0x30]  # if gz->z.input is nil, enter the loop
</pre>
</p>
<p>
Now the <code>RSTRING_LEN(gz->z.input) < size</code> portion:</p>
<pre class="prettyprint">
  cmp    %rbp,0x10(%rax)        # compare size and gz->z.input->len
  jge    11b0 [gzfile_read_raw_ensure+0x50]  # jump out of loop
                                             # if  gz->z.input->len is >= size
</pre>
</p>
<p>
Next comes the call to <code>gzfile_read_raw</code> and the <code>NIL_P(str)</code> check. If <code>str</code> is nil, the code just falls through and exits the loop:</p>
<pre class="prettyprint">
  mov    %rbx,%rdi            # rdi = gz, rdi holds the first argument to a function.
  callq  1090 [gzfile_read_raw]  # call gzfile_read_raw
  cmp    $0x4,%rax   # compare return value (%rax) to nil
  jne    1170 [gzfile_read_raw_ensure+0x10] # if it is NOT nil jump to the good stuff
</pre>
</p>
<p>The return value of <code>gzfile_read_raw</code> (an address of a ruby object) is stored in <code>rax</code>.</p>
<p>
And finally, the good stuff. The call to <code>zstream_append_input</code>:</p>
<pre class="prettyprint">
  mov    0x10(%rax),%rdx # RSTRING_LEN(v) as 3rd arg
  mov    0x18(%rax),%rsi # RSTRING_PTR(v) as 2nd arg
  mov    %rbx,%rdi       # set gz->z as the 1st arg
  callq  10e0 [zstream_append_input]  # let it rip
</pre>
</p>
<p>Note that the arguments to <code>zstream_append_input</code> are moved into registers by offsetting from <code>rax</code> and that when the call to <code>zstream_append_input</code> occurs, the <b>ruby object returned from <code>gzfile_read_raw</code> is still stored in rax</b> and not written to its slot on the stack because the extra write is unnecessary.</p>
<h2>stare into the abyss part 2</h2>
<p>
Alright, so the patch changes the <code>zstream_append_input2</code> macro to this:</p>
<pre class="prettyprint">
#define zstream_append_input2(z,v)\
    RB_GC_GUARD(v),\
    zstream_append_input((z), (Bytef*)RSTRING_PTR(v), RSTRING_LEN(v))
</pre>
</p>
<p>
And, <code>RB_GC_GUARD</code> is <code>define</code>d as:</p>
<pre class="prettyprint">
#define RB_GC_GUARD_PTR(ptr) \
    __extension__ ({volatile VALUE *rb_gc_guarded_ptr = (ptr); rb_gc_guarded_ptr;})

#define RB_GC_GUARD(v) (*RB_GC_GUARD_PTR(&#038;(v)))
</pre>
</p>
<p>
That code is just a hack to mark the memory location holding <code>v</code> with the <code>volatile</code> type qualifier. This tells the compiler that memory backing <code>v</code> acts in ways that the compiler is too stupid to understand, so the compiler must ensure that reads and writes to this location are not optimized out.</p>
<p>A common usage of this qualifier is for memory mapped registers. Reads from memory mapped registers should not be optimized away since a hardware device may update the value stored at that location. The compiler wouldn't know when these updates could happen so it must make sure to re-read the value from this memory location when it is needed. Similarly, writes to memory mapped registers may modify the state of a hardware device and should not be optimized away.</p>
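<p>A minimal sketch of the memory mapped register case follows. The "register" here is an ordinary variable named <code>fake_status_register</code> so that the code can run anywhere; on real hardware it would be a fixed address taken from the device documentation. The names and the ready bit are assumptions made up for this example.</p>
<pre class="prettyprint">
```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a device status register (hypothetical). */
static uint32_t fake_status_register = 0;

#define STATUS_READY 0x1u

/* Reads the register through a pointer-to-volatile, so the compiler must
 * emit one load per loop iteration instead of reading once and reusing
 * the value. Returns the number of reads performed. */
unsigned poll_until_ready(volatile uint32_t *reg, unsigned max_polls)
{
    unsigned polls = 0;

    while (polls < max_polls) {
        polls++;
        if (*reg & STATUS_READY)
            break;
    }
    return polls;
}
```
</pre>
<p>Without the <code>volatile</code> qualifier on <code>reg</code>, an optimizing compiler would be allowed to load the value once before the loop, since nothing in the loop body writes to it.</p>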
<p>Most of the code generated with the patch applied is the same as without except for a few slight differences before <code>zstream_append_input</code> is called. Let's take a look:</p>
<pre class="prettyprint">
  mov    %rax,-0x18(%rbp)    # write str to the stack
  mov    -0x18(%rbp),%rax    # read the value in str back to rax
  mov    0x10(%rcx),%rdx      # RSTRING_LEN(v)
  mov    0x18(%rcx),%rsi       # RSTRING_PTR(v)
  mov    %rbx,%rdi                # z
  callq  1f60 [_zstream_append_input]
</pre>
<p>
<p><b>The key difference</b> is that the return value of <code>gzfile_read_raw</code> is <i>written back to its memory location</i> (which, in this case, happens to be on the stack and is called <code>str</code>).</p>
<h2>the bug</h2>
<p>The bug is triggered because:</p>
<ol>
<li>The address of the ruby object str is stored in a caller saved register, <code>rax</code>.</li>
<li>The callee (<code>zstream_append_input</code>) does not save the value of <code>rax</code> (it is not required to) and <code>rax</code> is overwritten in the function, leaving <b>no references</b> to the ruby object returned by <code>gzfile_read_raw</code>.</li>
<li>The callee (<code>zstream_append_input</code>) eventually calls <code>rb_newobj</code>. <code>rb_newobj</code> <i>may</i> trigger a GC run, if there are no available objects on the freelist.</li>
<li>The GC run finds the object returned by <code>gzfile_read_raw</code> but <i>sees no references to it</i> and frees the memory associated with it.</li>
<li>The freed object is used as if it were valid, and memory corruption occurs causing the VM to explode.</li>
</ol>
<p>The patch prevents this bug from happening because:</p>
<ol>
<li>The address of the ruby object str is stored in a caller saved register, <code>rax</code>.</li>
<li>The <code>volatile</code> type qualifier causes the compiler to generate code which writes the return value back into its memory location on the stack.</li>
<li>The callee (<code>zstream_append_input</code>) eventually calls <code>rb_newobj</code>. <code>rb_newobj</code> <i>may</i> trigger a GC run, if there are no available objects on the freelist.</li>
<li>The GC run finds the object returned by <code>gzfile_read_raw</code> and <i>finds a reference to it</i> and therefore does not free it.</li>
<li>Everyone is happy.</li>
</ol>
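<p>The two lists above can be acted out with a toy model. This is a sketch under heavy assumptions, not Ruby's GC: the heap is four slots, the "stack" is an array passed in by the caller, and every name (<code>toy_alloc</code>, <code>toy_gc</code>, <code>struct toy_object</code>) is made up for illustration. It only shows that a collector which finds references by scanning memory cannot see a pointer that never left a register.</p>
<pre class="prettyprint">
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_SLOTS 4

/* A toy heap object: just enough state to show mark and sweep. */
struct toy_object {
    int in_use;
    int marked;
};

static struct toy_object heap[HEAP_SLOTS];

/* Hand out the first free slot, or NULL if the toy heap is full. */
struct toy_object *toy_alloc(void)
{
    for (size_t s = 0; s < HEAP_SLOTS; s++) {
        if (!heap[s].in_use) {
            heap[s].in_use = 1;
            heap[s].marked = 0;
            return &heap[s];
        }
    }
    return NULL;
}

/* Conservative mark and sweep over a caller-supplied "stack": every word
 * that equals the address of a heap slot counts as a reference. Anything
 * that lives only in a register is invisible to this scan.
 * Returns the number of objects freed. */
size_t toy_gc(const uintptr_t *stack, size_t words)
{
    size_t freed = 0;

    /* Mark phase: look for heap addresses in the scanned words. */
    for (size_t i = 0; i < words; i++)
        for (size_t s = 0; s < HEAP_SLOTS; s++)
            if (stack[i] == (uintptr_t)&heap[s])
                heap[s].marked = 1;

    /* Sweep phase: free whatever was not marked, then reset the marks. */
    for (size_t s = 0; s < HEAP_SLOTS; s++) {
        if (heap[s].in_use && !heap[s].marked) {
            heap[s].in_use = 0;
            freed++;
        }
        heap[s].marked = 0;
    }
    return freed;
}
```
</pre>
<p>Calling <code>toy_alloc</code> and then <code>toy_gc</code> with a stack array of zeros frees the object, which is the unpatched case. Storing the returned pointer into any word of that array before calling <code>toy_gc</code> keeps the object alive, which is what the write forced by <code>volatile</code> achieves in the patch.</p>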
<h2>The general case</h2>
<p>Given valid C code, <code>gcc</code> will generate machine instructions that correctly do what you want. Of course, there are bugs in <code>gcc</code> just like any other piece of software. The problem in this case is not <code>gcc</code>. The problem is that the object and garbage collection implementations in REE/MRI/YARV are not valid C code, so it is not possible for <code>gcc</code> to generate machine instructions that do the right thing. In other words, Ruby's object and GC implementations are breaking their contract with <code>gcc</code>.</p>
<p>The end result is the need for shit like <code>RB_GC_GUARD</code> in REE/MRI/YARV and <b>also in Ruby gems</b> to selectively paper over valid <code>gcc</code> optimizations. Having an API that might cause the Ruby VM to fucking explode unless you proactively mark things with <code>RB_GC_GUARD</code> is not on the path of least resistance toward building a maintainable, safe, and performant system. Very few people out there know that the <code>volatile</code> type qualifier exists, let alone what it does. Essentially, this means that authors of Ruby gems must understand how GC works in the VM to prevent their gems from causing GC to break the universe.</p>
<p>That is fucking beyond stupid.</p>
<h2>How to detect this bug class</h2>
<p>This could be detected by building a simple static analysis tool. You won't catch 100% of cases, and you will definitely have false positives, but it is better than nothing. Something like this should work:</p>
<ol>
<li>Build a call <a href="http://en.wikipedia.org/wiki/Directed_graph">digraph</a> of the VM and/or the set of gems you care about.</li>
<li>Find all <a href="http://en.wikipedia.org/wiki/Path_(graph_theory)">paths</a> leading to the <code>rb_newobj</code> sink.</li>
<li>Find all paths which call <code>rb_newobj</code>, but do not save <code>rax</code> prior to making another function call which is also on a path to <code>rb_newobj</code>.</li>
<li>The functions found are very likely to be causing corruption. A human will need to examine the found cases to weed out false positives and to fix the code.</li>
</ol>
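<p>Steps 1 and 2 can be sketched in a few lines of C. This is a toy, not a real analysis tool: the graph is a small adjacency matrix filled in by hand, the helper names (<code>add_call</code>, <code>find_func</code>, <code>mark_paths_to_sink</code>) are made up for this example, and step 3 (checking whether <code>rax</code> is saved) would need real disassembly that is not attempted here.</p>
<pre class="prettyprint">
```c
#include <assert.h>
#include <string.h>

#define MAX_FUNCS 16

struct call_graph {
    int count;
    const char *name[MAX_FUNCS];
    unsigned char calls[MAX_FUNCS][MAX_FUNCS]; /* calls[a][b]: a calls b */
};

/* Return the index of a function by name, or -1 if it is unknown. */
int find_func(const struct call_graph *g, const char *name)
{
    for (int i = 0; i < g->count; i++)
        if (strcmp(g->name[i], name) == 0)
            return i;
    return -1;
}

/* Look up a function, adding it if needed. Returns -1 when the graph is full. */
static int find_or_add(struct call_graph *g, const char *name)
{
    int i = find_func(g, name);

    if (i >= 0)
        return i;
    if (g->count == MAX_FUNCS)
        return -1;
    g->name[g->count] = name;
    return g->count++;
}

/* Step 1: record the edge caller -> callee. Returns 0 on success. */
int add_call(struct call_graph *g, const char *caller, const char *callee)
{
    int a = find_or_add(g, caller);
    int b = find_or_add(g, callee);

    if (a < 0 || b < 0)
        return -1;
    g->calls[a][b] = 1;
    return 0;
}

/* Step 2: set reaches[f] = 1 for every function with a path to the sink.
 * Iterates to a fixed point, which is plenty for a graph this small. */
void mark_paths_to_sink(const struct call_graph *g, const char *sink,
                        unsigned char reaches[MAX_FUNCS])
{
    int changed = 1;
    int s = find_func(g, sink);

    memset(reaches, 0, MAX_FUNCS);
    if (s < 0)
        return;
    reaches[s] = 1;

    while (changed) {
        changed = 0;
        for (int a = 0; a < g->count; a++) {
            if (reaches[a])
                continue;
            for (int b = 0; b < g->count; b++) {
                if (g->calls[a][b] && reaches[b]) {
                    reaches[a] = 1;
                    changed = 1;
                    break;
                }
            }
        }
    }
}
```
</pre>
<p>Feeding it the edges from this post (<code>gzfile_read_raw_ensure</code> calls <code>zstream_append_input</code>, which is treated here as calling <code>rb_newobj</code> directly) and asking for paths to <code>rb_newobj</code> flags both functions. A real tool would build the edges from the compiled binary or the source rather than by hand.</p>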
<p>If you have found yourself wondering <i>who the fuck would write such a test?</i> it is important for you to note that <code>rtld</code> in Linux does not save the SSE registers (which are supposed to be caller saved) prior to entering the fixup function. <b>However</b>, to ensure that such an optimization does not cause the fucking universe to come crashing down, a test ships with the code to run <code>objdump</code> after building the binary. The <code>objdump</code> output is then grepped for any instructions which might modify the SSE registers. As long as no one touches the SSE registers, there is no need to save and restore them.</p>
<p>If Ruby's object and GC subsystems want to prevent the universe from exploding, they <b>must</b> supply an equivalent test to ensure that corruption is impossible.</p>
</p>
<h2>Conclusion</h2>
<ul>
<li>MRI/YARV/REE are inherently fatally flawed.</li>
<li>I'm never writing another Ruby-related blog post.</li>
<li>I'm not a Ruby programmer.</li>
</ul>
<h2>No comments</h2>
<p>I'm taking a page from the book of <a href="http://twitter.com/coda">coda</a> and disabling comments. If you got something to say, write a blog post.</p>
<p>
If you enjoyed this article, <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>References</h2>
<ol class="footnotes"><li id="footnote_0_2087" class="footnote"> <a href="http://www.x86-64.org/documentation/abi-0.99.pdf">System V Application Binary Interface: AMD64 Architecture Processor Supplement</a> </li></ol>]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/the-broken-promises-of-mrireeyarv/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Slides from Defcon 18: Function hooking for OSX and Linux</title>
		<link>http://timetobleed.com/slides-from-defcon-18-function-hooking-for-osx-and-linux/</link>
		<comments>http://timetobleed.com/slides-from-defcon-18-function-hooking-for-osx-and-linux/#comments</comments>
		<pubDate>Sun, 01 Aug 2010 18:24:35 +0000</pubDate>
		<dc:creator>Aman Gupta</dc:creator>
				<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[osx]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[profiling]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1928</guid>
		<description><![CDATA[Video from Def Con 18 Defcon 18: Function hooking for OSX and Linux from Daniel Hückmann on Vimeo. Slides Function hooking for OSX and Linux]]></description>
			<content:encoded><![CDATA[<h2>Video from Def Con 18</h2>
<p><iframe src="http://player.vimeo.com/video/14951625" width="400" height="200" frameborder="0"></iframe>
<p><a href="http://vimeo.com/14951625">Defcon 18: Function hooking for OSX and Linux</a> from <a href="http://vimeo.com/user4726540">Daniel Hückmann</a> on <a href="http://vimeo.com">Vimeo</a>.</p>
<h2>Slides</h2>
<p>
<a title="View Function hooking for OSX and Linux on Scribd" href="http://www.scribd.com/doc/35191054/Function-hooking-for-OSX-and-Linux" style="margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; -x-system-font: none; display: block; text-decoration: underline;">Function hooking for OSX and Linux</a> <object id="doc_42930970869868" name="doc_42930970869868" height="500" width="100%" type="application/x-shockwave-flash" data="http://d1.scribdassets.com/ScribdViewer.swf" style="outline:none;" rel="media:presentation" resource="http://d1.scribdassets.com/ScribdViewer.swf?document_id=35191054&#038;access_key=key-1ffxxqbbglaccfa347qr&#038;page=1&#038;viewMode=slideshow" xmlns:media="http://search.yahoo.com/searchmonkey/media/" xmlns:dc="http://purl.org/dc/terms/" ><param name="movie" value="http://d1.scribdassets.com/ScribdViewer.swf"><param name="wmode" value="opaque"><param name="bgcolor" value="#ffffff"><param name="allowFullScreen" value="true"><param name="allowScriptAccess" value="always"><param name="FlashVars" value="document_id=35191054&#038;access_key=key-1ffxxqbbglaccfa347qr&#038;page=1&#038;viewMode=slideshow"><embed id="doc_42930970869868" name="doc_42930970869868" src="http://d1.scribdassets.com/ScribdViewer.swf?document_id=35191054&#038;access_key=key-1ffxxqbbglaccfa347qr&#038;page=1&#038;viewMode=slideshow" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="500" width="100%" wmode="opaque" bgcolor="#ffffff"></embed></object> </p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/slides-from-defcon-18-function-hooking-for-osx-and-linux/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>GCC optimization flag makes your 64bit binary fatter and slower</title>
		<link>http://timetobleed.com/gcc-optimization-flag-makes-your-64bit-binary-fatter-and-slower/</link>
		<comments>http://timetobleed.com/gcc-optimization-flag-makes-your-64bit-binary-fatter-and-slower/#comments</comments>
		<pubDate>Tue, 20 Jul 2010 12:59:53 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[x86]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1909</guid>
		<description><![CDATA[If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter. The intention of this post is to highlight a subtle GCC optimization bug that leads to slower and larger code being generated than would have been generated without the optimization flag. UPDATED: Graphs are now 0 based on the y [...]]]></description>
			<content:encoded><![CDATA[<p><center><img src="http://timetobleed.com/images/large_bug.jpg" alt="" width="300" height="400" /></center><br />
If you enjoy this article, <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<p>The intention of this post is to highlight a subtle GCC optimization bug that leads to slower and larger code being generated than would have been generated without the optimization flag.</p>
<h2>UPDATED: Graphs are now 0 based on the y axis. Links in the tidbits section (below conclusion) for my ugly test harness and terminal session of the build of the test case in the bug report, objdump, and corresponding system information.</h2>
<h2>Hold the #gccfail tweets, son.</h2>
<p>Everyone fucks up. The point of this post is <em>not</em> to rag on GCC. If writing a C compiler was easy then every asshole with a keyboard would write one for fun.</p>
<h2>WARNING: THERE IS MATH, SCIENCE, AND GRAPHS BELOW.</h2>
<p>Watch yourself.</p>
<h2>The original bug report for <code>-fomit-frame-pointer</code>.</h2>
<p>I stumbled across a <a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44958">bug report for GCC</a> that was very interesting. It points out a very subtle bug that occurs when the <code>-fomit-frame-pointer</code> flag is passed to GCC. The bug report is for 32bit code, however after some testing I found that this bug <strong>also rears its head in 64bit code</strong>.</p>
<h2>What is <code>-fomit-frame-pointer</code> supposed to do?</h2>
<p>The <code>-fomit-frame-pointer</code> flag is intended to direct GCC to avoid saving and restoring the frame pointer (<code>%ebp</code> or <code>%rbp</code>). This is supposed to make function calls faster, since the function is doing less work each invocation. It should also make function code take fewer bytes since there are fewer instructions being executed.</p>
<p>A caveat of using <code>-fomit-frame-pointer</code> is that it <em>may</em> make <strong>debugging impossible</strong> on certain systems. To combat this on Linux, <code>.debug_frame</code> and <code>.eh_frame</code> sections are added to ELF binaries to assist in the stack unwinding process when the frame pointer is omitted.</p>
<h2>What is the bug?</h2>
<p>The bug is that when <code>-fomit-frame-pointer</code> is used, GCC erroneously uses the frame pointer register as a general purpose register <em>when a different register could be used instead</em>.</p>
<p><strong>wat.</strong></p>
<p>The amd64 and i386 ABIs<sup>1</sup> <sup>2</sup> specify a list of caller and callee saved registers.</p>
<ul>
<li>The frame pointer register is callee saved. That means that if a function is going to use the frame pointer register, it must save and restore the value in the register.</li>
<li>The test case provided in the bug report shows that other <em>caller</em> saved registers were available for use.</li>
<li>Had the function used a caller saved register instead, there would be <em>no need</em> for the additional save and restore instructions in the function.</li>
<li>Removing those instructions would take fewer bytes and execute faster.</li>
</ul>
<h2>What are the consequences?</h2>
<p>Let&#8217;s take a look at two potential pieces of code.</p>
<p>The first piece is the code that would be generated if <code>-fomit-frame-pointer</code> <strong>is not used</strong>:</p>
<pre class="prettyprint">test1:
        pushq %rbp       ; save frame pointer
        movq %rsp,%rbp   ; update frame pointer to the current stack pointer
           ; here is where your function would do work
        leave            ; restore the stack pointer and frame pointer
        ret              ; return</pre>
<p><strong>Size: 6 bytes</strong>.</p>
<p>The above assembly sequence uses the frame pointer.</p>
<p>Let&#8217;s take a look at the code that is generated by GCC when <code>-fomit-frame-pointer</code> is used:</p>
<pre class="prettyprint">        sub $0x8, %rsp    ; make room on the stack
        movq %rbp, (%rsp) ; store rbp on the stack
          ; here is where your function would modify and use %rbp as needed
        movq (%rsp), %rbp ; restore %rbp
        add $0x8, %rsp    ; get rid of the extra stack space
        ret               ; return</pre>
<p><strong>Size: 17 bytes</strong>.</p>
<p>The above assembly sequence is what is generated when GCC decides to use the frame pointer register as a general purpose register. Since it is callee saved, it must be saved before being modified and restored after being modified.</p>
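<p>The two sizes can be double checked by writing out the instruction bytes. The encodings below are what a typical amd64 assembler would be expected to emit for these exact instructions; treat them as an assumption and confirm with <code>objdump -d</code> on your own build. The array and function names are made up for this sketch.</p>
<pre class="prettyprint">
```c
#include <assert.h>
#include <stddef.h>

/* test1: classic frame pointer prologue and epilogue. */
static const unsigned char with_frame_pointer[] = {
    0x55,                   /* pushq %rbp        */
    0x48, 0x89, 0xe5,       /* movq  %rsp,%rbp   */
    0xc9,                   /* leave             */
    0xc3                    /* ret               */
};

/* test2: %rbp used as a general purpose register, saved by hand. */
static const unsigned char rbp_as_general_purpose[] = {
    0x48, 0x83, 0xec, 0x08, /* sub  $0x8,%rsp    */
    0x48, 0x89, 0x2c, 0x24, /* movq %rbp,(%rsp)  */
    0x48, 0x8b, 0x2c, 0x24, /* movq (%rsp),%rbp  */
    0x48, 0x83, 0xc4, 0x08, /* add  $0x8,%rsp    */
    0xc3                    /* ret               */
};

size_t size_with_frame_pointer(void)
{
    return sizeof with_frame_pointer;
}

size_t size_rbp_as_general_purpose(void)
{
    return sizeof rbp_as_general_purpose;
}
```
</pre>
<p>The first sequence comes to 6 bytes and the second to 17, matching the sizes quoted above.</p>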
<h2>So <code>-fomit-frame-pointer</code> makes your binary fatter, but does it make it slower?</h2>
<p>Only one way to find out: <strong>do science.</strong></p>
<p>I built a simple (and very ugly) testing harness to test the above pieces of code to determine which piece of code is faster. Before we get into the benchmark results, I want to tell you why my benchmark is <em>bullshit</em>.</p>
<p>Yes, <em>bullshit</em>.</p>
<p>You see, it makes me sad when people post benchmarks and neglect to tell others why their benchmark may be inaccurate. So, lemme start the trend.</p>
<p>This benchmark is useless because:</p>
<ul>
<li>Reading the CPU cycle counter is unreliable (more on this below the conclusion). I also tracked wall clock time.</li>
<li>I don&#8217;t have the ideal test environment. I ran this on bare metal hardware, and set the CPU affinity to keep the process pinned to a single CPU&#8230; <strong>BUT</strong></li>
<li><strong>I could have done better</strong> if I had pinned <code>init</code> to CPU0 (thereby forcing all children of init to be pinned to CPU0 &#8211; <strong>remember child processes inherit the affinity mask</strong>). I would have then had an entire CPU for nothing but my benchmark.</li>
<li><strong>I could have done better</strong> if I forced the CPU running my benchmark program to not handle any IRQs.</li>
<li><b>I only tested one version of GCC</b>: (Debian 4.3.2-1.1) 4.3.2</li>
<li><strong>I could have</strong> taken more samples.</li>
</ul>
<p>You can find more testing harness tidbits below the conclusion.</p>
<h2>Benchmark Results</h2>
<p>
<b>test 1</b> &#8212; Code sequence simulating using the frame pointer.<br />
<b>test 2</b> &#8212; Code sequence simulating using the frame pointer as a general purpose register.
</p>
<h2>64bit results</h2>
<p><b><u>Using <code>-fomit-frame-pointer</code> is SLOWER (contrary to what you&#8217;d expect) than not using it!</u></b></p>
<table border="1" bordercolor="#000000" style="background-color:#ffffff" width="600" cellpadding="1" cellspacing="0">
<tr>
<td></td>
<td>cycles test 1</td>
<td>cycles test 2</td>
<td>microsecs test 1</td>
<td>microsecs test 2</td>
</tr>
<tr>
<td>mean</td>
<td>3514422987.92</td>
<td>4559685515.66</td>
<td>1882707.27</td>
<td>2442663.94</td>
</tr>
<tr>
<td>median</td>
<td>3507007423.5</td>
<td>4562511684.5</td>
<td>1878721.5</td>
<td>2444171.5</td>
</tr>
<tr>
<td>max</td>
<td>3922780211</td>
<td>4672066854</td>
<td>2101457</td>
<td>2502869</td>
</tr>
<tr>
<td>min</td>
<td>3502194976</td>
<td>4327782795</td>
<td>1876113</td>
<td>2318452</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>std dev</td>
<td>31927179.5632</td>
<td>15449507.8196</td>
<td>17103.7755</td>
<td>8275.49788</td>
</tr>
<tr>
<td>variance</td>
<td>1.02E+15</td>
<td>238687291867021</td>
<td>292539135.936</td>
<td>68483865.11835</td>
</tr>
</table>
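<p>For anyone who wants to recompute rows like these from raw samples, here is a minimal sketch. It assumes population variance (divide by n); the post does not say which formula was used, and the function names are made up for this example. The standard deviation is the square root of the variance, which can be taken with <code>sqrt</code> from <code>math.h</code>.</p>
<pre class="prettyprint">
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparison callback for doubles. */
static int compare_doubles(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;

    return (x > y) - (x < y);
}

/* Arithmetic mean of n samples (n must be at least 1). */
double mean_of(const double *v, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Population variance: average squared distance from the mean. */
double variance_of(const double *v, size_t n)
{
    double m = mean_of(v, n);
    double sum_sq = 0.0;

    for (size_t i = 0; i < n; i++)
        sum_sq += (v[i] - m) * (v[i] - m);
    return sum_sq / (double)n;
}

/* Median of n samples. Sorts v in place as a side effect. */
double median_of(double *v, size_t n)
{
    assert(n > 0);
    qsort(v, n, sizeof v[0], compare_doubles);
    if (n % 2 == 1)
        return v[n / 2];
    return (v[n / 2 - 1] + v[n / 2]) / 2.0;
}
```
</pre>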
<p></p>
<p>
<img src="http://timetobleed.com/images/64bit_cycles.png" alt="" />
</p>
<p>
<br />
<img src="http://timetobleed.com/images/64bit_microsecs.png" alt="" />
</p>
<p></p>
<h2>32bit results</h2>
<p><b><u>Using <code>-fomit-frame-pointer</code> is FASTER (as it should be) than not using it! The binary is still fatter, though.</u></b></p>
<table border="1" bordercolor="#000000" style="background-color:#ffffff" width="600" cellpadding="1" cellspacing="0">
<tr>
<td></td>
<td>cycles test 1</td>
<td>cycles test 2</td>
<td>microsecs test 1</td>
<td>microsecs test 2</td>
</tr>
<tr>
<td>mean</td>
<td>3502932799.49</td>
<td>3491263364.89</td>
<td>1876553.08</td>
<td>1870301.35</td>
</tr>
<tr>
<td>median</td>
<td>3501486586.5</td>
<td>3492013955.5</td>
<td>1875778</td>
<td>1870702.5</td>
</tr>
<tr>
<td>max</td>
<td>3905163528</td>
<td>3731985243</td>
<td>2092032</td>
<td>1999259</td>
</tr>
<tr>
<td>min</td>
<td>3500916510</td>
<td>3408834436</td>
<td>1875472</td>
<td>1826144</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>std dev</td>
<td>10066939.1113</td>
<td>7992367.6913</td>
<td>5393.0412</td>
<td>4281.5466</td>
</tr>
<tr>
<td>variance</td>
<td>101343263071403</td>
<td>63877941312996.4</td>
<td>29084893.2588</td>
<td>18331640.9459</td>
</tr>
</table>
<p></p>
<p>
<img src="http://timetobleed.com/images/32bit_cycles.png" alt="" />
</p>
<p>
<br />
<img src="http://timetobleed.com/images/32bit_microsecs.png" alt="" />
</p>
<h2>Conclusion</h2>
<ul>
<li>GCC is a really complex piece of software; this bug is very subtle and may have existed for a while.</li>
<li>I&#8217;ve said this a few times, but knowing and understanding your system&#8217;s ABI is crucial for catching bugs like these.</li>
<li>Math and science are cool now, much like computers. You should use both.</li>
</ul>
<p>
Thanks for reading and don&#8217;t forget to <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>Testing harness tidbits</h2>
<p>Each <strong>run</strong> of the benchmark executes either <code>test1</code> or <code>test2</code> (from above) 500,000,000 times. I did around 2500 runs for each test function.<br />
</p>
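<p>The shape of one run can be sketched as follows. This is not the harness from the gist below; it is a stripped down illustration that uses the C11 <code>timespec_get</code> call for wall clock time, and the names <code>empty_function</code>, <code>now_microsecs</code>, and <code>time_one_run</code> are made up here.</p>
<pre class="prettyprint">
```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for test1 or test2. */
static void empty_function(void)
{
}

/* Wall clock time in microseconds, via C11 timespec_get. */
static long long now_microsecs(void)
{
    struct timespec ts;
    int ok = timespec_get(&ts, TIME_UTC);

    assert(ok == TIME_UTC);
    (void)ok; /* keeps builds with NDEBUG free of unused warnings */
    return (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
}

/* One run: call fn the requested number of times and return the elapsed
 * wall clock time in microseconds. */
long long time_one_run(void (*fn)(void), unsigned long iterations)
{
    long long start = now_microsecs();

    for (unsigned long i = 0; i < iterations; i++)
        fn();
    return now_microsecs() - start;
}
```
</pre>
<p>A run in the style described above would be <code>time_one_run(empty_function, 500000000UL)</code>, repeated a few thousand times to collect samples. An optimizing compiler may inline or remove the call entirely, so a real harness needs the test functions in a separate object file or written in assembly.</p>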
<p>
You can get the testing harness, a build script, and a test script here: <a href="http://gist.github.com/483524">http://gist.github.com/483524</a>
</p>
<p>You can look at the terminal session where I build the test from the original bug report on my system: <a href="http://gist.github.com/483494">http://gist.github.com/483494</a>
</p>
<p>
The code I used to read the CPU cycle counter looks like this:</p>
<pre class="prettyprint">static __inline__ unsigned long long rdtsc(void)
{
  unsigned long hi = 0, lo = 0;
  __asm__ __volatile__ ("lfence\n\trdtsc" : "=a"(lo), "=d"(hi));
  return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}</pre>
</p>
<p>
The <code>lfence</code> instruction is a serializing instruction that ensures that all load instructions which were issued before the <code>lfence</code> instruction have been executed before proceeding. I did this to make sure that the cycle counter was being read after all operations in the test functions were executed.<br />
<br />
The values returned by this function are misleading because CPU frequency may be scaled at any time. This is why I also measured wall clock time.<br />
</p>
<h2>References</h2>
<ol class="footnotes"><li id="footnote_0_1909" class="footnote"><a href="http://www.sco.com/developers/devspecs/abi386-4.pdf">http://www.sco.com/developers/devspecs/abi386-4.pdf</a></li><li id="footnote_1_1909" class="footnote"><a href="http://www.x86-64.org/documentation/abi.pdf">http://www.x86-64.org/documentation/abi.pdf</a></li></ol>]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/gcc-optimization-flag-makes-your-64bit-binary-fatter-and-slower/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Garbage Collection and the Ruby Heap (from railsconf)</title>
		<link>http://timetobleed.com/garbage-collection-and-the-ruby-heap-from-railsconf/</link>
		<comments>http://timetobleed.com/garbage-collection-and-the-ruby-heap-from-railsconf/#comments</comments>
		<pubDate>Tue, 08 Jun 2010 16:38:20 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[debugging]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[GC]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[ltrace]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[profiling]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1787</guid>
		<description><![CDATA[Download as PDF (15mb) Garbage Collection and the Ruby Heap]]></description>
			<content:encoded><![CDATA[<p><a style="float:right" href="http://dl.dropbox.com/u/1681973/gc-railsconf.pdf">Download as PDF (15mb)</a><br />
<a title="View Garbage Collection and the Ruby Heap on Scribd" href="http://www.scribd.com/doc/32718051/Garbage-Collection-and-the-Ruby-Heap" style="margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; -x-system-font: none; display: block; text-decoration: underline;">Garbage Collection and the Ruby Heap</a> <object id="doc_179903367382288" name="doc_179903367382288" height="600" width="100%" type="application/x-shockwave-flash" data="http://d1.scribdassets.com/ScribdViewer.swf" style="outline:none;" ><param name="movie" value="http://d1.scribdassets.com/ScribdViewer.swf"><param name="wmode" value="opaque"><param name="bgcolor" value="#ffffff"><param name="allowFullScreen" value="true"><param name="allowScriptAccess" value="always"><param name="FlashVars" value="document_id=32718051&#038;access_key=key-1hl4d18vocqmc9ilk9a&#038;page=1&#038;viewMode=slideshow"><embed id="doc_179903367382288" name="doc_179903367382288" src="http://d1.scribdassets.com/ScribdViewer.swf?document_id=32718051&#038;access_key=key-1hl4d18vocqmc9ilk9a&#038;page=1&#038;viewMode=slideshow" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="600" width="100%" wmode="opaque" bgcolor="#ffffff"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/garbage-collection-and-the-ruby-heap-from-railsconf/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Dynamic Linking: ELF vs. Mach-O</title>
		<link>http://timetobleed.com/dynamic-linking-elf-vs-mach-o/</link>
		<comments>http://timetobleed.com/dynamic-linking-elf-vs-mach-o/#comments</comments>
		<pubDate>Wed, 12 May 2010 14:00:09 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[osx]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[dynamic linking]]></category>
		<category><![CDATA[elf]]></category>
		<category><![CDATA[mach-o]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1613</guid>
		<description><![CDATA[If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter. The intention of this post is to highlight some of the similarities and differences between ELF and Mach-O dynamic linking that I encountered while building memprof. I hope to write more posts about similarities and differences in other aspects of [...]]]></description>
			<content:encoded><![CDATA[<p><center><img src="http://timetobleed.com/images/linking.jpg" alt="" width="400" height="300" /></center><br />
If you enjoy this article, <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<p>The intention of this post is to highlight <b>some</b> of the similarities and differences between <code>ELF</code> and <code>Mach-O</code> dynamic linking that I encountered while building <a href="http://github.com/ice799/memprof">memprof</a>.</p>
<p> I hope to write <b>more posts about similarities and differences in other aspects of Mach-O and ELF</b> that I stumbled across to shed some light on what goes on down there and provide (in some cases) the only documentation.</p>
<h2>Procedure Linkage Table</h2>
<p>The procedure linkage table (PLT) is used to determine the absolute address of a function at runtime. Both Mach-O and ELF objects have PLTs that are generated at link time. Initially, each table entry simply invokes the dynamic linker, which finds the symbol you want. The way this works is very similar at a high level in ELF and Mach-O, but there are some implementation differences that I thought were worth mentioning.</p>
<h2>Mach-O PLT arrangement</h2>
<p>Mach-O objects have several different sections across different <i>segments</i> that together make up the PLT entry for a specific symbol.</p>
<p>Consider the following assembly stub which calls out to the PLT entry for <code>malloc</code>:</p>
<pre class="prettyprint">
# MACH-O calling a PLT entry (ELF is nearly identical)
0x000000010008c504 [str_new+52]:	callq  0x10009ebbc [dyld_stub_malloc]
</pre>
<p>
<p>The <code>dyld_stub</code> prefix is added by GDB to let the user know that the <code>callq</code> instruction is calling a PLT entry and not <code>malloc</code> itself. The address <code>0x10009ebbc</code> is the first instruction of <code>malloc</code>&#8217;s PLT entry in this Mach-O object. In Mach-O terminology, the instruction at <code>0x10009ebbc</code> is called a <b>symbol stub</b>. Symbol stubs in Mach-O objects are found in the <code>__TEXT</code> segment in the <code>__symbol_stub1</code> section.</p>
<p>Let&#8217;s examine some instructions at the symbol stub address above:</p>
<pre class="prettyprint">
# MACH-O "symbol stubs" for malloc and other functions
0x10009ebbc [dyld_stub_malloc]:	  jmpq   *0x3ae46(%rip)        # 0x1000d9a08
0x10009ebc2 [dyld_stub_realloc]:  jmpq   *0x3ae48(%rip)        # 0x1000d9a10
0x10009ebc8 [dyld_stub_seekdir$INODE64]:	jmpq   *0x3ae4c(%rip)  # 0x1000d9a20
. . . .
</pre>
<p></p>
<p>Each Mach-O <b>symbol stub</b> is just a single <code>jmpq</code> instruction. That <code>jmpq</code> instruction jumps <i>via</i> an entry in a table and either:</p>
<ul>
<li>Invokes the dynamic linker to find the symbol and transfer execution there, <b><u>OR</u></b></li>
<li>Transfers execution directly to the function.</li>
</ul>
<p>In the example above, GDB is telling us that the address of the table entry for <code>malloc</code> is <code>0x1000d9a08</code>. This table entry is stored in a section called <code>__la_symbol_ptr</code> within the <code>__DATA</code> segment.</p>
<p>Before <code>malloc</code> has been resolved, the address in that table entry points to a helper function which (eventually) invokes the dynamic linker to find <code>malloc</code> and fill in its address in the table entry.</p>
<p>Let&#8217;s take a look at what a few entries of the helper functions look like:</p>
<pre class="prettyprint">
# MACH-O stub helpers
0x1000a08d4 [stub helpers+6986]:	pushq  $0x3b73
0x1000a08d9 [stub helpers+6991]:	jmpq   0x10009ed8a [stub helpers]
0x1000a08de [stub helpers+6996]:	pushq  $0x3b88
0x1000a08e3 [stub helpers+7001]:	jmpq   0x10009ed8a [stub helpers]
0x1000a08e8 [stub helpers+7006]:	pushq  $0x3b9e
0x1000a08ed [stub helpers+7011]:	jmpq   0x10009ed8a [stub helpers]
. . . .
</pre>
</p>
<p>Each symbol that has a PLT entry has two instructions above: a <code>pushq</code> followed by a <code>jmpq</code>. This instruction sequence sets an ID for the desired function and then invokes the dynamic linker. The dynamic linker looks up this ID so it knows which function it should be looking for.</p>
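<p>To make that flow a bit more concrete, here is a minimal Ruby sketch of lazy binding. It is a toy model under stated assumptions, not how dyld is implemented: the hash <code>lazy_pointers</code> stands in for the table of lazy symbol pointers, the array index stands in for the ID pushed by the stub helper, and every name in it is made up for illustration.</p>
<pre class="prettyprint">
# Toy model of lazy binding; all names here are hypothetical stand-ins.
library       = { malloc: lambda { |size| "allocated #{size} bytes" } }
symbol_ids    = [:malloc]   # the array index plays the role of the pushed ID
lazy_pointers = {}          # plays the role of the lazy symbol pointer table
binder_calls  = []          # records each trip into the "dynamic linker"

# The "dynamic linker": look the symbol up by ID, patch the table entry,
# then transfer control to the real function.
bind_and_call = lambda do |id, *args|
  name = symbol_ids.fetch(id)
  binder_calls.push(name)
  lazy_pointers[name] = library.fetch(name)
  lazy_pointers[name].call(*args)
end

# Before resolution, each table entry points at a helper that supplies the ID.
symbol_ids.each_with_index do |name, id|
  lazy_pointers[name] = lambda { |*args| bind_and_call.call(id, *args) }
end

# The symbol stub: a single indirect jump through the table entry.
stub_malloc = lambda { |size| lazy_pointers[:malloc].call(size) }

first  = stub_malloc.call(16)   # goes through the helper and the binder
second = stub_malloc.call(32)   # jumps straight to the resolved function
puts first, second, "binder invoked #{binder_calls.size} time(s)"
</pre>
<p>Only the first call reaches the binder. After that, the table entry holds the resolved function, so the stub jumps to it directly.</p>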
<h2>ELF PLT arrangement</h2>
<p>ELF objects have the same mechanism, but keep each PLT entry together in one chunk instead of spreading it across different sections. Let&#8217;s take a look at a PLT entry for <code>malloc</code> in an ELF object:</p>
<pre class="prettyprint">
# ELF complete PLT entry for malloc
0x40f3d0 [malloc@plt]:	jmpq   *0x2c91fa(%rip)        # 0x6d85d0
0x40f3d6 [malloc@plt+6]:	pushq  $0x2f
0x40f3db [malloc@plt+11]:	jmpq   0x40f0d0
. . . .
</pre>
<p></p>
<p>Much like a Mach-O object, an ELF object uses a table entry to direct the flow of execution to either invoke the dynamic linker or transfer directly to the desired function if it has already been resolved.</p>
<p>Two differences to point out here: </p>
<ol>
<li>ELF puts the entire PLT entry together in a nicely named section called <code>.plt</code> instead of spreading it across multiple sections.</li>
<li>The table entries indirected through with the initial <code>jmpq</code> instruction are stored in a section named: <code>.got.plt</code>.</li>
</ol>
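<p>The table entry addresses that GDB prints in the comments of both listings can be checked by hand. A <code>%rip</code>-relative operand is relative to the address of the <i>next</i> instruction, and the indirect <code>jmpq</code> in these listings is 6 bytes long. The sketch below just redoes that arithmetic with the addresses from the <code>malloc</code> stubs above; the variable names are mine.</p>
<pre class="prettyprint">
# Length in bytes of the indirect jmpq used by the stubs in the listings.
jmp_length = 6

# %rip-relative operands are relative to the address of the NEXT instruction.
table_entry = lambda do |stub_address, displacement|
  stub_address + jmp_length + displacement
end

macho_malloc = table_entry.call(0x10009ebbc, 0x3ae46)   # dyld_stub_malloc
elf_malloc   = table_entry.call(0x40f3d0, 0x2c91fa)     # malloc@plt

puts format("Mach-O table entry: 0x%x", macho_malloc)
puts format("ELF table entry:    0x%x", elf_malloc)
</pre>
<pre class="prettyprint">
Mach-O table entry: 0x1000d9a08
ELF table entry:    0x6d85d0
</pre>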
<h2>Both invoke an assembly trampoline&#8230;</h2>
<p>Both Mach-O and ELF objects are set up to invoke the runtime dynamic linker. Both need an assembly trampoline to bridge the gap between the application and the linker. On 64-bit Intel-based systems, linkers in both systems must comply with the same Application Binary Interface (ABI).</p>
<p><b>Strangely enough</b>, the two linkers <b>have slightly different assembly trampolines even though they share the same calling convention<sup>1</sup>  <sup>2</sup>.</b></p>
<p>Both trampolines ensure that the program stack is 16-byte aligned to comply with the amd64 ABI&#8217;s calling convention. Both trampolines also take care to save the &#8220;general purpose&#8221; caller-saved registers prior to invoking the dynamic linker, but the trampoline in Linux <b>does not save or restore the SSE registers.</b> This &#8220;shouldn&#8217;t&#8221; matter, so long as glibc takes care not to use any of those registers in the dynamic linker. OSX takes a more conservative approach and saves and restores the SSE registers before and after calling out to the dynamic linker.</p>
<p>I&#8217;ve included a snippet from the two trampolines below and some comments so you can see the differences up close.</p>
<h2>Different trampolines for the same ABI</h2>
<p>The OSX trampoline:</p>
<pre class="prettyprint">
dyld_stub_binder:
  pushq   %rbp
  movq    %rsp,%rbp
  subq    $STACK_SIZE,%rsp  # at this point stack is 16-byte aligned because two meta-parameters where pushed
  movq    %rdi,RDI_SAVE(%rsp) # save registers that might be used as parameters
  movq    %rsi,RSI_SAVE(%rsp)
  movq    %rdx,RDX_SAVE(%rsp)
  movq    %rcx,RCX_SAVE(%rsp)
  movq    %r8,R8_SAVE(%rsp)
  movq    %r9,R9_SAVE(%rsp)
  movq    %rax,RAX_SAVE(%rsp)
  movdqa    %xmm0,XMMM0_SAVE(%rsp)
  movdqa    %xmm1,XMMM1_SAVE(%rsp)
  movdqa    %xmm2,XMMM2_SAVE(%rsp)
  movdqa    %xmm3,XMMM3_SAVE(%rsp)
  movdqa    %xmm4,XMMM4_SAVE(%rsp)
  movdqa    %xmm5,XMMM5_SAVE(%rsp)
  movdqa    %xmm6,XMMM6_SAVE(%rsp)
  movdqa    %xmm7,XMMM7_SAVE(%rsp)
  movq    MH_PARAM_BP(%rbp),%rdi  # call fastBindLazySymbol(loadercache, lazyinfo)
  movq    LP_PARAM_BP(%rbp),%rsi
  call    __Z21_dyld_fast_stub_entryPvl
</pre>
</p>
<p>The OSX trampoline saves all the caller-saved registers <b>as well as</b> the <code>%xmm0 - %xmm7</code> registers prior to invoking the dynamic linker with that last call instruction. These registers are all restored after the call instruction, but I left that out for the sake of brevity.</p>
<p>The Linux trampoline:</p>
<pre class="prettyprint">
  subq $56,%rsp
  cfi_adjust_cfa_offset(72) # Incorporate PLT
  movq %rax,(%rsp)  # Preserve registers otherwise clobbered.
  movq %rcx, 8(%rsp)
  movq %rdx, 16(%rsp)
  movq %rsi, 24(%rsp)
  movq %rdi, 32(%rsp)
  movq %r8, 40(%rsp)
  movq %r9, 48(%rsp)
  movq 64(%rsp), %rsi # Copy args pushed by PLT in register.
  movq %rsi, %r11   # Multiply by 24
  addq %r11, %rsi
  addq %r11, %rsi
  shlq $3, %rsi
  movq 56(%rsp), %rdi # %rdi: link_map, %rsi: reloc_offset
  call _dl_fixup    # Call resolver.
</pre>
</p>
<p>The Linux trampoline doesn&#8217;t touch the SSE registers because it assumes that the dynamic linker will not modify them, thus avoiding a save and restore.</p>
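<p>One detail in the Linux listing deserves a note: the <code>Multiply by 24</code> comment. The value pushed by the PLT entry is an index into the relocation table, and on amd64 each <code>Elf64_Rela</code> entry is 24 bytes (three 8-byte fields). The trampoline does the scaling with two adds and a shift instead of a multiply. Here is a small Ruby sketch of the same arithmetic, using the <code>0x2f</code> pushed by <code>malloc@plt</code> in the ELF listing earlier; the variable names are made up for illustration.</p>
<pre class="prettyprint">
rela_entry_size = 8 + 8 + 8            # r_offset, r_info, r_addend

# Same steps as the trampoline: copy, add twice, then shift left by 3.
reloc_offset = lambda do |index|
  tripled = index + index + index      # movq %rsi,%r11 / addq / addq
  tripled * 8                          # shlq $3 multiplies by 2**3
end

index = 0x2f                           # pushed by malloc@plt in the ELF listing
puts "relocation #{index} starts at byte offset #{reloc_offset.call(index)}"
</pre>
<pre class="prettyprint">
relocation 47 starts at byte offset 1128
</pre>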
<h2>Conclusion</h2>
<ul>
<li>Tracing program execution from call site to the dynamic linker is pretty interesting and there is a lot to learn along the way.</li>
<li>glibc not saving and restoring <code>%xmm0-%xmm7</code> kind of scares me, but there is a unit test included that disassembles the built ld.so and searches it to make sure that those registers are never touched. It is still a bit frightening.</li>
<li>Stay tuned for more posts explaining other interesting similarities and differences between Mach-O and ELF coming soon.</li>
</ul>
<p>Thanks for reading and don&#8217;t forget to <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>References</h2>
<ol class="footnotes"><li id="footnote_0_1613" class="footnote"><a href="http://developer.apple.com/mac/library/documentation/DeveloperTools/Conceptual/LowLevelABI/140-x86-64_Function_Calling_Conventions/x86_64.html#//apple_ref/doc/uid/TP40005035-SW1">http://developer.apple.com/mac/library/documentation/DeveloperTools/Conceptual/LowLevelABI/140-x86-64_Function_Calling_Conventions/x86_64.html#//apple_ref/doc/uid/TP40005035-SW1</a></li><li id="footnote_1_1613" class="footnote"><a href="http://www.x86-64.org/documentation/abi.pdf">http://www.x86-64.org/documentation/abi.pdf</a></li></ol>]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/dynamic-linking-elf-vs-mach-o/feed/</wfw:commentRss>
		<slash:comments>28</slash:comments>
		</item>
		<item>
		<title>Descent into Darkness: Understanding your system&#8217;s binary interface is the only way out</title>
		<link>http://timetobleed.com/descent-into-darkness-understanding-your-systems-binary-interface-is-the-only-way-out/</link>
		<comments>http://timetobleed.com/descent-into-darkness-understanding-your-systems-binary-interface-is-the-only-way-out/#comments</comments>
		<pubDate>Mon, 15 Mar 2010 19:11:19 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[GC]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[syscall]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1602</guid>
		<description><![CDATA[Download as PDF (3mb) Descent into Darkness: Understanding your system&#8217;s binary interface is the only way out.]]></description>
			<content:encoded><![CDATA[<p><a style="float:right" href="http://dl.dropbox.com/u/1681973/abi.pdf">Download as PDF (3mb)</a><br />
<a title="View Descent into Darkness: Understanding your system's binary interface is the only way out. on Scribd" href="http://www.scribd.com/doc/28264000/Descent-into-Darkness-Understanding-your-system-s-binary-interface-is-the-only-way-out" style="margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; -x-system-font: none; display: block; text-decoration: underline;">Descent into Darkness: Understanding your system&#8217;s binary interface is the only way out.</a> <object id="doc_50009547124029" name="doc_50009547124029" height="600" width="100%" type="application/x-shockwave-flash" data="http://d1.scribdassets.com/ScribdViewer.swf" style="outline:none;" ><param name="movie" value="http://d1.scribdassets.com/ScribdViewer.swf"><param name="wmode" value="opaque"><param name="bgcolor" value="#ffffff"><param name="allowFullScreen" value="true"><param name="allowScriptAccess" value="always"><param name="FlashVars" value="document_id=28264000&#038;access_key=key-nywmlzldrcxb47d7tv9&#038;page=1&#038;viewMode=slideshow"><embed id="doc_50009547124029" name="doc_50009547124029" src="http://d1.scribdassets.com/ScribdViewer.swf?document_id=28264000&#038;access_key=key-nywmlzldrcxb47d7tv9&#038;page=1&#038;viewMode=slideshow" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="600" width="100%" wmode="opaque" bgcolor="#ffffff"></embed></object>	</p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/descent-into-darkness-understanding-your-systems-binary-interface-is-the-only-way-out/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Garbage Collection Slides from LA Ruby Conference</title>
		<link>http://timetobleed.com/garbage-collection-slides-from-la-ruby-conference/</link>
		<comments>http://timetobleed.com/garbage-collection-slides-from-la-ruby-conference/#comments</comments>
		<pubDate>Sat, 20 Feb 2010 22:03:14 +0000</pubDate>
		<dc:creator>Aman Gupta</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[GC]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[profiling]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1569</guid>
		<description><![CDATA[Garbage Collection and the Ruby Heap]]></description>
			<content:encoded><![CDATA[<p><a title="View Garbage Collection and the Ruby Heap on Scribd" href="http://www.scribd.com/doc/27174770/Garbage-Collection-and-the-Ruby-Heap" style="margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; -x-system-font: none; display: block; text-decoration: underline;">Garbage Collection and the Ruby Heap</a> <object id="doc_629766057039419" name="doc_629766057039419" height="600" width="100%" type="application/x-shockwave-flash" data="http://d1.scribdassets.com/ScribdViewer.swf" style="outline:none;" ><param name="movie" value="http://d1.scribdassets.com/ScribdViewer.swf"><param name="wmode" value="opaque"><param name="bgcolor" value="#ffffff"><param name="allowFullScreen" value="true"><param name="allowScriptAccess" value="always"><param name="FlashVars" value="document_id=27174770&#038;access_key=key-2g5x6qhwa28yz3ia1hih&#038;page=1&#038;viewMode=slideshow"><embed id="doc_629766057039419" name="doc_629766057039419" src="http://d1.scribdassets.com/ScribdViewer.swf?document_id=27174770&#038;access_key=key-2g5x6qhwa28yz3ia1hih&#038;page=1&#038;viewMode=slideshow" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="600" width="100%" wmode="opaque" bgcolor="#ffffff"></embed></object>	</p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/garbage-collection-slides-from-la-ruby-conference/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>String together global offset tables to build a Ruby memory profiler</title>
		<link>http://timetobleed.com/string-together-global-offset-tables-to-build-a-ruby-memory-profiler/</link>
		<comments>http://timetobleed.com/string-together-global-offset-tables-to-build-a-ruby-memory-profiler/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 12:59:56 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[profiling]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1539</guid>
		<description><![CDATA[If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter. Disclaimer The tricks, techniques, and ugly hacks in this article are PLATFORM SPECIFIC, DANGEROUS, and NOT PORTABLE. This is the third article in a series of articles describing a set of low level hacks that I used to create memprof [...]]]></description>
			<content:encoded><![CDATA[<p><center><img src="http://timetobleed.com/images/got.jpg" alt="" width="400" height="300" /></center><br />
If you enjoy this article, <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>Disclaimer</h2>
<p><i>The tricks, techniques, and ugly hacks in this article are  <b>PLATFORM SPECIFIC</b>, <b>DANGEROUS</b>, and <b>NOT PORTABLE</b>. </i></p>
<p>This is the third article in a series of articles describing a set of low level hacks that I used to create <a href="http://github.com/ice799/memprof">memprof</a>, a Ruby-level memory profiler. <b>You should be able to survive without reading the other articles in this series</b>, but you can check them out <a href="http://timetobleed.com/rewrite-your-ruby-vm-at-runtime-to-hot-patch-useful-features/">here</a> and <a href="http://timetobleed.com/hot-patching-inlined-functions-with-x86_64-asm-metaprogramming/">here</a>.</p>
<h2>How is this different from the other hooking articles/techniques?</h2>
<p>The previous articles explained how to insert trampolines in the <code>.text</code> segment of a binary. This article explains a cool technique for hooking functions in the <code>.text</code> segment of <i>shared libraries</i>, allowing your handler to run, and then resuming execution. Hooking shared libraries turns out to be less work than hooking the binary (in the case of Ruby, that is), but making it all happen was a bit tricky. Read on to learn more.</p>
<h2>The &#8220;problem&#8221; with shared libraries</h2>
<p>The problem is that if a trampoline is inserted into the code of the shared library, the trampoline will need to invoke the dynamic linker to resolve the function that is being hooked, call the function, do whatever additional logic is desired, and then resume execution.</p>
<p><b>In other words you need to (somehow) insert a trampoline for a function that will call the function being trampolined without ending up in an infinite loop.</b></p>
<p>The additional complexity occurs because when shared libraries are loaded, the kernel decides at runtime where exactly in memory the library should be loaded. Since the exact location of symbols is not known at link time, a procedure linkage table (<code>.plt</code>) is created so that the program and the dynamic linker can work together to resolve symbol addresses.</p>
<p>I explained how <code>.plt</code>s work in a <a href="http://timetobleed.com/extending-ltrace-to-make-your-rubypythonperlphp-apps-faster/">previous article</a>, but looking at this again is worthwhile. I&#8217;ve simplified the explanation a bit<sup>1</sup>, but at a high level:</p>
<ol>
<li>The program calls a function in a shared object. The link editor makes sure that the program jumps to a stub function in the <code>.plt</code>.</li>
<li>The program sets some data up for the dynamic linker and then hands control over to it.</li>
<li>The dynamic linker looks at the info set up by the program, finds the absolute address of the function that was called through the <code>.plt</code>, and fills that address into the global offset table (<code>.got</code>).</li>
<li>Then the dynamic linker calls the function.</li>
<li>Subsequent calls to the same function jump to the same stub in the <code>.plt</code>, but every time after the first call the absolute address is already in the <code>.got</code> (because when the dynamic linker is invoked the first time, it fills in the absolute address in the <code>.got</code>).</li>
</ol>
<p>Disassembling a short Ruby VM function that calls <code>rb_newobj</code> (a memory allocation routine that we&#8217;d like to hook), shows the calls to the <code>.plt</code>:</p>
<p><pre class="prettyprint">
000000000001af10 [ary_alloc]:
   . . . .
   1af14:       e8 e7 c6 ff ff          callq  17600 [rb_newobj@plt]
   . . . .
</pre>
</p>
<p>
Let&#8217;s take a look at the corresponding <code>.plt</code> stub:</p>
<pre class="prettyprint">
0000000000017600 [rb_newobj@plt]:
   17600:       ff 25 6a 9c 2c 00       jmpq   *0x2c9c6a(%rip) # 2e1270 [_GLOBAL_OFFSET_TABLE_+0x288]
   17606:       68 4e 00 00 00          pushq  $0x4e
   1760b:       e9 00 fb ff ff          jmpq   17110 [_init+0x18]
</pre>
</p>
<p><b><u>Important fact:</u></b> The program and each shared library have their own <code>.plt</code> and <code>.got</code> sections (amongst other sections). Keep this in mind as it&#8217;ll be handy very shortly.</p>
<p>That is a lot of stub code to reproduce in the trampoline. Reproducing that stuff in the trampoline shouldn&#8217;t be hard, but invites a large number of bugs over to play. <i>Is there a better way?</i></p>
<h2>What is a global offset table (<code>.got</code>)?</h2>
<p>The global offset table (<code>.got</code>) is a table of absolute addresses that can be filled in at runtime. In the assembly dump above, the <code>.got</code> entry for <code>rb_newobj</code> is referenced in the <code>.plt</code> stub code.</p>
<h2>Intercepting a function call</h2>
<p>It would be <b>awesome</b> if it were possible to overwrite the <code>.got</code> entry for <code>rb_newobj</code> and insert the address of a trampoline. But how would the intercepting function call <code>rb_newobj</code> itself without ending up in an infinite loop?</p>
<p>The <b>important fact</b> above comes in to save the day.</p>
<p>Since each shared object has its own <code>.plt</code> and <code>.got</code> sections, it is possible to overwrite the <code>.got</code> entry for <code>rb_newobj</code> in <i>every shared object except for the object where the trampoline lives</i>. Then, when <code>rb_newobj</code> is called, the <code>.plt</code> entry will redirect execution to the trampoline. The trampoline then calls out to its <code>.plt</code> entry for <code>rb_newobj</code> which is left untouched allowing <code>rb_newobj</code> to be resolved and called out to successfully.</p>
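<p>Here is a minimal Ruby sketch of that idea. It is only a toy model: each &#8220;shared object&#8221; is a hash standing in for its <code>.got</code>, a call through the <code>.plt</code> is modeled as an indirect call through that hash, and the object names (<code>ruby</code>, <code>some_ext</code>, <code>memprof</code>) are assumptions made for illustration.</p>
<pre class="prettyprint">
# Toy model: every "shared object" carries its own .got (a Hash here).
real_rb_newobj = lambda { :new_object }
allocations    = []

gots = {
  ruby:     { rb_newobj: real_rb_newobj },
  some_ext: { rb_newobj: real_rb_newobj },
  memprof:  { rb_newobj: real_rb_newobj }   # where the trampoline lives
}

# A call through an object's .plt is an indirect call through its .got.
call_via_plt = lambda do |object, symbol|
  gots.fetch(object).fetch(symbol).call
end

# The trampoline records the call, then reaches the real function through
# memprof's own, untouched .got entry.
trampoline = lambda do
  allocations.push(:rb_newobj)
  call_via_plt.call(:memprof, :rb_newobj)
end

# Poison every .got except the one belonging to the trampoline's object.
gots.each do |object, got|
  got[:rb_newobj] = trampoline unless object == :memprof
end

result = call_via_plt.call(:ruby, :rb_newobj)
puts "result: #{result}, intercepted calls: #{allocations.size}"
</pre>
<p>If the <code>memprof</code> entry were poisoned as well, the trampoline would call itself forever, which is exactly the infinite loop this layout avoids.</p>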
<h2>Not as easy as it sounds, though</h2>
<p>This solution is less work than the other hooking methods, but it has its own particular details as well:</p>
<ol>
<li>You&#8217;ll need to walk the link map at runtime to determine the base address for the shared library you are hooking (it could be anywhere).</li>
<li>Next, you&#8217;ll need to parse the <code>.rela.plt</code> section which contains information on the location of each <code>.plt</code> stub, relative to the base address of the shared object.</li>
<li>Once you have the address of the <code>.plt</code> stub, you&#8217;ll need to determine the absolute address of the <code>.got</code> entry by parsing the first instruction of the <code>.plt</code> stub (a <code>jmp</code>) as seen in the disassembly above.</li>
<li>Finally, you can write to the <code>.got</code> entry the address of your trampoline, as long as the trampoline <b>lives in a different shared library</b>.</li>
</ol>
<p>You&#8217;ve now successfully managed to poison the <code>.got</code> entry of a symbol in one shared library to direct execution to your own function which can then call the intercepted function itself without getting stuck in an infinite loop.</p>
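<p>Step 3 is the fiddliest part, so here is a rough Ruby sketch of it using the bytes of the <code>rb_newobj@plt</code> stub from the disassembly above. The base address is a made-up placeholder (the real one comes from walking the link map), and this is only an illustration of the decoding, not memprof&#8217;s actual code.</p>
<pre class="prettyprint">
# First instruction of rb_newobj@plt, copied from the disassembly above.
stub_offset = 0x17600
instruction = [0xff, 0x25, 0x6a, 0x9c, 0x2c, 0x00].pack("C*")

# ff 25 is an indirect jmp with a 32-bit %rip-relative displacement.
opcode = instruction.bytes.first(2)
raise "not an indirect rip-relative jmp" unless opcode == [0xff, 0x25]

# Read the displacement as little-endian, then reinterpret it as signed.
raw          = instruction[2, 4].unpack1("V")
displacement = [raw].pack("L").unpack1("l")

# Hypothetical base address, for illustration only.
base_address = 0x7f0000000000

# The displacement is relative to the end of the jmp instruction.
got_offset = stub_offset + instruction.bytesize + displacement
got_entry  = base_address + got_offset
puts format(".got entry at offset 0x%x, absolute address 0x%x", got_offset, got_entry)
</pre>
<p>The offset comes out as <code>0x2e1270</code>, which matches the address shown in the comment of the disassembly.</p>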
<h2>Conclusion</h2>
<ul>
<li>There are lots of sections in each ELF object. Each section is special and important.</li>
<li>ELF documentation can be difficult to obtain and understand.</li>
<li>Got pretty lucky this time around. I was getting a little worried that it would get complicated. Made it out alive, though.</li>
</ul>
<p>Thanks for reading and don&#8217;t forget to <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>References</h2>
<ol class="footnotes"><li id="footnote_0_1539" class="footnote"><a href="http://www.x86-64.org/documentation/abi.pdf">System V Application Binary Interface AMD64 Architecture Processor Supplement, p 78</a></li></ol>]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/string-together-global-offset-tables-to-build-a-ruby-memory-profiler/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is a ruby object? (introducing Memprof.dump)</title>
		<link>http://timetobleed.com/what-is-a-ruby-object-introducing-memprof-dump/</link>
		<comments>http://timetobleed.com/what-is-a-ruby-object-introducing-memprof-dump/#comments</comments>
		<pubDate>Mon, 14 Dec 2009 12:59:52 +0000</pubDate>
		<dc:creator>Aman Gupta</dc:creator>
				<category><![CDATA[debugging]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[GC]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[profiling]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1426</guid>
		<description><![CDATA[If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter. [...]]]></description>
			<content:encoded><![CDATA[<p><img alt="" src="http://timetobleed.com/images/blueprint.jpg" class="aligncenter" width="360" height="307" /></p>
<div style="text-align:right; font-size: 0.8em; margin-bottom: 1.5em">
If you enjoy this article, <a href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/tmm1">follow me on twitter.</a>
</div>
<style type="text/css">
#mdump pre.prettyprint{
  font-size: 0.85em;
  padding: 0.85em;
}
#mdump pre.output{
  margin-top: 0.3em;
  margin-bottom: 1.4em
}
div.main #mdump h2{
  padding-top: 1em;
  border-bottom: 1px solid black;
}
#mdump h3{
  margin-top: 0.45em;
  font-size: 1.4em;
  text-decoration: underline;
}
#mdump ul.links{
  padding-left: 2em;
}
#mdump ul.links li {
  margin: 0;
  padding: 0;
}
</style>
<div id="mdump">
<p>After <a href="http://twitter.com/joedamato">Joe</a> released memprof a few days ago, <a href="http://twitter.com/tmm1">I</a> started thinking about ways to add more functionality.</p>
<p>The initial Memprof release only offered a simple stats api, inspired by the one in bleak_house:</p>
<pre class="prettyprint">
require 'memprof'
Memprof.start
o = Object.new
Memprof.stats
</pre>
<pre class="prettyprint output">
      1 test.rb:3:Object
</pre>
<p>With the help of <a href="http://twitter.com/lloydhilaiel">lloyd</a>&#8217;s excellent <a href="http://github.com/lloyd/yajl">yajl json library</a>, I&#8217;ve slowly been building a <a href="http://github.com/ice799/memprof/commits/heap_dump">full-featured heap dumper</a>: <code>Memprof.dump</code>.</p>
<pre class="prettyprint">
require 'memprof'
Memprof.start
[]
Memprof.dump
</pre>
<pre class="prettyprint output">
[
  {
    "address": "0xea52f0",
    "source": "test.rb:3",
    "type": "array",
    "length": 0
  }
]
</pre>
<h3 style="padding-top: 1em">Where can I find it?</h3>
<p>This new heap dumper will be in the next release of Memprof. If you want to play with it, check out the <a href="http://github.com/ice799/memprof/commits/heap_dump">heap_dump branch</a> on github.</p>
<h3>What else is planned?</h3>
<p>Over the next few days, I&#8217;m going to add a <code>Memprof.dump_all</code> method to dump out the entire ruby heap. This full dump will contain complete knowledge of the ruby object graph (what objects point to other objects), and its json format will allow for easy analysis. I&#8217;m envisioning a set of post-processing tools that can find leaks, calculate object memory usage, and generate various visualizations of memory consumption and object hierarchies.</p>
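<p>As a rough idea of what such post-processing could look like, here is a minimal sketch using Ruby&#8217;s standard <code>json</code> library. The sample dump is assembled by hand from objects shown in this post, and the variable names are made up for illustration; a real dump may well carry more fields.</p>
<pre class="prettyprint">
require 'json'

# Hand-assembled sample; not actual Memprof.dump_all output.
dump = JSON.parse('[
  {"address": "0xea52f0", "source": "test.rb:3", "type": "array", "length": 0},
  {"address": "0x1823dd8", "source": "test.rb:3", "type": "object",
   "class_name": "Object", "ivars": {"@pi": "0x1823da0"}},
  {"address": "0x1823da0", "source": "test.rb:4", "type": "float", "data": 3.14159}
]')

# Count objects by type.
by_type = Hash.new(0)
dump.each { |obj| by_type[obj["type"]] += 1 }

# Follow instance variable pointers to get edges of the object graph.
edges = dump.flat_map do |obj|
  obj.fetch("ivars", {}).map { |name, target| [obj["address"], name, target] }
end

by_type.each { |type, count| puts "#{count} #{type}" }
edges.each { |from, name, to| puts "#{from} --#{name}--> #{to}" }
</pre>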
<h3>Why should I care?</h3>
<p>In building and testing <code>Memprof.dump</code>, I&#8217;ve learned a lot about different types of ruby objects. The rest of this post covers interesting details about common ruby objects, with examples of how they&#8217;re created and what they look like inside the MRI VM.</p>
<p><span id="more-1426"></span></p>
<h2>Objects and Floats</h2>
<pre class="prettyprint">
o = Object.new
o.instance_variable_set(:@pi, 3+0.14159)
</pre>
<pre class="prettyprint output">
  {
    "address": "0x1823dd8",
    "source": "test.rb:3",
    "type": "object",
    "class": "0x1854b38",
    "class_name": "Object",
    "ivars": {
      "@pi": "0x1823da0"
    }
  }
</pre>
<p>This ruby object points to its class (<code>Object 0x1854b38</code>) and has a single instance variable, <code>@pi</code>, that points to another object at <code>0x1823da0</code>.</p>
<p>The address <code>0x1823da0</code> belongs to a <code>float object</code>. This float was created on the heap when MRI executed the code <code>3 + 0.14159</code>.</p>
<pre class="prettyprint output">
  {
    "address": "0x1823da0",
    "source": "test.rb:4",
    "type": "float",
    "data": 3.14159
  }
</pre>
<p>The float <code>0.14159</code> used in the addition also lives on the heap, but it is created upfront once when the ruby source is parsed.</p>
<h2>Strings</h2>
<p>Unlike float literals, which are created once at parse time, a new string object is created every time ruby evaluates a string literal in its execution path.</p>
<pre class="prettyprint">
1.times{"abc"}
</pre>
<pre class="prettyprint output">
  {
    "type": "string",
    "shared": "0x15136a0",
    "flags": ["elts_shared"]
  }
</pre>
<p>This newly created <code>string object</code> has no character data associated with it; instead, it is marked <code>elts_shared</code> and points to <code>0x15136a0</code>. In this case, <code>0x15136a0</code> is another string object- one that holds the actual data &#8220;abc&#8221; and was created earlier when the ruby source was parsed.</p>
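<p>The same behavior can be observed from plain ruby, without memprof. This is a minimal sketch that assumes string literals are not frozen (no <code>frozen_string_literal</code> setting in effect); under that assumption each pass through the block should produce a separate object, even though the contents are equal.</p>
<pre class="prettyprint">
# Each pass through the block evaluates the literal again.
strings = []
3.times { strings << "abc" }

ids = strings.map { |s| s.object_id }

puts strings.uniq.size   # number of distinct contents
puts ids.uniq.size       # number of distinct objects behind them
</pre>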
<h2>Arrays and Fixnums</h2>
<pre class="prettyprint">
[1,2,3,"hello"]
</pre>
<pre class="prettyprint output">
  {
    "type": "array",
    "length": 4,
    "data": [
      1,
      2,
      3,
      "0x12aa0c0"
    ]
  }
</pre>
<p>The fixnums <code>1</code>, <code>2</code> and <code>3</code> in the array are immediates, so they live in the array itself and do not occupy slots on the ruby heap<sup>1</sup>. The fourth member is the <code>string object</code> &#8220;hello&#8221; that lives at <code>0x12aa0c0</code>.</p>
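<p>A quick way to see the difference between immediates and heap objects from ruby itself is to compare object identities. The sketch below relies on an MRI implementation detail (a small integer <code>n</code> reports <code>2n+1</code> as its <code>object_id</code>), so treat it as an illustration rather than a portable guarantee.</p>
<pre class="prettyprint">
# Small integers are encoded in the reference itself, so on MRI the
# object_id of n is 2n+1 and no heap slot is involved.
ids = [1, 2, 3].map { |n| n.object_id }
puts ids.inspect

# Strings are real heap objects: equal contents, different identities.
a = String.new("hello")
b = String.new("hello")
puts a.equal?(b)
puts 1.equal?(1)
</pre>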
<h2>Hashes and Symbols</h2>
<pre class="prettyprint">
{:a=>1,"b"=>:c}
</pre>
<pre class="prettyprint output">
  {
    "type": "hash",
    "length": 2,
    "default": null,
    "data": {
      "0xd13378": ":c",
      ":a": 1
    }
  }
</pre>
<p>The symbols <code>:a</code> and <code>:c</code> are also immediates, so they live directly inside the hash&#8217;s data table. The key for &#8220;b&#8221; is a pointer to that string object at <code>0xd13378</code>.</p>
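<p>Symbol identity can be checked the same way. In this small sketch (variable names are only for illustration), a symbol built at runtime from a string is the very same object as the literal, while the string key is an ordinary object stored by reference.</p>
<pre class="prettyprint">
h = {:a => 1, "b" => :c}

# There is exactly one symbol per name, however it is obtained.
same_symbol = :c.equal?("c".to_sym)

# The string key is a separate object held by the hash.
string_key = h.keys.find { |k| k.is_a?(String) }

puts same_symbol
puts string_key.inspect
</pre>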
<h2>Blocks and Data</h2>
<p>Hashes can also be created with a default block.</p>
<pre class="prettyprint">
Hash.new{|h,k| h[k] = k; h }
</pre>
<pre class="prettyprint output">
  {
    "type": "hash",
    "length": 0,
    "default": "0xcca208"
  },
  {
    "address": "0xcca208",
    "type": "data",
    "class": "0xcced80",
    "class_name": "Proc"
  }
</pre>
<p>In this case, the block is converted to a new Proc <code>data object</code> that holds a reference to an internal <code>struct BLOCK</code><sup>2</sup>. The new hash&#8217;s <code>default</code> field points to the address of the Proc.</p>
<p><code>Data objects</code> are commonly created by C extensions to point to external memory that needs to be marked and freed using ruby&#8217;s garbage collector.</p>
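<p>The Proc behind a default block is also reachable from ruby through <code>Hash#default_proc</code>. A minimal sketch, using a slightly simpler block than the one above:</p>
<pre class="prettyprint">
h = Hash.new { |hash, key| hash[key] = key }

puts h.default_proc.class    # the block was converted to a Proc
puts h.default.inspect       # there is no plain default value

h[:x]                        # a missing key runs the block, which stores the key
puts h.length
</pre>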
<h2>Classes</h2>
<p>A simple class definition creates many objects on the heap.</p>
<pre class="prettyprint output">
class MyClass; end
</pre>
<p>First is the class itself, along with the class&#8217;s string representation (pointed to by an internal ivar <code>__classpath__</code>). Notice the class object holds a reference to its superclass.</p>
<pre class="prettyprint output">
  {
    "address":"0x29f3228",
    "type": "class",
    "name": "MyClass",
    "super": "0x2a23b28",
    "super_name": "Object",
    "ivars": {
      "__classpath__": "0x29f31b8"
    }
  },
  {
    "address": "0x29f31b8",
    "type": "string",
    "length": 7,
    "data": "MyClass",
  }
</pre>
<p>The class definition also creates two more objects- an internal CREF node, and <strong>another</strong> <i>singleton</i> class with no name that is <code>__attached__</code> to MyClass.</p>
<pre class="prettyprint output">
  {
    "type": "node",
    "node_type": "CREF",
  },
  {
    "type": "class",
    "name": null,
    "super": "0x2a23a80",
    "super_name": null,
    "singleton": true,
    "ivars": {
      "__attached__": "0x29f3228"
    }
  }
</pre>
<p>This singleton is <code>MyClass</code>&#8216;s metaclass, where singleton methods and instance variables are added.</p>
<pre class="prettyprint">
MyClass.instance_variable_set(:@a, 123)
</pre>
<pre class="prettyprint output">
  {
    "type": "class",
    "name": null,
    "singleton": true,
    "ivars": {
      "__attached__": "0x29f3228",
      "@a": 123
    }
  }
</pre>
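<p>On ruby 1.9.2 and later the metaclass can be inspected directly with <code>singleton_class</code>. This sketch only looks at singleton methods, and the method name <code>hello</code> is made up for the example:</p>
<pre class="prettyprint">
class MyClass; end

# A singleton method defined on the class object itself...
def MyClass.hello
  "hi"
end

# ...is stored in the method table of the metaclass.
meta = MyClass.singleton_class
puts meta.instance_methods(false).inspect
puts MyClass.singleton_methods(false).inspect
</pre>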
<h3>Constants, Class and Instance Variables</h3>
<p>Classes store both constants and class variables along with the instance variables.</p>
<pre class="prettyprint">
class MyClass
  A=1
  @@b=2
  @c=3
end
</pre>
<pre class="prettyprint output">
  {
    "type": "class",
    "name": "MyClass",
    "ivars": {
      "@@b": 2,
      "A": 1,
      "@c": 3
    }
  }
</pre>
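<p>Although all three kinds of names share one table internally, ruby exposes them through separate reflection methods. A short sketch of the same class seen from the ruby side:</p>
<pre class="prettyprint">
class MyClass
  A = 1
  @@b = 2
  @c = 3
end

# Each kind of name has its own accessor at the ruby level.
puts MyClass.constants(false).inspect
puts MyClass.class_variables.inspect
puts MyClass.instance_variables.inspect
</pre>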
<h3>Methods</h3>
<p>Methods are stored in a separate method table and represented by <code>METHOD node objects</code> which hold the method body.</p>
<pre class="prettyprint">
class MyClass
  def d() end
end
</pre>
<pre class="prettyprint output">
  {
    "type": "class",
    "name": "MyClass",
    "methods": {
      "d": "0xb7ec30"
    }
  },
  {
    "address": "0xb7ec30",
    "type": "node",
    "node_type": "METHOD",
  }
</pre>
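<p>The method table entry has a ruby-level counterpart as well. The sketch below fetches the method as an <code>UnboundMethod</code>; this is a wrapper object handed out by ruby, not the internal node itself.</p>
<pre class="prettyprint">
class MyClass
  def d() end
end

# Look up the entry for d in the method table of MyClass.
m = MyClass.instance_method(:d)

puts m.class
puts m.owner
puts MyClass.instance_methods(false).inspect
</pre>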
<h2>Method Invocation</h2>
<pre class="prettyprint">
def test()
  a=1
  b=:b
  c='c'
  Memprof.dump<sup>3</sup>
end
test()
</pre>
<pre class="prettyprint output">
  {
    "type": "scope",
    "node": "0xa9bdd0",
    "variables": {
      "_": null,
      "~": null,
      "a": 1,
      "b": ":b",
      "c": "0xb60ce8"
    }
  }
</pre>
<p>During method invocation, a new <code>scope object</code> is created on the heap. This scope points to the <code>node object</code> representing the method body, and has a list of all local variables.</p>
<p>The local variables include the perl-style ruby magic variables <code>$_</code> and <code>$~</code>.</p>
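<p>Two small plain-ruby sketches related to scopes (the method names are made up for the example). The first lists the named locals of a method; note that <code>local_variables</code> reports only those, not the <code>$_</code> and <code>$~</code> slots the dump shows. The second shows a block keeping the locals of a method alive after the method has returned.</p>
<pre class="prettyprint">
def scope_demo
  a = 1
  b = :b
  c = 'c'
  local_variables
end

def make_counter
  count = 0
  lambda { count += 1 }   # the block references the locals of the method
end

puts scope_demo.inspect

counter = make_counter
counter.call
puts counter.call         # count survived the return of make_counter
</pre>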
<h2>Modules and IClasses</h2>
<p>Modules in ruby are similar to classes and have the same associated strings and CREF nodes created with them.</p>
<pre class="prettyprint">
module MyModule; end
</pre>
<pre class="prettyprint output">
  {
    "address": "0xe82248",
    "type": "module",
    "name": "MyModule",
    "super": false,
    "ivars": {
      "__classpath__": "0x208eda8",
      "__classid__": ":MyModule"
    }
  }
</pre>
<p>When a module is included into a class, an extra <code>iclass object</code> is created:</p>
<pre class="prettyprint">
class MyClass
  include MyModule
end
</pre>
<pre class="prettyprint output">
  {
    "address": "0x208ecc8",
    "source": "-e:1",
    "type": "iclass",
    "super": "0x20bfb40",
    "super_name": "Object",
    "ivars": {
      "__classpath__": "0x208eda8",
      "__classid__": ":MyModule"
    }
  }
</pre>
<p>This new <code>iclass</code> points to <code>MyClass</code>&#8216;s old superclass, and shares its instance variable and method tables with <code>MyModule</code>. Once created, this <code>iclass</code> becomes <code>MyClass</code>&#8216;s new superclass.</p>
<pre class="prettyprint output">
  {
    "type": "class",
    "name": "MyClass",
    "super": "0x208ecc8",
    "super_name": "MyModule",
  }
</pre>
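<p>The <code>iclass</code> is invisible from ruby code, which is why the dump and the reflection API disagree here. In this sketch, <code>ancestors</code> shows the module sitting between the class and <code>Object</code>, while <code>superclass</code> skips over the included module:</p>
<pre class="prettyprint">
module MyModule; end

class MyClass
  include MyModule
end

# The module appears in the lookup chain right after the class...
puts MyClass.ancestors.first(3).inspect

# ...but superclass reports the next real class.
puts MyClass.superclass
</pre>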
<h2>and more..</h2>
<p>Ruby has various other internal object types, including Regexps, Matches, Bignums, Structs, Files, Varmaps, and almost 130 different types of Nodes. Memprof will eventually be able to dump out all these objects in individual detail.
</p></div>
<ol class="footnotes"><li id="footnote_0_1426" class="footnote">Fixnums can, however, <a href="http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/17321">still have instance variables</a></li><li id="footnote_1_1426" class="footnote">Future versions of memprof will print out <code>struct BLOCK</code>s in more detail, to show all references held by ruby procs</li><li id="footnote_2_1426" class="footnote">Memprof.dump was called in the method body, because the scope is freed explicitly when the method ends (unless it is referenced by a block).</li></ol>]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/what-is-a-ruby-object-introducing-memprof-dump/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>memprof: A Ruby level memory profiler</title>
		<link>http://timetobleed.com/memprof-a-ruby-level-memory-profiler/</link>
		<comments>http://timetobleed.com/memprof-a-ruby-level-memory-profiler/#comments</comments>
		<pubDate>Fri, 11 Dec 2009 12:59:43 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[GC]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[profiling]]></category>
		<category><![CDATA[system health]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1398</guid>
		<description><![CDATA[If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter. What is memprof and why do I care? memprof is a Ruby gem which supplies memory profiler functionality similar to bleak_house without patching the Ruby VM. You just install the gem, call a function or two, and off you go. [...]]]></description>
			<content:encoded><![CDATA[<p><center><img src="http://timetobleed.com/images/memory.jpg" alt="" width="300" height="200" /></center><br />
If you enjoy this article, <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>What is memprof and why do I care?</h2>
<p>memprof is a Ruby gem which supplies memory profiler functionality similar to bleak_house <b>without</b> patching the Ruby VM. You just install the gem, call a function or two, and off you go.</p>
<h2>Where do I get it?</h2>
<p>memprof is available on gemcutter, so you can just:</p>
<p><b><code>gem install memprof</code></b></p>
<p>Feel free to browse the source code at: <a href="http://github.com/ice799/memprof">http://github.com/ice799/memprof</a>.</p>
<h2>How do I use it?</h2>
<p>Using memprof is simple. Before we look at some examples, let me explain more precisely what memprof is measuring.</p>
<p>memprof is measuring the number of objects created and not destroyed during a segment of Ruby code. The ideal use case for memprof is to show you where objects that do not get destroyed are being created: </p>
<ul>
<li>Objects are created and not destroyed when you create new classes. This is a good thing.</li>
<li>Sometimes garbage objects sit around until <code>garbage_collect</code> has had a chance to run. These objects will go away.</li>
<li>Yet in other cases you might be holding a reference to a large chain of objects without knowing it. Until you remove this reference, the entire chain of objects will remain in memory taking up space.</li>
</ul>
<p>memprof will show objects created in all cases listed above.</p>
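<p>To make "created and not destroyed" concrete, here is a rough sketch that uses <code>ObjectSpace.count_objects</code> (available in Ruby 1.9 and later) instead of memprof. It is not how memprof works internally, and it only gives totals per internal type rather than file and line information.</p>
<pre class="prettyprint">
GC.start
GC.disable                 # keep the collector out of the measurement

before = ObjectSpace.count_objects[:T_STRING]

retained = []
1000.times { retained << "x" * 5 }   # references are held, so these stay

after = ObjectSpace.count_objects[:T_STRING]
GC.enable

# The difference also includes garbage strings that have not been
# collected yet, so it is at least the number of retained strings.
puts after - before
</pre>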
<p>OK, now let&#8217;s take a look at two examples and their output.</p>
<p>A simple program with an obvious memory &#8220;leak&#8221;:</p>
<pre class="prettyprint">
require 'memprof'

@blah = Hash.new([])  # every missing key returns this one shared default array

Memprof.start
100.times {
  @blah[1] << "aaaaa"
}

1000.times {
   @blah[2] << "bbbbb"
}
Memprof.stats
Memprof.stop
</pre>
<p>This program creates 1100 objects that are not destroyed between the <code>start</code> and <code>stop</code> calls, because a reference is held to each object created.</p>
<p>Let's look at the output from memprof:</p>
<pre>
   1000 test.rb:11:String
    100 test.rb:7:String
</pre>
<p>In this example memprof shows the 1100 objects created, broken up by file, line number, and type.</p>
<p>Let's take a look at another example:</p>
<pre class="prettyprint">
require 'memprof'
Memprof.start
require "stringio"
StringIO.new
Memprof.stats
</pre>
<p>This simple program measures the number of objects created when requiring <code>stringio</code>.</p>
<p>Let's take a look at the output:</p>
<pre>
    108 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:__node__
     14 test2.rb:3:String
      2 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:Class
      1 test2.rb:4:StringIO
      1 test2.rb:4:String
      1 test2.rb:3:Array
      1 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:Enumerable
</pre>
<p>This output shows that objects of an internal Ruby interpreter type, <code>__node__</code>, were created (these represent code), as well as a few <code>String</code>s and other objects. Some of these are just garbage objects which haven't had a chance to be recycled yet.</p>
<p>What if we nudge the garbage collector along a little bit just for our example? Let's add the following two lines of code to our previous example:</p>
<pre class="prettyprint">
GC.start
Memprof.stats
</pre>
<p>We're now nudging the garbage collector and outputting memprof stats information again. This should show fewer objects, as the garbage collector will recycle some of the garbage objects:</p>
<pre>
    108 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:__node__
      2 test2.rb:3:String
      2 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:Class
      1 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:Enumerable
</pre>
<p>As you can see above, a few <code>String</code>s and other objects went away after the garbage collector ran.</p>
<h2>Which Rubies and systems are supported?</h2>
<ul>
<li>Only <b>unstripped</b> binaries are supported. To determine if your Ruby binary is stripped, simply run: <code>file `which ruby`</code>. If it is, consult your package manager's documentation. Most Linux distributions offer a package with an unstripped Ruby binary.</li>
<li>Only <b>x86_64</b> is supported at this time. Hopefully, I'll have time to add support for i386/i686 in the immediate future.</li>
<li>Linux Ruby Enterprise Edition (1.8.6 and 1.8.7) is supported.</li>
<li>Linux MRI Ruby 1.8.6 and 1.8.7 built with --disable-shared are supported. Support for --enable-shared binaries is <b>coming soon.</b></li>
<li>Snow Leopard support is <b>experimental</b> at this time.</li>
<li>Ruby 1.9 support <b>coming soon</b>.</li>
</ul>
<h2>How does it work?</h2>
<p>If you've been reading my blog over the last week or so, you'd have noticed two previous blog posts (<a href="http://timetobleed.com/rewrite-your-ruby-vm-at-runtime-to-hot-patch-useful-features/">here</a> and <a href="http://timetobleed.com/hot-patching-inlined-functions-with-x86_64-asm-metaprogramming/">here</a>) that describe some tricks I came up with for modifying a running binary image in memory.</p>
<p>memprof is a combination of all those tricks and other hacks to allow memory profiling in Ruby without the need for custom patches to the Ruby VM. You simply require the gem and off you go.</p>
<p>memprof works by inserting trampolines on object allocation and deallocation routines. It gathers metadata about the objects and outputs this information when the <code>stats</code> method is called.</p>
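<p>For comparison, later versions of Ruby (2.1 and up) ship an <code>objspace</code> extension that records similar per-object metadata through a built-in hook rather than through trampolines. The sketch below is that stock API, not memprof's mechanism; it builds a line in the same <code>file:line:type</code> shape as the <code>stats</code> output.</p>
<pre class="prettyprint">
require 'objspace'

report = nil

# While tracing is on, ruby remembers where each new object was allocated.
ObjectSpace.trace_object_allocations do
  obj  = Object.new
  file = ObjectSpace.allocation_sourcefile(obj)
  line = ObjectSpace.allocation_sourceline(obj)
  report = "#{file}:#{line}:#{obj.class}"
end

puts report
</pre>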
<h2>What else is planned?</h2>
<p><a href="http://twitter.com/jakedouglas">Jake Douglas</a>, <a href="http://www.twitter.com/tmm1">Aman Gupta</a>, and <a href="http://twitter.com/joedamato">I</a> have lots of interesting ideas for new features. We don't want to ruin the surprise, but stay tuned. More cool stuff is coming really soon :)</p>
<p>Thanks for reading and don't forget to <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/memprof-a-ruby-level-memory-profiler/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
	</channel>
</rss>

