<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>time to bleed by Joe Damato &#187; ruby</title>
	<atom:link href="http://timetobleed.com/category/ruby/feed/" rel="self" type="application/rss+xml" />
	<link>http://timetobleed.com</link>
	<description>technical ramblings from a wanna-be unix dinosaur</description>
	<lastBuildDate>Tue, 20 Jul 2010 21:03:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Garbage Collection and the Ruby Heap (from railsconf)</title>
		<link>http://timetobleed.com/garbage-collection-and-the-ruby-heap-from-railsconf/</link>
		<comments>http://timetobleed.com/garbage-collection-and-the-ruby-heap-from-railsconf/#comments</comments>
		<pubDate>Tue, 08 Jun 2010 16:38:20 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[debugging]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[GC]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[ltrace]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[profiling]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1787</guid>
		<description><![CDATA[Download as PDF (15mb) Garbage Collection and the Ruby Heap]]></description>
			<content:encoded><![CDATA[<p><a style="float:right" href="http://dl.dropbox.com/u/1681973/gc-railsconf.pdf">Download as PDF (15mb)</a><br />
<a title="View Garbage Collection and the Ruby Heap on Scribd" href="http://www.scribd.com/doc/32718051/Garbage-Collection-and-the-Ruby-Heap" style="margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; -x-system-font: none; display: block; text-decoration: underline;">Garbage Collection and the Ruby Heap</a> <object id="doc_179903367382288" name="doc_179903367382288" height="600" width="100%" type="application/x-shockwave-flash" data="http://d1.scribdassets.com/ScribdViewer.swf" style="outline:none;" ><param name="movie" value="http://d1.scribdassets.com/ScribdViewer.swf"><param name="wmode" value="opaque"><param name="bgcolor" value="#ffffff"><param name="allowFullScreen" value="true"><param name="allowScriptAccess" value="always"><param name="FlashVars" value="document_id=32718051&#038;access_key=key-1hl4d18vocqmc9ilk9a&#038;page=1&#038;viewMode=slideshow"><embed id="doc_179903367382288" name="doc_179903367382288" src="http://d1.scribdassets.com/ScribdViewer.swf?document_id=32718051&#038;access_key=key-1hl4d18vocqmc9ilk9a&#038;page=1&#038;viewMode=slideshow" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="600" width="100%" wmode="opaque" bgcolor="#ffffff"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/garbage-collection-and-the-ruby-heap-from-railsconf/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Descent into Darkness: Understanding your system&#8217;s binary interface is the only way out</title>
		<link>http://timetobleed.com/descent-into-darkness-understanding-your-systems-binary-interface-is-the-only-way-out/</link>
		<comments>http://timetobleed.com/descent-into-darkness-understanding-your-systems-binary-interface-is-the-only-way-out/#comments</comments>
		<pubDate>Mon, 15 Mar 2010 19:11:19 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[GC]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[syscall]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1602</guid>
		<description><![CDATA[Download as PDF (3mb) Descent into Darkness: Understanding your system&#8217;s binary interface is the only way out.]]></description>
			<content:encoded><![CDATA[<p><a style="float:right" href="http://dl.dropbox.com/u/1681973/abi.pdf">Download as PDF (3mb)</a><br />
<a title="View Descent into Darkness: Understanding your system's binary interface is the only way out. on Scribd" href="http://www.scribd.com/doc/28264000/Descent-into-Darkness-Understanding-your-system-s-binary-interface-is-the-only-way-out" style="margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; -x-system-font: none; display: block; text-decoration: underline;">Descent into Darkness: Understanding your system&#8217;s binary interface is the only way out.</a> <object id="doc_50009547124029" name="doc_50009547124029" height="600" width="100%" type="application/x-shockwave-flash" data="http://d1.scribdassets.com/ScribdViewer.swf" style="outline:none;" ><param name="movie" value="http://d1.scribdassets.com/ScribdViewer.swf"><param name="wmode" value="opaque"><param name="bgcolor" value="#ffffff"><param name="allowFullScreen" value="true"><param name="allowScriptAccess" value="always"><param name="FlashVars" value="document_id=28264000&#038;access_key=key-nywmlzldrcxb47d7tv9&#038;page=1&#038;viewMode=slideshow"><embed id="doc_50009547124029" name="doc_50009547124029" src="http://d1.scribdassets.com/ScribdViewer.swf?document_id=28264000&#038;access_key=key-nywmlzldrcxb47d7tv9&#038;page=1&#038;viewMode=slideshow" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="600" width="100%" wmode="opaque" bgcolor="#ffffff"></embed></object>	</p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/descent-into-darkness-understanding-your-systems-binary-interface-is-the-only-way-out/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>EventMachine: scalable non-blocking i/o in ruby</title>
		<link>http://timetobleed.com/eventmachine-scalable-non-blocking-io-in-ruby/</link>
		<comments>http://timetobleed.com/eventmachine-scalable-non-blocking-io-in-ruby/#comments</comments>
		<pubDate>Fri, 12 Mar 2010 20:07:39 +0000</pubDate>
		<dc:creator>Aman Gupta</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1574</guid>
		<description><![CDATA[Download as PDF (40mb) EventMachine: scalable non-blocking i/o in ruby]]></description>
			<content:encoded><![CDATA[<p><a style="float:right" href="http://dl.dropbox.com/u/635/em_export.pdf">Download as PDF (40mb)</a><br />
<a title="View EventMachine: scalable non-blocking i/o in ruby on Scribd" href="http://www.scribd.com/doc/28253878/EventMachine-scalable-non-blocking-i-o-in-ruby" style="margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; -x-system-font: none; display: block; text-decoration: underline;">EventMachine: scalable non-blocking i/o in ruby</a> <object id="doc_298923438833050" name="doc_298923438833050" height="600" width="100%" type="application/x-shockwave-flash" data="http://d1.scribdassets.com/ScribdViewer.swf" style="outline:none;" ><param name="movie" value="http://d1.scribdassets.com/ScribdViewer.swf"><param name="wmode" value="opaque"><param name="bgcolor" value="#ffffff"><param name="allowFullScreen" value="true"><param name="allowScriptAccess" value="always"><param name="FlashVars" value="document_id=28253878&#038;access_key=key-1rb2iijpl7bew7i1f04i&#038;page=1&#038;viewMode=slideshow"><embed id="doc_298923438833050" name="doc_298923438833050" src="http://d1.scribdassets.com/ScribdViewer.swf?document_id=28253878&#038;access_key=key-1rb2iijpl7bew7i1f04i&#038;page=1&#038;viewMode=slideshow" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="600" width="100%" wmode="opaque" bgcolor="#ffffff"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/eventmachine-scalable-non-blocking-io-in-ruby/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Garbage Collection Slides from LA Ruby Conference</title>
		<link>http://timetobleed.com/garbage-collection-slides-from-la-ruby-conference/</link>
		<comments>http://timetobleed.com/garbage-collection-slides-from-la-ruby-conference/#comments</comments>
		<pubDate>Sat, 20 Feb 2010 22:03:14 +0000</pubDate>
		<dc:creator>Aman Gupta</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[GC]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[profiling]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1569</guid>
		<description><![CDATA[Garbage Collection and the Ruby Heap]]></description>
			<content:encoded><![CDATA[<p><a title="View Garbage Collection and the Ruby Heap on Scribd" href="http://www.scribd.com/doc/27174770/Garbage-Collection-and-the-Ruby-Heap" style="margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; -x-system-font: none; display: block; text-decoration: underline;">Garbage Collection and the Ruby Heap</a> <object id="doc_629766057039419" name="doc_629766057039419" height="600" width="100%" type="application/x-shockwave-flash" data="http://d1.scribdassets.com/ScribdViewer.swf" style="outline:none;" ><param name="movie" value="http://d1.scribdassets.com/ScribdViewer.swf"><param name="wmode" value="opaque"><param name="bgcolor" value="#ffffff"><param name="allowFullScreen" value="true"><param name="allowScriptAccess" value="always"><param name="FlashVars" value="document_id=27174770&#038;access_key=key-2g5x6qhwa28yz3ia1hih&#038;page=1&#038;viewMode=slideshow"><embed id="doc_629766057039419" name="doc_629766057039419" src="http://d1.scribdassets.com/ScribdViewer.swf?document_id=27174770&#038;access_key=key-2g5x6qhwa28yz3ia1hih&#038;page=1&#038;viewMode=slideshow" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="600" width="100%" wmode="opaque" bgcolor="#ffffff"></embed></object>	</p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/garbage-collection-slides-from-la-ruby-conference/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>String together global offset tables to build a Ruby memory profiler</title>
		<link>http://timetobleed.com/string-together-global-offset-tables-to-build-a-ruby-memory-profiler/</link>
		<comments>http://timetobleed.com/string-together-global-offset-tables-to-build-a-ruby-memory-profiler/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 12:59:56 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[profiling]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1539</guid>
		<description><![CDATA[Disclaimer: The tricks, techniques, and ugly hacks in this article are PLATFORM SPECIFIC, DANGEROUS, and NOT PORTABLE. This is the third article in a series of articles describing a set of low level hacks that I used to create memprof [...]]]></description>
			<content:encoded><![CDATA[<p><center><img src="http://timetobleed.com/images/got.jpg" alt="" width="400" height="300" /></center></p>
<h2>Disclaimer</h2>
<p><i>The tricks, techniques, and ugly hacks in this article are  <b>PLATFORM SPECIFIC</b>, <b>DANGEROUS</b>, and <b>NOT PORTABLE</b>. </i></p>
<p>This is the third article in a series describing a set of low-level hacks that I used to create <a href="http://github.com/ice799/memprof">memprof</a>, a Ruby-level memory profiler. <b>You should be able to survive without reading the other articles in this series</b>, but you can check them out <a href="http://timetobleed.com/rewrite-your-ruby-vm-at-runtime-to-hot-patch-useful-features/">here</a> and <a href="http://timetobleed.com/hot-patching-inlined-functions-with-x86_64-asm-metaprogramming/">here</a>.</p>
<h2>How is this different from the other hooking articles/techniques?</h2>
<p>The previous articles explained how to insert trampolines in the <code>.text</code> segment of a binary. This article explains a cool technique for hooking functions in the <code>.text</code> segment of <i>shared libraries</i>, allowing your handler to run, and then resuming execution. Hooking shared libraries turns out to be less work than hooking the binary (in the case of Ruby, that is), but making it all happen was a bit tricky. Read on to learn more.</p>
<h2>The &#8220;problem&#8221; with shared libraries</h2>
<p>The problem is that if a trampoline is inserted into the code of the shared library, the trampoline will need to invoke the dynamic linker to resolve the function that is being hooked, call the function, do whatever additional logic is desired, and then resume execution.</p>
<p><b>In other words, you need to (somehow) insert a trampoline for a function that will call the function being trampolined without ending up in an infinite loop.</b></p>
<p>The additional complexity occurs because when shared libraries are loaded, the kernel decides at runtime where exactly in memory the library should be loaded. Since the exact location of symbols is not known at link time, a procedure linkage table (<code>.plt</code>) is created so that the program and the dynamic linker can work together to resolve symbol addresses.</p>
<p>I explained how <code>.plt</code>s work in a <a href="http://timetobleed.com/extending-ltrace-to-make-your-rubypythonperlphp-apps-faster/">previous article</a>, but looking at this again is worthwhile. I&#8217;ve simplified the explanation a bit<sup>1</sup>, but at a high level:</p>
<ol>
<li>The program calls a function in a shared object. The link editor has arranged for that call to jump to a stub function in the <code>.plt</code>.</li>
<li>The program sets some data up for the dynamic linker and then hands control over to it.</li>
<li>The dynamic linker looks at the info set up by the program and writes the absolute address of the called function into the matching entry in the global offset table (<code>.got</code>).</li>
<li>Then the dynamic linker calls the function.</li>
<li>Subsequent calls to the same function jump to the same stub in the <code>.plt</code>, but after the first call the absolute address is already in the <code>.got</code>, so the stub jumps straight to the function without invoking the dynamic linker.</li>
</ol>
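<p>The lazy binding steps above can be modeled with a toy Ruby sketch. This is only an illustration: <code>ToyLinker</code>, <code>call_via_plt</code>, and the lambda standing in for <code>rb_newobj</code> are made-up names, and a real dynamic linker works on machine addresses rather than Ruby objects.</p>
<pre class="prettyprint">
# Toy model of lazy binding. The GOT is a table that starts out empty
# and is filled in by the "dynamic linker" on the first call.
# All names here are hypothetical; this is not how ld.so is implemented.
class ToyLinker
  attr_reader :lookups

  def initialize(symbols)
    @symbols = symbols   # symbol name to "absolute address" (a callable)
    @got = {}            # an empty slot means "not resolved yet"
    @lookups = 0
  end

  # Plays the role of the .plt stub: jump through the .got if the slot
  # is filled, otherwise have the "dynamic linker" fill it first.
  def call_via_plt(name, *args)
    @got[name] ||= resolve(name)
    @got[name].call(*args)
  end

  private

  # Plays the role of the dynamic linker's symbol lookup.
  def resolve(name)
    @lookups += 1
    @symbols.fetch(name)
  end
end

linker = ToyLinker.new(rb_newobj: lambda { :new_object })
3.times { linker.call_via_plt(:rb_newobj) }
puts "calls: 3, symbol lookups: #{linker.lookups}"
</pre>
<p>Only the first call pays for the lookup. Every later call finds the address already sitting in the table.</p>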
<p>Disassembling a short Ruby VM function that calls <code>rb_newobj</code> (a memory allocation routine that we&#8217;d like to hook), shows the calls to the <code>.plt</code>:</p>
<p><pre class="prettyprint">
000000000001af10 &lt;ary_alloc&gt;:
   . . . .
   1af14:       e8 e7 c6 ff ff          callq  17600 [rb_newobj@plt]
   . . . .
</pre>
</p>
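<p>As a sanity check, the target of that <code>callq</code> can be recomputed from the raw bytes. The sketch below assumes the usual x86_64 encoding for a near call: opcode <code>e8</code> followed by a signed 32-bit little-endian displacement relative to the address of the next instruction. The variable names are mine.</p>
<pre class="prettyprint">
# Decode "e8 e7 c6 ff ff" at address 0x1af14 from the listing above.
insn_addr = 0x1af14
bytes = [0xe8, 0xe7, 0xc6, 0xff, 0xff]

# Bytes 1..4 are a little-endian 32-bit displacement.
rel32 = bytes[1, 4].pack("C*").unpack1("V")
rel32 -= 2**32 if rel32[31] == 1   # sign bit set: the displacement is negative

# The displacement is relative to the instruction after the call.
target = insn_addr + bytes.size + rel32
puts format("callq target: 0x%x", target)
</pre>
<p>The result lines up with the address of the <code>rb_newobj@plt</code> stub printed by the disassembler.</p>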
<p>
Let&#8217;s take a look at the corresponding <code>.plt</code> stub:</p>
<pre class="prettyprint">
0000000000017600 &lt;rb_newobj@plt&gt;:
   17600:       ff 25 6a 9c 2c 00       jmpq   *0x2c9c6a(%rip) # 2e1270 [_GLOBAL_OFFSET_TABLE_+0x288]
   17606:       68 4e 00 00 00          pushq  $0x4e
   1760b:       e9 00 fb ff ff          jmpq   17110 &lt;_init+0x18&gt;
</pre>
</p>
<p><b><u>Important fact:</u></b> The program and each shared library have their own <code>.plt</code> and <code>.got</code> sections (amongst other sections). Keep this in mind as it&#8217;ll be handy very shortly.</p>
<p>That is a lot of stub code to reproduce in the trampoline. Reproducing it shouldn&#8217;t be hard, but it invites a large number of bugs over to play. <i>Is there a better way?</i></p>
<h2>What is a global offset table (<code>.got</code>)?</h2>
<p>The global offset table (<code>.got</code>) is a table of absolute addresses that can be filled in at runtime. In the assembly dump above, the <code>.got</code> entry for <code>rb_newobj</code> is referenced in the <code>.plt</code> stub code.</p>
<h2>Intercepting a function call</h2>
<p>It would be <b>awesome</b> if it were possible to overwrite the <code>.got</code> entry for <code>rb_newobj</code> and insert the address of a trampoline. But how would the intercepting function call <code>rb_newobj</code> itself without ending up in an infinite loop?</p>
<p>The <b>important fact</b> above comes in to save the day.</p>
<p>Since each shared object has its own <code>.plt</code> and <code>.got</code> sections, it is possible to overwrite the <code>.got</code> entry for <code>rb_newobj</code> in <i>every shared object except for the object where the trampoline lives</i>. Then, when <code>rb_newobj</code> is called, the <code>.plt</code> entry will redirect execution to the trampoline. The trampoline then calls out to its <code>.plt</code> entry for <code>rb_newobj</code> which is left untouched allowing <code>rb_newobj</code> to be resolved and called out to successfully.</p>
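<p>Here is a toy Ruby model of that idea, with each &#8220;shared object&#8221; reduced to a Hash standing in for its <code>.got</code>. Names such as <code>libruby_got</code>, <code>profiler_got</code>, and <code>hook</code> are hypothetical and only serve the illustration.</p>
<pre class="prettyprint">
# Toy model: each "shared object" has its own GOT (a Hash of callables).
calls = []
real_rb_newobj = lambda { calls.push(:rb_newobj); :object }

libruby_got  = { rb_newobj: real_rb_newobj }  # GOT of the object being hooked
profiler_got = { rb_newobj: real_rb_newobj }  # GOT of the object holding the trampoline

# The trampoline calls rb_newobj through its OWN, unpatched GOT.
hook = lambda do
  calls.push(:hook)
  profiler_got[:rb_newobj].call
end

# Poison only the other object's GOT entry.
libruby_got[:rb_newobj] = hook

result = libruby_got[:rb_newobj].call
p calls    # the order in which the two functions ran
p result
</pre>
<p>Because the trampoline never looks at the poisoned table, it reaches the real function exactly once and returns.</p>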
<h2>Not as easy as it sounds, though</h2>
<p>This solution is less work than the other hooking methods, but it has its own particular details as well:</p>
<ol>
<li>You&#8217;ll need to walk the link map at runtime to determine the base address for the shared library you are hooking (it could be anywhere).</li>
<li>Next, you&#8217;ll need to parse the <code>.rela.plt</code> section which contains information on the location of each <code>.plt</code> stub, relative to the base address of the shared object.</li>
<li>Once you have the address of the <code>.plt</code> stub, you&#8217;ll need to determine the absolute address of the <code>.got</code> entry by parsing the first instruction of the <code>.plt</code> stub (a <code>jmp</code>) as seen in the disassembly above.</li>
<li>Finally, you can write to the <code>.got</code> entry the address of your trampoline, as long as the trampoline <b>lives in a different shared library</b>.</li>
</ol>
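<p>Step 3 is mostly arithmetic, so it can be sketched in a few lines of Ruby. This assumes the stub begins with <code>ff 25</code> followed by a 32-bit displacement (<code>jmpq *disp32(%rip)</code>), as in the disassembly above. <code>got_entry_for</code> is a hypothetical helper name, not something taken from memprof.</p>
<pre class="prettyprint">
# Compute the address of the .got entry that a .plt stub jumps through.
# Assumption: the stub starts with "ff 25 disp32", i.e. jmpq *disp32(%rip).
def got_entry_for(stub_addr, stub_bytes)
  opcode = stub_bytes[0, 2]
  raise ArgumentError, "not a jmpq *disp32(%rip) stub" unless opcode == [0xff, 0x25]

  disp32 = stub_bytes[2, 4].pack("C*").unpack1("V")
  disp32 -= 2**32 if disp32[31] == 1   # treat the displacement as signed
  next_insn = stub_addr + 6            # %rip points just past the 6-byte jmpq
  next_insn + disp32
end

stub = [0xff, 0x25, 0x6a, 0x9c, 0x2c, 0x00]
got_entry = got_entry_for(0x17600, stub)
puts format("0x%x", got_entry)
</pre>
<p>With the stub address taken from the static disassembly, the result is an offset within the library. At runtime you would pass the stub&#8217;s real address (base address plus offset) to get an absolute address.</p>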
<p>You&#8217;ve now successfully managed to poison the <code>.got</code> entry of a symbol in one shared library to direct execution to your own function which can then call the intercepted function itself without getting stuck in an infinite loop.</p>
<h2>Conclusion</h2>
<ul>
<li>There are lots of sections in each ELF object. Each section is special and important.</li>
<li>ELF documentation can be difficult to obtain and understand.</li>
<li>Got pretty lucky this time around. I was getting a little worried that it would get complicated. Made it out alive, though.</li>
</ul>
<h2>References</h2>
<ol class="footnotes"><li id="footnote_0_1539" class="footnote"><a href="http://www.x86-64.org/documentation/abi.pdf">System V Application Binary Interface AMD64 Architecture Processor Supplement, p 78</a></li></ol>]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/string-together-global-offset-tables-to-build-a-ruby-memory-profiler/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is a ruby object? (introducing Memprof.dump)</title>
		<link>http://timetobleed.com/what-is-a-ruby-object-introducing-memprof-dump/</link>
		<comments>http://timetobleed.com/what-is-a-ruby-object-introducing-memprof-dump/#comments</comments>
		<pubDate>Mon, 14 Dec 2009 12:59:52 +0000</pubDate>
		<dc:creator>Aman Gupta</dc:creator>
				<category><![CDATA[debugging]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[GC]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[profiling]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1426</guid>
		<description><![CDATA[After Joe released memprof a few days ago, I started thinking about ways to add more functionality. The initial Memprof release only offered a simple stats api, inspired by the one in bleak_house [...]]]></description>
			<content:encoded><![CDATA[<p><img alt="" src="http://timetobleed.com/images/blueprint.jpg" class="aligncenter" width="360" height="307" /></p>
<style type="text/css">
#mdump pre.prettyprint{
  font-size: 0.85em;
  padding: 0.85em;
}
#mdump pre.output{
  margin-top: 0.3em;
  margin-bottom: 1.4em
}
div.main #mdump h2{
  padding-top: 1em;
  border-bottom: 1px solid black;
}
#mdump h3{
  margin-top: 0.45em;
  font-size: 1.4em;
  text-decoration: underline;
}
#mdump ul.links{
  padding-left: 2em;
}
#mdump ul.links li {
  margin: 0;
  padding: 0;
}
</style>
<div id="mdump">
After <a href="http://twitter.com/joedamato">Joe</a> released memprof a few days ago, <a href="http://twitter.com/tmm1">I</a> started thinking about ways to add more functionality.</p>
<p>The initial Memprof release only offered a simple stats api, inspired by the one in bleak_house:</p>
<pre class="prettyprint">
require 'memprof'
Memprof.start
o = Object.new
Memprof.stats
</pre>
<pre class="prettyprint output">
      1 test.rb:3:Object
</pre>
<p>With the help of <a href="http://twitter.com/lloydhilaiel">lloyd</a>&#8217;s excellent <a href="http://github.com/lloyd/yajl">yajl json library</a>, I&#8217;ve slowly been building a <a href="http://github.com/ice799/memprof/commits/heap_dump">full-featured heap dumper</a>: <code>Memprof.dump</code>.</p>
<pre class="prettyprint">
require 'memprof'
Memprof.start
[]
Memprof.dump
</pre>
<pre class="prettyprint output">
[
  {
    "address": "0xea52f0",
    "source": "test.rb:3",
    "type": "array",
    "length": 0
  }
]
</pre>
<h3 style="padding-top: 1em">Where can I find it?</h3>
<p>This new heap dumper will be in the next release of Memprof. If you want to play with it, check out the <a href="http://github.com/ice799/memprof/commits/heap_dump">heap_dump branch</a> on github.</p>
<h3>What else is planned?</h3>
<p>Over the next few days, I&#8217;m going to add a <code>Memprof.dump_all</code> method to dump out the entire ruby heap. This full dump will contain complete knowledge of the ruby object graph (what objects point to other objects), and its json format will allow for easy analysis. I&#8217;m envisioning a set of post-processing tools that can find leaks, calculate object memory usage, and generate various visualizations of memory consumption and object hierarchies.</p>
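<p>As a rough idea of what such post-processing could look like, here is a small sketch using Ruby&#8217;s standard <code>json</code> library. The sample data is hypothetical and only modeled on the <code>Memprof.dump</code> output shown in this post. The real <code>dump_all</code> format may differ.</p>
<pre class="prettyprint">
require 'json'

# Hypothetical sample, shaped like the Memprof.dump output in this post.
dump = '[
  {"address": "0x1823dd8", "type": "object", "ivars": {"@pi": "0x1823da0"}},
  {"address": "0x1823da0", "type": "float", "data": 3.14159},
  {"address": "0xea52f0", "type": "array", "length": 0}
]'

objects = JSON.parse(dump)

# Count objects by type.
counts = Hash.new(0)
objects.each { |obj| counts[obj["type"]] += 1 }

# Collect addresses referenced from instance variables
# (these are edges in the object graph).
referenced = objects.flat_map { |obj| (obj["ivars"] || {}).values }

p counts
p referenced
</pre>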
<h3>Why should I care?</h3>
<p>In building and testing <code>Memprof.dump</code>, I&#8217;ve learned a lot about different types of ruby objects. The rest of this post covers interesting details about common ruby objects, with examples of how they&#8217;re created and what they look like inside the MRI VM.</p>
<p><span id="more-1426"></span></p>
<h2>Objects and Floats</h2>
<pre class="prettyprint">
o = Object.new
o.instance_variable_set(:@pi, 3+0.14159)
</pre>
<pre class="prettyprint output">
  {
    "address": "0x1823dd8",
    "source": "test.rb:3",
    "type": "object",
    "class": "0x1854b38",
    "class_name": "Object",
    "ivars": {
      "@pi": "0x1823da0"
    }
  }
</pre>
<p>This ruby object points to its class (<code>Object 0x1854b38</code>) and has some instance variables. Here, there&#8217;s only one, named <code>@pi</code>, which points to another object at <code>0x1823da0</code>.</p>
<p>The address <code>0x1823da0</code> belongs to a <code>float object</code>. This float was created on the heap when MRI executed the code <code>3 + 0.14159</code>.</p>
<pre class="prettyprint output">
  {
    "address": "0x1823da0",
    "source": "test.rb:4",
    "type": "float",
    "data": 3.14159
  }
</pre>
<p>The float <code>0.14159</code> used in the addition also lives on the heap, but it is created upfront once when the ruby source is parsed.</p>
<h2>Strings</h2>
<p>Unlike floats, new string objects are created every time ruby encounters a string in its execution path.</p>
<pre class="prettyprint">
1.times{"abc"}
</pre>
<pre class="prettyprint output">
  {
    "type": "string",
    "shared": "0x15136a0",
    "flags": ["elts_shared"]
  }
</pre>
<p>This newly created <code>string object</code> has no character data associated with it; instead, it is marked <code>elts_shared</code> and points to <code>0x15136a0</code>. In this case, <code>0x15136a0</code> is another string object, one that holds the actual data &#8220;abc&#8221; and was created earlier when the ruby source was parsed.</p>
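<p>The &#8220;new object every time&#8221; part is easy to observe from plain Ruby, even without memprof. The sketch below assumes the file does not use the <code>frozen_string_literal</code> magic comment. The sharing of the underlying character data is an MRI internal and is not visible this way.</p>
<pre class="prettyprint">
# Each evaluation of the literal yields a distinct String object.
ids = 3.times.map { "abc".object_id }
puts ids.uniq.size   # number of distinct objects created by the literal
</pre>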
<h2>Arrays and Fixnums</h2>
<pre class="prettyprint">
[1,2,3,"hello"]
</pre>
<pre class="prettyprint output">
  {
    "type": "array",
    "length": 4,
    "data": [
      1,
      2,
      3,
      "0x12aa0c0"
    ]
  }
</pre>
<p>The fixnums <code>1</code>, <code>2</code> and <code>3</code> in the array are immediates, so they live in the array itself and do not occupy slots on the ruby heap<sup>1</sup>. The fourth member is the <code>string object</code> &#8220;hello&#8221; that lives at <code>0x12aa0c0</code>.</p>
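<p>You can see the &#8220;immediate&#8221; nature of small integers from Ruby itself. This sketch relies on an MRI implementation detail: a small integer <code>n</code> is encoded directly in the reference as <code>2n + 1</code>, and <code>object_id</code> reports that same value.</p>
<pre class="prettyprint">
# MRI-specific: small integers are encoded in the reference itself.
ids = [1, 2, 3].map { |n| n.object_id }
ids.each_with_index do |id, i|
  puts "#{i + 1}.object_id = #{id}"
end

# Two occurrences of the same integer are the very same "object".
same = 1.equal?(1)
puts same
</pre>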
<h2>Hashes and Symbols</h2>
<pre class="prettyprint">
{:a=>1,"b"=>:c}
</pre>
<pre class="prettyprint output">
  {
    "type": "hash",
    "length": 2,
    "default": null,
    "data": {
      "0xd13378": ":c",
      ":a": 1
    }
  }
</pre>
<p>The symbols <code>:a</code> and <code>:c</code> are also immediates, so they live directly inside the hash&#8217;s data table. The key for &#8220;b&#8221; is a pointer to that string object at <code>0xd13378</code>.</p>
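<p>The same kind of identity check works for symbols. A minimal sketch (variable names are mine):</p>
<pre class="prettyprint">
# Every occurrence of :a refers to the one and only symbol :a.
first = :a
second = :a
same_object = first.equal?(second)
same_id = (first.object_id == second.object_id)
puts same_object
puts same_id
</pre>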
<h2>Blocks and Data</h2>
<p>Hashes can also be created with a default block.</p>
<pre class="prettyprint">
Hash.new{|h,k| h[k] = k; h }
</pre>
<pre class="prettyprint output">
  {
    "type": "hash",
    "length": 0,
    "default": "0xcca208"
  },
  {
    "address": "0xcca208",
    "type": "data",
    "class": "0xcced80",
    "class_name": "Proc"
  }
</pre>
<p>In this case, the block is converted to a new Proc <code>data object</code> that holds a reference to an internal <code>struct BLOCK</code><sup>2</sup>. The new hash&#8217;s <code>default</code> field points to the address of the Proc.</p>
<p><code>Data objects</code> are commonly created by C extensions to point to external memory that needs to be marked and freed using ruby&#8217;s garbage collector.</p>
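<p>From the Ruby side, the Proc behind a hash&#8217;s default block is reachable through <code>Hash#default_proc</code>. A small sketch (the block here returns the stored value rather than the hash, purely to keep the example short):</p>
<pre class="prettyprint">
h = Hash.new { |hash, key| hash[key] = key }

blk = h.default_proc
puts blk.class      # the block was converted to a Proc

value = h[:missing] # the Proc runs for a missing key and stores it
puts value
puts h.size
</pre>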
<h2>Classes</h2>
<p>A simple class definition creates many objects on the heap.</p>
<pre class="prettyprint">
class MyClass; end
</pre>
<p>First is the class itself, along with the class&#8217;s string representation (pointed to by an internal ivar <code>__classpath__</code>). Notice the class object holds a reference to its superclass.</p>
<pre class="prettyprint output">
  {
    "address": "0x29f3228",
    "type": "class",
    "name": "MyClass",
    "super": "0x2a23b28",
    "super_name": "Object",
    "ivars": {
      "__classpath__": "0x29f31b8"
    }
  },
  {
    "address": "0x29f31b8",
    "type": "string",
    "length": 7,
    "data": "MyClass"
  }
</pre>
<p>The class definition also creates two more objects: an internal CREF node, and <strong>another</strong> <i>singleton</i> class with no name that is <code>__attached__</code> to MyClass.</p>
<pre class="prettyprint output">
  {
    "type": "node",
    "node_type": "CREF"
  },
  {
    "type": "class",
    "name": null,
    "super": "0x2a23a80",
    "super_name": null,
    "singleton": true,
    "ivars": {
      "__attached__": "0x29f3228"
    }
  }
</pre>
<p>This singleton is <code>MyClass</code>&#8217;s metaclass, where singleton methods and instance variables are added.</p>
<pre class="prettyprint">
MyClass.instance_variable_set(:@a, 123)
</pre>
<pre class="prettyprint output">
  {
    "type": "class",
    "name": null,
    "singleton": true,
    "ivars": {
      "__attached__": "0x29f3228",
      "@a": 123
    }
  }
</pre>
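<p>Modern Ruby lets you poke at the metaclass directly. The sketch below uses <code>singleton_class</code> and <code>singleton_class?</code>, which exist in current Ruby releases but may not be available in the older MRI that this post dumps. <code>build</code> is a made-up method name.</p>
<pre class="prettyprint">
class MyClass; end

# A singleton method: it lives on MyClass's metaclass, not on MyClass.
def MyClass.build
  new
end

meta = MyClass.singleton_class
p meta.singleton_class?          # is this a singleton class?
p meta.instance_methods(false)   # methods defined directly on the metaclass
</pre>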
<h3>Constants, Class and Instance Variables</h3>
<p>Classes store both constants and class variables along with the instance variables.</p>
<pre class="prettyprint">
class MyClass
  A=1
  @@b=2
  @c=3
end
</pre>
<pre class="prettyprint output">
  {
    "type": "class",
    "name": "MyClass",
    "ivars": {
      "@@b": 2,
      "A": 1,
      "@c": 3
    }
  }
</pre>
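<p>Although MRI keeps all three in one table internally, Ruby&#8217;s reflection API reports them separately. A short sketch using standard <code>Module</code> and <code>Object</code> methods:</p>
<pre class="prettyprint">
class MyClass
  A = 1
  @@b = 2
  @c = 3
end

p MyClass.constants(false)      # constants defined directly in MyClass
p MyClass.class_variables       # class variables
p MyClass.instance_variables    # instance variables of the class object
</pre>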
<h3>Methods</h3>
<p>Methods are stored in a separate method table and represented by <code>METHOD node objects</code> which hold the method body.</p>
<pre class="prettyprint">
class MyClass
  def d() end
end
</pre>
<pre class="prettyprint output">
  {
    "type": "class",
    "name": "MyClass",
    "methods": {
      "d": "0xb7ec30"
    }
  },
  {
    "address": "0xb7ec30",
    "type": "node",
    "node_type": "METHOD"
  }
</pre>
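<p>The method table is also visible through reflection. This sketch only uses standard <code>Module</code> methods. It shows the Ruby-level view, not the internal node objects.</p>
<pre class="prettyprint">
class MyClass
  def d() end
end

names = MyClass.instance_methods(false)   # entries in MyClass's own method table
m = MyClass.instance_method(:d)

p names
p m.class
p m.owner
</pre>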
<h2>Method Invocation</h2>
<pre class="prettyprint">
def test()
  a=1
  b=:b
  c='c'
  Memprof.dump<sup>3</sup>
end
test()
</pre>
<pre class="prettyprint output">
  {
    "type": "scope",
    "node": "0xa9bdd0",
    "variables": {
      "_": null,
      "~": null,
      "a": 1,
      "b": ":b",
      "c": "0xb60ce8"
    }
  }
</pre>
<p>During method invocation, a new <code>scope object</code> is created on the heap. This scope points to the <code>node object</code> representing the method body, and has a list of all local variables.</p>
<p>The local variables include the Perl-style ruby magic variables <code>$_</code> and <code>$~</code>.</p>
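<p>Without memprof, the closest Ruby-level view of a scope is <code>Binding#local_variables</code>. Note that, unlike the dump above, this reflection call in current Ruby does not list <code>$_</code> and <code>$~</code>.</p>
<pre class="prettyprint">
def test
  a = 1
  b = :b
  c = 'c'
  binding.local_variables   # names of the locals in this method's scope
end

names = test
p names
</pre>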
<h2>Modules and IClasses</h2>
<p>Modules in ruby are similar to classes and have the same associated strings and CREF nodes created with them.</p>
<pre class="prettyprint">
module MyModule; end
</pre>
<pre class="prettyprint output">
  {
    "address": "0xe82248",
    "type": "module",
    "name": "MyModule",
    "super": false,
    "ivars": {
      "__classpath__": "0x208eda8",
      "__classid__": ":MyModule"
    }
  }
</pre>
<p>When a module is included into a class, an extra <code>iclass object</code> is created:</p>
<pre class="prettyprint">
class MyClass
  include MyModule
end
</pre>
<pre class="prettyprint output">
  {
    "address": "0x208ecc8",
    "source": "-e:1",
    "type": "iclass",
    "super": "0x20bfb40",
    "super_name": "Object",
    "ivars": {
      "__classpath__": "0x208eda8",
      "__classid__": ":MyModule"
    }
  }
</pre>
<p>This new <code>iclass</code> points to <code>MyClass</code>&#8217;s old superclass, and shares its instance variable and method tables with <code>MyModule</code>. Once created, this <code>iclass</code> becomes <code>MyClass</code>&#8217;s new superclass.</p>
<pre class="prettyprint output">
  {
    "type": "class",
    "name": "MyClass",
    "super": "0x208ecc8",
    "super_name": "MyModule",
  }
</pre>
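<p>From Ruby code the <code>iclass</code> is never visible directly, but its effect on the lookup chain is. A minimal sketch with the same class and module names as above:</p>

```ruby
module MyModule; end
class MyClass; end

before = MyClass.ancestors.first(2)

class MyClass
  include MyModule
end

after              = MyClass.ancestors.first(3)
visible_superclass = MyClass.superclass
```

<p><code>ancestors</code> shows <code>MyModule</code> inserted between <code>MyClass</code> and <code>Object</code>, while <code>superclass</code> still answers <code>Object</code> because it skips over included modules.</p>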
<h2>and more..</h2>
<p>Ruby has various other internal object types, including Regexps, Matches, Bignums, Structs, Files, Varmaps, and almost 130 different types of Nodes. Memprof will eventually be able to dump out all these objects in individual detail.
</p></div>
<ol class="footnotes"><li id="footnote_0_1426" class="footnote">Fixnums can, however, <a href="http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/17321">still have instance variables</a></li><li id="footnote_1_1426" class="footnote">Future versions of memprof will print out <code>struct BLOCK</code>s in more detail, to show all references held by ruby procs</li><li id="footnote_2_1426" class="footnote">Memprof.dump was called in the method body, because the scope is freed explicitly when the method ends (unless it is referenced by a block).</li></ol>]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/what-is-a-ruby-object-introducing-memprof-dump/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>memprof: A Ruby level memory profiler</title>
		<link>http://timetobleed.com/memprof-a-ruby-level-memory-profiler/</link>
		<comments>http://timetobleed.com/memprof-a-ruby-level-memory-profiler/#comments</comments>
		<pubDate>Fri, 11 Dec 2009 12:59:43 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[GC]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[profiling]]></category>
		<category><![CDATA[system health]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1398</guid>
		<description><![CDATA[If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter. What is memprof and why do I care? memprof is a Ruby gem which supplies memory profiler functionality similar to bleak_house without patching the Ruby VM. You just install the gem, call a function or two, and off you go. [...]]]></description>
			<content:encoded><![CDATA[<p><center><img src="http://timetobleed.com/images/memory.jpg" alt="" width="300" height="200" /></center><br />
If you enjoy this article, <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>What is memprof and why do I care?</h2>
<p>memprof is a Ruby gem which supplies memory profiler functionality similar to bleak_house <b>without</b> patching the Ruby VM. You just install the gem, call a function or two, and off you go.</p>
<h2>Where do I get it?</h2>
<p>memprof is available on gemcutter, so you can just:</p>
<p><b><code>gem install memprof</code></b></p>
<p>Feel free to browse the source code at: <a href="http://github.com/ice799/memprof">http://github.com/ice799/memprof</a>.</p>
<h2>How do I use it?</h2>
<p>Using memprof is simple. Before we look at some examples, let me explain more precisely what memprof is measuring.</p>
<p>memprof is measuring the number of objects created and not destroyed during a segment of Ruby code. The ideal use case for memprof is to show you where objects that do not get destroyed are being created: </p>
<ul>
<li>Objects are created and not destroyed when you create new classes. This is a good thing.</li>
<li>Sometimes garbage objects sit around until <code>garbage_collect</code> has had a chance to run. These objects will go away.</li>
<li>Yet in other cases you might be holding a reference to a large chain of objects without knowing it. Until you remove this reference, the entire chain of objects will remain in memory taking up space.</li>
</ul>
<p>memprof will show objects created in all cases listed above.</p>
<p>OK, now let&#8217;s take a look at two examples and their output.</p>
<p>A simple program with an obvious memory &#8220;leak&#8221;:</p>
<pre class="prettyprint">
require 'memprof'

@blah = Hash.new([])

Memprof.start
100.times {
  @blah[1] << "aaaaa"
}

1000.times {
   @blah[2] << "bbbbb"
}
Memprof.stats
Memprof.stop
</pre>
<p>This program creates 1100 objects which are not destroyed between the calls to <code>start</code> and <code>stop</code>, because a reference is held to each object created.</p>
<p>Let's look at the output from memprof:</p>
<pre>
   1000 test.rb:11:String
    100 test.rb:7:String
</pre>
<p>In this example memprof shows the 1100 objects created, broken down by file, line number, and type.</p>
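<p>One detail of the first example is easy to miss: <code>Hash.new([])</code> hands out the same default array for every missing key, and <code>&lt;&lt;</code> mutates that array without ever storing a key. The sketch below shows this in plain Ruby with no memprof involved (a local variable is used instead of <code>@blah</code>):</p>

```ruby
blah = Hash.new([])

100.times  { blah[1] << "aaaaa" }
1000.times { blah[2] << "bbbbb" }

stored_keys = blah.size                 # no key was ever assigned
shared      = blah[1].equal?(blah[2])   # both lookups return the default array
retained    = blah.default.size         # every string ended up in that one array
```

<p>All 1100 strings stay reachable through the single default array, which is why none of them can be collected while the hash is alive.</p>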
<p>Let's take a look at another example:</p>
<pre class="prettyprint">
require 'memprof'
Memprof.start
require "stringio"
StringIO.new
Memprof.stats
</pre>
<p>This simple program measures the number of objects created when requiring <code>stringio</code> and instantiating a <code>StringIO</code>.</p>
<p>Let's take a look at the output:</p>
<pre>
    108 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:__node__
     14 test2.rb:3:String
      2 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:Class
      1 test2.rb:4:StringIO
      1 test2.rb:4:String
      1 test2.rb:3:Array
      1 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:Enumerable
</pre>
<p>This output shows that 108 objects of the internal Ruby interpreter type <code>__node__</code> were created (these represent code), as well as a few <code>String</code>s and other objects. Some of these objects are just garbage objects which haven't had a chance to be recycled yet.</p>
<p>What if we nudge the garbage collector along a little bit just for our example? Let's add the following two lines of code to our previous example:</p>
<pre class="prettyprint">
GC.start
Memprof.stats
</pre>
<p>We're now nudging the garbage collector and outputting memprof stats information again. This should show fewer objects, as the garbage collector will recycle some of the garbage objects:</p>
<pre>
    108 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:__node__
      2 test2.rb:3:String
      2 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:Class
      1 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:Enumerable
</pre>
<p>As you can see above, a few <code>String</code>s and other objects went away after the garbage collector ran.</p>
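<p>The same before-and-after comparison can be roughly approximated without memprof. This sketch assumes MRI 1.9 or later, where <code>GC.count</code> and <code>ObjectSpace.count_objects</code> are available. The actual string counts depend on the interpreter and on what else is loaded, so no particular numbers are claimed here.</p>

```ruby
# Create some short-lived strings that nothing references afterwards.
10_000.times { |i| "garbage #{i}" }

runs_before    = GC.count
strings_before = ObjectSpace.count_objects[:T_STRING]

GC.start  # request a collection, as in the example above

runs_after    = GC.count
strings_after = ObjectSpace.count_objects[:T_STRING]
```

<p>Unlike memprof, these counters cannot say where the surviving objects were allocated. They only show that a collection ran and how many string slots are in use.</p>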
<h2>Which Rubies and systems are supported?</h2>
<ul>
<li>Only <b>unstripped</b> binaries are supported. To determine if your Ruby binary is stripped, simply run: <code>file `which ruby`</code>. If it is, consult your package manager's documentation. Most Linux distributions offer a package with an unstripped Ruby binary.</li>
<li>Only <b>x86_64</b> is supported at this time. Hopefully, I'll have time to add support for i386/i686 in the immediate future.</li>
<li>Linux Ruby Enterprise Edition (1.8.6 and 1.8.7) is supported.</li>
<li>Linux MRI Ruby 1.8.6 and 1.8.7 built with --disable-shared are supported. Support for --enable-shared binaries is <b>coming soon.</b></li>
<li>Snow Leopard support is <b>experimental</b> at this time.</li>
<li>Ruby 1.9 support <b>coming soon</b>.</li>
</ul>
<h2>How does it work?</h2>
<p>If you've been reading my blog over the last week or so, you'd have noticed two previous blog posts (<a href="http://timetobleed.com/rewrite-your-ruby-vm-at-runtime-to-hot-patch-useful-features/">here</a> and <a href="http://timetobleed.com/hot-patching-inlined-functions-with-x86_64-asm-metaprogramming/">here</a>) that describe some tricks I came up with for modifying a running binary image in memory.</p>
<p>memprof is a combination of all those tricks and other hacks to allow memory profiling in Ruby without the need for custom patches to the Ruby VM. You simply require the gem and off you go.</p>
<p>memprof works by inserting trampolines on object allocation and deallocation routines. It gathers metadata about the objects and outputs this information when the <code>stats</code> method is called.</p>
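<p>The bookkeeping side of that description can be modeled in a few lines of Ruby. This is only a conceptual sketch: memprof does its tracking in C from inside the hooked routines, and the names used here (<code>AllocationTable</code>, <code>on_alloc</code>, <code>on_free</code>) are hypothetical. The output layout simply imitates the listings shown earlier.</p>

```ruby
class AllocationTable
  def initialize
    @live = {}  # object address => "file:line:type" for objects not yet freed
  end

  # Would be called from the allocation hook.
  def on_alloc(address, file, line, type)
    @live[address] = "#{file}:#{line}:#{type}"
  end

  # Would be called from the deallocation hook.
  def on_free(address)
    @live.delete(address)
  end

  # One line per allocation site, most frequent first.
  def stats
    counts = Hash.new(0)
    @live.each_value { |site| counts[site] += 1 }
    counts.sort_by { |site, count| [-count, site] }
          .map { |site, count| format("%7d %s", count, site) }
  end
end

table = AllocationTable.new
table.on_alloc(0x10, "test.rb", 7, "String")
table.on_alloc(0x20, "test.rb", 7, "String")
table.on_alloc(0x30, "test.rb", 7, "String")
table.on_alloc(0x40, "test.rb", 11, "Array")
table.on_free(0x20)   # pretend the collector reclaimed this one
report = table.stats
```

<p>Only objects that were allocated and not yet freed remain in the table, which matches the &#8220;created and not destroyed&#8221; measurement described above.</p>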
<h2>What else is planned?</h2>
<p><a href="http://twitter.com/jakedouglas">Jake Douglas</a>, <a href="http://www.twitter.com/tmm1">Aman Gupta</a>, and <a href="http://twitter.com/joedamato">I</a> have lots of interesting ideas for new features. We don't want to ruin the surprise, but stay tuned. More cool stuff coming really soon :)</p>
<p>Thanks for reading and don't forget to <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/memprof-a-ruby-level-memory-profiler/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Hot patching inlined functions with x86_64 asm metaprogramming</title>
		<link>http://timetobleed.com/hot-patching-inlined-functions-with-x86_64-asm-metaprogramming/</link>
		<comments>http://timetobleed.com/hot-patching-inlined-functions-with-x86_64-asm-metaprogramming/#comments</comments>
		<pubDate>Thu, 10 Dec 2009 12:59:47 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[GC]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[profiling]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1331</guid>
		<description><![CDATA[If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter. Disclaimer The tricks, techniques, and ugly hacks in this article are PLATFORM SPECIFIC, DANGEROUS, and NOT PORTABLE. This article will make reference to information in my previous article Rewrite your Ruby VM at runtime to hot patch useful features so [...]]]></description>
			<content:encoded><![CDATA[<p><center><img src="http://timetobleed.com/images/hotpatch.jpg" alt="" width="200" height="300" /></center><br />
If you enjoy this article, <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>Disclaimer</h2>
<p><i>The tricks, techniques, and ugly hacks in this article are  <b>PLATFORM SPECIFIC</b>, <b>DANGEROUS</b>, and <b>NOT PORTABLE</b>. </i></p>
<p>This article will make reference to information in my previous article <a href="http://timetobleed.com/rewrite-your-ruby-vm-at-runtime-to-hot-patch-useful-features/">Rewrite your Ruby VM at runtime to hot patch useful features</a> so be sure to check it out if you find yourself lost during this article.</p>
<p>Also, this <b>might not qualify as metaprogramming in the traditional definition</b><sup>1</sup>, but this article will show how to generate assembly at runtime that works well with the particular instructions generated for a binary. In other words, the <b>assembly is constructed based on data collected from the binary at runtime</b>. When I explained this to <a href="http://twitter.com/tmm1">Aman</a>, he called it <i>assembly metaprogramming</i>.</p>
<h2>TLDR</h2>
<p>This article expands on a <a href="http://timetobleed.com/rewrite-your-ruby-vm-at-runtime-to-hot-patch-useful-features/">previous article</a> by showing how to hook functions which are inlined by the compiler. This technique can be applied to other binaries, but the binary in question is Ruby Enterprise Edition 1.8.7. The use case is to build a memory profiler without requiring patches to the VM, but just a Ruby gem.</p>
<h2>It&#8217;s on GitHub</h2>
<p>The memory profiler <b>is NOT DONE yet.</b> It will be soon. Stay tuned.</p>
<p>The code described here is incorporated into a Ruby gem which can be found on GitHub: <a href="http://github.com/ice799/memprof">http://github.com/ice799/memprof</a>, specifically at: <a href="http://github.com/ice799/memprof/blob/master/ext/memprof.c#L202-318">http://github.com/ice799/memprof/blob/master/ext/memprof.c#L202-318</a></p>
<h2>Overview of the plan of attack</h2>
<p>The plan of attack is relatively straightforward:</p>
<ol>
<li>Find the inlined code.</li>
<li>Overwrite part of it to redirect to a stub.</li>
<li>Call out to a handler from the stub.</li>
<li>Make sure the return path is sane.</li>
</ol>
<p>As simple as this seems, implementing these steps is actually a bit tricky.</p>
<h2>Finding pieces of inlined code</h2>
<p>Before finding pieces of inlined code, let&#8217;s first examine the C code we want to hook. I&#8217;m going to be showing how to hook the inline function <code>add_freelist</code>.</p>
<p>The code for <code>add_freelist</code> is short:</p>
<pre class="prettyprint">
static inline void
add_freelist(p)
    RVALUE *p;
{
    if (p->as.free.flags != 0)
        p->as.free.flags = 0;
    if (p->as.free.next != freelist)
        p->as.free.next = freelist;
    freelist = p;
}
</pre>
<p>There is one really important feature of this code which stands out almost immediately. <code>freelist</code> has (at least) compilation unit scope. This is <b>awesome</b> because <code>freelist</code> serves as a marker when searching for assembly instructions to overwrite. Since <code>freelist</code> has compilation unit scope, it&#8217;ll live at some static memory location.</p>
<p><b>If we find writes to this static memory location, we find our inline function code.</b></p>
<p>Let&#8217;s take a look at the instructions generated from this C code (unrelated instructions snipped out):</p>
<pre class="prettyprint">
  437f21:       48 c7 00 00 00 00 00    movq   $0x0,(%rax)
   . . . . .
  437f2c:       48 8b 05 65 de 2d 00    mov    0x2dde65(%rip),%rax  # 715d98 [freelist]
   . . . . .
  437f48:       48 89 05 49 de 2d 00    mov    %rax,0x2dde49(%rip)  # 715d98 [freelist]
</pre>
<p>The last instruction above updates <code>freelist</code>; it is the instruction generated for the C statement <code>freelist = p;</code>.</p>
<p>As you can see from the instruction, the destination is <code>freelist</code>. This makes it insanely easy to locate instances of this inline function. I just need to write a piece of C code which scans the binary image in memory, searching for <code>mov</code> instructions where the destination is <code>freelist</code>. Every match is an inlined instance of <code>add_freelist</code>.</p>
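<p>The search can be modeled on a plain byte string before touching a live process. The sketch below is written in Ruby rather than C and makes a few assumptions: the buffer is a NOP-filled stand-in for the process image, the two instructions are copied from the disassembly above at their original offsets, and the ModRM check is an extra safeguard added for this example. The helper name <code>find_freelist_stores</code> is hypothetical.</p>

```ruby
BASE     = 0x437f21   # address of the first byte in the buffer
FREELIST = 0x715d98   # address of freelist, from the disassembly comments

image = "\x90".b * 64
image[0x437f2c - BASE, 7] = "\x48\x8b\x05\x65\xde\x2d\x00".b  # load from freelist
image[0x437f48 - BASE, 7] = "\x48\x89\x05\x49\xde\x2d\x00".b  # store to freelist

def find_freelist_stores(image, base, freelist)
  hits = []
  (0..image.bytesize - 7).each do |offset|
    rex    = image.getbyte(offset)
    opcode = image.getbyte(offset + 1)
    modrm  = image.getbyte(offset + 2)
    next unless opcode == 0x89                # mov from register to memory
    next unless rex == 0x48 || rex == 0x4c    # 64-bit operand, low or high source register
    next unless (modrm & 0xc7) == 0x05        # RIP-relative destination (extra check)
    # The displacement is a signed 32-bit little-endian value added to the
    # address of the next instruction, 7 bytes past the start of this one.
    displacement     = image.byteslice(offset + 3, 4).unpack1("l<")
    next_instruction = base + offset + 7
    hits << base + offset if next_instruction + displacement == freelist
  end
  hits
end

store_sites = find_freelist_stores(image, BASE, FREELIST)
```

<p>Only the store at <code>0x437f48</code> is reported. The load at <code>0x437f2c</code> also references <code>freelist</code>, but its opcode is <code>0x8b</code>, so it is skipped.</p>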
<p>Why not insert a trampoline by overwriting that last <code>mov</code> instruction?</p>
<h2>Overwriting with a <code>jmp</code></h2>
<p>The <code>mov</code> instruction above is <b>7 bytes</b> wide. As long as the instruction we&#8217;re going to implant is 7 bytes or shorter, everything is good to go. Using a <code>callq</code> is out of the question because we can&#8217;t ensure the stack is 16-byte aligned as per the x86_64 ABI<sup>2</sup>. As it turns out, a <code>jmp</code> instruction that uses a 32-bit displacement from the instruction pointer only requires <b>5 bytes</b>. We&#8217;ll be able to implant the instruction that&#8217;s needed, and even have room to spare.</p>
<p>I created a struct to encapsulate this short 7 byte trampoline. 5 bytes for the <code>jmp</code>, 2 bytes for <code>NOPs</code>. Let&#8217;s take a look:</p>
<pre class="prettyprint">
  struct tramp_inline tramp = {
    .jmp           = {'\xe9'},
    .displacement  = 0,
    .pad           = {'\x90', '\x90'},
  };
</pre>
<p>Let&#8217;s fill in the displacement later, after actually <i>finding</i> the instruction that&#8217;s going to get overwritten.</p>
<p>So, to find the instruction that&#8217;ll be overwritten, just look for a <code>mov</code> opcode and check that the destination is <code>freelist</code>:</p>
<pre class="prettyprint">
    /* make sure it is a mov instruction */
    if (byte[1] == '\x89') {

      /* Read the REX byte to make sure it is a mov that we care about */
      if ( (byte[0] == '\x48') ||
          (byte[0] == '\x4c') ) {

        /* Grab the target of the mov. REMEMBER: in this case the target is
         * a 32bit displacement that gets added to RIP (where RIP is the address of
         * the next instruction).
         */
        mov_target = *(uint32_t *)(byte + 3);

        /* Sanity check. Ensure that the displacement from freelist to the next
         * instruction matches the mov_target. If so, we know this mov is
         * updating freelist.
         */
        if ( (freelist - (void *)(byte+7) ) == mov_target) {
</pre>
<p>At this point we&#8217;ve definitely found a <code>mov</code> instruction with <code>freelist</code> as the destination. Let&#8217;s calculate the displacement to the stage 2 trampoline for our <code>jmp</code> instruction and write the instruction into memory.</p>
<pre class="prettyprint">
/* Setup the stage 1 trampoline. Calculate the displacement to
 * the stage 2 trampoline from the next instruction.
 *
 * REMEMBER!!!! The next instruction will be NOP after our stage 1
 * trampoline is written. This is 5 bytes into the structure, even
 * though the original instruction we overwrote was 7 bytes.
 */
 tramp.displacement = (uint32_t)(destination - (void *)(byte+5));

/* Figure out what page the stage 1 tramp is gonna be written to, mark
 * it WRITE, write the trampoline in, and then remove WRITE permission.
 */
 aligned_addr = page_align(byte);
 mprotect(aligned_addr, (void *)byte - aligned_addr + 10,
               PROT_READ|PROT_WRITE|PROT_EXEC);
 memcpy(byte, &#038;tramp, sizeof(struct tramp_inline));
 mprotect(aligned_addr, (void *)byte - aligned_addr + 10,
              PROT_READ|PROT_EXEC);
</pre>
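<p>The displacement arithmetic is easy to get wrong by two bytes, so here is a small Ruby model of it that only builds the 7 bytes and never writes them anywhere. The helper name <code>stage1_trampoline</code> and the stage 2 address <code>0x500000</code> are assumptions made up for this example; the patch address is the <code>mov</code> from the disassembly above.</p>

```ruby
def stage1_trampoline(patch_addr, stage2_addr)
  # rel32 is measured from the end of the 5-byte jmp, not from the end of
  # the 7-byte mov that is being replaced.
  displacement = stage2_addr - (patch_addr + 5)
  unless displacement >= -(2**31) && displacement < 2**31
    raise RangeError, "stage 2 trampoline is out of rel32 range"
  end
  # 0xe9 = jmp rel32, followed by two NOPs to pad out to 7 bytes.
  [0xe9, displacement, 0x90, 0x90].pack("Cl<CC")
end

tramp = stage1_trampoline(0x437f48, 0x500000)

# Decode it again the way the CPU would: end of jmp plus displacement.
landing = 0x437f48 + 5 + tramp.byteslice(1, 4).unpack1("l<")
```

<p>With these inputs the trampoline bytes are <code>e9 b3 80 0c 00 90 90</code> and <code>landing</code> comes back as <code>0x500000</code>. A stage 2 address below the patch site gives a negative displacement, which packs to the same two&#8217;s-complement bits the C cast to <code>uint32_t</code> produces.</p>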
<p>Cool, all that&#8217;s left is to build the stage 2 trampoline which will set everything up for the C level handler.</p>
<h2>An assembly stub to set the stage for our C handler</h2>
<p>So, what does the assembly need to do to call the C handler? Quite a bit, actually, so let&#8217;s map it out step by step:</p>
<ol>
<li>Replicate the instruction which was overwritten so that the object is actually added to the freelist.</li>
<li>Save the value of the <code>rdi</code> register. This register is where the first argument to a function lives, and it will hold the object that was added to the freelist for the C handler to analyze.</li>
<li>Load the object being added to the freelist into <code>rdi</code>.</li>
<li>Save the value of <code>rbx</code> so that we can use the register as an operand for an absolute indirect <code>callq</code> instruction.</li>
<li>Save <code>rbp</code> and <code>rsp</code> to allow a way to undo the stack alignment later.</li>
<li>Align the stack to a 16-byte boundary to comply with the x86_64 ABI.</li>
<li>Move the address of the handler into <code>rbx</code></li>
<li>Call the handler through <code>rbx</code>.</li>
<li>Restore <code>rbp</code>, <code>rsp</code>, <code>rdi</code>, <code>rbx</code>.</li>
<li>Jump back to the instruction after the instruction which was overwritten.</li>
</ol>
<p>To accomplish this let&#8217;s build out a structure with as much set up as possible and fill in the displacement fields later. This &#8220;base&#8221; struct looks like this:</p>
<pre class="prettyprint">
  struct inline_tramp_tbl_entry inline_ent = {
    .rex     = {'\x48'},
    .mov     = {'\x89'},
    .src_reg = {'\x05'},
    .mov_displacement = 0,

    .frame = {
      .push_rdi = {'\x57'},
      .mov_rdi = {'\x48', '\x8b', '\x3d'},
      .rdi_source_displacement = 0,
      .push_rbx = {'\x53'},
      .push_rbp = {'\x55'},
      .save_rsp = {'\x48', '\x89', '\xe5'},
      .align_rsp = {'\x48', '\x83', '\xe4', '\xf0'},
      .mov = {'\x48', '\xbb'},
      .addr = error_tramp,
      .callq = {'\xff', '\xd3'},
      .leave = {'\xc9'},
      .rbx_restore = {'\x5b'},
      .rdi_restore = {'\x5f'},
    },

    .jmp  = {'\xe9'},
    .jmp_displacement = 0,
  };
</pre>
<p>So, what&#8217;s left to do:</p>
<ol>
<li>Copy the REX and source register bytes of the instruction which was overwritten to replicate it.</li>
<li>Calculate the displacement to <code>freelist</code> to fully generate the overwritten <code>mov</code>.</li>
<li>Calculate the displacement to <code>freelist</code> so that it can be stored in <code>rdi</code> as an argument to the C handler.</li>
<li>Fill in the absolute address for the handler.</li>
<li>Calculate the displacement to the instruction after the stage 1 trampoline in order to <code>jmp</code> back to resume execution as normal.</li>
</ol>
<p>Doing that is relatively straightforward. Let&#8217;s take a look at the C snippets that make this happen:</p>
<pre class="prettyprint">
/* Before the stage 1 trampoline gets written, we need to generate
 * the code for the stage 2 trampoline. Let's copy over the REX byte
 * and the byte which mentions the source register into the stage 2
 * trampoline.
 */
inl_tramp_st2 = inline_tramp_table + entry;
inl_tramp_st2->rex[0] = byte[0];
inl_tramp_st2->src_reg[0] = byte[2];

. . . . . 

/* Finish setting up the stage 2 trampoline. */

/* calculate the displacement to freelist from the next instruction.
 *
 * This is used to replicate the original instruction we overwrote.
 */
inl_tramp_st2->mov_displacement = freelist - (void *)&#038;(inl_tramp_st2->frame);

/* fill in the displacement to freelist from the next instruction.
 *
 * This is to arrange for the new value in freelist to be in %rdi, and as such
 * be the first argument to the C handler. As per the amd64 ABI.
 */
inl_tramp_st2->frame.rdi_source_displacement = freelist -
                                          (void *)&#038;(inl_tramp_st2->frame.push_rbx);

/* jmp back to the instruction after stage 1 trampoline was inserted
 *
 * This can be 5 or 7, it doesn't matter. If it's 5, we'll hit our 2
 * NOPs. If it's 7, we'll land directly on the next instruction.
 */
inl_tramp_st2->jmp_displacement = (uint32_t)((void *)(byte + 7) -
                                         (void *)(inline_tramp_table + entry + 1));

/* write the address of our C level trampoline in to the structure */
inl_tramp_st2->frame.addr = freelist_tramp;
</pre>
<p><b>Awesome.</b></p>
<p>We&#8217;ve successfully patched the binary in memory, inserted an assembly stub which was generated at runtime, called a hook function, and ensured that execution can resume normally.</p>
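<p>To double-check the three displacements, the whole stage 2 stub can be laid out as bytes in a Ruby model. This sketch assumes the struct is packed with no padding (44 bytes in total), and the stub and handler addresses are made up. It only builds and inspects the bytes and does not execute them. The helper name <code>stage2_trampoline</code> is hypothetical.</p>

```ruby
def stage2_trampoline(stub:, patch:, freelist:, handler:, rex: 0x48, src_reg: 0x05)
  code = "".b
  # Each RIP-relative displacement is measured from the end of its own
  # instruction, which is 4 bytes past the point where it is appended.
  code << [rex, 0x89, src_reg].pack("C3")               # replayed mov <reg>,freelist(%rip)
  code << [freelist - (stub + code.bytesize + 4)].pack("l<")
  code << "\x57".b                                      # push %rdi
  code << "\x48\x8b\x3d".b                              # mov freelist(%rip),%rdi
  code << [freelist - (stub + code.bytesize + 4)].pack("l<")
  code << "\x53".b                                      # push %rbx
  code << "\x55".b                                      # push %rbp
  code << "\x48\x89\xe5".b                              # mov %rsp,%rbp
  code << "\x48\x83\xe4\xf0".b                          # and $-16,%rsp (align the stack)
  code << "\x48\xbb".b << [handler].pack("Q<")          # movabs $handler,%rbx
  code << "\xff\xd3".b                                  # callq *%rbx
  code << "\xc9".b                                      # leave (restores rsp and rbp)
  code << "\x5b".b                                      # pop %rbx
  code << "\x5f".b                                      # pop %rdi
  code << "\xe9".b                                      # jmp back past the patched mov
  code << [(patch + 7) - (stub + code.bytesize + 4)].pack("l<")
  code
end

STUB     = 0x800000          # hypothetical address of this table entry
PATCH    = 0x437f48          # the overwritten mov from the disassembly
FREELIST = 0x715d98
HANDLER  = 0x7f0000001000    # hypothetical address of the C handler

stub_code = stage2_trampoline(stub: STUB, patch: PATCH, freelist: FREELIST, handler: HANDLER)
```

<p>The offsets line up with the C snippet: the first displacement is relative to the start of <code>frame</code> (stub + 7), the second to <code>frame.push_rbx</code> (stub + 15), and the final <code>jmp</code> to the end of the entry (stub + 44).</p>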
<h2>So, what&#8217;s the status on that memory profiler?</h2>
<p>Almost done, stay tuned for more updates coming SOON.</p>
<h2>Conclusion</h2>
<ul>
<li>Hackery like this is unmaintainable, unstable, and stupid, but also fun to work on and think about.</li>
<li>Being able to hook <code>add_freelist</code> like this provides the last tool needed to implement a version of bleak_house (a Ruby memory profiler) without patching the Ruby VM.</li>
<li>The x86_64 instruction set is a painful instruction set.</li>
<li>Use the GNU assembler (<a href="http://en.wikipedia.org/wiki/GNU_Assembler">gas</a>) instead of trying to generate opcodes by reading the Intel instruction set PDFs if you value your sanity.</li>
</ul>
<p>Thanks for reading and don&#8217;t forget to <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>References</h2>
<ol class="footnotes"><li id="footnote_0_1331" class="footnote"><a href="http://en.wikipedia.org/wiki/Metaprogramming">http://en.wikipedia.org/wiki/Metaprogramming</a></li><li id="footnote_1_1331" class="footnote"> <a href="http://www.x86-64.org/documentation/abi.pdf">x86_64 ABI</a></li></ol>]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/hot-patching-inlined-functions-with-x86_64-asm-metaprogramming/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Debugging Ruby: Understanding and Troubleshooting the VM and your Application</title>
		<link>http://timetobleed.com/debugging-ruby-understanding-and-troubleshooting-the-vm-and-your-application/</link>
		<comments>http://timetobleed.com/debugging-ruby-understanding-and-troubleshooting-the-vm-and-your-application/#comments</comments>
		<pubDate>Thu, 03 Dec 2009 03:30:14 +0000</pubDate>
		<dc:creator>Aman Gupta</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[ltrace]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[profiling]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[strace]]></category>
		<category><![CDATA[syscall]]></category>
		<category><![CDATA[system health]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1325</guid>
		<description><![CDATA[Download the PDF here. Debugging Ruby]]></description>
			<content:encoded><![CDATA[<p style="text-align: right;">Download the PDF <a href="http://dl.dropbox.com/u/635/debugging_ruby.pdf">here</a>.</p>
<p><a style="margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; -x-system-font: none; display: block; text-decoration: underline;" title="View Debugging Ruby on Scribd" href="http://www.scribd.com/doc/23548865/Debugging-Ruby">Debugging Ruby</a> <object id="doc_804966268746695" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="100%" height="500" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="name" value="doc_804966268746695" /><param name="align" value="middle" /><param name="quality" value="high" /><param name="play" value="true" /><param name="loop" value="true" /><param name="scale" value="showall" /><param name="wmode" value="opaque" /><param name="devicefont" value="false" /><param name="bgcolor" value="#ffffff" /><param name="menu" value="true" /><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="mode" value="slideshow" /><param name="src" value="http://d1.scribdassets.com/ScribdViewer.swf?document_id=23548865&amp;access_key=key-x28ugx92842n19ucqs&amp;page=1&amp;version=1&amp;viewMode=slideshow" /><param name="allowfullscreen" value="true" /><embed id="doc_804966268746695" type="application/x-shockwave-flash" width="100%" height="500" src="http://d1.scribdassets.com/ScribdViewer.swf?document_id=23548865&amp;access_key=key-x28ugx92842n19ucqs&amp;page=1&amp;version=1&amp;viewMode=slideshow" mode="slideshow" allowscriptaccess="always" allowfullscreen="true" menu="true" bgcolor="#ffffff" devicefont="false" wmode="opaque" scale="showall" loop="true" play="true" quality="high" align="middle" name="doc_804966268746695"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/debugging-ruby-understanding-and-troubleshooting-the-vm-and-your-application/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Rewrite your Ruby VM at runtime to hot patch useful features</title>
		<link>http://timetobleed.com/rewrite-your-ruby-vm-at-runtime-to-hot-patch-useful-features/</link>
		<comments>http://timetobleed.com/rewrite-your-ruby-vm-at-runtime-to-hot-patch-useful-features/#comments</comments>
		<pubDate>Mon, 23 Nov 2009 12:59:53 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[allocator]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[GC]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1253</guid>
		<description><![CDATA[If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter. Some notes before the blood starts flowin&#8217; CAUTION: What you are about to read is dangerous, non-portable, and (in most cases) stupid. The code and article below refer only to the x86_64 architecture. Grab some gauze. This is going to [...]]]></description>
			<content:encoded><![CDATA[<p><center><img src="http://timetobleed.com/images/tramp.png" alt="" width="400" height="300" /></center><br />
If you enjoy this article, <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>Some notes before the blood starts flowin&#8217;</h2>
<ul>
<li><strong>CAUTION:</strong> What you are about to read is dangerous, non-portable, and (in most cases) stupid.</li>
<li>The code and article below refer only to the <strong>x86_64</strong> architecture.</li>
<li>Grab some gauze. This is going to get ugly.</li>
</ul>
<h2>TLDR</h2>
<p>This article shows off a Ruby gem which has the power to overwrite a Ruby binary <em>in memory</em> while <em>it is running</em> to allow your code to execute in place of internal VM functions. This is useful if you&#8217;d like to hook all object allocation functions to build a memory profiler.</p>
<h2>This gem is on GitHub</h2>
<p>Yes, it&#8217;s on GitHub: <a href="http://github.com/ice799/memprof">http://github.com/ice799/memprof</a>.</p>
<h2>I want a memory profiler for Ruby</h2>
<p>This whole science experiment started during <a href="http://rubyconf.org/">RubyConf</a> when <a href="http://twitter.com/tmm1">Aman</a> and I began brainstorming ways to build a memory profiling tool for Ruby.</p>
<p>The big problem in our minds was that for most tools we&#8217;d have to include patches to the Ruby VM. That process is <b>long and somewhat difficult</b>, so I started thinking about ways to do this without modifying the Ruby source code itself.</p>
<p> The memory profiler is <b>NOT DONE</b> just yet. I thought that the hack I wrote to let us build something without modifying Ruby source code was interesting enough that it warranted a blog post. So let&#8217;s get rolling.</p>
<h2>What is a trampoline?</h2>
<p>Let&#8217;s pretend you have 2 functions: <code>functionA()</code> and <code>functionB()</code>. Let&#8217;s assume that <code>functionA()</code> calls <code>functionB()</code>.</p>
<p>Now imagine that you&#8217;d like some code of your own to run in between, just before <code>functionB()</code> executes. You can insert a piece of code that <i>diverts execution</i> elsewhere, creating the flow: <code>functionA()</code> &#8211;> <code>functionC()</code> &#8211;> <code>functionB()</code>.</p>
<p>You can accomplish this by <i>inserting a trampoline</i>.</p>
<p>A trampoline is a piece of code that program execution jumps into and then <i>bounces</i> out of and on to somewhere else<sup>1</sup>.</p>
<p>This hack relies on the use of multiple trampolines. We&#8217;ll see why shortly.</p>
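<p>To make that flow concrete, here is a minimal sketch at the C level. It only models the control flow with a function pointer; the real hack works on opcodes, not pointers. The names <code>call_target</code>, <code>function_c_hits</code> and <code>insert_trampoline</code> are made up for this illustration.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this only models the control flow. */
int function_c_hits = 0;

int functionB(int x) {
  return x * 2;
}

/* the "trampoline": execution lands here, does some extra work,
 * then bounces on to the original destination */
int functionC(int x) {
  function_c_hits++;
  return functionB(x);
}

/* functionA() calls through this pointer instead of naming functionB() directly */
int (*call_target)(int) = functionB;

int functionA(int x) {
  assert(call_target != NULL);
  return call_target(x);
}

/* divert the flow: functionA() -> functionC() -> functionB() */
void insert_trampoline(void) {
  call_target = functionC;
}
```

<p>Before <code>insert_trampoline()</code> runs, <code>functionA()</code> reaches <code>functionB()</code> directly. Afterwards it gets the same result, but <code>functionC()</code> has seen the call on the way through.</p>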
<h2>Two different kinds of trampolines</h2>
<p>There are two different kinds of trampolines that I considered while writing this hack. Let&#8217;s take a closer look at both.</p>
<p>
<h3>Caller-side trampoline</h3>
<p>A <i>caller-side</i> trampoline works by overwriting the <a href="http://en.wikipedia.org/wiki/Opcodes">opcodes</a> in the <i>.text</i> segment of the program in the calling function, causing it to call a different function <i>at runtime</i>.</p>
</p>
<p>The <b>big pros</b> of this method are:
<ul>
<li>You aren&#8217;t overwriting any code, only the address operand of a <code>callq</code> instruction.</li>
<li>Since you are only changing an operand, you can hook any function. You don&#8217;t need to build custom trampolines for each function.</li>
</ul>
<p> This method has some <b>big cons</b> too:
<ul>
<li>You&#8217;ll need to scan <i>the entire binary in memory</i> and find and <i>overwrite</i> all address operands of <code>callq</code>. This is problematic because if you overwrite any false positives you might break your application.</li>
<li>You have to deal with the implications of <code>callq</code>, which can be painful as we&#8217;ll see soon.</li>
</ul>
<p><h3>Callee-side trampoline</h3>
<p>A <i>callee-side</i> trampoline works by overwriting the opcodes in the <i>.text</i> segment of the program in the called function, causing it to call another function immediately.</p>
<p>The <b>big pro</b> of this method is:
<ul>
<li>You only need to overwrite code in <i>one</i> place and don&#8217;t need to worry about accidentally scribbling on bytes that you didn&#8217;t mean to.</li>
</ul>
<p> This method has some <b>big cons</b> too:
<ul>
<li>You&#8217;ll need to carefully construct your trampoline code to overwrite as little of the function as possible (or somehow restore the overwritten opcodes), especially if you expect the original function to keep working later.</li>
<li>You&#8217;ll need to special case each trampoline you build for different optimization levels of the binary you are hooking into.</li></ul>
<p>I went with a <i>caller-side</i> trampoline because I wanted to ensure that I can hook any function and not have to worry about different Ruby binaries causing problems when they are compiled with different optimization levels.</p>
<h2>The stage 1 trampoline</h2>
<p>To insert my trampolines I needed to <i>insert some binary into the process</i> and then overwrite <code>callq</code> instructions like this:</p>
<p><pre class="prettyprint">
  41150b:       e8 cc 4e 02 00         callq  4363dc [rb_newobj]
  411510:       48 89 45 f8             ....
</pre>
</p>
<p></p>
<p> In the above code snippet, the byte <code>e8</code> is the <code>callq</code> opcode and the bytes <code>cc 4e 02 00</code> are the distance to <code>rb_newobj</code> from the address of the next instruction, 0x411510.</p>
<p>All I need to do is change the 4 bytes following <code>e8</code> to equal the displacement between the next instruction, 0x411510 in this case, and my trampoline.</p>
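<p>The arithmetic is easy to check by hand, but here is a small sketch of it in C anyway. It decodes the target of the <code>callq</code> shown above and computes the operand needed to reach some other address. The helper names <code>call_target</code> and <code>call_displacement</code> are mine and are not part of the gem; the code assumes a little-endian host, which x86_64 is.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode the target of a 5-byte `e8 rel32` call located at insn_addr.
 * `insn` points at the e8 byte. Illustrative helper, not from the gem. */
uint64_t call_target(const unsigned char *insn, uint64_t insn_addr) {
  int32_t rel32;

  assert(insn[0] == 0xe8);

  /* the operand is stored little-endian, which matches the host on x86_64 */
  memcpy(&rel32, insn + 1, sizeof(rel32));

  /* the displacement is relative to the NEXT instruction (insn_addr + 5) */
  return insn_addr + 5 + (int64_t)rel32;
}

/* Compute the rel32 operand that makes the call at insn_addr reach target.
 * Only meaningful when the distance actually fits in 32 bits. */
int32_t call_displacement(uint64_t insn_addr, uint64_t target) {
  return (int32_t)(int64_t)(target - (insn_addr + 5));
}
```

<p>Feeding it the bytes <code>e8 cc 4e 02 00</code> at address 0x41150b gives 0x411510 + 0x24ecc = 0x4363dc, which is the address of <code>rb_newobj</code> in the disassembly above.</p>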
<p><b>Problem.</b></p>
<p>My first cut at this code led me to an important realization: the <code>callq</code> instructions used here expect a <i>32bit displacement</i> to the function being called, measured from the next instruction, and <i>not</i> an absolute address. <b>But</b>, the 64bit address space is <i>very</i> large. The code for the Ruby binary that lives in the <code>.text</code> segment is so far away from my Ruby gem that the displacement between them <b>cannot be represented with only 32bits</b>.</p>
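<p>A rough sketch of that range check, under the assumption that we are patching a 5-byte <code>e8</code> call. The helper names <code>reachable_rel32</code> and <code>encode_rel32</code> are hypothetical, and the addresses in the note below are made-up examples rather than values taken from a real process.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a 5-byte `callq rel32` at insn_addr can reach target,
 * i.e. if the signed distance from the next instruction fits in 32 bits. */
int reachable_rel32(uint64_t insn_addr, uint64_t target) {
  int64_t disp = (int64_t)(target - (insn_addr + 5));
  return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* Produce the operand, refusing (via assert) targets that are too far away. */
int32_t encode_rel32(uint64_t insn_addr, uint64_t target) {
  assert(reachable_rel32(insn_addr, target));
  return (int32_t)(int64_t)(target - (insn_addr + 5));
}
```

<p>A call at 0x41150b can reach something mapped at 0x40000000, but not something mapped up around 0x7f0000000000, which is the neighborhood where shared objects tend to land on Linux x86_64.</p>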
<p><b>So what now?</b></p>
<p>Well, luckily <code>mmap</code> has a flag <code>MAP_32BIT</code> which maps a page in the first 2GB of the address space. If I map some code there, it should be well within the range of values whose displacement I can represent in 32bits.</p>
<p>So, why not map a <b>second trampoline</b> to that page which contains code that can call an <i>absolute address</i>?</p>
<p>My stage 1 trampoline code looks something like this:</p>
<p>
<pre class="prettyprint">
  /* the struct below is just a sequence of bytes which represent the
    *  following bit of assembly code, including 3 nops for padding:
    *
    *  mov $address, %rbx
    *  callq *%rbx
    *  ret
    *  nop
    *  nop
    *  nop
    */
  struct tramp_tbl_entry ent = {
    .mov = {'\x48','\xbb'},
    .addr = (long long)&#038;error_tramp,
    .callq = {'\xff','\xd3'},
    .ret = '\xc3',
    .pad =  {'\x90','\x90','\x90'},
  };

  tramp_table = mmap(NULL, 4096, PROT_WRITE|PROT_READ|PROT_EXEC,
                                   MAP_32BIT|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
  if (tramp_table != MAP_FAILED) {
    /* fill the page with default entries that all call error_tramp */
    for (i = 0; i < 4096/sizeof(struct tramp_tbl_entry); i++) {
      memcpy(tramp_table + i, &#038;ent, sizeof(struct tramp_tbl_entry));
    }
  }
</pre>
<p>
<p>It <code>mmap</code>s a single page and writes a table of default trampolines (like a jump table) that all call an error trampoline by default. When a new trampoline is inserted, I just go to that entry in the table and insert the address that should be called.</p>
<p>To get around the displacement challenge described above, the addresses I insert into the stage 1 trampoline table are addresses for stage 2 trampolines.</p>
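<p>The definition of <code>struct tramp_tbl_entry</code> isn&#8217;t shown in this post, so the layout below is an assumption reconstructed from the initializer and the assembly comment above. It is a sketch of what such a struct probably has to look like for the bytes to decode as intended. <code>__attribute__((__packed__))</code> is a GCC/Clang extension, and <code>tramp_entries_per_page</code> is a helper I made up for the example.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: the post does not show the real definition. */
struct tramp_tbl_entry {
  unsigned char mov[2];    /* 48 bb: mov $imm64, %rbx */
  uint64_t      addr;      /* the imm64: absolute address to call */
  unsigned char callq[2];  /* ff d3: callq *%rbx */
  unsigned char ret;       /* c3 */
  unsigned char pad[3];    /* 90 90 90: nops, rounds the entry up to 16 bytes */
} __attribute__((__packed__));

/* without packing, the compiler would insert padding before addr and
 * the bytes would no longer decode as the intended instructions */
_Static_assert(sizeof(struct tramp_tbl_entry) == 16,
               "entry must be 16 bytes");
_Static_assert(offsetof(struct tramp_tbl_entry, addr) == 2,
               "imm64 must directly follow the mov opcode");

/* number of trampoline entries that fit in one page of the given size */
size_t tramp_entries_per_page(size_t page_size) {
  return page_size / sizeof(struct tramp_tbl_entry);
}
```

<p>With 16-byte entries, a single 4096-byte page holds 256 trampolines, and every entry starts on a 16-byte boundary.</p>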
<h2>The stage 2 trampoline</h2>
<p>Setting up the stage 2 trampolines is pretty simple once the stage 1 trampoline table has been written to memory. All that needs to be done is to update the address field in a free stage 1 trampoline to be the address of my stage 2 trampoline. These trampolines are written in C and live in my Ruby gem.</p>
<p>
<pre class="prettyprint">
static void
insert_tramp(char *trampee, void *tramp) {
  void *trampee_addr = find_symbol(trampee);
  int entry = tramp_size;
  tramp_table[tramp_size].addr = (long long)tramp;
  tramp_size++;
  update_image(entry, trampee_addr);
}
</pre>
</p>
<p>
<p>An example of a stage 2 trampoline for <code>rb_newobj</code> might be:</p>
<p>
<pre class="prettyprint">
static VALUE
newobj_tramp() {
  /* print the ruby source and line number where the allocation is occurring */
  printf("source = %s, line = %d\n", ruby_sourcefile, ruby_sourceline);

  /* call newobj like normal so the ruby app can continue */
  return rb_newobj();
}
</pre>
</p>
<h2>Programmatically rewriting the Ruby binary in memory</h2>
<p>Overwriting the Ruby binary to cause my stage 1 trampolines to get hit is pretty simple, too. I can just scan the <code>.text</code> segment of the binary looking for bytes which look like <code>callq</code> instructions. Then, I can sanity check by reading the next 4 bytes which should be the displacement to the original function. Doing that sanity check should prevent false positives.</p>
<pre class="prettyprint">
static void
update_image(int entry, void *trampee_addr) {
  char *byte = text_segment;
  size_t count = 0;
  int fn_addr = 0;
  void *aligned_addr = NULL;

  /* check each byte in the .text segment */
  for(; count < text_segment_len; count++) {

    /* if it looks like a callq instruction... */
    if (*byte == '\xe8') {

      /* the next 4 bytes SHOULD BE the original displacement */
      fn_addr = *(int *)(byte+1);

      /* do a sanity check to make sure the next few bytes are an accurate displacement.
        * this helps to eliminate false positives.
        */
      if (trampee_addr - (void *)(byte+5) == fn_addr) {
        aligned_addr = (void*)(((long)byte+1)&#038;~(0xffff));

        /* mark the page in the .text segment as writable so it can be modified */
        mprotect(aligned_addr, (void *)byte+1 - aligned_addr + 10,
                       PROT_READ|PROT_WRITE|PROT_EXEC);

        /* calculate the new displacement and write it */
        *(int  *)(byte+1) = (uint32_t)((void *)(tramp_table + entry)
                                     - (void *)(byte + 5));

        /* disallow writing to this page of the .text segment again  */
        mprotect(aligned_addr, (((void *)byte+1) - aligned_addr) + 10,
                      PROT_READ|PROT_EXEC);
      }
    }
    byte++;
  }
}
</pre>
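<p>If you want to play with the scan-and-check idea without scribbling on a live process, here is a standalone sketch that runs against an ordinary byte buffer. It is not the gem&#8217;s code: <code>patch_calls</code> and its parameters are hypothetical, there is no <code>mprotect</code> involved, and like the original it is a byte-wise scan and not a real disassembler.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Find `e8 rel32` calls in buf that target old_target and, if new_target
 * is non-zero, rewrite them to call new_target instead.
 * base is the address buf[0] would have in the running process.
 * Returns the number of matching call sites. Illustrative only. */
size_t patch_calls(unsigned char *buf, size_t len, uint64_t base,
                   uint64_t old_target, uint64_t new_target) {
  size_t i, matched = 0;

  for (i = 0; i + 5 <= len; i++) {
    int32_t rel32;
    uint64_t next;

    /* does this byte look like a callq opcode? */
    if (buf[i] != 0xe8)
      continue;

    memcpy(&rel32, buf + i + 1, sizeof(rel32));
    next = base + i + 5;

    /* sanity check: only touch calls that really reach old_target */
    if (next + (int64_t)rel32 != old_target)
      continue;

    if (new_target != 0) {
      /* assumes new_target is within 32-bit range of this call site */
      rel32 = (int32_t)(int64_t)(new_target - next);
      memcpy(buf + i + 1, &rel32, sizeof(rel32));
    }

    matched++;
    i += 4;  /* skip the operand bytes we just handled */
  }
  return matched;
}
```

<p>A stray <code>e8</code> byte whose next 4 bytes don&#8217;t add up to the address being hooked is left alone, which is the whole point of the sanity check. It reduces false positives; it can&#8217;t rule them out completely.</p>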
<p></p>
<h2>Sample output</h2>
<p>After requiring my ruby gem and running a test script which creates lots of objects, I see this output:</p>
<pre class="prettify">
...
source = test.rb, line = 8
source = test.rb, line = 8
source = test.rb, line = 8
source = test.rb, line = 8
source = test.rb, line = 8
source = test.rb, line = 8
source = test.rb, line = 8
...
</pre>
<p>
<p><b>The output shows the file name and line number for each object getting allocated.</b> That should be a strong enough primitive to build a Ruby memory profiler without requiring end users to build a custom version of Ruby. It should also be possible to re-implement <a href="http://blog.evanweaver.com/articles/2007/04/28/bleak_house/">bleak_house</a> by using this gem (and maybe another trick or two).</p>
<p><b>Awesome.</b></p>
<h2>Conclusion</h2>
<ul>
<li>One step closer to building a memory profiler without requiring end users to find and use patches floating around the internet.</li>
<li>It is unclear whether cheap tricks like this are useful or harmful, but they are <b>fun</b> to write.</li>
<li>If you understand how your system works at an intimate level, nearly anything is possible. The work required to make it happen might be difficult though.</li>
</ul>
<h2>References</h2>
<ol class="footnotes"><li id="footnote_0_1253" class="footnote"><a href="http://en.wikipedia.org/wiki/Trampoline_%28computers%29">http://en.wikipedia.org/wiki/Trampoline_(computers)</a></li></ol>]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/rewrite-your-ruby-vm-at-runtime-to-hot-patch-useful-features/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
	</channel>
</rss>
