<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>time to bleed by Joe Damato &#187; bugfix</title>
	<atom:link href="http://timetobleed.com/category/bugfix/feed/" rel="self" type="application/rss+xml" />
	<link>http://timetobleed.com</link>
	<description>technical ramblings from a wanna-be unix dinosaur</description>
	<lastBuildDate>Tue, 20 Jul 2010 21:03:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Descent into Darkness: Understanding your system&#8217;s binary interface is the only way out</title>
		<link>http://timetobleed.com/descent-into-darkness-understanding-your-systems-binary-interface-is-the-only-way-out/</link>
		<comments>http://timetobleed.com/descent-into-darkness-understanding-your-systems-binary-interface-is-the-only-way-out/#comments</comments>
		<pubDate>Mon, 15 Mar 2010 19:11:19 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[GC]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[syscall]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1602</guid>
		<description><![CDATA[Download as PDF (3mb) Descent into Darkness: Understanding your system&#8217;s binary interface is the only way out.]]></description>
			<content:encoded><![CDATA[<p><a style="float:right" href="http://dl.dropbox.com/u/1681973/abi.pdf">Download as PDF (3mb)</a><br />
<a title="View Descent into Darkness: Understanding your system's binary interface is the only way out. on Scribd" href="http://www.scribd.com/doc/28264000/Descent-into-Darkness-Understanding-your-system-s-binary-interface-is-the-only-way-out" style="margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; -x-system-font: none; display: block; text-decoration: underline;">Descent into Darkness: Understanding your system&#8217;s binary interface is the only way out.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/descent-into-darkness-understanding-your-systems-binary-interface-is-the-only-way-out/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Garbage Collection Slides from LA Ruby Conference</title>
		<link>http://timetobleed.com/garbage-collection-slides-from-la-ruby-conference/</link>
		<comments>http://timetobleed.com/garbage-collection-slides-from-la-ruby-conference/#comments</comments>
		<pubDate>Sat, 20 Feb 2010 22:03:14 +0000</pubDate>
		<dc:creator>Aman Gupta</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[GC]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[profiling]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1569</guid>
		<description><![CDATA[Garbage Collection and the Ruby Heap]]></description>
			<content:encoded><![CDATA[<p><a title="View Garbage Collection and the Ruby Heap on Scribd" href="http://www.scribd.com/doc/27174770/Garbage-Collection-and-the-Ruby-Heap" style="margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; -x-system-font: none; display: block; text-decoration: underline;">Garbage Collection and the Ruby Heap</a> <object id="doc_629766057039419" name="doc_629766057039419" height="600" width="100%" type="application/x-shockwave-flash" data="http://d1.scribdassets.com/ScribdViewer.swf" style="outline:none;" ><param name="movie" value="http://d1.scribdassets.com/ScribdViewer.swf"><param name="wmode" value="opaque"><param name="bgcolor" value="#ffffff"><param name="allowFullScreen" value="true"><param name="allowScriptAccess" value="always"><param name="FlashVars" value="document_id=27174770&#038;access_key=key-2g5x6qhwa28yz3ia1hih&#038;page=1&#038;viewMode=slideshow"><embed id="doc_629766057039419" name="doc_629766057039419" src="http://d1.scribdassets.com/ScribdViewer.swf?document_id=27174770&#038;access_key=key-2g5x6qhwa28yz3ia1hih&#038;page=1&#038;viewMode=slideshow" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="600" width="100%" wmode="opaque" bgcolor="#ffffff"></embed></object>	</p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/garbage-collection-slides-from-la-ruby-conference/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>memprof: A Ruby level memory profiler</title>
		<link>http://timetobleed.com/memprof-a-ruby-level-memory-profiler/</link>
		<comments>http://timetobleed.com/memprof-a-ruby-level-memory-profiler/#comments</comments>
		<pubDate>Fri, 11 Dec 2009 12:59:43 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[GC]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[profiling]]></category>
		<category><![CDATA[system health]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1398</guid>
		<description><![CDATA[If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter. What is memprof and why do I care? memprof is a Ruby gem which supplies memory profiler functionality similar to bleak_house without patching the Ruby VM. You just install the gem, call a function or two, and off you go. [...]]]></description>
			<content:encoded><![CDATA[<p><center><img src="http://timetobleed.com/images/memory.jpg" alt="" width="300" height="200" /></center><br />
If you enjoy this article, <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>What is memprof and why do I care?</h2>
<p>memprof is a Ruby gem that provides memory profiling functionality similar to bleak_house <b>without</b> patching the Ruby VM. Just install the gem, call a function or two, and off you go.</p>
<h2>Where do I get it?</h2>
<p>memprof is available on gemcutter, so you can just:</p>
<p><b><code>gem install memprof</code></b></p>
<p>Feel free to browse the source code at: <a href="http://github.com/ice799/memprof">http://github.com/ice799/memprof</a>.</p>
<h2>How do I use it?</h2>
<p>Using memprof is simple. Before we look at some examples, let me explain more precisely what memprof is measuring.</p>
<p>memprof measures the number of objects that are created, but not destroyed, during a segment of Ruby code. Its ideal use case is showing you where objects that stick around are being created. There are a few reasons objects can stick around:</p>
<ul>
<li>Objects are created and not destroyed when you create new classes. This is a good thing.</li>
<li>Sometimes garbage objects sit around until <code>garbage_collect</code> has had a chance to run. These objects will go away.</li>
<li>Yet in other cases you might be holding a reference to a large chain of objects without knowing it. Until you remove this reference, the entire chain of objects will remain in memory taking up space.</li>
</ul>
<p>memprof will show objects created in all cases listed above.</p>
<p>OK, now let&#8217;s take a look at two examples and their output.</p>
<p>A simple program with an obvious memory &#8220;leak&#8221;:</p>
<pre class="prettyprint">
require 'memprof'

@blah = Hash.new([])

Memprof.start
100.times {
  @blah[1] << "aaaaa"
}

1000.times {
   @blah[2] << "bbbbb"
}
Memprof.stats
Memprof.stop
</pre>
<p>This program creates 1100 objects that are not destroyed between the <code>start</code> and <code>stop</code> calls, because a reference is held to each object created.</p>
<p>Let's look at the output from memprof:</p>
<pre>
   1000 test.rb:11:String
    100 test.rb:7:String
</pre>
<p>In this example, memprof shows the 1100 objects created, broken down by file, line number, and type.</p>
<p>Let's take a look at another example:</p>
<pre class="prettyprint">
require 'memprof'
Memprof.start
require "stringio"
StringIO.new
Memprof.stats
</pre>
<p>This simple program measures the number of objects created when requiring <code>stringio</code> and creating a <code>StringIO</code> object.</p>
<p>Let's take a look at the output:</p>
<pre>
    108 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:__node__
     14 test2.rb:3:String
      2 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:Class
      1 test2.rb:4:StringIO
      1 test2.rb:4:String
      1 test2.rb:3:Array
      1 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:Enumerable
</pre>
<p>This output shows that objects of an internal Ruby interpreter type, <code>__node__</code>, were created (nodes represent code), as well as a few <code>String</code>s and other objects. Some of these objects are just garbage objects that haven't had a chance to be recycled yet.</p>
<p>What if we nudge the garbage collector along a little bit, just for our example? Let's add the following two lines of code to our previous example:</p>
<pre class="prettyprint">
GC.start
Memprof.stats
</pre>
<p>We're now nudging the garbage collector and outputting memprof stats information again. This should show fewer objects, as the garbage collector will recycle some of the garbage objects:</p>
<pre>
    108 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:__node__
      2 test2.rb:3:String
      2 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:Class
      1 /custom/ree/lib/ruby/1.8/x86_64-linux/stringio.so:0:Enumerable
</pre>
<p>As you can see above, a few <code>String</code>s and other objects went away after the garbage collector ran.</p>
<h2>Which Rubies and systems are supported?</h2>
<ul>
<li>Only <b>unstripped</b> binaries are supported. To determine if your Ruby binary is stripped, simply run: <code>file `which ruby`</code>. If it is, consult your package manager's documentation. Most Linux distributions offer a package with an unstripped Ruby binary.</li>
<li>Only <b>x86_64</b> is supported at this time. Hopefully, I'll have time to add support for i386/i686 in the immediate future.</li>
<li>Linux Ruby Enterprise Edition (1.8.6 and 1.8.7) is supported.</li>
<li>Linux MRI Ruby 1.8.6 and 1.8.7 built with --disable-shared are supported. Support for --enable-shared binaries is <b>coming soon.</b></li>
<li>Snow Leopard support is <b>experimental</b> at this time.</li>
<li>Ruby 1.9 support <b>coming soon</b>.</li>
</ul>
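<p>Why unstripped? Presumably because hooks like the ones described in the posts below have to find VM functions such as <code>rb_newobj</code> by name, and <code>strip</code> removes the <code>.symtab</code> section that holds those names. The <code>file</code> check above is the quick way to tell. As a hypothetical, Linux/glibc-only sketch (the helper name <code>elf_has_symtab</code> is made up, and it relies on the system's <code>&lt;elf.h&gt;</code>), the same question can be answered in C by walking the section headers:</p>

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the 64-bit ELF file at `path` has a
 * .symtab section (not stripped), 0 if it has none, and -1 if the file
 * can't be read as a 64-bit ELF file. */
int elf_has_symtab(const char *path) {
  FILE *f = fopen(path, "rb");
  if (f == NULL)
    return -1;

  Elf64_Ehdr ehdr;
  int result = -1;

  /* read the ELF header and make sure this really is a 64-bit ELF file */
  if (fread(&ehdr, sizeof ehdr, 1, f) != 1 ||
      memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
      ehdr.e_ident[EI_CLASS] != ELFCLASS64 ||
      ehdr.e_shentsize != sizeof(Elf64_Shdr))
    goto out;

  /* walk the section header table looking for a SHT_SYMTAB section */
  result = 0;
  for (Elf64_Half i = 0; i < ehdr.e_shnum; i++) {
    Elf64_Shdr shdr;
    if (fseek(f, (long)(ehdr.e_shoff + (Elf64_Off)i * sizeof shdr), SEEK_SET) != 0 ||
        fread(&shdr, sizeof shdr, 1, f) != 1) {
      result = -1;
      break;
    }
    if (shdr.sh_type == SHT_SYMTAB) {
      result = 1;
      break;
    }
  }

out:
  fclose(f);
  return result;
}
```

<p>Calling <code>elf_has_symtab</code> on the path printed by <code>which ruby</code> should agree with what <code>file</code> reports as stripped or not stripped.</p>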
<h2>How does it work?</h2>
<p>If you've been reading my blog over the last week or so, you'd have noticed two previous blog posts (<a href="http://timetobleed.com/rewrite-your-ruby-vm-at-runtime-to-hot-patch-useful-features/">here</a> and <a href="http://timetobleed.com/hot-patching-inlined-functions-with-x86_64-asm-metaprogramming/">here</a>) that describe some tricks I came up with for modifying a running binary image in memory.</p>
<p>memprof is a combination of all those tricks and other hacks to allow memory profiling in Ruby without the need for custom patches to the Ruby VM. You simply require the gem and off you go.</p>
<p>memprof works by inserting trampolines on object allocation and deallocation routines. It gathers metadata about the objects and outputs this information when the <code>stats</code> method is called.</p>
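<p>The post doesn&#8217;t show memprof&#8217;s bookkeeping, so here is a rough, hypothetical C sketch of the idea. The hook names <code>on_alloc</code>/<code>on_free</code>, the fixed-size tables, and the linear searches are all assumptions made to keep it short. Each allocation is tagged with its <code>file:line:Type</code> site, each free is credited back to that site, and a <code>stats</code>-style dump prints the sites that still have live objects:</p>

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: tally live objects per allocation site, keyed by
 * "file:line:Type" like memprof's output. on_alloc() and on_free() stand
 * in for hooks on the VM's allocation and deallocation routines. */

#define MAX_SITES   1024
#define MAX_OBJECTS 65536

struct site {
  char key[256];        /* "file:line:Type" */
  long live;            /* allocations minus frees */
};

struct tracked {
  const void *obj;      /* the object's address */
  struct site *site;    /* where it was allocated */
};

static struct site sites[MAX_SITES];
static size_t nsites = 0;
static struct tracked objects[MAX_OBJECTS];
static size_t nobjects = 0;

/* look up a site; create it when `create` is nonzero and there is room */
static struct site *find_site(const char *file, int line, const char *type,
                              int create) {
  char key[256];
  snprintf(key, sizeof key, "%s:%d:%s", file, line, type);
  for (size_t i = 0; i < nsites; i++)
    if (strcmp(sites[i].key, key) == 0)
      return &sites[i];
  if (!create || nsites == MAX_SITES)
    return NULL;
  strcpy(sites[nsites].key, key);  /* key is at most 255 chars + NUL */
  sites[nsites].live = 0;
  return &sites[nsites++];
}

/* allocation hook: remember which site each object came from */
void on_alloc(const void *obj, const char *file, int line, const char *type) {
  struct site *s = find_site(file, line, type, 1);
  if (s == NULL || nobjects == MAX_OBJECTS)
    return;
  s->live++;
  objects[nobjects].obj = obj;
  objects[nobjects].site = s;
  nobjects++;
}

/* deallocation hook: credit the free back to the object's allocation site */
void on_free(const void *obj) {
  for (size_t i = 0; i < nobjects; i++) {
    if (objects[i].obj == obj) {
      objects[i].site->live--;
      objects[i] = objects[--nobjects];  /* swap-remove */
      return;
    }
  }
}

long live_count(const char *file, int line, const char *type) {
  struct site *s = find_site(file, line, type, 0);
  return s ? s->live : 0;
}

/* qsort comparator: larger live counts sort first */
static int by_live_desc(const void *a, const void *b) {
  long la = (*(struct site *const *)a)->live;
  long lb = (*(struct site *const *)b)->live;
  return (la < lb) - (la > lb);
}

/* print "count file:line:Type" for each site with live objects, largest first */
void print_stats(void) {
  static struct site *order[MAX_SITES];
  for (size_t i = 0; i < nsites; i++)
    order[i] = &sites[i];
  qsort(order, nsites, sizeof order[0], by_live_desc);
  for (size_t i = 0; i < nsites; i++)
    if (order[i]->live > 0)
      printf("%7ld %s\n", order[i]->live, order[i]->key);
}
```

<p>Recording the site per object is what lets the free hook decrement the right counter, since the file and line that are current when an object is freed generally have nothing to do with where it was allocated. A real profiler would use hash tables rather than linear scans.</p>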
<h2>What else is planned?</h2>
<p><a href="http://twitter.com/jakedouglas">Jake Douglas</a>, <a href="http://www.twitter.com/tmm1">Aman Gupta</a>, and <a href="http://twitter.com/joedamato">I</a> have lots of interesting ideas for new features. We don't want to ruin the surprise, but stay tuned. More cool stuff is coming really soon :)</p>
<p>Thanks for reading and don't forget to <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/memprof-a-ruby-level-memory-profiler/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Debugging Ruby: Understanding and Troubleshooting the VM and your Application</title>
		<link>http://timetobleed.com/debugging-ruby-understanding-and-troubleshooting-the-vm-and-your-application/</link>
		<comments>http://timetobleed.com/debugging-ruby-understanding-and-troubleshooting-the-vm-and-your-application/#comments</comments>
		<pubDate>Thu, 03 Dec 2009 03:30:14 +0000</pubDate>
		<dc:creator>Aman Gupta</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[ltrace]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[profiling]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[strace]]></category>
		<category><![CDATA[syscall]]></category>
		<category><![CDATA[system health]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1325</guid>
		<description><![CDATA[Download the PDF here. Debugging Ruby]]></description>
			<content:encoded><![CDATA[<p style="text-align: right;">Download the PDF <a href="http://dl.dropbox.com/u/635/debugging_ruby.pdf">here</a>.</p>
<p><a style="margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; -x-system-font: none; display: block; text-decoration: underline;" title="View Debugging Ruby on Scribd" href="http://www.scribd.com/doc/23548865/Debugging-Ruby">Debugging Ruby</a></p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/debugging-ruby-understanding-and-troubleshooting-the-vm-and-your-application/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Rewrite your Ruby VM at runtime to hot patch useful features</title>
		<link>http://timetobleed.com/rewrite-your-ruby-vm-at-runtime-to-hot-patch-useful-features/</link>
		<comments>http://timetobleed.com/rewrite-your-ruby-vm-at-runtime-to-hot-patch-useful-features/#comments</comments>
		<pubDate>Mon, 23 Nov 2009 12:59:53 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[allocator]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[GC]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1253</guid>
		<description><![CDATA[If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter. Some notes before the blood starts flowin&#8217; CAUTION: What you are about to read is dangerous, non-portable, and (in most cases) stupid. The code and article below refer only to the x86_64 architecture. Grab some gauze. This is going to [...]]]></description>
			<content:encoded><![CDATA[<p><center><img src="http://timetobleed.com/images/tramp.png" alt="" width="400" height="300" /></center><br />
If you enjoy this article, <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>Some notes before the blood starts flowin&#8217;</h2>
<ul>
<li><strong>CAUTION:</strong> What you are about to read is dangerous, non-portable, and (in most cases) stupid.</li>
<li>The code and article below refer only to the <strong>x86_64</strong> architecture.</li>
<li>Grab some gauze. This is going to get ugly.</li>
</ul>
<h2>TLDR</h2>
<p>This article shows off a Ruby gem which has the power to overwrite a Ruby binary <em>in memory</em> while <em>it is running</em> to allow your code to execute in place of internal VM functions. This is useful if you&#8217;d like to hook all object allocation functions to build a memory profiler.</p>
<h2>This gem is on GitHub</h2>
<p>Yes, it&#8217;s on GitHub: <a href="http://github.com/ice799/memprof">http://github.com/ice799/memprof</a>.</p>
<h2>I want a memory profiler for Ruby</h2>
<p>This whole science experiment started during <a href="http://rubyconf.org/">RubyConf</a> when <a href="http://twitter.com/tmm1">Aman</a> and I began brainstorming ways to build a memory profiling tool for Ruby.</p>
<p>The big problem in our minds was that for most tools we&#8217;d have to include patches to the Ruby VM. That process is <b>long and somewhat difficult</b>, so I started thinking about ways to do this without modifying the Ruby source code itself.</p>
<p> The memory profiler is <b>NOT DONE</b> just yet. I thought that the hack I wrote to let us build something without modifying Ruby source code was interesting enough that it warranted a blog post. So let&#8217;s get rolling.</p>
<h2>What is a trampoline?</h2>
<p>Let&#8217;s pretend you have 2 functions: <code>functionA()</code> and <code>functionB()</code>. Let&#8217;s assume that <code>functionA()</code> calls <code>functionB()</code>.</p>
<p>Now also imagine that you&#8217;d like some code to run after <code>functionA()</code> makes its call, but before <code>functionB()</code> starts executing. You can imagine inserting a piece of code that <i>diverts execution</i> elsewhere, creating the flow: <code>functionA()</code> &rarr; <code>functionC()</code> &rarr; <code>functionB()</code>.</p>
<p>You can accomplish this by <i>inserting a trampoline</i>.</p>
<p>A trampoline is a piece of code that program execution jumps into and then <i>bounces</i> out of and on to somewhere else<sup>1</sup>.</p>
<p>This hack relies on the use of multiple trampolines. We&#8217;ll see why shortly.</p>
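<p>A real trampoline is made by patching machine code, but the control flow it creates is easy to model. This hypothetical, portable C sketch gets the same <code>functionA()</code> &rarr; <code>functionC()</code> &rarr; <code>functionB()</code> flow by swapping a function pointer instead of rewriting instructions. It is an analogy, not the technique used in this article:</p>

```c
#include <stdio.h>

/* Illustrative only: redirect a call through a function pointer to get
 * the same A -> C -> B flow a trampoline creates, without patching code. */

static int calls_to_b = 0;
static int calls_to_c = 0;

int functionB(int x) {
  calls_to_b++;
  return x * 2;
}

/* the "trampoline": do some extra work, then bounce on to functionB */
int functionC(int x) {
  calls_to_c++;
  printf("functionC saw %d\n", x);
  return functionB(x);
}

/* functionA calls whatever this pointer holds; initially functionB */
static int (*call_target)(int) = functionB;

int functionA(int x) {
  return call_target(x) + 1;
}

/* "insert the trampoline" by redirecting functionA's call to functionC */
void insert_trampoline(void) {
  call_target = functionC;
}
```

<p>After <code>insert_trampoline()</code>, <code>functionA()</code> returns the same result as before. The only difference is that <code>functionC()</code> now runs on the way to <code>functionB()</code>.</p>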
<h2>Two different kinds of trampolines</h2>
<p>There are two different kinds of trampolines that I considered while writing this hack. Let&#8217;s take a closer look at both.</p>
<h3>Caller-side trampoline</h3>
<p>A <i>caller-side</i> trampoline works by overwriting bytes of the calling function&#8217;s machine code (its <a href="http://en.wikipedia.org/wiki/Opcodes">opcodes</a> and their operands) in the program&#8217;s <i>.text</i> segment, causing it to call a different function <i>at runtime</i>.</p>
<p>The <b>big pros</b> of this method are:
<ul>
<li>You aren&#8217;t overwriting any code, only the address operand of a <code>callq</code> instruction.</li>
<li>Since you are only changing an operand, you can hook any function. You don&#8217;t need to build custom trampolines for each function.</li>
</ul>
<p>This method also has some <b>big cons</b>:
<ul>
<li>You&#8217;ll need to scan <i>the entire binary in memory</i>, then find and <i>overwrite</i> the address operand of every matching <code>callq</code>. This is problematic because if you overwrite any false positives, you might break your application.</li>
<li>You have to deal with the implications of <code>callq</code>, which can be painful as we&#8217;ll see soon.</li>
</ul>
<h3>Callee-side trampoline</h3>
<p>A <i>callee-side</i> trampoline works by overwriting the opcodes of the called function in the program&#8217;s <i>.text</i> segment, causing it to immediately transfer control to another function.</p>
<p>The <b>big pro</b> of this method is:
<ul>
<li>You only need to overwrite code in <i>one</i> place and don&#8217;t need to worry about accidentally scribbling on bytes that you didn&#8217;t mean to.</li>
</ul>
<p>This method has some <b>big cons</b> too:
<ul>
<li>You&#8217;ll need to carefully construct your trampoline code to overwrite as little of the function as possible (or somehow restore the overwritten opcodes), especially if you expect the original function to keep working as expected later.</li>
<li>You&#8217;ll need to special-case each trampoline you build for the different optimization levels of the binary you are hooking into.</li></ul>
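<p>To make the first con concrete: on x86_64 a near <code>jmp</code> can only reach about &plusmn;2GB, so a callee-side trampoline that must reach arbitrary code typically overwrites the start of the function with a longer absolute jump. The sketch below shows one common encoding. It is not what this article uses (the article goes caller-side), and <code>encode_abs_jmp</code> is a hypothetical name. The encoding clobbers 14 bytes of the function&#8217;s prologue, and those are exactly the bytes that would then have to be relocated or restored:</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: encode the 14-byte x86_64 sequence
 *
 *   jmp *0(%rip)      ff 25 00 00 00 00
 *   .quad target      8-byte absolute address, little-endian
 *
 * The jmp reads its destination from the 8 bytes that immediately
 * follow it. This only builds the bytes in `out`; patching live code
 * would also need mprotect() and a copy of the 14 overwritten bytes.
 * Returns the number of bytes written (always 14). */
size_t encode_abs_jmp(unsigned char out[14], uint64_t target) {
  static const unsigned char jmp_rip[6] = {0xff, 0x25, 0x00, 0x00, 0x00, 0x00};
  memcpy(out, jmp_rip, sizeof jmp_rip);
  for (int i = 0; i < 8; i++)
    out[6 + i] = (unsigned char)(target >> (8 * i));  /* little-endian */
  return 14;
}
```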
<p>I went with a <i>caller-side</i> trampoline because I wanted to ensure that I can hook any function and not have to worry about different Ruby binaries causing problems when they are compiled with different optimization levels.</p>
<h2>The stage 1 trampoline</h2>
<p>To insert my trampolines, I needed to <i>insert some machine code into the process</i> and then overwrite <code>callq</code> instructions like this:</p>
<pre class="prettyprint">
  41150b:       e8 cc 4e 02 00         callq  4363dc [rb_newobj]
  411510:       48 89 45 f8             ....
</pre>
<p>In the above code snippet, the byte <code>e8</code> is the <code>callq</code> opcode, and the bytes <code>cc 4e 02 00</code> are a little-endian, signed 32-bit displacement (0x24ecc) from the address of the next instruction, 0x411510, to <code>rb_newobj</code> at 0x4363dc.</p>
<p>All I need to do is change the 4 bytes following <code>e8</code> to equal the displacement between the next instruction, 0x411510 in this case, and my trampoline.</p>
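<p>The arithmetic in the disassembly above can be checked with a few lines of portable C. In this hypothetical sketch (the helper names are made up), the displacement is read as a signed little-endian 32-bit value and added to the address of the next instruction. Patching a call means writing a new displacement computed the same way:</p>

```c
#include <stdint.h>

/* Decode the target of a 5-byte `callq rel32` (e8 xx xx xx xx) located at
 * `insn_addr`: the address of the next instruction (insn_addr + 5) plus
 * the signed 32-bit little-endian displacement. */
uint64_t call_target(uint64_t insn_addr, const unsigned char insn[5]) {
  int32_t disp = (int32_t)((uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
                           (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24);
  return insn_addr + 5 + (int64_t)disp;
}

/* Rewrite the displacement so the call at `insn_addr` lands on `new_target`.
 * Assumes the new displacement fits in a signed 32-bit value. */
void retarget_call(uint64_t insn_addr, unsigned char insn[5], uint64_t new_target) {
  uint32_t disp = (uint32_t)(new_target - (insn_addr + 5));  /* low 32 bits */
  for (int i = 0; i < 4; i++)
    insn[1 + i] = (unsigned char)(disp >> (8 * i));
}
```

<p>With the bytes from the disassembly, <code>call_target(0x41150b, insn)</code> evaluates to <code>0x4363dc</code>, the address of <code>rb_newobj</code>.</p>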
<p><b>Problem.</b></p>
<p>My first cut at this code led me to an important realization: the <code>callq</code> instructions used here take a <i>signed 32-bit displacement</i> to the function being called, <i>not</i> an absolute address, so they can only reach about 2GB in either direction. <b>But</b> the 64-bit address space is <i>very</i> large. The Ruby binary&#8217;s code in the <code>.text</code> segment is so far away from my Ruby gem that the displacement between them <b>cannot be represented with only 32 bits</b>.</p>
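<p>To see the limit concretely, here is a hypothetical check (the name <code>fits_rel32</code> is made up) for whether a <code>callq</code> at one address can reach another. On Linux, shared libraries such as a gem&#8217;s C extension are usually mapped high in the 64-bit address space (addresses starting with <code>0x7f</code> are typical), far outside the window around a binary&#8217;s <code>.text</code>:</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Can a 5-byte `callq rel32` at `insn_addr` reach `target`? The
 * displacement is measured from the next instruction and must fit in a
 * signed 32-bit integer, i.e. roughly +/-2GB around the call site. */
bool fits_rel32(uint64_t insn_addr, uint64_t target) {
  int64_t disp = (int64_t)(target - (insn_addr + 5));
  return disp >= INT32_MIN && disp <= INT32_MAX;
}
```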
<p><b>So what now?</b></p>
<p>Well, luckily <code>mmap</code> has a flag, <code>MAP_32BIT</code>, which places the mapping in the first 2GB of the address space. The Ruby binary&#8217;s own code is also loaded low (around 0x400000 in the disassembly above), so code mapped there is well within reach of a 32-bit displacement.</p>
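<p>A minimal sketch of that mapping step, assuming Linux (the helper name <code>map_low_exec_page</code> is hypothetical). <code>MAP_32BIT</code> is only defined on x86_64, so the sketch falls back to an ordinary mapping elsewhere. Systems that forbid writable and executable mappings may make <code>mmap</code> fail:</p>

```c
#define _GNU_SOURCE  /* MAP_32BIT and MAP_ANONYMOUS are Linux/glibc extensions */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Map `len` bytes of read/write/execute memory, asking for the first 2GB
 * of the address space when MAP_32BIT is available (Linux on x86_64).
 * Returns NULL on failure. */
void *map_low_exec_page(size_t len) {
  int flags = MAP_ANONYMOUS | MAP_PRIVATE;
#ifdef MAP_32BIT
  flags |= MAP_32BIT;
#endif
  void *p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC, flags, -1, 0);
  return p == MAP_FAILED ? NULL : p;
}
```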
<p>So, why not map a <b>second trampoline</b> to that page, containing code that can call an <i>absolute address</i>?</p>
<p>My stage 1 trampoline code looks something like this:</p>
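<pre class="prettyprint">
#define _GNU_SOURCE            /* for MAP_32BIT and MAP_ANONYMOUS */
#include &lt;string.h&gt;
#include &lt;sys/mman.h&gt;

/* one stage 1 trampoline: exactly 16 bytes of x86_64 machine code */
struct tramp_tbl_entry {
  unsigned char push_rbx;
  unsigned char mov[2];
  long long addr;
  unsigned char callq[2];
  unsigned char pop_rbx;
  unsigned char ret;
  unsigned char pad;
} __attribute__((__packed__));
struct tramp_tbl_entry *tramp_table = NULL;
size_t tramp_size = 0;
void error_tramp(void);        /* hit if an unassigned entry is ever called */

static void
create_tramp_table(void) {
  size_t i = 0;
  /* the struct below is just a sequence of bytes which represent the
   * following bit of assembly code, including a nop for padding:
   *
   *  push %rbx
   *  mov $address, %rbx
   *  callq *%rbx
   *  pop %rbx
   *  ret
   *  nop
   *
   * %rbx is callee-saved, so it is saved and restored around the call;
   * the push also keeps %rsp 16-byte aligned for the stage 2 call.
   */
  struct tramp_tbl_entry ent = {
    .push_rbx = 0x53,
    .mov = {0x48, 0xbb},
    .addr = (long long)&#038;error_tramp,
    .callq = {0xff, 0xd3},
    .pop_rbx = 0x5b,
    .ret = 0xc3,
    .pad = 0x90,
  };
  tramp_table = mmap(NULL, 4096, PROT_WRITE|PROT_READ|PROT_EXEC,
                     MAP_32BIT|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
  if (tramp_table == MAP_FAILED) {
    tramp_table = NULL;
    return;
  }
  for (; i < 4096/sizeof(struct tramp_tbl_entry); i++)
    memcpy(tramp_table + i, &#038;ent, sizeof(struct tramp_tbl_entry));
}
</pre>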
<p>It <code>mmap</code>s a single page and writes a table of default trampolines (like a jump table) that all call an error trampoline by default. When a new trampoline is inserted, I just go to that entry in the table and insert the address that should be called.</p>
<p>To get around the displacement challenge described above, the addresses I insert into the stage 1 trampoline table are addresses for stage 2 trampolines.</p>
<h2>The stage 2 trampoline</h2>
<p>Setting up the stage 2 trampolines is pretty simple once the stage 1 trampoline table has been written to memory. All that needs to be done is to update the address field in a free stage 1 trampoline to be the address of my stage 2 trampoline. These trampolines are written in C and live in my Ruby gem.</p>
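<pre class="prettyprint">
static void
insert_tramp(char *trampee, void *tramp) {
  /* find_symbol() lives elsewhere in the gem; assume it returns NULL on failure */
  void *trampee_addr = find_symbol(trampee);
  int entry = tramp_size;

  /* the stage 1 table is a single 4096-byte page with a fixed number of entries */
  if (trampee_addr == NULL || tramp_size >= 4096/sizeof(struct tramp_tbl_entry))
    return;

  /* point the next free stage 1 entry at the stage 2 trampoline */
  tramp_table[tramp_size].addr = (long long)tramp;
  tramp_size++;
  /* redirect every callq to trampee through that stage 1 entry */
  update_image(entry, trampee_addr);
}
</pre>
<p>An example of a stage 2 trampoline for <code>rb_newobj</code> might be:</p>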
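<pre class="prettyprint">
#include &lt;stdio.h&gt;
#include &lt;ruby.h&gt;

/* Ruby 1.8 interpreter globals holding the current source position */
extern char *ruby_sourcefile;
extern int ruby_sourceline;

static VALUE
newobj_tramp(void) {
  /* print the ruby source and line number where the allocation is occurring */
  printf("source = %s, line = %d\n", ruby_sourcefile, ruby_sourceline);

  /* call newobj like normal so the ruby app can continue */
  return rb_newobj();
}
</pre>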
<h2>Programmatically rewriting the Ruby binary in memory</h2>
<p>Overwriting the Ruby binary to cause my stage 1 trampolines to get hit is pretty simple, too. I can just scan the <code>.text</code> segment of the binary looking for bytes that look like <code>callq</code> instructions. Then I can sanity-check each candidate by reading the next 4 bytes, which should be the displacement to the original function. Doing that sanity check should weed out most false positives.</p>
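<pre class="prettyprint">
#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;
#include &lt;sys/mman.h&gt;
#include &lt;unistd.h&gt;

/* text_segment, text_segment_len and tramp_table are set up elsewhere in the gem */
static void
update_image(int entry, void *trampee_addr) {
  char *byte = text_segment;
  size_t count = 0;
  int fn_addr = 0;
  long new_disp = 0;
  uintptr_t page_mask = ~((uintptr_t)sysconf(_SC_PAGESIZE) - 1);
  void *aligned_addr = NULL;

  /* check each byte in the .text segment, stopping while a full 5-byte
   * callq (opcode + 4-byte displacement) still fits */
  for (; count + 5 &lt;= text_segment_len; count++, byte++) {

    /* if it looks like a callq instruction... */
    if (*byte != '\xe8')
      continue;

    /* the next 4 bytes SHOULD BE the original displacement */
    fn_addr = *(int *)(byte+1);

    /* do a sanity check to make sure the next few bytes are an accurate displacement.
     * this helps to eliminate false positives.
     */
    if ((char *)trampee_addr - (byte+5) != fn_addr)
      continue;

    /* the new displacement to the stage 1 entry must also fit in 32 bits */
    new_disp = (char *)(tramp_table + entry) - (byte+5);
    if (new_disp &lt; INT32_MIN || new_disp &gt; INT32_MAX)
      continue;

    /* round down to the start of the page that holds the displacement */
    aligned_addr = (void *)((uintptr_t)(byte+1) &#038; page_mask);

    /* mark the page in the .text segment as writable so it can be modified */
    if (mprotect(aligned_addr, (byte+5) - (char *)aligned_addr,
                 PROT_READ|PROT_WRITE|PROT_EXEC) != 0) {
      perror("mprotect");
      continue;
    }

    /* write the new displacement */
    *(int *)(byte+1) = (int)new_disp;

    /* disallow writing to this page of the .text segment again */
    mprotect(aligned_addr, (byte+5) - (char *)aligned_addr, PROT_READ|PROT_EXEC);
  }
}
</pre>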
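<p>The scan-and-patch logic can be exercised without touching a live process. The following is a hypothetical, portable rewrite that treats a byte array as a <code>.text</code> segment loaded at a given base address. The name <code>patch_calls</code> and the idea of working on a copied buffer are assumptions, not memprof&#8217;s code:</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, testable version of the scan: `code` is a copy of a .text
 * segment that would be loaded at address `base`. Every `e8 rel32` whose
 * displacement lands exactly on `old_target` is rewritten to call
 * `new_target` instead. Returns how many call sites were patched. Like the
 * real scan, it can still be fooled by bytes that merely look like a
 * matching callq. */
size_t patch_calls(unsigned char *code, size_t len, uint64_t base,
                   uint64_t old_target, uint64_t new_target) {
  size_t patched = 0;
  for (size_t i = 0; i + 5 <= len; i++) {
    if (code[i] != 0xe8)
      continue;
    uint64_t next = base + i + 5;  /* rel32 is relative to the next instruction */
    int32_t disp = (int32_t)((uint32_t)code[i + 1] | (uint32_t)code[i + 2] << 8 |
                             (uint32_t)code[i + 3] << 16 | (uint32_t)code[i + 4] << 24);
    if (next + (int64_t)disp != old_target)
      continue;
    int64_t new_disp = (int64_t)(new_target - next);
    if (new_disp < INT32_MIN || new_disp > INT32_MAX)
      continue;  /* out of reach: why the stage 1 table lives below 2GB */
    uint32_t u = (uint32_t)new_disp;
    for (int b = 0; b < 4; b++)
      code[i + 1 + b] = (unsigned char)(u >> (8 * b));
    patched++;
    i += 4;  /* skip the displacement bytes just written */
  }
  return patched;
}
```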
<h2>Sample output</h2>
<p>After requiring my Ruby gem and running a test script that creates lots of objects, I see this output:</p>
<pre class="prettify">
...
source = test.rb, line = 8
source = test.rb, line = 8
source = test.rb, line = 8
source = test.rb, line = 8
source = test.rb, line = 8
source = test.rb, line = 8
source = test.rb, line = 8
...
</pre>
<p><b>Showing the file name and line number for each object getting allocated.</b> That should be a strong enough primitive to build a Ruby memory profiler without requiring end users to build a custom version of Ruby. It should also be possible to re-implement <a href="http://blog.evanweaver.com/articles/2007/04/28/bleak_house/">bleak_house</a> by using this gem (and maybe another trick or two).</p>
<p><b>Awesome.</b></p>
<h2>Conclusion</h2>
<ul>
<li>One step closer to building a memory profiler without requiring end users to find and use patches floating around the internet.</li>
<li>It is unclear whether cheap tricks like this are useful or harmful, but they are <b>fun</b> to write.</li>
<li>If you understand how your system works at an intimate level, nearly anything is possible. The work required to make it happen might be difficult though.</li>
</ul>
<p>
Thanks for reading and don't forget to <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>References</h2>
<ol class="footnotes"><li id="footnote_0_1253" class="footnote"><a href="http://en.wikipedia.org/wiki/Trampoline_%28computers%29">http://en.wikipedia.org/wiki/Trampoline_(computers)</a></li></ol>]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/rewrite-your-ruby-vm-at-runtime-to-hot-patch-useful-features/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Defeating the Matasano C++ Challenge with ASLR enabled</title>
		<link>http://timetobleed.com/defeating-the-matasano-c-challenge-with-aslr-enabled/</link>
		<comments>http://timetobleed.com/defeating-the-matasano-c-challenge-with-aslr-enabled/#comments</comments>
		<pubDate>Fri, 16 Oct 2009 11:59:29 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[vulnerability]]></category>
		<category><![CDATA[x86_64]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=1152</guid>
		<description><![CDATA[If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter. Important note I am NOT a security researcher (I kinda want to be though). As such, there are probably way better ways to do everything in this article. This article is just illustrating my thought process when cracking this challenge. [...]]]></description>
			<content:encoded><![CDATA[<p><center><img src="http://timetobleed.com/images/computer_bug.jpg"  alt="" width="400" height="300"/></center><br />

<p>If you enjoy this article, <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>Important note</h2>
<p>I am <b>NOT</b> a security researcher (I kinda want to be, though). As such, there are probably way better ways to do everything in this article. This article just illustrates my thought process while cracking this challenge.</p>
<h2>The Challenge</h2>
<p>The <a href="http://chargen.matasano.com/chargen/2009/10/9/a-c-challenge.html">Matasano Security blog</a> recently posted an article titled <i>A C++ Challenge</i><sup>1</sup>, which included a particularly ugly piece of C++ code with a security vulnerability. The challenge is for the reader to find the vulnerability, use it to execute arbitrary code, and submit the data to Matasano.</p>
<p>Sounds easy enough, let&#8217;s do this! <i>cue hacking music</i></p>
<h2>Making it harder</h2>
<p>Recent Linux kernels have a feature called Address Space Layout Randomization (ASLR), which can be set in <code>/proc/sys/kernel/randomize_va_space</code>. ASLR is a security feature that randomizes the start addresses of various parts of a process image. This makes exploiting a security bug more difficult because the exploit cannot rely on hard-coded addresses.</p>
<p>The options you can set are:</p>
<ul>
<li>0 &#8211; ASLR off</li>
<li>1 &#8211; Randomize the addresses of the stack, mmap area, and VDSO page. <b>This is the default.</b></li>
<li>2 &#8211; Everything in option 1, but also randomize the <code>brk</code> area so the heap is randomized.</li>
</ul>
<p>Just for fun I decided to set it to <b>2</b> to make exploiting the challenge more difficult.</p>
<h2>Got the code, but now what?</h2>
<p>I decided to start attacking this problem by looking for a few common errors, in this order:</p>
<ol>
<li><code>strcpy()/strncpy()</code> bugs: <b>no calls</b></li>
<li><code>memcpy()</code> bugs: <b>a few calls</b></li>
<li>Off by one bugs: <b>none obvious</b></li>
</ol>
<p>It turned out from a quick look that all calls to <code>memcpy()</code> included sane, hard-coded values. So, it had to be something more complex.</p>
<h2>Digging deeper &#8211; finding input streams the user can control</h2>
<p>Next, I decided to actually <b>read</b> the code to see what it was doing at a high level and which inputs could be controlled. It turns out that the program reads data from a file and uses that data to determine how many objects to allocate.</p>
<p>Obviously, this portion of the code caught my interest so let&#8217;s take a quick look:</p>
<pre class="prettyprint">
/* ... */

fd.read(file_in_mem, MAX_FILE_SIZE-1);

/* ... */

struct _stream_hdr *s = (struct _stream_hdr *) file_in_mem;

if(s->num_of_streams >= INT_MAX / (int)sizeof(int)) {
    safe_count = MAX_STREAMS;
} else {
    safe_count = s->num_of_streams;
}

Obj *o = new Obj[safe_count];
</pre>
<p>OK, so clearly that <code>if</code> statement is suspect. At the <i>very least</i> it doesn&#8217;t check for negative values, so you could end up with <code>safe_count = -1</code> which might do something interesting when passed to the <code>new</code> operator. Moreover, it appears this <code>if</code> statement will allow values as large as 536870910 ([INT_MAX / sizeof(int)] &#8211; 1).</p>
<p>Maybe the exploit has something to do with values this <code>if</code> statement is allowing through?</p>
<h2>A closer look at the integer overflow in <code>new</code></h2>
<p>Let&#8217;s use GDB to take a closer look at what the compiler does before calling <code>new[]</code>. I&#8217;ve added a few comments inline to explain the assembly code:</p>
<pre class="prettyprint">
mov    %edx,%eax   ;  %edx and %eax now both hold s->num_of_streams
add    %eax,%eax   ;  %eax = s->num_of_streams * 2
add    %edx,%eax   ;  %eax = s->num_of_streams * 3
shl    $0x2,%eax   ;  %eax = (s->num_of_streams * 3) * 4 = s->num_of_streams * 12
mov    %eax,(%esp) ;  store the byte count as the argument to new[]
call   0x8048a7c &lt;_Znaj@plt&gt; ; call operator new[](unsigned int)
</pre>
<p>The compiler has generated code to calculate <code>s->num_of_streams * sizeof(Obj)</code>, where <code>sizeof(Obj)</code> is 12 bytes. The multiplication is done in a 32-bit register, so for large values of <code>s->num_of_streams</code>, multiplying by 12 causes an <b>integer overflow</b> and the value passed to <code>new</code> will actually be <i>less than</i> what was intended.</p>
<p>For my exploit, I ended up using the value 357913943. It passes the <code>if</code> check above, but 357913943 * 12 = 4294967316, which is 20 more than 2<sup>32</sup> (4294967296). The product wraps around in 32 bits, so the value passed to <code>new</code> is just 20, which is, of course, significantly less than what we actually wanted to allocate. Other people have written about integer overflow in <code>new</code> in other compilers<sup>2</sup> before.</p>
<p>Let&#8217;s see how this can be used to cause arbitrary code to execute. <b>Remember</b>, for arbitrary code execution to occur there <i>must</i> be a way to <i>cause the target program to write some data to a memory address that can be controlled</i>.</p>
<h2>Find the (possible) hand-off(s) to arbitrary code</h2>
<p>To find any hand-off locations, I looked for places where memory writes were occurring in the program. I found a few memory writes:</p>
<ul>
<li>2 calls to <code>memset()</code></li>
<li>2 calls to <code>memcpy()</code></li>
<li><code>parse_stream()</code> of <code>class Obj</code></li>
</ul>
<p>Unfortunately (from the attacker&#8217;s perspective) the calls to <code>memcpy()</code> and <code>memset()</code> <i>looked</i> pretty sane. The <code>parse_stream()</code> function caught my interest, though.</p>
<p>Take a look:</p>
<pre class="prettyprint">
class Obj {
    public:
    int parse_stream(int t, char *stream)
    {
      type = t;
      // ... do something with stream here ...
      return 0;
    }

    int length;
    int type;
/* ... */
</pre>
<p><b>REMEMBER:</b> In C++, member functions of <code>class</code>es have a <b>sekrit parameter</b>: a pointer to the object the function is being called on. Inside the function, this parameter is accessed using <code>this</code>. So the line writing to the <code>type</code> variable is actually doing <code>this->type = t;</code>, where <code>this</code> is supplied to the function <b>sekritly</b> by the compiler.</p>
<p><b>This is important</b> because this piece of code could be our hand-off! We need to find a way to control the value of <code>this</code> so we can cause a memory write to a location of our choice.</p>
<h2>Controlling <code>this</code> to cause arbitrary code to execute</h2>
<p>Take a look at an important piece of code in the challenge:</p>
<pre class="prettyprint">
struct imetad {
  int msg_length;
  int (*callback)(int, struct imetad *);
/* ... */
</pre>
<p>Nice! On i386, the <code>callback</code> field of <code>struct imetad</code> is offset by 4 bytes into the structure, and the <code>type</code> field of <code>class Obj</code> is also offset by 4 bytes. See where I&#8217;m going?</p>
<p>If we can control the <code>this</code> pointer to point at the <code>struct imetad</code> on the heap when <code>parse_stream</code> is called, it will overwrite the <code>callback</code> pointer. We&#8217;ll then be able to set the pointer to any address we want and hand-off execution to arbitrary code!</p>
<p>But how can we manipulate <code>this</code>?</p>
<p>Take a look at this piece of code that calls <code>callback</code>:</p>
<pre class="prettyprint">
o[i].parse_stream(dword, stream_temp);
imd->callback(o[i].type, imd);
</pre>
<p>Since it is possible to overflow <code>new</code> and allocate fewer objects than <code>safe_count</code> claims, for some values of <code>i</code>, <i><code>o[i]</code> will be pointing at data that isn&#8217;t actually an <code>Obj</code> object, but just other data on the heap</i>. In fact, when <code>i = 2</code>, <b><code>o[i]</code> will be pointing at the <code>struct imetad</code> object on the heap</b>. The call to <code>parse_stream</code> will pass in a corrupted <code>this</code> pointer that points at <code>struct imetad</code>, and the write to <code>type</code> will actually overwrite <code>callback</code>, since both fields sit at the same offset in their respective structures.</p>
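<p>Because the very next line calls <code>imd->callback</code>, execution then jumps to an address of our choosing. And with that, we&#8217;ve successfully exploited the challenge, causing arbitrary code to execute.</p>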
<p>Let&#8217;s now figure out how to beat ASLR!</p>
<h2>How to defeat address space layout randomization</h2>
<p>I <b>did NOT</b> invent this technique, but I read about it and thought it was cool. You can read a more verbose explanation of this technique <a href="http://sophsec.com/research/aslr_research.html">here</a>. The idea behind the technique is pretty simple:
</p>
<ul>
<li>When you call <code>exec</code>, the PID remains the same, but the image of the process in memory is changed.</li>
<li>The kernel uses the PID and the number of jiffies (jiffies is a fine-grained time measurement in the kernel) to pull data from the entropy pool.</li>
<li>If you can run a program which records stack, heap, and other addresses and then quickly call <code>exec</code> to start the vulnerable program, you can end up with the <b>same memory layout</b>.</li>
</ul>
<p>My exploit program is actually a <i>wrapper</i> which records an approximate location of the heap (by just calling <code>malloc()</code>), generates the exploit file, and then executes the challenge binary.</p>
<p>Take a look at the relevant pieces of my exploit to get an idea of how it works:</p>
<pre class="prettyprint">
/* ... */

/* do a malloc to get an idea of where the heap lives */
void *dummy = malloc(10);

/* ... */

unsigned int shell_addr = reinterpret_void_ptr_as_uint(dummy);

/*
 * XXX TODO FIXME - on my platform, execl'ing from here to the challenge binary
 * incurs a constant offset of 0x3160, probably for changes in the environment
 * (libs linked for c++ and whatnot).
 */
shell_addr += 0x3160;

/*
 * a guess as to how far off the heap the shellcode lives.
 *
 * luckily we have a large NOP sled, so we should only fail when we miss
 * the current entropy cycle (see below).
 */
shell_addr += 700;

/* ... build exploit file in memory ... */

/* copy in our best guess as to the address of the shellcode, pray NOPs
 * take care of the rest! */
memcpy(entire_file+88, &#038;shell_addr, sizeof(shell_addr));

/* ... write exploit out to disk ... */

/* launch program with the generated exploit file!
 *
 * calling execl here inherits the PID of this process, and IF we get lucky
 * (~85%+ of the time), we'll execute before the next entropy cycle and hit
 * the shellcode, even with ASLR=2.
 */
execl("./cpp_challenge", "cpp_challenge", "exploit", (char *)0);
</pre>
<h2>My exploit for the C++ challenge</h2>
<p>My exploit comes with the following caveats:</p>
<ul>
<li>i386 system</li>
<li>The challenge binary is called &#8220;cpp_challenge&#8221; and lives in the same directory as the exploit binary.</li>
<li>The exploit binary can write to the directory and create a file called &#8220;exploit&#8221; which will be handed off to &#8220;cpp_challenge&#8221;</li>
</ul>
<p>Get the full code of my exploit <a href="http://timetobleed.com/files/exploit_gen.c">here</a>.</p>
<h2>Results</h2>
<p>Results on my i386 Ubuntu 8.04 VM running in VMware Fusion, for each level of <code>randomize_va_space</code>:</p>
<ul>
<li>0 &#8211; <b>100%</b> exploit hit rate</li>
<li>1 &#8211; <b>100%</b> exploit hit rate</li>
<li>2 &#8211; <b>~85%</b> exploit hit rate. Sometimes my exploit code falls out of the time window and the address map changes before the challenge binary is run.</li>
</ul>
<p>I could probably boost the hit rate for level 2 a bit, but that would probably mean rewriting the entire exploit in assembly to make it run as fast as possible. I didn&#8217;t think there was really a point to going to such an extreme, so an 85% hit rate is good enough.</p>
<h2>Conclusion</h2>
<ol>
<li>Security challenges are fun.</li>
<li>More emphasis and more freely available information on secure coding would be very useful.</li>
<li>Like it or not, developers need to be security conscious when writing code in C and C++.</li>
<li>As C and C++ change, developers need to carefully consider security implications of new features.</li>
</ol>
<p>
Thanks for reading and don&#8217;t forget to <a rel="alternate" type="application/rss+xml" href="http://feeds.feedburner.com/TimeToBleed">subscribe (via RSS or e-mail)</a> and <a href="http://twitter.com/joedamato">follow me on twitter.</a></p>
<h2>References</h2>
<ol class="footnotes"><li id="footnote_0_1152" class="footnote"><a href="http://chargen.matasano.com/chargen/2009/10/9/a-c-challenge.html">Matasano Security LLC &#8211; Chargen &#8211; A C++ Challenge</a></li><li id="footnote_1_1152" class="footnote"><a href="http://blogs.msdn.com/oldnewthing/archive/2004/01/29/64389.aspx">Integer overflow in the new[] operator</a></li></ol>]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/defeating-the-matasano-c-challenge-with-aslr-enabled/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Fixing Threads in Ruby 1.8: A 2-10x performance boost</title>
		<link>http://timetobleed.com/fixing-threads-in-ruby-18-a-2-10x-performance-boost/</link>
		<comments>http://timetobleed.com/fixing-threads-in-ruby-18-a-2-10x-performance-boost/#comments</comments>
		<pubDate>Mon, 18 May 2009 10:00:50 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[fibers]]></category>
		<category><![CDATA[kernel]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[patch]]></category>
		<category><![CDATA[patches]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[threading]]></category>
		<category><![CDATA[threads]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=685</guid>
		<description><![CDATA[Quick notes before things get crazy OK, things might get a little crazy in this blog post so let&#8217;s clear a few things up before we get moving. I like the gritty details, and this article in particular has a lot of gritty info. To reduce the length of the article for the casual reader, [...]]]></description>
			<content:encoded><![CDATA[<p><center><img src="http://timetobleed.com/images/ruby-threads.jpg" alt="" width="400" height="300" /></center></p>
<h2>Quick notes before things get crazy</h2>
<p>OK, things might get a little crazy in this blog post so let&#8217;s clear a few things up before we get moving.</p>
<ul>
<li>I like the gritty details, and this article in particular has a lot of gritty info. To reduce the length of the article for the casual reader, I&#8217;ve put a portion of the really gritty stuff in the Epilogue below. Definitely check it out if that is your thing.</li>
<li>This article, the code, and the patches below are for Linux and OSX for the x86 and x86_64 platforms, only.</li>
<li>Even though there are code paths for both x86 and x86_64, I&#8217;m going to use the 64-bit register names and (briefly) mention the 64-bit binary interface.</li>
<li>Let&#8217;s assume the binary is built with -fno-omit-frame-pointer. The patches don&#8217;t care, but it&#8217;ll make the explanation a bit simpler later.</li>
<li>If you don&#8217;t know what the above two things mean, don&#8217;t worry; I got your back chief.</li>
</ul>
<h2>How threads work in Ruby</h2>
<p>Ruby 1.8 implements pre-emptible userland threads, also known as &#8220;green threads.&#8221; (Want to know more about threading models? See <a href="http://timetobleed.com/threading-models-so-many-different-ways-to-get-stuff-done/">this post</a>.) The major performance killer in Ruby&#8217;s implementation of green threads is that the <strong>entire thread stack is copied</strong> to and from the heap on <strong>every context switch</strong>. Let&#8217;s take a high-level look at what happens when you run:</p>
<pre class="prettyprint lang-rb">a = []
Thread.new{
	10000.times {
		a &lt;&lt; "a"
		a.pop
	}
}</pre>
<ol>
<li>A thread control block (tcb) is allocated in Ruby.</li>
<li>The infamous thread timer is initialized, either as a pthread or as an itimer.</li>
<li>Ruby scope information is copied to the heap.</li>
<li>The new thread is added to the list of threads.</li>
<li>The current thread is set as the new thread.</li>
<li>rb_thread_yield is called to yield to the block you passed in.</li>
<li>Your block starts executing.</li>
<li>The timer interrupts the executing thread.</li>
<li>The current thread&#8217;s state is stored:
<ul>
<li><code>memcpy()</code> #1 (sometimes): If the stack has grown since the last save, <code>realloc</code> is called. If the allocator cannot extend the size of the current block in place, it may decide to move the data to a new block that is large enough. If that happens <code>memcpy()</code> is called to move the data over.</li>
<li><code>memcpy()</code> #2 (always): A copy of this thread&#8217;s <strong>entire stack</strong> (starting from the top of the interpreter&#8217;s stack) is put on the heap.</li>
</ul>
</li>
<li>The next thread&#8217;s state is restored.
<ul>
<li><code>memcpy()</code> #3 (always): The saved copy of the next thread&#8217;s <strong>entire stack</strong> is copied from the heap back onto the program stack.</li>
</ul>
</li>
</ol>
<p>Steps 9 and 10 <strong>crush performance</strong> even when only small amounts of Ruby code are executed.</p>
<p>Many of the functions the interpreter uses to evaluate code are <em>massive</em>. They allocate a large number of local variables creating stack frames up to <strong>4 kilobytes</strong> per function call. Those functions also call themselves recursively many times in a single expression. This leads to huge stacks, huge <code>memcpy()s</code>, and an incredible performance penalty.</p>
<p>If we can eliminate the <code>memcpy()s</code> we can get a lot of performance back. So, let&#8217;s do it.</p>
<h2>Increase performance by putting thread stacks on the heap</h2>
<p><strong>[Remember: we are only talking about x86_64]</strong></p>
<h3>How stacks work &#8211; a refresher</h3>
<p>Stacks grow <strong>downward</strong> from high addresses to low addresses. As data is <code>push</code>ed on to the stack, it grows downward. As stuff is <code>pop</code>ped, it shrinks upward. The register <code>%rsp</code> serves as a pointer to the bottom of the stack. When it is decremented or incremented the stack grows or shrinks, respectively. The <strong>special property</strong> of the program stack is that <strong>it will grow</strong> until you run out of memory (or are killed by the OS for being bad). The operating system handles the automatic growth. See the Epilogue for some more information about this.</p>
<h3>How to actually switch stacks</h3>
<p>The <code>%rsp</code> register can be (and is) changed and adjusted directly by user code. So all we have to do is put the address of our stack in <code>%rsp</code>, and we&#8217;ve switched stacks. Then we can just call our thread start function. Pretty easy. A small blob of inline assembly should do the trick:</p>
<pre class="prettyprint lang-c">__asm__ __volatile__ ("movq %0, %%rsp\n\t"
                      "callq *%1\n"
                      :: "r" (th-&gt;stk_base),
                         "r" (rb_thread_start_2));</pre>
<p>Two instructions, not too bad.</p>
<ol>
<li><code>movq %0, %%rsp</code> moves a quad-word (<code>th-&gt;stk_base</code>) into <code>%rsp</code>. <em>Quad-word</em> is Intel speak for 4 words, where 1 Intel word is 2 bytes, so a quad-word is 8 bytes: exactly the size of a pointer on x86_64.</li>
<li><code>callq *%1</code> calls a function at the address &#8220;rb_thread_start_2.&#8221; This has a side-effect or two, which I&#8217;ll mention in the Epilogue below, for those interested in a few more details.</li>
</ol>
<p>The above code is called <em>once per thread</em>. Calling <code>rb_thread_start_2</code> spins up your thread and it never returns.</p>
<h3>Where do we get stack space from?</h3>
<p>When the tcb is created, we&#8217;ll allocate some space with <code>mmap</code> and set a pointer to it.</p>
<pre class="prettyprint lang-c">/* error checking omitted for brevity, but exists in the patch =] */
stack_area = mmap(NULL, total_size, PROT_READ | PROT_WRITE | PROT_EXEC,
			MAP_PRIVATE | MAP_ANON, -1, 0);

th-&gt;stk_ptr = th-&gt;stk_pos = stack_area;
th-&gt;stk_base = th-&gt;stk_ptr + (total_size - sizeof(int))/sizeof(VALUE *);</pre>
<p>Remember, stacks <strong>grow downward</strong>, so that last line (<code>th-&gt;stk_base = ...</code>) is necessary because the base of the stack is actually at the <em>top</em> of the memory region returned by <code>mmap()</code>. The ugly math in there is for alignment, to comply with the x86_64 binary interface. Those curious about more gritty details should see the Epilogue below.</p>
<p><strong>BUT WAIT, I thought stacks were supposed to grow automatically?</strong></p>
<p>Yeah, the OS does that for the normal program stack. Not gonna happen for our <code>mmap</code>&#8216;d regions. The best we can do is pick a good default size and export a tuning lever so that advanced users can adjust the stack size as they see fit.</p>
<p><strong>BUT WAIT, isn&#8217;t that dangerous? If you fall off your stack, wouldn&#8217;t you just overwrite memory below?</strong></p>
<p>Yep, but there is a fix for that too. It&#8217;s called a guard page. We&#8217;ll create a guard page below each stack that has its permission bits set to <code>PROT_NONE</code>. This means, if a thread falls off the bottom of its stack and tries to read, write, or execute the memory below the thread stack, a signal (usually <code>SIGSEGV</code> or <code>SIGBUS</code>) will be sent to the process.</p>
<p>The code for the guard page is pretty simple, too:</p>
<pre class="prettyprint lang-c">/* omit error checking for brevity */
mprotect(th-&gt;stk_ptr, getpagesize(), PROT_NONE);</pre>
<p>Cool, let&#8217;s modify the SIGSEGV and SIGBUS signal handlers to check for stack overflow:</p>
<pre class="prettyprint lang-c">/* if the address which generated the fault is within the current thread's guard page... */
if (fault_addr &lt;= (caddr_t)rb_curr_thread-&gt;guard &#038;&#038;
    fault_addr &gt;= (caddr_t)rb_curr_thread-&gt;stk_ptr) {
  /* we hit the guard page, print out a warning to help app developers */
  rb_bug("Thread stack overflow! Try increasing it!");
}</pre>
<p>See the Epilogue for more details about this signal handler trick.</p>
<h2>Patches</h2>
<p><strong>As always, this is super-alpha software.</strong></p>
<table style="height: 60px;" border="0" cellspacing="1" cellpadding="1" width="300" summary="”&quot;">
<tbody>
<tr>
<td>Ruby 1.8.6</td>
<td><a href="http://github.com/ice799/matzruby/tree/heap_stacks_186">github</a></td>
<td><a href="http://timetobleed.com/files/186-hs.patch">raw .patch</a></td>
</tr>
<tr>
<td>Ruby 1.8.7</td>
<td><a href="http://github.com/ice799/matzruby/tree/heap_stacks">github</a></td>
<td><a href="http://timetobleed.com/files/187-hs.patch">raw .patch</a></td>
</tr>
</tbody>
</table>
<h2>Benchmarks</h2>
<p>The <a href="http://shootout.alioth.debian.org/">computer language shootout</a> has a thread test called thread-ring; let&#8217;s start with that.</p>
<pre class="prettyprint lang-rb">require 'thread'
THREAD_NUM = 403
number = ARGV.first.to_i

threads = []
for i in 1..THREAD_NUM
   threads &lt;&lt; Thread.new(i) do |thr_num|
      while true
         Thread.stop
         if number &gt; 0
            number -= 1
         else
            puts thr_num
            exit 0
         end
      end
   end
end

prev_thread = threads.last
while true
   for thread in threads
      Thread.pass until prev_thread.stop?
      thread.run
      prev_thread = thread
   end
end</pre>
<p>Results (ARGV[0] = 50000000):</p>
<table style="height: 60px;" border="0" cellspacing="1" cellpadding="1" width="300" summary="”&quot;">
<tbody>
<tr>
<td>Ruby 1.8.6</td>
<td>1389.52s</td>
</tr>
<tr>
<td>Ruby 1.8.6 w/ heap stacks</td>
<td>793.06s</td>
</tr>
<tr>
<td>Ruby 1.9.1</td>
<td>752.44s</td>
</tr>
</tbody>
</table>
<p>A <strong>speed up of about 1.75x</strong> (1389.52s vs. 793.06s) compared to Ruby 1.8.6. A bit slower than Ruby 1.9.1.</p>
<p>That is a pretty strong showing, for sure. Let&#8217;s modify the test slightly to illustrate the true power of this implementation.</p>
<p>Since our implementation does no <code>memcpy</code>()s we <i>expect</i> the cost of context switching to stay constant regardless of thread stack size. Moreover, the unmodified Ruby 1.8.6 should perform worse as thread stack size increases (therefore increasing the amount of time the CPU is doing <code>memcpy</code>()s).</p>
<p>Let&#8217;s <b>test this hypothesis</b> by modifying thread-ring slightly so that it increases the size of the stack after spawning threads.</p>
<pre class="prettyprint lang-rb">def grow_stack n=0, &#038;blk
  unless n &gt; 100
    grow_stack n+1, &#038;blk
  else
    yield
  end
end

require 'thread'
THREAD_NUM = 403
number = ARGV.first.to_i

threads = []
for i in 1..THREAD_NUM
  threads &lt;&lt; Thread.new(i) do |thr_num|
    grow_stack do
      while true
        Thread.stop
        if number &gt; 0
          number -= 1
        else
          puts thr_num
          exit 0
        end
      end
    end
  end
end

prev_thread = threads.last
while true
   for thread in threads
      Thread.pass until prev_thread.stop?
      thread.run
      prev_thread = thread
   end
end</pre>
<p>Results (ARGV[0] = 50000000):</p>
<table style="height: 60px;" border="0" cellspacing="1" cellpadding="1" width="300" summary="”&quot;">
<tbody>
<tr>
<td>Ruby 1.8.6</td>
<td>7493.50s</td>
</tr>
<tr>
<td>Ruby 1.8.6 w/ heap stacks</td>
<td>799.52s</td>
</tr>
<tr>
<td>Ruby 1.9.1</td>
<td>680.92s</td>
</tr>
</tbody>
</table>
<p>A <strong>speed up of about 9.4x</strong> compared to Ruby 1.8.6. A bit slower than Ruby 1.9.1.</p>
<p>Now, let&#8217;s benchmark mongrel+sinatra.</p>
<pre class="prettyprint lang-rb">
require 'rubygems'
require 'sinatra'

disable :reload

set :server, 'mongrel' 

get '/' do
  'hi'
end
</pre>
<p>Results:</p>
<table style="height: 60px;" border="0" cellspacing="1" cellpadding="1" width="400" summary="”&quot;">
<tbody>
<tr>
<td>Ruby 1.8.6</td>
<td>1395.43 request/sec</td>
</tr>
<tr>
<td>Ruby 1.8.6 w/ heap stacks</td>
<td>1770.26 request/sec</td>
</tr>
</tbody>
</table>
<p>An <b>increase of about 1.27x</b> in the <i>most naive case possible</i>.</p>
<p>Of course, if the handler did anything more than simply write &#8220;hi&#8221; (like use memcache or make SQL queries), there would be more function calls, more context switches, and likely <b>much greater savings.</b></p>
<h2>Conclusion</h2>
<p>A couple lessons learned this time:</p>
<ul>
<li>Hacking a VM like Ruby is kind of like hacking a kernel. Some subset of the tricks used in kernel hacking are useful in userland.</li>
<li>The x86_64 ABI is a <em>must read</em> if you plan on doing any low-level hacking.</li>
<li>Keep your CPU manuals close by, they come in handy even in userland.</li>
<li>Installing your own signal handlers is really useful for debugging, even if they dump architecture-specific information.</li>
</ul>
<p>Hope everyone enjoyed this blog post. I&#8217;m always looking for things to blog about. If there is something you want explained or talked about, send me an email or a tweet!</p>
<p>Don&#8217;t forget to <a href="http://feeds.feedburner.com/TimeToBleed">subscribe</a> and <a href="http://twitter.com/joedamato">follow me</a> and <a href="http://twitter.com/tmm1">Aman</a> on twitter.</p>
<h2>Epilogue</h2>
<h3>Automatic stack growth</h3>
<p>This can be achieved pretty easily with a little help from virtual memory and the CPU&#8217;s exception handling. The idea is pretty simple. When you (or your shell on your behalf) call <code>exec()</code> to execute a binary, the OS will map a bunch of pages of memory for the stack and set the stack pointer of the process to the top of that memory. Once the stack space is exhausted and the stack pointer moves down into un-mapped memory, the next access there generates a page fault.</p>
<p>The OS&#8217;s page fault handler (registered with the CPU in the interrupt descriptor table, or IDT) will fire. The OS can then check the address that caused the fault and see that you fell off the bottom of your stack. This works very similarly to the guard page idea we added to protect Ruby thread stacks. It can then just map more memory to that area and tell your process to continue executing. Your process doesn&#8217;t know anything bad happened.</p>
<p>I hope to chat a little bit about interrupt and exception handlers in an upcoming blog post. Stay tuned!</p>
<h3><code>callq</code> side-effects</h3>
<p>When a <code>callq</code> instruction is executed, the CPU pushes the return address on to the stack and then begins executing the function that was called. This is important because when the function you are calling executes a <code>ret</code> instruction, a quad-word is popped from the stack and put into the instruction pointer (<code>%rip</code>).</p>
<h3>x86_64 Application Binary Interface</h3>
<p>The x86_64 ABI is an extension of the x86 ABI. It specifies architecture programming information such as the fundamental types, caller- and callee-saved registers, alignment considerations, and more. It is a really important document for any programmer messing with x86_64 architecture-specific code.<br />
The particular piece of information relevant for this blog post is buried in section 3.2.2:</p>
<blockquote><p>The end of the input argument area shall be aligned on a 16 &#8230; byte boundary.</p></blockquote>
<p>This is important to keep in mind when constructing thread stacks. We decided to avoid messing with alignment issues, so we did not pass any arguments to rb_thread_start_2. We wanted to avoid mathematical errors that could happen if we tried to align the memory ourselves after pushing some data. We also wanted to avoid writing more assembly than we had to, so we avoided passing the arguments in registers, too.</p>
<h3>Signal handler trick</h3>
<p>The signal handler &#8220;trick&#8221; to check if you have hit the guard page is made possible by the <code>sigaltstack()</code> system call and the POSIX <code>sa_sigaction</code> interface.</p>
<p><code>sigaltstack()</code> lets us specify a memory region to be used as the stack when a signal is delivered. This is extremely important for the signal handler trick because once we fall off our thread stack, we certainly cannot expect to handle a signal using that stack space.</p>
<p>POSIX provides two ways for signals to be handled:</p>
<ul>
<li><code>sa_handler</code> interface: calls your handler and passes in the signal number.</li>
<li><code>sa_sigaction</code> interface: calls your handler and passes in the signal number, a <code>siginfo_t</code> struct, and a pointer to a <code>ucontext_t</code>. The <code>siginfo_t</code> struct contains (among other things) the address that generated the fault. Simply check whether this address is in the guard page and, if so, let the user know they just overflowed their stack. Another useful but <em>extremely non-portable</em> addition to Ruby&#8217;s signal handlers was a dump of the contents of the <code>ucontext_t</code> to provide debugging information. This structure contains the register state at the time the signal was delivered, so dumping it shows which values were in which registers.</li>
</ul>
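<p>Here is a minimal sketch of the trick in C, assuming Linux and glibc. All of the names (<code>guard_lo</code>, <code>install_guard_handler</code>, <code>map_with_guard</code>, <code>probe</code>) are hypothetical and not taken from Ruby. The handler runs on an alternate stack registered with <code>sigaltstack()</code> and uses the <code>sa_sigaction</code> form to read the faulting address. A real interpreter would print a &#8220;stack overflow&#8221; message and abort; to keep the sketch checkable, it jumps back to a probe function instead.</p>

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for a single guard page. */
static char *guard_lo;                      /* first byte of the guard page */
static size_t guard_len;
static sigjmp_buf probe_env;                /* demo only, see probe() */
static volatile sig_atomic_t probe_result;  /* 1 = guard page, 2 = elsewhere */

/* sa_sigaction-style handler: info->si_addr is the faulting address and
 * uctx points at a ucontext_t holding the register state (unused here). */
static void segv_handler(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)uctx;
    char *addr = (char *)info->si_addr;
    probe_result = (addr >= guard_lo && addr < guard_lo + guard_len) ? 1 : 2;
    /* An interpreter would report "stack overflow" and abort here; the
     * sketch jumps back instead so the result can be inspected. */
    siglongjmp(probe_env, 1);
}

/* Map `pages` pages and make the lowest one a PROT_NONE guard page,
 * mimicking a downward-growing thread stack. */
static char *map_with_guard(size_t pages, size_t pagesz)
{
    char *base = mmap(NULL, pages * pagesz, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base, pagesz, PROT_NONE) != 0) {
        munmap(base, pages * pagesz);
        return NULL;
    }
    return base;
}

/* Run the handler on an alternate signal stack: after a thread has run
 * into its guard page, its own stack cannot host the handler. */
static int install_guard_handler(char *guard, size_t len)
{
    stack_t ss;
    struct sigaction sa;

    assert(guard != NULL && len > 0);
    guard_lo = guard;
    guard_len = len;

    ss.ss_sp = malloc(SIGSTKSZ);
    ss.ss_size = SIGSTKSZ;
    ss.ss_flags = 0;
    if (ss.ss_sp == NULL || sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;  /* get siginfo_t, use alt stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Demo only: write to *p and return 0 (no fault), 1 (fault in the guard
 * page) or 2 (fault somewhere else). */
static int probe(volatile char *p)
{
    probe_result = 0;
    if (sigsetjmp(probe_env, 1) != 0)
        return probe_result;
    *p = 1;
    return 0;
}
```

<p>With a four-page region, writing to the second page succeeds, while any write into the lowest page lands in the handler with <code>si_addr</code> inside the guard range.</p>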
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/fixing-threads-in-ruby-18-a-2-10x-performance-boost/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Fix a bug in Ruby&#8217;s configure.in and get a ~30% performance boost.</title>
		<link>http://timetobleed.com/fix-a-bug-in-rubys-configurein-and-get-a-30-performance-boost/</link>
		<comments>http://timetobleed.com/fix-a-bug-in-rubys-configurein-and-get-a-30-performance-boost/#comments</comments>
		<pubDate>Tue, 05 May 2009 08:20:29 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[kernel]]></category>
		<category><![CDATA[patch]]></category>
		<category><![CDATA[patches]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[strace]]></category>
		<category><![CDATA[syscall]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[threading]]></category>
		<category><![CDATA[threads]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=615</guid>
		<description><![CDATA[Special thanks&#8230; Going out to Jake Douglas for pushing the initial investigation and getting the ball rolling. The whole --enable-pthread thing Ask any Ruby hacker how to easily increase performance in a threaded Ruby application and they&#8217;ll probably tell you: Yo dude&#8230; Everyone knows you need to configure Ruby with --disable-pthread. And it&#8217;s true; configure [...]]]></description>
			<content:encoded><![CDATA[<p><center><img src="http://timetobleed.com/images/ruby_bug.jpg"/></center><br />
</p>
<p>
<h2>Special thanks&#8230;</h2>
<p>Going out to <a href="http://twitter.com/jakedouglas">Jake Douglas</a> for pushing the initial investigation and getting the ball rolling.</p>
<p><h2>The whole <code>--enable-pthread</code> thing</h2>
<p>Ask any Ruby hacker how to easily increase performance in a threaded Ruby application and they&#8217;ll probably tell you:<br />
<b><br />
Yo dude&#8230; <i>Everyone</i> knows you need to <code>configure</code> Ruby with <code>--disable-pthread</code>.<br />
</b><br />
And it&#8217;s true; <code>configure</code> Ruby with <code>--disable-pthread</code> and you get a ~30% performance boost. But&#8230; <b><i>why?</i></b></p>
<p> For this, we&#8217;ll have to turn to our handy tool <a href="http://timetobleed.com/hello-world/">strace</a>. We&#8217;ll also need a simple Ruby program to test with. How about something like this:</p>
<p>
<pre class="prettyprint lang-rb">
def make_thread
  Thread.new {
    a = []
    10_000_000.times {
      a << "a"
      a.pop
    }
  }
end

t = make_thread
t1 = make_thread 

t.join
t1.join</pre>
<p></p>
<p>Now, let's run <code>strace</code> on a version of Ruby <code>configure</code>'d with <code>--enable-pthread</code> and point it at our test script. The output from <code>strace</code> looks like this:</p>
<p>
<pre class="prettyprint lang-c">
22:46:16.706136 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000004>
22:46:16.706177 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000004>
22:46:16.706218 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000004>
22:46:16.706259 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000005>
22:46:16.706301 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000004>
22:46:16.706342 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000004>
22:46:16.706383 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000004>
22:46:16.706425 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000004>
22:46:16.706466 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000004></pre>
<p></p>
<p><b>Pages and pages and pages</b> of sigprocmask system calls (actually, running with <code>strace -c</code>, I get about <b>20,054,180</b> calls to <code>sigprocmask</code>, <b>WOW</b>). Run the <i>same test script</i> against a Ruby built with <code>--disable-pthread</code> and the output does <b>not</b> have pages and pages of <code>sigprocmask</code> calls (only <b>3</b> calls, a <b>HUGE</b> reduction).
</p>
<p><h2>OK, so let's just set a breakpoint in GDB... right?</h2>
<p>OK, so we should just be able to set a <code>breakpoint</code> on <code>sigprocmask</code> and figure out who is calling it.</p>
<p><b>Well, not exactly.</b> You can try it, but the breakpoint <b>won't trigger</b> (we'll see why a little bit later).</p>
<p>Hrm, that kinda sucks and is confusing. This will make it harder to track down who is calling <code>sigprocmask</code> in the threaded case.</p>
<p> Well, we know that when you run <code>configure</code> the script creates a <code>config.h</code> with a bunch of <code>define</code>s that Ruby uses to decide which functions to use for what. So let's compare <code>./configure --enable-pthread</code> with <code>./configure --disable-pthread</code>:</p>
<pre class="prettyprint lang-bsh">
[joe@mawu:/home/joe/ruby]% diff config.h config.h.pthread
> #define _REENTRANT 1
> #define _THREAD_SAFE 1
> #define HAVE_LIBPTHREAD 1
> #define HAVE_NANOSLEEP 1
> #define HAVE_GETCONTEXT 1
> #define HAVE_SETCONTEXT 1</pre>
</p>
<p>
<br />
OK, now if we <code>grep</code> the Ruby source code, we see that whenever <code>HAVE_[SG]ETCONTEXT</code> are set, Ruby uses the libc functions <code>setcontext()</code> and <code>getcontext()</code> to save and restore state for context switching and for exception handling (via <code>EXEC_TAG</code>). </p>
<p>What about when <code>HAVE_[SG]ETCONTEXT</code> are <b>not</b> <code>define</code>'d? Well in that case, Ruby uses <code>_setjmp/_longjmp</code>.</p>
<p><b>Bingo!</b></p>
<p>That's what's going on! From the <code>_setjmp/_longjmp</code> man page:</p>
<blockquote><p>... The _longjmp()  and  _setjmp()  functions  shall  be  equivalent  to  longjmp() and setjmp(), respectively, with the additional restriction that _longjmp() and _setjmp() shall not manipulate the signal mask...</p></blockquote>
<p>And from the <code>[sg]etcontext</code> man page:</p>
<blockquote><p>... uc_sigmask is the set of signals blocked in this context (see sigprocmask(2)) ...</p></blockquote>
<p>
<br />The issue is that <code>getcontext</code> calls <code>sigprocmask</code> on <b>every invocation</b> but <code>_setjmp</code> does not.</p>
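<p>You can see the difference without reading any Ruby source. The following is a small benchmark sketch in C, assuming Linux and glibc (the function names are hypothetical). It times <code>getcontext()</code> against <code>_setjmp()</code> and shows that <code>getcontext()</code> really does fetch the signal mask from the kernel: a signal blocked just before the call shows up in <code>uc_sigmask</code>. Exact timings will vary by machine; the gap comes from the extra system call on every <code>getcontext()</code>.</p>

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <time.h>
#include <ucontext.h>

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average cost of getcontext(). On glibc/Linux every call also makes an
 * rt_sigprocmask system call to record the current signal mask. */
static double ns_per_getcontext(long iters)
{
    ucontext_t uc;
    assert(iters > 0);
    double start = now_ns();
    for (long i = 0; i < iters; i++)
        getcontext(&uc);
    return (now_ns() - start) / (double)iters;
}

/* Average cost of _setjmp(): saves registers only, never the signal mask. */
static double ns_per_setjmp(long iters)
{
    static jmp_buf buf;
    assert(iters > 0);
    double start = now_ns();
    for (long i = 0; i < iters; i++)
        if (_setjmp(buf) != 0)
            break;  /* never taken: nothing calls _longjmp() here */
    return (now_ns() - start) / (double)iters;
}

/* getcontext() stores the signal mask in uc_sigmask, which it can only
 * learn by asking the kernel. Returns 1 if `sig` was captured as blocked. */
static int getcontext_sees_blocked(int sig)
{
    sigset_t set, old;
    ucontext_t uc;

    sigemptyset(&set);
    sigaddset(&set, sig);
    sigprocmask(SIG_BLOCK, &set, &old);
    getcontext(&uc);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return sigismember(&uc.uc_sigmask, sig);
}
```

<p>Running the timing loops under <code>strace -c</code> should show one <code>rt_sigprocmask</code> per <code>getcontext()</code> iteration and none for the <code>_setjmp()</code> loop.</p>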
<p><b>BUT WAIT</b> if that's true why didn't <code>GDB</code> hit a <code>sigprocmask</code> breakpoint before?</p>
<p><h2>x86_64 assembly FTW, again</h2>
</p>
<p>
Let's fire up <code>gdb</code> and figure out this breakpoint-not-breaking thing. First, let's start by disassembling <code>getcontext</code> (snipped for brevity):<br />
<code><br />
(gdb) p getcontext<br />
$1 = {&lt;text variable, no debug info&gt;} 0x7ffff7825100 &lt;getcontext&gt;<br />
(gdb) disas getcontext<br />
...<br />
0x00007ffff782517f &lt;getcontext+127&gt;:	mov    $0xe,%rax<br />
0x00007ffff7825186 &lt;getcontext+134&gt;:	syscall<br />
...<br />
</code></p>
<p>Yeah, that's pretty weird. I'll explain why in a minute, but let's look at the disassembly of <code>sigprocmask</code> first:<br />
<code><br />
(gdb) p sigprocmask<br />
$2 = {&lt;text variable, no debug info&gt;} 0x7ffff7817340 &lt;__sigprocmask&gt;<br />
(gdb) disas sigprocmask<br />
...<br />
0x00007ffff7817383 <__sigprocmask+67>:	mov    $0xe,%rax<br />
0x00007ffff7817388 <__sigprocmask+72>:	syscall<br />
...<br />
</code><br />
Yeah, this is a bit confusing, but here's the deal.</p>
<p>
Linux on x86_64 makes system calls with a shiny, fast instruction: <code>syscall</code> (paired with <code>sysret</code> for the return trip). The old way (<code>int $0x80</code>) turned out to be pretty slow, so the CPU vendors added dedicated instructions for entering and leaving the kernel without such huge overhead: <code>syscall/sysret</code> from AMD and <code>sysenter/sysexit</code> from Intel.</p>
<p> All you need to know right now (I'll try to blog more about this in the future) is that the <code>%rax</code> register holds the system call number. The <code>syscall</code> instruction transfers control to the kernel and the kernel figures out which syscall you wanted by checking the value in <code>%rax</code>. Let's just make sure that <code>sigprocmask</code> is actually 0xe:</p>
<pre class="prettyprint lang-c">
[joe@pluto:/usr/include]% grep -Hrn "sigprocmask" asm-x86_64/unistd.h
asm-x86_64/unistd.h:44:#define __NR_rt_sigprocmask                     14</pre>
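<p>To convince yourself that a program can change or query the signal mask without ever touching the <code>sigprocmask</code> symbol, here is a small C sketch (Linux only; the helper names are hypothetical). It issues <code>rt_sigprocmask</code> through the generic <code>syscall()</code> function, much as glibc&#8217;s <code>getcontext</code> does with an inline <code>syscall</code> instruction. A GDB breakpoint on <code>sigprocmask</code> will not fire for it; GDB&#8217;s <code>catch syscall rt_sigprocmask</code> will.</p>

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The kernel's signal set is 64 bits (8 bytes) on x86_64, much smaller
 * than glibc's sigset_t, and the kernel rejects any other size with
 * EINVAL. This is the trailing 8 in the strace output above. */
#define KERNEL_SIGSET_BYTES 8

/* Query the signal mask by issuing rt_sigprocmask (number 14 on x86_64)
 * directly, bypassing the sigprocmask() wrapper. A NULL new set means
 * "change nothing, just report". Returns 0 on success, -1 on error. */
static int raw_get_sigmask(unsigned long long *mask)
{
    assert(mask != NULL);
    return (int)syscall(SYS_rt_sigprocmask, SIG_BLOCK, NULL, mask,
                        KERNEL_SIGSET_BYTES);
}

/* Bit (sig - 1) of the kernel mask corresponds to signal number sig. */
static int signal_in_mask(unsigned long long mask, int sig)
{
    return (int)((mask >> (sig - 1)) & 1ULL);
}
```

<p>Block <code>SIGUSR1</code> with the ordinary wrapper, then read the mask back through the raw system call: the bit for <code>SIGUSR1</code> is set even though the second read never went through <code>sigprocmask()</code>.</p>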
<p>
<br />
<b>Bingo. It's calling <code>sigprocmask</code> (albeit a bit obscurely).</b></p>
<p>
OK, so <code>getcontext</code> isn't calling <code>sigprocmask</code> directly; instead, it replicates a bunch of code that <code>sigprocmask</code> has in its function body. That's why we didn't hit the <code>sigprocmask</code> breakpoint: <code>GDB</code> was going to break if you landed on the address <code>0x7ffff7817340</code>, but <b>you didn't</b>. </p>
<p>Instead, <code>getcontext</code> reimplements the wrapper code for <code>sigprocmask</code> itself and <code>GDB</code> is none the wiser. </p>
<p><b>Mystery solved</b>.</p>
<p><h2>The patch</h2>
</p>
<p>
Get it <b><a href="http://github.com/ice799/matzruby/commit/0b9b69f9653782a33aee2b8937d405eae245b60c">HERE</a></b></p>
<p>
The patch adds a new configure flag, <code>--disable-ucontext</code>, which stops Ruby from calling <code>[sg]etcontext</code>. You <b>use this in conjunction with</b> <code>--enable-pthread</code>, like this:<br />
<code><br />
./configure --disable-ucontext --enable-pthread</code><br />
<br />
After you build Ruby configured like that, its performance is on par with (and sometimes slightly faster than) Ruby built with <code>--disable-pthread</code>, which is about a 30% performance boost compared to <code>--enable-pthread</code>.</p>
<p>I added the switch because I wanted to preserve the original Ruby behavior: if you just pass <code>--enable-pthread</code> <b>without</b> <code>--disable-ucontext</code>, Ruby will do the old thing and generate piles of <code>sigprocmask</code> calls.</p>
<h2>Conclusion</h2>
<ol>
<li> Things aren't always what they seem - GDB may lie to you. Be careful. </li>
<li> Use the source, Luke. Libraries can do unexpected things; debug builds of libc can help!</li>
<li> I know I keep saying this, but assembly is useful. Start learning it today!</li>
</ol>
<p>
If you enjoyed this blog post, consider <a href="http://feeds.feedburner.com/TimeToBleed" rel="alternate" type="application/rss+xml">subscribing (via RSS)</a> or <a href="http://twitter.com/joedamato">following (via twitter)</a>.</p>
<p><b>You'll want to stay tuned; <a href="http://twitter.com/tmm1">tmm1</a> and I have been on a roll the past week. Lots of cool stuff coming out!</b></p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/fix-a-bug-in-rubys-configurein-and-get-a-30-performance-boost/feed/</wfw:commentRss>
		<slash:comments>43</slash:comments>
		</item>
		<item>
		<title>6 Line EventMachine Bugfix = 2x faster GC, +1300% requests/sec</title>
		<link>http://timetobleed.com/6-line-eventmachine-bugfix-2x-faster-gc-1300-requestssec/</link>
		<comments>http://timetobleed.com/6-line-eventmachine-bugfix-2x-faster-gc-1300-requestssec/#comments</comments>
		<pubDate>Wed, 29 Apr 2009 06:36:09 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[debugging]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[systems]]></category>
		<category><![CDATA[x86]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[garbage collection]]></category>
		<category><![CDATA[GC]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[patch]]></category>
		<category><![CDATA[patches]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[threading]]></category>
		<category><![CDATA[threads]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=554</guid>
		<description><![CDATA[Nothing is possible without lunch So Aman Gupta (tmm1) and I were eating lunch at the Oaxacan Kitchen on Tuesday and as usual, we were talking about scaling Ruby. We got into a small debate about which phase of garbage collection took the most CPU time. Aman&#8217;s claim: The mark phase, specifically the stack marking [...]]]></description>
			<content:encoded><![CDATA[<p><center><br />
<img src="http://timetobleed.com/images/oaxacan.jpg"/><br />
</center><br />
</p>
<p><h2>Nothing is possible without lunch</h2>
<p>So Aman Gupta (<a href="http://twitter.com/tmm1">tmm1</a>) and I were eating lunch at the <a href="http://www.theoaxacankitchen.com/">Oaxacan Kitchen</a> on Tuesday and as usual, we were talking about scaling Ruby. We got into a small debate about which phase of garbage collection took the most CPU time.</p>
<p>Aman&#8217;s claim:</p>
<ul>
<li>The mark phase, specifically the stack marking phase because of the huge stack frames created by rb_eval</li>
</ul>
<p>My claim:</p>
<ul>
<li>The sweep phase, because every single object has to be touched and some freeing happens.</li>
</ul>
<p>I told Aman that I didn&#8217;t believe the stack frames were that large, and we bet on how big we thought they would be. Couldn&#8217;t be more than a couple kilobytes, could it? <b>Little did we know how wrong our estimates were.</b>
</p>
<h2>Quick note about Ruby&#8217;s GC</h2>
<p>Ruby MRI has a mark-and-sweep garbage collector. As part of the mark phase, it <b>scans the process stack</b>. This is required because a pointer to a Ruby object can be passed to a C extension (like EventMachine, or Hpricot, or whatever). If that happens, it isn&#8217;t safe to free the object yet. So Ruby does a simple scan and checks whether <b>each word on the stack</b> is a pointer into the Ruby heap; if so, the object it points to cannot be freed.<br />
</p>
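<p>Here is a toy model of that conservative scan in C, assuming the object heap is a single contiguous address range (<code>heap_range</code> and <code>scan_stack_words</code> are hypothetical names). MRI&#8217;s real check is more careful: it walks its list of heap blocks and also checks that the word points at an object slot. The property that matters here is the same, though. The cost is one check per stack word, no matter what the frames actually contain.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of an object heap as one contiguous address range. */
typedef struct {
    uintptr_t start;   /* first byte of the heap */
    uintptr_t end;     /* one past the last byte */
} heap_range;

/* Conservatively scan the words in [lo, hi): any word whose value falls
 * inside the heap might be a pointer to a live object, so that object
 * must be marked. Returns how many words looked like heap pointers and
 * stores the number of words examined in *words_scanned. That count
 * grows with the stack's size in bytes, whatever the frames hold. */
static size_t scan_stack_words(const uintptr_t *lo, const uintptr_t *hi,
                               heap_range heap, size_t *words_scanned)
{
    size_t candidates = 0;
    assert(lo <= hi && words_scanned != NULL);
    *words_scanned = 0;
    for (const uintptr_t *p = lo; p < hi; p++) {
        (*words_scanned)++;
        if (*p >= heap.start && *p < heap.end)
            candidates++;
    }
    return candidates;
}
```

<p>At 8 bytes per word on x86_64, every extra kilobyte of stack costs 128 more of these checks on every collection.</p>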
<h2>GDB to the rescue</h2>
<p>We get back from lunch, launch our application, attach GDB and set a breakpoint. The breakpoint gets triggered and we see this seemingly innocuous stack trace [Note: To help with debugging, we compiled the EventMachine gem with -fno-omit-frame-pointer]:<br />
<code><br />
#0  0x00007ffff77629ac in epoll_wait () from /lib/libc.so.6<br />
#1  0x00007ffff6c0b220 in EventMachine_t::_RunEpollOnce (this=0x158d7e0) at em.cpp:461<br />
#2  0x00007ffff6c0b86c in EventMachine_t::_RunOnce (this=0x158d7e0) at em.cpp:423<br />
#3  0x00007ffff6c0bbd6 in EventMachine_t::Run (this=0x158d7e0) at em.cpp:404<br />
#4  0x00007ffff6c06638 in evma_run_machine () at cmain.cpp:83<br />
#5  0x00007ffff6c1897f in t_run_machine_without_threads (self=26066936) at rubymain.cpp:154<br />
#6  0x000000000041d598 in call_cfunc (func=0x7ffff6c1896e &lt;t_run_machine_without_threads&gt;, recv=26066936, len=0, argc=0, argv=0x0) at eval.c:5759<br />
#7  0x000000000041c92f in rb_call0 (klass=26065816, recv=26066936, id=29417, oid=29417, argc=0, argv=0x0, body=0x18dba10, flags=0) at eval.c:5911<br />
#8  0x000000000041e0ad in rb_call (klass=26065816, recv=26066936, mid=29417, argc=0, argv=0x0, scope=2, self=26066936) at eval.c:6158<br />
#9  0x00000000004160d5 in rb_eval (self=26066936, n=0x1940330) at eval.c:3514<br />
#10 0x00000000004150b7 in rb_eval (self=26066936, n=0x1941018) at eval.c:3357<br />
#11 0x000000000041d196 in rb_call0 (klass=26065816, recv=26066936, id=5393, oid=5393, argc=0, argv=0x0, body=0x1941018, flags=0) at eval.c:6062<br />
#12 0x000000000041e0ad in rb_call (klass=26065816, recv=26066936, mid=5393, argc=0, argv=0x0, scope=0, self=47127864) at eval.c:6158<br />
#13 0x0000000000415d01 in rb_eval (self=47127864, n=0x2cf5298) at eval.c:3493<br />
#14 0x00000000004148b2 in rb_eval (self=47127864, n=0x2cf4380) at eval.c:3223<br />
#15 0x000000000041d196 in rb_call0 (klass=47127808, recv=47127864, id=5313, oid=5313, argc=0, argv=0x0, body=0x2cf4380, flags=0) at eval.c:6062<br />
#16 0x000000000041e0ad in rb_call (klass=47127808, recv=47127864, mid=5313, argc=0, argv=0x0, scope=0, self=9606072) at eval.c:6158<br />
#17 0x0000000000415d01 in rb_eval (self=9606072, n=0x194b2a0) at eval.c:3493<br />
#18 0x00000000004148b2 in rb_eval (self=9606072, n=0x19587b0) at eval.c:3223<br />
#19 0x000000000041072c in eval_node (self=9606072, node=0x19587b0) at eval.c:1437<br />
#20 0x0000000000410dff in ruby_exec_internal () at eval.c:1642<br />
#21 0x0000000000410e4f in ruby_exec () at eval.c:1662<br />
#22 0x0000000000410e72 in ruby_run () at eval.c:1672<br />
#23 0x000000000040e78a in main (argc=3, argv=0x7fffffffebd8, envp=0x7fffffffebf8) at main.c:48<br />
</code><br />
Looks pretty normal, nothing to worry about, <i>right</i>?</p>
<p>We started checking the rb_eval frames because we assumed that those would be the largest stack frames. The compiler inlines other functions into rb_eval, and rb_eval calls itself recursively. So how big is one of the rb_eval frames?<br />
<code><br />
(gdb) frame 10<br />
#10 0x00000000004150b7 in rb_eval (self=26066936, n=0x1941018) at eval.c:3357<br />
3357		    result = rb_eval(self, node->nd_head);<br />
(gdb) p $rbp-$rsp<br />
$2 = 1904<br />
</code><br />
1,904 bytes &#8211; pretty large. If all the stack frames are that large, we are looking at <i>around</i> 47,600 bytes. Pretty serious. Let&#8217;s verify that Ruby thinks the stack is a sane size. There is a global in the Ruby interpreter called <code>rb_gc_stack_start</code>. It gets set when the Ruby stack is created in <code>Init_stack()</code>. When Ruby calculates the stack size it subtracts the current stack pointer from <code>rb_gc_stack_start</code> [<b>remember</b> on x86_64, the stack grows from high addresses to low addresses]. Let&#8217;s do that and see how big Ruby thinks the stack is.<br />
<code><br />
(gdb) p (unsigned int)rb_gc_stack_start - (unsigned int)$rsp<br />
$3 = 802688<br />
</code><br />
<b>Wait, wait, wait. 802,688 bytes with only 24 stack frames? WTF?!</b> Something is wrong. We started at the top and checked <i>all the rb_eval stack frames</i>, but none of them are larger than 2kb. We did find something <b>quite a bit larger than 2kb</b>, though.<br />
<code><br />
(gdb) frame 1<br />
#1  0x00007ffff6c0b220 in EventMachine_t::_RunEpollOnce (this=0x158d7e0) at em.cpp:461<br />
461		s = epoll_wait (epfd, ev, MaxEpollDescriptors, timeout == 0 ? 5 : timeout);<br />
(gdb) p $rbp-$rsp<br />
$28 = 786816<br />
</code><br />
Uh, the RunEpollOnce stack frame is <b>786,816 bytes</b>? That&#8217;s <i>got</i> to be wrong. <b>WTF?</b></p>
<p>Time to bring out the big guns.</p>
<h2>objdump + x86_64 asm FTW</h2>
<p>I pumped EventMachine&#8217;s shared object into <code>objdump</code> and captured the assembly dump:<br />
<code><br />
objdump -d rubyeventmachine.so > em.S<br />
</code><br />
I headed down to the <code>RunEpollOnce</code> function and saw the following:<br />
<code><br />
2f12b:       48 81 ec 78 01 0c 00    sub    $0xc0178,%rsp<br />
</code><br />
<b>Interesting</b>. So the code is moving <code>%rsp</code> down by 786,808 bytes to make room for something <b>big</b>. So, let&#8217;s see if the EventMachine code matches up with the assembly output.<br />
<code><br />
struct epoll_event ev [MaxEpollDescriptors];<br />
</code><br />
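Where <code>MaxEpollDescriptors = 64*1024</code> and <code>sizeof(struct epoll_event) == 12</code> on x86_64. That is 65,536 &#215; 12 = 786,432 bytes for the array alone, which accounts for almost all of the 786,808 bytes (<code>0xc0178</code>) reserved by the <code>sub</code> instruction; the remaining 376 bytes presumably go to the function&#8217;s other locals. That matches up with the assembly dump and the GDB output.</p>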
<p>Doing something like that in C/C++ is usually OK. Avoiding the heap whenever you can is a good idea because you avoid heap-lock contention, heap fragmentation, and the memory overhead of tracking the allocation. <b>When writing Ruby extensions, this isn&#8217;t necessarily true.</b> Remember, Ruby&#8217;s GC algorithm scans the <i>entire process stack</i> searching for references to Ruby objects. This EventMachine code causes Ruby to search an <i>extra</i> ~800,000 bytes, drastically slowing down garbage collection.</p>
<h2>The patch</h2>
<p>Get the patch <a href="http://github.com/eventmachine/eventmachine/commit/1f6a4c912256b8110af94e270f7dde486f3c9d75">HERE</a></p>
<p> The patch simply moves the stack-allocated <code>struct epoll_event ev</code> to the class definition so that it is allocated on the heap when an instance of the class is created with <code>new</code>. This <b>does not</b> change the memory usage of the process at all. It just moves the object off the stack. This makes all the difference because Ruby&#8217;s GC scans the <i>process stack</i> and <b>not</b> the process heap.</p>
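<p>EventMachine itself is C++, and the patch turned the array into a class member. The same idea in plain C looks roughly like the sketch below, assuming Linux; <code>reactor</code>, <code>reactor_new</code> and <code>reactor_run_once</code> are hypothetical names. The event buffer is allocated once, on the heap, when the object is created, so the frame of the function that calls <code>epoll_wait</code> stays tiny.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

enum { MaxEpollDescriptors = 64 * 1024 };

/* Hypothetical reactor object holding the epoll fd and its event buffer. */
typedef struct {
    int epfd;
    struct epoll_event *ev;   /* lives on the heap, not on the stack */
} reactor;

static reactor *reactor_new(void)
{
    reactor *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->epfd = epoll_create1(0);
    r->ev = calloc(MaxEpollDescriptors, sizeof *r->ev);
    if (r->epfd < 0 || r->ev == NULL) {
        if (r->epfd >= 0)
            close(r->epfd);
        free(r->ev);
        free(r);
        return NULL;
    }
    return r;
}

/* One pass of the event loop. This frame no longer contains a ~768 KB
 * array, so a conservative GC scanning the stack has little to walk. */
static int reactor_run_once(reactor *r, int timeout_ms)
{
    assert(r != NULL);
    return epoll_wait(r->epfd, r->ev, MaxEpollDescriptors, timeout_ms);
}

static void reactor_free(reactor *r)
{
    close(r->epfd);
    free(r->ev);
    free(r);
}
```

<p>Total memory use is unchanged; only the location of the buffer moves, which is the whole point.</p>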
<p>On top of all that, this patch helps with Ruby&#8217;s green threads, too. If an event returned by <code>epoll_wait</code> fires Ruby code that creates a Ruby thread, that thread gets an entire <b>copy</b> of the existing stack. Each time that thread is switched in and out, its stack has to be memcpy&#8217;d into and out of place. Reducing those memcpys by ~800,000 bytes is a <b>HUGE</b> performance win. Want to learn more about threading implementations? Check out my threading models post: <a href="http://timetobleed.com/threading-models-so-many-different-ways-to-get-stuff-done/">here</a>.
</p>
<p>
Fixing this turned out to be pretty simple. A six (<b>6!!</b>) line patch:
</p>
<ul>
<li>Speeds up GC by <b>2-3x</b> because of the <i>huge</i> decrease in stack frame size.</li>
<li>Fixes an open bug in EventMachine where using threads with Epoll causes lots of slowness. The reason is that each thread will <b>inherit an ~800,000 byte stack</b> that gets copied in and out <b>every context switch</b>.</li>
<li>This results in an increase from <b>500 requests/sec to 7000 requests/sec</b> when using Sinatra+Thin+Epoll+Threads. <b>That is pretty ill.</b></li>
</ul>
<h2>Conclusion</h2>
<p>All in all, a productive debugging session lasting about an hour. The result was a simple patch with two big performance improvements.</p>
<p>A couple things to take away from this experience:</p>
<ul>
<li>Spend time learning your debugging tools because it pays off, especially <code>nm</code>, <code>objdump</code>, and of course <code>GDB</code>.</li>
<li>Getting familiar with x86_64 assembly is crucial if you hope to debug complex software and optimize it correctly.</li>
</ul>
<p>Keep your eyes open for upcoming blog posts about x86_64 assembly! Don&#8217;t forget to <a href="http://feeds.feedburner.com/TimeToBleed" rel="alternate" type="application/rss+xml">subscribe via RSS</a> or <a href="http://twitter.com/joedamato">follow me on twitter</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/6-line-eventmachine-bugfix-2x-faster-gc-1300-requestssec/feed/</wfw:commentRss>
		<slash:comments>46</slash:comments>
		</item>
		<item>
		<title>Ruby threading bugfix: small fix goes a long way.</title>
		<link>http://timetobleed.com/ruby-threading-bugfix-small-fix-goes-a-long-way/</link>
		<comments>http://timetobleed.com/ruby-threading-bugfix-small-fix-goes-a-long-way/#comments</comments>
		<pubDate>Mon, 06 Oct 2008 03:17:59 +0000</pubDate>
		<dc:creator>Joe Damato</dc:creator>
				<category><![CDATA[bugfix]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[latency]]></category>
		<category><![CDATA[patch]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[systems]]></category>

		<guid isPermaLink="false">http://timetobleed.com/?p=39</guid>
		<description><![CDATA[Quick Overview of Ruby Threads Ruby 1.8.7 (MRI) implements threads completely in userland (also called &#8220;green threads&#8221; for short) even if built with pthreads. This means that underlying OS kernel has no knowledge about any threads created in ruby programs. In the view of the kernel, it only sees a process with one thread. This [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://timetobleed.com/images/threads.jpg" alt="threads" /></p>
<h2>Quick Overview of Ruby Threads</h2>
<p>Ruby 1.8.7 (MRI) implements threads completely in userland (also called &#8220;green threads&#8221; for short) <em>even if built with pthreads</em>. This means that the underlying OS kernel has no knowledge of any threads created in Ruby programs. The kernel only sees a process with one thread: the Ruby interpreter, which has its own scheduler and threading implementation built in. What this means for the Ruby developer is that any thread that blocks on I/O will cause the entire Ruby process (the interpreter and all Ruby green threads) to block.</p>
<p>Implementing threads in userland raises some interesting design questions, one of which is: how does the interpreter start and stop executing Ruby threads? One way to implement this is to create a timer that interrupts the interpreter at some interval. Ruby (depending on your platform and build options) creates either:</p>
<ol>
<li>An interval timer with setitimer, which delivers a SIGVTALRM signal to the process at the specified interval, or</li>
<li>A real native OS thread (via pthreads) which sleeps for the length of the interval</li>
</ol>
<p>In either case, a flag called <em>rb_thread_pending</em> is set (for those of you following along with the Ruby source, the flag is checked with the CHECK_INTS macro).<strong> It is important to note, however,</strong> that the timer created with setitimer is of type ITIMER_VIRTUAL, which means time is measured <em>only while the interpreter is executing</em> (and not during system calls executed on behalf of Ruby), whereas the sleeping OS thread is always measuring time, regardless of whether or not Ruby is executing.</p>
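<p>The setitimer variant can be sketched in a few lines of C. This is an illustration assuming Linux/POSIX, not Ruby&#8217;s eval.c; <code>ticks</code>, <code>arm_thread_timer</code> and <code>spin</code> are hypothetical names. A 10 ms interval matches the <code>{0, 10000}</code> values that show up in the strace output below. Because the timer is <code>ITIMER_VIRTUAL</code>, it only advances while the process burns user-mode CPU.</p>

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t ticks;   /* SIGVTALRM deliveries so far */

/* Where an interpreter would set its "time to switch threads" flag. */
static void on_vtalrm(int sig)
{
    (void)sig;
    ticks++;
}

/* Arm a repeating ITIMER_VIRTUAL timer. It only counts down while the
 * process runs in user mode, so time spent sleeping or blocked in the
 * kernel does not advance it. */
static int arm_thread_timer(long usec)
{
    struct sigaction sa;
    struct itimerval tv;

    assert(usec > 0 && usec < 1000000);   /* tv_usec must stay below 1 s */
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_vtalrm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGVTALRM, &sa, NULL) != 0)
        return -1;

    tv.it_interval.tv_sec = 0;
    tv.it_interval.tv_usec = usec;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_VIRTUAL, &tv, NULL);
}

/* Burn user-mode CPU so the virtual timer has something to measure. */
static unsigned long spin(unsigned long n)
{
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < n; i++)
        x += i;
    return x;
}
```

<p>Call <code>arm_thread_timer(10000)</code> and keep calling <code>spin()</code>: <code>ticks</code> climbs roughly once per 10 ms of CPU time. If you call <code>sleep()</code> instead, it does not move at all.</p>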
<h2>strace saves the day</h2>
<p>I am working on an event-based real-time distributed (insert more buzzwords) system built in Ruby. As a result, I am constantly trying to push Ruby to its limits, like many other people out there. I noticed that the latency of my event loop started to increase after I spawned threads to do short tasks (like sending an email, for example). The weird thing was that the latency didn&#8217;t go down even after the thread had finished executing! To debug this problem I attached strace to my running Ruby process and saw this:</p>
<pre>[joe@mawu]% strace -ttTp `pidof ruby` 2&gt;&amp;1 | egrep '(sigret|setitimer|timer|exit_group)'
19:41:21.282700 setitimer(ITIMER_VIRTUAL, {it_interval={0, 10000}, itvalue={0, 10000}}, NULL) = 0 &lt;0.000022&gt;
19:41:26.778386 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
19:41:26.780578 sigreturn()             = ? (mask now []) &lt;0.000022&gt;
19:41:26.814172 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
19:41:26.823761 sigreturn()             = ? (mask now []) &lt;0.000022&gt;
19:41:26.888419 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
19:41:26.890691 sigreturn()             = ? (mask now []) &lt;0.000041&gt;
19:41:26.904949 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
19:41:26.907327 sigreturn()             = ? (mask now []) &lt;0.000040&gt;
19:41:26.995445 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
19:41:26.997699 sigreturn()             = ? (mask now []) &lt;0.000041&gt;
19:41:27.144428 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
19:41:27.147146 sigreturn()             = ? (mask now []) &lt;0.000023&gt;
19:41:27.303472 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
19:41:27.306825 sigreturn()             = ? (mask now []) &lt;0.000021&gt;
...</pre>
<p>Weird! It looks like the timer keeps interrupting the Ruby process, sending it into the thread scheduler only to reschedule the only thread in the app and start executing again. This was <em>really</em> bad for our system: our main event loop was being interrupted so often that under high load it could not service connection requests fast enough, and our test scripts timed out. This is also a big problem if you pile Ruby gems on top of Ruby gems, because the more layers of gem code that run in each short time quantum, the less of your actual app code gets to execute! Not cool, but before getting excited I decided to try to reproduce this on a smaller scale, so:</p>
<pre>[joe@mawu]% strace -ttT ruby -e 't1 = Thread.new{ sleep(5) }; t1.join; 10000.times{"aaaaa" * 1000};' 2&gt;&amp;1 | egrep '(sigret|setitimer|timer|exit_group)'
19:41:21.282700 setitimer(ITIMER_VIRTUAL, {it_interval={0, 10000}, itvalue={0, 10000}}, NULL) = 0 &lt;0.000022&gt;
19:41:26.778386 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
19:41:26.780578 sigreturn()             = ? (mask now []) &lt;0.000022&gt;
19:41:26.814172 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
19:41:26.823761 sigreturn()             = ? (mask now []) &lt;0.000022&gt;
19:41:26.888419 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
19:41:26.890691 sigreturn()             = ? (mask now []) &lt;0.000041&gt;
19:41:26.904949 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
19:41:26.907327 sigreturn()             = ? (mask now []) &lt;0.000040&gt;
19:41:26.995445 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
19:41:26.997699 sigreturn()             = ? (mask now []) &lt;0.000041&gt;
19:41:27.144428 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
19:41:27.147146 sigreturn()             = ? (mask now []) &lt;0.000023&gt;
19:41:27.303472 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
19:41:27.306825 sigreturn()             = ? (mask now []) &lt;0.000021&gt;
19:41:27.314461 exit_group(0)           = ?</pre>
<p>Definitely starting to look like a bug from the strace output.</p>
<p>I decided to dive into the Ruby 1.8.7 MRI source code (eval.c for those following along in the source) and found that a timer is created whenever a thread is created, <em>but the timer is not destroyed when the thread terminates!</em> Definitely a bug. A quick fix to eval.c solved the problem and my latency dropped like a rock!</p>
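<p>The shape of the fix is easy to sketch. The following is hypothetical C bookkeeping, not the actual eval.c change: count live threads, arm the virtual timer when the first extra thread appears, and, which is the step that was missing, disarm it again once only the main thread is left. Setting both <code>it_value</code> and <code>it_interval</code> to zero with <code>setitimer</code> stops the timer.</p>

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical bookkeeping, not the actual eval.c code. */
static int live_threads = 1;   /* just the main thread */

static void on_tick(int sig)
{
    (void)sig;   /* an interpreter would set its "switch threads" flag */
}

static void set_vtimer_usec(long usec)
{
    struct itimerval tv;
    assert(usec >= 0 && usec < 1000000);
    memset(&tv, 0, sizeof tv);
    tv.it_interval.tv_usec = usec;
    tv.it_value.tv_usec = usec;              /* 0 disarms the timer */
    setitimer(ITIMER_VIRTUAL, &tv, NULL);
}

static void on_thread_created(void)
{
    if (++live_threads == 2) {               /* first extra thread */
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_tick;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGVTALRM, &sa, NULL);
        set_vtimer_usec(10000);              /* start 10 ms time slices */
    }
}

static void on_thread_exited(void)
{
    if (--live_threads == 1)
        set_vtimer_usec(0);                  /* the missing step: stop it */
}

static int vtimer_armed(void)
{
    struct itimerval cur;
    getitimer(ITIMER_VIRTUAL, &cur);
    return cur.it_value.tv_sec != 0 || cur.it_value.tv_usec != 0;
}
```

<p>Without the body of <code>on_thread_exited</code>, the timer keeps firing every 10 ms of CPU time for the rest of the process&#8217;s life, which is exactly the stream of <code>SIGVTALRM</code>s in the strace output above.</p>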
<h2><strong>Patch for ruby 1.8.7</strong></h2>
<p>I posted a patch to ruby-core and some code was added to fix pthread-enabled Ruby.<strong> <span style="text-decoration: underline;">NOTE:</span></strong> You should <strong>ALWAYS</strong> test new patches before applying them to your live site, this is no exception!<br />
<a href="http://timetobleed.com/files/ruby-1.8.7p72-threadfix.patch"> Ruby MRI 1.8.7p72 patch</a></p>
<h2>Future directions</h2>
<p>I&#8217;ve been asked a bunch of different questions about threads and threading models, so my next couple blog posts will be about different threading models. I&#8217;m going to dive into the details, go through the pros and cons, and try to clear things up a bit, so stay tuned and thanks for reading!</p>
]]></content:encoded>
			<wfw:commentRss>http://timetobleed.com/ruby-threading-bugfix-small-fix-goes-a-long-way/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
	</channel>
</rss>
