time to bleed by Joe Damato

technical ramblings from a wanna-be unix dinosaur

Fix a bug in Ruby’s configure.in and get a ~30% performance boost.

View Comments


Special thanks…

Going out to Jake Douglas for pushing the initial investigation and getting the ball rolling.

The whole --enable-pthread thing

Ask any Ruby hacker how to easily increase performance in a threaded Ruby application and they’ll probably tell you:

Yo dude… Everyone knows you need to configure Ruby with --disable-pthread.

And it’s true; configure Ruby with --disable-pthread and you get a ~30% performance boost. But… why?

For this, we’ll have to turn to our handy tool strace. We’ll also need a simple Ruby program to this one. How about something like this:

def make_thread
  Thread.new {
    a = []
    10_000_000.times {
      a << "a"
      a.pop
    }
  }
end

t = make_thread
t1 = make_thread 

t.join
t1.join

Now, let's run strace on a version of Ruby configure'd with --enable-pthread and point it at our test script. The output from strace looks like this:

22:46:16.706136 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000004>
22:46:16.706177 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000004>
22:46:16.706218 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000004>
22:46:16.706259 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000005>
22:46:16.706301 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000004>
22:46:16.706342 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000004>
22:46:16.706383 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000004>
22:46:16.706425 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000004>
22:46:16.706466 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000004>

Pages and pages and pages of sigprocmask system calls (Actually, running with strace -c, I get about 20,054,180 calls to sigprocmask, WOW). Running the same test script against a Ruby built with --disable-pthread and the output does not have pages and pages of sigprocmask calls (only 3 times, a HUGE reduction).

OK, so let's just set a breakpoint in GDB... right?

OK, so we should just be able to set a breakpoint on sigprocmask and figure out who is calling it.

Well, not exactly. You can try it, but the breakpoint won't trigger (we'll see why a little bit later).

Hrm, that kinda sucks and is confusing. This will make it harder to track down who is calling sigprocmask in the threaded case.

Well, we know that when you run configure the script creates a config.h with a bunch of defines that Ruby uses to decide which functions to use for what. So let's compare ./configure --enable-pthread with ./configure --disable-pthread:

[joe@mawu:/home/joe/ruby]% diff config.h config.h.pthread
> #define _REENTRANT 1
> #define _THREAD_SAFE 1
> #define HAVE_LIBPTHREAD 1
> #define HAVE_NANOSLEEP 1
> #define HAVE_GETCONTEXT 1
> #define HAVE_SETCONTEXT 1


OK, now if we grep the Ruby source code, we see that whenever HAVE_[SG]ETCONTEXT are set, Ruby uses the system calls setcontext() and getcontext() to save and restore state for context switching and for exception handling (via the EXEC_TAG).

What about when HAVE_[SG]ETCONTEXT are not define'd? Well in that case, Ruby uses _setjmp/_longjmp.

Bingo!

That's what's going on! From the _setjmp/_longjmp man page:

... The _longjmp() and _setjmp() functions shall be equivalent to longjmp() and setjmp(), respectively, with the additional restriction that _longjmp() and _setjmp() shall not manipulate the signal mask...

And from the [sg]etcontext man page:

... uc_sigmask is the set of signals blocked in this context (see sigprocmask(2)) ...


The issue is that getcontext calls sigprocmask on every invocation but _setjmp does not.

BUT WAIT if that's true why didn't GDB hit a sigprocmask breakpoint before?

x86_64 assembly FTW, again

Let's fire up gdb and figure out this breakpoint-not-breaking thing. First, let's start by disassembling getcontext (snipped for brevity):

(gdb) p getcontext
$1 = {} 0x7ffff7825100
(gdb) disas getcontext
...
0x00007ffff782517f : mov $0xe,%rax
0x00007ffff7825186 : syscall
...

Yeah, that's pretty weird. I'll explain why in a minute, but let's look at the disassembly of sigprocmask first:

(gdb) p sigprocmask
$2 = {} 0x7ffff7817340 <__sigprocmask>
(gdb) disas sigprocmask
...
0x00007ffff7817383 <__sigprocmask+67>: mov $0xe,%rax
0x00007ffff7817388 <__sigprocmask+72>: syscall
...

Yeah, this is a bit confusing, but here's the deal.

Recent Linux kernels implement a shiny new method for calling system calls called sysenter/sysexit. This new way was created because the old way (int $0x80) turned out to be pretty slow. So Intel created some new instructions to execute system calls without such huge overhead.

All you need to know right now (I'll try to blog more about this in the future) is that the %rax register holds the system call number. The syscall instruction transfers control to the kernel and the kernel figures out which syscall you wanted by checking the value in %rax. Let's just make sure that sigprocmask is actually 0xe:

[joe@pluto:/usr/include]% grep -Hrn "sigprocmask" asm-x86_64/unistd.h
asm-x86_64/unistd.h:44:#define __NR_rt_sigprocmask                     14


Bingo. It's calling sigprocmask (albeit a bit obscurely).

OK, so getcontext isn't calling sigprocmask directly, instead it replicates a bunch of code that sigprocmask has in its function body. That's why we didn't hit the sigprocmask breakpoint; GDB was going to break if you landed on the address 0x7ffff7817340 but you didn't.

Instead, getcontext reimplements the wrapper code for sigprocmask itself and GDB is none the wiser.

Mystery solved.

The patch

Get it HERE

The patch works by adding a new configure flag called --disable-ucontext to allow you to specifically disable [sg]etcontext from being called, you use this in conjunction with --enable-pthread, like this:

./configure --disable-ucontext --enable-pthread


After you build Ruby configured like that, its performance is on par with (and sometimes slightly faster) than Ruby built with --disable-pthread for about a 30% performance boost when compared to --enable-pthread.

I added the switch because I wanted to preserve the original Ruby behavior, if you just pass --enable-pthread without --disable-ucontext Ruby will do the old thing and generate piles of sigprocmasks.

Conclusion

  1. Things aren't always what they seem - GDB may lie to you. Be careful.
  2. Use the source, Luke. Libraries can do unexpected things, debug builds of libc can help!
  3. I know I keep saying this, assembly is useful. Start learning it today!

If you enjoyed this blog post, consider subscribing (via RSS) or following (via twitter).

You'll want to stay tuned; tmm1 and I have been on a roll the past week. Lots of cool stuff coming out!

Written by Joe Damato

May 5th, 2009 at 3:20 am

  • lazzarello
    Heya. I applied your patch to ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux] in Debian stable and rebuilt the package. It does not remove the dominance of the rt_sigprocmask syscall in your test script. I ran the script for 24 seconds and it called that syscall 416614 times.
  • cryptoromantic
    so what is the underlying bug here - that get/setcontext is completely unnecessary because green threads are being used even with pthreads enabled, or that some work get/setcontext is doing is necessary, but it also always calls sigprocmasks (which is unnecessary), and this calls for a better decomposition of the get/setcontext call... ?
  • Great work! I've been investigating these issues around --enable-pthread recently as I *just* became aware of their significance.. and I'm surprised, frankly.. there seem to be a lot of people that knew this, yet a lot of people who still don't and who still run into tutorials like the popular Hivelogic one which recommends you use --enable-pthread.
  • staffing1
    Brilliant work! I hope the guys upstream patch it in.
    www.staffingpower.com
  • According to Matz, this isn't a problem for Ruby 1.8.7

    http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ru...

    yet I'm still seeing performance improvements on the ruby benchmark suite when applying your patch...
  • redditrebel
    XBMC and Boxee(based on XBMC) suffer from this on newer 64 bit kernels, calling rt_sigprocmask a bazillion times per second and basically eating a whole CPU.

    I stopped trying to debug it as the developers were truculent about even supporting newer 64 bit installs, ever.

    I wouldn't be surprised if this is the cause.
  • febeling
    Someone should eventually figure out what this usage of pthread is there for in the first place. My best theory is that it keeps timeouts working even when you have native function calls. Ruby green / simulated threads would otherwise not be able to step in. But I could not really find a confirmation for this. Anyone having more insight on this?
  • Hi.

    Check out my previous blog post which explains this: http://timetobleed.com/ruby-threading-bugfix-sm...

    Thanks for reading.
  • Man, you are my idol. I want more stuff like this! :)
  • Thanks dude! If you liked this article be sure to check out my older posts too.

    Thanks for reading - stay tuned lots more cool stuff coming.
  • nobu nakada
    You can run `ac_cv_header_ucontext_h=no ./configure' to override autodetection.
  • Good stuff!

    Just one nitpick, you say that "getcontext reimplements the wrapper code", but I doubt that's the case. My guess is that GCC is just inlining the sigprocmask function. If you compiled Ruby with all compiler optimizations disabled, I bet your breakpoint would fire.
  • Nope, it is not inlined, it is actually re-implemented.

    Check it out: http://www.google.com/codesearch/p?hl=en#HT3Jwv...

    Thanks for reading!
  • Wow. I stand corrected. Nice work!
  • Marco Maggi
    Which OS are you running? On my Slackware Linux the setjmp(3) manual page says that POSIX does not define whether "setjmp()" saves the sigmask or not, and it suggests to use "sigsetjmp()" to save it. My interpretation of the header file "setjmp.h" from the GNU C Library is that "setjmp()" is set to "_setjmp()" on all the platforms except BSD; so on Linux setjmp == _setjmp, no?
  • completely off-topic, but what is the animal on the photo? It's ugly!
  • Adam
    I believe the 'animal' is the 'Brain bug' from the movie Starship Troopers. ;)
  • You are correct.

    Thanks for reading, guys!
  • lygaret
    I'd just like to say that you're my hero. Bravo man.
  • Thanks for reading!
  • I spent an illuminating and exasperating (due to the hardware, not the language) year updating some 68000 assembly code a few years ago. I'd like to learn x86 assembly too - do you have any recommendations for learning x86 assembly, particularly in the context of Linux?
  • I've been getting this question more frequently. I'll add an x86 assembly primer and reference list to my queue of blog posts.

    I learned assembly by objdumping stuff and trying to decipher what it was doing. You can try to start small with simple C programs and work your way up to more complicated stuff.

    The Intel manuals are pretty helpful, too.

    Thanks for reading!
  • Hongli Lai
    Very nice! But what's the downside of using _setjmp instead of setcontext? Can it break anything?

    By the way, are you attending RailsConf? If so would you like to meet? I'm giving the "Scaling Rails" talk this afternoon so you can find me there.
  • Hi -

    Switching to _setjmp shouldn't break anything because Ruby uses that when you build with --disable-pthread. So, no I don't think it'd break anything, but thats why I left the original behavior intact =]

    Ruby only uses getcontext in a few places, in EXEC_TAGS and when switching threads. Neither of those cases need to worry about the signal mask.

    I'm not at RailsConf, but I'm sure Aman is (tmm1). Maybe you'll run into him there? We are working on some other cool stuff that I think you'll be interested in for REE.

    Thanks for reading!
  • Hongli Lai
    Cool, I met up with Aman but unfortunately I did not have a lot of time to speak with him.

    So if _setjmp doesn't break anything, why not always turn it on?
  • Is there a way to get the configure options used to compile the binary from a ruby binary ?
  • rogerdpack
    appears to be
    require 'rbconfig'
    Config::CONFIG["configure_args"]


    ref: http://groups.google.com/group/ruby-benchmark-s...
  • jamesaguilar
    I've been enjoying your stuff. I don't use Ruby, but I do wonder why Ruby was using the original behavior. Are there any gotchas here or places where the change in behavior from your patch could bite someone?
  • Does Enterprise Ruby have this configured properly by default?

    (Great post, very interesting. Thanks for the info.)
  • As far as I know (could be wrong) REE is built with --disable-pthread which has roughly equivalent performance to --enable-pthread --disable-ucontext.

    Thanks for reading!
  • Brilliant work! I hope the guys upstream patch it in.
  • Once again, your debugging cycle shows clear intimate knowledge and a good method, well done! :)
  • Good stuff, keep it coming!

    With the risk of being "that guy", are you submitting this upstream?

    Does it ever make sense to use the old behaviour of --enable-pthread (eg. why the extra flag)?
  • Yup -- submitted upstream. Haven't heard anything back, yet. I submit most of my patches upstream, have yet to see something make it into 1.8, though.

    Thanks for reading!
  • Hongli Lai
    Upstream can be a bit slow timetimes. But I'd be happy to consider including it in Ruby Enterprise Edition so that people can still use your improvements.
  • Thanks!

    I think REE is built with --disable-pthread right? In that case, the performance is the same as building with --enable-pthread --disable-ucontext.
  • Hongli Lai
    Yeah. But Tcl/Tk wants --enable-pthread. If we can Tcl/Tk compatible while not paying the performance price then that would be even better.
  • Ah, right. I totally forgot about Tcl/Tk. Cool!
blog comments powered by Disqus