time to bleed by Joe Damato

technical ramblings from a wanna-be unix dinosaur


a/b test mallocs against your memory footprint



The other day at Kickball Labs we were discussing whether linking Ruby against tcmalloc (or ptmalloc3, nedmalloc, or any other malloc) would have any noticeable effect on application latency. After taking a side in the argument, I started wondering how we could test this scenario.

We had a couple different ideas about testing:

  • Look at other people’s benchmarks
    BUT do the memory workloads tested in the benchmarks actually match our own workload at all?
  • Run different allocators on different Ruby backends
    BUT different backends will get different users who will use the system differently and cause different allocation patterns
  • Try to recreate our application’s memory footprint and test that against different mallocs
    BUT how?

I decided to explore the last option and came up with an interesting solution. Let’s dive into how to do this.

Get the code:

http://github.com/ice799/malloc_wrap/tree/master

Step 1: We need to get a memory footprint of our process

So we have some random binary (in this case it happens to be a Ruby interpreter, but it could be anything) and we’d like to track when it calls malloc/realloc/calloc and free (from now on I’ll refer to all of these as malloc-family functions for brevity). There are two ways to do this: the right way and the wrong/hacky/unsafe way.

  • The “right” way to do this, with libc malloc hooks:

    Edit your application code to use the malloc debugging hooks provided by libc. When a malloc-family function is called, your hook executes and outputs to a file which function was called and what arguments were passed to it.

  • The “wrong/hacky/unsafe” way to do this, with LD_PRELOAD:

    Create a shim library and point LD_PRELOAD at it. The shim exports the malloc-family symbols, and when your application calls one of those functions, the shim code gets executed. The shim logs which function was called and with what arguments. The shim then calls the libc version of the function (so that memory is actually allocated/freed) and returns control to the application.

I chose to do it the second way because I like living on the edge. The second way is unsafe because you can’t call any function which uses a malloc-family function before your hooks are set up. If you do, you can end up in an infinite loop and crash the application.

You can check out my implementation for the shim library here: malloc_wrap.c
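If you’d rather see the idea than read the whole file, here’s a rough sketch of how such a shim can work. This is a simplified illustration, not the actual malloc_wrap.c: it only wraps malloc and free, and the log format shown is made up.

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Pointers to the real libc implementations, resolved with dlsym(). */
static void *(*real_malloc)(size_t) = NULL;
static void (*real_free)(void *) = NULL;
static FILE *out = NULL;
static int in_hook = 0; /* crude reentrancy guard; a real shim needs more care */

static void setup_hooks(void)
{
    /* RTLD_NEXT skips this shim and finds the next definition (libc's).
     * Note: dlsym() itself may allocate -- this is exactly the kind of
     * chicken-and-egg problem that makes the LD_PRELOAD approach unsafe. */
    real_malloc = dlsym(RTLD_NEXT, "malloc");
    real_free = dlsym(RTLD_NEXT, "free");

    char path[64];
    snprintf(path, sizeof(path), "/tmp/malloc-footprint.%d", (int)getpid());
    out = fopen(path, "w");
}

void *malloc(size_t size)
{
    if (!real_malloc)
        setup_hooks();
    void *ptr = real_malloc(size);
    if (out && !in_hook) {
        in_hook = 1;
        fprintf(out, "m %p %zu\n", ptr, size); /* log: function, result, args */
        in_hook = 0;
    }
    return ptr;
}

void free(void *ptr)
{
    if (!real_free)
        setup_hooks();
    if (out && !in_hook) {
        in_hook = 1;
        fprintf(out, "f %p\n", ptr);
        in_hook = 0;
    }
    real_free(ptr);
}

The real shim also wraps calloc and realloc in the same style; the interesting part is just dlsym(RTLD_NEXT, …) plus being very careful about what you call before the hooks are in place.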

Why does your shim output such weirdly formatted data?

The answer is sort of complicated, but let’s keep it simple: I originally had a different idea about how I was going to use the output. When that first try failed, I tried something else and translated the data into the format I needed, instead of re-writing the shim. What can I say, I’m a lazy programmer.

OK, so once you’ve built the shim (gcc -O2 -Wall -ldl -fPIC -o malloc_wrap.so -shared malloc_wrap.c), you can launch your binary like this:

% LD_PRELOAD=/path/to/shim/malloc_wrap.so /path/to/your/binary -your -args

You should now see output in /tmp/malloc-footprint.pid

Step 2: Translate the data into a more usable format

Yeah, I should have gone back and re-written the shim, but nothing happens exactly as planned. So, I wrote a quick Ruby script to convert the output into a more usable format. The script sorts through the output and renames memory addresses to unique integer IDs starting at 1 (0 is hardcoded to NULL).

The format is pretty simple. The first line of the file has the number of calls to malloc-family functions, followed by a blank line, and then the memory footprint. Each line of the memory footprint has one character representing the function called, followed by a few arguments. For free(), there is only one argument: the ID of the memory block to free. malloc/calloc/realloc take different arguments, but the first argument following the one character is always the ID of the return value; the remaining arguments are the arguments usually passed to malloc/calloc/realloc, in the same order.
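To make that concrete, here is what a tiny converted footprint might look like. This is purely illustrative; the actual one-character codes and spacing are defined by build_trace_file.rb.

5

m 1 4096
c 2 16 64
r 3 1 8192
f 3
m 4 32

Read that as: five recorded calls; ID 1 = malloc(4096), ID 2 = calloc(16, 64), ID 3 = realloc(block 1, 8192), free block 3, ID 4 = malloc(32).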

Have a look at my ruby script here: build_trace_file.rb

It might take a while to convert your data to this format, so I suggest running this in a screen session, especially if your memory footprint data is large. Just as a warning: we collected 15 *gigabytes* of data over a 10-hour period, and this script took *10 hours* to convert it. We ended up with a 7.8 gigabyte file.

% ruby /path/to/script/build_trace_file.rb /path/to/raw/malloc-footprint.PID /path/to/converted/my-memory-footprint

Step 3: Replay the allocation data with different allocators and measure time, memory usage.

OK, so we now have a file which represents the memory footprint of our application. It’s time to build the replayer, link against your malloc implementation of choice, fire it up and start measuring time spent in allocator functions and memory usage.

Have a look at the replayer here: alloc_tester.c
Build the replayer: gcc -ggdb -Wall -ldl -fPIC -o tester alloc_tester.c
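The heart of a replayer is just a loop over the trace. Here is a stripped-down sketch, not the actual alloc_tester.c; it assumes the illustrative one-character codes from the example above and keeps live pointers in an array indexed by ID.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <converted-footprint>\n", argv[0]);
        return 1;
    }

    FILE *fp = fopen(argv[1], "r");
    if (!fp) {
        perror("fopen");
        return 1;
    }

    unsigned long ncalls = 0;
    fscanf(fp, "%lu", &ncalls);

    /* IDs start at 1 and each call produces at most one new ID,
     * so ncalls + 1 pointer slots is always enough. Slot 0 stays NULL. */
    void **ptrs = calloc(ncalls + 1, sizeof(void *));

    char op;
    unsigned long id, old_id, size, nmemb;

    while (fscanf(fp, " %c", &op) == 1) {
        switch (op) {
        case 'm': /* malloc: <id> <size> */
            fscanf(fp, "%lu %lu", &id, &size);
            ptrs[id] = malloc(size);
            break;
        case 'c': /* calloc: <id> <nmemb> <size> */
            fscanf(fp, "%lu %lu %lu", &id, &nmemb, &size);
            ptrs[id] = calloc(nmemb, size);
            break;
        case 'r': /* realloc: <id> <old id> <size> */
            fscanf(fp, "%lu %lu %lu", &id, &old_id, &size);
            ptrs[id] = realloc(ptrs[old_id], size);
            if (old_id != id)
                ptrs[old_id] = NULL; /* the old block no longer exists */
            break;
        case 'f': /* free: <id> */
            fscanf(fp, "%lu", &id);
            free(ptrs[id]);
            ptrs[id] = NULL;
            break;
        }
    }

    fclose(fp);
    return 0;
}

Timing can either be measured inside the replayer (for example with getrusage() around the allocator calls) or left to an external tool, which is where ltrace comes in.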

Use ltrace

ltrace is similar to strace, but for library calls. You can use ltrace -c to sum the amount of time spent in different library calls and output a cool table at the end; it will look something like this:

% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 86.70   37.305797          62    600003 fscanf
 10.64    4.578968          33    138532 malloc
  2.36    1.014294          18     55263 free
  0.25    0.109550          18      5948 realloc
  0.03    0.011407          45       253 printf
  0.02    0.010665          42       252 puts
  0.00    0.000167          20         8 calloc
  0.00    0.000048          48         1 fopen
------ ----------- ----------- --------- --------------------
100.00   43.030896                800260 total
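For reference, the invocation that produces a table like this is nothing fancy (assuming, as in the sketch above, that the replayer takes the converted footprint file as its only argument):

% ltrace -c ./tester /path/to/converted/my-memory-footprint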

Conclusion

Using a different malloc implementation can provide speed and memory improvements depending on your allocation patterns. Hopefully the code provided will help you test different allocators to determine whether or not swapping out the default libc allocator is the right choice for you. Our results are still pending; we have a lot of allocator data (15 gigabytes!) and it takes several hours to replay the data with just one malloc implementation. Once we’ve gathered some data about the different implementations and their effects, I’ll post the results and some analysis. As always, stay tuned and thanks for reading!

Written by Joe Damato

March 16th, 2009 at 8:39 pm

Plugging Ruby Memory Leaks: heap/stack dump patches to help take out the trash.


EDIT: For all those tuning into my blog for the first time, be sure to check out the Ruby threading bugfix from an earlier post: Ruby Threading Bugfix: Small fix goes a long way

In this blog post I’m releasing some patches to MRI Ruby 1.8.7-p72 that add heap dumping, an object reference finder, stack dumping, object allocation/deallocation tracking, and some more goodies. These are changes to Ruby I’ve been working on at the startup I’m with (Kickball Labs!), but I think they are generally useful to other people trying to track down memory problems in Ruby. Let’s get started…

Memory leaks? In Ruby?!

Many people working on large Ruby applications have complained about memory consumption and the lack of tools available to dump the object heap and search for which objects hold references to other objects. In a future blog post, I hope to talk about different methods of garbage collection and dissect some popular languages and their GC algorithms. Very briefly though, let’s take a quick look at how Ruby’s GC works and how leaks can occur.

The garbage collector in MRI Ruby marks objects as in use during the mark phase if a reference to the object is found. The traversal starts at a set of top-level root objects, marks them as in use, and recurses into the objects each one references; in this way all reachable objects are marked as in use. The sweep phase then hits every object on the heap and marks as free any object that is not in use.
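In rough pseudo-C, the idea looks something like this. This is a generic mark-and-sweep illustration; the types and helpers are made up and do not mirror Ruby’s actual gc.c.

/* A generic mark-and-sweep sketch; obj_t, obj_free() and the explicit
 * refs array are made up for illustration. */
typedef struct obj {
    int          in_use;   /* the mark bit */
    struct obj **refs;     /* objects this object references */
    int          nrefs;
} obj_t;

static void obj_free(obj_t *o) { (void)o; /* return the slot to the free list */ }

static void mark(obj_t *o)
{
    if (o == NULL || o->in_use)
        return;                      /* NULL or already visited */
    o->in_use = 1;                   /* mark as in use */
    for (int i = 0; i < o->nrefs; i++)
        mark(o->refs[i]);            /* recurse into referenced objects */
}

static void gc(obj_t **roots, int nroots, obj_t **heap, int nheap)
{
    for (int i = 0; i < nheap; i++)  /* clear marks from the previous cycle */
        heap[i]->in_use = 0;
    for (int i = 0; i < nroots; i++) /* mark phase: everything reachable */
        mark(roots[i]);
    for (int i = 0; i < nheap; i++)  /* sweep phase: reclaim what's left */
        if (!heap[i]->in_use)
            obj_free(heap[i]);
}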

We all try to write code which is decoupled, clean, and well documented. Let’s face it though: we are human and we make mistakes. Imagine a pile of objects all holding references to each other. As long as this pile of objects has no reachable reference from a path originating at a root object, these objects will get garbage collected. Unfortunately, from time to time we forget to clear out all references to old objects we are no longer using. In our pile-of-objects example, all it takes is one reachable reference from a root object to keep the entire pile in memory.

Here at Kickball Labs we thought we were encountering such a situation, but we had no easy way to verify and track it down. I hacked up a heap dumper, stack dumper, reference finder, and some other stuff to help in our quest to take out the trash.

Memtrack APIs

To start, I did a bunch of Google searching to find out if anyone else had gone down this path before. I stumbled across a company called Software Verify, which released a really cool set of APIs, the Ruby Memory Tracking API, for tracking object allocation and deallocation, getting the root heap objects, and getting references to objects. The API was really clean and allowed me to hook in my own callbacks for each of those events. One small downside though: the Memtrack APIs were for Ruby 1.8.6, and when I created a diff they did not apply cleanly to 1.8.7. With a tiny bit of work I was able to get the APIs to apply to 1.8.7-p72. The patch can be found below and applies to MRI Ruby 1.8.7-p72.

Heap/Stack dumper overview

I hooked some callbacks into the Memtrack API and added some additional code for heap_find and stack_dump. My algorithm for heap_dump and heap_find is pretty simple: I traverse each of the objects on the heap and either dump each one (for heap_dump) or check whether it holds a reference to the object passed into heap_find.

I also copied some code from the mark phase of Ruby’s GC to check the contents of the registers and stack frames. I did this because Ruby checks this information before marking an object as in use. You may be wondering how this can possibly work; couldn’t any FIXNUM that gets passed around in the Ruby internals look like a heap pointer? Well… not quite. Ruby FIXNUMs are 31 bits, so they can be bit-shifted left by 1 and have their LSB set to 1 without losing any data. Since heap pointers are word-aligned (their low bit is always 0), a tagged FIXNUM can never look like a pointer to the heap.
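The tagging trick looks roughly like this (paraphrasing the FIXNUM macros from MRI’s ruby.h; VALUE is MRI’s word-sized object handle, simplified here):

typedef unsigned long VALUE;

#define FIXNUM_FLAG 0x01

/* A Fixnum i is stored shifted left one bit with the low bit set... */
#define INT2FIX(i)  ((VALUE)(((long)(i)) << 1 | FIXNUM_FLAG))

/* ...so the low bit cleanly separates immediates from heap pointers:
 * heap slots are word-aligned (low bit 0), tagged Fixnums are always odd. */
#define FIXNUM_P(f) (((long)(f)) & FIXNUM_FLAG)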

HOWTO use the stack/heap dumper

Four functions were added to GC: heap_dump, heap_find, heap_counts, and stack_dump. After applying the patches below and rebuilding Ruby, you can use these functions. There are a bunch of environment variables you can set to control how much output you get. WARNING: There can be a lot of output.

Environment vars and what they do:

RUBY_HEAP_LOG – setting this environment var will cause heap dumping output to be written to the specified file. If this is not set, output will be written to stderr.

RUBY_HEAP_ALLOC – setting this environment var to anything greater than or equal to 1 will cause output whenever a Ruby object is allocated or deallocated. This creates a LOT of output.

RUBY_HEAP_SETUP – setting this environment var to anything greater than or equal to 1 will cause output whenever a Ruby object has its klass field set.

Ok – now let’s take a quick look at the four added functions and what they do when you use them:

GC.heap_dump – this function writes the contents of the Ruby heap to stderr (unless RUBY_HEAP_LOG is set). It outputs each object that is not marked as free, along with information about that object and the objects it references. For strings, for example, the value and length of the string are output; for classes, the name of the class (unless it is an anonymous class) and its superclass.

GC.heap_find – this function takes one argument and traverses the heap looking for objects which hold a reference to the argument. Once found, it outputs information about the object holding the reference and also information about the object you passed in.

GC.heap_counts – this function is poorly implemented and can only be called after you have called GC.heap_dump. This function outputs the number of each type of Ruby object that has been created (object, node, class, module, string, fixnum, float, etc).

GC.stack_dump – this function checks the registers and stack frames for things which look like pointers to the heap. If the pointer looks like it is pointed at the heap, the address of the pointer is output as well as information about the object it references.
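Putting it together, a quick session might look something like this (a hypothetical example; leaky_pile just stands in for whatever object you suspect is being kept alive):

% RUBY_HEAP_LOG=/tmp/heap.log ruby -e 'leaky_pile = ["some", "objects"]; GC.heap_dump; GC.heap_find(leaky_pile); GC.heap_counts'

With RUBY_HEAP_LOG set, the dump, reference-finder, and counts output all land in /tmp/heap.log instead of stderr.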

The good stuff: Patches for Ruby 1.8.7p72

Apply the patches below to the MRI Ruby 1.8.7p72 code base and rebuild Ruby. The code below is a bit ugly as it was written in a rush. Use with caution and be sure to email me with any bugfixes or cool additions you come up with!

Patch to apply Memtrack API to MRI Ruby 1.8.7-p72:
memtrack-api-187p72.patch

Patch to apply heap dumper, stack dumper, ref finder, alloc/dealloc tracker, and everything else which is awesome:

heapdumper.patch

Conclusion

Ruby is lacking tools for serious memory analysis. Using the Memtrack API and some other code, tracking down memory issues can be less painful. The tools included in the patches above are only a first cut at developing real, useful memory analysis tools for Ruby. I hope to see more work in creating tools for Ruby.

Thanks for reading! If you have problems using the patches or you create a cool memory analysis tool send me some mail.

Written by Joe Damato

October 15th, 2008 at 11:30 am