time to bleed by Joe Damato

technical ramblings from a wanna-be unix dinosaur

Archive for October, 2009

Defeating the Matasano C++ Challenge with ASLR enabled


Important note

I am NOT a security researcher (I kinda want to be though). As such, there are probably way better ways to do everything in this article. This article is just illustrating my thought process when cracking this challenge.

The Challenge

The Matasano Security blog recently posted an article titled A C++ Challenge [1], which included a particularly ugly piece of C++ code that has a security vulnerability. The challenge is for the reader to find the vulnerability, use it to execute arbitrary code, and submit the data to Matasano.

Sounds easy enough, let’s do this! cue hacking music

Making it harder

Recent Linux kernels have a feature called Address Space Layout Randomization (ASLR), which can be configured via /proc/sys/kernel/randomize_va_space. ASLR is a security feature that randomizes the start address of various parts of a process image. This makes exploiting a security bug more difficult because the exploit cannot rely on hard-coded addresses.

The options you can set are:

  • 0 – ASLR off
  • 1 – Randomize the addresses of the stack, mmap area, and VDSO page. This is the default.
  • 2 – Everything in option 1, but also randomize the brk area so the heap is randomized.

Just for fun I decided to set it to 2 to make exploiting the challenge more difficult.
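To check which of these modes a machine is currently using, a small helper can read the procfs file directly. This is only a sketch: the function names read_aslr_mode and describe_aslr_mode are made up for illustration, and the descriptions simply restate the list above.

```cpp
#include <fstream>
#include <string>

// Returns the integer stored in the given procfs file, or -1 if it cannot be
// read (for example on a non-Linux system). Hypothetical helper name.
int read_aslr_mode(const std::string &path = "/proc/sys/kernel/randomize_va_space") {
    std::ifstream in(path);
    int mode = -1;
    if (!(in >> mode)) {
        return -1;
    }
    return mode;
}

// Maps a randomize_va_space value to the description used in the list above.
std::string describe_aslr_mode(int mode) {
    switch (mode) {
        case 0: return "ASLR off";
        case 1: return "stack, mmap area, and VDSO page randomized";
        case 2: return "stack, mmap area, VDSO page, and brk (heap) randomized";
        default: return "unknown";
    }
}
```

Calling describe_aslr_mode(read_aslr_mode()) gives a one-line summary of the current setting; changing the setting itself still requires writing to the procfs file as root.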

Got the code, but now what?

I decided to start attacking this problem by looking for a few common errors, in this order:

  1. strcpy()/strncpy() bugs: no calls
  2. memcpy() bugs: a few calls
  3. Off-by-one bugs: none obvious

It turned out from a quick look that all calls to memcpy() included sane, hard-coded values. So, it had to be something more complex.

Digging deeper – finding input streams the user can control

Next, I decided to actually read the code and see what it was doing at a high level and what inputs could be controlled. Turns out that the program reads data from a file and uses the data from the file to determine how many objects to allocate.

Obviously, this portion of the code caught my interest so let’s take a quick look:

/* ... */

fd.read(file_in_mem, MAX_FILE_SIZE-1);

/* ... */

struct _stream_hdr *s = (struct _stream_hdr *) file_in_mem;

if(s->num_of_streams >= INT_MAX / (int)sizeof(int)) {
    safe_count = MAX_STREAMS;
} else {
    safe_count = s->num_of_streams;
}

Obj *o = new Obj[safe_count];

OK, so clearly that if statement is suspect. At the very least it doesn’t check for negative values, so you could end up with safe_count = -1 which might do something interesting when passed to the new operator. Moreover, it appears this if statement will allow values as large as 536870910 ([INT_MAX / sizeof(int)] – 1).
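To see exactly which values the check lets through, it can be pulled out into a standalone function. This is a sketch, not the challenge code: compute_safe_count is a made-up name, and the value of MAX_STREAMS below is a placeholder because the real one is not shown here. It assumes a 32-bit int, as on the i386 target.

```cpp
#include <climits>

// Hypothetical cap; the real value of MAX_STREAMS is not shown in the article.
const int MAX_STREAMS = 64;

// Mirrors the bounds check from the challenge code.
int compute_safe_count(int num_of_streams) {
    if (num_of_streams >= INT_MAX / (int)sizeof(int)) {
        return MAX_STREAMS;
    }
    // Negative values and anything up to 536870910 fall through untouched.
    return num_of_streams;
}
```

With a 32-bit int, compute_safe_count(-1) returns -1 and compute_safe_count(536870910) returns 536870910; only 536870911 and above get clamped.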

Maybe the exploit has something to do with values this if statement is allowing through?

A closer look at the integer overflow in new

Let’s use GDB to take a closer look at what the compiler does before calling new. I’ve added a few comments in line to explain the assembly code:

mov    %edx,%eax   ;  %edx and %eax store s->num_of_streams
add    %eax,%eax   ;  add %eax to itself (s->num_of_streams * 2)
add    %edx,%eax   ;  add  s->num_of_streams + %eax (s->num_of_streams*3)
shl    $0x2,%eax   ;  multiply (s->num_of_streams * 3) by 4  (s->num_of_streams * 12)
mov    %eax,(%esp) ;  move it into position to pass to new
call   0x8048a7c <_Znaj@plt> ; call new

The compiler has generated code to calculate s->num_of_streams * sizeof(Obj), where sizeof(Obj) is 12 bytes. For large values of s->num_of_streams, multiplying by 12 causes an integer overflow, and the value passed to new will actually be less than what was intended.

For my exploit, I ended up using the value 357913943. This value causes an overflow because 357913943 * 12 = 4294967316, which is 20 more than 2^32, so the 32-bit result wraps around to 20. So the value passed to new is 20, which is, of course, significantly less than what we actually wanted to allocate. Other people have written about integer overflow in new in other compilers [2] before.
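The wraparound is easy to reproduce without touching new at all. The sketch below models the 32-bit multiplication from the disassembly; bytes_requested_32 is a made-up helper name, and the arithmetic is done in 64 bits and then truncated so it behaves the same on any host.

```cpp
#include <cstdint>

// Size in bytes that the 32-bit code above would pass to new[] for `count`
// elements of `elem_size` bytes each, with the product truncated to 32 bits.
std::uint32_t bytes_requested_32(std::uint32_t count, std::uint32_t elem_size) {
    std::uint64_t exact = static_cast<std::uint64_t>(count) * elem_size;
    return static_cast<std::uint32_t>(exact);  // keep only the low 32 bits
}
```

For example, bytes_requested_32(357913943, 12) yields 20, while a small count such as 10 yields the expected 120.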

Let’s see how this can be used to cause arbitrary code to execute. Remember, for arbitrary code execution to occur there must be a way to cause the target program to write some data to a memory address that can be controlled.

Find the (possible) hand-off(s) to arbitrary code

To find any hand-off locations, I looked for places where memory writes were occurring in the program. I found a few memory writes:

  • 2 calls to memset()
  • 2 calls to memcpy()
  • parse_stream() of class Obj

Unfortunately (from the attacker’s perspective) the calls to memcpy() and memset() looked pretty sane. The parse_stream() function caught my interest, though.

Take a look:

class Obj {
    int parse_stream(int t, char *stream) {
      type = t;
      // ... do something with stream here ...
      return 0;
    }

    int length;
    int type;
/* ... */
};

REMEMBER: In C++, member functions of classes have a sekrit parameter, which is a pointer to the object the function is being called on. In the function itself, this parameter is accessed using this. So the line writing to the type variable is actually doing this->type = t; where this is supplied to the function sekritly by the compiler.

This is important because this piece of code could be our hand-off! We need to find a way to control the value of this so we can cause a memory write to a location of our choice.
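To make the hidden parameter concrete, here is a sketch that puts the member function next to a free function doing the same thing with an explicit pointer. The struct is a simplified stand-in with only the two fields shown above, and Obj_parse_stream is a made-up name; compilers do not literally emit a function called that.

```cpp
// Simplified stand-in for the challenge's Obj; only the two fields shown in
// the article are included.
struct Obj {
    int length;
    int type;

    int parse_stream(int t, char *stream) {
        (void)stream;
        type = t;  // really this->type = t
        return 0;
    }
};

// Roughly what the member function amounts to: a plain function whose first
// parameter is the object pointer.
int Obj_parse_stream(Obj *self, int t, char *stream) {
    (void)stream;
    self->type = t;  // writes 4 bytes past wherever self points
    return 0;
}
```

Both versions write to the same offset from the pointer they are given, so whoever controls that pointer controls where the write lands.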

Controlling this to cause arbitrary code to execute

Take a look at an important piece of code in the challenge:

struct imetad {
  int msg_length;
  int (*callback)(int, struct imetad *);
/* ... */
};

Nice! The callback field of struct imetad is offset by 4 bytes into the structure. The type field of class Obj is also offset by 4 bytes. See where I’m going?

If we can control the this pointer to point at the struct imetad on the heap when parse_stream is called, it will overwrite the callback pointer. We’ll then be able to set the pointer to any address we want and hand-off execution to arbitrary code!

But how can we manipulate this?

Take a look at this piece of code that calls callback:

o[i].parse_stream(dword, stream_temp);
imd->callback(o[i].type, imd);

Since it is possible to overflow new and allocate fewer objects than safe_count says there are, for some values of i, o[i] will be pointing at data that isn’t actually an Obj object, but just other data on the heap. In fact, when i = 2, o[i] will be pointing at the struct imetad object on the heap. The call to parse_stream will pass in a corrupted this pointer that points at struct imetad. The write to type will actually overwrite callback, since they are both offset equal amounts into their respective structures.
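The overlap can be modeled safely on a plain byte buffer instead of a real heap. This is a sketch under stated assumptions: the buffer starts at &o[0], the struct imetad begins 2 * 12 = 24 bytes in (which is what "o[2] points at it" implies), and the 4-byte offsets are the i386 ones from the text. The helper names write_type and read_callback are made up.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

const std::size_t kObjSize = 12;        // sizeof(Obj) according to the disassembly
const std::size_t kTypeOffset = 4;      // offset of Obj::type
const std::size_t kCallbackOffset = 4;  // offset of imetad::callback on i386

// Models "o[index].type = value" on a raw byte buffer that starts at &o[0].
void write_type(std::vector<std::uint8_t> &heap, std::size_t index, std::uint32_t value) {
    const std::size_t offset = index * kObjSize + kTypeOffset;
    (void)heap.at(offset + sizeof(value) - 1);  // throws if the write would not fit
    std::memcpy(heap.data() + offset, &value, sizeof(value));
}

// Reads the callback field of an imetad that begins at imetad_offset.
std::uint32_t read_callback(const std::vector<std::uint8_t> &heap, std::size_t imetad_offset) {
    const std::size_t offset = imetad_offset + kCallbackOffset;
    (void)heap.at(offset + sizeof(std::uint32_t) - 1);  // bounds check
    std::uint32_t value = 0;
    std::memcpy(&value, heap.data() + offset, sizeof(value));
    return value;
}
```

Writing the type of "object" 2 and then reading the callback of an imetad placed at byte 24 returns the same value, which is the whole trick: the attacker-chosen type becomes the function pointer.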

And with that, we’ve successfully exploited the challenge causing arbitrary code to execute.

Let’s now figure out how to beat ASLR!

How to defeat address space layout randomization

I did NOT invent this technique, but I read about it and thought it was cool. You can read a more verbose explanation of this technique here. The idea behind the technique is pretty simple:

  • When you call exec, the PID remains the same, but the image of the process in memory is changed.
  • The kernel uses the PID and the number of jiffies (jiffies is a fine-grained time measurement in the kernel) to pull data from the entropy pool.
  • If you can run a program which records stack, heap, and other addresses and then quickly call exec to start the vulnerable program, you can end up with the same memory layout.

My exploit program is actually a wrapper which records an approximate location of the heap (by just calling malloc()), generates the exploit file, and then executes the challenge binary.

Take a look at the relevant pieces of my exploit to get an idea of how it works:

/* ... */

/* do a malloc to get an idea of where the heap lives */
void *dummy = malloc(10);

/* ... */

unsigned int shell_addr = reinterpret_void_ptr_as_uint(dummy);

/*
 * XXX TODO FIXME - on my platform, execl'ing from here to the challenge binary
 * incurs a constant offset of 0x3160, probably for changes in the environment
 * (libs linked for c++ and whatnot).
 */
shell_addr += 0x3160;

/*
 * a guess as to how far off the heap the shellcode lives.
 * luckily we have a large NOP sled, so we should only fail when we miss
 * the current entropy cycle (see below).
 */
shell_addr += 700;

/* ... build exploit file in memory ... */

/* copy in our best guess as to the address of the shellcode, pray NOPs
 * take care of the rest! */
memcpy(entire_file+88, &shell_addr, sizeof(shell_addr));

/* ... write exploit out to disk ... */

/* launch program with the generated exploit file!
 * calling execl here inherits the PID of this process, and IF we get lucky
 * ~85%+ of the time, we'll execute before the next entropy cycle and hit
 * the shellcode, even with ASLR=2.
 */
execl("./cpp_challenge", "cpp_challenge", "exploit", (char *)0);

My exploit for the C++ challenge

My exploit comes with the following caveats:

  • i386 system
  • The challenge binary is called “cpp_challenge” and lives in the same directory as the exploit binary.
  • The exploit binary can write to the directory and create a file called “exploit” which will be handed off to “cpp_challenge”

Get the full code of my exploit here.


Results on my i386 Ubuntu 8.04 VM running in VMWare fusion, for each level of randomize_va_space:

  • 0 – 100% exploit hit rate
  • 1 – 100% exploit hit rate
  • 2 – ~85% exploit hit rate. Sometimes, my exploit code falls out of the time window and the address map changes before the challenge binary is run

I could probably boost the hit rate for 2 a bit, but to do that I’d probably have to rewrite the entire exploit in assembly to make it run as fast as possible. I didn’t think there was really a point to going to such an extreme, so an 85% hit rate is good enough.


Conclusion

  1. Security challenges are fun.
  2. More emphasis and more freely available information on secure coding would be very useful.
  3. Like it or not, developers need to be security conscious when writing code in C and C++.
  4. As C and C++ change, developers need to carefully consider security implications of new features.



  1. Matasano Security LLC – Chargen – A C++ Challenge
  2. Integer overflow in the new[] operator

Written by Joe Damato

October 16th, 2009 at 4:59 am

Extending ltrace to make your Ruby/Python/Perl/PHP apps faster


A few days ago, Aman (@tmm1) was complaining to me about a slow running process:

I want to see what is happening in userland and trace calls to extensions. Why doesn’t ltrace work for Ruby processes? I want to figure out which MySQL queries are causing my app to be slow.

It turns out that ltrace did not have support for libraries loaded with libdl. This is a problem for languages like Ruby, Python, PHP, Perl, and others because in many cases extensions, libraries, and plugins for these languages are loaded by the VM using libdl. This means that ltrace is somewhat useless for tracking down performance issues in dynamic languages.

A couple of late nights of hacking later, I managed to finagle libdl support into ltrace. Since most people probably don’t care about the technical details of how it was implemented, I’ll start by showing how to use the patch I wrote and what sort of output you can expect. This patch has made tracking down slow queries (among other things) really easy and I hope others will find it useful.

How to use ltrace:

After you’ve applied my patch (below) and rebuilt ltrace, let’s say you’d like to trace MySQL queries and have ltrace tell you when the query was executed and how long it took. There are two steps:

  1. Give ltrace info so it can pretty-print: echo "int mysql_real_query(addr,string,ulong);" > custom.conf
  2. Tell ltrace you want to hear about mysql_real_query: ltrace -F custom.conf -ttTgx mysql_real_query -p <pid>

Here’s what those arguments mean:

  • -F use a custom config file when pretty-printing (default: /etc/ltrace.conf, add your stuff there to avoid -F if you wish).
  • -tt print the time (including microseconds) when the call was executed
  • -T time the call and print how long it took
  • -x tells ltrace the name of the function you care about
  • -g avoid placing breakpoints on all library calls except the ones you specify with -x. This is optional, but it makes ltrace produce much less output and is a lot easier to read if you only care about your one function.


PHP test script

<?php
mysql_connect("localhost", "root");
while (true)
    mysql_query("SELECT sleep(1)");

ltrace output

22:31:50.507523 zend_hash_find(0x025dc3a0, "mysql_query", 12) = 0 <0.000029>
22:31:50.507781 mysql_real_query(0x027bc540, "SELECT sleep(1)", 15) = 0 <1.000600>
22:31:51.508531 zend_hash_find(0x025dc3a0, "mysql_query", 12) = 0 <0.000025>
22:31:51.508675 mysql_real_query(0x027bc540, "SELECT sleep(1)", 15) = 0 <1.000926>

ltrace command

ltrace -ttTg -x zend_hash_find -x mysql_real_query -p [pid of script above]


Python test script

import MySQLdb
db = MySQLdb.connect("localhost", "root", "", "test")
cursor = db.cursor()
sql = """SELECT sleep(1)"""
while True:
    cursor.execute(sql)
    data = cursor.fetchone()

ltrace output

22:24:39.104786 PyEval_SaveThread() = 0x21222e0 <0.000029>
22:24:39.105020 PyEval_SaveThread() = 0x21222e0 <0.000024>
22:24:39.105210 PyEval_SaveThread() = 0x21222e0 <0.000024>
22:24:39.105303 mysql_real_query(0x021d01d0, "SELECT sleep(1)", 15) = 0 <1.002083>
22:24:40.107553 PyEval_SaveThread() = 0x21222e0 <0.000026>
22:24:40.107713 PyEval_SaveThread()= 0x21222e0 <0.000024>
22:24:40.107909 PyEval_SaveThread() = 0x21222e0 <0.000025>
22:24:40.108013 mysql_real_query(0x021d01d0, "SELECT sleep(1)", 15) = 0 <1.001821>

ltrace command

ltrace -ttTg -x PyEval_SaveThread -x mysql_real_query -p [pid of script above]


Perl test script

use DBI;

$dsn = "DBI:mysql:database=test;host=localhost";
$dbh = DBI->connect($dsn, "root", "");
$drh = DBI->install_driver("mysql");
@databases = DBI->data_sources("mysql");
$sth = $dbh->prepare("SELECT SLEEP(1)");

while (1) {
  $sth->execute;
}

ltrace output

22:42:11.194073 Perl_push_scope(0x01bd3010) = <void> <0.000028>
22:42:11.194299 mysql_real_query(0x01bfbf40, "SELECT SLEEP(1)", 15) = 0 <1.000876>
22:42:12.195302 Perl_push_scope(0x01bd3010) = <void> <0.000024>
22:42:12.195408 mysql_real_query(0x01bfbf40, "SELECT SLEEP(1)", 15) = 0 <1.000967>

ltrace command

ltrace -ttTg -x mysql_real_query -x Perl_push_scope -p [pid of script above]


Ruby test script

require 'rubygems'
require 'sequel'

DB = Sequel.connect('mysql://root@localhost/test')

while true
  p DB['select sleep(1)'].select.first
end

snip of ltrace output

22:10:00.195814 garbage_collect()  = 0 <0.022194>
22:10:00.218438 mysql_real_query(0x02740000, "select sleep(1)", 15) = 0 <1.001100>
22:10:01.219884 garbage_collect() = 0 <0.021401>
22:10:01.241679 mysql_real_query(0x02740000, "select sleep(1)", 15) = 0 <1.000812>

ltrace command used:

ltrace -ttTg -x garbage_collect -x mysql_real_query -p [pid of script above]

Where to get it

How ltrace works normally

ltrace works by setting software breakpoints on entries in a process’ Procedure Linkage Table (PLT).

What is a software breakpoint

A software breakpoint is just a special instruction (the single byte 0xcc on x86 and x86_64) that raises a breakpoint interrupt (interrupt 3 on x86 and x86_64). When interrupt 3 is raised, the CPU executes a handler installed by the kernel. The kernel then sends a signal (SIGTRAP on Linux) to the process that generated the interrupt. (Want to know more about how signals and interrupts work? Check out an earlier blog post: here)
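The bookkeeping a tracer needs for this is small: remember the byte that was there, write 0xcc over it, and put the original back later. The sketch below models that on an ordinary byte vector standing in for the traced program's code; a real tracer would do the same reads and writes in the other process's memory via ptrace. The Breakpoint type and both function names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

const std::uint8_t kInt3 = 0xcc;  // x86 / x86_64 breakpoint opcode

struct Breakpoint {
    std::size_t offset;   // where in the code the breakpoint was placed
    std::uint8_t saved;   // original byte that 0xcc replaced
};

// Overwrites one byte of code with 0xcc and returns what is needed to undo it.
Breakpoint insert_breakpoint(std::vector<std::uint8_t> &code, std::size_t offset) {
    Breakpoint bp = {offset, code.at(offset)};
    code.at(offset) = kInt3;
    return bp;
}

// Restores the original byte so the real instruction can run again.
void remove_breakpoint(std::vector<std::uint8_t> &code, const Breakpoint &bp) {
    code.at(bp.offset) = bp.saved;
}
```

Saving the original byte matters because the tracer has to restore it before the traced program can execute the instruction that the breakpoint replaced.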

What is a PLT and how does it work?

A PLT is a table of absolute addresses of functions. It is used because the link editor doesn’t know where functions in shared objects will be located. Instead, a table is created so that the program and the dynamic linker can work together to find and execute functions in shared objects. I’ve simplified the explanation a bit [1], but at a high level:

  1. When the program calls a function in a shared object, the link editor makes sure that the call jumps to a slot in the PLT.
  2. The program sets some data up for the dynamic linker and then hands control over to it.
  3. The dynamic linker looks at the info set up by the program and fills in the absolute address of the function that was called in the PLT.
  4. Then the dynamic linker calls the function.
  5. Subsequent calls to the same function jump to the same slot in the PLT, but every time after the first call the absolute address is already in the PLT (because when the dynamic linker is invoked the first time, it fills in the absolute address in the PLT).

Since all calls to library functions occur via the PLT, ltrace sets breakpoints on each PLT entry in a program.
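The resolve-once behavior described in the steps above can be imitated with a single function pointer. This is a toy model of the simplified explanation, not how the dynamic linker is actually implemented; every name in it (plt_slot, lazy_stub, real_function, call_through_plt) is invented for the sketch.

```cpp
typedef int (*func_ptr)(int);

// Stand-in for a function living in a shared object.
int real_function(int x) { return x * 2; }

int resolver_calls = 0;  // how many times the "dynamic linker" ran

int lazy_stub(int x);    // forward declaration

// One PLT slot: it starts out pointing at the stub that invokes the resolver.
func_ptr plt_slot = lazy_stub;

// Plays the role of the dynamic linker: finds the real address, patches the
// slot, then calls the function on behalf of the caller.
int lazy_stub(int x) {
    ++resolver_calls;
    plt_slot = real_function;
    return real_function(x);
}

// What the program does for every call to the library function.
int call_through_plt(int x) { return plt_slot(x); }
```

The first call goes through lazy_stub and bumps resolver_calls; every later call jumps straight to real_function through the already patched slot. Because every call still passes through the slot, a breakpoint on the slot sees all of them.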

Why ltrace didn’t work with libdl loaded libraries

Libraries loaded with libdl are loaded at run time and functions (and other symbols) are accessed by querying the dynamic linker (by calling dlsym()). The compiler and link editor don’t know anything about libraries loaded this way (they may not even exist!) and as such no PLT entries are created for them.

Since no PLT entries exist, ltrace can’t trace these functions.
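For comparison, this is roughly what loading a function through libdl looks like from the calling program's side. It is a sketch that assumes a glibc system where the math library is installed as libm.so.6; cos_via_dlsym is a made-up name. The call to cos goes through a pointer returned by dlsym(), so there is no PLT entry for cos in this program.

```cpp
#include <dlfcn.h>
#include <cmath>

// Looks up cos() in libm at run time and calls it. Returns NaN if the library
// or the symbol cannot be found. "libm.so.6" is the usual glibc name and is an
// assumption about the host system.
double cos_via_dlsym(double x) {
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (handle == nullptr) {
        return std::nan("");
    }
    typedef double (*cos_fn)(double);
    cos_fn fn = reinterpret_cast<cos_fn>(dlsym(handle, "cos"));
    double result = (fn != nullptr) ? fn(x) : std::nan("");
    dlclose(handle);
    return result;
}
```

On older glibc versions this needs to be linked with -ldl. Language VMs load their extensions in much the same way, which is why a PLT-only tracer never sees those calls.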

What needed to be done to make ltrace libdl-aware

OK, so we understand the problem. ltrace only sets breakpoints on PLT entries and libdl loaded libraries don’t have PLT entries. How can this be fixed?

Luckily, the dynamic linker and ELF all work together to save your ass.

Executable and Linking Format (ELF) is a file format for executables, shared libraries, and more [2]. The file format can get a bit complicated, but all you really need to know is: ELF consists of different sections which hold different types of entries. There is a section called .dynamic which has an entry named DT_DEBUG. This entry stores the address of a debugging structure in the address space of the process. In Linux, this struct has type struct r_debug.

How to use struct r_debug to win the game

The debug structure is updated by the dynamic linker at runtime to reflect the current state of shared object loading. The structure contains 3 things that will help us in our quest:

  1. state – the current state of the mapping change taking place (begin add, begin delete, consistent)
  2. brk – the address of a function internal to the dynamic linker that will be called when the linker maps, unmaps, or has completed mapping a shared object.
  3. link map – Pointer to the start of a list of currently loaded objects. This list is called the link map and is represented as a struct link_map in Linux.
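A program can find this structure in its own address space, which makes for a compact illustration. The sketch below assumes Linux with glibc and a dynamically linked executable; in glibc's <link.h> the three fields are named r_state, r_brk, and r_map. It runs inside a single process, whereas a tracer like ltrace has to read the same data out of another process. The function names are made up.

```cpp
#include <link.h>
#include <cstddef>

// Scans this program's own .dynamic section for the DT_DEBUG entry, which the
// dynamic linker fills in with the address of struct r_debug. Returns nullptr
// if there is no such entry.
struct r_debug *find_r_debug() {
    for (ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; ++d) {
        if (d->d_tag == DT_DEBUG) {
            return reinterpret_cast<struct r_debug *>(d->d_un.d_ptr);
        }
    }
    return nullptr;
}

// Walks the link map and counts the currently loaded objects.
std::size_t count_loaded_objects(const struct r_debug *dbg) {
    std::size_t n = 0;
    for (const struct link_map *m = dbg->r_map; m != nullptr; m = m->l_next) {
        ++n;
    }
    return n;
}
```

Each struct link_map entry also carries l_name and l_addr, which is what a tracer would use to work out which library was just added and where it was mapped.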

Tie it all together and bring it home

To add support for libdl loaded libraries to ltrace, the steps are:

  1. Find the address of the debug structure in the .dynamic section of the program.
  2. Set a software breakpoint on brk.
  3. When the dynamic linker updates the link map, it will trigger the software breakpoint.
  4. When the breakpoint is triggered, check state in the debug structure.
  5. If a new library has been added, walk the link map and figure out what was added.
  6. Search the added library’s symbol table for the symbols we care about.
  7. Set software breakpoints on whatever is found.
  8. Repeat steps 3-7.

That isn’t too hard, all thanks to the dynamic linker providing a way for us to hook into its internal events.
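The "check state, then work out what was added" part of the steps above boils down to a set difference. Here is a sketch of just that logic, with library names passed in as plain strings rather than read from a traced process. The LibraryTracker type and its member names are hypothetical and are not taken from the ltrace patch. It assumes the glibc behavior of signaling an "add" state before mapping and a "consistent" state once the link map is complete.

```cpp
#include <set>
#include <string>
#include <vector>

// Mirrors the three states described for the debug structure.
enum MapState { kConsistent, kAdd, kDelete };

// Tracks which libraries have been seen so far (hypothetical helper type).
struct LibraryTracker {
    std::set<std::string> known;
    MapState last_state = kConsistent;

    // Called each time the breakpoint on brk fires. `current` holds the names
    // found by walking the link map. Returns the libraries that are new since
    // the last consistent state, which are the ones to search for symbols.
    std::vector<std::string> on_brk_hit(MapState state,
                                        const std::vector<std::string> &current) {
        std::vector<std::string> added;
        if (state == kConsistent && last_state == kAdd) {
            for (const std::string &name : current) {
                if (known.insert(name).second) {
                    added.push_back(name);
                }
            }
        }
        last_state = state;
        return added;
    }
};
```

A tracer would seed known with the libraries present at attach time, then look up its target symbols in each name returned by on_brk_hit and set breakpoints on them.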


Conclusion

  • Read the System V ABI for your CPU. It is filled with insanely useful information that can help you be a better programmer.
  • Use the source. A few times while hacking on this patch I looked through the source for GDB and glibc to help me figure out what was going on.
  • Understanding how things work at a low-level can help you build tools to solve your high-level problems.



  1. System V Application Binary Interface AMD64 Architecture Processor Supplement, p 78
  2. Executable and Linking Format (ELF) Specification

Written by Joe Damato

October 8th, 2009 at 4:59 am