time to bleed by Joe Damato

technical ramblings from a wanna-be unix dinosaur

How do debuggers keep track of the threads in your program?

If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter.

tl;dr

This post describes the relatively undocumented API for debuggers (or other low level programs) that can be used to enumerate the existing threads in a process and receive asynchronous notifications when threads are created or destroyed. This API also provides asynchronous notifications of other interesting thread-related events and feels very similar to the interface exposed by libdl for notifying debuggers when libraries are loaded dynamically at run time.

amd64 and gnu syntax

As usual, everything below refers to amd64 unless otherwise noted. Also, all assembly is in AT&T syntax.

software breakpoints

It’s important to begin first by examining how software breakpoints work. We’ll see shortly why this is important, but for now just trust me.

A debugger sets a software breakpoint by using the ptrace system call to write a special instruction into a target process’ address space. That instruction raises software interrupt #3, which is defined as the Breakpoint Exception in the Intel 64 Architecture Developers Manual [1]. When this interrupt is raised, the processor undergoes a privilege level change and calls a function specified by the kernel to handle the exception.

The exception handler in the kernel executes to deliver the SIGTRAP signal to the process. However, if a debugger is attached to a process with ptrace, all signals are first delivered to the debugger. In the case of SIGTRAP, the debugger can examine the list of breakpoints set by the user and take the appropriate action (draw a UI, update the console, or whatever).

The debugger finishes up by suppressing this signal rather than passing it on to the process it is attached to, preventing that process from being killed (most processes will not have a signal handler for SIGTRAP).
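To make those steps concrete, here is a minimal sketch of the tracer side, assuming amd64 Linux. It forks a child that asks to be traced, overwrites the first byte of a function with 0xCC (the one-byte encoding of the breakpoint instruction), waits for the SIGTRAP, then restores the original byte, rewinds rip and resumes with no signal so the child never sees the SIGTRAP. The names target_function and breakpoint_demo are made up for this example, and a real debugger does far more bookkeeping than this.

```c
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical breakpoint location. The empty asm statement keeps the
   compiler from optimizing calls to this function away. */
__attribute__((noinline)) void target_function(void)
{
    __asm__ volatile ("");
}

/* amd64 Linux only. Returns 1 if the tracee stopped with SIGTRAP one byte
   past the planted breakpoint and exited cleanly once the original code
   was restored, 0 if it ended without reaching the breakpoint, and -1 on
   any other failure. */
int breakpoint_demo(void)
{
    uintptr_t addr = (uintptr_t)&target_function;
    struct user_regs_struct regs;
    long orig, patched;
    int status;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Tracee: request tracing, then stop so the tracer can patch us. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(2);
        raise(SIGSTOP);
        target_function();
        _exit(0);
    }

    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    /* Read the word at the breakpoint address and replace its lowest byte
       (the first byte in memory on little-endian amd64) with int3. */
    errno = 0;
    orig = ptrace(PTRACE_PEEKTEXT, pid, (void *)addr, NULL);
    if (orig == -1 && errno != 0)
        goto fail;
    patched = (orig & ~0xffL) | 0xcc;
    if (ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)patched) < 0)
        goto fail;

    /* Resume with signal 0 so the initial SIGSTOP is not delivered. */
    if (ptrace(PTRACE_CONT, pid, NULL, NULL) < 0
        || waitpid(pid, &status, 0) < 0)
        goto fail;
    if (!WIFSTOPPED(status))
        return 0;               /* tracee ended before the breakpoint */
    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0)
        goto fail;
    if (WSTOPSIG(status) != SIGTRAP || regs.rip != addr + 1)
        goto fail;

    /* Put the original code back, rewind rip onto it, and resume with
       signal 0 so the tracee never sees the SIGTRAP. */
    regs.rip = addr;
    if (ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)orig) < 0
        || ptrace(PTRACE_SETREGS, pid, NULL, &regs) < 0
        || ptrace(PTRACE_CONT, pid, NULL, NULL) < 0
        || waitpid(pid, &status, 0) < 0)
        goto fail;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;

fail:
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return -1;
}
```

On a system that allows ptrace, breakpoint_demo() should return 1. A sandbox that blocks ptrace will make it return -1 instead.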

In practice most binaries generated by compilers will not have this instruction; it is up to the debugger to write this instruction into the process’ address space during runtime. If you are so inclined, you can raise interrupt #3 via inline assembly or by calling an assembly stub yourself. Many debuggers will catch this signal and trigger an update of some form in the UI.

All that said, this is what the instruction looks like when disassembled:

int 0x03

You may find it useful to check out an earlier and more in-depth article I wrote a while ago about signal handling.

Enumerating threads when first attaching

When a debugger first attaches to a program, the program has an unknown number of threads that must be enumerated. glibc exposes a straightforward API for this called td_ta_thr_iter [2], found in glibc at nptl_db/td_ta_thr_iter.c. This function takes a callback as one of its arguments. The callback is called once per thread and is passed a handle to an object describing each thread in the process.
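libthread_db needs a fair amount of setup before td_ta_thr_iter can be called, so the sketch below is a stand-in rather than the real thing: it walks /proc/<pid>/task, the kernel's own list of a process's threads, and calls a callback once per thread with a caller-supplied cookie. It only yields kernel thread IDs, not the thread handles libthread_db provides. iterate_threads, thread_cb and count_thread are hypothetical names.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback type: called once per thread with the kernel
   thread ID and a caller-supplied cookie. A nonzero return stops the walk. */
typedef int (*thread_cb)(pid_t tid, void *data);

/* Calls cb once for every thread listed in /proc/<pid>/task. Returns 0 on
   success, -1 if the directory cannot be opened, or the callback's nonzero
   return value if it stopped the walk. */
int iterate_threads(pid_t pid, thread_cb cb, void *data)
{
    char path[64];
    struct dirent *entry;
    int rc = 0;

    snprintf(path, sizeof path, "/proc/%ld/task", (long)pid);
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    while (rc == 0 && (entry = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(entry->d_name, &end, 10);
        if (end == entry->d_name || *end != '\0')
            continue;           /* not a number: skips "." and ".." */
        rc = cb((pid_t)tid, data);
    }
    closedir(dir);
    return rc;
}

/* Example callback: counts the threads it is shown. */
int count_thread(pid_t tid, void *data)
{
    (void)tid;
    ++*(int *)data;
    return 0;
}
```

For example, after `int n = 0; iterate_threads(getpid(), count_thread, &n);` the variable n holds the number of threads in the calling process. td_ta_thr_iter has the same callback-plus-cookie shape, but passes the callback a thread handle instead of a thread ID.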

We can see the code in GDB [3] which hands td_ta_thr_iter a callback that will be invoked for each of the existing threads in a process:

static int
find_new_threads_once (struct thread_db_info *info, int iteration,
                       td_err_e *errp)
{
  volatile struct gdb_exception except;
  struct callback_data data;
  td_err_e err = TD_ERR;

  data.info = info;
  data.new_threads = 0;

  TRY_CATCH (except, RETURN_MASK_ERROR)
    {
      /* Iterate over all user-space threads to discover new threads.  */
      err = info->td_ta_thr_iter_p (info->thread_agent,
                                    find_new_threads_callback,
                                    &data,
                                    TD_THR_ANY_STATE,
                                    TD_THR_LOWEST_PRIORITY,
                                    TD_SIGNO_MASK,
                                    TD_THR_ANY_USER_FLAGS);
    }
  /* ... */

That’s pretty straightforward, but there are some hairy race conditions, as we can see in this code snippet from thread_db_find_new_threads_2 which calls find_new_threads_once:

if (until_no_new)
  {
    /* Require 4 successive iterations which do not find any new threads.
       The 4 is a heuristic: there is an inherent race here, and I have
       seen that 2 iterations in a row are not always sufficient to
       "capture" all threads.  */
    for (i = 0, loop = 0; loop < 4; ++i, ++loop)
      if (find_new_threads_once (info, i, NULL) != 0)
        /* Found some new threads.  Restart the loop from beginning.  */
        loop = -1;
  }

It's fiiiiiiiiiinnnneeee.
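Stripped of the GDB specifics, the heuristic is: keep rescanning until some number of passes in a row find nothing new. Here is a minimal sketch of that idea, with a hypothetical scan callback standing in for find_new_threads_once and a mock scanner for experimenting. None of these names come from GDB.

```c
#include <stddef.h>

/* Hypothetical scanner: performs one pass and returns the number of
   threads it had not seen before. */
typedef int (*scan_fn)(void *state);

/* Rescans until quiet_needed passes in a row find nothing new.
   Returns the total number of passes performed. */
int scan_until_stable(scan_fn scan, void *state, int quiet_needed)
{
    int passes = 0;
    int quiet = 0;

    while (quiet < quiet_needed) {
        passes++;
        if (scan(state) != 0)
            quiet = 0;          /* found something: start counting again */
        else
            quiet++;
    }
    return passes;
}

/* Mock scanner: replays a fixed list of results and reports 0 new
   threads once the list is exhausted. */
struct replay {
    const int *results;
    size_t len;
    size_t next;
};

int replay_scan(void *state)
{
    struct replay *r = state;

    return r->next < r->len ? r->results[r->next++] : 0;
}
```

With the mock replaying the results 2, 0, 1, 0, 0 and quiet_needed set to 4, the loop makes 7 passes, because the thread found on the third pass resets the count. The race itself is still there: a thread created right after the last quiet pass is missed until the next scan or notification.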

Now, on to the more interesting interface that is, IMHO, much less straightforward.

Notification of thread create and destroy

A debugger can also gather thread create and destroy events through an interesting asynchronous interface. Let's go step by step and see how a debugger can listen for create and destroy events.

Enable event notification

First, process-wide event notification has to be enabled. This API looks very much like some pieces of the signal API. First we have to create a set of the events we care about (from GDB [4]):

static void
enable_thread_event_reporting (void)
{
  td_thr_events_t events;
  td_err_e err;

  /* ... */

  /* Set the process wide mask saying which events we're interested in.  */
  td_event_emptyset (&events);
  td_event_addset (&events, TD_CREATE);

  /* ... */

  td_event_addset (&events, TD_DEATH);
  
  /* NB: the following is just a pointer to the function td_ta_set_event on linux */
  err = info->td_ta_set_event_p (info->thread_agent, &events);

The above code adds TD_CREATE and TD_DEATH to the (empty) set of events that GDB wants to get notifications about. Then the event mask is handed over to glibc with a call to the function td_ta_set_event, which just happens to be stored in a function pointer named td_ta_set_event_p in GDB.
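The set manipulation can be tried without a debugger, because on glibc td_event_emptyset, td_event_addset and td_eventismember are macros defined in thread_db.h. Here is a small sketch, assuming the glibc header is installed. create_and_death_events and wants_event are hypothetical helpers, not part of the API.

```c
#include <thread_db.h>

/* Builds the same mask GDB asks for: thread creation and thread death. */
td_thr_events_t create_and_death_events(void)
{
    td_thr_events_t events;

    td_event_emptyset(&events);
    td_event_addset(&events, TD_CREATE);
    td_event_addset(&events, TD_DEATH);
    return events;
}

/* Returns 1 if event is in the set, 0 otherwise. Values outside the range
   of real events (such as the pseudo-event TD_ALL_EVENTS) are rejected
   before they reach the membership macro. */
int wants_event(const td_thr_events_t *events, td_event_e event)
{
    if (event < TD_MIN_EVENT_NUM || event > TD_MAX_EVENT_NUM)
        return 0;
    return td_eventismember(events, event) ? 1 : 0;
}
```

Handing the resulting set to td_ta_set_event still requires a thread agent, which is the part that needs a real debugger-style setup.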

Set asynchronous notification breakpoints

The next step is interesting.

The debugger must use an API to get the addresses of functions that will be called whenever a thread is created or destroyed. The debugger will then set a software breakpoint at those addresses. When the program creates a thread or a thread is killed, the breakpoint will be triggered and the debugger can walk the thread list and update its internal state that describes the threads in the process.

This API is td_ta_event_addr. Let's check out how GDB uses it. This code is from the same function, enable_thread_event_reporting, and runs after the code shown above:

static void
enable_thread_event_reporting (void)
{

	/* ... code above here ... */

	/* Delete previous thread event breakpoints, if any.  */
	remove_thread_event_breakpoints ();
	info->td_create_bp_addr = 0;
	info->td_death_bp_addr = 0;
	
	/* Set up the thread creation event.  */
	err = enable_thread_event (TD_CREATE, &info->td_create_bp_addr);
	
	/* ... */

	/* Set up the thread death event.  */
	err = enable_thread_event (TD_DEATH, &info->td_death_bp_addr);

GDB's helper function enable_thread_event is pretty straightforward:

static td_err_e
enable_thread_event (int event, CORE_ADDR *bp)
{
  td_notify_t notify;
  td_err_e err;
  struct thread_db_info *info;

  info = get_thread_db_info (GET_PID (inferior_ptid));

  /* Access an lwp we know is stopped.  */
  info->proc_handle.ptid = inferior_ptid;

  /* Get the breakpoint address for thread EVENT.  */
  err = info->td_ta_event_addr_p (info->thread_agent, event, &notify);
  /* ... */

  /* Set up the breakpoint.  */
  gdb_assert (exec_bfd);
  (*bp) = (gdbarch_convert_from_func_ptr_addr
		  (target_gdbarch,
		   /* Do proper sign extension for the target.  */
		   (bfd_get_sign_extend_vma (exec_bfd) > 0
		    ? (CORE_ADDR) (intptr_t) notify.u.bptaddr
		    : (CORE_ADDR) (uintptr_t) notify.u.bptaddr),
		   &current_target));

  create_thread_event_breakpoint (target_gdbarch, *bp);

  return TD_OK;
}

So, GDB stores the addresses of the functions that get called on TD_CREATE and TD_DEATH in td_create_bp_addr and td_death_bp_addr, respectively, and sets breakpoints on these addresses in enable_thread_event.
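td_ta_event_addr fills in a td_notify_t, which carries a type alongside the address. The GDB code above reads notify.u.bptaddr directly; a more defensive client could check the type first, since thread_db.h also defines NOTIFY_AUTOBPT and NOTIFY_SYSCALL. Here is a sketch of that check, assuming the glibc header. notify_breakpoint_addr is a name I made up.

```c
#include <stdint.h>
#include <thread_db.h>

/* Extracts the address a debugger should plant a breakpoint on from a
   td_notify_t. Returns 0 and stores the address in *addr when the event is
   reported through a breakpoint the debugger must insert itself, or -1 for
   the other notification types. */
int notify_breakpoint_addr(const td_notify_t *notify, uintptr_t *addr)
{
    if (notify->type != NOTIFY_BPT)
        return -1;
    *addr = (uintptr_t)notify->u.bptaddr;
    return 0;
}
```

The address that comes back is in the target process, so it is only meaningful as an argument to whatever the debugger uses to write a breakpoint there.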

Check if the event has been triggered and drain the event queue

The next time a thread is stopped because a breakpoint has been hit, the debugger needs to check whether the breakpoint is at an address associated with the registered events. If so, the thread event queue needs to be drained with calls to td_ta_event_getmsg, and each thread's information can be retrieved with a call to td_thr_get_info.

GDB does all this in a function called check_event:

/* Check if PID is currently stopped at the location of a thread event
   breakpoint location.  If it is, read the event message and act upon
   the event.  */

static void
check_event (ptid_t ptid)
{
  /* ... */
  td_event_msg_t msg;
  td_thrinfo_t ti;
  td_err_e err;
  CORE_ADDR stop_pc;
  int loop = 0;
  struct thread_db_info *info;

  info = get_thread_db_info (GET_PID (ptid));

  /* Bail out early if we're not at a thread event breakpoint.  */
  stop_pc =  /* ... */
  if (stop_pc != info->td_create_bp_addr
      && stop_pc != info->td_death_bp_addr)
    return;

  /* Access an lwp we know is stopped.  */
  info->proc_handle.ptid = ptid;

  /* ... */

  /* If we are at a create breakpoint, we do not know what new lwp
     was created and cannot specifically locate the event message for it.
     We have to call td_ta_event_getmsg() to get
     the latest message.  Since we have no way of correlating whether
     the event message we get back corresponds to our breakpoint, we must
     loop and read all event messages, processing them appropriately.
     This guarantees we will process the correct message before continuing
     from the breakpoint.

     Currently, death events are not enabled.  If they are enabled,
     the death event can use the td_thr_event_getmsg() interface to
     get the message specifically for that lwp and avoid looping
     below.  */

  loop = 1;

  do
    {
      err = info->td_ta_event_getmsg_p (info->thread_agent, &msg);
      /* ... */

      err = info->td_thr_get_info_p (msg.th_p, &ti);
      /* ... */

      ptid = ptid_build (GET_PID (ptid), ti.ti_lid, 0);

      switch (msg.event)
        {
        case TD_CREATE:
          /* Call attach_thread whether or not we already know about a
             thread with this thread ID.  */
          attach_thread (ptid, msg.th_p, &ti);

          break;

        case TD_DEATH:

          if (!in_thread_list (ptid))
            error (_("Spurious thread death event."));

          detach_thread (ptid);

          break;

        default:
          error (_("Spurious thread event."));
        }
    }
  while (loop);
}
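The shape of that loop is easier to see without the GDB plumbing. Below is a sketch that drains a queue until it reports TD_NOMSG (the thread_db.h error code for "no event available") and tallies create and death events. The getmsg_fn callback is a hypothetical stand-in for td_ta_event_getmsg so the loop can be exercised against a mock queue; none of these helper names come from GDB or glibc.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <thread_db.h>

/* Hypothetical message source with the contract assumed here: TD_OK plus
   a filled-in message, or TD_NOMSG once the queue is empty. */
typedef td_err_e (*getmsg_fn)(void *queue, td_event_msg_t *msg);

struct event_counts {
    int created;
    int died;
};

/* Returns 1 if stop_pc is one of the two thread event breakpoints. */
int at_thread_event_bp(uintptr_t stop_pc, uintptr_t create_bp,
                       uintptr_t death_bp)
{
    return stop_pc == create_bp || stop_pc == death_bp;
}

/* Reads messages until the queue reports TD_NOMSG, tallying create and
   death events. Returns TD_OK when the queue was drained, TD_ERR for an
   event that was never requested, or the first error from getmsg. */
td_err_e drain_thread_events(getmsg_fn getmsg, void *queue,
                             struct event_counts *counts)
{
    td_event_msg_t msg;
    td_err_e err;

    while ((err = getmsg(queue, &msg)) == TD_OK) {
        switch (msg.event) {
        case TD_CREATE:
            counts->created++;
            break;
        case TD_DEATH:
            counts->died++;
            break;
        default:
            return TD_ERR;
        }
    }
    return err == TD_NOMSG ? TD_OK : err;
}

/* Mock queue: hands out a fixed array of events, then TD_NOMSG. */
struct mock_queue {
    const td_event_e *events;
    size_t len;
    size_t next;
};

td_err_e mock_getmsg(void *queue, td_event_msg_t *msg)
{
    struct mock_queue *q = queue;

    if (q->next == q->len)
        return TD_NOMSG;
    memset(msg, 0, sizeof *msg);
    msg->event = q->events[q->next++];
    return TD_OK;
}
```

Draining everything, rather than reading a single message, matches the comment in check_event: the debugger cannot tell which queued message belongs to the breakpoint it just hit.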

And that is how GDB finds out about existing threads and gets notified about new threads being created or existing threads dying. This asynchronous breakpoint interface is very similar to the interface exposed by libdl that I described briefly toward the end of a blog post I wrote a while ago.

Notifications for other interesting events

Other interesting events are supported by the API but are currently not implemented in glibc. A motivated programmer could build a shim which implements these events; doing so would allow you to build some very interesting visualization applications for lock contention and scheduling:

/* Events reportable by the thread implementation.  */
typedef enum
{
  TD_ALL_EVENTS,			/* Pseudo-event number.  */
  TD_EVENT_NONE = TD_ALL_EVENTS, 	/* Depends on context.  */
  TD_READY,				/* Is executable now. */
  TD_SLEEP,				/* Blocked in a synchronization obj.  */
  TD_SWITCHTO,				/* Now assigned to a process.  */
  TD_SWITCHFROM,			/* Not anymore assigned to a process.  */
  TD_LOCK_TRY,				/* Trying to get an unavailable lock.  */
  TD_CATCHSIG,				/* Signal posted to the thread.  */
  TD_IDLE,				/* Process getting idle.  */
  TD_CREATE,				/* New thread created.  */
  TD_DEATH,				/* Thread terminated.  */
  TD_PREEMPT,				/* Preempted.  */
  TD_PRI_INHERIT,			/* Inherited elevated priority.  */
  TD_REAP,				/* Reaped.  */
  TD_CONCURRENCY,			/* Number of processes changing.  */
  TD_TIMEOUT,				/* Conditional variable wait timed out.  */
  TD_MIN_EVENT_NUM = TD_READY,
  TD_MAX_EVENT_NUM = TD_TIMEOUT,
  TD_EVENTS_ENABLE = 31		/* Event reporting enabled.  */
} td_event_e;
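If you do build such a shim, one of the first things a visualization tool needs is a readable name for each event. Here is a small sketch using the enum above, assuming the glibc header. event_name is a hypothetical helper and returns NULL for values that are not real events.

```c
#include <stddef.h>
#include <thread_db.h>

/* Expands to a case label that returns the enumerator's own name. */
#define EVENT_CASE(e) case e: return #e

/* Returns the name of a reportable thread event, or NULL for anything
   else (for example the pseudo-event TD_ALL_EVENTS). */
const char *event_name(td_event_e event)
{
    switch (event) {
    EVENT_CASE(TD_READY);
    EVENT_CASE(TD_SLEEP);
    EVENT_CASE(TD_SWITCHTO);
    EVENT_CASE(TD_SWITCHFROM);
    EVENT_CASE(TD_LOCK_TRY);
    EVENT_CASE(TD_CATCHSIG);
    EVENT_CASE(TD_IDLE);
    EVENT_CASE(TD_CREATE);
    EVENT_CASE(TD_DEATH);
    EVENT_CASE(TD_PREEMPT);
    EVENT_CASE(TD_PRI_INHERIT);
    EVENT_CASE(TD_REAP);
    EVENT_CASE(TD_CONCURRENCY);
    EVENT_CASE(TD_TIMEOUT);
    default:
        return NULL;
    }
}
```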

Take my shovel and flashlight and go look around

Check the reference section below, which has links to some of the source files mentioned above. Also, be sure to check out the header file:

/usr/include/thread_db.h

That header lists the exported functions from glibc as well as the various flags and types necessary for interacting with this interface.

Conclusion

  • Debuggers have really interesting ways of interacting with lower level system libraries.
  • Comments found tucked away in these pits of despair are pretty amazing.
  • Don't be scared. Grab a shovel and see what other interesting things you can dig up in glibc or elsewhere.

References

  1. Intel 64 Architecture Developers Manual Volume 3A 6-31
  2. glibc/nptl_db/td_ta_thr_iter.c
  3. gdb/linux-thread-db.c
  4. gdb/linux-thread-db.c

Written by Joe Damato

July 2nd, 2012 at 7:30 am