time to bleed by Joe Damato

technical ramblings from a wanna-be unix dinosaur

Archive for the ‘systems’ tag

WARNING: American Express fails miserably at basic security.

View Comments


If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter.

As of 3:35pm PST on 5/25/2010 it seems to be fixed. wireshark shows only TLS traffic now, nothing in the clear. Pretty quick fix, since this was published at 11:54am. Good deal.

This article is going to reveal a pretty serious error in a web form on the American Express Network website. I would strongly recommend NOT filling out the web form described below.

Daily Wish from the American Express Network

Daily wish from the American Express Network sent me an email this morning trying to get me to sign up for their deal of the day service where they offer a very limited quantity of products for a low price.

Sounds simple enough, right?

Well, the time of the sale is not released until the day the sale occurs, unless you are an American Express cardholder. If you are a card holder, you get a special landing page on their website telling you that if you sign up, you can get the sale times before the sale date.

The white arrow below points to the tab that only appears if you clicked through from an email from American Express. The red arrow below points to the sign up button. Take a look:

Sign up page

After clicking the sign up button (red arrow above), a lightbox appears asking for:

  • First and last name
  • American Express credit card number
  • Security code
  • Expiration date
  • Billing zip

Quite a bit of personal information, much of it sensitive. [sarcarsm]Don’t worry the page is secure[/sarcasm], see the form and the white arrow below:

The code from the form

This form looked very suspicious to me, so I decided to take a look at the code to see if the action for this sign up form was over HTTPS. Check it:

<form name="form1" method="post" action="preid2.aspx?ct=7" onsubmit="javascript:return WebForm_OnSubmit();" id="form1">

So the action is to a handler at http://dailywish.amexnetwork.com/preid2.aspx?ct=7. The lack of https doesn’t make me feel very good.

Maybe the WebForm_OnSubmit() function is doing something that might make this secure? Let’s take a look:

<script type="text/javascript"> 
//<![CDATA[
function WebForm_OnSubmit() {
if (typeof(ValidatorOnSubmit) == "function" && ValidatorOnSubmit() == false) return false;
return true;
}
//]]>
</script>


So it looks like that function is just a validator. It is really starting to feel like this form is insecure.

Let’s bring out wireshark and see what it has to say.

Wireshark packet sniff

So I filled out the form with fake information and sniffed the POST to the server.

The Daily Wish sign up form from the American Express Network is sending credit card numbers, expiration dates, and all the other personal information on the sign up form in the clear back to their server.

Holy. Fuck.

Conclusion

  • Do NOT fill out the form until American Express fixes this issue.

Thanks for reading and don’t forget to subscribe (via RSS or e-mail) and follow me on twitter.

Written by Joe Damato

May 25th, 2010 at 11:54 am

Posted in security

Tagged with , ,

Dynamic Linking: ELF vs. Mach-O

View Comments


If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter.

The intention of this post is to highlight some of the similarities and differences between ELF and Mach-O dynamic linking that I encountered while building memprof.

I hope to write more posts about similarities and differences in other aspects of Mach-O and ELF that I stumbled across to shed some light on what goes on down there and provide (in some cases) the only documentation.

Procedure Linkage Table

The procedure linkage table (PLT) is used to determine the absolute address of a function at runtime. Both Mach-O and ELF objects have PLTs that are generated at compile time. The initial table simply invokes the dynamic linker which finds the symbol you want. The way this works is very similar at a high level in ELF and Mach-O, but there are some implementation differences that I thought were worth mentioning.

Mach-O PLT arrangement

Mach-O objects have several different sections across different segments that are all involved to create a PLT entry for a specific symbol.

Consider the following assembly stub which calls out to the PLT entry for malloc:

# MACH-O calling a PLT entry (ELF is nearly identical)
0x000000010008c504 [str_new+52]:	callq  0x10009ebbc [dyld_stub_malloc]

The dyld_stub prefix is added by GDB to let the user know that the callq instruction is calling a PLT entry and not malloc itself. The address 0x10009ebbc is the first instruction of malloc‘s PLT entry in this Mach-O object. In Mach-O terminology, the instruction at 0x10009ebbc is called a symbol stub. Symbol stubs in Mach-O objects are found in the __TEXT segment in the __symbol_stub1 section.

Let’s examine some instructions at the symbol stub address above:

# MACH-O "symbol stubs" for malloc and other functions
0x10009ebbc [dyld_stub_malloc]:	  jmpq   *0x3ae46(%rip)        # 0x1000d9a08
0x10009ebc2 [dyld_stub_realloc]:  jmpq   *0x3ae48(%rip)        # 0x1000d9a10
0x10009ebc8 [dyld_stub_seekdir$INODE64]:	jmpq   *0x3ae4c(%rip)  # 0x1000d9a20
. . . .

Each Mach-O symbol stub is just a single jmpq instruction. That jmpq instruction either:

  • Invokes the dynamic linker to find the symbol and transfer execution there
  • OR

  • Transfers execution directly to the function.

via an entry in a table.

In the example above, GDB is telling us that the address of the table entry for malloc is 0x1000d9a08. This table entry is stored in a section called the __la_symbol_ptr within the __DATA segment.

Before malloc has been resolved, the address in that table entry points to a helper function which (eventually) invokes the dynamic linker to find malloc and fill in its address in the table entry.

Let’s take a look at what a few entries of the helper functions look like:

# MACH-O stub helpers
0x1000a08d4 [stub helpers+6986]:	pushq  $0x3b73
0x1000a08d9 [stub helpers+6991]:	jmpq   0x10009ed8a [stub helpers]
0x1000a08de [stub helpers+6996]:	pushq  $0x3b88
0x1000a08e3 [stub helpers+7001]:	jmpq   0x10009ed8a [stub helpers]
0x1000a08e8 [stub helpers+7006]:	pushq  $0x3b9e
0x1000a08ed [stub helpers+7011]:	jmpq   0x10009ed8a [stub helpers]
. . . . 

Each symbol that has a PLT entry has 2 instructions above; a pair of pushq and jmpq. This instruction sequence sets an ID for the desired function and then invokes the dynamic linker. The dynamic linker looks up this ID so it knows which function it should be looking for.

ELF PLT arrangement

ELF objects have the same mechanism, but organize each PLT entry into chunks instead of splicing them out across different sections. Let’s take a look at a PLT entry for malloc in an ELF object:

# ELF complete PLT entry for malloc
0x40f3d0 [malloc@plt]:	jmpq   *0x2c91fa(%rip)        # 0x6d85d0
0x40f3d6 [malloc@plt+6]:	pushq  $0x2f
0x40f3db [malloc@plt+11]:	jmpq   0x40f0d0
. . . .

Much like a Mach-O object, an ELF object uses a table entry to direct the flow of execution to either invoke the dynamic linker or transfer directly to the desired function if it has already been resolved.

Two differences to point out here:

  1. ELF puts the entire PLT entry together in nicely named section called plt instead of splicing it out across multiple sections.
  2. The table entries indirected through with the initial jmpq instruction are stored in a section named: .got.plt.

Both invoke an assembly trampoline…

Both Mach-O and ELF objects are set up to invoke the runtime dynamic linker. Both need an assembly trampoline to bridge the gap between the application and the linker. On 64bit Intel based systems, linkers in both systems must comply to the same Application Binary Interace (ABI).

Strangely enough, the two linkers have slightly different assembly trampolines even though they share the same calling convention1 2.

Both trampolines ensure that the program stack is 16-byte aligned to comply with the amd64 ABI’s calling convention. Both trampolines also take care to save the “general purpose” caller-saved registers prior to invoking the dynamic link, but it turns out that the trampoline in Linux does not save or restore the SSE registers. It turns out that this “shouldn’t” matter, so long as glibc takes care not to use any of those registers in the dynamic linker. OSX takes a more conservative approach and saves and restores the SSE registers before and after calling out the dynamic linker.

I’ve included a snippet from the two trampolines below and some comments so you can see the differences up close.

Different trampolines for the same ABI

The OSX trampoline:

dyld_stub_binder:
  pushq   %rbp
  movq    %rsp,%rbp
  subq    $STACK_SIZE,%rsp  # at this point stack is 16-byte aligned because two meta-parameters where pushed
  movq    %rdi,RDI_SAVE(%rsp) # save registers that might be used as parameters
  movq    %rsi,RSI_SAVE(%rsp)
  movq    %rdx,RDX_SAVE(%rsp)
  movq    %rcx,RCX_SAVE(%rsp)
  movq    %r8,R8_SAVE(%rsp)
  movq    %r9,R9_SAVE(%rsp)
  movq    %rax,RAX_SAVE(%rsp)
  movdqa    %xmm0,XMMM0_SAVE(%rsp)
  movdqa    %xmm1,XMMM1_SAVE(%rsp)
  movdqa    %xmm2,XMMM2_SAVE(%rsp)
  movdqa    %xmm3,XMMM3_SAVE(%rsp)
  movdqa    %xmm4,XMMM4_SAVE(%rsp)
  movdqa    %xmm5,XMMM5_SAVE(%rsp)
  movdqa    %xmm6,XMMM6_SAVE(%rsp)
  movdqa    %xmm7,XMMM7_SAVE(%rsp)
  movq    MH_PARAM_BP(%rbp),%rdi  # call fastBindLazySymbol(loadercache, lazyinfo)
  movq    LP_PARAM_BP(%rbp),%rsi
  call    __Z21_dyld_fast_stub_entryPvl

The OSX trampoline saves all the caller saved registers as well as the the %xmm0 - %xmm7 registers prior to invoking the dynamic linker with that last call instruction. These registers are all restored after the call instruction, but I left that out for the sake of brevity.

The Linux trampoline:

  subq $56,%rsp 
  cfi_adjust_cfa_offset(72) # Incorporate PLT
  movq %rax,(%rsp)  # Preserve registers otherwise clobbered.
  movq %rcx, 8(%rsp)
  movq %rdx, 16(%rsp)
  movq %rsi, 24(%rsp)
  movq %rdi, 32(%rsp)
  movq %r8, 40(%rsp)
  movq %r9, 48(%rsp)
  movq 64(%rsp), %rsi # Copy args pushed by PLT in register.
  movq %rsi, %r11   # Multiply by 24
  addq %r11, %rsi
  addq %r11, %rsi
  shlq $3, %rsi
  movq 56(%rsp), %rdi # %rdi: link_map, %rsi: reloc_offset
  call _dl_fixup    # Call resolver.

The Linux trampoline doesn’t touch the SSE registers because it assumes that the dynamic linker will not modify them thus avoiding a save and restore.

Conclusion

  • Tracing program execution from call site to the dynamic linker is pretty interesting and there is a lot to learn along the way.
  • glibc not saving and restoring %xmm0-%xmm7 kind of scares me, but there is a unit test included that disassembles the built ld.so searching it to make sure that those registers are never touched. It is still a bit frightening.
  • Stay tuned for more posts explaining other interesting similarities and differences between Mach-O and ELF coming soon.

Thanks for reading and don’t forget to subscribe (via RSS or e-mail) and follow me on twitter.

References

  1. http://developer.apple.com/mac/library/documentation/DeveloperTools/Conceptual/LowLevelABI/140-x86-64_Function_Calling_Conventions/x86_64.html#//apple_ref/doc/uid/TP40005035-SW1 []
  2. http://www.x86-64.org/documentation/abi.pdf []

Written by Joe Damato

May 12th, 2010 at 7:00 am

Descent into Darkness: Understanding your system’s binary interface is the only way out

View Comments

Written by Joe Damato

March 15th, 2010 at 12:11 pm

EventMachine: scalable non-blocking i/o in ruby

View Comments

Written by Aman Gupta

March 12th, 2010 at 1:07 pm

String together global offset tables to build a Ruby memory profiler

View Comments


If you enjoy this article, subscribe (via RSS or e-mail) and follow me on twitter.

Disclaimer

The tricks, techniques, and ugly hacks in this article are PLATFORM SPECIFIC, DANGEROUS, and NOT PORTABLE.

This is the third article in a series of articles describing a set of low level hacks that I used to create memprof a Ruby level memory profiler. You should be able to survive without reading the other articles in this series, but you can check them out here and here.

How is this different from the other hooking articles/techniques?

The previous articles explained how to insert trampolines in the .text segment of a binary. This article explains a cool technique for hooking functions in the .text segment of shared libraries, allowing your handler to run, and then resuming execution. Hooking shared libraries turns out to be less work than hooking the binary (in the case of Ruby, that is), but making it all happen was a bit tricky. Read on to learn more.

The “problem” with shared libraries

The problem is that if a trampoline is inserted into the code of the shared library, the trampoline will need to invoke the dynamic linker to resolve the function that is being hooked, call the function, do whatever additional logic is desired, and then resume execution.

In other words you need to (somehow) insert a trampoline for a function that will call the function being trampolined without ending up in an infinite loop.

The additional complexity occurs because when shared libraries are loaded, the kernel decides at runtime where exactly in memory the library should be loaded. Since the exact location of symbols is not known at link time, a procedure linkage table (.plt) is created so that the program and the dynamic linker can work together to resolve symbol addresses.

I explained how .plts work in a previous article, but looking at this again is worthwhile. I’ve simplified the explanation a bit1, but at a high level:

  1. Program calls a function in a shared object, the link editor makes sure that the program jumps to a stub function in the .plt
  2. The program sets some data up for the dynamic linker and then hands control over to it.
  3. The dynamic linker looks at the info set up by the program and fills in the absolute address of the function that was called in the .plt in the global offset table (.got).
  4. Then the dynamic linker calls the function.
  5. Subsequent calls to the same function jump to the same stub in the .plt, but every time after the first call the absolute address is already in the .got (because when the dynamic linker is invoked the first time, it fills in the absolute address in the .got).

Disassembling a short Ruby VM function that calls rb_newobj (a memory allocation routine that we’d like to hook), shows the calls to the .plt:

000000000001af10 :
   . . . . 
   1af14:       e8 e7 c6 ff ff          callq  17600 [rb_newobj@plt]
   . . . . 

Let’s take a look at the corresponding .plt stub:

0000000000017600 :
   17600:       ff 25 6a 9c 2c 00       jmpq   *0x2c9c6a(%rip) # 2e1270 [_GLOBAL_OFFSET_TABLE_+0x288]
   17606:       68 4e 00 00 00          pushq  $0x4e
   1760b:       e9 00 fb ff ff          jmpq   17110 <_init+0x18>

Important fact: The program and each shared library has its own .plt and .got sections (amongst other sections). Keep this in mind as it’ll be handy very shortly.

That is a lot of stub code to reproduce in the trampoline. Reproducing that stuff in the trampoline shouldn’t be hard, but invites a large number of bugs over to play. Is there a better way?

What is a global offset table (.got)?

The global offset table (.got) is a table of absolute addresses that can be filled in at runtime. In the assembly dump above, the .got entry for rb_newobj is referenced in the .plt stub code.

Intercepting a function call

It would be awesome if it were possible to overwrite the .got entry for rb_newobj and insert the address of a trampoline. But how would the intercepting function call rb_newobj itself without ending up in an infinite loop?

The important fact above comes in to save the day.

Since each shared object has its own .plt and .got sections, it is possible to overwrite the .got entry for rb_newobj in every shared object except for the object where the trampoline lives. Then, when rb_newobj is called, the .plt entry will redirect execution to the trampoline. The trampoline then calls out to its .plt entry for rb_newobj which is left untouched allowing rb_newobj to be resolved and called out to successfully.

Not as easy as it sounds, though

This solution is less work than the other hooking methods, but it has its own particular details as well:

  1. You’ll need to walk the link map at runtime to determine the base address for the shared library you are hooking (it could be anywhere).
  2. Next, you’ll need to parse the .rela.plt section which contains information on the location of each .plt stub, relative to the base address of the shared object.
  3. Once you have the address of the .plt stub, you’ll need to determine the absolute address of the .got entry by parsing the first instruction of the .plt stub (a jmp) as seen in the disassembly above.
  4. Finally, you can write to the .got entry the address of your trampoline, as long as the trampoline lives in a different shared library.

You’ve now successfully managed to poison the .got entry of a symbol in one shared library to direct execution to your own function which can then call the intercepted function itself without getting stuck in an infinite loop.

Conclusion

  • There are lots of sections in each ELF object. Each section is special and important.
  • ELF documentation can be difficult to obtain and understand.
  • Got pretty lucky this time around. I was getting a little worried that it would get complicated. Made it out alive, though.

Thanks for reading and don’t forget to subscribe (via RSS or e-mail) and follow me on twitter.

References

  1. System V Application Binary Interface AMD64 Architecture Processor Supplement, p 78 []

Written by Joe Damato

January 25th, 2010 at 5:59 am