==Phrack Inc.== Volume 0x0d, Issue 0x42, Phile #0x0E of 0x11 |=-----------------------------------------------------------------------=| |=-----------------=[ manual binary mangling with radare ]=--------------=| |=-----------------------------------------------------------------------=| |=--------------------=[ by pancake <at> nopcode.org> ]=-----------------=| |=-----------------------------------------------------------------------=| 1 - Introduction 1.1 - The framework 1.2 - First steps 1.3 - Base conversions 1.4 - The target 2 - Injecting code in ELF 2.1 - Resolving register based branches 2.2 - Resizing data section 2.3 - Basics on code injection 2.4 - Mmap trampoline 2.4.1 - Call trampoline 2.4.2 - Extending trampolines 3 - Protections and manipulations 3.1 - Trashing the ELF header 3.2 - Source level watermarks 3.3 - Ciphering .data section 3.4 - Finding differences in binaries 3.5 - Removing library dependencies 3.6 - Syscall obfuscation 3.7 - Replacing library symbols 3.8 - Checksumming 4 - Playing with code references 4.1 - Finding xrefs 4.2 - Blind code references 4.3 - Graphing xrefs 4.4 - Randomizing xrefs 5 - Conclusion 6 - Future work 7 - References 8 - Greetings --[ 1 - Introduction Reverse engineering is something usually related to w32 environments where there is lot of non-free software and where the use of protections is more extended to enforce evaluation time periods or protect intellectual (?) property, using binary packing and code obfuscation techniques. These kind of protections are also used by viruses and worms to evade anti-virus engines in order to detect sandboxes. This makes reverse engineering a double-edged sword. The way to do low level analysis does usually involve the development and use of a lot of small specific utilities to gather information about the contents of firmware or a disk image to carve for keywords and dump its blocks. Actually we have a wide variety of software and hardware platforms and reverse engineering tools should reflect this. Obviously, reverse engineering, cracking and such are not only related to legal tasks. Crackmes, hacking challenges and learning for the heck of it are also good reasons to play with binaries. For us, there is a simple reason behind them all: fun. --[ 1.1 - The framework At this point, we can imagine a framework that combines some of the basics of *nix, offering a set of actions over an abstracted IO layer. Following these premises I started to write a block based hexadecimal editor. It allows to seamlessly open a disc device, file, socket, serial or process. You can then both automatize processing actions by scripts and perform manual interactive analysis. The framework is composed of various small utilities that can be easily integrated between them by using pipes, files or an API: radare: the entrypoint for everything :) rahash: block based hashing utility radiff: multiple binary diffing algorithms rabin: extract information from binaries (ELF,MZ,CLASS,MACH0..) rasc: shellcode construction helper rasm: commandline assembler/disassembler rax: inline multiple base converter xrefs: blind search for relative code references The magic of radare resides in the ortogonality of commands and metadata which makes easy to implement automatizations. Radare comes with a native debugger that works on many os/arches and offers many interesting commands to ease the analysis of binaries without having the source. Another interesting feature is supporting high level scripting languages like python, ruby or lua. Thanks to a remote protocol that abstracts the IO, it is allowed the use of immunity debugger, IDA, gdb, bochs, vmware and many other debugging backends only writing a few lines of script or even using the remote GDB protocol. In this article I will introduce various topics. It is impossible to explain everything in a single article. If you want to learn more about it I recommend you to pull the latest hg tip and read the online documentation at: http://www.radare.org [0] I have also written a book [1] in pdf and html available at: http://www.radare.org/get/radare.pdf --[ 1.2 - First steps To ease the reading of the article I will first introduce the use of radare with some of its basics. Before running radare I recommend to setup the ~/.radarerc with this: e scr.color=1 ; enable color screen e file.id=1 ; identify files when opened e file.flag=1 ; automatically add bookmarks (flags) to syms, strings.. e file.analyze=1 ; perform basic code analysis Here's a list of points that will help us to better understand how radare commands are built. - each command is identified by a single letter - subcommands are just concatenated characters - e command stands for eval and is used to define configuration variables - comments are defined with a ";" char This command syntax offers a flexible input CLI that allows to perform temporal seeks, repeat commands multiple times, iterate over file flags, use real system pipes and much more. Here are some examples of commands: pdf @@ sym. ; run "pdf" (print disassembled function) over all ; flags matching "sym." wx cc @@.file ; write 0xCC at every offset specified in file . script ; interpret file as radare commands b 128 ; set block size to 128 bytes b 1M ; any math expression is valid here (hex, dec, flags, ..) x @ eip ; dump "block size" bytes (see b command) at eip pd | grep call ; disassemble blocksize bytes and pipe to system's grep pd~call ; same as previous but using internal grep pd~call[0] ; get column 0 (offset) after grepping calls in disasm 3!step ; run 3 steps (system is handled by the wrapped IO) !!ls ; run system command f* > flags.txt ; dump all flags as radare commands into a file --[ 1.3 - Base conversions rax is an utility that converts values between bases. Given an hexadecimal value it returns the decimal conversion and viceversa. It is also possible to convert raw streams or hexpair strings into printable C-like strings using the -s parameter: $ rax -h Usage: rax [-] | [-s] [-e] [int|0x|Fx|.f|.o] [...] int -> hex ; rax 10 hex -> int ; rax 0xa -int -> hex ; rax -77 -hex -> int ; rax 0xffffffb3 float -> hex ; rax 3.33f hex -> float ; rax Fx40551ed8 oct -> hex ; rax 035 hex -> oct ; rax Ox12 (O is a letter) bin -> hex ; rax 1100011b hex -> bin ; rax Bx63 -e swap endianness ; rax -e 0x33 -s swap hex to bin ; rax -s 43 4a 50 - read data from stdin until eof With it we can convert a raw file into a list of hexpairs: $ cat /bin/true | rax - There is an inverse operation for every input format, so we can do things like this: $ cat /bin/true | rax - | rax -s - > /tmp/true $ md5sum /bin/true /tmp/true 20c1598b2f8a9dc44c07de2f21bcc5a6 /bin/true 20c1598b2f8a9dc44c07de2f21bcc5a6 /tmp/true rax(1) can be also used in interactive mode by reading lines from stdin. --[ 1.4 - The target This article aims to explain some practical cases where an scriptable hexadecimal editor can trivially solve and provide new perspectives to solve reverse engineering quests. We will take a x86 [2] GNU/Linux ELF [3] binary as a main target. Radare supports many architectures (ppc, arm, mips, java, msil..) and operating systems (osx, gnu/ linux, solaris, w32, wine, bsd), You should take in mind that commands and debugger features explained in this article can be applied to any other platform than gnu/linux-x86. In this article we will focus on the most widely-known platform (gnu/linux-x86) to ease the reading. Our victim will suffer various manipulation stages to make reverse engineering harder keeping functionality intact. The assembly snippets are written in AT&T syntax [4], which I think it's more coherent than Intel so you will have to use GAS, which is a nice multi architecture assembler. Radare2 has a full API with pluggable backends to assemble and disassemble for many architectures supporting labels and some basic assembler directives. This is done in 'rasm2' which is a reimplementation of rasm using libr. However, this article covers the more stable radare1 and will only use the 'rasm' command to assemble simple instructions instead of complete snippets. The article exposes multiple situations where low level analysis and metadata processing will help us to transform part of the program or simply obtain information from a binary blob. Usually the best program to start learning radare is the GNU 'true', which can be completely replaced with two asm instructions, leaving the rest of the code to play with. However it is not a valid target for this article, because we are interested in looking for complex constructions to hide control flow with call trampolines, checksumming artifacts, ciphered code, relocations, syscall obfuscation, etc. --[ 2 - Injecting code in ELF The first question that pops up in our minds is: How the hell we will add more code in an already compiled program? What real packers do is to load the entire program structures in memory for in-memory manipulation, handling all the code relocations and generating a completely brand new executable. Program transformation is not an easy task. It requires a complete analysis which sometimes is impossible to do automatically, due to the need to mix the results of static, dynamic and manual code analysis. Instead of this approach, we will do it in a bottom to top way, taking low level code structures as units and doing transformations without the need to understand the whole program. This enables ensuring that transformations won't break the program in any way because their local and internal dependencies will be easy identified. Because the target program is generated by a compiler we can make some assumptions. For instance, there will be a place to inject code, the functions will be sequentially defined, access to variables will be easier to track and calling conventions will be the same along all the program. Our first action will be to find and identify all the parts of the text section with unused or noppable code. 'noppable' code is defined as code that is never executed, or that can be replaced by a smaller implementation, taking benefit of the non-overwritten bytes. Compiler-related stubs can be sometimes noppable or replaced, and more of this is found in non-C native-compiled programs (haskell, C++, ...) The spaces between functions where no code is executed are called 'function paddings'. They exist because the compiler tried to optimize the position of the functions at aligned addresses to ease to job on the cpu. Some other compiler-related optimizations will inline the same piece of code multiple times. By identifying such patterns, we can patch the spaghetti code into a single loop with an end branch and use the rest for our own purposes. Following paragraphs will verbosely explain those situations. Never executed code: We can find this code by doing many execution traces of the program and finding the uncovered ranges. This option will probably require human interaction. Using static code analysis we would not be able to follow runtime pointer calculations or call tables. Therefore, this method will also require some human interaction and, in the event we get to identify a pattern, we can write a script to automatize this task and add N code xrefs from a single source address. We can also emulate some parts to resolve register values before reaching a register based call/branch instruction. Inlined code and unrolled loops: Programmers and compilers usually prefer to inline (duplicate the code of a external function in the middle of another one) to reduce the execution cost of a 'call' instruction. Loops are also unrolled and this produces longer, but more optimal code because the cpu doesn't need to compute unnecessary branches all the time. These are common speedup practices, and they are a good target to place our code. This process is done in four steps: - Find/identify inline code or unrolled loops (repeated similar blocks) - Patch to uninline or re-roll the loop - ... - Profit! Radare provides different commands to help to find/identify such kind of code blocks. The related commands are not going to do all the job automatically. They require some automatization, scripting or manual interaction. The basic approach to identify such kind of code is by finding repeated sequences of bytes allowing a percentage of similarity. Most inline code will come from small functions and unrolled loops will be probably small too. There are two basic search commands that can help on this: [0xB7F13810]> /? ... /p len ; search pattern of length 'len' /P count ; search patterns of size $$b matching at >N bytes of curblock ... The first command (/p) will find repeated bytes of a specified length. The other one (/P) will find a block that matches at least N bytes of the current blocksize compared to the current one. These pattern search must be restricted to certain ranges, like the .text section or the function boundaries: The search should be restricted to the text section: > e search.from = section._text > e search.to = section._text_end All search commands (/) can be configured by the 'search.' eval variables. The 'section.' flag names are defined by 'rabin'. Calling rabin with the -r flag will dump the binary information to stdout as radare commands. This is done at radare startup when file.id and file.flag are enabled, but this information can be reimported from the shell by calling: > .!!rabin -rS ${FILE} This command can be understood as a dot command '.', which interprets the output of the following command '!' as radare commands. In the example below there are two admiration marks (!!). This is because the single '!' will run the command through the io subsystem of radare, falling in the debugger layer if trying to run a program in the shell which has a name that equals a debugger command's. The double '!!' is used to directly escape from the io subsystem and run the program directly in the system. This command can be used to externalize some scripting facilities by running other programs, processing data, and feeding radare's core back with dynamically generated commands. Once all the inlined code is identified, search hits should be analyzed, discarding false positives and making all those blocks work as subroutines using a call/ret pattern, nopping the rest of the bytes. Unused bytes can now be filled with any code. Another way to identify inlined code can be done by splitting a function code analysis into basic blocks without splitting nodes (e graph.split=false) and finding similar basic blocks or repeated. Unrolled loops are based on repeated sequences of code with small variations based on the iterator variable which must change along the repeated blocks in a progressive way (incremental, decremental, etc). Language bindings are recommended for implementing such kind of tasks, not only because the higher language representation, but also thanks to the high level APIs for code analysis that are distributed with radare for python and other scripting languages (perl, ruby, lua). The '/' command is used to perform searches. A radare command can be defined to be executed every time a hit is found, or we can later use the '@@' foreach iterator can be used over all the flags prefixed with 'hit0' later. See '/?' for help on search-related commands but, in short, there are commands to search in plain text, hexadecimal, apply binary masks to keywords, launch multiple keywords at a time, define keywords in a file, search+replace, and more. One of the search algorithms accessible through the '/' command is called '/p', which performs a pattern search. The command accepts a numeric value which determines the minimum size for a pattern to be resolved. Patterns are identified by a series of consecutive bytes, with a minimum length, that appear more than once. The output will return a pattern id, the pattern bytes and a list of offsets where the pattern is found. This kind of search algorithm can be useful for analyzing huge binary files like firmwares, disc images or uncompressed pictures. In order to get a global overview of a big file the 'pO' command can be used, which will display the full file represented in 'blocksize' bytes in hexadecimal. Each byte will represent the number of printable characters in the block, the entropy calculation, the number of functions identified there, the number of executed opcodes, or whatever is set at the 'zoom.byte' eval variable. zoom view real file +-----------+ +----------+ | 0 |--------------| 0 | | 1 | '-.__ | ... | | 2 | '-.__ | | | 3 | '-.| 128 | (filesize/blocksize) | ... | +----------+ The number of bytes of the real file represented in a byte of the zoom view is the quotient between the filesize and the blocksize. Function padding and preludes: Compilers usually pad functions with nop instructions to align the following one. They are usually small, but sometimes they are big enough to store trampolines that will help obfuscate the execution flow. In the 'extending trampolines' chapter I will explain a technique to hide function preludes inside a 'call trampoline' which runs the initial code of each function and then jumps back to the next instruction. By using call tables loaded in .data or memory it will be harder to follow for the reverser, because the program can lose all function preludes and footers so that it'll be read as a single blob. Compiler stubs: Function preludes and calling conventions can be used as watermarks to identify the compiler used. But when compiler optimizations are enabled the generated code will be heavily transformed and those preludes will probably be lost. Some new versions of gcc are using a new function prelude aligning the stack before entering the code to make memory access by the cpu on local variables. But we can replace them with a shorter hand made one adapted to the concrete function needs and this way ripping some more bytes for our own fun. The entrypoint embeds a system dependent loader that acts as a launcher for the constructor/main/destructor. We can analyze the constructor and the destructor to check if they are void. Then, we can drop all this code by just adding a branch to the main pointer, getting about 190 bytes for a C program. You will probably find much more unused code in programs written in D, C++, haskell or any other high level language. This is how a common gcc's entrypoint looks like: 0x08049a00, / xor ebp, ebp 0x08049a02 | pop esi 0x08049a03 | mov ecx, esp 0x08049a05 | and esp, 0xf0 0x08049a08, | push eax 0x08049a09 | push esp 0x08049a0a | push edx 0x08049a0b | push dword 0x805b630 ; sym.fini 0x08049a10, | push dword 0x805b640 ; sym.init 0x08049a15 | push ecx 0x08049a16 | push esi 0x08049a17 | push dword 0x804f4e0 ; sym.main 0x08049a1c, | call 0x80495b4 ; 1 = imp.__libc_start_main 0x08049a21 \ hlt By pushing 0's instead of fini/init pointers we can ignore the constructor and destructor functions of the program, most of the programs will work with no problems without them. By analyzing code in both (sym.init and sym.fini) pointers we can determine the number of bytes: [0x080482C0]> af @ sym.__libc_csu_fini~size size = 4 [0x080482C0]> af @ sym.__libc_csu_init~size size = 89 Tracing Execution traces are wonderful utilities to easily cleanup the view of a binary by identifying parts of it as tasks or areas. We can for example make a fast identification of the GUI code by starting a trace and graphically clicking on all the menus and visual options of the target application. The most common problem to tracing is the low performance of a !stepall operation, which mainly gets stupidly slow on loops, reps and tracing already traced code blocks. To speedup this process, radare implements a "touchtrace" method (available under the !tt command) which implements an idea of MrGadix (thanks!) This method consist in keeping a swapped copy of the targeted process memory space on the debugger side and replacing it with software breakpoint instructions. On intel we would use CC (aka int3). Once the program runs, it raises an exception because of the trap instruction, which is handled by the debugger. Then the debugger checks if the instruction at %eip has been swapped and replaces the userspace one, storing information about this step (like timestamp, number of times it has been executed and step counter). Once the instruction has been replaced, the debugger instructs the process to continue execution. Each time an unswapped instruction is executed the debugger replaces it. At the end (giving an end-breakpoint) the debugger have a buffer with enough information to determine the executed ranges. Using this easy method we can easily skip all the reps, loops and already analyzed code blocks, speeding up the binary analysis. Using the "at" command (which stands for analyze traces) it is possible to take statistics or information about the traces and, for example.. serialize the program and unroll loops. We can also use !trace, giving a certain debug level as numeric argument, to log register changes, executed instructions, buffers, stack changes, etc.. After these steps we can subtract the resulting ranges against the .text section and get how many bytes we can use to inject code. If the resulting space is not enough for our purposes we will have to face section resizing to put our code. This is explained later. Radare offers the "ar" (analyze ranges) command to manage range lists. It can perform addition, subtraction and boolean operations. > ar+ 0x1000 0x1040 ; manual range addition > ar+ 0x1020 0x1080 ; manual add with overlap > ari ; import traces from other places ..... > .atr* ; import execution traces information > .CF* ; import function ranges > .gr* ; import code analysis graph ranges ..... > ar ; display non overlapped ranges > ; show booleaned ranges inside text section > arb section._text section._text_end We can also import range information from external files or programs. One of the nicer characteristics of *nix is the ability to communicate applications using pipes by just making one program 'talk' in the other program language. So if we write a perl script that parses an IDA database (or IDC), we can extract the information we need and print radare commands ('ar' in this case) to stdout, and then parsing the results with the '.' command. This time ar gives us information about the number of bytes that are theoretically not going to be executed and its ranges. We can now place some breakpoints on these locations to ensure we are not missing anything. Manual revision of the automated code analysis and tracing results is recommended. --[ 2.1 - Resolving register based branchs The 'av' command provides subcommands to analyze opcodes inside the native virtual machine. The basic commands are: 'av-' to restart the vm state, 'avr' to manage registers, 'ave' to evaluate VM expressions and 'avx' that will execute N instructions from the current seek (will set the eip VM register to the offset value). Internally, the virtual machine engine translates instructions into evaluable strings that are used to manipulate memory and change registers. This simple concept allows to easily implement support for new instructions or architectures. Here is an example. The code analysis has reached a call instruction addressing with a register. The code looks like: /* --- */ $ cat callreg.S .global main .data .long foo .text foo: ret main: xor %eax, %eax mov $foo, %ebx add %eax, %ebx call *%ebx ret $ gcc callreg.S $ radare -d ./a.out ( ... ) [0xB7FC6810]> !cont sym.main Continue until (sym.main) = 0x08048375 [0x08048375]> pd 5 @ sym.main 0x08048375 eip,sym.main: 0x08048375 / xor eax, eax 0x08048377 | mov ebx, 0x8048374 ; sym.foo 0x0804837c,| add ebx, eax 0x0804837e | call ebx 0x08048380, \ ret [0x08048375]> avx 4 Emulating 4 opcodes MMU: cached Importing register values 0x08048375 eax ^= eax ; eax ^= eax 0x08048377 ebx = 0x8048374 ; sym.foo ; ebx = 0x8048374 0x0804837c, ebx += eax ; ebx += eax 0x0804837e call ebx ; esp=esp-4 ; [esp]=eip+2 ;==> [0xbf9be388] = 8048382 ((esp])) ; write 8048382 @ 0xbf9be388 ; eip=ebx [0x08048375]> avr eip eip = 0x08048374 At this point we can continue the code analysis where the 'eip' register of the virtual machine points. [0x08048375]> .afr @ `avr eip~[2]` --[ 2.2 - Resizing data section There is no simple way to resize a section in radare1. This process is completely manual and tricky. In order to solve this limitation, radare2 provides a set of libraries (named libr) that reimplement all the functionalities of radare 1.x following a modular design instead of the old monolithic approach. Both branches of development are being done in parallel. Here is where radare2 api (libr) comes in action: $ cat rs.c #include <stdio.h> #include <r_bin.h> int main(int argc, char *argv[]) { struct r_bin_t bin; if (argc != 4) { printf("Usage: %s <ELF file> <section> <size>\n", argv[0]); } else { r_bin_init(&bin); if (!r_bin_open(&bin, argv[1], 1, NULL)) { fprintf(stderr, "Cannot open '%s'.\n", argv[1]); } else if (!r_bin_resize_section(&bin, argv[2], r_num_math(NULL,argv[3]))) { fprintf(stderr, "Cannot resize section.\n"); } else { r_bin_close(&bin); return 0; } } return 1; } $ gcc -I/usr/include/libr -lr_bin rs.c -o rs $ rabin -S my-target-elf | grep .data idx=24 address=0x0805d0c0 offset=0x000150c0 size=00000484 \ align=0x00000020 privileges=-rw- name=.data $ ./rs my-target-elf .data 0x9400 $ rabin -S my-target-elf | grep .data idx=24 address=0x0805d0c0 offset=0x000150c0 size=00009400 \ align=0x00000020 privileges=-rw- name=.data Our 'rs' program will open a binary file (ELF 32/64, PE32/32+, ...) it will resolve a section named '.data' to change its size and will shift the rest of sections to keep the elf information as is. The reason for resizing the .data section is because this one is commonly linked after the text section. There will be no need to reanalyze the program code to rebase all the data accesses to the new address. The target program should keep working after the section resizing and we already know which places in the program can be used to inject our trampoline. +-------+ +-------+ | .text | | .text | | | | | <-- inject trampoline |-------| -> |-------| | .data | | .data | +-------+ | | <-- inject new code or data | | +-------+ Next chapters will explain multiple trampoline methods that will use our resized data section to store pointer tables or new code. --[ 2.3 - Basics on code injection Self modifying code is a nice thing, but usually a dangerous task too. Also patches/tools like SElinux will not let us to write to an already executable section. To avoid this problem we can change in the ELF headers the .text perms to writable but not executable, and include our decryption function in a new executable section. When the decryption process is done, it will have to change text perms to executable and then transfer control to it. In order to add multiple nested encryption layers the task will be harder mostly because of the relocation problems, and if the program is not PIC-ed, rebasing an entire application on the fly by allocating and copying the program multiple times in memory can be quite complicated. The simplest implementation consists in these steps: - Define from/to ranges - Define cipher key and algorithm - Inject deciphering code in the entrypoint - We will need to use mprotect before looping - Statically cipher the section The from/to addresses can be hardcoded in the assembly snippet or written in .data or somewhere for later. We can also ask the user for a key and make the binary work only with a password. If the encryption key is not found in the binary, the unpacking process becomes harder, and we will have to implement a dynamic method to find the correct one. This will be explained later. Another way to define ranges is by using watermarks and let our injected code walk over the memory looking for these marks. The last step is probably the easier one. We already know the key and the algorithm, its time to rewrite the inverse one using radare commands and let the script run over these ranges. This is an example: The 'wo' command is a subcommand of the 'w' (write) which offers arithmetic write operations over the current block. [0x00000000]> wo? Usage: wo[xrlasmd] [hexpairs] Example: wox 90 ; xor cur block with 90 Example: woa 02 03 ; add 2, 3 to all bytes of cur block Supported operations: woa addition += wos subtraction -= wom multiply *= wod divide /= wox xor ^= woo or |= woA and &= wor shift right >>= wol shift left <<= It uses cyclic hexpair byte arrays as arguments. Just by seeking on the "from" address and setting blocksize as "to-from" we will be ready to type the 'wo' ones. Implementing this process in macros will make the code more maintainable and easy to manage. The understanding of functional programming is good for writing radare macros, split the code in small functions doing minimal tasks and make them interact with each other. Macros are defined like in lisp, but syntax is really closer to a mix between perl and brainfuck. Don't get scary at first instance..it doesn't hurt as much as you can think and the resulting code is funnier than expected ;) (cipher from to key s $0 b $1-$0 woa $2 wox $2) We can put it in a single line (by replacing newlines with commas ',') and put it in our ~/.radarerc: (cipher from to key,s $0,b $1-$0,woa $2,wox $2,) Now we can call this macro at any time by just prepending a dot before the parenthesis and feeding it with the arguments. f from @ 0x1200 f to @ 0x9000 .(cipher from to 2970) To ease the code injection we can write a macro like this: (inject hookaddr file addr wa call $2 @ $0 wf $1 @ $2) Feeding this macro with the proper values will assemble a call opcode branching to the place where we inject our code (addr or $2) stored in the 'file' or $1. Radare has the expanded AES key searching algorithm developed by Victor Mu~noz. It can be helpful while reversing a program that uses this kind of ciphering algorithm. As the IO is abstracted it is possible to launch this search against files, program memory, ram dumps, etc.. For example: $ radare -u /dev/mem [0x00000000]> /a ... WARNING: Most current GNU/Linux distributions comes with the kernel compiled with the CONFIG_STRICT_DEVMEM option which restricts the access of /dev/mem to the only first 1.1M. All write operations are stored in a linked list of backups with the removed contents. This make possible to undo and redo all changes done directly on file. This allows you to try different modifications on the binary without having to restore manually the original one. These changes can be diffed later with the "radiff -r" command. The command performs a deltified binary diff between two files, The '-r' displays the result of the operation as radare commands that transforming the first binary into the second one. The output can be piped to a file and used later from radare to patch it at runtime or statically. For example: $ cp /bin/true true $ radare -w true [0x08048AE0]> wa xor eax,eax;inc eax;xor ebx,ebx;int 0x80 @ entrypoint [0x08048AE0]> u 03 + 2 00000ae0: 31 ed => 31 c0 02 + 1 00000ae2: 5e => 40 01 + 2 00000ae3: 89 e1 => 31 db 00 + 2 00000ae5: 83 e4 => cd 80 [0x08048AE0]> u* | tee true.patch wx 31 c0 @ 0x00000ae0 wx 40 @ 0x00000ae2 wx 31 db @ 0x00000ae3 wx cd 80 @ 0x00000ae5 The 'u' command is used to 'undo' write operations. Appending the '*' we force the 'u' command to list the write changes as radare commands, and we pipe the output to the 'tee' unix program. Let's generate the diff: $ radiff -r /bin/true true e file.insertblock=false e file.insert=false wx c0 @ 0xae1 wx 40 @ 0xae2 wx 31 @ 0xae3 wx db @ 0xae4 wx cd @ 0xae5 wx 80 @ 0xae6 This output can be piped into radare through stdin or using the '.' command to interpret the contents of a file or the output of a command. [0x08048AE0]> . radiff.patch or [0x08048AE0]> .!!radiff -r /bin/true $FILE The '$FILE' environment is exported to the shell from inside radare as well as other variables like $BLOCK, $BYTES, $ENDIAN, $SIZE, $ARCH, $BADDR, ... Note that in radare there's not really much difference between memory addresses and on-disk ones because the virtual addresses are emulated by the IO layer thanks to the io.vaddr and io.paddr variables which are used to define the virtual and physical addresses for a section or for the whole file. The 'S' command is used to configure a fine-grained setup of virtual and physical addressing rules for ranged sections of offsets. Do not care too much about it because 'rabin' will configure these variables at startup when the file.id eval variable is true. --[ 2.4 - Mmap trampoline The problem faced when resizing the .text section is that .data section is shifted, and the program will try to access it absolutely or relatively. This situation forces us to rebase all the text section and adapt the rest of sections. The problem is that we need a deep code analysis to rebase and this is something that we will probably do wrong if we try to do only a static code analysis. This situation usually require a complex dynamic, and sometimes manual, analysis to fix all the possible pointers. To skip this issue we can just resize the .data section which is after the .text one and just write a trampoline in the .text section loading code from .data and putting it in memory using mmap. The decision to use mmap is because is the only way to get executable pages with write permissions in memory even with SELinux enabled. The C code looks like this: ---------------------8<-------------------------- #include <stdio.h> #include <string.h> #include <sys/mman.h> int run_code(const char *buf, int len) { unsigned char *ptr; int (*fun)(); ptr = mmap(NULL, len, PROT_EXEC | PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); if (ptr == NULL) return -1; fun = (int(*)(void))ptr; memcpy(ptr, buf, len); mprotect(ptr, len, PROT_READ | PROT_EXEC); return fun(); } int main() { unsigned char trap = 0xcc; return run_code(&trap, 1); } ---------------------8<-------------------------- The manual rewrite in assembly become: ---------------------8<-------------------------- .global main .global run_code .data retaddr: .long 0 shellcode: .byte 0xcc hello: .string "Hello World\n" .text tailcall: push %ebp mov %esp, %ebp push $hello call printf addl $4, %esp pop %ebp ret run_code: pop (retaddr) /* lcall *0 */ call tailcall push (retaddr) /* our snippet */ push %ebp mov %esp, %ebp pusha push %ebp xor %ebp, %ebp /* 0 */ mov $-1, %edi /* -1 */ mov $0x22, %esi /* MAP_ANONYMOUS | MAP_PRIVATE */ mov $7, %edx /* PROT_EXEC | PROT_READ | PROT_WRITE */ mov $4096, %ecx /* len = 4096 */ xor %ebx, %ebx /* 0 */ mov $192, %eax /* mmap2 */ int $0x80 pop %ebp mov %eax, %edi /* dest pointer */ mov 0x8(%ebp), %esi /* get buf (arg0) */ mov $128, %ecx cld rep movsb mov %eax, %edi mov $5, %edx /* R-X */ mov $4096, %ecx /* len */ mov %edi, %ebx /* addr */ mov $125, %eax /* mprotect */ int $0x80 call *%edi popa pop %ebp ret main: push $shellcode call run_code add $4, %esp ret ---------------------8<-------------------------- Running this little program will result on a trap instruction executed on the mmaped area. I decided to do not checks on return values to make the code smaller. $ gcc rc.S $ ./a.out Hello World Trace/breakpoint trap The size in bytes of this code can be measured with this line in the shell: $ echo $((`rabin -o d/s a.out |grep run_code|cut -d ' ' -f 2 |wc -c`/2)) 76 This graph explains the layout of the patch required +----+ / push dword csu_fini .---------------| EP | < push dword csu_init | +----+ \ push dword main V v +------------+ +--------+ | csu_init | | main | / push 0x80483490 |------------| |--------| < call getopt | mmap | .-| ... | \ mov [esp+30], eax | trampoline | | | | +------------+ | +--------+ .-----------------. ||'|| | | this code needs | | |- - - - |- - - - - - - - - - - - - -. < the mmap tramp | : __|___ +-----------+ . | to be executed | . / hook \ ---> | mmap code | . `-----------------' . \getopt/ +-----------+ . - - - - - - - - - - - - - -|- - - - - - ' | +--------+ / fall to the getopt flow | getopt | <-----' +--------+ | (ret) The injection process consists on the following steps: * Resize the .data section to get enought space for our code f newsize @ section._data_end - section._data + 4096 !rabin2 -o r/.data/`?v newsize` $FILE q! * Write the code at the end of the unresized .data section !!gcc our-code.c "!!rabin a.out -o d/s a.out | grep ^main | cut -d ' ' -f 2 > inj.bytes" wF inj.bytes @ section._data_end * Null the __libc_csu_init pointer at entrypoint (2nd push dword) e asm.profile=simple s entrypoint wa push 0 @ `pd 20~push dword[0]#1` * Inject the mmap loader at symbol __libc_csu_init # generate mmap.bytes with the mmap trampoline !!gcc mmap-tramp.S "!!rabin a.out -o d/s a.out | grep ^main | cut -d ' ' -f 2 > mmap.bytes" # write hexpair bytes contained in mmap.bytes file at csu_init wF mmap.bytes @ sym.__libc_csu_init * Hook a call instruction to bridge the execution of our injected code and continue the original call workflow. - Determine the address of the call to the target symbol to be hooked. For example. Having 'ping' program, this script will determine an xref to the getopt import PLT entry. s entrypoint # seek to the third push dword in the entry point (main) s `pd 20~push dword[3]#2` # Analyze function: the '.' will interpret the output of the command # 'af*' that prints radare commands registering xrefs, variable # accesses, stack frame changes, etc. .af* # generate a file named 'xrefs' containing the code references # to the getopt symbol. Cx~`?v imp.getopt`[1] > xrefs # create an eNumerate flag at every offset specified in 'xrefs' file fN calladdr @@.xrefs - Create a proxy hook to bridge execution without destroying the stack. This is explained in the following lines. - Change the call of the hook to our proxy function # write assembly code 'call <addr>' # quotes will replace the inner string with the result of the # command ?v which prints the hexadecimal value of the given expr. wa call `?v dstaddr` - Profit Here is an example assembly code that demonstrates the functionality of the trampoline for the patched call. This snippet of code can be injected after the mmap trampoline. ----------8<----------- .global main trampoline: push $hookstr call puts add $4, %esp jmp puts main: push $hello call trampoline add $4, %esp ret .data hello: .string "hello world" hookstr: .string "hook!" ----------8<----------- The previous code cannot be injected directly in the binary, and one of the main conceptual reasons is that the destination address of the trampoline will be defined at runtime, when the mmapped region of the data section is ready with executable permissions, the destination address must be stored somewhere in the data section. --[ 2.4.1 - Call trampoline Once the holes have been identified we can now modify some parts of the program to obfuscate the control flow, symbol access, function calls and inlined syscalls in different ways. To walk through the program code blocks we can use iterators or nested iterators to run from simple to complex scripted code analysis engines and retrieve or manipulate the program by changing instructions or moving code. The control flow of the program can be obfuscated by adding a calltable trampoline for calling functions. We will have to patch the calls and store the jump addresses in a table in the data section. The same hack can be done for long jumps by placing a different trampoline to drop the stacked eip and use jumps instead of calls. Iterators in radare can be implemented as macros appending the macro call after the @@ foreach mark. For example: "x @ 0x200" will temporally seek to 0x200 and run the "x" command. Using: "x @@ .(iterate-functions)" will run the iterator macro at every address returned by the macro. To return values from a macro we can use the null macro body and append the resulting offset. If no value is given, null is returned and the iterator will stop. Heres the implementation for the function-iterator: "(iterate-functions,()CF~#$@,)" We can also use the search engine with cmd.hit eval variable to run a command when a search hit is found. Obviously, the command can be a macro or a script file in any language. e cmd.hit=ar- $$ $$+5 /x xx yy zz But we can also do it after processing the search by just removing the false positive hits and running a foreach command. # configure and perform search e search.from=section._text e search.to=section._text_end e cmd.search= /x xx yy zz # filtering hit results fs search ; enter into the search flagspace f ; list flags in search space f -hit0_3 ; remove hit #3 # write on every filtered hit wx 909090 @@ hit* Flagspaces are groups of flags. Some of them are automatically created by rabin while identifying strings, symbols, sections, etc, and others are updated at runtime like by commands like 'regs' (registers) or 'search' (search results). You can switch to or create another flagspace using the 'fs' command. When a flagspace is selected the 'f' command will only show the flags in the current flagspace. 'fs *' is used to enable all flagspaces at the same time (global view). See fs? for extended help. The internal metadata database can be managed with the C command and stores information about code and data xrefs, comments, function definitions, type, data conversions and structure definitions to be used by the disassembler engine, the code analysis module and user defined scripts. Data and code xrefs are very useful reference points while reverse engineering. It is interesting to obfuscate them all as much as possible to fool code analysis engines, forcing the reverser to use runtime or emulated environments to determine when a string or a buffer is used. To analyze a function in radare is preferably to go into Visual mode with command 'V', and then press the keys "d" and "f". This action will make a code analysis from the current seek or where the cursor points (if in cursor mode, toggleable with the lowercase "c" key in Visual mode). This code analysis will define multiple things like argument, local and global variable accesses, feeding the database with xref information that can be later used to determine which points of the program are using a certain pointer. By defining few eval vars in your ~/.radarerc most of this basic code analysis will be done by default. $ cat ~/.radarerc e file.id=true ; identify file type and set arch/baddr... e file.analyze=true ; perform basic code analysis at startup e file.flag=true ; add basic flags e scr.color=true ; use color screen NOTE: By passing the '-n' parameter to radare the startup process will skip the interpretation of .radarerc. All this information can be cached, manipulated and restored using project files (with the -P program flag at startup or enabling this feature in runtime with the 'P' command). After this parenthesis we continue obfuscating the xrefs to make reversers life funnier. One simple way to do this would be to create an initialization code, copying some pointers required by the program to run at random mmaped addresses and defining a way to transform these accesses into a sequence jigsaw. This can be achieved turning each pointer "random" based on some rules like a shifted array accessed by a mod. # calltable initializer (ct-init tramp ctable ctable_end f trampoline @ $0 f calltable @ $1 f calltable_ptr @ calltable wf trampoline.bin @ trampoline b $2-$1 wb 00 ; fill calltable with nops ) ; calltable adder (ct-add ; check if addr is a long call ? [1:$$]==0xeb ?!() ; check if already patched ? [4:$$+1]==trampoline ??() wv $$+5 @ calltable_ptr yt 4 calltable_ptr+4 @ $$+1 f calltable_ptr @ calltable_ptr+8 wv trampoline @ $$+1 ) We can either create a third helper macro that automatically iterates over every long call instruction in .text and running ct-add on it. (ct-run from to tramp ctable ctable_end .(ct-init tramp ctable ctable_end) s from loop: ? [1:$$]==0xeb ?? .(ct-add) s +$$$ ? $$ < to ??.loop: ) Or we can also write a more fine-grained macro to walk over the function e asm.profile=simple pD~call[0] > jmps .(ct-init 0x8048000 0x8048200) .(ct-add)@@.jmps The trampoline.bin file will be something like this: --------------------------------------------------------------- /* * Calltable trampoline example // --pancake * --------------------------------------------------- * $ gcc calltrampoline.S -DCTT=0x8048000 -DCTT_SZ 100 */ #ifndef CTT_SZ #define CTT_SZ 100 #endif #ifndef CTT #define CTT call_trampoline_table #endif .text .global call_trampoline .global main main: /* */ call ct_init call ct_test /* */ pushl $hello call puts add $4, %esp ret call_trampoline: pop %esi leal CTT, %edi movl %esi, 0(%edi) /* store ret addr in ctable[0] */ 0: add $8, %edi cmpl 0(%edi), %esi /* check ret addr against ctable*/ jnz 0b movl 4(%edi), %edi /* get real pointer */ call *%edi /* call real pointer */ jmp *CTT /* back to caller */ /* initialize calltable */ ct_init: leal CTT+8, %edi movl $retaddr, 0(%edi) movl $puts, 4(%edi) ret /* hello world using the calltable */ ct_test: push $hello call call_trampoline retaddr: add $4, %esp ret .data hello: .string "Hello World" call_trampoline_table: .fill 8*(CTT_SZ+1), 1, 0x00 --------------------------------------------------------------- Note that this trampoline implementation is not thread safe. If two threads are use the same trampoline call table at the same time to temporally store the return address, the application will crash. To fix this issue you should use the %gs segment explained in the 'syscall obfuscation' chapter. But to avoid complexity no thread support will be added at this moment. To inject the shellcode into the binary we will have to make some space in the .data segment for the calltable. We can do this with gcc -D: [0x8048400]> !!gcc calltrampoline.S -DCTT=0x8048000 -DCTT_SZ=100 > !! rabin -o d/s a.out|grep call_trampoline|cut -d : -f 2 > tramp.hex > wF tramp.hex @ trampoline_addr --[ 2.4.2 - Extending trampolines We can take some real code from the target program and move it into the .data section encrypted with a random key. Then we can extend the call trampoline to run an initialization function before running the function. Our call trampoline extension will cover three points: - Function prelude obfuscation - Function mmaping (loading code from data) - Unciphering and checksumming At the same time we can re-encrypt the function again after calling the end point address. This will be good for our protector because the reverser will be forced to manually step into the code because no software breakpoints will be able to be used on this code because they will be re-encrypted and the checksumming will fail. To do this we need to extend our call trampoline table or just add another field in the call trampoline table specifying the status of the destination code (if it is encrypted or not) and optionally the checksum. The article aims to explain guidelines and possibilities when playing with binaries. But we are not going to implement such kind of extensions to avoid enlarge too much this article explaining these advanced techniques. The 'p' command is used to print the bytes in multiple formats, the more powerful print format is 'pm' which permits to describe memory contents using a format-string like string and name each of the fields. When the format-string starts with a numeric value it is handled as an array of structures. The 'am' command manages a list of named 'pm' commands to ease the re-use of structure signatures along the execution of radare. We will start introducing this command describing the format of an entry of our trampoline call table: [0x08049AD0]> pm XX 0x00001ad0 = 0x895eed31 0x00001ad4 = 0xf0e483e1 Now adding the name of fields: [0x08049AD0]> pm XX from to from : 0x00001ad0 = 0x895eed31 to : 0x00001ad4 = 0xf0e483e1 And now printing a readable array of structures: [0x08049AD0]> pm 3XX from to 0x00001ad0 [0] { from : 0x00001ad0 = 0x895eed31 to : 0x00001ad4 = 0xf0e483e1 } 0x00001ad8 [1] { from : 0x00001ad8 = 0x68525450 to : 0x00001adc = 0x0805b6a0 } 0x00001ae0 [2] { from : 0x00001ae0 = 0x05b6b068 to : 0x00001ae4 = 0x68565108 } There is another command named 'ad' that stands for 'analyze data'. It automatically analyzes the contents of the memory (starting from the current seek) looking for recognized pointers (following the configured endianness), strings, pointers to strings, and more. The easiest way to see how this command works is by starting a program in debugger mode (f.ex: radare -d ls) and running 'ad@esp'. The command will walk through the stack contents identifying pointers to environment variables, function pointers, etc.. Internal grep '~' syntax sugar can be used to retrieve specific fields of structures for scripting, displaying or analysis. # register a 'pm' in the 'am' command [0x08048000]> am foo 3XX from to # grep for the rows matching 'from' [0x08048000]> am foo~from from : 0x00001ad0 = 0x895eed31 from : 0x00001ad8 = 0x68525450 from : 0x00001ae0 = 0x05b6b068 # grep for the 4th column [0x08048000]> am foo~from[4] 0x895eed31 0x68525450 0x05b6b068 # grep for the first row [0x08048000]> am foo~from[4]#0 0x895eed31 --[ 3 - Protections and manipulations This chapter will face different manipulations that can be done on binaries to add some layers of protections to make reverse engineering on the target binaries a more complicated. --[ 3.1 - Trashing the ELF header Corrupting the ELF header is another possible target of a packer for making the reverser's life little harder. Just modifying one byte in the ELF header becomes a problem for tools like gdb, ltrace, objdump, ... The reason is that they depend on section headers to get information about sections, symbols, library dependencies, etc... and if they are not able to parse them they just stop with an error. At the moment of writing, only IDA and radare are able to bypass this issue. This can be easily done in this way: $ echo wx 99 @ 0x21 | radare -nw a.out A reverser can use the fix-shoff.rs (included in the scripts) directory in the radare sources, to reconstruct the 'e_sh_off' dword in the ELF header. $ cat scripts/fix-shoff.rs ; radare script to fix elf.shoff (fix-shoff s 0 ; seek to offset 0 s/ .text ; seek to first occurrence of ".text" string loop: s/x 00 ; go next word in array of strings ? [1:$$+1] ; check if this is the last one ?!.loop: ; loop if buf[1] is == 0 s +4-$$%4 ; align seek pointer to 4 f nsht ; store current seek as 'nsht' wv nsht @ 0x20 ; write the new section header table at 0x20 (elf.sh_off) ) The script is used in this way: $ echo ".(fix-shoff) && q" | radare -i scripts/fix-shoff.rs -nw "target-elf" This script works on any generic bin with a trashed shoff. It will not work on binaries with stripped section headers (after 'sstrip' f.ex.) This is possible because GCC stores the section header table immediately after the string table index, and this section is easily identified because the .text section name will appear on any standard binary file. It is not hard to think on different ways to bypass this patch. By just stripping all the rest of section after setting shoff to 0 will work pretty well and the reverser will have to regenerate the section headers completely. This can be done by using sstrip(1) (double 's' is not a typo) from the elf-kickers package. This way we eliminate all unnecessary sections to run the program. The nice thing of this header bug is that the e_shoff value is directly passed to lseek(2) as second argument, and we can either try with values like 0 or -1 to get different errors instead of just giving an invalid address. This way it is possible to bypass wrong ELF parsers in other ways. Similar bugs are found in MACH-O parsers from GNU software like setting number of sections to -1 in the SEGMENT_COMMAND header. These kind of bugs don't hit kernels because they usually use minimum input from the program headers to map it in memory, and the dynamic linker at user space do the rest. But debuggers and file analyzers usually fail when try to extract corrupted information from headers. --[ 3.2 - Source level watermarks If we have the source of the target application there will be more ways to play with it. For instance, using defines as volatile asm inline macros, the developer can add marks or paddings usable for the binary packer. Some commercial packers offer an SDK which is nothing more than a set of include files offering helper functions and watermarks to make the packers life easier. And giving to the developer more tools to protect their code. $ cat watermark.h #define WATERMARK __asm__ __volatile__ (".byte 3,1,3,3,7,3,1,3,3,7"); We will use this CPP macro to watermark the generated binary for later post-processing with radare. We can specify multiple different watermarks for various purposes. We must be sure that these watermarks are not already in the generated binary. We can even have larger watermarks to pad our program with noisy bytes in the .text section, so we can later replace these watermarks with our assembly snippets. Radare can ease the manual process of finding the watermarks and injecting code, by using the following command. It will generate a file named 'water.list' containing one offset per line. [0x00000000]> /x 03010303070301030307 [0x00000000]> f~hit[0] > water.list [0x00000000]> !!cat water.list 0x102 0x13c 0x1a2 0x219 If we want to run a command at every offset specified in the file: [0x00000000]> .(fill-water-marks) @@. water.list Inside every fill-water-mark macro we can analyze the watermark type depending on the 'hit' flag number (we can search more than one keyword at a time) or by checking the bytes there and perform the proper action. If those watermarks have a fixed size (or the size is embedded into the watermark) we can then write a radare script that write a branch instruction at to skip the padding. (This will avoid program crash because it will not execute the dummy bytes of our watermark). Here's an ascii-art explanation: |<-- watermark ------------------------------------------>| |<-- unused bytes -->| +---------------------------------------------------------+ | jmp $$+size | .. .. .. .. .. .. .. .. .. .. .. .. .. .. | program +---------------------------------------------------------+ | ^ `------------------------------------------------------' At this point we have some more controlled holes to put our code. Our variable-sized watermark will be defined in this way: .long 0,1,2,3,4,5,6,7,4*20,9,10,11,12,13,14,15,16,17,18,19,20 And the watermark filling macro: (fill-water-marks,wa jmp $${io.vaddr}+$$+[$$+8],) # run the macro to patch the watermarks with jumps .(fill-water-marks) @@. water.list The io.vaddr specified the virtual base address of the program in memory. --[ 3.3 - Ciphering .data section A really common protection in binaries consists on ciphering the data section of a program to hide the strings and make the static code analysis harder. The technique presented here aims to protect the data section. This section generally does not contain program strings (because they are usually statically defined in the rodata section), but will help to understand some characteristics of the ELF loader and how programs work with the rw sections. The PT_LOAD program header type specifies where, which and how part of the binary should be mapped in memory. By default, the kernel maps the program at a base address. More over it will map aligned size (asize) of the program starting at 'physical' on 'virtual' address with 'flags' permissions. $ rabin -I /bin/ls | grep baddr baddr=0x08048000 $ rabin -H /bin/ls | grep LOAD virtual=0x08060000 physical=0x00018000 asize=0x1000 size=0x0fe0 \ flags=0x06 type=LOAD name=phdr_0 In this case, 4K is the aligned size of 0xfe0 bytes of the binary will be loaded at address 0x8060000 starting at address 0x18000 of the file. By running '.!rabin -rIH $FILE' from radare all this information will be imported into radare and it will emulate such situation. The problem we face here, is that the size and address of the mapped area doesn't match with the offset of the .data section. Some operations are required to get the correct offsets to statically modify the program to cipher such range of bytes and inject a snippet of assembly to dynamically decipher such bytes. The script looks like: -------8<------------ # flag from virtual address (the base address where the binary is mapped) # this will make all new flags be created at current seek + <addr> # we need to do this because we are calculating virtual addresses. ff $${io.vaddr} # Display the ranges of the .data section ?e Data section ranges : `?v section._data`- `?v section._data_end` # set flag space to 'segments'. This way 'f' command will only display the # flags contained in this namespace. fs segments # Create flags for 'virtual from' and 'virtual to' addresses. # We grep for the first row '#0' and the first word '[0]' to retrieve # the offset where the PT_LOAD segment is loaded in memory f vfrom @ `f~vaddr[0]#0` # 'virtual to' flag points to vfrom+ptload segment size f vto @ vfrom+`f~vaddr[1]#0` # Display this information ?e Range of data decipher : `?v vfrom`- `?v vto`= `?v vto-vfrom`bytes # Create two flags to store the position of the physical address of the # PT_LOAD segment. f pfrom @ `f~paddr[0]#0` f pto @ pfrom+`f~paddr[1]#0` # Adjust deltas against data section # ---------------------------------- # pdelta flag is not an address, we set flag from to 0 # pdelta = (address of .data section)-(physical address of PTLOAD segment) ff 0 && f pdelta @ section._data-pfrom ?e Delta is `?v pdelta` # reset flag from again, we are calculating virtual and physical addresses ff $${io.vaddr} f pfrom @ pfrom+pdelta f vfrom @ vfrom+pdelta ff 0 ?e Range of data to cipher: `?v pfrom`- `?v pto`= `?v pto-pfrom`bytes # Calculate the new physical size of bytes to be ciphered. # we dont want to overwrite bytes not related to the data section f psize @ section._data_end - section._data ?e PSIZE = `?v psize` # 'wox' stands for 'write operation xor' which accepts an hexpair list as # a cyclic XOR key to be applied to at 'pfrom' for 'psize' bytes. wox a8 @ pfrom:psize # Inject shellcode #------------------ # Setup the simple profile for the disassembler to simplify the parsing # of the code with the internal grep e asm.profile=simple # Seek at entrypoint s entrypoint # Seek at address (fourth word '[3]') at line'#1' line of the disassembly # matching the string 'push dword' which stands for the '__libc_csu_init' # symbol. This is the constructor of the program, and we can modify it # without many consequences for the target program (it is not widely used). s `pd 20~push dword[3]#1` # Compile the 'uxor.S' specifying the virtual from and virtual to addresses !gcc uxor.S -DFROM=`?v vfrom` -DTO=`?v vfrom+psize` # Extract the symbol 'main' of the previously compiled program and write it # in current seek (libc_csu_init) wx `!!rabin -o d/s a.out|grep ^main|cut -d ' ' -f 2` # Cleanup! ?e Uncipher code: `?v $$+$${io.vaddr}` !!rm -f a.out q! -------8<------------ The unciphering snippet of code looks like: -------8<------------ .global main main: mov $FROM, %esi mov $TO, %edi 0: inc %esi xorb $0xa8, 0(%esi) cmp %edi, %esi jbe 0b xor %edi, %edi ret -------8<------------ Finally we run the code: $ cat hello.c #include <stdio.h> char message[128] = "Hello World"; int main() { message[1]='a'; if (message[0]!='H') printf("Oops\n"); else printf("%s\n", message); return 0; } $ gcc hello.c -o hello $ strings hello | grep Hello Hello World $ ./hello Hallo World $ radare -w -i xordata.rs hello $ strings hello | grep Hello $ ./hello Hallo World The 'Hello World' string is no longer in plain text. The ciphering algorithm is really stupid. The reconstructed data section can be extracted from a running process using the debugger: $ cat unpack-xordata.rs e asm.profile=simple s entrypoint !cont `pd 20~push dword[3]#2` f delta @ section._data- segment.phdr_0.rw_.paddr-section. s segment.phdr_0.rw_.vaddr+delta b section._data_end - section._data wt dump q! $ radare -i unpack-xordata.rs -d ./hello $ strings dump Hello World Or statically: e asm.profile=simple wa ret @ `pd 20~push dword[3]#1` wox a8 @ section._data:section._data_end-section.data This is a plain example of how complicated can be to add a protection and how easy is to bypass it. As a final touch for the unpacker, just write a 'ret' instruction at symbol '__libc_csu_init' where the deciphering loop lives: wa ret @ sym.__libc_csu_init Or just push a null pointer instead of the csu_init pointer in the entrypoint. e asm.profile=simple s entrypoint wa push 0 @ `pd 20~push dword[0]#1` --[ 3.4 - Finding differences in binaries Lets draw a scenario where a modified binary (detected by a checksum file analyzer) has been found on a server, and the administrator or the forensic analyst wants to understand whats going on that file. To simplify the explanation the explanation will be based on the previous chapter example. At first point, checksum analysis will determine if there's any collision in the checksum algorithm, and if the files are really different: $ rahash -a all hello par: 0 xor: 00 hamdist: 01 xorpair: 2121 entropy: 4.46 mod255: 84 crc16: 2654 crc32: a8a3f265 md4: cbfc07c65d377596dd27e64e0dcac6cd md5: 2b3b691d92e34ad04b1affea0ad2e5ab sha1: 8ac3f9165456e3ac1219745031426f54c6997781 sha256: 749e6a1b06d0cf56f178181c608fc6bbd7452e361b1e8bfbcfac1310662d4775 sha384: c00abb14d483d25d552c3f881334ba60d77559ef2cb01ff0739121a178b05... sha512: 099b42a52b106efea0dc73791a12209854bc02474d312e6d3c3ebf2c272ac... $ rahash -a all hello.orig par: 0 xor: de hamdist: 01 xorpair: a876 entropy: 4.29 mod255: 7b crc16: 2f1f crc32: b8f63e24 md4: bc9907d433d9b6de7c66730a09eed6f4 md5: 6085b754b02ec0fc371a0bb7f3d2f34e sha1: 6b0d34489c5ad9f3443aadfaf7d8afca6d3eb101 sha256: 8d58d685cf3c2f55030e06e3a00b011c7177869058a1dcd063ad1e0312fa1de1 sha384: 6e23826cc1ee9f2a3e5f650dde52c5da44dc395268152fd5c98694ec9afa0... sha512: 09b7394c1e03decde83327fdfd678de97696901e6880e779c640ae6b4f3bf... There is the possibility to generate two different files matching until 3 different checksum algorithms (and maybe more in the future). If any of them differs we can determine that the file is different. $ du -bs hello hello.orig 4913 hello 4913 hello.orig As both files have the same size it will be worth using a byte-level diffing, because it is more probable easy to read the differences than trying to understand them as deltified ones (which is the default diffing algorithm). $ radiff -b hello.orig hello ------ 000003ed 00 000003f0 55 | 60 000003f1 89 | be 000003f2 e5 | c0 ... 00000406 fe | 61 00000407 ff | c3 00000408 ff | 90 00000409 8d | 90 0000040a bb | 90 0000040b 18 | 90 0000040c ff ------ 000005bd 00 000005c0 00 | a8 000005c1 00 | a8 000005c2 00 | a8 ... 000005df 00 | a8 000005e0 48 | e0 000005e1 65 | cd 000005e2 6c | c4 000005e3 6c | c4 000005e4 6f | c7 0000065f 00 | a8 00000660 00 Radiff detects two blocks of changes determined by the '------' lines. The first block starts at 0x3f0 and the second one (which is bigger) begins at address 0x5c0. While watching the massive 00->a8 conversion it is possible to blindly determine that the ciphering algorithm is XOR and the key is 0xa8. But let's get a look on the first block of changes and disassemble the code. $ BYTES=$(radiff -f 0x3f0 -t 0x40b -b hello.orig hello | awk '{print $4}') $ rasm -d "$(echo ${BYTES})" pushad mov esi, 0x80495c0 mov edi, 0x8049660 inc esi xor byte [esi], 0xa8 cmp esi, edi jbe 0x38 popad ret ... Here it is! this code of assembler is looping from 0x80495c0 to 0x8049660 and XORing each byte with 0xa8. radiff can also dump the results of the diff in radare commands. The execution of the script will result on a binary patch script to convert one file to another. $ radiff -rb hello hello.orig > binpatch.rs $ cat binpatch.rs | radare -w hello $ md5sum hello hello.orig 6085b754b02ec0fc371a0bb7f3d2f34e hello 6085b754b02ec0fc371a0bb7f3d2f34e hello.orig --[ 3.5 - Removing library dependencies In some gnu/linux distributions, all the ELF binaries are linked against libselinux, which is an undesired dependency if we want to make our hello world portable. SElinux is not available in all gnu/linux distros. This is a simple example, but we can extend the issue to bad pkg-config configuration files that can make a binary depend on more libraries than the minimum required. Perhaps most users will not care and will probably install those dependencies, recompile the kernel or whatever else. It is a pity to make steps in backward instead of telling gcc to not link against those libraries. But we can avoid these dependencies easily: all the library names are stored as zero-terminated strings in the ELF header, and the loader (at least the Solaris, Linux and BSD ones) will ignore dependency entries if the library name is zero length. The patch consists on finding the non-desired library name (or the prefix, if there are more than one libraries with similar names) and then write 0x00 in each search result. $ radare -nw hello [0x00000000]> / libselinux ... (results) ... [0x00000000]> wx 00 @@ hit [0x00000000]> q! Using -nw radare will run without interpreting the ~/.radarerc and it will not detect symbols or analyze code (we only need to use the basic hexadecimal capabilities). The -w flag will open the file in read-write. Each search result will be stored in the 'search' flagspace with names predefined by the 'search.flagname' eval variable that will be hit%d_%d by default. (This is keyword_index and hit_index). The last command will run 'wx 00' at every '@@' flag matching 'hit' in the current flagspace. Use 'q!' if you don't want to be prompted to restore the changes done before exiting. --[ 3.6 - Syscall obfuscation Looking for raw int 0x80 instructions on standard binaries it's going probably to be a hard task unless these are statically compiled. This is because they usually live in libraries which are embedded in the program. Libraries like libc have loads of symbols that are directly mapped to syscalls, and the reverser could use this information to identify code in a statically compiled program. To obfuscate symbols we can embed syscalls directly on the host program, after trashing the stack to break standard backtraces. If you go deep enough on the new syscalling system of Linux you will notice that the int 0x80 syscall mechanism is no longer used in the current x86 platforms because performance reasons. The new syscall mechanism consists in calling a trampoline function installed by the kernel as a 'fake' ELF shared object mapped on the running process memory. This trampoline function implements the most suitable syscall mechanism for the current platform. This shared object is called VDSO. In Linux only two segment registers are used: cs and gs. This is for simplicity reasons. The first one allows to access to the whole program memory maps, and the second one is used to store Thread related information. Here is a simple segment dumper to get the contents from %gs. ----------------------8<---------------------------- $ cat segdump.S #define FROM 0 /* 0x8048000 */ #define LEN 0x940 #define TIMES 1 #define SEGMENT %gs /* %ds */ .global main .data buffer: .fill LEN, TIMES, 0x00 .text main: mov $TIMES, %ecx loopme: push %ecx sub $TIMES, %ecx movl $LEN, %eax imul %eax, %ecx movl %ecx, %esi /* esi = ($TIMES-%ecx)*$LEN */ addl $FROM, %esi lea buffer, %edi movl $LEN, %ecx rep movsb SEGMENT:(%esi),%es:(%edi) movl $4, %eax movl $1, %ebx lea buffer, %ecx movl $LEN, %edx int $0x80 pop %ecx xor %eax, %eax cmp %eax, %ecx dec %ecx jnz loopme ret ----------------------8<---------------------------- At this point we disable the random va space to analyze the contents of the VDSO map always at the same address during executions. $ sudo sysctl -w kernel.randomize_va_space=0 Compile and dump the segment to a file: $ gcc segdump.S $ ./a.out > gs.segment And now let's do some little analysis in the debugger: $ radare -d ls ; debug 'ls' program ... ; Inside the debugger we write the contents of the gs.segment file. ; In the initial state, the current seek points to 'eip'. [0x4A13B8C0]> wf gs.segment Poke 100 bytes from gs.segment 1 times? (y/N) y file poked. offset 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF 0x4a13b8c0, c006 e8b7 580b e8b7 c006 e8b7 0000 0000 ....X........... 0x4a13b8d0, 1414 feb7 ec06 0a93 b305 6beb 0000 0000 ..........k..... 0x4a13b8e0, 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0x4a13b8f0, 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0x4a13b900, 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0x4a13b910, 0000 0000 0000 0000 0000 0000 0000 0000 ................ ; Once the file is poked in process memory we have different ways to ; analyze the contents of the current data block. One possible option ; is to use the 'pm' command which prints memory contents following the ; format string we use as argument, it is also possible to specify the ; element names as in a structure. ; ; Another option is the 'ad' command which 'analyzes data' reading the ; memory as 32bit primitives following pointers and identifying ; references to code, data, strings or null pointers. [0x4A13B8C0]> pm XXXXXXXXX 0x4a13b8c0 = 0xb7fe46c0 0x4a13b8c4 = 0xb7fe4ac8 0x4a13b8c8 = 0xb7fe46c0 0x4a13b8cc = 0x00000000 ; eax 0x4a13b8d0 = 0xb7fff400 ; maps._vdso__0+0x400 0x4a13b8d4 = 0xff0a0000 0x4a13b8d8 = 0x856003b1 0x4a13b8dc = 0x00000000 ; eax 0x4a13b8e0 = 0x00000000 ; eax [0x4A13B8C0]> ad 0x4a13b8c0, int be=0xc046feb7 le=0xb7fe46c0 0x4a13b8c4, int be=0xc84afeb7 le=0xb7fe4ac8 0x4a13b8c8, int be=0xc046feb7 le=0xb7fe46c0 0x4a13b8cc, (NULL) 0x4a13b8d0, int be=0x00f4ffb7 le=0xb7fff400 maps._vdso__0+0x400 0xb7fff400, string "QRU" 0xb7fff404, int be=0xe50f3490 le=0x90340fe5 ; Both analysis result in a pointer to the vdso map at 0x4a13b8d0 ; Disassembling these contents with 'pd' (print disassembly) we get that ; the code uses sysenter. [0x4A13B8C0]> pd 17 @ [+0x10] _vdso__0:0xb7fff400, 51 push ecx _vdso__0:0xb7fff401 52 push edx _vdso__0:0xb7fff402 55 push ebp .-> _vdso__0:0xb7fff403 89e5 mov ebp, esp | _vdso__0:0xb7fff405 0f34 sysenter | _vdso__0:0xb7fff407 90 nop | _vdso__0:0xb7fff408, 90 nop | _vdso__0:0xb7fff409 90 nop | _vdso__0:0xb7fff40a 90 nop | _vdso__0:0xb7fff40b 90 nop | _vdso__0:0xb7fff40c, 90 nop | _vdso__0:0xb7fff40d 90 nop `=< _vdso__0:0xb7fff40e ebf3 jmp 0xb7fff403 _vdso__0:0xb7fff410, 5d pop ebp _vdso__0:0xb7fff411 5a pop edx _vdso__0:0xb7fff412 59 pop ecx _vdso__0:0xb7fff413 c3 ret After this little analysis we can use a new way to call syscalls and transform the "int 0x80" instructions into a bunch of operations. new_syscall: push %edi push %esi movl $0x10, %esi movl %gs:(%esi), %edi call *%edi pop %esi pop %edi ret Using this 'big' primitive we are able to add more trash code and make static analysis harder. Using dynamic analysis it is possible to bypass this protection by writing a simple script that dumps the backtrace every time a syscall is executed. (syscall-bt e scr.tee=log.txt label: !contsc !bt .label: ) Or with a shell single liner: $ echo '(syscall-bt,e scr.tee=log.txt,label:,!contsc,!bt,.label:,) \ &&.(syscall-bt)' | radare -d ./a.out A way to make the analysis harder is to trash the stack contents with a random offset every time we run a syscall. By decreasing the stack pointer and later incrementing it we will make the backtrace fail. To trash the stack we can just increment the stack before the syscall and decrement it after calling it. The analyst will have to patch these inc/dec opcodes for every syscall trampolines to get proper backtrace and resolve the xrefs for each entry. Radare implements a "!contsc" command that makes the debugger stop before and after a syscall. This can be scripted together with "!bt" and "!st" to get a handmade version of the strace utility. Another trick that we can exploit for writing our homebrew packer is to use instructions needed to be patched for proper analysis as cipher keys. Making the program crash when trying to decipher a section using invalid keys for example. --[ 3.7 - Replacing library symbols Statically linked binaries are full of unused bytes, we can take use of this to inject our code. Lot of library symbols (for example from libc) can be directly replaced by system calls at the symbol or directly in-place where it is called. For example: call exit can be replaced these (also 5 byte length) instructions: xor eax,eax inc eax int 0x80 $ rasm 'xor eax,eax;inc eax;int 0x80' 31 c0 40 cd 80 This way we can just go into the statified libc's exit implementation and overwrite it with our own code. If the binary is statically compiled with stripped symbols we will have to look for signatures or directly find the 'int 0x80' instruction and then analyze code backward looking for the beginning of the implementation and then find for xrefs to this point. Most compilers append libraries at the end of the binary, so we can easily 'draw' an imaginary line where the program ends and libraries start. There's another tool that may help to identify which libraries and versions are embedded in our target program. It is called 'rsc gokolu'. The name stands for 'g**gle kode lurker' and it will look for strings in the given program in the famous online code browser and give approximated results of the source of the string. When we have debugging or symbolic information the process becomes much easier and we can directly overwrite the first bytes of the embedded glibc code with the small implementation in raw assembly using syscalls. rabin is the utility in radare which extracts information from the headers of a given program. This is like a multi-architecture multi-platform and multi-format 'readelf' or 'nm' replacement which supports ELF32/64, PE32/32+, CLASS and MACH0. The use of this tool is quite simple and the output is quite clean to ease the parsing with sed and other shell tools. $ rabin -vi a.out [Imports] Memory address File offset Name 0x08049034 0x00001034 __gmpz_mul_ui 0x08049044 0x00001044 __cxa_atexit 0x08049054 0x00001054 memcmp 0x08049064 0x00001064 pcap_close 0x08049074 0x00001074 __gmpz_set_ui ... $ rabin -ve a.out [Entrypoint] Memory address: 0x080494d0 $ rabin -vs a.out [Symbols] Memory address File offset Name 0x0804d86c 0x0000586c __CTOR_LIST__ 0x0804d874 0x00005874 __DTOR_LIST__ 0x0804d87c 0x0000587c __JCR_LIST__ 0x08049500 0x00001500 __do_global_dtors_aux 0x0804dae8 0x00005ae8 completed.5703 0x0804daec 0x00005aec dtor_idx.5705 0x08049560 0x00001560 frame_dummy ... $ rabin -h rabin [options] [bin-file] -e shows entrypoints one per line -i imports (symbols imported from libraries) -s symbols (exports) -c header checksum -S show sections -l linked libraries -H header information -L [lib] dlopen library and show address -z search for strings in elf non-executable sections -x show xrefs of symbols (-s/-i/-z required) -I show binary info -r output in radare commands -o [str] operation action (str=help for help) -v be verbose See manpage for more information. The same flags can be used for any type of binary for any architecture. Using the -r flag the output will be formatted in radare commands, this way you can import the binary information. This command imports the symbols, sections, imports and binary information. [0x00000000]> .!!rabin -rsSziI $FILE This is done automatically when loading a file if you have file.id=true, if you want to disable the analysis run 'radare' with '-n' (because this way it will not interpret the ~/.radarerc). Programs with symbols or debug information are mostly analyzed by the code analysis of 'radare' thanks to the hints given by 'rabin'. So, theoretically we will be able to automatically resolve the crossed references to functions. But if we are not that lucky we can just setup a simple disassembly output format (e asm.profile=simple) and disassemble the whole program code grepping for 'call 0x???????' to resolve calls to our desired owned symbol. [0x08049A00]> e scr.color=false [0x08049A00]> e asm.profile=simple [0x08049A00]> s section._text [0x08049A00]> b section._text_end-section._text [0x08049A00]> pd~call 0x80499e4 0x08049fd9 call 0x80499e4 ; imp.exit 0x0804fa89 call 0x80499e4 ; imp.exit 0x0805047b call 0x80499e4 ; imp.exit [0x08049A00]> If we are not lucky and do not have symbols we will have to identify which parts of the binary are part of our target binary. $ rsc syms-dump /lib/libc.so.6 24 > symbols --[ 3.8 - Checksumming The most common procedure to check for undesired runtime or static program patches is done by using checksumming operations over sections, functions, code or data blocks. These kind of protections are usually easy to patch, but allow the end user to get a grateful error message instead of a raw Segmentation Fault. The error message string can be a simple entrypoint for the reverser to catch the position of the checksumming code. Radare offers a hashing command under the "h" char and allows you to calculate block-based checksums while debugging or statically by opening the file from disk. Heres an usage example: hmd5 It is also possible to use these hashing instructions to find pieces of the program matching a certain checksum value, or just recalculate the new sum to not patch the hash check. The 'h' command can be also performed from the shell by using the "rahash" program which permits to calculate multiple checksums for a single file based on blocks of bytes. This is commonly used on forensics, when you want a checksum to not only give you a boolean reply (if the program has been modified), because this way you can reduce the problem on a smaller piece of the binary because you have multiple checksums at different offsets for the same file. This action can be done over hard disks, device firmwares or memory dumps of the same size. If you are dealing similar file with deltified contents or you require a more fine grained result you should try "radiff" which implements multiple bindiffing algorithms that goes from byte level detecting deltas to code level diffing using information of basic blocks, variables accessed and functions called. We will also have to decide a protection method for the checksumming algorithm, because this will be the first patch required to enable software breakpoints and user defined patches to be done. To protect the hasher we just put it distributed along the program by mixing its code with the rest of the program making more difficult to identify the its position. We can also choose to put this code into a ciphered area in .data together with other necessary code to make the program run. One of the main headaches for reversers is the layering of protections that can be represented by lot of entrypoints to the same code making more difficult to locate and patch in a proper way. The checksumming of a section can be used as a deciphering key for other ciphered sections, to avoid easy patching, the program should check for the consistence of the deciphered code. This can be done by checking for another checksum or by comparing few bytes at the beginning of the deciphered and mmap code. There is another possible design to implement the checksumming algorithm in a more flexible way which consists on checksumming the code in runtime instead of hardcoding the resulting value somewhere. This method allows us to 'protect' multiple functions without having to statically calculate the checksum value of the region. The problem with this method is that it doesn't provide us protection against static modifications on the binary. This snippet can be compiled with DEBUG=0 to get a fully working checksumming function which returns 0 when the checksum matches. ---------------------8<-------------------------- #define CKSUM 0x68 #define FROM 0x8048000 #define SIZE 1024 str: .asciz "Checksum: 0x%08x\n" .global cksum cksum: pusha movl $FROM, %esi movl $SIZE, %ecx xor %eax, %eax 1: xorb 0(%esi), %al add $1, %esi loopnz 1b #if DEBUG pushl %eax pushl $str call printf addl $8, %esp #else sub $CKSUM, %eax jnz 2f #endif popa ret #if DEBUG .global main main: call cksum; ret #else 2: mov $20, %eax int $0x80 mov $37, %ebx mov $9, %ecx xchg %eax, %ebx int $0x80 #endif ---------------------8<-------------------------- To get the bytes of the function to be pushed we can use an 'rsc' script that dumps the hexpairs contained managed by a symbol in a binary: $ gcc cksum.S -DDEBUG $ ./a.out Checksum: 0x69 $ gcc -c cksum.S $ rsc syms-dump cksum.o | grep cksum cksum: 60 be 00 80 04 08 b9 00 04 00 00 31 c0 32 06 83 c6 01 e0 f9 2d d2 04 \ 00 00 75 02 61 c3 b8 14 00 00 00 cd 80 bb 25 00 00 00 b9 09 00 00 00 93 cd 80 The offsets to patch with our ranged values are: +2: original address (4 bytes in little endian) +7: size of buffer (4 bytes in little endian) +21: checksum byte (1 byte) We can use this hexdump from a radare macro that automatically patches the correct bytes while inserting the code in the target program. $ rsc syms-dump cksum.o | grep cksum | cut -d : -f 2 > cksum.hex $ cat cksum.macro (cksum-init ckaddr size,f ckaddr @ $0,f cksize @ $1,) (cksum hexfile size # kidnap some 4 opcodes from the function f kidnap @ `aos 4` yt kidnap ckaddr+cksize wa jmp $$+kidnap @ ckaddr+cksize+kidnap wa call `?v ckaddr` wF $0 @ ckaddr f cksize @ $$w f ckaddr_new @ ckaddr+cksize+50 wv $$ @ ckaddr+2 wv $1 @ ckaddr+7 wx `hxor $1` @ ckaddr+21 f ckaddr @ ckaddr_new ) $ cat cksum.hooks 0x8048300 0x8048400 0x8048800 $ radare -w target [0x8048923]> . cksum.macro [0x8048923]> .(cksum-init 0x8050300 30) [0x8048923]> .(cksum cksum.hex 100) @@. cksum.hooks --[ 4 - Playing with code references The relationship between parts of code in the program are a good source of information for understanding the dependencies between code blocks, functions. Following sub-chapters will explain how to generate, retrieve and use the code references for retrieving information from a program in memory or statically on-disk. --[ 4.1 - Finding xrefs One of the most interesting information we need when understanding how a program works or what it does is the flow graph between functions. This give us the big picture of the program and allows us to find faster the points of interest. There are different ways to obtain this information in radare, but you will finally manage it with the 'Cx' and 'CX' commands (x=code xref, X=data xref). This means that you can even write your own program to extract this information and import it into radare by just interpreting the output of the program or a file. [0x8048000]> !!my-xrefs-analysis a.out > a.out.xrefs [0x8048000]> . a.out.xrefs or [0x8048000]> .!my-xrefs-analysis a.out At the time of writing this, radare implements a basic code analysis algorithm that identifies function boundaries, splits basic blocks, finds local variables and arguments, tracks their access and also identifies code and data references of individual instructions. When 'file.analyze' is 'true', this code analysis is done recursively over the entrypoint and all the symbols identified by rabin. This means that it will not be able to automatically resolve code references following pointer tables, register branches and so on and lot of code will remain unanalyzed. We can manually force to analyze a function at a certain offset by using the '.afr@addr' command which will interpret ('.') the output of the command 'afr' that stands for 'analyze function recursively' at (@) a certain address (addr). When a 'call' instruction is identified, the code analysis will register a code xref from the instruction to the endpoint. This is because compilers uses these instructions to call into subroutines or symbols. As we did previously, we can disassemble the whole text section looking for a 'call' instruction and register code xrefs between the source and destination address with these two lines: [0x000074E0]> e asm.profile=simple [0x000074E0]> "(xref,Cx `pd 1~[2]`)" [0x000074E0]> .(xref) @@=`pd 100~call[0]` We use the simple asm.profile to make the disassembly text parsing easier. The first line will register a macro named 'xref' that creates a code xref at address pointed by the 3rd word of the disassembled instruction. In the second line it will run the previous macro at every offset specified by the output of the "pd 100~call[0]" instruction. This is the offset of every call instruction of the following 100 instructions. When disassembling code (using the 'pd' command (print disasm)) xrefs will be displayed if asm.xrefs is enabled. Optionally, we can set asm.xrefsto to visualize the xref information at both end-points of the code reference. --[ 4.2 - Blind code references The xrefs(1) program that comes with radare implements a blind code reference search algorithm based on identifying sequences of bytes matching a certain pattern. The pattern is calculated depending on the offset. This way the program calculates the relative delta between the current offset and the destination address and tries to find relative branches from one point to another of the program. The current implementation supports x86, arm and powerpc, but allows to define templates of branch instructions for other architectures by using the option flags to determine the instruction size, endianness, the way to calculate the delta, etc. The main purpose of this program was to identify code xrefs on big files (like firmwares), where the code analysis takes too long or requires many manual interaction, it is not designed to identify absolute calls or jumps. Here's an usage example: $ xrefs -a arm bigarmblob.bin 0x6a390 0x80490 0x812c4 --[ 4.3 - Graphing xrefs Radare ships some commands to manage and create interactive graphs to display code analysis based on basic blocks, function blocks or whatever we want by using the 'gu' command (graph user) The most simple way to use the graphing capabilities is to use the 'ag' command that will directly popup a gtk window with a graphed code analysis from the current seek, splitting the code by basic blocks. (use the eval graph.jmpblocks and graph.callblocks to split blocks by jumps or calls to analyze basic blocks or function blocks). The graph is stored internally, but if we want to create our own graphs we can use the 'gu' command which provides subcommands to reset a graph, create nodes, edges and display them. [0x08049A00]> gu? Usage: gu[dnerv] [args] gur user graph reset gun $$ $$b pd add node gue $$F $$t add edge gud display graph in dot format guv visualize user defined graph This is a radare script that will generate a graph between analyzed function by using the xrefs metadata information. After configuring the graph we can display it using the internal viewer (guv) or using graphviz's dot. --------8<--------------- "(xrefs-to from,f XT @ `Cx~$0[3]`,)" "(graph-xrefs-init,gur,)" "(graph-xrefs,.(xrefs-to `?v $$`),gue $$F XT,)" "(graph-viz,gud > dot,!!dot -Tpng -o graph.png dot,!!gqview graph.png,)" .(graph-xrefs-init) .(graph-xrefs) @@= `Cx~[1]` .(graph-viz) --------8<--------------- Here's the explanation of the previous script: Note that all commands are quoted '"' because this way the inner special characters are ignored ('>', '~' ...) "(xrefs-to from,f XT @ `Cx~$0[3]`,)" Register a macro named 'xrefs-to' that accepts one argument named 'from' and aliased as $0 inside the macro body. The macro executes the 'f XT' (create a flag named XT) at ('@') the address where the code xref (Cx) points (3rd column of the row matching $0 (the first argument of the macro (from)). "(graph-xrefs-init,gur,)" The 'graph-xrefs-init' macro resets the graph user metadata. "(graph-xrefs,.(xrefs-to `?v $$`),gue $$F XT,)" This macro named 'graphs-xrefs' that run commands separated by commas: .(xrefs-to `?v $$`) execute the macro 'xrefs-to' at current seek ($$), the value is concatenated to the output of executing '?v $$' which evaluates the '$$' special variable and prints the value in hexadecimal. gue XF XT Adds an 'graph-user' edge between the beginning of the function at current seek and XT. "(graph-viz,gud > dot,!!dot -Tpng -o graph.png dot,!!rsc view graph.png,)" These commands will generate a dot file, render it in png and display it. .(graph-xrefs-init) .(graph-xrefs) @@= `Cx~[1]` .(graph-viz) Then, we bundle all those macros and the xrefs graph will be displayed --[ 4.4 - Randomizing xrefs Another annoying low level manipulation we can do is to repeat functions at random places with random modifications to make the program code more verbose and hard to analyze because of complex control flow paths. By doing a cross-references (xrefs) analysis between the symbols we will be able to determine which symbols are referenced more from than one single point. The internal grep syntax can be tricky to retrieve lists of offsets for xrefs at a given position. [0x0000c830]> Cx Cx 0x00000c59 @ 0x00000790 Cx 0x00000c44 @ 0x000007b0 Cx 0x00000c35 @ 0x00000810 Cx 0x00000c26 @ 0x000008a0 ... These numbers specify the caller address and the called one. That is: 0xc59 -(calls)-> 0x790 To disassemble the opcodes referencing code to other positions we can use the 'pd' command with a foreach statement: [0x0000c830]> pd 1 @@= `C*~Cx[1]` This command will run 'pd 1' on every address specified in the 1st column of the output of the 'C*~Cx[1]' --[ 5 - Conclussion This article has covered many aspects of reverse engineering, low level program manipulation, binary patching/diffing and more, despite being a concrete task it has tried to expose a variety of scenarios where a command line hexadecimal editor becomes an essential tool. Several external scripting languages can be used, don't care about the cryptic scripting of the radare commands, there are APIs for perl, ruby and python for controlling radare and analyze code or connect to remote radares. Being used in batch mode enables the ability to automatize many analysis tasks, generate graphs of code, determine if it is packed or infected, find vulnerabilities, etc. Maybe the more important characteristic of this software is the abstraction of the IO which allows you to open hard disk images, serial connections, process memory, haret wince devices, gdb-enabled applications like bochs, vmware, qemu, shared memory areas, winedbg and more. There are lot of things we have not covered in this article but the main concepts have been exposed while explaining multiple protection techniques for binaries. Using scriptable tools for implementing binary protection techniques eases the development and permits to recycle and accelerate the analysis and manipulation of files. --[ 6 - Future work The project was born in 2006, and it has grown pretty fast. At the moment of writing this paper the development is done on two different branches: radare and radare2. The second one is a complete refactoring of the original code aiming to reimplement all the features of the radare framework into multiple libraries. This design permits to bypass some limitations of the design of r1 enabling full scripting support for specific or composed set of libraries. Each module extends its functionalities into plugins. While using parrot as a virtual machine, multiple scripting languages can be used together. Being scripting language agnostic enables the possibility to mix multiple frameworks together (like metasploit [5], ERESI [6], etc) The modular design based on libraries, plugins and scripts will open new ways to extend the framework by third party developers and additionally make the code much more maintainable and bugfree. New applications will appear on top of the r2 set of libraries, there are some projects aiming to integrate it from web frontends and graphical frontends. Plugins extend the modules adding support for new architectures, binary formats, code analysis modules, debugging backends, remote protocols, scripting facilities and more. At the moment of writing this, radare2 (aka libr) is actually under development but implements some test programs that try to reimplement the applications distributed with radare. --[ 7 - References [0] radare homepage http://www.radare.org [1] radare book (pdf) http://www.radare.org/get/radare.pdf [2] x86-32/64 architecture http://www.intel.com/Assets/PDF/manual/253666.pdf [3] ELF file format http://en.wikipedia.org/wiki/Executable_and_Linkable_Format [4] AT&T syntax http://sig9.com/articles/att-syntax [5] Metasploit, ruby based exploit development framework http://www.metasploit.com/ [6] ERESI http://www.eresi-project.org/ --[ 8 - Greetings Thanks to JV for his support during the write-up of this article, and to the Phrack staff to make the publication possible. I would also like to greet some people for their support while the development of radare and this article: SexyPandas .. for the great quals, ctfs and moments nopcode .. for those great cons, feedback and beers nibble .. for those pingpong development sessions and support killabyte .. for the funny abstract situations we faced everyday esteve .. for the talks, knowledge and the arm/wince reversing tips wzzx .. for the motivators, feedback and luls lia .. my gf which supported me all those nightly hacking hours :) .. Certainly... Thanks to all the people who managed to read the whole text and also to the rest for being part of that ecosystem named 'underground' and computer security which bring us lot of funny hours. There is still lot of work to be done, and ideas or contributions are always welcome. Feel free to send me comments. --pancake --------[ EOF