BIRD: Binary Interpretation using Runtime Disassembly



Faculty: Tzi-cker Chiueh
Members: Wei Li, Yanjun Wu, Susanta Nanda, Lap-Chung Lam

The majority of security vulnerabilities published in the literature are due to software bugs. Many researchers have developed source-level program transformation and analysis techniques to detect or eliminate such vulnerabilities automatically. However, most commercially distributed applications on the Windows/x86 platform do not come with source code or even debugging symbol information, so tools that operate at the source level cannot be applied to them.

Binary analysis and instrumentation are therefore needed on the Win32/x86 platform. Static binary analysis and rewriting are much easier on RISC architectures because of their smaller instruction sets and fixed-size instruction formats. They may also work well for GCC-compiled binaries, because GCC behaves well in separating data from code and organizes the code section in a well-known way. For Win32/x86 binaries without auxiliary information, however, static analysis and rewriting are almost impossible because of variable instruction sizes and the difficulty of distinguishing code from data. This project aims to implement a binary analysis and instrumentation framework that works with Win32/x86 binaries accurately and efficiently.

Solution Overview

The project employs a hybrid disassembly scheme. First, it statically disassembles the input binary into a set of known and unknown areas. The known areas (KAs) are the regions that are guaranteed to be instructions and have been correctly interpreted by the static disassembly engine. The unknown areas (UAs) are the regions that are either not covered by the disassembler or not confirmed to be valid instructions.

We observe that the only way the instruction pointer can move from a known area to an unknown area is through an indirect branch, that is, an indirect jump or an indirect call. Indirect branches are therefore the gateways to the unknown areas, and by tracking each of them during application execution we can be sure to catch every entry into a UA. To interpret the program completely and correctly, we also interpret the libraries (DLLs). Callback functions and exception handlers eventually get interpreted as well, because they are invoked through system DLLs (typically NTDLL.DLL) that use indirect branches to jump to the handler routines.

When we detect a control transfer from a KA to a UA during program execution, we invoke the dynamic disassembler to disassemble the code in the UA until it jumps back to a KA. At the same time, the instrumentation engine patches the indirect branches discovered at run time. The newly disassembled portion of the UA is then declared known and removed from the UA list. By continuing this process, we traverse all the instructions that are executed during a particular run of the application. Although the system does not necessarily disassemble every instruction in an application, it guarantees coverage of every instruction that is actually executed during a run.
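The bookkeeping described above can be illustrated with a small sketch. It is not BIRD's implementation: the `UnknownAreas` class, the `on_indirect_branch` hook, and the `disassemble` callback are hypothetical names introduced here, and the UA list is assumed to be a sorted list of half-open address ranges.

```python
from bisect import bisect_right


class UnknownAreas:
    """Sorted, non-overlapping half-open [start, end) ranges not yet disassembled."""

    def __init__(self, ranges):
        self.ranges = sorted(ranges)

    def find(self, addr):
        """Return the UA range containing addr, or None if addr is in a known area."""
        # Index of the last range whose start is <= addr.
        i = bisect_right(self.ranges, (addr, float("inf"))) - 1
        if i >= 0 and self.ranges[i][0] <= addr < self.ranges[i][1]:
            return self.ranges[i]
        return None

    def mark_known(self, start, end):
        """Remove [start, end) from the UA list, splitting ranges where needed."""
        updated = []
        for s, e in self.ranges:
            if e <= start or s >= end:
                updated.append((s, e))      # no overlap: keep unchanged
                continue
            if s < start:
                updated.append((s, start))  # part before the new known region
            if end < e:
                updated.append((end, e))    # part after the new known region
        self.ranges = updated


def on_indirect_branch(uas, target, disassemble):
    """Hypothetical runtime hook called with the target of an indirect branch.

    `disassemble(target, limit)` stands in for the dynamic disassembler and
    returns the address at which it stopped (at most `limit`).
    Returns True if new code was disassembled.
    """
    hit = uas.find(target)
    if hit is None:
        return False                        # target is already in a known area
    stop = disassemble(target, hit[1])
    uas.mark_known(target, stop)            # newly disassembled bytes become known
    return True
```

In this sketch a second branch to the same target finds no UA and returns immediately, which mirrors the idea that each region is disassembled at most once per run.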

System Architecture

The following figure shows the architecture of our BIRD system. The analysis and instrumentation of input binaries are carried out both statically and dynamically. The steps drawn with dotted lines take place statically, before execution. Given a Win32 binary, BIRD first invokes its static disassembler to analyze the binary and patch it with instrumentation for indirect branches and with auxiliary information such as the UAs. The new binary replaces the original one. It executes together with BIRD's runtime engine, which is implemented as a dynamic-link library (DLL).

There are three components in BIRD's runtime engine.
BIRD takes control at indirect branches by replacing them with a jump to a special function check(), which is the core of BIRD's runtime engine and performs the following functions:

The runtime disassembler is invoked only when a UA is discovered at runtime. The dynamically disassembled region is then instrumented in the same way.


Applications

The BIRD infrastructure supports a range of security-related applications. We have built several systems on top of it.

FOOD (Foreign Code Detection)

There are many ways to inject foreign code into a running application through control hijacking. Using BIRD, it is possible to instrument a binary so that every control transfer instruction is inspected and any transfer to foreign code is detected. We leverage BIRD to detect such injections at return addresses, function pointers, and indirect branches, and to prohibit any modification of the import address tables in DLLs. The overhead is within reasonable limits, typically on the order of 10 to 25%.
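As a rough illustration of what inspecting a control transfer could mean, the sketch below checks whether a transfer target falls inside a region known to hold legitimate code. How FOOD actually classifies code as foreign is not described here, so the region list, the function names, and the addresses in the usage note are assumptions made only for the example.

```python
def is_foreign(target, code_regions):
    """True if target is outside every trusted code region [start, end)."""
    return not any(start <= target < end for start, end in code_regions)


def check_transfer(kind, target, code_regions):
    """Hypothetical check run before a return, indirect call, or indirect jump."""
    if is_foreign(target, code_regions):
        raise RuntimeError(f"foreign code: {kind} to {target:#x}")
    return target
```

With a single trusted region `(0x401000, 0x40F000)`, a return to `0x401234` passes the check, while a return to an address outside that region, such as one on the stack, raises an error.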

BASS (Binary Application Specific Sandboxing)

BASS compares the system call sequence of a network application against a sandboxing policy to detect control-hijacking attacks, in which the attacker exploits software vulnerabilities such as buffer overflows to take control of a victim application and possibly the underlying machine. Built on top of BIRD, BASS can automatically extract a highly accurate application-specific sandboxing policy from a Win32/x86 binary and enforce the extracted policy at runtime with low overhead.
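To make the idea of checking a system call sequence against a policy concrete, here is a minimal sketch. The form of the policy BASS extracts is not specified above; this example assumes, purely for illustration, that the policy maps each system call to the set of calls allowed to follow it. The `first_violation` function and the `START` marker are hypothetical.

```python
def first_violation(trace, policy, start="START"):
    """Return the index of the first call the policy does not allow, or None.

    `policy` maps a call name to the set of calls that may follow it;
    `start` is a hypothetical marker for the beginning of execution.
    """
    prev = start
    for i, call in enumerate(trace):
        if call not in policy.get(prev, set()):
            return i
        prev = call
    return None


# Illustrative policy: open a file, read it any number of times, then close it.
POLICY = {
    "START": {"NtCreateFile"},
    "NtCreateFile": {"NtReadFile", "NtClose"},
    "NtReadFile": {"NtReadFile", "NtClose"},
    "NtClose": set(),
}
```

A hijacked application that issues a call outside the allowed sequence is flagged at the first offending call, so enforcement can stop it before the call is carried out.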


Extensions


BIRD for Packed Binaries

On Win32, many distributed binaries are in packed form. A packed binary is one whose original code has been compressed or encrypted by a packer to make static analysis difficult. We are currently extending BIRD to handle these packed binaries. Every packed binary must eventually unpack itself into its original form at run time and transfer execution to the unpacked code. This procedure involves both memory writes and code execution on the section pages of the in-memory image. BIRD starts by marking all pages with the read-only protection attribute, and then captures the write or execution exceptions raised when the code tries to write or execute those pages. After each run of exceptions, BIRD calculates the entropy value of the pages that were rewritten. If the entropy value is less than a certain threshold, BIRD treats the binary as fully unpacked and initiates the regular disassembly process. In this way, BIRD should be able to handle packed binaries with multilayer unpackers as well.
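The entropy test can be sketched as follows. The text does not say which entropy measure or threshold BIRD uses; this example assumes byte-level Shannon entropy in bits per byte, and the default threshold of 7.0 is an arbitrary illustrative value, not a figure from the project.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def looks_unpacked(rewritten_pages, threshold=7.0):
    """Hypothetical test: treat the pages as unpacked code if entropy is low.

    Compressed or encrypted data tends toward 8 bits per byte, while
    ordinary machine code is usually noticeably lower.
    """
    data = b"".join(rewritten_pages)
    return shannon_entropy(data) < threshold
```

Because the check is repeated after each run of exceptions, a binary packed in several layers would keep failing the test until the innermost layer has been written out.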


BIRD on Speculative Disassembly

In general, achieving high disassembly accuracy requires more conservative disassembly strategies, which imply lower disassembly coverage and thus higher run-time overhead. To attain high disassembly accuracy while minimizing the run-time overhead, we propose a speculative disassembly technique. When BIRD's static disassembler processes an input binary, it works so conservatively that it may accept only a small portion of the code as KA. However, instead of leaving the rest entirely to the dynamic component, BIRD still tries to disassemble as much of the remaining parts as it can, and records all the results even when it is not sure they are correct. At run time, when BIRD's dynamic disassembler is invoked because an indirect branch jumps to a UA, it first checks whether the speculative result also treats the branch's target address as the start of an instruction. If so, the dynamic disassembler simply borrows the corresponding portion of the speculative result without performing any disassembly; otherwise, it disassembles the UA on its own.
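The lookup step can be sketched as below. The data layout is an assumption made for illustration: the speculative result is modeled as a dictionary from instruction start address to a (length, text) pair, and borrowing is assumed to stop after an unconditional `ret` or `jmp`. The `disassemble_ua` function and the `fallback` callback are hypothetical names.

```python
def disassemble_ua(target, speculative, fallback):
    """Reuse the speculative result if target starts an instruction there.

    `speculative` maps instruction start address -> (length, text).
    `fallback(target)` stands in for real dynamic disassembly.
    Returns (instructions, borrowed_flag).
    """
    if target not in speculative:
        return fallback(target), False      # speculation disagrees: disassemble now
    borrowed = []
    addr = target
    while addr in speculative:
        length, text = speculative[addr]
        borrowed.append((addr, text))
        addr += length
        if text.split()[0] in ("ret", "jmp"):
            break                           # assumed end of the straight-line run
    return borrowed, True


# Speculative result for a tiny function: push ebp / mov ebp, esp / ret.
SPECULATIVE = {
    0x2000: (1, "push ebp"),
    0x2001: (2, "mov ebp, esp"),
    0x2003: (1, "ret"),
}
```

A branch to `0x2000` reuses the three recorded instructions, while a branch to `0x2002`, which falls in the middle of a recorded instruction, is handed to the fallback disassembler.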
