The majority of security vulnerabilities published in the literature are due to software bugs. Many researchers have developed source-level program transformation and analysis techniques to automatically detect or eliminate such vulnerabilities. However, since most commercially distributed applications on the Windows/x86 platform do not come with source code or even debugging symbol information, tools that work at the source-code level cannot be used in such an environment.
Binary analysis and instrumentation are therefore needed on the Win32/x86 platform.
Static binary analysis and rewriting is a much easier task on RISC
architectures because of their smaller instruction sets and fixed-size instruction formats.
It may also work well for GCC-compiled binaries, because GCC behaves well in
separating data from code and organizes the code section in a well-known way.
However, it is almost impossible for binaries on the Win32/x86 platform
without auxiliary information, because x86 instructions vary in size
and code is difficult to distinguish from data. This project aims to build a
binary analysis and instrumentation framework that works with Win32/x86
binaries accurately and efficiently.
There are three components in BIRD's runtime engine.
BIRD takes control at indirect branches by replacing them with a jump to a special function check(), which is the core of BIRD's run-time engine and performs the following functions:
The BIRD infrastructure provides a foundation on which many security-related applications can be built. We have successfully built multiple systems on top of it.
FOOD (Foreign Code Detection)
There are many different ways to inject foreign code into a running application through control hijacking. Using BIRD, it is possible to instrument a binary so that every control transfer instruction is inspected and any foreign code is detected. We leverage BIRD to detect such injections at return addresses, function pointers, and indirect branches, and to prohibit any modification of the import address tables in DLLs. The overhead is within reasonable limits, typically on the order of 10 to 25%.
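One way to picture the check made at each control transfer is a range lookup: the target address is accepted only if it falls inside a region known to hold legitimate code. The sketch below is an illustration under that assumption, not FOOD's actual implementation; the function name `is_known_code` and the region list are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical representation: sorted, non-overlapping [start, end) address
# ranges that are known to contain legitimate code (e.g. loaded code sections).
CodeRegions = List[Tuple[int, int]]


def is_known_code(target: int, regions: CodeRegions) -> bool:
    """Return True if target lies inside one of the known code regions."""
    starts = [start for start, _ in regions]
    # Index of the last region whose start is <= target, or -1 if none.
    i = bisect_right(starts, target) - 1
    return i >= 0 and target < regions[i][1]


# Illustrative addresses only: one range for an executable, one for a DLL.
regions = [(0x00401000, 0x00402000), (0x10001000, 0x10003000)]
```

With these regions, a branch target such as `0x00401ABC` would be accepted, while a target on the stack or heap, for example `0x0012FF00`, would be flagged as foreign code.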
BASS (Binary Application Specific Sandboxing)
BASS compares the system call sequence of a network application against a
sandboxing policy to detect control-hijacking attacks, in which the attacker
exploits software vulnerabilities such as buffer overflows to take control of
a victim application and possibly the underlying machine. Built on top of BIRD,
BASS can automatically extract a highly accurate
application-specific sandboxing policy from a Win32/x86 binary and enforce the extracted policy at run time with low overhead.
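The text does not describe how BASS represents its policy. As a minimal sketch, assume the policy is a set of allowed (previous call, next call) transitions and that enforcement means rejecting the first transition outside that set. The names `Policy`, `first_violation`, and the `<start>` marker are hypothetical, and the system call names are used only as illustrative labels.

```python
from typing import Iterable, Optional, Set, Tuple

# Assumed policy form: the set of permitted (previous, next) system call pairs.
Policy = Set[Tuple[str, str]]

START = "<start>"  # hypothetical marker for "no system call made yet"


def first_violation(calls: Iterable[str], policy: Policy) -> Optional[int]:
    """Return the index of the first call that breaks the policy, or None."""
    prev = START
    for i, call in enumerate(calls):
        if (prev, call) not in policy:
            return i
        prev = call
    return None


# Illustrative policy: open a file, read it any number of times, then close it.
policy: Policy = {
    (START, "NtOpenFile"),
    ("NtOpenFile", "NtReadFile"),
    ("NtReadFile", "NtReadFile"),
    ("NtReadFile", "NtClose"),
}
```

Under this assumption, a hijacked application that issues a call the policy never allows after `NtReadFile` is stopped at that call, which is the point where a real enforcer would terminate or block the process.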
On Win32, many distributed binaries are also packed. A packed binary is one whose original code has been compressed or encrypted by a packer to make static analysis difficult. We are currently extending BIRD to handle these packed binaries. Every packed binary must eventually unpack itself into its original form at run time and transfer execution to the unpacked code. This procedure involves both memory writes and code execution on the section pages of the in-memory image. BIRD starts by marking all pages with the read-only protection attribute, and then captures the write or execution exceptions raised when the code tries to write to or execute those pages. After each round of exceptions, BIRD calculates the entropy of the pages that have been rewritten. If the entropy is below a certain threshold, BIRD treats the binary as fully unpacked and then starts the regular disassembly process. In this way, BIRD should also be able to handle a binary packed with multiple layers of unpackers.
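The entropy test can be illustrated with byte-level Shannon entropy, which is a common choice for this kind of check; the text does not say which entropy measure or threshold BIRD uses, so both are assumptions here. Compressed or encrypted data tends toward 8 bits per byte, while ordinary machine code is noticeably lower. The names `shannon_entropy`, `looks_unpacked`, and the threshold value are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical threshold, chosen only for illustration.
UNPACKED_THRESHOLD = 7.0


def looks_unpacked(page: bytes, threshold: float = UNPACKED_THRESHOLD) -> bool:
    """Treat a rewritten page as unpacked code if its entropy is below threshold."""
    return shannon_entropy(page) < threshold
```

In a multilayer packer, intermediate layers would still look like high-entropy data after being written, so a check of this kind keeps waiting until a round of writes produces pages that resemble ordinary code.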
In general, achieving high disassembly accuracy requires more conservative disassembly strategies,
which imply lower disassembly coverage and thus higher run-time overhead. To attain high disassembly
accuracy while minimizing the run-time overhead, we propose using a speculative disassembly technique.
When BIRD's static disassembler disassembles an input binary, it works in a very
conservative way, so it may classify only a small portion of the code as known area (KA). However, instead of leaving the
remaining parts, the unknown areas (UAs), entirely to the dynamic component, BIRD still tries to disassemble as much of them as it can. It records all
of the disassembled results even when it is not sure whether they are correct.
At run time, when BIRD's dynamic disassembler is invoked because an indirect branch instruction
jumps into an unknown area (UA), it first checks whether the speculative disassembly result also indicates that the branch's target
address starts an instruction. If so, the dynamic disassembler simply reuses the corresponding portion of
the speculative disassembly result without performing any disassembly; otherwise it must disassemble
the UA on its own.
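The run-time decision described above can be sketched as a lookup followed by a fallback. This is an illustration only: the data layout (a map from speculated instruction start addresses to instruction lengths) and the names `SpeculativeResult` and `resolve_target` are assumptions, and the real dynamic disassembler is stood in for by a caller-supplied function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SpeculativeResult:
    # Assumed layout: address that the static pass guessed starts an
    # instruction -> length of that instruction in bytes (always > 0).
    instr_len: Dict[int, int] = field(default_factory=dict)


def resolve_target(
    target: int,
    spec: SpeculativeResult,
    disassemble_from: Callable[[int], List[int]],
) -> Tuple[str, List[int]]:
    """Return how the target was resolved and the instruction addresses found."""
    if target in spec.instr_len:
        # The speculative result agrees that an instruction starts here:
        # reuse its consecutive instructions instead of disassembling again.
        addrs: List[int] = []
        addr = target
        while addr in spec.instr_len:
            addrs.append(addr)
            addr += spec.instr_len[addr]
        return "speculative", addrs
    # The target falls in the middle of a speculated instruction or in bytes
    # that were never speculated: fall back to disassembling at run time.
    return "dynamic", disassemble_from(target)
```

A branch that lands on a speculated instruction boundary pays only for the lookup, while a branch that lands elsewhere shows the speculation was wrong for that region and triggers the slower path.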