Project Description:
Control-hijacking attacks exploit vulnerabilities in software programs to take control of victim applications
and, eventually, their underlying machines. Existing control-hijacking attacks overwrite control-sensitive
data structures in the victim applications by exploiting one of three types of vulnerabilities:
buffer overflow, integer overflow, and format string vulnerabilities. Of these three types, integer overflow
vulnerabilities receive the least attention. An integer overflow vulnerability occurs when an integer operation
loses information because integers are represented with a finite number of bits. The following listing shows an integer overflow vulnerability in Pine 4.55,
a popular program for reading, sending, and managing email. In the function rfc2231_get_param,
the integer n is computed from a network packet, as shown at line 6. By supplying a negative number, the
attacker bypasses the program’s upper-bound check at line 9. When n is subsequently used as an
index into the local buffer pieces, which is an array of pointers, n is implicitly multiplied by 4, and in 32-bit
arithmetic the product wraps around to a small positive number. By carefully choosing the value of n, the attacker can write to any
4-byte word in memory.
[Code listing: function rfc2231_get_param in Pine 4.55, lines 1-10; only the line "#define RFC2231_MAX 64" is recoverable]
In general, an integer overflow alone cannot do much harm. However, it can be used as a stepping stone to mount a buffer overflow or a denial-of-service attack. Attackers have exploited integer overflow vulnerabilities to overflow a buffer or array, to manipulate an array index, to fool the program into allocating a very large buffer (e.g., through arguments to malloc), and to access unallocated memory (e.g., through arguments to memcpy). The latter two typically trigger segmentation faults and program crashes, leading to denial of service.
This project aims to enforce integer safety in arbitrary C programs. In the most general sense, enforcing integer safety means ensuring that program execution is free of unexpected integer operation results; in other words, every result should equal the one obtained with mathematical (unbounded) integers. Specifically, we deal with three kinds of integer security violations: integer overflow, truncation with information loss, and signed/unsigned conversion that leads to misinterpretation of the most significant bit. By detecting these three kinds of violations at run time, we catch all integer operations that produce unexpected results and thus prevent attackers from exploiting them.
Approach:
We build an integer overflow-preventing (IOP) compiler that adds checks to application programs to detect arithmetic overflow, truncation with information loss, and signed/unsigned conversion that leads to misinterpretation. When an integer error is detected, an exception handler is invoked and supplied with information about the error, including its type and the location of the triggering statement (source file name and line number). The handler then takes whatever actions the user configured beforehand.
For performance, arithmetic overflow/underflow detection leverages the IA-32 conditional jump instructions "jo offset" and "jc offset", which branch if the Overflow Flag or the Carry Flag, respectively, was set by a preceding arithmetic instruction such as "add", "sub", or "mul". Because these branches are almost never taken, branch prediction in modern microprocessors makes them nearly as cheap as a "nop". The other types of violations have no dedicated hardware support, so they are detected with software checks.
Implementation:
Currently, the prototype is implemented as an extension to the GNU GCC compiler, version 3.4.1. The compiler can be partitioned into a front end, a middle end, and a back end. The front end parses the source code and converts it into an abstract syntax tree (AST) representation, in which all language-specific and source-level information is available. The middle end converts the AST into the RTL (Register Transfer Language) representation and performs various optimizations. Finally, the back end generates assembly code for the target architecture. Internally, there are two major data structures: tree nodes and rtx nodes. Trees are built up to represent complex expressions and statements, while rtx nodes are chained together to form an instruction sequence.
All three parts of the compiler are extended with instrumentation logic. When a C program is compiled, the front end first applies certain transformations to the tree structure, the middle end then inserts extra code into the instruction sequence in rtx form, and finally the back end makes several adjustments to the output assembly program.
For instrumentation, we insert checking code, in the form of rtx sequences, into the instruction chain. We also enforce the integer promotion rule, which GCC 3.4.1 does not strictly observe, so that operands of integer types narrower than int are promoted before an operation is performed on them. Beyond the integer promotions of the ISO C99 specification, our prototype employs a special promotion rule for mixed-sign operands, which we call the Signed/Unsigned Promotion Rule: when the operands have mixed signedness and the unsigned operand is at least as wide as the signed operand, both are converted to the shortest signed integer type whose rank is higher than that of the unsigned operand. The intuition behind this rule is that when an operation requires its operands to have the same type, the chosen type must subsume the ranges of the operands' original types. Finally, several GCC optimizations may introduce integer violations or interfere with the correct setting of the Carry and Overflow flags; we either work around or disable each of these optimizations.
Evaluation:
We first evaluated the performance overhead of the integer overflow-preventing compiler using four applications. The overheads are 8.18% for Apache HTTP Server 2.2.0, 4.29% for Samba 2.2.7a, 3.59% for ProFTPD 1.2.10, and 1.77% for Gzip 1.2.4. Contrary to intuition, despite the large number of integer operations in these programs, the overhead is low, which indicates that the instrumentation code is efficient.
Among the integer violations that the IOP prototype detects, some are exploitable and others are benign. We discovered 14 benign integer overflows in Samba 2.2.7a, 3 in Gzip 1.2.4, 7 in Apache HTTP Server 2.2.0, 2 in ProFTPD 1.2.10, and 6 in Pine 4.55. By carefully analyzing each of these benign integer violations (false positives), we found that they fall into the following four types:
Random Number Generation: Programs sometimes need to generate numbers that appear random and do not care whether generating them triggers integer violations, as long as the process itself is deterministic.
Message Encode/Decode: During marshalling and unmarshalling of messages exchanged between client and server, variables of arbitrary types need to be converted into sequences of characters. In these conversions, an integer violation is acceptable as long as no information is lost.
Mixed usage of signed and unsigned char: There are many assignments between signed and unsigned char, because char is usually used to encode characters and its numeric value is usually not interpreted.
Integer as ID: An ID usually consists of a sequence of bits or digits, and is used to identify a particular element in a set. When an integer is used as an ID, only the bit sequence matters.
To evaluate the false negative rate, we tested our compiler against several known integer vulnerabilities. The programs instrumented by the IOP compiler captured all of these integer violations except the signed/unsigned conversion violation in GNU Radius. This miss is caused by aliased pointers that point to the same memory object; handling it requires alias analysis, which the current IOP prototype does not support.
Publication:
Related Work: