Sphinx: Interpretation based Software Protection

 

Faculty: Tzi-cker Chiueh

Group Members:


System Overview

Most popular software protection tools today are based on binary encryption. A major problem with self-decrypting binaries is that the decryption logic and the decryption key are difficult to hide, because both must be distributed in plaintext form. To solve this problem, Sphinx takes an alternative approach to binary encryption: it translates an executable binary into an intermediate representation for a special virtual machine, similar in spirit to Java bytecode, and interprets the intermediate representation at run time. In this process, the plaintext version of the original binary never appears.

In spirit, the virtual machine used in this approach is much like a lightweight machine emulator. Like an emulator, it provides basic functionality such as simulating the memory layout and interpreting instructions. Depending on the implementation, it can be much simpler than a full-fledged emulator, since its goal is mainly to hide the original instructions rather than to simulate every aspect of a real machine.


Custom Compiler and Virtual Machine Interpretation

A simple implementation of Sphinx could use an instruction set that maps one-to-one onto x86 instructions. In this case no compiler is needed: each original instruction is replaced with its new counterpart, and the interpreter can simply borrow the semantics of the x86 instruction set. However, this scheme is not much better than encryption-based protection approaches.
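The one-to-one scheme can be illustrated with a minimal Python sketch. It is not the Sphinx implementation: it assumes a toy model in which every instruction is a single opcode byte (real x86 instructions are variable-length and carry operands), and the names make_opcode_map, translate, and invert are hypothetical.

```python
import random

def make_opcode_map(seed):
    """Build a hypothetical one-to-one substitution over all 256 byte values."""
    rng = random.Random(seed)
    table = list(range(256))
    rng.shuffle(table)  # a permutation, so the mapping is invertible
    return table

def translate(opcodes, table):
    """Replace each opcode byte with the byte the table assigns to it."""
    return bytes(table[b] for b in opcodes)

def invert(table):
    """Compute the reverse mapping, as an interpreter (or an attacker) would."""
    inverse = [0] * 256
    for original, virtual in enumerate(table):
        inverse[virtual] = original
    return inverse

table = make_opcode_map(seed=1234)
# Stand-in opcode bytes only; the first and third are deliberately identical.
original = bytes([0x90, 0x40, 0x90, 0xC3])
virtual = translate(original, table)
recovered = translate(virtual, invert(table))
```

Because equal original opcodes always map to equal virtual opcodes, the translated program keeps the structure and opcode frequencies of the original. Recovering the table therefore recovers the whole program, which is why this scheme offers little beyond what encryption does.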

To support a more complex virtual instruction set, we need a custom compiler as well as a corresponding virtual machine to interpret its output. Since we do not assume that users will protect all of their code with the interpretation-based scheme, we also need to make transitions between native mode and virtual machine mode seamless. This involves both context switches and shared memory references.
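The two concerns in the mode transition, a private execution context and memory shared with native code, can be sketched as follows. This is a toy model under stated assumptions, not the Sphinx design: the virtual instructions (load, addi, store, callnative), the single acc register, and the function names are all hypothetical, and a Python function stands in for unprotected native code.

```python
def native_double(memory, addr):
    """Stand-in for unprotected native code: it works directly on shared memory."""
    memory[addr] = (memory[addr] * 2) & 0xFF

def run_virtual(code, memory, native_table):
    """Interpret hypothetical virtual instructions against the shared memory."""
    regs = {"acc": 0}  # VM-private context, invisible to native code
    for instr in code:
        op = instr[0]
        if op == "load":
            regs["acc"] = memory[instr[1]]
        elif op == "addi":
            regs["acc"] = (regs["acc"] + instr[1]) & 0xFF
        elif op == "store":
            memory[instr[1]] = regs["acc"]
        elif op == "callnative":
            # Transition to native mode: the VM context stays saved in regs,
            # and the native routine communicates only through shared memory.
            native_table[instr[1]](memory, instr[2])
    return regs

memory = bytearray(16)  # address space shared by both modes
memory[0] = 5
code = [
    ("load", 0),                    # acc = memory[0]
    ("addi", 3),                    # acc = acc + 3
    ("store", 1),                   # memory[1] = acc
    ("callnative", "double", 1),    # native code doubles memory[1]
    ("load", 1),                    # the VM observes the native write
]
regs = run_virtual(code, memory, {"double": native_double})
```

The point of the sketch is that the interpreter keeps its own register state across the call, while both sides agree on a single memory image. A real implementation would additionally have to save and restore machine registers and the stack at each transition.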
 

Program-Specific Virtual Machine

Because the virtual machine interpreter is in plaintext form, it is essential that the virtual machine specification be kept secret. One way to hide the specification is to create a different virtual machine specification for each program to be protected and to use it only once. This makes it difficult for attackers to reuse the effort spent cracking one program instance when cracking another. Once the instruction set of a virtual machine is defined, Sphinx needs to translate the source program into that virtual machine's instruction set and to generate the corresponding virtual machine for run-time interpretation. The missing technology here is a way to programmatically generate "one-time" virtual machine instruction sets that differ substantially from instance to instance. To be effective and to reduce performance overhead, interpretation should be applied randomly and sporadically, so that it is difficult for the attacker to distinguish between instructions of the input program and those of the interpreter.
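A minimal sketch of the per-program idea is shown below. It assumes a toy stack machine with four operations and varies only the opcode numbering from one instance to the next, which is far weaker than the substantially different instruction sets the text calls for. The names generate_vm_spec, assemble, and interpret are hypothetical.

```python
import random

OPERATIONS = ["push", "add", "mul", "halt"]

def generate_vm_spec(seed):
    """Assign each operation a distinct, instance-specific opcode byte."""
    rng = random.Random(seed)
    codes = rng.sample(range(256), len(OPERATIONS))
    return dict(zip(OPERATIONS, codes))

def assemble(program, spec):
    """Translate a symbolic program into bytecode for one specific VM instance."""
    out = []
    for op, *args in program:
        out.append(spec[op])
        out.extend(args)  # operands are single bytes in this toy encoding
    return bytes(out)

def interpret(bytecode, spec):
    """Interpreter generated from the same spec that produced the bytecode."""
    decode = {code: op for op, code in spec.items()}
    stack, pc = [], 0
    while True:
        op = decode[bytecode[pc]]
        pc += 1
        if op == "push":
            stack.append(bytecode[pc])
            pc += 1
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "halt":
            return stack.pop()

# (2 + 3) * 4, expressed for the toy stack machine
program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",), ("halt",)]
spec_a = generate_vm_spec(seed=1)
spec_b = generate_vm_spec(seed=2)
result_a = interpret(assemble(program, spec_a), spec_a)
result_b = interpret(assemble(program, spec_b), spec_b)
```

Each specification yields its own bytecode and its own matching interpreter, and both instances compute the same result. Bytecode assembled for one instance is meaningless without that instance's specification, which is the property the one-time scheme relies on.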

Interpretation-based protection is a general technique that is superior to code encryption and decryption, but it has so far received little attention in the software protection literature.

Publications: