TY - GEN
T1 - Effective code discovery for ARM/Thumb mixed ISA binaries in a static binary translator
AU - Chen, Jiunn Yeu
AU - Shen, Bor Yeh
AU - Ou, Quan Huei
AU - Yang, Wuu
AU - Hsu, Wei Chung
PY - 2013
Y1 - 2013
N2 - Code discovery has been a major challenge for static binary translation, especially when the source ISA (Instruction Set Architecture) has variable-length instructions, as the x86 architecture does. Because the code section may contain embedded data such as PC-relative data, jump tables, or padding, a binary translator may be misled into translating data as instructions. With variable-length instructions, once data is mistranslated as instructions, the decoding of subsequent instructions may also be wrong. This paper concerns static binary translation for the ARM architecture, which dominates the embedded-system market. Although ARM is considered a RISC (Reduced Instruction Set Computing) architecture in many respects, it allows 32-bit instructions (ARM) to be mixed with 16-bit instructions (Thumb) in ARM/Thumb mixed executables. Since ARM and Thumb instructions differ in length, instructions may start at 4-byte-aligned or 2-byte-aligned addresses, respectively. Furthermore, because ARM and Thumb instructions share encoding space, a 4-byte word can be decoded either as one ARM instruction or as two Thumb instructions. The correct decoding of such a word is determined only at run time, by the least significant bit of the program counter. For unstripped binaries, mapping symbols can be used to identify ARM code regions and Thumb code regions. For stripped binaries, however, such mapping symbols are not available to assist translation. We propose a novel solution for statically translating stripped executables for the ARM/Thumb mixed ISA. Our static binary translator includes a translation pass that guarantees the correctness of the translated executable by generating multiple versions of translated code for runtime selection. The translator also includes a series of optimization analyses that discover and remove most of the code generated in the baseline translation. On the SPEC2006 benchmark suite, stripped ARM/Thumb mixed binaries translated by our static binary translator achieve good performance with only a 25% increase in code size.
KW - Code discovery problem
KW - Reverse engineering
KW - Static binary translation
UR - http://www.scopus.com/inward/record.url?scp=84892662668&partnerID=8YFLogxK
U2 - 10.1109/CASES.2013.6662525
DO - 10.1109/CASES.2013.6662525
M3 - Conference contribution
AN - SCOPUS:84892662668
SN - 9781479914005
T3 - 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, CASES 2013
BT - 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, CASES 2013
PB - IEEE Computer Society
T2 - 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, CASES 2013
Y2 - 29 September 2013 through 4 October 2013
ER -