Some simple tests were performed on REC 1.6 for Linux.
This test file is the same one used to test the 286 decompilers (dcc and Exe-2-C). It's pretty impressive that REC can handle this old code at all. Original C source:
int main() { int i, numtimes, number; unsigned value, fib(); printf("Input number of iterations: "); scanf ("%d", &numtimes); for (i = 1; i <= numtimes; i++) { printf ("Input number: "); scanf ("%d", &number); value = fib(number); printf("fibonacci(%d) = %u\n", number, value); } exit(0); } unsigned fib(x) /* compute fibonacci number recursively */ int x; { if (x > 2) return (fib(x - 1) + fib(x - 2)); else return (1); }
The disassembly for the fib
function is
/* Procedure: 0x0000025B - 0x0000028A * Argument size: 4 * Local size: 0 * Save regs size: 4 */ L0000025B(A4) /* unknown */ void A4; { if(A4 <= 2) { ax = 1; } else { (save)L0000025B(A4 - 1); dx = L0000025B(A4 + 65534); (restore)ax; ax = ax + dx; } }As with Exe-2-C, push and pop are explicit (as
save
and restore
), registers are still visible, and no attempt is made to join the semantics of individual instructions. Parameter type and return values are not present.
Here is the code for main
:
/* Procedure: 0x000001FA - 0x0000025A * Argument size: 0 * Local size: 4 * Save regs size: 8 */ L000001FA() { /* unknown */ void si; /* unknown */ void di; /* unknown */ void Vfffffffc; /* unknown */ void Vfffffffe; L00000E11(0x194); L0000169A(0x1b1, & Vfffffffc); for(si = 1; si <= Vfffffffc; si = si + 1) { L00000E11(0x1b4); L0000169A(0x1c3, & Vfffffffe); (save)L0000025B(Vfffffffe); L00000E11(0x1c6, Vfffffffe); } return(L000002C7(0)); }
It has not recognised the library functions printf
, scanf
, and exit
. It has recovered the for loop. There is no restore
to correspond to the save
(it has not realised that the save is actually passing a second parameter to printf
). String constants are not recovered.
Elf is a more modern binary file format, and it has some symbolic information (even in binary files "stripped" of their symbol tables). REC was able to find the name for functions fib
and main
, even when the symbols were stripped. Strangely, however, it failed to recover the string constants in the _un_stripped version.
Here is the original C source (main
is a little different to the 286 Fibo program):
#includeHere is the decompilation forint fib (int x) { if (x > 1) return (fib(x - 1) + fib(x - 2)); else return (x); } int main (void) { int number, value; printf ("Input number: "); scanf ("%d", &number); value = fib(number); printf("fibonacci(%d) = %d\n", number, value); return (0); }
fib
:
/* Procedure: 0x08048798 - 0x080487C8 * Argument size: 4 * Local size: 0 * Save regs size: 8 */ fib(A8) /* unknown */ void A8; { /* unknown */ void esi; if(A8 > 1) { (save)A8 - 1; esi = fib(); eax = fib(A8 - 2) + esi; } else { eax = A8; } }It correctly generated the actual parameter for the second (recursive) call to
fib
, but in the first one it generated a save
and no parameter. It correctly deals with the return value from the recursive calls to fib
, but fails to emit a return statement. Again, the number of parameters is right, but the type is wrong.
Here is REC's decompilation for main
(stripped version):
/* Procedure: 0x080487CC - 0x0804882C * Argument size: 0 * Local size: 4 * Save regs size: 8 */ main() { /* unknown */ void ebx; /* unknown */ void esi; /* unknown */ void Vfffffffc; (save)"Input number: "; printf(); (save) & Vfffffffc; (save)"%d"; scanf(); ebx = Vfffffffc; esp = esp + 12; if(ebx > 1) { esi = fib(ebx - 1); eax = fib(ebx - 2) + esi; } else { eax = ebx; } printf("fibonacci(%d) = %d\n", Vfffffffc, eax); esp = ebp - 12; return(0); }Library function names are recovered (it's much easier with dynamic linking). Here, the arguments to some of the library functions are passed correctly to the library functions, and not to others. Note that one level of
fib
has actually been inlined into main
by the compiler, and this time, it gets the parameters correct.
The same program as above but compiled for SPARC/Solaris (also using gcc) was decompiled, using the SPARC/Solaris version of REC. The results for fib are:
/* Procedure: 0x00010A9C - 0x00010ACF * Argument size: -112 * Local size: 112 * Save regs size: 0 */ fib() { r16 = r24; if(r16 > 1) { r24 = fib(r16 + -1); r24 = r24 + fib(r16 + -2); } return(r24); }It got the formal parameters right to the recursive calls correct, but not the formal parameter. This time it does return a value.
This is the result for main:
/* Procedure: 0x00010AD0 - 0x00010B2F * Argument size: -120 * Local size: 120 * Save regs size: 0 */ main() { /* unknown */ void Vffffffec; printf("Input number: "); r9 = & Vffffffec; scanf("%d"); r24 = Vffffffec; r10 = r24; if(r24 > 1) { r8 = r24 + -1; r16 = fib(); r8 = r24 + -2; r10 = r16 + fib(); } r9 = Vffffffec; printf("fibonacci(%d) = %d\n"); return(r24); }Here, it gets the first parameter to printf and scanf correct; the second parameters are simply assigned to r9. This time, the actual parameter to the calls to fib are missing.
The original C source code is
#includeA disassembly of the first part of main is as follows:int main(int argc) { switch(argc) { case 2: printf("Two!\n"); break; case 3: printf("Three!\n"); break; case 4: printf("Four!\n"); break; case 5: printf( "Five!\n"); break; case 6: printf( "Six!\n"); break; case 7: printf( "Seven!\n"); break; default: printf( "Other!\n"); break; } return 0; }
main() 1090c: 9d e3 bf a0 save %sp, -96, %sp 10910: 92 26 20 02 sub %i0, 2, %o1 10914: 11 00 00 42 sethi %hi(_ex_text0), %o0 10918: 90 02 20 f4 add %o0, 0xf4, %o0 1091c: 80 a2 60 05 cmp %o1, 5 10920: 18 80 00 23 bgu 0x109ac 10924: 93 2a 60 02 sll %o1, 2, %o1 10928: d2 02 40 08 ld [%o1 + %o0], %o1 1092c: 81 c2 40 08 jmp %o1 + %o0 10930: 01 00 00 00 nopThe decompilation of the first part of main is as follows:
/* Procedure: 0x0001090C - 0x000109BF * Argument size: -96 * Local size: 96 * Save regs size: 0 */ main() { r9 = 2 - r24; r8 = 0x108f4; r9 = r9 << 2; if(r9 <= 5) { r9 = *(r9 + 0x108f4); goto (r9 + 0x108f4); r8 = printf(0x10a38); } return(r24); r8 = printf("Three!\n"); } return(r24); r8 = printf("Four!\n"); ...This shows that the SPARC semantics are quite immature compared to the X86 semantics. For example, the first statement has the operands to the subtract around the wrong way. Worse, the compare against 5 must happen before the left shift by 2 (which multiplies the register by 4). Finally, the indirect jump is left as goto (r9 + ...). Note that one of the parameters to printf is not converted to a string, but the others are.
I thought that the "computed goto" might be because the table above is of offsets, not direct pointers to code. However, the same code compiled with gcc
(which does use straight pointers) is about the same:
if(r9 <= 5) { goto ( *(0x10a7c + (r9 << 2)));
The same source code as the above was compiled (also with gcc) for Pentium (Solaris/X86, though the compiler for Linux/X86 is very similar). The beginning of main disassembles as follows:
main() 80488f0: 55 pushl %ebp 80488f1: 8b ec movl %esp,%ebp 80488f3: 8b 45 08 movl 8(%ebp),%eax 80488f6: 05 fe ff ff ff addl $0xfffffffe,%eax 80488fb: 3d 05 00 00 00 cmpl $0x5,%eax 8048900: 76 76 jbe 0x76 <8048978> 8048902: 68 a0 9b 04 08 pushl $0x8049ba0 8048907: e8 c8 fe ff ff call 0xfffffec8 ... 8048978: ff 24 85 e8 89 04 08 jmp *134515176(,%eax,4)
The decompiled output for the start of main is
/* DEST BLOCK NOT FOUND: 08048978 -> 080487d4 */ /* Procedure: 0x080488F0 - 0x0804898F * Argument size: 4 * Local size: 0 * Save regs size: 0 */ main(A8) /* unknown */ void A8; { eax = A8 + -2; if(eax > 5) { printf(0x8049ba0); return(0); printf(0x8049b58); esp = ebp; return(0); printf(0x8049b64); esp = ebp; return(0); printf(0x8049b70); esp = ebp; ... } goto *(eax * 4 + 0x80489e8)[L08048914, L08048928, L0804893c, L08048950, L08048964, L0804897f, ]goto ( *(eax * 4 + 0x80489e8));Again, the indirect jump becomes a sort of "computed goto", but this time the possible destination labels are given. Unfortunately, the labels are missing from the code, and it would have been nice to see an actual switch statement.