Decompiler Rec Test

Program-Transformation.Org: The Program Transformation Wiki

Reverse Engineering Compiler (REC) Tests

Some simple tests were performed on REC 1.6 for Linux.

Fibo/286

This test file is the same one used to test the 286 decompilers (dcc and Exe-2-C). It's pretty impressive that REC can handle this old code at all. Original C source:

int main()
{ int i, numtimes, number;
  unsigned value, fib();

    printf("Input number of iterations: ");
    scanf ("%d", &numtimes);
    for (i = 1; i <= numtimes; i++)
    {
        printf ("Input number: ");
        scanf ("%d", &number);
        value = fib(number);
        printf("fibonacci(%d) = %u\n", number, value);
    }
    exit(0);
}

unsigned fib(x)                 /* compute fibonacci number recursively */
int x;
{
    if (x > 2)
        return (fib(x - 1) + fib(x - 2));
    else
        return (1);
}

The disassembly for the fib function is

/*      Procedure: 0x0000025B - 0x0000028A
 *      Argument size: 4
 *      Local size: 0
 *      Save regs size: 4
 */

L0000025B(A4)
/* unknown */ void  A4;
{
    if(A4 <= 2) {
        ax = 1;
    } else {
        (save)L0000025B(A4 - 1);
        dx = L0000025B(A4 + 65534);
        (restore)ax;
        ax = ax + dx;
    }
}
As with Exe-2-C, push and pop are explicit (as save and restore), registers are still visible, and no attempt is made to join the semantics of individual instructions. Parameter type and return values are not present.

Here is the code for main:

/*      Procedure: 0x000001FA - 0x0000025A
 *      Argument size: 0
 *      Local size: 4
 *      Save regs size: 8
 */

L000001FA()
{
        /* unknown */ void  si;
        /* unknown */ void  di;
        /* unknown */ void  Vfffffffc;
        /* unknown */ void  Vfffffffe;



    L00000E11(0x194);
    L0000169A(0x1b1, & Vfffffffc);
    for(si = 1; si <= Vfffffffc; si = si + 1) {
        L00000E11(0x1b4);
        L0000169A(0x1c3, & Vfffffffe);
        (save)L0000025B(Vfffffffe);
        L00000E11(0x1c6, Vfffffffe);
    }
    return(L000002C7(0));
}

It has not recognised the library functions printf, scanf, and exit. It has recovered the for loop. There is no restore to correspond to the save (it has not realised that the save is actually passing a second parameter to printf). String constants are not recovered.

Fibo/Elf

Elf is a more modern binary file format, and it has some symbolic information (even in binary files "stripped" of their symbol tables). REC was able to find the name for functions fib and main, even when the symbols were stripped. Strangely, however, it failed to recover the string constants in the _un_stripped version. Here is the original C source (main is a little different to the 286 Fibo program):

#include 

int fib (int x)
{
        if (x > 1)
                return (fib(x - 1) + fib(x - 2));
        else return (x);
}

int main (void)
{       int number, value;

        printf ("Input number: ");
        scanf ("%d", &number);
        value = fib(number);
        printf("fibonacci(%d) = %d\n", number, value);
        return (0);
}
Here is the decompilation for fib:
/*      Procedure: 0x08048798 - 0x080487C8
 *      Argument size: 4
 *      Local size: 0
 *      Save regs size: 8
 */

fib(A8)
/* unknown */ void  A8;
{
        /* unknown */ void  esi;
    if(A8 > 1) {
        (save)A8 - 1;
        esi = fib();
        eax = fib(A8 - 2) + esi;
    } else {
        eax = A8;
    }
}
It correctly generated the actual parameter for the second (recursive) call to fib, but in the first one it generated a save and no parameter. It correctly deals with the return value from the recursive calls to fib, but fails to emit a return statement. Again, the number of parameters is right, but the type is wrong.

Here is REC's decompilation for main (stripped version):

/*      Procedure: 0x080487CC - 0x0804882C
 *      Argument size: 0
 *      Local size: 4
 *      Save regs size: 8
 */

main()
{
        /* unknown */ void  ebx;
        /* unknown */ void  esi;
        /* unknown */ void  Vfffffffc;



    (save)"Input number: ";
    printf();
    (save) & Vfffffffc;
    (save)"%d";
    scanf();
    ebx = Vfffffffc;
    esp = esp + 12;
    if(ebx > 1) {
        esi = fib(ebx - 1);
        eax = fib(ebx - 2) + esi;
    } else {
        eax = ebx;
    }
    printf("fibonacci(%d) = %d\n", Vfffffffc, eax);
    esp = ebp - 12;
    return(0);
}
Library function names are recovered (it's much easier with dynamic linking). Here, the arguments to some of the library functions are passed correctly to the library functions, and not to others. Note that one level of fib has actually been inlined into main by the compiler, and this time, it gets the parameters correct.

Fibo/SPARC

The same program as above but compiled for SPARC/Solaris (also using gcc) was decompiled, using the SPARC/Solaris version of REC. The results for fib are:

/*      Procedure: 0x00010A9C - 0x00010ACF
 *      Argument size: -112
 *      Local size: 112
 *      Save regs size: 0
 */

fib()
{
    r16 = r24;
    if(r16 > 1) {
        r24 = fib(r16 + -1);
        r24 = r24 + fib(r16 + -2);
    }
    return(r24);
}
It got the formal parameters right to the recursive calls correct, but not the formal parameter. This time it does return a value.

This is the result for main:

/*      Procedure: 0x00010AD0 - 0x00010B2F
 *      Argument size: -120
 *      Local size: 120
 *      Save regs size: 0
 */

main()
{
        /* unknown */ void  Vffffffec;
    printf("Input number: ");
    r9 = & Vffffffec;
    scanf("%d");
    r24 = Vffffffec;
    r10 = r24;
    if(r24 > 1) {
        r8 = r24 + -1;
        r16 = fib();
        r8 = r24 + -2;
        r10 = r16 + fib();
    }
    r9 = Vffffffec;
    printf("fibonacci(%d) = %d\n");
    return(r24);
}
Here, it gets the first parameter to printf and scanf correct; the second parameters are simply assigned to r9. This time, the actual parameter to the calls to fib are missing.

Switch/SPARC

This program tests the recognition of switch statements.

The original C source code is

#include 
int main(int argc)
{
    switch(argc)
    {
        case 2:
            printf("Two!\n"); break;
        case 3:
            printf("Three!\n"); break;
        case 4:
            printf("Four!\n"); break;
        case 5:
            printf( "Five!\n"); break;
        case 6:
            printf( "Six!\n"); break;
        case 7:
            printf( "Seven!\n"); break;
        default:
            printf( "Other!\n"); break;
    }
    return 0;
}
A disassembly of the first part of main is as follows:
main()
        1090c:  9d e3 bf a0        save         %sp, -96, %sp
        10910:  92 26 20 02        sub          %i0, 2, %o1
        10914:  11 00 00 42        sethi        %hi(_ex_text0), %o0
        10918:  90 02 20 f4        add          %o0, 0xf4, %o0
        1091c:  80 a2 60 05        cmp          %o1, 5
        10920:  18 80 00 23        bgu          0x109ac
        10924:  93 2a 60 02        sll          %o1, 2, %o1
        10928:  d2 02 40 08        ld           [%o1 + %o0], %o1
        1092c:  81 c2 40 08        jmp          %o1 + %o0
        10930:  01 00 00 00        nop          
The decompilation of the first part of main is as follows:
/*      Procedure: 0x0001090C - 0x000109BF
 *      Argument size: -96
 *      Local size: 96
 *      Save regs size: 0
 */

main()
{
    r9 = 2 - r24;
    r8 = 0x108f4;
    r9 = r9 << 2;
    if(r9 <= 5) {
        r9 = *(r9 + 0x108f4);
        goto (r9 + 0x108f4);
        r8 = printf(0x10a38);
        }
        return(r24);
        r8 = printf("Three!\n");
        }
        return(r24);
        r8 = printf("Four!\n");
        ...
This shows that the SPARC semantics are quite immature compared to the X86 semantics. For example, the first statement has the operands to the subtract around the wrong way. Worse, the compare against 5 must happen before the left shift by 2 (which multiplies the register by 4). Finally, the indirect jump is left as goto (r9 + ...). Note that one of the parameters to printf is not converted to a string, but the others are.

I thought that the "computed goto" might be because the table above is of offsets, not direct pointers to code. However, the same code compiled with gcc (which does use straight pointers) is about the same:

    if(r9 <= 5) {
        goto ( *(0x10a7c + (r9 << 2)));

Switch/X86

The same source code as the above was compiled (also with gcc) for Pentium (Solaris/X86, though the compiler for Linux/X86 is very similar). The beginning of main disassembles as follows:

main() 80488f0: 55 pushl %ebp 80488f1: 8b ec movl %esp,%ebp 80488f3: 8b 45 08 movl 8(%ebp),%eax 80488f6: 05 fe ff ff ff addl $0xfffffffe,%eax 80488fb: 3d 05 00 00 00 cmpl $0x5,%eax 8048900: 76 76 jbe 0x76 <8048978> 8048902: 68 a0 9b 04 08 pushl $0x8049ba0 8048907: e8 c8 fe ff ff call 0xfffffec8 ... 8048978: ff 24 85 e8 89 04 08 jmp *134515176(,%eax,4)

The decompiled output for the start of main is

/* DEST BLOCK NOT FOUND: 08048978 -> 080487d4 */
/*      Procedure: 0x080488F0 - 0x0804898F
 *      Argument size: 4
 *      Local size: 0
 *      Save regs size: 0
 */

main(A8)
/* unknown */ void  A8;
{
    eax = A8 + -2;
    if(eax > 5) {
        printf(0x8049ba0);
        return(0);
        printf(0x8049b58);
        esp = ebp;
        return(0);
        printf(0x8049b64);
        esp = ebp;
        return(0);
        printf(0x8049b70);
        esp = ebp;
...
    }
    goto *(eax * 4 + 0x80489e8)[L08048914, L08048928, L0804893c, L08048950,
L08048964, L0804897f, ]goto ( *(eax * 4 + 0x80489e8));
Again, the indirect jump becomes a sort of "computed goto", but this time the possible destination labels are given. Unfortunately, the labels are missing from the code, and it would have been nice to see an actual switch statement.

CategoryDecompilation

Transform.DecompilerRecTest moved from Transform.DecompilationRecTest on 20 Mar 2003 - 03:06 by MikeVanEmmerik - put it back