Run-time machine code generation and execution

Some Common Lisp implementations (and, I believe, implementations of other languages too, though I don’t know of any) are incremental compilers: they can read Lisp code while the program is running, compile it, and make it available for execution immediately, without restarting the program.

I find that quite amazing, so I decided to try something along those lines myself. The following code contains a function, makefunction, that builds at run time the x86 machine code (opcodes) for a tiny function which returns the integer passed to makefunction. After calling makefunction, the returned pointer can be converted to a function pointer and called.

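#include <stdio.h>
#include <stdlib.h>

/* Builds, in a heap buffer, the x86 machine code for a function
   that returns ret:  mov $ret, %eax ; ret */
char* makefunction(int ret)
{
    char* opcodes = malloc(6);
    if (opcodes == NULL)
        return NULL;
    opcodes[0] = 0xb8;                /* mov $imm32, %eax */
    *((int*)(opcodes + 1)) = ret;     /* the 4-byte immediate value */
    opcodes[5] = 0xc3;                /* ret */
    return opcodes;
}

int main(void)
{
    char* code = makefunction(20);
    if (code == NULL)
        return 1;
    /* Treat the bytes as a function. Converting a data pointer to a
       function pointer is not portable C, and on systems that enforce
       NX the heap may not be executable. */
    int (*f)() = (int(*)()) code;
    int a = f();
    printf("%d\n", a);
    free(code);
    return 0;
}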

On an x86 machine, that should print “20” to the console.

What’s happening here? Here’s a function that returns 2 when called:

int f(void)
{
    return 2;
}

This is the output of objdump -d after compiling it:

f.o:     file format elf32-i386

Disassembly of section .text:

00000000 <f>:
   0:	55                   	push   %ebp
   1:	89 e5                	mov    %esp,%ebp
   3:	b8 02 00 00 00       	mov    $0x2,%eax
   8:	5d                   	pop    %ebp
   9:	c3                   	ret

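The first, second and fourth instructions (push, mov and pop) are the function’s prologue and epilogue: they save the caller’s frame pointer (%ebp), set up a new one from the stack pointer, and restore the old one before returning. The compiler emitted this stack-frame bookkeeping here, but it isn’t needed to return a value (optimizing builds often omit it). So what really accomplishes the task of returning 2 is the following opcode sequence: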

b8 02 00 00 00 c3

By checking an x86 opcode reference, I verified that b8 is the opcode for moving a 32-bit immediate value into the EAX register, which is where x86 calling conventions return integer values. The next 4 bytes are the value itself, stored in little-endian order (least significant byte first), so 02 00 00 00 means 2. c3 is the opcode for ret, which returns from a function call (Captain Obvious, 2010).

Now let’s see what makefunction does:

  1. Allocates 6 bytes for the machine code (two one-byte opcodes plus a 4-byte value):
    char* opcodes = malloc(6);
  2. Writes the opcode for moving a value into the EAX register:
    opcodes[0] = 0xb8;
  3. Writes the actual value that should be moved into EAX:
    *((int*)(opcodes + 1)) = ret;
  4. Writes the opcode for the ret instruction:
    opcodes[5] = 0xc3;
  5. Returns the array of opcodes:
    return opcodes;

Back in main, the pointer to the opcode array is converted to a function pointer of the appropriate type:

int (*f)() = (int(*)()) code;

And then called like any other function:

int a = f();

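This compiles to a call instruction, which pushes the return address (the value of the instruction pointer, EIP, pointing just past the call) onto the stack and jumps to the specified address. In this case, that address is the start of the opcode array, which contains valid instructions that the processor executes: mov puts the value into EAX, and ret pops the saved address so that execution resumes in main, where EAX is read as the return value.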

The difference from a regular function call is that the instructions are fetched and executed from heap memory returned by malloc (a data area) instead of the program’s code section. I’m almost sure this will crash on processors with the NX bit enabled, since operating systems that use NX normally mark heap pages as non-executable, but I haven’t checked it. Anyway, that’s an extremely simple example of how machine code can be generated at run time and executed right away.

This entry was posted in C, English, Programming.
