
Run time machine code generation and execution

Some Common Lisp implementations are incremental compilers (I believe implementations of other languages are too, but I don’t know of any). That means they can read Lisp code while the program is running, compile it, and make it immediately available for execution, without requiring the program to be restarted.

I find that quite amazing, so I decided to try it out for myself. In the following code, makefunction takes an integer and generates an array of x86 opcodes for a new function that returns that integer. The pointer it returns can then be converted to a function pointer and called.

#include <stdio.h>
#include <stdlib.h>
 
char* makefunction(int ret)
{
    char* opcodes = malloc(6);
    opcodes[0] = 0xb8;
    *((int*)(opcodes + 1)) = ret;
    opcodes[5] = 0xc3;
    return opcodes;
}
 
int main()
{
    char* code = makefunction(20);
    int (*f)() = (int(*)()) code;
    int a = f();
    printf("%d\n", a);
    return 0;
}

That should print “20” to the console.

What’s happening here? Here’s a function that returns 2 when called:

int f(void)
{
    return 2;
}

This is the output of objdump -d after compiling it:

f.o:     file format elf32-i386

Disassembly of section .text:

00000000 <f>:
   0:	55                   	push   %ebp
   1:	89 e5                	mov    %esp,%ebp
   3:	b8 02 00 00 00       	mov    $0x2,%eax
   8:	5d                   	pop    %ebp
   9:	c3                   	ret

The first two instructions and the fourth (push, mov and pop) are the stack frame setup and teardown code that the compiler generates around the function body. So what really accomplishes the task of returning 2 is the following byte sequence:

b8 02 00 00 00 c3

By checking this reference, I verified that b8 is the opcode for moving a 32-bit immediate value into the EAX register, which is commonly used to hold function return values. The next 4 bytes are that value, stored least significant byte first (02 00 00 00 is 2). c3 is the opcode for ret, which returns from a function call (Captain Obvious, 2010).
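To make the byte layout concrete, here is a minimal sketch that builds the same six-byte sequence for an arbitrary value without executing it. The helper name encode_return_const is hypothetical (it does not appear in the original code), and the immediate is written byte by byte so the result does not depend on the host's byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the original post: writes the x86
   encoding of "mov $value, %eax; ret" into buf, which must hold at
   least 6 bytes. Returns the number of bytes written. */
size_t encode_return_const(unsigned char *buf, int32_t value)
{
    uint32_t imm = (uint32_t)value;
    int i;

    buf[0] = 0xb8;                              /* mov imm32, %eax */
    for (i = 0; i < 4; i++)                     /* immediate, least significant byte first */
        buf[1 + i] = (unsigned char)((imm >> (8 * i)) & 0xff);
    buf[5] = 0xc3;                              /* ret */
    return 6;
}
```

For the value 2 this fills the buffer with b8 02 00 00 00 c3, the same bytes objdump showed for the body of f.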

Now let’s see what makefunction does:

  1. Allocates 6 bytes of memory, enough for the two instructions:
    char* opcodes = malloc(6);
  2. Writes the opcode for moving a value into the EAX register into the array:
    opcodes[0] = 0xb8;
  3. Writes the actual value that should be moved into EAX:
    *((int*)(opcodes + 1)) = ret;
  4. Writes the opcode for the ret instruction:
    opcodes[5] = 0xc3;
  5. Returns the array of opcodes:
    return opcodes;

Back in main, the pointer to the opcode array is converted to a function pointer of the appropriate type:

int (*f)() = (int(*)()) code;

And then called as any other function:

int a = f();

This generates a call instruction, which pushes the return address (the address of the instruction that follows the call) onto the stack and jumps to the specified address. In this case, that address is the opcode array’s address, which contains valid instructions that will be executed by the processor. When the processor reaches ret, it pops the return address and execution continues in main.

The difference from a regular function call is that the code is being fetched and executed from heap memory returned by malloc, instead of from the code section of the program. Where the NX bit is enabled and enforced by the operating system, heap memory is normally not executable, so I’m almost sure the call will crash there, but I haven’t checked it. Anyway, that’s an extremely simple example of how it is possible to generate machine code at run time and execute it right away.

Using feof() and fread()

When reading single bytes from a file in C, one must pay attention to the correct usage of feof() and fread(). At first, the following piece of code seems to work correctly:

const char *filename = "hello";
unsigned char byte;
FILE *fp;
 
fp = fopen(filename, "rb");
 
if (!fp) {
    printf("could not open file\n");
    return 1;
}
 
while(!feof(fp)) {
    fread(&byte, 1, 1, fp);
    printf("%02x\n",byte);
}
 
fclose(fp);

Suppose the file “hello” has the following contents:

0000000: 68 65 6c 6c 6f 0a                                hello.

(which is the string “hello” followed by an LF)

When the code above is run, the following output is produced:

68
65
6c
6c
6f
0a
0a

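Notice the last byte seems to be read twice. It isn’t: feof() only returns true after a read has already been attempted past the end of the file. After the sixth byte is read, feof() is still false, so the loop runs once more; that fread() reads nothing and leaves byte unchanged, and the stale 0a is printed again. In order to fix this “read-twice” behavior, the return value of fread() must be checked: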

if(!fread(&byte, 1, 1, fp)) {
    break;
}

Note: Once the return value of fread() is checked, using feof() as the while condition is redundant. In this situation, one could simply use while(1) and the behavior would be the same.
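Taking that note one step further, the fread() call itself can serve as the loop condition. The following is a minimal sketch of that idea; the function name dump_hex and its two-stream signature are assumptions made so the loop can be shown in isolation.

```c
#include <stdio.h>

/* Hypothetical helper: prints every byte of `in` as two hex digits, one
   per line, to `out`. Returns the number of bytes read. */
size_t dump_hex(FILE *in, FILE *out)
{
    unsigned char byte;
    size_t count = 0;

    /* fread() returns the number of items actually read: 1 while there is
       data, 0 at end of file or on a read error */
    while (fread(&byte, 1, 1, in) == 1) {
        fprintf(out, "%02x\n", byte);
        count++;
    }
    return count;
}
```

It could be called as dump_hex(fp, stdout) after the fopen() check; for the “hello” file it prints six lines and returns 6.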

Update: A much better solution was given by my friend Bryan:

const char *filename = "hello";
unsigned char byte;
FILE *fp;
 
fp = fopen(filename, "rb");
 
if (!fp) {
    printf("could not open file\n");
    return 1;
}
 
fread(&byte, 1, 1, fp);
while(!feof(fp)) {
    printf("%02x\n",byte);
    fread(&byte, 1, 1, fp);
}
 
fclose(fp);
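One caveat about loops that rely only on feof(): if a read fails because of an I/O error rather than end of file, feof() stays false and the loop never terminates. A possible alternative, sketched here as an assumption and not as part of Bryan's solution, reads with fgetc() and uses ferror() afterwards to tell the two cases apart. The name dump_hex_checked is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: prints every byte of `in` as two hex digits, one
   per line, to `out`. Returns 0 if the end of the file was reached and
   -1 if reading stopped because of an error. */
int dump_hex_checked(FILE *in, FILE *out)
{
    int c;  /* int rather than char, so EOF differs from every byte value */

    while ((c = fgetc(in)) != EOF)
        fprintf(out, "%02x\n", (unsigned)c);

    /* fgetc() returns EOF both at end of file and on error */
    return ferror(in) ? -1 : 0;
}
```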