Wednesday, July 28, 2010

Exploring libjit

Continuing investigation into JIT compilation for virtual machines, today we will delve into some of the plumbing. libjit is a library for implementing JIT compilers in a platform independent way. A virtual machine calls functions within libjit to emit abstract, low level operations such as arithmetic computation or branches. libjit will handle translation to the native opcodes of the platform on which it is running. x86 and ARM are currently supported, with other CPU architectures falling back to a slower interpreted path. libjit was created for DotGNU portable.NET by Rhys Weatherley, then the lead developer of the project. It is now used in the Mono project, an open source effort to implement the .NET CLR.

To investigate libjit we'll turn to the Official Subroutine for Programming Language Evaluation on the Internet, Fibonacci numbers. We'll start with an iterative C implementation:

unsigned long long fib (unsigned int n) {
  if (n <= 2) return 1;
  unsigned long long a = 1, b = 1, c;
  do {
    c = a + b;
    b = a;
    a = c;
  } while (n > 2);
  return c;

libjit is designed to be driven from an Abstract Syntax Tree description of an algorithm. By descending through the AST outputting JIT operations at each node, you can construct native instructions for that algorithm. Had we written fib() in an interpreted language, we'd start with the AST within the interpreter. For the purposes of this example I've skipped the AST, manually constructing a fibonacci number routine by translating each line of C source code into JIT calls. The code is below, for your edification and bemusement.

jit_function_t create_fib_jit (jit_context_t *context) {
  jit_type_t param;
  jit_type_t signature;
  jit_function_t function;
  jit_value_t n, a, b, c;
  jit_value_t constant, compare, n_minus_1, a_plus_b;
  jit_label_t label1 = jit_label_undefined;
  jit_label_t label2 = jit_label_undefined;

  *context = jit_context_create();

  /* Build the function signature, fib(unsigned int) */
  param = jit_type_uint;
  signature = jit_type_create_signature(jit_abi_cdecl, jit_type_ulong,
                                        &param, 1, 1);
  function = jit_function_create(*context, signature);

  /* Begin emitting instructions */

  /* "if (n <= 2)" */
  n = jit_value_create(function, jit_type_uint);
  n = jit_value_get_param(function, 0);
  constant = jit_value_create_nint_constant(function, jit_type_uint, 2);
  compare = jit_insn_le(function, n, constant);
  jit_insn_branch_if_not(function, compare, &label1);

  /* "return 1;" */
  constant = jit_value_create_nint_constant(function, jit_type_ulong, 1);
  jit_insn_return(function, constant);

  /* "if (n <= 2)" else branches here */
  jit_insn_label(function, &label1);

  /* "unsigned long long a = 1, b = 1, c;" */
  constant = jit_value_create_nint_constant(function, jit_type_ulong, 1);
  a = jit_value_create(function, jit_type_ulong);
  jit_insn_store(function, a, constant);
  b = jit_value_create(function, jit_type_ulong);
  jit_insn_store(function, b, constant);
  c = jit_value_create(function, jit_type_ulong);

  /* "do {" */
  jit_insn_label(function, &label2);

  /* "c = a + b;
   *  b = a;
   *  a = c;" */
  a_plus_b = jit_insn_add(function, a, b);
  jit_insn_store(function, c, a_plus_b);
  jit_insn_store(function, b, a);
  jit_insn_store(function, a, c);

  /* "n--;" */
  constant = jit_value_create_nint_constant(function, jit_type_uint, 1);
  n_minus_1 = jit_insn_sub(function, n, constant);
  jit_insn_store(function, n, n_minus_1);

  /* "} while (n > 2);" */
  constant = jit_value_create_nint_constant(function, jit_type_uint, 2);
  compare = jit_insn_gt(function, n, constant);
  jit_insn_branch_if(function, compare, &label2);

  /* "return c;" */
  jit_insn_return(function, c);

  if (jit_function_compile(function) == 0) {
    printf("compilation error occurred\n");
  return function;

libjit compiles this function to native opcodes. Running on x86, the resulting assembly looks a great deal like the series of jit_insn_* calls which generated it.

   push   %rbp
   mov    %rsp,%rbp
   sub    $0x20,%rsp
   mov    %r12,(%rsp)
   mov    %r13,0x8(%rsp)
   mov    %r14,0x10(%rsp)
   mov    %r15,0x18(%rsp)
   mov    %rdi,%r15
   cmp    $0x2,%edi
   ja     A
   mov    $0x1,%eax
   jmpq   C
A: mov    $0x1,%r14d
   mov    $0x1,%r13d
B: mov    %r14,%r12
   add    %r13,%r12
   mov    %r14,%r13
   mov    %r12,%r14
   dec    %r15d
   cmp    $0x2,%r15d
   ja     B
   mov    %r12,%rax
C: mov    (%rsp),%r12
   mov    0x8(%rsp),%r13
   mov    0x10(%rsp),%r14
   mov    0x18(%rsp),%r15
   mov    %rbp,%rsp
   pop    %rbp

How does it perform compared to the C version? I ran each in a loop of 1,000,000 iterations computing fib(75). The function was run once outside the timing loop, to avoid cache miss effects. The C code was compiled with -O0 and -O2, which made a huge difference in the results. Invoking gcc -O3 resulted in slower code due to overly aggressive loop unrolling. Similarly, libjit can set an optimization level via jit_function_set_optimization_level(), though it made no measurable difference in the results.

C -O0: 367.2 nsecs/iteration
libjit: 344.8 nsecs/iteration
C -O2: 110.1 nsecs/iteration

In this example libjit achieved the speed of a naive C compilation. This makes sense, it is the result of a naive programmer translating C code.

The point of this series of articles is the promise, not necessarily the current reality. The great promise of JIT compilation as compared to static is the ability to make optimizations based on profiling, even specializing routines for specific input values. The routine can sanity check its inputs, and escape back into the interpreter if the input does not match expectations. This would be of great benefit in large code bases, where we frequently have variables which are essentially constant. For example in a web application the language setting is a variable, though any individual user essentially never changes their language setting in the middle of a session. We'll explore this idea more in the future.