Tuesday, August 18, 2009

Virtual Machines And Manual Transmissions

I've chosen a manual transmission for every vehicle I've purchased. It is a personal preference: I like the feeling of control over the engine and the ability to trade speed for torque. Driving a stick shift was also an advantage in school: practically nobody knew how to drive one, so nobody could borrow my car.

For software development I code mostly in C, which is a rather thin layer on top of the machine. Even C++, while still considered a low-level language, nonetheless implements significantly more abstraction. Consider the following simple example of incrementing a variable in C and in C++:

C:
int val = 0;

void incr(void) {
    val++;
}

C++:
class exi {
    int val;

    public:
        exi() { val = 0; }

        void incr() {
            val++;
        }
};

Note that in neither case is incr() declared to take an argument. We'll use one of my favorite techniques, disassembling the binary, to see how each one works. This time we're looking at PowerPC opcodes.

C:
_main:
 bl _incr                  ; branch to incr()

_incr:
 mfspr r9,lr               ; address of val
 lwz r2,0xbc(r9)           ; fetch val
 addi r2,r2,0x1            ; increment
 stw r2,0xbc(r9)           ; store new val
 blr                       ; return

C++:
_main:
 ... much C++ object init removed...
 addi r3,r1,0x38           ; object addr in arg0
 bl __ZN6exi4incrEv        ; branch to exi::incr()

__ZN6exi4incrEv:
 lwz r2,0x0(r3)            ; fetch val from *arg0
 addi r2,r2,0x1            ; increment
 stw r2,0x0(r3)            ; store new val
 blr                       ; return

Though the C++ source code does not show an argument to exi::incr(), at the machine level there nonetheless is one. The object address is passed as the first argument, in register r3. Passing the object address is how C++ implements the "this" pointer: the method has to have an object to operate on.
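
As a rough sketch of what the compiler is doing behind the scenes, the member function can be written out by hand as a plain function that takes the object pointer explicitly. The names exi_c, exi_c_incr, get() and demo_hidden_argument are made up for this illustration and do not appear in the listings above.

```cpp
#include <cassert>

// The class from the post, plus a getter (added here as an assumption)
// so the counter can be observed from outside.
class exi {
    int val;

public:
    exi() : val(0) {}
    void incr() { val++; }          // "this" is passed implicitly
    int get() const { return val; }
};

// Hypothetical hand-written equivalent: the hidden argument spelled out.
struct exi_c {
    int val;
};

void exi_c_incr(exi_c *self) {      // self plays the role of "this"
    self->val++;
}

// Usage sketch: both forms increment the object they are handed.
void demo_hidden_argument() {
    exi a;
    a.incr();                       // compiler passes &a for us

    exi_c b = {0};
    exi_c_incr(&b);                 // we pass &b ourselves

    assert(a.get() == 1);
    assert(b.val == 1);
}
```

The two incr routines should compile to much the same load, add, store sequence; the only difference is who writes the pointer argument in the source.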

In low-level languages you can generally see the relationship between the source and the resulting machine code, even when significant compiler optimization is done. As we move to higher-level languages, the abstractions between the source and machine code grow ever larger. C++ is somewhat higher level than C, and at the machine level the mapping from instructions back to source code is less clear. More abstract languages like Java, Python, and C# compile to bytecode for a virtual machine running on top of the real system. If one gathered an instruction trace of CPU execution, one would be hard-pressed to correlate those instructions back to the source code they implement.
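
To make that concrete, here is a toy stack-machine interpreter running the same val++ operation. The opcodes and the run() and incr_program() names are invented for this sketch and do not correspond to any real VM's instruction set.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcodes for a toy stack machine.
enum Op { LOAD_VAL, PUSH_ONE, ADD, STORE_VAL, HALT };

// Interpret a program against a single variable and return its final value.
int run(const std::vector<Op> &program, int val) {
    std::vector<int> stack;
    for (std::size_t pc = 0; pc < program.size(); pc++) {
        switch (program[pc]) {
        case LOAD_VAL:
            stack.push_back(val);
            break;
        case PUSH_ONE:
            stack.push_back(1);
            break;
        case ADD: {
            assert(stack.size() >= 2);   // malformed bytecode check
            int rhs = stack.back(); stack.pop_back();
            int lhs = stack.back(); stack.pop_back();
            stack.push_back(lhs + rhs);
            break;
        }
        case STORE_VAL:
            assert(!stack.empty());
            val = stack.back();
            stack.pop_back();
            break;
        case HALT:
            return val;
        }
    }
    return val;
}

// Bytecode a compiler might plausibly emit for "val++".
std::vector<Op> incr_program() {
    return {LOAD_VAL, PUSH_ONE, ADD, STORE_VAL, HALT};
}
```

An instruction trace of run(incr_program(), 0) would be dominated by the dispatch loop, the switch, and the stack bookkeeping. The single add that the source asked for is buried somewhere in the middle.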


 
Foreshadowing

One can see the day coming when manual transmissions will be unavailable in most car models. Continuously variable transmissions were an early indication of this trend, with an essentially infinite number of gearing ratios that could only be effectively controlled by an engine computer. Hybrid vehicles have a complex transmission which meshes the output of two motors, and again can only be controlled by computer. Future vehicles will likely have an entirely electric drivetrain, with no need for a conventional transmission at all. The simple fact is that the engine computer can do a far better job of optimizing the behavior of the drivetrain than I can.

I'm currently digging into the low-level aspects of virtual machines. Running compilation just-in-time as part of a virtual machine has several notable advantages over static compilation with gcc:

  • gcc's optimization improves if you compile with profiling, run the program, and then compile again. This is so annoying that it is hardly ever done. A virtual machine always has profiling data available, as it interprets the bytecodes for a while before running the JIT.
  • gcc's profile-guided optimization is done in advance, on a representative corpus of input data which the programmer supplies. If the program operates on inputs which differ substantially from this, its performance will not be optimal. The JIT optimization is always done with the real data as profiling input.
  • gcc can optimize for a specific CPU pipeline, such as Core2 vs NetBurst vs i486. One is trading off performance improvement on the favored CPU versus degradation on other CPUs. The JIT can know the specific type of CPU being used and can optimize accordingly.
  • gcc can do static constant propagation across subroutines. That is, if a constant NULL is passed to a function gcc can create a version of that function which will eliminate any unreachable code. The JIT can create optimal versions of functions tuned for specific arguments dynamically, whether they are constant or variable. It just has to validate that the arguments still match the expected values, and it is free to jump back to the interpreted bytecode on a mismatch.
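
The guarded specialization in the last point can be sketched by hand. This is only an illustration of the idea, not how any particular JIT is implemented: the function names are hypothetical, the "specialized" version is written statically rather than generated at run time, and the generic function stands in for the interpreter.

```cpp
#include <cassert>
#include <cstddef>

// Generic path: handles any arguments. Stands in for interpreted bytecode.
int sum_generic(const int *data, std::size_t n, int scale) {
    if (data == nullptr) return 0;
    int total = 0;
    for (std::size_t i = 0; i < n; i++) total += data[i] * scale;
    return total;
}

// Specialized path: assumes the profile always saw a non-null array and
// scale == 1, so the null check and the multiply are both gone.
int sum_specialized_scale1(const int *data, std::size_t n) {
    assert(data != nullptr);            // guaranteed by the guard below
    int total = 0;
    for (std::size_t i = 0; i < n; i++) total += data[i];
    return total;
}

// Guard: validate that the arguments still match what the specialized
// code expects, and fall back to the generic path on a mismatch.
int sum_dispatch(const int *data, std::size_t n, int scale) {
    if (data != nullptr && scale == 1) {
        return sum_specialized_scale1(data, n);
    }
    return sum_generic(data, n, scale);
}
```

Calling sum_dispatch with scale 1 takes the fast path; any other scale, or a null array, quietly takes the slow one and still returns the right answer.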

This should be fun. Some initial thoughts:

  • Modern CPUs have extensive branch prediction and speculative execution features, to keep them from spending all their time stalled waiting for the outcome of a branch decision. What happens when we have a lot more big loops of straight-line code, where the JIT has hoisted all the conditionals up into sanity checks at the entry to the function?
  • Does widespread use of JIT mean that VLIW architectures become more viable? VLIW is particularly dependent on the compiler to match the code to available hardware resources, which a JIT is better positioned to tackle.
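
For the first question, the kind of code shape in mind looks roughly like the following hand-written before and after. The function names are hypothetical, and a real JIT would produce the second form at run time rather than have it written out.

```cpp
#include <cassert>
#include <cstddef>

// Before: the mode test is evaluated on every trip through the loop,
// so the branch predictor has work to do each iteration.
long accumulate_branchy(const int *data, std::size_t n, bool negate) {
    long total = 0;
    for (std::size_t i = 0; i < n; i++) {
        if (negate) total -= data[i];
        else        total += data[i];
    }
    return total;
}

// After: checks happen once at entry, leaving two loops whose bodies
// are straight-line code with only the loop-back branch remaining.
long accumulate_hoisted(const int *data, std::size_t n, bool negate) {
    assert(data != nullptr || n == 0);  // sanity check at function entry
    long total = 0;
    if (negate) {
        for (std::size_t i = 0; i < n; i++) total -= data[i];
    } else {
        for (std::size_t i = 0; i < n; i++) total += data[i];
    }
    return total;
}
```

Both versions compute the same result; the second simply gives the branch hardware much less to predict inside the loop.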