Wednesday, January 21, 2009

Blogroll, January 2009

Programming languages are a voyeuristic pursuit for me, as my daily work consists entirely of C and assembly. Nonetheless it is quite clear that the future belongs to high level languages. Garbage collection avoids a huge class of bugs which have vexed developers for decades. Exceptions likewise make the handling of errors simpler and more robust, and having associative arrays as a built-in capability makes them the obvious choice for many needs.

Criticism is leveled at high level languages about their performance, but languages based on a virtual machine will eventually achieve higher performance than C++. Profile-driven optimization is a well known technique to improve software performance, by optimizing for code paths which are actually used. Achieving this with C/C++ compilers is possible by running a first pass binary and feeding back profile data to a second pass. This is such a giant pain in the ass that it is practically never done. Virtual machines, by their very nature, constantly collect profile data while running the interpreter. The Just-In-Time compilation in a VM can take advantage of this information, resulting in straight line, tightly scheduled code with no branches. It is a beautiful thing... or rather, it will be once it all works.

So without further ado, let's talk about blogs that talk about programming languages.


The Blogroll: January 2009
 
1) armstrong on software, by Joe Armstrong

Joe writes mostly about Erlang, a language which has been around since the mid 1980s, though part of that time was hidden within the bowels of the Ericsson corporation. Erlang is very different from other programming languages. For example, there really are no variables in the normal sense: you can give a name to a value, like 'x', but you cannot change the value of x later. The language is single assignment only. Similarly there are no loops (you can't have a loop if you cannot have a loop variable); iteration is expressed with recursion and list comprehensions instead. There is really no mutable state in an Erlang program: once created, a data element is guaranteed not to change until it is garbage collected.

Why does Erlang make these choices? Erlang's primary design features are all around making message passing very cheap and efficient. Complete lack of mutable state is one example: there is no need to copy the current values of variables into a message, as there is no chance the values will change. Data can be freely passed by reference. The messaging infrastructure drives Erlang's reliability features, where Erlang nodes monitor other Erlang nodes and take action if they stop responding. The messaging capabilities also allow excellent scalability across multiple CPUs, which has driven Erlang's recent resurgence (with high-visibility deployments in Facebook chat and Amazon SimpleDB).


 
2) Headius, by Charles Nutter

Ruby is an interesting language, driven to popularity by the excellent Ruby on Rails web application framework. Ruby's primary implementation is MRI, which only recently implemented a bytecoded virtual machine. Earlier MRI versions were entirely interpreted, and this led to the development of a number of alternate implementations on various VMs. JRuby is the most interesting to me personally: it implements the Ruby language atop the Java Virtual Machine, a very mature and well-performing technology. Charles Nutter is driving the JRuby development effort, and joined the staff at Sun Microsystems to work on it full time.


 
3) Code Commit, by Daniel Spiewak

Erlang and Ruby are both dynamically typed languages: functions accept objects as arguments, and dynamically determine their type at runtime. This is certainly powerful, in that a single routine can handle a wide variety of inputs, but it also means that type errors will only be detected at runtime. Unit testing becomes essential in this environment... and I have trouble accepting the premise that I'm supposed to love doing manually that which the toolchain used to be able to do.

Daniel Spiewak writes mainly about Scala. C is a weakly typed language, where it is trivial to cast one type to another. Scala is an example of a more strongly typed language, making the toolchain able to make more guarantees about what the valid inputs are. Scala is also interesting in that it is another language implemented atop the Java VM.

Wednesday, January 7, 2009

Variable Scoping with gcc

Has this ever happened to you? You allocate some sort of resource which needs to be released before returning from the function. You can put the release at the end of the routine for the normal case, but there are a number of error checks which can cause it to return early. You consider calling the reclaim routine inside each error handler, but you're concerned that someone maintaining the code in the future will forget to do so. Instead you put all of the cleanup code at the end of the function, and have each error check use a goto.
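For reference, the goto-based version described above tends to look something like the sketch below. The function name read_first_byte and the choice of reading a single byte are made up for illustration; the only point is that every error check jumps to a shared cleanup label instead of closing the descriptor itself.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: stores the first byte of the file in *out.
 * Returns 0 on success, -1 on any error. */
int read_first_byte(const char *path, char *out)
{
    int ret = -1;
    int fd;

    assert(path != NULL && out != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        goto done;          /* nothing to release yet */
    }
    if (read(fd, out, 1) != 1) {
        goto close_fd;      /* error or empty file: release fd on the way out */
    }
    ret = 0;                /* success falls through to the same cleanup */

close_fd:
    close(fd);
done:
    return ret;
}
```

Every new error check added later has to remember to jump to the right label, which is exactly the maintenance worry.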

Then of course someone argues very strenuously that goto is evil incarnate and must never be used, one thing leads to another, and then you have to find somewhere to hide the body ... wait, nevermind that last bit.

Consider this instead:

int foo()
{
    int fd LOCAL_SCOPE_FD = open("/path/to/file", O_RDONLY);

    if (error1()) {
        return -1;
    }

    return 0;
}

Though it might appear that I'm suggesting the file descriptor be leaked in order to simplify the code, that is not the case. The magic happens in LOCAL_SCOPE_FD:

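#include <fcntl.h>   /* open() */
#include <unistd.h>  /* close() */

#define LOCAL_SCOPE_FD __attribute__((cleanup(local_fd_close)))

/* Called automatically with a pointer to the variable leaving scope.
 * It must be declared before any function that uses LOCAL_SCOPE_FD. */
void local_fd_close(int *fd)
{
    if (*fd >= 0) close(*fd);
}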

__attribute__((cleanup)) is a gcc extension. When an automatic variable goes out of scope, the function named by the cleanup attribute is called with a pointer to that variable. If the scope is exited unusually, such as via longjmp() or by calling exit(), the cleanup function does not get called, but normal return statements or falling off the end of the block do work.

It also works within inner blocks. For example:

int foo()
{
    if (do_something) {
        int fd LOCAL_SCOPE_FD = open("/path/to/file", O_RDONLY);
    }
    /* local_fd_close has already been called by this point. */

    ... more code ...
}
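To watch this happen rather than take it on faith, here is a small sketch that counts how often the cleanup function runs. The counter and the names counted_fd_close, COUNTED_SCOPE_FD, inner_block_demo and open_then_return are all invented for this demonstration, and it assumes a system where /dev/null can be opened.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical counter, present only so the behaviour can be observed. */
static int close_count = 0;

static void counted_fd_close(int *fd)
{
    if (*fd >= 0) {
        close(*fd);
        close_count++;
    }
}

#define COUNTED_SCOPE_FD __attribute__((cleanup(counted_fd_close)))

/* The cleanup runs at the closing brace of the inner block. */
void inner_block_demo(void)
{
    int before = close_count;
    {
        int fd COUNTED_SCOPE_FD = open("/dev/null", O_RDONLY);
        assert(fd >= 0);
    }   /* counted_fd_close(&fd) runs here */
    assert(close_count == before + 1);
}

/* The cleanup runs on the early return as well as the normal one. */
int open_then_return(int fail_early)
{
    int fd COUNTED_SCOPE_FD = open("/dev/null", O_RDONLY);

    if (fd < 0 || fail_early) {
        return -1;
    }
    return 0;
}
```

Calling open_then_return() with either argument should bump the counter by one, since both return paths leave the scope of fd.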

Cleanup can only be applied to automatic variables, i.e. variables on the stack declared within a function. It cannot be used with global or static variables. The cleanup attribute can be used with any variable type, not just integers. The cleanup function receives a pointer to the automatic variable being cleaned up.
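As an illustration of a non-integer type, here is a sketch that frees a heap buffer when a char * variable goes out of scope. The names local_free, LOCAL_SCOPE_BUF and joined_length are hypothetical. Because the cleanup function receives a pointer to the variable, its parameter is a char ** here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Receives a pointer to the char * variable being cleaned up. */
static void local_free(char **p)
{
    free(*p);   /* free(NULL) is a no-op, so a failed malloc is fine */
}

#define LOCAL_SCOPE_BUF __attribute__((cleanup(local_free)))

/* Hypothetical helper: returns the length of a and b joined together,
 * or -1 if the temporary buffer cannot be allocated. */
int joined_length(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    char *buf LOCAL_SCOPE_BUF = malloc(strlen(a) + strlen(b) + 1);

    if (buf == NULL) {
        return -1;
    }
    strcpy(buf, a);
    strcat(buf, b);
    return (int)strlen(buf);   /* computed before buf is freed */
}
```

Neither return path mentions free(); the buffer is released as buf leaves scope.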

I assume that if the Gentle Reader is reading this article, there are good reasons to use C in your problem space. Needless to say, if you want automatic resource reclamation a language other than C would provide more capabilities. Garbage collection and object finalizers are powerful constructs, given a problem space where they are appropriate.


 
Acknowledgements

Many thanks to Matt Peters for pointing out the __attribute__((cleanup)) capability to me.

Ian Lance Taylor recently wrote on a similar topic, about support for destructors and exceptions in C.


 
Updates

This article was picked up on reddit, with a few comments. One pertinent comment from erikd:

His local-fd-close() function has a bug, it needs to check the return value of close(), because close can return an error of EINTR which means the process received an interrupt and should retry the close operation.
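A minimal sketch of a cleanup function that does check the return value might look like the following. The names local_fd_close_checked, LOCAL_SCOPE_FD_CHECKED and the close_errors counter are my own inventions. Note that this sketch reports the failure rather than retrying: POSIX leaves the state of the descriptor unspecified after EINTR, and Linux releases the descriptor even when close() fails, so whether a retry loop is safe depends on the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical counter so callers can see whether any close() failed. */
static int close_errors = 0;

static void local_fd_close_checked(int *fd)
{
    assert(fd != NULL);
    if (*fd < 0) {
        return;
    }
    if (close(*fd) != 0) {
        perror("close");    /* report the failure, but do not retry */
        close_errors++;
    }
    *fd = -1;               /* make sure the stale value is never reused */
}

#define LOCAL_SCOPE_FD_CHECKED __attribute__((cleanup(local_fd_close_checked)))
```

A cleanup function cannot hand an error back to the caller, as gcc ignores any return value, so logging or a side channel like the counter above is about the best it can do.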