Planet D

August 25, 2008

Leandro Lucarella: Avoiding counter updates

The main drawback of reference counting (leaving cycle aside) probably is high overhead it imposes into the client program. Every pointer update has to manipulate the reference counters, for both the old and the new objects.

Function calls

This includes every object passed as argument to a function, which one can guess it would be a lot (every method call for example). However, this kind of rc updates can be easily optimized away. Let's see an example:

class SomeClass
{
      void some_method() {}
}

void some_function(SomeClass o)
{
      o.some_method();
}

void main()
{
      auto o = new SomeClass;
      some_function(o);
}

It's clear that o should live until the end of main(), and that there is no chance o could be garbage collected until main() finishes. To express this, is enough to have o's rc = 1. There is no need to increment it when some_function() is called, nor when some_method() is called.

So, theoretically (I really didn't prove it =) is not necessary to update object's rc when used as arguments.

Local pointers update

What about pointers in the stack? Most of the time, pointers updates are done in local variables (pointers in the stack, not in the heap). The GC book talks about 99% of pointers update done in local variables for Lisp and ML. I don't think D could have that much but I guess it could be pretty high too.

Mental note

Gather some statistics about the number of local pointers update vs. heap pointers update in D

Fortunately Deutsch and Bobrow created an algorithm to completely ignore local pointers update, at the cost of relaying on some kind of tracing collector, but that only have to trace the stack (which should be pretty small compared to the heap).

What the algorithm proposes is to use simple assignment when updating local pointers. Pointers living in the heap manipulates rc as usual, but when the count drops to 0, the object is added to a zero count table (ZCT) (and removed if some pointer update increments the counter again).

Finally, at some point (usually when you run out of memory), the ZCT has to be reconciled, doing some simple steps: trace the stack looking for pointers and incrementing their counters and remove any object with rc = 0. Finally, decrement all the counters of the objects pointer to by the stack pointers.

This technique seems to be a good mix of both reference counting and tracing collectors: small pauses (the stack is usually small), low overhead for counter manipulation. The only missing point is cycles. At first sight, if we need a tracing collector for cycles, this algorithm seems pretty useless because you have to trace all the heap and stack to free cycles, so the optimization is lost. Big pauses are here again.

I have the feeling I'm missing something and the ZCT could be useful when comes to reclaim cycles, but I have to think a little more about that.

Bartosz Milewski: Bartosz Milewski


With the multicore explosion in the making, are we going to be running hundreds of thousands of threads in a single program? Erlang programmers would emphatically say, yes! C++ and Java programmers would probably say no.

Why this discrepancy?

Thread model based on heavy-duty OS threads and mutexes has its limitations. You can ask server writers, or Google for “thread per connection” to convince yourself. Servers use thread pools exactly because of that.

Thread pools are an admission of defeat for the thread model. Having to pool threads tells you that:

  • Thread creation is not fast enough
  • Threads’ consumption of resources is substantial, so it makes sense to keep their numbers down

Granted, these are technical problems that might be overcome in future by improvements in operating systems.

The more fundamental problem with threads has its root in memory sharing. It seems like sharing offers great advantage in terms of performance, but sharing requires locking. It’s a well known fact that locking doesn’t scale (or compose). Between races and deadlocks, it’s also extremely hard to get right.

Here’s what Erlang does

Erlang gives up on sharing!

Threads that don’t share memory are called processes. We tend to think of processes as heavyweight beasts implemented by operating systems. That’s because one needs the operating system to strictly enforce the no-sharing policy (the isolation of address spaces). Only the OS can manage separate address spaces and the passing of data between them.

But that’s not the only way. The isolation might instead be enforced by the language. Erlang is a functional language with strict copy semantics and with no pointers or references. Erlang processes communicate by message passing. Of course, behind the scenes, messages are passed through shared memory, thus avoiding a large performance hit of inter-process communication. But that’s invisible to the client.

Erlang rolls out its own threads!

Erlang interpreter provides lightweight processes (so lightweight that there’s a benchmark running 20 million Erlang processes).

And there is a bonus: Erlang code that uses lightweight processes will also work with heavyweight processes and in a distributed environment.

Why don’t we all switch to Erlang?

As far as I know there are two main reasons:

  • It’s a weird language. Don’t get me wrong, I love functional programming for its mathematical beauty, terseness, and elegance. But I had to rewire my brain to be able to write pure functional programs. Functional paradigm is as alien to our everyday experience as quantum mechanics. (CS grad students: you’re not typical programmers.)
  • Messages have to be copied. You can’t deep-copy a large data structure without some performance degradation, and not all copying can be optimized away (it requires behind-the-scenes alias analysis). This is why mainstream languages (and I will even include Scala in this category) don’t abandon sharing. Instead they rely on programmer’s discipline or try to control aliasing.

Conclusions

Native threads are expensive. Interpreted languages, like Erlang, can afford to implement their own lightweight threads and schedulers. I don’t see that happening in compiled languages. The hope is that  operating systems will improve their implementations of threads–I hear Linux’s NPTL already is a big improvement in this area. In the meanwhile, we have to rely on thread pools.

Shared memory concurrency model is the reason why multithreaded programming is so difficult and error-prone. Separating address spaces simplifies programming, but at a performance cost. I believe that some kind of compromise is possible. A message-passing or an actor model can work with sharing, as long as aliasing is under control.

Inspiration for this post

After my last post on thin lock implementation, I was asked the question, why I used such a small number, 1024, for the maximum number of threads. It was actually a number I’ve found in the D compiler runtime. A thin lock could easily work with a much larger number of threads. In fact I’m going to substantially increase the number of bits for thread ID at the expense of recursion count. But this question got me thinking about scalability of threading in general.

Fibers

Fibers (as well as Java’s green threads) are not an alternative to heavyweight threads. Fibers can’t run in parallel, so they have no way to use multiple processors. They are more of a flow-of-control construct, often used interchangeably with coroutines.

Interesting reading

August 24, 2008

leonardo maffi: Updates and links

Several updates to my D libs, among them there's a transpose function, and the Range!() templates, their syntax is similar to the range([start,] stop[, step]), but it generates a tuple of the numbers at compile time. It's useful for "static foreach" loops, where range extrema are compile time constants:
foreach (i; Range!(3)) // static
    a[i] = b[i];
becomes unrolled and compiled as:
    a[0] = b[0];
    a[1] = b[1];
    a[2] = b[2];
Using the transpose function you can solve the following codegolf problem:
http://codegolf.com/grid-computing

Given a 10x10 grid of 0-padded numbers ranging from 00 to 99, as below, it asks to find the max among the sum of the columns and rows:
01 34 46 31 55 21 16 88 87 87
32 40 82 40 43 96 08 82 41 86
30 16 24 18 04 54 65 96 38 48
32 00 99 90 24 75 89 41 04 01
11 80 31 83 08 93 37 96 27 64
09 81 28 41 48 23 68 55 86 72
64 61 14 55 33 39 40 18 57 59
49 34 50 81 85 12 22 54 80 76
18 45 50 26 81 95 25 14 46 75
22 52 37 50 37 40 16 71 52 17
For that problem the solution is 615, obtained by the sum of the 8th column.
D solution using my libs:
import d.all, std.string, std.conv;

void main() {
    auto m = map((string l){ return map(&toInt, split(l)); }, xstdin);
    putr(max(map(&sum!(int[]), m ~ inTranspose(m))));
}
You can remove most of the whitespace to produce a shorter code. It's no match still to the Perl version, but it's enough.

------------------------

Third part of an exploration of the Scala language:
http://blog.objectmentor.com/articles/2008/08/14/the-seductions-of-scala-part-iii-concurrent-programming


The functional language ATS, the compiler looks able to create programs about as fast as C ones, among other qualities:
http://www.ats-lang.org/
Few examples from the Computer Shootout, timings compared to the C versions are inside each source:
http://www.ats-lang.org/EXAMPLE/SHOOTOUT/


Finally released the free NumPy manual:
http://www.tramy.us/numpybook.pdf


A nice gallery of visualization strategies of text documents:
http://www.timshowers.com/2008/08/visualization-strategies-text-documents/

Alan Knowles: In-line comments on RooJS1 docs - More nails in ExtJs coffin

Just a small update on roojs1 - I've added in-line comments to the RooJs1 documentation - so if you think a method or class is not well explained, add your comment, example code, or bug note right on the documentation. - Something I think makes a huge difference to the usability of code...

Other small changes are also beginning to be added - including the fields property of a Roo.toolbar, and the new Roo.form.Hidden element, to make creating forms with hidden variables simpler.

Anyway, rumors are swirling around that there is a recommendation to avoid ExtJS among certain linux distributors due to uncertainties and general lack of confidence in the license, it's continual changing and general poor behavior by Jack in not understanding the implications of the bait and switch changes.

It also appears that Jack's Legal team are working harder than his coding team again, sending out threatening letters to people involved in various ExtJs2 derivative works. (which is the key reason I'm focusing on v1 - clear, definitive separation and licenses that do not allow for such shenanigans)

While ExtJS2 offered some interesting new features. The significant change to the object rendering model, from my experience has been broken on every release prior to the GPL one, (and since I never tested it after that release, I suspect it's still broken).

Other changes relating to a clearer heirachy of object model look like they should be a simple addition to ExtJS1, along with better support for non DOM element dependent constructors. Hence I think building on ExtJS1 appears to be a better long term move, and sounds like a few more are beginning to see that wisdom.

Michael Parker: News Bits

A few news items that have popped up recently: Mathhias Thurau has GPLed his game Automorun2008. You can find the D source code from the project’s web site. Vincent Richomme has announced GYNOID, a D compiler for Windows CE platforms (Pocket PC/Windows Mobile). It’s based on CeGCC 4.1.0. Ary Borenszweig has uploaded a nightly build of Descent, the [...]

August 21, 2008

leonardo maffi: Array Interleaving Kata

Another Kata, an exercise in programming style. This time the problem is just to interleave the items of two given arrays. Here I have tried to make the code simple/readable (and this often means short too), yet efficient enough. Such purposes are sometimes opposed, so the problem isn't trivial. A good programmer is supposed to be able to write elegant code that has such qualities.

This is the original C version, I have mostly copied it from the original source, it's brittle and not general, but it's short (maybe even too much short) and for people that know C, it's readable enough:
#include "stdio.h"
#include "string.h"

void interleave(const int *a1, size_t len1,
                const int *a2, size_t len2, int *result) {
    for ( ; len1 && len2; len1--, len2--) {
        *result++ = *a1++;
        *result++ = *a2++;
    }
    if (len1)
        memmove(result, a1, len1 * sizeof(int));
    if (len2)
        memmove(result, a2, len2 * sizeof(int));
}

int main() {
    int a1[] = {1, 2, 3, 4, 5};
    int a2[] = {6, 7, 8};
    int result[8];
    size_t i;

    interleave(a1, 5, a2, 3, result);
    for (i = 0; i < 8; i++)
        printf("%d ", result[i]);
    printf("\n");
    return 0;
}


A D version similar to the C one can be written (but without the 'const' in the definition and other minor things), but it's not in the style of D (D-thonic?) and not safe enough. To avoid writing a D version that shares too many of the problems of the C version I have first tried to write a very short yet efficient Python version. The following may be among the most efficient Python versions (most computations are performed by built-ins) and the code is very short (but probably this code isn't the most efficient if you use the current veresion of Psyco, because it doesn't digest itertools. A new version of Psyco is in the works that will probably manage itertools efficienty enough). I have have had to write several Python versions, with subtly or not so subtly different code, trying to find the best balance. In the end I think this Python code is short enough, but it's not elegant enough for my tastes:
from itertools import islice

def interleaveLists(L1, L2):
    """
    >>> print interleaveLists([1, 2, 3], [4, 5, 6, 7])
    [1, 4, 2, 5, 3, 6, 7]
    >>> print interleaveLists([1, 2, 3], [8, 9, 10])
    [1, 8, 2, 9, 3, 10]
    >>> print interleaveLists([4, 5, 6, 7], [1, 2, 3])
    [4, 1, 5, 2, 6, 3, 7]
    >>> print interleaveLists([1, 2, 3], [])
    [1, 2, 3]
    >>> print interleaveLists([], [1, 2, 3])
    [1, 2, 3]
    >>> print interleaveLists([], [])
    []
    """
    result = [None] * (len(L1) + len(L2))
    len_min = min(len(L1), len(L2))
    result[: len_min * 2 : 2] = islice(L1, 0, len_min)
    result[1: len_min * 2 : 2] = islice(L2, 0, len_min)
    longer = L2 if len(L1) < len(L2) else L1
    result[len_min * 2 :] = islice(longer, len_min, None)
    return result

if __name__ == "__main__":
    import doctest
    doctest.testmod()

    a1 = [1, 2, 3, 4];
    a2 = [5, 6, 7, 8];
    print interleaveLists(a1, a2)


Now it's easy to translate that Python code to D. This time too I have written several versions, some of them are shorter than the following one. I have also tried to use my D libs, but in the end I think this is the best version, and the most readable one, yet efficient/short enough. The code can be improved, making it more general, able to take in input associative arrays and generic lazy iterables too:
T[] interleaveArrays(T)(T[] a1, T[] a2) {
    auto result = new T[a1.length + a2.length];

    int len_min = a1.length < a2.length ? a1.length : a2.length;
    int pos;
    for (int i; i < len_min; i++) {
        result[pos++] = a1[i];
        result[pos++] = a2[i];
    }

    auto longer = a1.length < a2.length ? a2 : a1;
    result[len_min * 2 .. $] = longer[len_min .. $];
    return result;
}

void main() {
    assert(interleaveArrays([1, 2, 3], [4, 5, 6, 7]) == [1, 4, 2, 5, 3, 6, 7]);
    assert(interleaveArrays([1, 2, 3], [8, 9, 10]) == [1, 8, 2, 9, 3, 10]);
    assert(interleaveArrays([4, 5, 6, 7], [1, 2, 3]) == [4, 1, 5, 2, 6, 3, 7]);
    assert(interleaveArrays([1, 2, 3], new int[0]) == [1, 2, 3]);
    assert(interleaveArrays(new int[0], [1, 2, 3]) == [1, 2, 3]);
    assert(interleaveArrays(new int[0], new int[0]) == new int[0]);
}
Unfortunately D slicing syntax doesn't support the handy stride/step as Python, so I've had to use an explicit loop.
D 2.x allows to improve a little that code, replacing this line:
for (int i; i < len_min; i++) {
With this tidier one:
foreach (i; 0 .. len_min) {

leonardo maffi: Word halves - Almost Kata Eight

All the code relative to this this problem:
http://www.fantascienza.net/leonardo/js/word_halves.zip

This time I solve a problem that's a variant of the programming Kata Eight: Conflicting Objectives, by Dave Thomas:
http://codekata.pragprog.com/2007/01/kata_eight_conf.html

The purpose of this variant is to process a dictionary of words, looking for all words which are composed of two concatenated smaller words in the same dictionary. For example:
abandon + ed => abandoned
abash + ed => abashed
abduct + ed => abducted
abduct + ion => abduction
abduct + ions => abductions
abduct + or => abductor
abel + ian => abelian
The original problem asks to write one solution as readable as possible, another solution as fast as possible, etc.

Generally I start writing the most basic algorithm first, because this helps me find a correct solution, this is very important because it gives me a starting point, and it may even be the end point because if the problem isn't too much big often a simple solution may be enough. If I want to improve the solution, I can compare to the solution given by the naive algorithm with the solutions given by more refined algorithms, to see if they are correct. So the naive version may help me create an unit test. If the problem isn't trivial I have seen that very often for me Python is the best language to create a prototype that actually works, so I generally use it for the first try (when I don't use Python for such first version I often later repent).

Despite this problem looks simple, I have create several successive solutions.

To solve this problem there's no need to pre-process the keys because the (word1 + word2) is a very fast operation. But I can use a set to speed up a lot the test if the joined word is present. So this is a first raw (Python) solution, 5 lines long:
words = set(w.rstrip().lower() for w in file("dict.txt"))
for w1 in words:
    for w2 in words:
        if (w1 + w2) in words:
            print w1, "+", w2, "=>", w1 + w2
It works but it's too much slow (about 18 minutes and 30 seconds on my PC), the dict has more than 45000 words, so being that algorithm O(n^2) it requires more than 2 billions (~2e9) string pair concatenations.

So I have implemented a minimal Trie data structure in the D language, where each node contains a boolean (that is true for words that end there) and a fixed array of pointers to node, as long as the lowercase ASCII letters. The trie allows me to navigate to a word and then quickly find all words that has that word as prefix.

But later I have understood that if the sequence of words is sorted, the words that start with another word are adjacent, so a binary search on the sorted sequence may suffice. Then I have understood that the binary search is not necessary, because in a sorted sequence the words that follow a given word are the ones that start with the first word (they may be absent). So you can stop looking when you see a word that doesn't start with the first word. The following Python program is much faster, it runs in about 20 seconds on my PC.
def main(words_set):
    sorted_words = sorted(words_set)

    for pos, word1 in enumerate(sorted_words):
        for word2 in sorted_words[pos:]:
            if not word2.startswith(word1):
                break
            second_part = word2[len(word1):]
            if second_part in words_set:
                print "%s + %s => %s" % (word1, second_part, word2)

words_set = set(word.rstrip().lower() for word in file("dict.txt"))
import psyco; psyco.full()
main(words_set)
The computation of words_set is outside the main because Psyco isn't able to optimize it, so if you put it into the main Psyco will reduce the overall performance of the program.

This is the translation to D using my libs:
import std.stdio, std.file, std.string, d.func, d.string;

void main() {
    auto words = fromKeys((cast(string)read("dict.txt")).inToLower().xsplit(), 0);
    auto sorted_words = words.keys.sort;

    foreach (pos, word1; sorted_words)
        foreach (word2; sorted_words[pos .. $]) {
            if (!word2.startsWith(word1))
                break;
            string second_part = word2[word1.length .. $];
            if (second_part in words)
                writefln("%s + %s => %s", word1, second_part, word2);
        }
}
My D libs can be found here:
http://www.fantascienza.net/leonardo/so/libs_d.zip

And this is a different D translation that uses just the standard library Phobos (here split() leads to worse performance than xsplit() but other parts of the code are faster, so the two D versions have similar performance):
import std.stdio, std.file, std.string;

void main() {
    uint[string] words;
    foreach (word; (cast(string)read("dict.txt")).tolower().split())
        words[word] = 0;
    auto sorted_words = words.keys.sort;

    foreach (pos, word1; sorted_words)
        foreach (word2; sorted_words[pos .. $]) {
            if (word2.length < word1.length || word1 != word2[0 .. word1.length])
                break;
            string second_part = word2[word1.length .. $];
            if (second_part in words)
                writefln(word1, " + ", second_part, " => ", word2);                
        }
}
The D programs run in about 0.11/0.12 seconds (this 180X running speed sounds strange because in programs that process strings D is often about as fast as Python/Psyco or slower. It's more than 10000 times faster than the naive Python program).

Note that std.string of Phobos has no .startsWith() function, so you have to use array slices, but you have to add a control on the lengths because D slices aren't as forgiven as Python ones.

For many purposes such D programs are fast enough (but I'd like a Python program that's faster), so you may want to stop looking for better algorithms. But experience shows that often there's no such thing as the "fastest program" to solve a problem: most times there are ways to write a faster program.

All the code relative to this this problem:
http://www.fantascienza.net/leonardo/js/word_halves.zip

Peter Modzelewski: Tango conference 2008 speakers

Not so long ago Conference Speakers List was published on the NG. Lets take a closer look at the announcment:
A preliminary list of most of the entries in the program at the Tango
Conference 2008 in Torun, Poland, is now ready. Titles for most of the
talks are not ready yet though, and so the below should not be considered a
definite guide. More posts on the program are also due to be confirmed
later, along with the final time table.

- Tomasz Stachowiak - DDL
DDL is one of those projects in D that proved me how D is amazing. This utility for loading "D Dynamic Libraires" makes DLL or SO to hide in the shadows in shame. It was started by Pragma(Eric Anderton), but now also h3r3tic (Tomasz Stachowiak) added some patches to the project. He will tell all about this project on the conference. Personally I can add, as a colegue of Tom who witnessed many of his speeches, that his lecture will surely be interesting and will explain the topic greatly.

- Frank Benoit - DWT Features
Everyone who at least touched nasty java world knows SWT. It is very mature and quite relable gui library which is also multiplatform. Frank Benoit (aka keinfarbton) once worked on project called tioport which aim was to create tool for converting java code to D. Altought the project is stalled for now, it managed to convert SWT to D. That's how DWT began to exist. Support for it was also added to Entice GUI Designer making it as easy to use as in java with use of eclipse or netbeans. You will find out all about DWT from it's author himself - surelly worth hearing.

- Mikola Lysenko - Get some Fibers in your diet
Mikola Lysenko is surely man of many talents. We could see his amazing work on gamedev. He also has added to Tango extremely cool feature - Fibers. What they are and, what's most important, why/when/how to use them will be subject of this exciting talk.
- Rick Richardson
Top Secret for now.
- Tomas Lindquist / Christian Kamm - LLVMDC
Compilers are one of the most important issues of D nowdays. Althought we have two of them, each has it's problems: DMD is well supported by D author himself, its code optimisation (done by dmc) is poor in compare to, for example, gcc. GDC (which is using gcc as a backend) is badly supported (at least in my opinion). Many (including) me are looking forward to LLVMDC which could solve those problems. What is the progress of this project and why are so many waiting for it - you can find answers to this quesitons and many more on this lecture
- Jarrett Billingsley - MiniD
There are many scripting languages: lua, squirrel, io, python ... The one you will hear about during this talk is a golden middle between D and a scripting language. Looks really promissing as much for the game development issues as for other aplicaitons.
- Dr Rafał Bocian - On teaching D
As you surely remember I wrote about the fact, that D is being taught on my university. Over 150 students, about 500 pass projects. How this all happened? How students are responding to D? What should one know before starting to teach D? Who to ask if not the first man in the world who was officially holding course of D on a university.
- Kris Bell - Lightweight Coding
Last year on D conference Seattle 2007 Kris Bell was talking how to use slicing to improve application performance. This year he will tell what lightweight coding means and how to use it in your applications.
- Team0xf - Game development panel
- Compiler/runtime workshop
- Keynote
I believe I don't need to explain those.

As I tried to show all of those talks are seriously interesting but also covers wide set of topics, so everyone should be able to satisfy his main interests and expand horizons . More info about the Conference coming soon.

August 20, 2008

Peter Modzelewski: On Travelling to Tango Conference 2008

OK so many of you are probably looking for different ways of getting to Torun for the Tango Conference 2008. If you will travel by bus or train then you will probably plan your trip easily. If you want to travel by plain first look at airports list on conference pages. When you have chosen the air connection and airport that suits you, you should look at buses and trains connections to find best option for you.
This and many more you can find on the conference main page.

Michael Parker: Zeus For Windows IDE v3.96q

A new version of the Zeus IDE has been released. It includes support for code folding and syntax highlighting for D (among other languages). Here are a few links demonstrating how to further enhance the D experience in Zeus:Configuring Phobos-Tango Autocompletion Integrating D Help Files (here and here)Writing Zeus Macros with DMDScriptIntegrating DDBG Technorati Tags: D Programming [...]

August 19, 2008

Christian Kamm: Exception handling in LLVMDC using LLVM

Exception handling is an integral part of the D programming language. Naturally LLVMDC, aiming to be a complying compiler, needs to provide it. Here I describe how exactly user code, generated LLVM IR, the unwinding library and the LLVMDC runtime interact to make it all work - at least on x86 Linux.

There is some documentation on exception handling with LLVM and the pages linked from there contain further information, in particular the details on the unwinding runtime. Unfortunately, examples of actual use are hard to find, so trial and error has played a major role in learning the workings of LLVM EH. I’ll try to present a complete example here, but will assume you’ve at least skimmed through both documents.

First, the throw statement. Its basic job is simple: invoke the exception handling runtime by calling _Unwind_RaiseException with the address of an _Unwind_Exception struct. This struct contains, among some private data, an eight-byte exception class to identify the language and vendor it originates from (for LLVMDC we set it to “D1\0\0″ and “LLDC”) and a cleanup callback. Since it is necessary to communicate the exception object that is being thrown to the handler code, this struct is embedded in a larger one. Later, the address of this surrounding struct can be computed from the address of the unwind_info member.

Consequently, the outer struct looks like this

struct _d_exception {
  Object exception_object;
  _Unwind_Exception unwind_info;
}

and the code to invoke the unwinding runtime is straightforward:

void _d_throw_exception(Object e) {
    if (e !is null) {
        _d_exception* exc_struct = new _d_exception;
        exc_struct.unwind_info.exception_class[0..4] = "LLDC";
        exc_struct.unwind_info.exception_class[4..8] = "D1\0\0";
        exc_struct.exception_object = e;
        _Unwind_RaiseException(&exc_struct.unwind_info);
    }
    abort();
}

What happens on a throw is essentially the following:

Luckily, exception handling in D can be implemented using only a single personality function for all landing pads. This personality function decides what to do for each individual landing pad by parsing the language specific area of the unwind data. This area contains three tables: the callsite table, the action table and the classinfo table.

When the personality function is called and given the context of a certain landing pad, it looks up the instruction pointer, finds the right entry in the callsite table and then walks the corresponding action chain. For each possible action, it checks whether the thrown exception object is derived from the class specified by the respective classinfo. Once a match is found, it knows that this landing pad is responsible for the exception. When it is called again with instructions to transfer control to the handler, the personality function passes the exception object and the index from the action table to the hander code.

If you’re interested in the code that accomplishes this, take a look here.

The last step required to make EH work is to provide the handler code and to write out the correct unwind tables. Let’s look at some user code and what it is essentially turned into by LLVMDC (of course the actual output is LLVM IR). The situation grows considerably more complex when there are nested try-catch-finallys in the same stack frame, but I hope this snippet illustrates the basic ideas.


try
{
  code_try();
}
catch(ExceptionClass ec)
{
  code_catch(ec);
}
finally
{
  code_finally();
}

// this is an invoke with
// 'handler' as exception target
code_try();
goto end;
 
handler:
 
ehptr = llvm.eh.exception();
ehsel = llvm.eh.selector(
    ehptr,
    &_d_eh_personality,
    ExceptionClass.classinfo,
    0);
 
switch(ehsel)
{
  case 1:
    code_catch(ehptr);
    goto end;
  default: // ehsel == 0
    code_finally();
    _Unwind_Resume(&ehptr.unwind_info);
}
// unreachable
 
end:
code_finally();

The llvm.eh.* intrinsics get the exception object and the action table index that are passed in by the personality function as mentioned above. But there’s more going on here: the selector intrinsic also tells LLVM what the data in the unwind tables should be. In particular, the personality function and the exception classinfos are set here. The zero indicates the finally block. The call to code_try() has been turned into an invoke, which makes LLVM emit an entry in the callsite table for it.

As you can see, the unwinding runtime and LLVM code generator are tied closely via the two intrinsics and thus supporting other runtimes such as Windows structured exception handling will be nigh-impossible without changes to LLVM. Hopefully, getting llvm-gcc to support exception handling on Windows will be enough of an incentive for the LLVM team to provide that feature eventually.

Another thing to bear in mind is that LLVM’s exception support is, at the moment, very C++ specific. The code generator can fill the language specific data area only with the three C++ style tables mentioned above. Fortunately, D’s exceptions are similar enough that we can get the right behavior by inserting suitable values into these tables.

For now, the implementation in LLVMDC has only been tested on x86 Linux, though the PowerPC target should work as well. EH on x86-64 Linux will supposedly be enabled in the next LLVM release. The remaining issues should be solved as LLVM matures, enabling LLVMDC to provide correct exception handling support on more platforms.

August 18, 2008

Leandro Lucarella: Lazy freeing RC

The first optimization to analyze is a very simple one. What's the idea behind it lazy freeing? Just transfer some of the work of freeing unused cells to the allocation, making the collection even more interleaved with the mutator.

When you delete a cell, if it's counter drops to 0, instead of recursively free it, just add it to a free-list. Then, when a new cell has to be allocated, take it from the free-list, delete all its children (using the lazy delete, of course), and return that cell.

First drawback of this method: you loose finalization support, but as I said, most people don't care about that. So that's a non-problem. Second, allocation is not that fast anymore. But it's almost bounded. Why almost? Because it's O(N), being N the number of pointers to be deleted in that cell. This doesn't seems like a huge cost anyways (just decrement a counter and, maybe, add it to a free-list). Allocation is (usually) not bounded anyways (except for compacting collectors).

The big win? Bounded freeing. Really small pauses, with no extra costs.

Note

If you have a (simple) program that suffers from GC pauses that you think it could be easily converted to be reference counted (i.e. few pointer updates), please let me know if you want me to try to make it use lazy freeing RC to analyze the real impact on a real-life program.

Leandro Lucarella: Reference counting worth a try

Even when I said that reference counting (RC) will be hard in D, I think it worth a try because it's a really easy way to get incremental garbage collection; the collector activity is interleaved with the mutator. And besides it could be hard to add support to the compiler, it's doable by manually incrementing and decrementing the reference counters to evaluate it.

One of the biggest features of RC is its capability to identify garbage cells as soon as they become garbage (let cycles outside that statement =). The killer use for this is finalization support. Unfortunately this feature kills a lot of possible optimizations. On the other hand, D doesn't need finalization support very hard (with the scope statement and other possible RAII D techniques, I think nobody is missing it), so, lucky us, we can drop that feature and think about some optimizations.

RC can help too to all the fuzz about concurrency and sharing in D2 (it's trivial to know when an object is unshared), but that's a different story.

Note

By the way, I don't think RC can make it on his own (yes, because of cycles), but I think it can help a lot to make collection incremental, leaving just a very small ammount of work to a tracing collector.

planet D logo

Style switcher

Feed

Links

Subscriptions

Planetarium:

Planet D is maintained by Anders Bergh and is powered by Planet Planet.
Last updated: August 29, 2008 04:00 AM (UTC)