C99 doesn't need function bodies, or 'VLAs are Turing complete'

19 Feb 2022

#programming #c


Preface

The 1999 revision to the C programming language brought several interesting changes and additions to the standard. Among those are variable length arrays, or VLAs for short, which allow array types to have lengths that are determined at runtime. These can show up in different contexts, but for the purposes of this post we will be looking at VLAs as parameters to functions. For example:

void f(int n, float v[n]);

Since array parameters decay to their corresponding pointer type, that declaration is functionally equivalent, and compatible (6.7.5.3p7) to the following:

void f(int n, float *v);

The length of v in the first declaration is given by the value of a previous parameter of the function, which can only be known at runtime. However, in a function prototype (i.e. not a function definition) like the above, the length of a VLA parameter is never computed and essentially ignored (6.7.5.2p5). But in a function definition, the length is evaluated and must be greater than zero. In practice this is rather useless because the sizeof operator doesn't work as one might expect it to with array parameters (6.5.3.4-note88):

void f(int n, float param[n]) {
    float local[n];
    int a = sizeof local,
        b = sizeof param;

    printf("n = %d\n", n);
    printf("sizeof local = %d\n", a); // this will be n * sizeof(float)
    printf("sizeof param = %d\n", b); // but this will be sizeof(float *)
}

In other words, using sizeof with a VLA will evaluate the runtime-known length and calculate the array size based on that, except when that VLA is a function parameter, then, for whatever reason, it is the size of the corresponding pointer type (in this example, sizeof local === n * sizeof(float) but sizeof param == sizeof(float *)). The length of a VLA parameter may be used when e.g. computing indices when accessing multi-dimensional variable length arrays.

Alas, the standard mandates that the variable array length be computed when the function is called. Of course, the expression in between the square brackets is not limited to simple expressions like n, so one can write something like:

void f(int a, int b, int m[(a + b) / 2]) {}

or

void f(int x, int b[abs(x)]) {}

or even

void f(int v[getchar()]) {}

Disembodiment

The following program should give you an idea of the kind of constructs that these rules allow for (try it on compiler explorer):

int main(int argc, char *argv[printf("Hello")])
{
    printf(" world!\n");
}

The length expression of the VLA is evaluated before any of the statements in main. I couldn't find anywhere in the standard saying whether this evaluation order is well-defined but it is what clang and gcc do, and luckily, it does not matter for the sake of this article, as we will see shortly.

Let us refer to the subset of C99 where function bodies must be empty as disembodied C. You might naturally ask yourself what things can be accomplished in this subset (though you can probably guess the answer from the title of this page). In other words, what can you do in C if you are limited to just evaluating expressions and no statements?

Limitations

After all of that, it seems that basically anything can be done in disembodied C, so, is there something that can't be expressed in this C99 subset?

With the rules that we've laid down so far, yes. In particular:

Final Words

It seems that with enough effort anything can be done with these peculiar restrictions. As an example, I wrote implementations of the MD5 and SHA256 hash functions as well as Conway's Game of Life using SDL for graphics. I was also working on a more interesting and "useful" program but I kind of lost motivation halfway through. I'll update this article if I ever come back to finish that.

In case it wasn't clear enough, there is zero practical reason to ever do any of this. I hope you'll agree it's a fun party trick though :).