Duff’s device is a really neat construct for unrolling loops of indeterminate size. It was invented by Tom Duff during the early eighties (Duff's account) as a highly efficient way of copying memory, but can be used for other things too.
Basically, it repeats the processing code in batches—of 8, usually—and iterates over that total actions / batch size times. It consumes that quotient’s remainder with a switch statement’s fall through. Here’s that in C:
/*
* Divide by 8, rounding up. The switch will take care of
* the addition iteration. Count is the total number of
* iterations. It’s assumed to be greater than 0.
*/
int n = (count + 7) / 8;
while (--n > 0) {
Action
Action
Action
Action
Action
Action
Action
Action
}
/*
* Many compilers will use a jump table instead of a series of if
* statements if the case values are contiguous--the cases’ code
* will be contiguous and in order, to make fall through
* automatic.
*/
switch (count % 8) {
case 0:
Action
case 7:
Action
case 6:
Action
case 5:
Action
case 4:
Action
case 3:
Action
case 2:
Action
case 1:
Action
}
What makes the Duff’s device interesting is that it combines the loop and switch by exploiting C’s allowing do/while‘s to span cases:
/*
* The following is based on FOLDOC’s Duff’s device
* implementation: http://foldoc.org/?Duff%27s+device
* As before, count is assumed to be greater than 0.
*/
int n = (count + 7) / 8;
/* The mod is only performed during the first iteration. */
switch (count % 8) {
case 0:
do {
Action
case 7:
Action
case 6:
Action
case 5:
Action
case 4:
Action
case 3:
Action
case 2:
Action
case 1:
Action
} while (--n > 0);
}
While C# doesn’t allow do/while loops to span cases, it lets you use goto to navigate from one case to another—it forces you to do this if you want to fall through from a case that has at least one statement—so Duff’s device can also be implemented in managed code too:
// Count must be greater than 0.
int n = (count + 7) / 8;
switch (count % 8)
{
case 0:
Action
goto case 7;
case 7:
Action
goto case 6;
case 6:
Action
goto case 5;
case 5:
Action
goto case 4;
case 4:
Action
goto case 3;
case 3:
Action
goto case 2;
case 2:
Action
goto case 1;
case 1:
Action
// This differs from the other implementations’ conditional
// because in C#, == 0 and != 0 are faster than > 0, < 0, >=
// 0, and <= 0 (they're handled like Booleans).
if (--n != 0)
{
goto case 0;
}
break;
}
Is Duff’s device still relevant in an object-oriented world? If you have a task, comprised of quick, repeated actions—that aren’t already bottlenecked by database, disk, or network access—unrolling might help. And given that the C# compiler doesn’t seem to unroll loops automatically, you’ll need to implement it yourself. At least Duff’s device can provide a standard pattern for this.
For the heck of it, here’s a comparison between a loop and Duff’s device with the construct’s original purpose, copying memory. Note that this is just for illustrative purposes as Array.Copy generally performs better. Here’s the loop:
// destinationStartIndex - the first index in the destination
// array that should receive elements from the source array.
// sourceStartIndex – the first index in the source array that
// should be copied to the destination array.
while (count-- != 0)
{
destination[destinationStartIndex++] =
source[sourceStartIndex++];
}
Here’s how a Duff’s device performs.
Time [in ms.] to copy all indexes of an int32 array, 1000 iterations:
|
Array size |
|
|
4k |
8k |
16k |
32k |
64k |
128k |
256k |
|
Loop |
43 |
86 |
172 |
345 |
698 |
2684 |
6076 |
|
Duff's device |
28 |
55 |
109 |
217 |
465 |
2304 |
6066 |
As one might expect, Duff’s device copy performs better until memory access becomes a bottleneck—which is below 256k on my laptop, and your results will certainly vary.
Source, binaries, and annotated CIL asm are available at:
http://blogs.vertigosoftware.com/files/DuffDeviceTest.zip
Next up (hopefully), using a slightly modified Duff’s device as a [fake] threading mechanism.
For more information on Duff’s device, please check: