Why Swift is 100 times slower than C in this image processing test?
- by xiaobai
Like many other developers I have been very excited at the new Swift language from Apple. Apple has boasted its speed is faster than Objective C and can be used to write operating system. And from what I learned so far, it's a very type-safe language and able to have precisely control over the exact data type (like integer length). So it does look like having good potential handling performance critical tasks, like image processing, right?
That's what I thought before I carried out a quick test. The result really surprised me. 
Here is a much simplified image alpha blending code snippet in C:
test.c:
#include <stdio.h>
#include <stdint.h>
#include <string.h>
uint8_t pixels[640*480];
uint8_t alpha[640*480];
uint8_t blended[640*480];
void blend(uint8_t* px, uint8_t* al, uint8_t* result, int size)
{
    for(int i=0; i<size; i++) {
        result[i] = (uint8_t)(((uint16_t)px[i]) *al[i] /255);
    }
}
int main(void)
{
    memset(pixels, 128, 640*480);
    memset(alpha, 128, 640*480);
    memset(blended, 255, 640*480);
    // Test 10 frames
    for(int i=0; i<10; i++) {
        blend(pixels, alpha, blended, 640*480);
    }
    return 0;
}
I compiled it on my Macbook Air 2011 with the following command:
gcc -O3 test.c -o test
The 10 frame processing time is about 0.01s. In other words, it takes the C code 1ms to process one frame:
$ time ./test
real    0m0.010s
user    0m0.006s
sys     0m0.003s
Then I have a Swift version of the same code:
test.swift:
let pixels = UInt8[](count: 640*480, repeatedValue: 128)
let alpha = UInt8[](count: 640*480, repeatedValue: 128)
let blended = UInt8[](count: 640*480, repeatedValue: 255)
func blend(px: UInt8[], al: UInt8[], result: UInt8[], size: Int)
{
    for(var i=0; i<size; i++) {
        var b = (UInt16)(px[i]) * (UInt16)(al[i])
        result[i] = (UInt8)(b/255)
    }
}
for i in 0..10 {
    blend(pixels, alpha, blended, 640*480)
}
The build command line is:
xcrun swift -O3 test.swift -o test
Here I use the same O3 level optimization flag to make the comparison hopefully fair. However, the resulting speed is 100 time slower:
$ time ./test
real    0m1.172s
user    0m1.146s
sys     0m0.006s
In other words, it takes Swift ~120ms to processing one frame which takes C just 1 ms. I also verified the memory initialization time in both test code are very small compared to the blend processing function time. 
What happened?