how to implement a really efficient bitvector sorting in python
- by xiao
Hello guys!
Actually this is an interesting topic from programming pearls, sorting 10 digits telephone numbers in a limited memory with an efficient algorithm. You can find the whole story here
What I am interested in is just how fast the implementation could be in python. I have done a naive implementation with the module bitvector. The code is as following:
from BitVector import BitVector
import timeit
import random
import time
import sys
def sort(input_li):
        return sorted(input_li)
def vec_sort(input_li):
        bv = BitVector( size = len(input_li) )
        for i in input_li:
                bv[i] = 1
        res_li = []
        for i in range(len(bv)):
                if bv[i]:
                        res_li.append(i)
        return res_li
if __name__ == "__main__":
        test_data = range(int(sys.argv[1]))
        print 'test_data size is:', sys.argv[1]
        random.shuffle(test_data)
        start = time.time()
        sort(test_data)
        elapsed = (time.time() - start)
        print "sort function takes " + str(elapsed)
        start = time.time()
        vec_sort(test_data)
        elapsed = (time.time() - start)
        print "sort function takes " + str(elapsed)
        start = time.time()
        vec_sort(test_data)
        elapsed = (time.time() - start)
        print "vec_sort function takes " + str(elapsed)
I have tested from array size 100 to 10,000,000 in my macbook(2GHz Intel Core 2 Duo 2GB SDRAM), the result is as following:
test_data size is: 1000 
sort function
takes 0.000274896621704
vec_sort function takes 0.00383687019348
test_data size is: 10000
sort function takes 0.00380706787109
vec_sort function takes 0.0371489524841
test_data size is: 100000
sort function takes 0.0520560741425
vec_sort function takes 0.374383926392
test_data size is: 1000000
sort function takes 0.867373943329
vec_sort function takes 3.80475401878
test_data size is: 10000000
sort function takes 12.9204008579
vec_sort function takes 38.8053860664
What disappoints me is that even when the test_data size is 100,000,000, the sort function is still faster than vec_sort. Is there any way to accelerate the vec_sort function?