maintaining a large list in python

Posted by Oren on Stack Overflow See other posts from Stack Overflow or by Oren
Published on 2010-03-24T17:59:34Z Indexed on 2010/03/24 18:03 UTC
Read the original article Hit count: 622

Filed under:

database

I need to maintain a large list of python pickleable objects that will quickly execute the following algorithm:

The list will have 4 pointers and can:

Read\Write each of the 4 pointed nodes
Insert new nodes before the pointed nodes
Increase a pointer by 1
Pop nodes from the beginning of the list (without overlapping the pointers)
Add nodes at the end of the list

The list can be very large (300-400 MB), so it shouldnt be contained in the RAM. The nodes are small (100-200 bytes) but can contain various types of data.

The most efficient way it can be done, in my opinion, is to implement some paging-mechanism. On the harddrive there will be some database of a linked-list of pages where in every moment up to 5 of them are loaded in the memory. On insertion, if some max-page-size was reached, the page will be splitted to 2 small pages, and on pointer increment, if a pointer is increased beyond its loaded page, the following page will be loaded instead.

This is a good solution, but I don't have the time to write it, especially if I want to make it generic and implement all the python-list features.

Using a SQL tables is not good either, since I'll need to implement a complex index-key mechanism.

I tried ZODB and zc.blist, which implement a BTree based list that can be stored in a ZODB database file, but I don't know how to configure it so the above features will run in reasonable time. I don't need all the multi-threading\transactioning features. No one else will touch the database-file except for my single-thread program.

Can anyone explain me how to configure the ZODB\zc.blist so the above features will run fast, or show me a different large-list implementation?

Developer IT