Database Compression in Python

Posted by user551832 on Stack Overflow See other posts from Stack Overflow or by user551832
Published on 2010-12-24T20:23:46Z Indexed on 2010/12/24 21:54 UTC
Read the original article Hit count: 148

Filed under:

I have hourly logs like

user1:joined
user2:log out
user1:added pic
user1:added comment
user3:joined

I want to compress all the flat files down to one file. There are around 30 million users in the logs and I just want the latest user log for all the logs.

My end result is I want to have a log look like

user1:added comment
user2:log out
user3:joined

Now my first attempt on a small scale was to just do a dict like

log['user1'] = "added comment"

Will doing a dict of 30 million key/val pairs have a giant memory footprint.. Or should I use something like sqllite to store them.. then just put the contents of the sqllite table back into a file?

© Stack Overflow or respective owner

Related posts about python