unpickling - Developer IT

[Python] How can I speed up unpickling large objects if I have plenty of RAM?

- by conradlee

It's taking me up to an hour to read a 1-gigabyte NetworkX graph data structure using cPickle (it's 1 GB when stored on disk as a binary pickle file). Note that the file quickly loads into memory. In other words, if I run:

    import cPickle as pickle
    f = open("bigNetworkXGraph.pickle", "rb")
    binary_data = f.read()  # This part doesn't take long
    graph = pickle.loads(binary_data)  # This takes ages

How can I speed this last operation up? Note that I have tried pickling the data using both binary protocols (1 and 2), and it doesn't seem to make much difference which protocol I use. Also note that although I am using the "loads" (meaning "load string") function above, it is loading binary data, not ASCII data. I have 128 GB of RAM on the system I'm using, so I'm hoping that somebody will tell me how to increase some read buffer buried in the pickle implementation.
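
One thing that is sometimes suggested for slow loads of large object graphs is to pause Python's cyclic garbage collector while unpickling, because the loader allocates a great many container objects and collection passes may be triggered repeatedly along the way. The following is a minimal sketch of that idea, written for Python 3 (on Python 2.6, cPickle would stand in for pickle). The helper name load_pickle_without_gc is hypothetical, and whether this helps for a particular graph is an assumption to be measured, not a guarantee.

```python
import gc
import pickle


def load_pickle_without_gc(path):
    """Unpickle the file at `path` with the cyclic garbage collector paused.

    Hypothetical helper: the name and the approach are illustrative only.
    """
    was_enabled = gc.isenabled()
    gc.disable()  # pause cyclic collection while many objects are created
    try:
        with open(path, "rb") as handle:
            return pickle.load(handle)
    finally:
        # Restore the collector to whatever state it was in before the call.
        if was_enabled:
            gc.enable()
```

Timing the load with and without the helper on the real file is the only reliable way to tell whether the collector was the bottleneck.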

Problems with unpickling an 80-megabyte file in Python

- by tipu

I am using the pickle module to read and write large amounts of data to a file. After writing an 80-megabyte pickled file, I load it in a SocketServer using:

    class MyTCPHandler(SocketServer.BaseRequestHandler):
        def handle(self):
            print("in handle")
            words_file_handler = open('/home/tipu/Dropbox/dev/workspace/search/words.db', 'rb')
            words = pickle.load(words_file_handler)
            tweets = shelve.open('/home/tipu/Dropbox/dev/workspace/search/tweets.db', 'r')
            results_per_page = 25
            query_details = self.request.recv(1024).strip()
            query_details = eval(query_details)
            query = query_details["query"]
            page = int(query_details["page"]) - 1
            return_ = []
            booleanquery = BooleanQuery(MyTCPHandler.words)
            if query.find("(") > -1:
                result = booleanquery.processAdvancedQuery(query)
            else:
                result = booleanquery.processQuery(query)
            result = list(result)
            # Keep at most one page of results
            for tweet_id in result[:results_per_page]:
                #return_.append(MyTCPHandler.tweets[str(tweet_id)])
                return_.append(tweet_id)
            self.request.send(str(return_))

However, the file never seems to finish loading at the pickle.load line, and the connection attempt eventually times out. Is there anything I can do to speed this up?
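
Since handle() runs once per connection, the snippet above unpickles the whole file on every request. One possible arrangement, sketched here for Python 3 (where the module is named socketserver), is to unpickle once at startup and keep the result on the handler class. The load_words and serve helpers, the host and port, and the assumption that the pickled object is a dict mapping a query term to a list of tweet ids are all hypothetical and not taken from the post.

```python
import pickle
import socketserver


def load_words(path):
    """Unpickle the word index from disk (intended to run once, at startup)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


class MyTCPHandler(socketserver.BaseRequestHandler):
    # Shared by every request; assigned once before the server starts.
    words = None

    def handle(self):
        # Assumption: the client sends a single query term as UTF-8 text.
        query = self.request.recv(1024).strip().decode("utf-8")
        # Assumption: `words` is a dict of term -> list of tweet ids.
        hits = MyTCPHandler.words.get(query, [])
        self.request.sendall(repr(hits[:25]).encode("utf-8"))


def serve(path, host="localhost", port=9999):
    """Load the index once, then answer requests until interrupted."""
    MyTCPHandler.words = load_words(path)
    with socketserver.TCPServer((host, port), MyTCPHandler) as server:
        server.serve_forever()
```

With this layout the cost of pickle.load is paid a single time, so a slow load delays startup but not each connection.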

[Python] How do I read binary pickle data first, then unpickle it?

- by conradlee

I'm unpickling a NetworkX object that's about 1 GB in size on disk. Although I saved it in the binary format (using protocol 2), it is taking a very long time to unpickle this file: at least half an hour. The system I'm running on has plenty of system memory (128 GB), so that's not the bottleneck. I've read here that pickling can be sped up by first reading the entire file into memory, and then unpickling it (that particular thread refers to Python 3.0, which I'm not using, but the point should still be true in Python 2.6). How do I first read the binary file, and then unpickle it? I have tried:

    import cPickle as pickle
    f = open("big_networkx_graph.pickle", "rb")
    bin_data = f.read()
    graph_data = pickle.load(bin_data)

But this returns:

    TypeError: argument must have 'read' and 'readline' attributes

Any ideas?
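
The error arises because pickle.load expects a file-like object, whereas data that has already been read into memory is handled by pickle.loads. A minimal sketch of both routes follows, written for Python 3 (on Python 2.6, cPickle offers the same load and loads functions). The two helper names are hypothetical.

```python
import io
import pickle


def unpickle_from_memory(path):
    """Read the whole file into memory, then unpickle the raw bytes."""
    with open(path, "rb") as handle:
        raw = handle.read()
    return pickle.loads(raw)  # loads() accepts the bytes directly


def unpickle_via_buffer(path):
    """Same idea, but wrap the bytes so that load() gets a file-like object."""
    with open(path, "rb") as handle:
        buffer = io.BytesIO(handle.read())
    return pickle.load(buffer)  # BytesIO provides read() and readline()
```

Both functions return the same object; the first is simply the shorter spelling.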

How to customize pickle for django model objects

- by muudscope

I need to pickle a complex object that refers to Django model objects. The standard pickling process stores a denormalized copy of each model object in the pickle, so if the object changes in the database between pickling and unpickling, the unpickled copy is out of date. (I know this is true of in-memory objects too, but pickling is a convenient time to address it.) What I'd like is a way to avoid pickling the full Django model object: instead, store just its class and id, and re-fetch the contents from the database on load. Can I specify a custom pickle method for this class? I'm happy to write a wrapper class around the Django model to handle the lazy fetching from the database, if there's a way to do the pickling.
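
Python's pickle protocol lets a class control its own serialization through __reduce__, which returns a callable and the arguments to call it with at load time. The sketch below uses that hook on a wrapper class so that only the wrapped object's class and primary key are stored. It deliberately avoids importing Django so that it runs on its own: Record, its fetch classmethod, FAKE_DB, LazyRef and _refetch are all hypothetical stand-ins, and with a real model the lookup would presumably go through the model's manager instead.

```python
import pickle

# Stand-in for a database table, so the example runs without Django.
FAKE_DB = {}


class Record:
    """Hypothetical stand-in for a Django model class."""

    def __init__(self, pk, name):
        self.pk = pk
        self.name = name

    @classmethod
    def fetch(cls, pk):
        # Assumption: with Django this would be a primary-key lookup.
        return FAKE_DB[pk]


def _refetch(model_cls, pk):
    """Called by pickle at load time: rebuild the wrapper from class and pk."""
    return LazyRef(model_cls.fetch(pk))


class LazyRef:
    """Wrapper whose pickle holds only the wrapped object's class and pk."""

    def __init__(self, obj):
        self.obj = obj

    def __reduce__(self):
        # Store (callable, args) instead of the wrapped object's full state.
        return (_refetch, (type(self.obj), self.obj.pk))
```

In this sketch the fetch happens eagerly when the pickle is loaded; deferring it until first attribute access would need extra bookkeeping in the wrapper.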