Search Results

Search found 22 results on 1 pages for 'pymongo'.

Page 1/1 | 1

Getting exception when trying to monkey patch pymongo.connection._Pool

- by Creotiv

I use pymongo 1.9 on Ubuntu 10.10 with python 2.6.6 When i trying to monkey patch pymongo.connection._Pool i'm getting error on connection: AutoReconnect: could not find master/primary But when i change _Pool class in pymongo.connection module, it work pretty fine. Even if i copy _Pool implementation from pymongo.connection module and will try to monkey patch by the same code, it still giving same exception. I need to remove threading.local from _Pool class, because i use gevent and i need to implement Pool for all mongo connections(for all threads). I use this code: import pymongo class GPool: """A simple connection pool. Uses thread-local socket per thread. By calling return_socket() a thread can return a socket to the pool. Right now the pool size is capped at 10 sockets - we can expose this as a parameter later, if needed. """ # Non thread-locals __slots__ = ["sockets", "socket_factory", "pool_size","sock"] #sock = None def __init__(self, socket_factory): self.pool_size = 10 if not hasattr(self,"sock"): self.sock = None self.socket_factory = socket_factory if not hasattr(self, "sockets"): self.sockets = [] def socket(self): # we store the pid here to avoid issues with fork / # multiprocessing - see # test.test_connection:TestConnection.test_fork for an example # of what could go wrong otherwise pid = os.getpid() if self.sock is not None and self.sock[0] == pid: return self.sock[1] try: self.sock = (pid, self.sockets.pop()) except IndexError: self.sock = (pid, self.socket_factory()) return self.sock[1] def return_socket(self): if self.sock is not None and self.sock[0] == os.getpid(): # There's a race condition here, but we deliberately # ignore it. It means that if the pool_size is 10 we # might actually keep slightly more than that. if len(self.sockets) < self.pool_size: self.sockets.append(self.sock[1]) else: self.sock[1].close() self.sock = None pymongo.connection._Pool = GPool

Read the article
How to manually create a DBRef using pymongo?

- by Soviut

I want to create a DBRef manually so that I can add an additional field to it. However, when I try to pass the following: {'$ref': 'projects', '$id': '1029412409721', 'project_name': 'My Project'} Pymongo raises an error: pymongo.errors.InvalidName: key '$id' must not start with '$' It would seem that pymongo reserve the $ for the special key, leading me to wonder if it is even possible to do what I'm trying to do?

Read the article
python mongokit Connection() AssertionError

- by zalew

just installed mongokit and can't figure out why I get AssertionError python console: >>> from mongokit import Connection >>> c = Connection() Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/local/lib/python2.6/dist-packages/mongokit-0.5.3-py2.6.egg/mongokit/connection.py", line 35, in __init__ super(Connection, self).__init__(*args, **kwargs) File "build/bdist.linux-i686/egg/pymongo/connection.py", line 169, in __init__ File "build/bdist.linux-i686/egg/pymongo/connection.py", line 338, in __find_master File "build/bdist.linux-i686/egg/pymongo/connection.py", line 226, in __master File "build/bdist.linux-i686/egg/pymongo/database.py", line 220, in command File "build/bdist.linux-i686/egg/pymongo/collection.py", line 356, in find_one File "build/bdist.linux-i686/egg/pymongo/cursor.py", line 485, in next File "build/bdist.linux-i686/egg/pymongo/cursor.py", line 461, in _refresh File "build/bdist.linux-i686/egg/pymongo/cursor.py", line 429, in __send_message File "build/bdist.linux-i686/egg/pymongo/helpers.py", line 98, in _unpack_response AssertionError >>> mongodb console: Wed Mar 31 10:27:34 connection accepted from 127.0.0.1:60480 #30 Wed Mar 31 10:27:34 end connection 127.0.0.1:60480 db 1.5 pymongo 1.5 (tested also on 1.4.) mongokit 0.5.3 (also 0.5.2)

Read the article
Using a '.' in your key name in MongoDB (PyMongo)

- by user319723

When I try saving a dict with a '.' in the key PyMongo throws an error (InvaildName) however I do see (on the Mongodb website) that keys can have '.''s in them. Why is it that pymongo won't let me save these docs? Is there an issue with them and Mongo? James

Read the article
How to query through a DBRef in MongoDB/pymongo?

- by Soviut

Is it possible to query through a DBRef using a single find spec? user collection { 'age': 30 } post collection { 'user': DBRef('user', ...) } Is it possible to query for all post who's users are 30 in a single find step? If not, would it be wise to create a javascript function to handle the multi-stage operation or will that cause blocking problems?

Read the article
MongoDB query to return only embedded document

- by Matt

assume that i have a BlogPost model with zero-to-many embedded Comment documents. can i query for and have MongoDB return only Comment objects matching my query spec? eg, db.blog_posts.find({"comment.submitter": "some_name"}) returns only a list of comments. edit: an example: import pymongo connection = pymongo.Connection() db = connection['dvds'] db['dvds'].insert({'title': "The Hitchhikers Guide to the Galaxy", 'episodes': [{'title': "Episode 1", 'desc': "..."}, {'title': "Episode 2", 'desc': "..."}, {'title': "Episode 3", 'desc': "..."}, {'title': "Episode 4", 'desc': "..."}, {'title': "Episode 5", 'desc': "..."}, {'title': "Episode 6", 'desc': "..."}]}) episode = db['dvds'].find_one({'episodes.title': "Episode 1"}, fields=['episodes']) in this example, episode is: {u'_id': ObjectId('...'), u'episodes': [{u'desc': u'...', u'title': u'Episode 1'}, {u'desc': u'...', u'title': u'Episode 2'}, {u'desc': u'...', u'title': u'Episode 3'}, {u'desc': u'...', u'title': u'Episode 4'}, {u'desc': u'...', u'title': u'Episode 5'}, {u'desc': u'...', u'title': u'Episode 6'}]} but i just want: {u'desc': u'...', u'title': u'Episode 1'}

Read the article
Error data: line 2 column 1 when using pycurl with gzip stream

- by Sagar Hatekar

Thanks for reading. Background: I am trying to read a streaming API feed that returns data in JSON format, and then storing this data to a pymongo collection. The streaming API requires a "Accept-Encoding" : "Gzip" header. What's happening: Code fails on json.loads and outputs - Extra data: line 2 column 1 - line 4 column 1 (char 1891 - 5597) (Refer Error Log below) This does NOT happen while parsing every JSON object - it happens at random. My guess is I am encountering some weird JSON object after every "x" proper JSON objects. I did reference how to use pycurl if requested data is sometimes gzipped, sometimes not? and Encoding error while deserializing a json object from Google but so far have been unsuccessful at resolving this error. Could someone please help me out here? Error Log: Note: The raw dump of the JSON object below is basically using the repr() method. '{"id":"tag:search.twitter.com,2005:207958320747782146","objectType":"activity","actor":{"objectType":"person","id":"id:twitter.com:493653150","link":"http://www.twitter.com/Deathnews_7_24","displayName":"Death News 7/24","postedTime":"2012-02-16T01:30:12.000Z","image":"http://a0.twimg.com/profile_images/1834408513/deathnewstwittersquare_normal.jpg","summary":"Crashes, Murders, Suicides, Accidents, Crime and Naturals Death News From All Around World","links":[{"href":"http://www.facebook.com/DeathNews724","rel":"me"}],"friendsCount":56,"followersCount":14,"listedCount":1,"statusesCount":1029,"twitterTimeZone":null,"utcOffset":null,"preferredUsername":"Deathnews_7_24","languages":["tr"]},"verb":"post","postedTime":"2012-05-30T22:15:02.000Z","generator":{"displayName":"web","link":"http://twitter.com"},"provider":{"objectType":"service","displayName":"Twitter","link":"http://www.twitter.com"},"link":"http://twitter.com/Deathnews_7_24/statuses/207958320747782146","body":"Kathi Kamen Goldmark, Writers\xe2\x80\x99 Catalyst, Dies at 63 http://t.co/WBsNlNtA","object":{"objectType":"note","id":"object:search.twitter.com,2005:207958320747782146","summary":"Kathi Kamen Goldmark, Writers\xe2\x80\x99 Catalyst, Dies at 63 http://t.co/WBsNlNtA","link":"http://twitter.com/Deathnews_7_24/statuses/207958320747782146","postedTime":"2012-05-30T22:15:02.000Z"},"twitter_entities":{"urls":[{"display_url":"nytimes.com/2012/05/30/boo\xe2\x80\xa6","indices":[52,72],"expanded_url":"http://www.nytimes.com/2012/05/30/books/kathi-kamen-goldmark-writers-catalyst-dies-at-63.html","url":"http://t.co/WBsNlNtA"}],"hashtags":[],"user_mentions":[]},"gnip":{"language":{"value":"en"},"matching_rules":[{"value":"url_contains: nytimes.com","tag":null}],"klout_score":11,"urls":[{"url":"http://t.co/WBsNlNtA","expanded_url":"http://www.nytimes.com/2012/05/30/books/kathi-kamen-goldmark-writers-catalyst-dies-at-63.html?_r=1"}]}}\r\n{"id":"tag:search.twitter.com,2005:207958321003638785","objectType":"activity","actor":{"objectType":"person","id":"id:twitter.com:178760897","link":"http://www.twitter.com/Mobanu","displayName":"Donald Ochs","postedTime":"2010-08-15T16:33:56.000Z","image":"http://a0.twimg.com/profile_images/1493224811/small_mobany_Logo_normal.jpg","summary":"","links":[{"href":"http://www.mobanuweightloss.com","rel":"me"}],"friendsCount":10272,"followersCount":9698,"listedCount":30,"statusesCount":725,"twitterTimeZone":"Mountain Time (US & Canada)","utcOffset":"-25200","preferredUsername":"Mobanu","languages":["en"],"location":{"objectType":"place","displayName":"Crested Butte, Colorado"}},"verb":"post","postedTime":"2012-05-30T22:15:02.000Z","generator":{"displayName":"twitterfeed","link":"http://twitterfeed.com"},"provider":{"objectType":"service","displayName":"Twitter","link":"http://www.twitter.com"},"link":"http://twitter.com/Mobanu/statuses/207958321003638785","body":"Mobanu: Can Exercise Be Bad for You?: Researchers have found evidence that some people who exercise do worse on ... http://t.co/mTsQlNQO","object":{"objectType":"note","id":"object:search.twitter.com,2005:207958321003638785","summary":"Mobanu: Can Exercise Be Bad for You?: Researchers have found evidence that some people who exercise do worse on ... http://t.co/mTsQlNQO","link":"http://twitter.com/Mobanu/statuses/207958321003638785","postedTime":"2012-05-30T22:15:02.000Z"},"twitter_entities":{"urls":[{"display_url":"nyti.ms/KUmmMa","indices":[116,136],"expanded_url":"http://nyti.ms/KUmmMa","url":"http://t.co/mTsQlNQO"}],"hashtags":[],"user_mentions":[]},"gnip":{"language":{"value":"en"},"matching_rules":[{"value":"url_contains: nytimes.com","tag":null}],"klout_score":12,"urls":[{"url":"http://t.co/mTsQlNQO","expanded_url":"http://well.blogs.nytimes.com/2012/05/30/can-exercise-be-bad-for-you/?utm_medium=twitter&utm_source=twitterfeed"}]}}\r\n' json exception: Extra data: line 2 column 1 - line 4 column 1 (char 1891 - 5597) Header Output: HTTP/1.1 200 OK Content-Type: application/json; charset=UTF-8 Vary: Accept-Encoding Date: Wed, 30 May 2012 22:14:48 UTC Connection: close Transfer-Encoding: chunked Content-Encoding: gzip get_stream.py: #!/usr/bin/env python import sys import pycurl import json import pymongo STREAM_URL = "https://stream.test.com:443/accounts/publishers/twitter/streams/track/Dev.json" AUTH = "userid:passwd" DB_HOST = "127.0.0.1" DB_NAME = "stream_test" class StreamReader: def __init__(self): try: self.count = 0 self.buff = "" self.mongo = pymongo.Connection(DB_HOST) self.db = self.mongo[DB_NAME] self.raw_tweets = self.db["raw_tweets_gnip"] self.conn = pycurl.Curl() self.conn.setopt(pycurl.ENCODING, 'gzip') self.conn.setopt(pycurl.URL, STREAM_URL) self.conn.setopt(pycurl.USERPWD, AUTH) self.conn.setopt(pycurl.WRITEFUNCTION, self.on_receive) self.conn.setopt(pycurl.HEADERFUNCTION, self.header_rcvd) while True: self.conn.perform() except Exception as ex: print "error ocurred : %s" % str(ex) def header_rcvd(self, header_data): print header_data def on_receive(self, data): temp_data = data self.buff += data if data.endswith("\r\n") and self.buff.strip(): try: tweet = json.loads(self.buff, encoding = 'UTF-8') self.buff = "" if tweet: try: self.raw_tweets.insert(tweet) except Exception as insert_ex: print "Error inserting tweet: %s" % str(insert_ex) self.count += 1 if self.count % 10 == 0: print "inserted "+str(self.count)+" tweets" except Exception as json_ex: print "json exception: %s" % str(json_ex) print repr(temp_data) stream = StreamReader()

Read the article
TG2.1: Proper location to store a database session instance?

- by Lee Olayvar

I am using a custom database (MongoDB) with TG 2.1 and i am wondering where the proper place to store the PyMongo connection/database instances would be? Eg, at the moment they are getting created inside of my inherited instance of AppConfig. Is there a standard location to store this? Would shoving the variables into the project.model.__init__ be the best location, given that under SQLAlchemy, the database seems to commonly be retrieved via: from project.model import DBSession, metadata Anyway, just curious what the best practice is.

Read the article
Can DBRefs contain additional fields?

- by Soviut

I've encountered several situations when using MongoDB that require the use of DBRefs. However, I'd also like to cache some fields from the referenced document in the DBRef itself. {$ref:'user', $id:'10285102912A', username:'Soviut'} For example, I may want to have the username available even though the user document is referenced. This would provide me all the benefits of a single document approach; Faster querying and eliminating the need to do manual dereferencing in my code. While at the same time allowing me to use references where they make sense. The idea being that when the referenced document is updated (a user changes their name, for example) my business layer can automatically update all the documents that reference it. Ultimately, I'm wondering if it's considered good form to store additional fields on my DBRefs? Will it break anything? Will I lose my data each time a reference is rewritten? Will drivers like pymongo support it?

Read the article
Problem using easy_install on Windows 7, 64 bit. (cannot find python.exe)

- by Rune

Hi, I have just now installed Python 2.6 on my Windows 7 (64 bit) Lenovo t61p laptop. I have downloaded Sphinx and nose and apparently installed them correctly using python setup.py install (at least no errors were reported during the installation). Now I am trying to install pymongo using easy_install but I am not having much success. It seems that easy_install isn't working at all. I execute easy_install as administrator: C:\>easy_install Cannot find Python executable C:\Program Files\Python26\python.exe The path C:\Program Files\Python26\python.exe is correct. I have found this bug report on bugs.python.org which seems to be related, although its status is 'Resolved'. Do you have any ideas as to what may be wrong? Any pointers, hints or tips for diagnosing the problem further would be greatly appreciated. EDIT: This is the stacktrace I receive when trying to install pymongo: C:\Users\Rune Ibsen\Documents\Downloads\pymongo-1.4>python setup.py install running install running bdist_egg running egg_info writing pymongo.egg-info\PKG-INFO writing top-level names to pymongo.egg-info\top_level.txt writing dependency_links to pymongo.egg-info\dependency_links.txt reading manifest file 'pymongo.egg-info\SOURCES.txt' reading manifest template 'MANIFEST.in' writing manifest file 'pymongo.egg-info\SOURCES.txt' installing library code to build\bdist.win-amd64\egg running install_lib running build_py running build_ext building 'pymongo._cbson' extension Traceback (most recent call last): File "setup.py", line 166, in <module> "doc": doc}) File "C:\Program Files\Python26\lib\distutils\core.py", line 152, in setup dist.run_commands() File "C:\Program Files\Python26\lib\distutils\dist.py", line 975, in run_commands self.run_command(cmd) File "C:\Program Files\Python26\lib\distutils\dist.py", line 995, in run_command cmd_obj.run() File "C:\Program Files\Python26\lib\site-packages\setuptools-0.6c9-py2.6.egg\setuptools\command\install.py", line 76, in run File "C:\Program Files\Python26\lib\site-packages\setuptools-0.6c9-py2.6.egg\setuptools\command\install.py", line 96, in do_egg_install File "C:\Program Files\Python26\lib\distutils\cmd.py", line 333, in run_command self.distribution.run_command(command) File "C:\Program Files\Python26\lib\distutils\dist.py", line 995, in run_command cmd_obj.run() File "C:\Program Files\Python26\lib\site-packages\setuptools-0.6c9-py2.6.egg\setuptools\command\bdist_egg.py", line 174, in run File "C:\Program Files\Python26\lib\site-packages\setuptools-0.6c9-py2.6.egg\setuptools\command\bdist_egg.py", line 161, in call_command File "C:\Program Files\Python26\lib\distutils\cmd.py", line 333, in run_command self.distribution.run_command(command) File "C:\Program Files\Python26\lib\distutils\dist.py", line 995, in run_command cmd_obj.run() File "C:\Program Files\Python26\lib\site-packages\setuptools-0.6c9-py2.6.egg\setuptools\command\install_lib.py", line 20, in run File "C:\Program Files\Python26\lib\distutils\command\install_lib.py", line 113, in build self.run_command('build_ext') File "C:\Program Files\Python26\lib\distutils\cmd.py", line 333, in run_command self.distribution.run_command(command) File "C:\Program Files\Python26\lib\distutils\dist.py", line 995, in run_command cmd_obj.run() File "setup.py", line 107, in run build_ext.run(self) File "C:\Program Files\Python26\lib\distutils\command\build_ext.py", line 340, in run self.build_extensions() File "C:\Program Files\Python26\lib\distutils\command\build_ext.py", line 449, in build_extensions self.build_extension(ext) File "setup.py", line 117, in build_extension build_ext.build_extension(self, ext) File "C:\Program Files\Python26\lib\distutils\command\build_ext.py", line 499, in build_extension depends=ext.depends) File "C:\Program Files\Python26\lib\distutils\msvc9compiler.py", line 448, in compile self.initialize() File "C:\Program Files\Python26\lib\distutils\msvc9compiler.py", line 358, in initialize vc_env = query_vcvarsall(VERSION, plat_spec) File "C:\Program Files\Python26\lib\distutils\msvc9compiler.py", line 274, in query_vcvarsall raise ValueError(str(list(result.keys()))) ValueError: [u'path'] C:\Users\Rune Ibsen\Documents\Downloads\pymongo-1.4> PS.: I previously installed Python 3.1 but later installed 2.6 because I am not sure whether pymongo supports 3.1. PPS.: I have tried installing pymongo using the python setup.py install approach, but this resulted in a nasty-looking stack trace, so I thought I would try to let easy_install take care of it for me. PPPS.: I am completely new to Python, easy_install, eggs etc.

Read the article
Filtering documents against a dictionary key in MongoDB

- by Thomas

I have a collection of articles in MongoDB that has the following structure: { 'category': 'Legislature', 'updated': datetime.datetime(2010, 3, 19, 15, 32, 22, 107000), 'byline': None, 'tags': { 'party': ['Peter Hoekstra', 'Virg Bernero', 'Alma Smith', 'Mike Bouchard', 'Tom George', 'Rick Snyder'], 'geography': ['Michigan', 'United States', 'North America'] }, 'headline': '2 Mich. gubernatorial candidates speak to students', 'text': [ 'BEVERLY HILLS, Mich. (AP) \u2014 Two Democratic and Republican gubernatorial candidates found common ground while speaking to private school students in suburban Detroit', "Democratic House Speaker state Rep. Andy Dillon and Republican U.S. Rep. Pete Hoekstra said Friday a more business-friendly government can help reduce Michigan's nation-leading unemployment rate.", "The candidates were invited to Detroit Country Day Upper School in Beverly Hills to offer ideas for Michigan's future.", 'Besides Dillon, the Democratic field includes Lansing Mayor Virg Bernero and state Rep. Alma Wheeler Smith. Other Republicans running are Oakland County Sheriff Mike Bouchard, Attorney General Mike Cox, state Sen. Tom George and Ann Arbor business leader Rick Snyder.', 'Former Republican U.S. Rep. Joe Schwarz is considering running as an independent.' ], 'dateline': 'BEVERLY HILLS, Mich.', 'published': datetime.datetime(2010, 3, 19, 8, 0, 31), 'keywords': "Governor's Race", '_id': ObjectId('4ba39721e0e16cb25fadbb40'), 'article_id': 'urn:publicid:ap.org:0611e36fb084458aa620c0187999db7e', 'slug': "BC-MI--Governor's Race,2nd Ld-Writethr" } If I wanted to write a query that looked for all articles that had at least 1 geography tag, how would I do that? I have tried writing db.articles.find( {'tags': 'geography'} ), but that doesn't appear to work. I've also thought about changing the search parameter to 'tags.geography', but am having a devil of a time figuring out what the search predicate would be.

Read the article
get distinct items in a collection if other item is not null

- by Anurag Sharma

I have a collection like this { Country : 'XYZ' Books : [ {"name" : "book1", "url" : "book1url", "auth_email" : "emailid1"}, {"name" : "book2", "url" : "book2url", "auth_email" : "emailid2"}, {"name" : "book3", "url" : "book3url", "auth_email" : "emailid3"}, {"name" : "book4", "url" : "book4url", "auth_email" : "emailid4"} .......................................... ] } I want to extract distinct 'Books.name' and corresponding 'Books.email' only if 'Books.email' is not = ''

Read the article
MapReduce results seem limited to 100?

- by user1813867

I'm playing around with Map Reduce in MongoDB and python and I've run into a strange limitation. I'm just trying to count the number of "book" records. It works when there are less than 100 records but when it goes over 100 records the count resets for some reason. Here is my MR code and some sample outputs: var M = function () { book = this.book; emit(book, {count : 1}); } var R = function (key, values) { var sum = 0; values.forEach(function(x) { sum += 1; }); var result = { count : sum }; return result; } MR output when record count is 99: {u'_id': u'superiors', u'value': {u'count': 99}} MR output when record count is 101: {u'_id': u'superiors', u'value': {u'count': 2.0}} Any ideas?

Read the article
Connect to Mongodb in python

- by SpawnCxy

I'm a little confused by the document when I tried to connect to the Mongodb.And I find it's different from mysql.I want to create a new database named "mydb" and insert some posts into it.The follows is what I'm trying. from pymongo.connection import Connection import datetime host = 'localhost' port = 27017 user = 'ucenter' passwd = '123' connection = Connection(host,port) db = connection['mydb'] post = {'author':'mike', 'text':'my first blog post!', 'tags':['mongodb','python','pymongo'], 'date':datetime.datetime.utcnow()} posts = db.posts posts.insert(post) #print str(db.collection_names()) And I got an error as pymongo.errors.OperationFailure: database error: unauthorized.How can I do the authorizing part?Thanks.

Read the article
Which Python API should be used with Mongo DB and Django

- by Thomas

I have been going back and forth over which Python API to use when interacting with Mongo. I did a quick survey of the landscape and identified three leading candidates. PyMongo MongoEngine Ming If you were designing a new content-heavy website using the django framework, what API would you choose and why? MongoEngine looks like it was built specifically with Django in mind. PyMongo appears to be a thin wrapper around Mongo. It has a lot of power, though loses a lot of the abstractions gained through using django as a framework. Ming represents an interesting middle ground between PyMongo and MongoEngine, though I haven't had the opportunity to take it for a test drive.

Read the article
Optimizing python code performance when importing zipped csv to a mongo collection

- by mark

I need to import a zipped csv into a mongo collection, but there is a catch - every record contains a timestamp in Pacific Time, which must be converted to the local time corresponding to the (longitude,latitude) pair found in the same record. The code looks like so: def read_csv_zip(path, timezones): with ZipFile(path) as z, z.open(z.namelist()[0]) as input: csv_rows = csv.reader(input) header = csv_rows.next() check,converters = get_aux_stuff(header) for csv_row in csv_rows: if check(csv_row): row = { converter[0]:converter[1](value) for converter, value in zip(converters, csv_row) if allow_field(converter) } ts = row['ts'] lng, lat = row['loc'] found_tz_entry = timezones.find_one(SON({'loc': {'$within': {'$box': [[lng-tz_lookup_radius, lat-tz_lookup_radius],[lng+tz_lookup_radius, lat+tz_lookup_radius]]}}})) if found_tz_entry: tz_name = found_tz_entry['tz'] local_ts = ts.astimezone(timezone(tz_name)).replace(tzinfo=None) row['tz'] = tz_name else: local_ts = (ts.astimezone(utc) + timedelta(hours = int(lng/15))).replace(tzinfo = None) row['local_ts'] = local_ts yield row def insert_documents(collection, source, batch_size): while True: items = list(itertools.islice(source, batch_size)) if len(items) == 0: break; try: collection.insert(items) except: for item in items: try: collection.insert(item) except Exception as exc: print("Failed to insert record {0} - {1}".format(item['_id'], exc)) def main(zip_path): with Connection() as connection: data = connection.mydb.data timezones = connection.timezones.data insert_documents(data, read_csv_zip(zip_path, timezones), 1000) The code proceeds as follows: Every record read from the csv is checked and converted to a dictionary, where some fields may be skipped, some titles be renamed (from those appearing in the csv header), some values may be converted (to datetime, to integers, to floats. etc ...) For each record read from the csv, a lookup is made into the timezones collection to map the record location to the respective time zone. If the mapping is successful - that timezone is used to convert the record timestamp (pacific time) to the respective local timestamp. If no mapping is found - a rough approximation is calculated. The timezones collection is appropriately indexed, of course - calling explain() confirms it. The process is slow. Naturally, having to query the timezones collection for every record kills the performance. I am looking for advises on how to improve it. Thanks. EDIT The timezones collection contains 8176040 records, each containing four values: > db.data.findOne() { "_id" : 3038814, "loc" : [ 1.48333, 42.5 ], "tz" : "Europe/Andorra" } EDIT2 OK, I have compiled a release build of http://toblerity.github.com/rtree/ and configured the rtree package. Then I have created an rtree dat/idx pair of files corresponding to my timezones collection. So, instead of calling collection.find_one I call index.intersection. Surprisingly, not only there is no improvement, but it works even more slowly now! May be rtree could be fine tuned to load the entire dat/idx pair into RAM (704M), but I do not know how to do it. Until then, it is not an alternative. In general, I think the solution should involve parallelization of the task. EDIT3 Profile output when using collection.find_one: >>> p.sort_stats('cumulative').print_stats(10) Tue Apr 10 14:28:39 2012 ImportDataIntoMongo.profile 64549590 function calls (64549180 primitive calls) in 1231.257 seconds Ordered by: cumulative time List reduced from 730 to 10 due to restriction <10> ncalls tottime percall cumtime percall filename:lineno(function) 1 0.012 0.012 1231.257 1231.257 ImportDataIntoMongo.py:1(<module>) 1 0.001 0.001 1230.959 1230.959 ImportDataIntoMongo.py:187(main) 1 853.558 853.558 853.558 853.558 {raw_input} 1 0.598 0.598 370.510 370.510 ImportDataIntoMongo.py:165(insert_documents) 343407 9.965 0.000 359.034 0.001 ImportDataIntoMongo.py:137(read_csv_zip) 343408 2.927 0.000 287.035 0.001 c:\python27\lib\site-packages\pymongo\collection.py:489(find_one) 343408 1.842 0.000 274.803 0.001 c:\python27\lib\site-packages\pymongo\cursor.py:699(next) 343408 2.542 0.000 271.212 0.001 c:\python27\lib\site-packages\pymongo\cursor.py:644(_refresh) 343408 4.512 0.000 253.673 0.001 c:\python27\lib\site-packages\pymongo\cursor.py:605(__send_message) 343408 0.971 0.000 242.078 0.001 c:\python27\lib\site-packages\pymongo\connection.py:871(_send_message_with_response) Profile output when using index.intersection: >>> p.sort_stats('cumulative').print_stats(10) Wed Apr 11 16:21:31 2012 ImportDataIntoMongo.profile 41542960 function calls (41542536 primitive calls) in 2889.164 seconds Ordered by: cumulative time List reduced from 778 to 10 due to restriction <10> ncalls tottime percall cumtime percall filename:lineno(function) 1 0.028 0.028 2889.164 2889.164 ImportDataIntoMongo.py:1(<module>) 1 0.017 0.017 2888.679 2888.679 ImportDataIntoMongo.py:202(main) 1 2365.526 2365.526 2365.526 2365.526 {raw_input} 1 0.766 0.766 502.817 502.817 ImportDataIntoMongo.py:180(insert_documents) 343407 9.147 0.000 491.433 0.001 ImportDataIntoMongo.py:152(read_csv_zip) 343406 0.571 0.000 391.394 0.001 c:\python27\lib\site-packages\rtree-0.7.0-py2.7.egg\rtree\index.py:384(intersection) 343406 379.957 0.001 390.824 0.001 c:\python27\lib\site-packages\rtree-0.7.0-py2.7.egg\rtree\index.py:435(_intersection_obj) 686513 22.616 0.000 38.705 0.000 c:\python27\lib\site-packages\rtree-0.7.0-py2.7.egg\rtree\index.py:451(_get_objects) 343406 6.134 0.000 33.326 0.000 ImportDataIntoMongo.py:162(<dictcomp>) 346 0.396 0.001 30.665 0.089 c:\python27\lib\site-packages\pymongo\collection.py:240(insert) EDIT4 I have parallelized the code, but the results are still not very encouraging. I am convinced it could be done better. See my own answer to this question for details.

Read the article
'module' object has no attribute 'PY2'

- by ManikandanV

I am using ubuntu 14.04, was trying to install python-memcache. I have got an error like Downloading/unpacking python-memcached Downloading python-memcached-1.53.tar.gz Cleaning up... Exception: Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/pip/basecommand.py", line 122, in main status = self.run(options, args) File "/usr/lib/python2.7/dist-packages/pip/commands/install.py", line 278, in run requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle) File "/usr/lib/python2.7/dist-packages/pip/req.py", line 1229, in prepare_files req_to_install.run_egg_info() File "/usr/lib/python2.7/dist-packages/pip/req.py", line 292, in run_egg_info logger.notify('Running setup.py (path:%s) egg_info for package %s' % (self.setup_py, self.name)) File "/usr/lib/python2.7/dist-packages/pip/req.py", line 284, in setup_py if six.PY2 and isinstance(setup_py, six.text_type): AttributeError: 'module' object has no attribute 'PY2' Storing debug log for failure in /home/mani/.pip/pip.log I am getting the same error when installing Django-celery, pymongo etc

Read the article
Graphing a line and scatter points using Matplotlib?

- by Patrick O'Doherty

Hi guys I'm using matplotlib at the moment to try and visualise some data I am working on. I'm trying to plot around 6500 points and the line y = x on the same graph but am having some trouble in doing so. I can only seem to get the points to render and not the line itself. I know matplotlib doesn't plot equations as such rather just a set of points so I'm trying to use and identical set of points for x and y co-ordinates to produce the line. The following is my code from matplotlib import pyplot import numpy from pymongo import * class Store(object): """docstring for Store""" def __init__(self): super(Store, self).__init__() c = Connection() ucd = c.ucd self.tweets = ucd.tweets def fetch(self): x = [] y = [] for t in self.tweets.find(): x.append(t['positive']) y.append(t['negative']) return [x,y] if __name__ == '__main__': c = Store() array = c.fetch() t = numpy.arange(0., 0.03, 1) pyplot.plot(array[0], array[1], 'ro', t, t, 'b--') pyplot.show() Any suggestions would be appreciated, Patrick

Read the article
Is it bad practice to extend the MongoEngine User document?

- by Soviut

I'm integrating MongoDB using MongoEngine. It provides auth and session support that a standard pymongo setup would lack. In regular django auth, it's considered bad practice to extend the User model since there's no guarantee it will be used correctly everywhere. Is this the case with mongoengine.django.auth? If it is considered bad practice, what is the best way to attach a separate user profile? Django has mechanisms for specifying an AUTH_PROFILE_MODULE. Is this supported in MongoEngine as well, or should I be manually doing the lookup?

Read the article
How can I optimize this code?

- by loop0

Hi, I'm developing a logger daemon to squid to grab the logs on a mongodb database. But I'm experiencing too much cpu utilization. How can I optimize this code? from sys import stdin from pymongo import Connection connection = Connection() db = connection.squid logs = db.logs buffer = [] a = 'timestamp' b = 'resp_time' c = 'src_ip' d = 'cache_status' e = 'reply_size' f = 'req_method' g = 'req_url' h = 'username' i = 'dst_ip' j = 'mime_type' L = 'L' while True: l = stdin.readline() if l[0] == L: l = l[1:].split() buffer.append({ a: float(l[0]), b: int(l[1]), c: l[2], d: l[3], e: int(l[4]), f: l[5], g: l[6], h: l[7], i: l[8], j: l[9] } ) if len(buffer) == 1000: logs.insert(buffer) buffer = [] if not l: break connection.disconnect()

Read the article
mongodb: insert if not exists

- by LeMiz

Hello, Every day, I receive a stock of documents (an update). What I want to do is inserting each of them if it does not exists. I also want to keep track of the first time I inserted them, and the last time I saw them in an update. I don't want to have duplicate documents. I don't want to remove a document which has previously been saved, but is not in my update. 95% (estimated) of the records are unmodified from day to day. I am using the python driver (pymongo), for that matter. What I currently do is (pseudo-code): for each document in update: existing_document = collection.find_one(document) if not existing_document: document['insertion_date'] = now else: document = existing_document document['last_update_date'] = now my_collection.save(document) My problem is that it is very slow (40 mins for less than 100 000 records, and I have millions of them in the update). I am pretty sure there is something builtin for doing this, but the document for update() is mmmhhh.... a bit terse.... ( http://www.mongodb.org/display/DOCS/Updating ) Can someone give an advice on doing it faster ?

Read the article
How to efficiently store and update binary data in Mongodb?

- by Rocketman

I am storing a large binary array within a document. I wish to continually add bytes to this array and sometimes change the value of existing bytes. I was looking for some $append_bytes and $replace_bytes type of modifiers but it appears that the best I can do is $push for arrays. It seems like this would be doable by performing seek-write type operations if I had access somehow to the underlying bson on disk, but it does not appear to me that there is anyway to do this in mongodb (and probably for good reason). If I were instead to just query this binary array, edit or add to it, and then update the document by rewriting the entire field, how costly will this be? Each binary array will be on the order of 1-2MB, and updates occur once every 5 minutes and across 1000s of documents. Worse, yet there is no easy way to spread these out (in time) and they will usually be happening close to one another on the 5 minute intervals. Does anyone have a good feel for how disastrous this will be? Seems like it would be problematic. An alternative would be to store this binary data as separate files on disk, implement a thread pool to efficiently manipulate the files on disk, and reference the filename from my mongodb document. (I'm using python and pymongo so I was looking at pytables). I'd prefer to avoid this though if possible. Is there any other alternative that I am overlooking here? Thanks in advnace.

Read the article

1