Is it conceivable to have millions of lists of data in memory in Python?

Posted by Codemonkey on Programmers
Published on 2012-12-03

Over the last 30 days I have been developing a Python application that uses a MySQL database (specifically of Norwegian addresses) to perform address validation and correction. The database contains approximately 2.1 million rows of 43 columns each and occupies 640 MB of disk space.
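To get a feel for whether that even fits in RAM as plain Python objects, a rough back-of-envelope check might look something like the sketch below (the sample row and its values are made up for illustration; the real rows have 43 columns, so the per-row figure would be larger):

```python
import sys

# Hypothetical sample row: a handful of short strings standing in for a real address record
sample_row = ("0159", "Oslo", "Karl Johans gate", "22", "A", "0301")

# Size of the tuple object itself plus each of its string members
row_bytes = sys.getsizeof(sample_row) + sum(sys.getsizeof(v) for v in sample_row)

total_gb = row_bytes * 2100000 / float(1024 ** 3)
print("~%d bytes per row, roughly %.1f GB for 2.1 million rows" % (row_bytes, total_gb))
```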

I'm now thinking about speed optimizations, and I have to assume that when validating 10,000+ addresses, with each validation issuing up to 20 queries to the database, the network round-trips are a significant bottleneck.
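To actually confirm that, I suppose I could time the raw round-trip cost with something like the following sketch (assuming the MySQLdb driver; host, credentials and database name are placeholders):

```python
import time
import MySQLdb

conn = MySQLdb.connect(host="dbserver", user="app", passwd="...", db="addresses")
cur = conn.cursor()

N = 1000
start = time.time()
for _ in range(N):
    cur.execute("SELECT 1")  # trivial query: measures round-trip overhead, not lookup work
    cur.fetchall()
elapsed = time.time() - start

print("%.2f ms per round-trip" % (elapsed / N * 1000))
```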

I haven't done any measuring or timing yet, and I'm sure there are simpler ways of optimizing the application's speed at the moment, but I just want the experts' opinions on how realistic it is to load this amount of data into a row-of-rows structure in Python. Also, would it even be any faster? Surely MySQL is optimized for looking up records among vast amounts of data, so how much would removing the networking step actually help? Can you think of any other viable ways of removing the networking step?
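What I have in mind is something along these lines (a sketch only; the `addresses` table and column names are made up): pull the rows into Python once, index them by postal code, and turn each validation query into a local dict lookup instead of a network round-trip.

```python
import MySQLdb

conn = MySQLdb.connect(host="dbserver", user="app", passwd="...", db="addresses")
cur = conn.cursor()
cur.execute("SELECT postal_code, street, house_number, city FROM addresses")

# Group rows by postal code so lookups during validation are O(1) dict access
by_postal_code = {}
for postal_code, street, house_number, city in cur:
    by_postal_code.setdefault(postal_code, []).append((street, house_number, city))

# Validation then becomes a local lookup:
candidates = by_postal_code.get("0159", [])
```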

The location of the MySQL server will vary, as the application might well be run from a laptop at home or at the office, where the server would be local.

