Python2.7: How can I speed up this bit of code (loop/lists/tuple optimization)?
        Posted  
        
            by 
                user89
            
        on Stack Overflow
        
        See other posts from Stack Overflow
        
            or by user89
        
        
        
        Published on 2014-05-31T02:59:09Z
        Indexed on 
            2014/05/31
            3:26 UTC
        
        
        Read the original article
        Hit count: 194
        
I repeat the following idiom again and again. I read from a large file (sometimes, up to 1.2 million records!) and store the output into an SQLite databse. Putting stuff into the SQLite DB seems to be fairly fast.
def readerFunction(recordSize, recordFormat, connection, outputDirectory, outputFile, numObjects):
insertString = "insert into NODE_DISP_INFO(node, analysis, timeStep, H1_translation, H2_translation, V_translation, H1_rotation, H2_rotation, V_rotation) values (?, ?, ?, ?, ?, ?, ?, ?, ?)" 
analysisNumber = int(outputPath[-3:])
outputFileObject = open(os.path.join(outputDirectory, outputFile), "rb")
outputFileObject, numberOfRecordsInFileObject = determineNumberOfRecordsInFileObjectGivenRecordSize(recordSize, outputFileObject)
numberOfRecordsPerObject = (numberOfRecordsInFileObject//numberOfObjects)
loop1StartTime = time.time()
for i in range(numberOfRecordsPerObject ):  
    processedRecords = []
    loop2StartTime = time.time()
    for j in range(numberOfObjects):
        fout = outputFileObject .read(recordSize)
        processedRecords.append(tuple([j+1, analysisNumber, i] + [x for x in list(struct.unpack(recordFormat, fout))]))
    loop2EndTime = time.time()
    print "Time taken to finish loop2: {}".format(loop2EndTime-loop2StartTime)  
    dbInsertStartTime = time.time()
    connection.executemany(insertString, processedRecords)
    dbInsertEndTime = time.time()
loop1EndTime = time.time()
print "Time taken to finish loop1: {}".format(loop1EndTime-loop1StartTime)
outputFileObject.close()
print "Finished reading output file for analysis {}...".format(analysisNumber)
When I run the code, it seems that "loop 2" and "inserting into the database" is where most execution time is spent. Average "loop 2" time is 0.003s, but it is run up to 50,000 times, in some analyses. The time spent putting stuff into the database is about the same: 0.004s. Currently, I am inserting into the database every time after loop2 finishes so that I don't have to deal with running out RAM.
What could I do to speed up "loop 2"?
© Stack Overflow or respective owner