Parse large XML file w/ script or use BioPython API?

Posted by jeremy04 on Stack Overflow on 2010-05-26

Filed under: python | xml

Hey guys, this is my first question on here. I'm trying to make a local copy of the UniProtKB in SQL.

The UniProtKB is 2.1 GB, and it comes in XML and a special text format used by SwissProt.

Here are my options:

1) Use a SAX parser (XML) - I chose Ruby and Nokogiri. I started writing the parser, but my initial reaction was: how would I map the XML schema to the SAX parser? (See the rough sketch below.)
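
For reference, the usual way to map a schema onto a SAX parser is to keep per-entry state and flush it to SQL at each closing </entry> tag. Here is a rough, untested sketch of that pattern in Python rather than Ruby (the same start/end/characters callbacks exist in Nokogiri's SAX API). The element names entry, accession, and sequence come from the UniProt XML schema; the actual INSERT is left as a stub:

import xml.sax

class UniProtHandler(xml.sax.ContentHandler):
    """Streams uniprot_sprot.xml so the 2.1 GB file is never held in memory."""
    def __init__(self):
        self.buf = []         # text chunks of the element being read
        self.accessions = []  # all <accession> values of the current <entry>
        self.seq = ""         # the current <sequence>

    def startElement(self, name, attrs):
        self.buf = []

    def characters(self, content):
        self.buf.append(content)  # SAX may deliver text in several chunks

    def endElement(self, name):
        text = "".join(self.buf).strip()
        if name == "accession":
            self.accessions.append(text)
        elif name == "sequence":
            self.seq = text
        elif name == "entry":
            # one complete record: do the SQL INSERT here
            if self.accessions:
                print(self.accessions[0], len(self.seq))
            self.accessions, self.seq = [], ""

xml.sax.parse("/path/to/uniprot_sprot.xml", UniProtHandler())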

2) Biopython - I already have BioSQL and Biopython installed; BioSQL literally created my SQL schema for me, and I was able to successfully insert one SwissProt/UniProt txt file into the database.

I'm running it right now (crosses fingers) on the entire 2.1 GB file. Here is the code I'm running:


from Bio import SeqIO
from BioSQL import BioSeqDatabase

# Connect to the BioSQL database and select the "uniprot" namespace
server = BioSeqDatabase.open_database(driver="MySQLdb", user="root",
                                      passwd="", host="localhost", db="bioseqdb")
db = server["uniprot"]

# SeqIO streams the SwissProt flat file one record at a time
iterator = SeqIO.parse(open("/path/to/uniprot_sprot.dat", "r"), "swiss")
db.load(iterator)  # inserts all records; nothing is committed until server.commit()
server.commit()

Edit: it's now crashing because the transactions are getting locked (the tables are InnoDB): Error 1205 "Lock wait timeout exceeded; try restarting transaction". I'm using MySQL 5.1.43.
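
One fix I could try (a rough, untested sketch - it assumes db.load accepts any iterable of SeqRecord objects, such as a list) is committing in smaller batches so no single transaction spans the whole file:

from itertools import islice

from Bio import SeqIO
from BioSQL import BioSeqDatabase

server = BioSeqDatabase.open_database(driver="MySQLdb", user="root",
                                      passwd="", host="localhost", db="bioseqdb")
db = server["uniprot"]

records = SeqIO.parse(open("/path/to/uniprot_sprot.dat", "r"), "swiss")
while True:
    batch = list(islice(records, 1000))  # up to 1000 records per transaction
    if not batch:
        break
    db.load(batch)
    server.commit()  # release locks before starting the next batch

The other knob would be raising MySQL's innodb_lock_wait_timeout, but keeping the transactions small seems cleaner.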

Should I switch my database to PostgreSQL?
