Fastest way to convert a file from latin1 to UTF-8 in Python

Posted by xsaero00 on Stack Overflow, published on 2010-03-08T21:22:24Z


I need the fastest way to convert files from latin1 to UTF-8 in Python. The files are large (~2 GB each) because I am moving DB data. So far I have:

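    import codecs

    # tmpfile is the latin1 input path, tmpfile1 the UTF-8 output path
    with codecs.open(tmpfile, 'r', encoding='latin1') as infile, \
            codecs.open(tmpfile1, 'w', encoding='utf-8') as outfile:
        # each line is decoded from latin1 and re-encoded as UTF-8 on write
        for line in infile:
            outfile.write(line)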

But it is still slow: the conversion takes a quarter of the whole migration time.
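One commonly suggested speed-up, sketched here under the assumption that the input really is pure latin1 (ISO-8859-1), is to skip line splitting and the `codecs` wrappers and convert large binary chunks instead. Because latin1 maps every byte to exactly one character, a chunk can be decoded at any byte boundary without ever splitting a character. The helper name `latin1_to_utf8` and the 16 MiB default chunk size are hypothetical, untuned choices.

```python
def latin1_to_utf8(src_path, dst_path, chunk_size=16 * 1024 * 1024):
    """Convert a latin1 file to UTF-8 by streaming large binary chunks.

    Hypothetical helper; chunk_size (16 MiB by default) is an untuned guess.
    """
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # latin1 is a single-byte encoding, so every chunk boundary is
            # also a character boundary and no incremental decoder is needed.
            dst.write(chunk.decode('latin-1').encode('utf-8'))
```

Called as `latin1_to_utf8(tmpfile, tmpfile1)`, it replaces the line loop above. It avoids the per-line overhead of the stream wrappers, but the actual gain depends on the data and the disk, so it is worth timing on one of the real files first.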

I could also use a Linux command-line utility if it is faster than native Python code.
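For the command-line route, the standard `iconv` utility can do this conversion directly, e.g. `iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt`. A minimal sketch of calling it from the migration script, assuming an `iconv` binary is on the PATH; the helper name `iconv_latin1_to_utf8` is hypothetical, and whether it beats a pure-Python approach on 2 GB files is something to measure rather than assume.

```python
import subprocess


def iconv_latin1_to_utf8(src_path, dst_path):
    """Convert src_path (latin1) to dst_path (UTF-8) using the external iconv tool.

    Hypothetical helper; assumes `iconv` is on PATH and raises
    subprocess.CalledProcessError if iconv exits with a non-zero status.
    """
    with open(dst_path, 'wb') as dst:
        # iconv reads the input file and writes the converted bytes to stdout,
        # which is redirected straight into the output file.
        subprocess.run(
            ['iconv', '-f', 'ISO-8859-1', '-t', 'UTF-8', src_path],
            stdout=dst,
            check=True,
        )
```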

© Stack Overflow or respective owner
