Python line file iteration and strange characters

Posted by muckabout on Stack Overflow
Published on 2010-04-29T13:57:43Z Indexed on 2010/04/29 14:17 UTC

Filed under: python | codec | linebreaks | gzip

I have a huge gzipped text file that I need to read line by line. I use the following:

import codecs
import gzip
for i, line in enumerate(codecs.getreader('utf-8')(gzip.open('file.gz'))):
  print i, line

At some point late in the file, the Python output diverges from the file's contents. Lines are being broken at unusual special characters that Python treats as newlines. When I open the file in 'vim', the lines are correct, although the suspect characters are displayed oddly. Is there something I can do to fix this?
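One possibility worth checking (an assumption, since the file itself is not shown): the reader returned by codecs.getreader splits lines using the same rules as str.splitlines, which treats several characters besides \n as line boundaries, for example U+0085 and U+2028. The following is a minimal Python 3 sketch with made-up sample data, not the original file, that compares the codecs reader against a plain byte-level split on \n:

```python
import codecs
import io

# Hypothetical sample: one logical line containing U+2028 (LINE SEPARATOR),
# followed by a second line. Only "\n" is meant to end a line here.
raw = "first\u2028half\nsecond\n".encode("utf-8")

# Iterating the codecs reader also splits on Unicode line boundaries.
codec_lines = list(codecs.getreader("utf-8")(io.BytesIO(raw)))

# Splitting the raw bytes on b"\n" first, then decoding, keeps the line whole.
byte_lines = [chunk.decode("utf-8") for chunk in io.BytesIO(raw)]

print(len(codec_lines), len(byte_lines))
```

If the two counts differ on a sample taken from the real file, the extra breaks come from the reader's line-splitting rules rather than from the gzip layer.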

I've tried other codecs, including utf-16 and latin-1. I've also tried with no codec at all.

I looked at the file using 'od'. Sure enough, there are \n characters where they shouldn't be, but the "wrong" ones are preceded by an unusual character. I suspect the encoding uses two bytes for some characters, and the trailing byte looks like a \n when the data is not decoded properly.
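The two-byte hypothesis can be tested at the byte level. In UTF-8, every byte of a multi-byte character has its high bit set, so none of them can equal 0x0A; in UTF-16 that guarantee does not hold. A small Python 3 sketch, using U+010A purely as an arbitrary example character:

```python
# U+010A (LATIN CAPITAL LETTER C WITH DOT ABOVE) is an arbitrary example,
# chosen because its code point contains the byte value 0x0A.
ch = "\u010a"

utf8 = ch.encode("utf-8")        # two bytes, both with the high bit set
utf16 = ch.encode("utf-16-le")   # two bytes, one of which is 0x0A

# Check whether a raw newline byte appears inside each encoding.
print(b"\n" in utf8, b"\n" in utf16)
```

So if the file really is UTF-8, a stray 0x0A byte seen in 'od' is a genuine newline character in the data, not the tail of a longer character.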

If I replace:

gzip.open('file.gz')

With:

os.popen('zcat file.gz')

It works fine (and is actually noticeably faster). But I'd like to know where I'm going wrong.
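For comparison, here is a hedged sketch of how the same loop might look in Python 3, where gzip.open accepts a text mode and a newline argument. The helper name iter_gzip_lines and the sample file are hypothetical, and this is not the code from the question, which is Python 2:

```python
import gzip
import os
import tempfile


def iter_gzip_lines(path, encoding="utf-8"):
    """Yield (index, line) pairs, treating only "\\n" as a line terminator."""
    # newline="\n" disables universal-newline handling, so "\r" and
    # Unicode separators such as U+2028 stay inside the line.
    with gzip.open(path, "rt", encoding=encoding, newline="\n") as handle:
        for i, line in enumerate(handle):
            yield i, line


# Usage with a throwaway file standing in for 'file.gz'.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.gz")
with gzip.open(sample_path, "wb") as out:
    out.write("first\u2028half\nsecond\n".encode("utf-8"))

lines = list(iter_gzip_lines(sample_path))
print(lines)
```

The text-mode wrapper only recognises \n here, so a line containing U+2028 comes back in one piece.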

© Stack Overflow or respective owner
