Using awk to split text file every 10,000 lines

Posted by Sneaky Wombat on Super User
Published on 2012-10-08T20:59:11Z

I have a large gzip'd text file. I'd like to do something like:

zcat BIGFILE.GZ | awk '(snag 10,000 lines and redirect them to...)' | gzip -9 > smallerPartFile.gz

For the awk part up there, I basically want it to take 10,000 lines at a time, send each chunk to gzip, and repeat until every line of the original input has been consumed. I found a script that claims to do this, but when I split my file with it, merge the parts back together, and diff the result against the original, lines are missing. So something is wrong with the awk part, and I can't tell which part is broken.
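For what it's worth, if GNU coreutils split (8.13 or newer, for --filter) is available, something like the following one-liner is roughly the effect I'm after; the filenames here are just illustrative:

zcat /home/foo/foo.sql.gz | split -l 10000 -d --filter='gzip -9 > $FILE.gz' - foo

That should produce foo00.gz, foo01.gz, and so on, each holding 10,000 lines of the decompressed input. But I'd still like to understand why the awk version below loses lines.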

Here's the script I found. Can someone tell me why it doesn't yield parts that can be merged back together and diff'd against the original successfully?

# Generate files part0.dat.gz, part1.dat.gz, etc.
# restore with: zcat foo* | gzip -9 > restoredFoo.sql.gz (or something like that)
prefix="foo"
count=0
suffix=".sql"

lines=10000 # Split every 10,000 lines.

zcat /home/foo/foo.sql.gz |
while true; do
  partname=${prefix}${count}${suffix}

  # Use awk to read the required number of lines from the input stream.
  awk -v lines=${lines} 'NR <= lines {print} NR == lines {exit}' >${partname}

  if [[ -s ${partname} ]]; then
    # Compress this part file.
    gzip -9 ${partname}
    (( ++count ))
  else
    # Last file generated is empty, delete it.
    rm -f ${partname}
    break
  fi
done
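For comparison, a one-pass awk approach that writes each 10,000-line chunk straight through its own gzip pipe might look like this (a sketch, untested against my data; it relies on awk's print | "command" piping and close(), and the part names are illustrative):

zcat /home/foo/foo.sql.gz | awk -v lines=10000 -v prefix=foo -v suffix=.sql.gz '
  NR % lines == 1 {
    if (cmd != "") close(cmd)        # finish the previous chunk
    chunk = int((NR - 1) / lines)    # 0, 1, 2, ...
    cmd = "gzip -9 > " prefix chunk suffix
  }
  { print | cmd }                    # every line goes to the current chunk
'

Since a single awk process consumes the whole stream, there is no hand-off between successive reads that could drop lines.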
