Efficient splitting of elements in a field

Posted by Gary on Stack Overflow
Published on 2010-04-12T15:42:03Z


I have a field in a text file exported from a database. The field contains addresses; some of them are quite long, and the database allows them to span multiple lines. On export, each newline is replaced with a dollar sign, like this:

first part of very long address$second part of very long address$third part of very long address

Not every address has multiple lines and no address contains more than three lines. The length of each line is variable.

I'm massaging the data for import into MS Access, which is used for a mail merge. I want to split the field on the $ sign if it's there, but if the field contains only one line, I want to set my two extra output fields to a zero-length string so that I don't end up with blank lines in the printed address.
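A minimal sketch of the splitting step on its own, assuming the "$"-joined format shown above and at most three lines per address. `split_address` is a hypothetical helper, not a gawk built-in, and the address strings are made-up sample values. It relies on split() clearing the target array and returning the number of pieces, so a field without "$" comes back as a single piece; the missing slots are then filled with "" so all three output fields always exist. The separator is written as the bracket expression "[$]", which matches a literal dollar sign when it is read as a regular expression.

```awk
# split_address is a hypothetical helper name, not part of gawk.
# It splits an exported address on a literal "$" and always fills
# out[1], out[2] and out[3]; missing lines become "" (zero-length).
# The return value is the number of address lines actually found.
function split_address(field, out,    n, i) {
    # "[$]" is a bracket expression: it matches a literal dollar sign,
    # never the end-of-string anchor.
    n = split(field, out, "[$]")
    # split() has already deleted any old elements of out[];
    # assign the unused slots so all three fields exist as "".
    for (i = n + 1; i <= 3; i++)
        out[i] = ""
    return n
}

BEGIN {
    # Made-up sample values for illustration.
    n = split_address("123 Main St", parts)
    printf("%d line(s): [%s] [%s] [%s]\n", n, parts[1], parts[2], parts[3])
    # prints: 1 line(s): [123 Main St] [] []

    n = split_address("Unit 4$123 Main St$Springfield", parts)
    printf("%d line(s): [%s] [%s] [%s]\n", n, parts[1], parts[2], parts[3])
    # prints: 3 line(s): [Unit 4] [123 Main St] [Springfield]
}
```

Because the return value is the line count, a result greater than 3 could be used to flag records that break the three-line assumption.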

I have an awk file that's working correctly on all the other data in the text file, but I need to get this last bit working. I tried the code below. It originally gave a syntax error at the else, because the if branch had no braces, so only the split() call belonged to it; with the braces added it runs, but I'm not sure this is a good way to do what I want. This is being done with gawk on Windows.

BEGIN { FS = "|" }

$1 != "HEADER" {

    # A multi-statement if branch must be wrapped in braces; without them
    # only split() belongs to the if and the else is a syntax error.
    if ($6 ~ /\$/) {
        split($6, arr, "$")
        address = arr[1]
        addresstwo = arr[2]
        addressthree = arr[3]
        addressLength = length(address)
        addressTwoLength = length(addresstwo)
        addressThreeLength = length(addressthree)
    }
    else {
        address = $6
        addressLength = length($6)
        addresstwo = ""
        addressTwoLength = length(addresstwo)
        addressthree = ""
        addressThreeLength = length(addressthree)
    }

    # "\%" is not a valid escape sequence; each conversion starts with a plain "%".
    printf("%*s\t%*s\t%*s\n",
           addressLength, address, addressTwoLength, addresstwo, addressThreeLength, addressthree)

}
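A shorter version of the whole program is possible because awk treats an array element that was never assigned as the empty string. This sketch keeps `FS = "|"`, the `HEADER` check and field 6 from the code above; `format_address` is a hypothetical helper name. It drops the if/else and the length bookkeeping: a `%*s` conversion whose width equals the string's own length prints the same thing as plain `%s`, so concatenating the three parts with tabs gives the same output line.

```awk
# Sketch only: FS, the HEADER check and field 6 come from the question;
# format_address is a hypothetical helper name.
BEGIN { FS = "|" }

# Return the address field as three tab-separated columns. When the field
# has no "$", split() returns 1 and parts[2] and parts[3] are never
# assigned, so they evaluate to the empty string.
function format_address(field,    parts) {
    # "[$]" matches a literal dollar sign.
    split(field, parts, "[$]")
    return parts[1] "\t" parts[2] "\t" parts[3]
}

$1 != "HEADER" {
    print format_address($6)
}
```

It can be run the same way as the existing file, for example `gawk -f address.awk export.txt > address.txt`, where both file names are placeholders.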

© Stack Overflow or respective owner