Use awk to identify and filter multi-line records

Posted by nanshi on Stack Overflow
Published on 2012-05-30T22:38:12Z

I need to process a big data file that contains multi-line records, example input:

1  Name      Dan
1  Title     Professor
1  Address   aaa street
1  City      xxx city
1  State     yyy
1  Phone     123-456-7890
2  Name      Luke
2  Title     Professor
2  Address   bbb street
2  City      xxx city
3  Name      Tom
3  Title     Associate Professor
3  Like      Golf
4  Name
4  Title     Trainer
4  Likes     Running

Note that the first integer field is unique to each record and identifies the record as a whole. So the above input contains 4 records, although I don't know in advance how many attribute lines each record may have. I need to:

- identify valid records (a valid record must have both the "Name" and "Title" fields)
- output the available attributes for each valid record, where only certain fields are needed, say "Name", "Title" and "Address"

Example output:

1  Name      Dan
1  Title     Professor
1  Address   aaa street
2  Name      Luke
2  Title     Professor
2  Address   bbb street
3  Name      Tom
3  Title     Associate Professor

So in the output file, record 4 is removed since it has no value for the "Name" field. Record 3 doesn't have an "Address" field but is still printed to the output, since it is a valid record that has both "Name" and "Title".

Can I do this with awk? If so, how do I identify a whole record using the "id" field at the start of each line?

Thanks a lot to the Unix shell script experts for helping me out! :)
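
One possible approach, offered as a minimal sketch rather than a definitive answer: buffer the wanted lines of the current record and print them when the id in the first field changes (and once more at end of input). The sketch assumes that all lines of a record are adjacent in the file, that fields are separated by whitespace, and that a "Name" or "Title" line with no value (as in record 4) does not count. The names filter_records and emit_record are made up for illustration.

```shell
#!/bin/sh
# filter_records (hypothetical name): reads attribute lines on stdin and
# writes the kept lines of each valid record to stdout.
filter_records() {
    awk '
    # Print the buffered lines of the finished record if it is valid,
    # then reset the per-record state.
    function emit_record() {
        if (has_name && has_title)
            printf "%s", buf
        buf = ""; has_name = 0; has_title = 0
    }

    # Ignore blank lines so they do not split a record.
    NF == 0 { next }

    # A change in the first field marks the start of a new record.
    NR > 1 && $1 != id { emit_record() }
    { id = $1 }

    # NF >= 3 means the attribute has a non-empty value.
    $2 == "Name"  && NF >= 3 { has_name = 1 }
    $2 == "Title" && NF >= 3 { has_title = 1 }

    # Buffer only the wanted attributes, in input order.
    ($2 == "Name" || $2 == "Title" || $2 == "Address") && NF >= 3 {
        buf = buf $0 "\n"
    }

    # Flush the last record.
    END { emit_record() }
    '
}
```

With the function defined in the current shell, something like `filter_records < input.txt > output.txt` would apply it (both file names are placeholders). Because only one record is held in memory at a time, this should cope with a big file; if the lines of a record could be scattered through the file, the input would need to be sorted on the first field beforehand.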

© Stack Overflow or respective owner
