Using Pig with Hadoop, how do I regex-match parts of text with an unknown number of groups?

Posted by lmonson on Stack Overflow
Published on 2010-12-30T04:50:35Z Indexed on 2010/12/30 4:53 UTC

I'm using Amazon Elastic MapReduce.

I have log files that look something like this:

   random text foo="1" more random text foo="2"
   more text noise foo="1"
   blah blah blah foo="1" blah blah foo="3" blah blah foo="4" ...

How can I write a Pig expression that picks out all the numbers in the foo="..." expressions?

I'd like to get tuples that look something like this:

(1,2)
(1)
(1,3,4)

I've tried the following:

TUPLES = foreach LINES generate FLATTEN(EXTRACT(line,'foo="([0-9]+)"'));

But this yields only the first match in each line:

(1)
(1)
(1)
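
To make the desired behaviour concrete, here is a minimal sketch in plain Python (not Pig) of "find every match" as opposed to "find the first match". The function name extract_foo_numbers and the constant FOO_PATTERN are hypothetical names chosen for illustration; the pattern is the same one used in the EXTRACT call above. Logic along these lines could serve as a starting point for a user-defined function, but wiring it into Pig is not shown here and depends on the Pig version available.

```python
import re

# Same pattern as in the Pig attempt: capture the digits inside foo="...".
FOO_PATTERN = re.compile(r'foo="([0-9]+)"')


def extract_foo_numbers(line):
    """Return a tuple holding every number captured by foo="N" in the line.

    findall returns all non-overlapping matches of the single capture
    group, so the tuple length varies with the number of occurrences.
    """
    return tuple(FOO_PATTERN.findall(line))


if __name__ == "__main__":
    # Sample lines taken from the log excerpt in the question.
    lines = [
        'random text foo="1" more random text foo="2"',
        'more text noise foo="1"',
        'blah blah blah foo="1" blah blah foo="3" blah blah foo="4" ...',
    ]
    for line in lines:
        print(extract_foo_numbers(line))
```

Note that the captured values come back as strings, and a line with no foo="..." occurrence yields an empty tuple.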
