Using Pig with Hadoop, how do I regex-match parts of text with an unknown number of groups?

Posted by lmonson on Stack Overflow
Published on 2010-12-30T04:50:35Z

Filed under: amazon-web-services | hadoop | mapreduce | pig

I'm using Amazon Elastic MapReduce.

I have log files that look something like this:

   random text foo="1" more random text foo="2"
   more text noise foo="1"
   blah blah blah foo="1" blah blah foo="3" blah blah foo="4" ...

How can I write a Pig expression that picks out all the numbers in the foo="..." expressions on each line?

I'd prefer one tuple per line, something like this:

   (1,2)
   (1)
   (1,3,4)

I've tried the following:

   TUPLES = foreach LINES generate FLATTEN(EXTRACT(line,'foo="([0-9]+)"'));

But this yields only the first match in each line:

   (1)
   (1)
   (1)
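
For reference, the behaviour I'm after is "find every match on the line", not "find the first match". Here is a minimal sketch of that logic in plain Python, not Pig. The helper name extract_foo_values is hypothetical, and wiring it into Pig as a user-defined function is left out and would be an assumption on my part:

```python
import re

# Same pattern as in the Pig attempt: capture the digits inside foo="...".
FOO_PATTERN = re.compile(r'foo="([0-9]+)"')


def extract_foo_values(line):
    """Hypothetical helper: return every foo value on the line as a tuple of strings."""
    # findall returns the captured group for each non-overlapping match,
    # so the length of the result depends on the line.
    return tuple(FOO_PATTERN.findall(line))


# Sample lines taken from the log excerpt above.
lines = [
    'random text foo="1" more random text foo="2"',
    'more text noise foo="1"',
    'blah blah blah foo="1" blah blah foo="3" blah blah foo="4" ...',
]

results = [extract_foo_values(line) for line in lines]
for values in results:
    print(values)
```

Each sample line gives one tuple whose length equals the number of foo="..." occurrences on it. That variable-length result per line is what I'd like to get out of Pig.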

