Pig: Count number of keys in a map

Posted by Donald Miner on Stack Overflow
Published on 2012-12-05T22:12:58Z

I'd like to count the number of keys in a map in Pig. I could write a UDF to do this, but I was hoping there would be an easier way.

data = LOAD 'hbase://MARS1'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
         'A:*', '-loadKey true -caching=100000')
       AS (id:bytearray, A_map:map[]);

With the relation above, I basically want to build a histogram: for each row id, how many items (columns) that row has in column family A.

Hoping for the best, I tried

c = FOREACH data GENERATE id, COUNT(A_map);

but, unsurprisingly, that didn't work.
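A likely reason COUNT fails is that Pig's COUNT operates on a bag, not a map. Pig's built-in SIZE function returns the number of key/value pairs when given a map, so FOREACH data GENERATE id, SIZE(A_map); should give the per-row key count without a UDF. To make the computation itself concrete (for example, as a starting point for the Java fallback), here is a minimal in-memory Java sketch. It does not use Pig or HBase at all: the class name KeyCountHistogram and the sample rows are hypothetical, standing in for row keys mapped to the column qualifiers of family A.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class KeyCountHistogram {

    // Hypothetical sample data standing in for HBase rows:
    // row key -> (column qualifier in family A -> value).
    // In the real job these would come from HBaseStorage('A:*', ...).
    static Map<String, Map<String, String>> sampleRows() {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("row1", Map.of("q1", "x", "q2", "y", "q3", "z"));
        rows.put("row2", Map.of("q1", "x"));
        rows.put("row3", Map.of("q4", "w", "q5", "v", "q6", "u"));
        return rows;
    }

    // Per-row step: number of keys in each row's map
    // (the value SIZE(A_map) is expected to produce in Pig).
    static Map<String, Integer> keyCountsPerRow(Map<String, Map<String, String>> rows) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> e : rows.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    // Histogram step: how many rows have each key count, sorted by key count.
    static Map<Integer, Integer> histogram(Map<String, Integer> countsPerRow) {
        Map<Integer, Integer> hist = new TreeMap<>();
        for (int n : countsPerRow.values()) {
            hist.merge(n, 1, Integer::sum);
        }
        return hist;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = keyCountsPerRow(sampleRows());
        System.out.println(counts);            // {row1=3, row2=1, row3=3}
        System.out.println(histogram(counts)); // {1=1, 3=2}
    }
}
```

The first map is the per-row key count the question asks for; the second groups rows by that count, which corresponds to a GROUP ... BY on the count followed by COUNT in Pig.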

Alternatively, perhaps someone can suggest an entirely better way to do this. If I can't figure this out soon, I'll just write a Java MapReduce job or a Pig UDF.

© Stack Overflow or respective owner