Hadoop: Processing large serialized objects

Posted by restrictedinfinity on Stack Overflow
Published on 2010-06-10T06:28:22Z

I am developing an application to process (and merge) several large Java serialized objects (on the order of GBs) using the Hadoop framework. Hadoop distributes the blocks of a file across different hosts, but since deserialization requires all of the blocks to be present on a single host, this will hurt performance drastically. How can I deal with this situation, where the different blocks cannot be processed individually, unlike with text files?
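To illustrate why the blocks cannot be processed independently, here is a minimal, self-contained sketch (using only the JDK, no Hadoop) that serializes an object graph with `java.io`, then truncates the byte stream to simulate a mapper that only has the first HDFS block locally. Deserializing the partial stream fails, which is the core of the problem described above. The class name and sizes are illustrative, not from the original question.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.Arrays;

// Sketch: a java.io-serialized object cannot be deserialized from a
// partial byte range, which is why a single HDFS block of such a file
// is useless on its own.
public class PartialDeserialization {
    public static void main(String[] args) throws Exception {
        // Serialize a moderately sized object graph.
        ArrayList<int[]> data = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            data.add(new int[1024]);
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(data);
        }
        byte[] full = bos.toByteArray();

        // Simulate having only the first "block" of the file.
        byte[] firstBlock = Arrays.copyOf(full, full.length / 2);

        boolean failed = false;
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(firstBlock))) {
            ois.readObject();
        } catch (IOException e) {
            failed = true; // truncated stream raises EOFException
        }
        System.out.println(failed);
    }
}
```

This is also why a common workaround is either to make the input non-splittable (override `isSplitable` in a custom `FileInputFormat`, so each file goes whole to one mapper) or to re-encode the data as many smaller records, e.g. in a `SequenceFile`, so that splits become meaningful.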

© Stack Overflow or respective owner
