Free large datasets to experiment with Hadoop

Posted by Sundar on Stack Overflow
Published on 2010-04-20T10:54:11Z Indexed on 2010/04/22 22:23 UTC


Do you know of any large datasets, free or low cost, that I can use to experiment with Hadoop? Any pointers or links are appreciated.

Preferences:

  • At least 1 GB of data.

  • Production log data from a web server.

A few that I have found so far:

  1. http://dumps.wikimedia.org/enwiki/20100130/

  2. http://wiki.freebase.com/wiki/Data_dumps

  3. http://aws.amazon.com/publicdatasets/

Also, can we run our own crawler to gather data from sites such as Wikipedia? Any pointers on how to do this are appreciated as well.
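
To make the crawler question concrete, a crawler of this kind could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not a recommended implementation: it uses only the Python standard library, stays on the starting host, pauses between requests, and does not check robots.txt. The names `LinkParser`, `extract_links`, `fetch`, and `crawl`, the User-Agent string, and the default limits are all hypothetical and chosen for illustration. For bulk Wikipedia content, the dumps listed above are likely a gentler route than crawling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import Request, urlopen
import time


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute links found in html, with #fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def fetch(url):
    """Download one page as text. The User-Agent value is a made-up example."""
    req = Request(url, headers={"User-Agent": "hadoop-experiment-crawler/0.1"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=10, delay=1.0, fetch_page=fetch):
    """Breadth-first crawl restricted to the host of start_url.

    Returns a dict mapping each fetched URL to its HTML text.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to download
        pages[url] = html
        for link in extract_links(url, html):
            # Follow only unseen links on the same host.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
        time.sleep(delay)  # pause between requests
    return pages
```

The `fetch_page` parameter lets the download step be swapped out, for example with a function that reads from local files while testing. The returned pages could then be written to local files and copied into HDFS as input for a Hadoop job.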

© Stack Overflow or respective owner
