Identify documents from the results of Mahout clustering

Posted by Tejas on Stack Overflow
Published on 2010-10-15T20:18:29Z Indexed on 2011/01/15 19:53 UTC

I am using Mahout to cluster text documents indexed with Solr.

I used the "text" field of each document to build the vectors. Then I ran Mahout's k-means driver to cluster them and used the clusterdumper utility to dump the results.

I am having difficulty understanding the dumper's output. I can see the clusters that were formed and the term vectors in each of them, but how do I extract the documents that belong to these clusters? I want the result to be the input documents, grouped by the cluster each one was assigned to.
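
To make the desired result concrete, here is a minimal sketch of the mapping being asked for, written in plain Python rather than against Mahout itself. It assumes each document vector still carries a document ID and that the cluster centroids are available as plain lists of term weights. The names `doc_vectors`, `centroids` and `assign_documents`, the sample data, and the choice of Euclidean distance are all hypothetical illustrations, not Mahout's API or output format.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length term-weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_documents(doc_vectors, centroids):
    """Group document IDs under the ID of their nearest centroid."""
    members = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        # The k-means assignment step: pick the closest centroid.
        nearest = min(centroids, key=lambda cid: euclidean(vec, centroids[cid]))
        members[nearest].append(doc_id)
    return dict(members)


# Hypothetical data: three documents over a three-term vocabulary.
doc_vectors = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.9, 0.1, 0.0],
    "doc3": [0.0, 1.0, 0.8],
}
# Hypothetical centroids, standing in for the term vectors in the dump.
centroids = {
    "cluster-0": [0.95, 0.05, 0.0],
    "cluster-1": [0.0, 1.0, 0.8],
}

clusters = assign_documents(doc_vectors, centroids)
for cluster_id in sorted(clusters):
    print(cluster_id, clusters[cluster_id])
```

This only works if the vectors keep some link back to the source document, such as an ID field stored alongside each vector. Without that link, the clusters can be described by their terms but not traced back to the input documents.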
