vectorization of a text file

Posted by Fox on Stack Overflow
Published on 2012-03-21T17:25:28Z

I am trying to implement vectorization of a text file. I have already created a dictionary (the unique words across all the documents). What is the best way to implement this in Java?

For example, my dictionary has the following words: {w1, w2, w3, w4}, and I have 2 documents, each containing a subset of the words in the vocabulary. I need to write the matrix to a text file in this form:

1,3,4,0
0,0,2,1

Here each row represents a document, and each value is the number of times the corresponding dictionary word (w1 to w4, in order) occurs in that document.

Can you suggest the most efficient way to implement this in Java?
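One possible approach, as a rough sketch rather than a definitive answer: map each dictionary word to its column index once, then count tokens one document at a time and write each row as a comma-separated line. The class name `TermDocumentMatrix`, its method names, and the output file `matrix.csv` are made up for illustration. The sketch also assumes that the dictionary contains no duplicates, that each document is already loaded as a string, and that words are separated by whitespace.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TermDocumentMatrix {

    // Maps each dictionary word to its column index for constant-time lookup.
    // Assumption: the dictionary contains no duplicate words.
    public static Map<String, Integer> indexOf(List<String> dictionary) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < dictionary.size(); i++) {
            index.put(dictionary.get(i), i);
        }
        return index;
    }

    // Counts how often each dictionary word occurs in one document.
    // Assumption: tokens are separated by whitespace; unknown words are ignored.
    public static int[] countVector(Map<String, Integer> index, String document) {
        int[] counts = new int[index.size()];
        for (String token : document.trim().split("\\s+")) {
            Integer column = index.get(token);
            if (column != null) {
                counts[column]++;
            }
        }
        return counts;
    }

    // Formats one row of counts as a comma-separated line.
    public static String toCsvLine(int[] counts) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < counts.length; i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(counts[i]);
        }
        return line.toString();
    }

    // Writes one line per document, so only one row is held in memory at a time.
    public static void writeMatrix(List<String> dictionary, List<String> documents, Path out)
            throws IOException {
        Map<String, Integer> index = indexOf(dictionary);
        try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (String document : documents) {
                writer.write(toCsvLine(countVector(index, document)));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> dictionary = Arrays.asList("w1", "w2", "w3", "w4");
        // Two made-up documents chosen to reproduce the rows in the question.
        List<String> documents = Arrays.asList(
                "w1 w2 w2 w2 w3 w3 w3 w3",
                "w3 w3 w4");
        writeMatrix(dictionary, documents, Paths.get("matrix.csv"));
    }
}
```

With these two sample documents, the file contains the rows 1,3,4,0 and 0,0,2,1. The hash map keeps the cost per token constant regardless of dictionary size. If the vocabulary is large and most counts are zero, a sparse format that stores only the non-zero (column, count) pairs may be worth considering instead of a dense row.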
