Create a dataset: extract features from text documents (TF-IDF)

Posted by BigG on Stack Overflow
Published on 2010-05-27T13:27:49Z, indexed on 2010/05/27 13:31 UTC

Filed under: java | tools | data | information-retrieval | scores

I have to create a dataset from some text files by writing each file as a vector of features.

Something like this:

doc1: 1,0.45 6,0.001 94,0.1 ...

doc2: 3,0.5 98,0.2 ...

...

Each position in the vector represents a word, and the score is given by something like TF-IDF.

Do you know of a library or tool for this? (Java is preferred.)
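To make the target format concrete, here is a minimal sketch in plain Java that produces lines of that shape without any library. Everything in it is an assumption for illustration, not an established tool: the class name `TfIdfSketch` and its methods are hypothetical, tokens are lowercased runs of letters and digits, term frequency is count divided by document length, and IDF is the natural log of (number of documents / documents containing the word). Word indices start at 1 and are assigned in order of first appearance. Words that occur in every document score 0 and are left out so the vector stays sparse.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class: a sketch of TF-IDF vectorization, not a library API.
class TfIdfSketch {

    /** Lowercases the text and splits on anything that is not a letter or digit. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Returns one line per document: "name: index,score index,score ...". */
    public static List<String> toSparseLines(Map<String, String> docs) {
        Map<String, Integer> vocab = new LinkedHashMap<>();   // word -> 1-based index
        Map<String, Integer> docFreq = new LinkedHashMap<>(); // word -> docs containing it
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>(); // doc -> word -> count

        // First pass: build the vocabulary, per-document counts, and document frequencies.
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            Map<String, Integer> tf = new LinkedHashMap<>();
            for (String word : tokenize(doc.getValue())) {
                vocab.putIfAbsent(word, vocab.size() + 1);
                tf.merge(word, 1, Integer::sum);
            }
            for (String word : tf.keySet()) docFreq.merge(word, 1, Integer::sum);
            counts.put(doc.getKey(), tf);
        }

        // Second pass: score each word and format the sparse vector.
        int n = docs.size();
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> doc : counts.entrySet()) {
            int total = 0;
            for (int c : doc.getValue().values()) total += c;

            Map<Integer, Double> vector = new TreeMap<>(); // sorted by word index
            for (Map.Entry<String, Integer> e : doc.getValue().entrySet()) {
                double tf = (double) e.getValue() / total;
                double idf = Math.log((double) n / docFreq.get(e.getKey()));
                double score = tf * idf;
                if (score > 0) vector.put(vocab.get(e.getKey()), score); // skip zero entries
            }

            StringBuilder sb = new StringBuilder(doc.getKey()).append(":");
            for (Map.Entry<Integer, Double> e : vector.entrySet()) {
                sb.append(' ').append(e.getKey()).append(',')
                  .append(String.format(Locale.ROOT, "%.4f", e.getValue()));
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "the cat sat on the mat");
        docs.put("doc2", "the dog sat on the log");
        for (String line : toSparseLines(docs)) System.out.println(line);
    }
}
```

With the two toy documents in `main`, this prints `doc1: 2,0.1155 5,0.1155` and `doc2: 6,0.1155 7,0.1155`. For real text files, the map values would be the file contents, and a smoothed IDF or a stop-word list may be worth adding depending on the corpus.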

© Stack Overflow or respective owner
