Performing an SVD on tweets: memory problem

Posted by plotti on Stack Overflow
I have generated a huge CSV file as output from my POS tagging and stemming. It looks like this:

           word1  word2  word3  ...  word14400
personne1     1      2      0            1
personne2     0      0      1            0
...
personne650

It contains the word counts for each person, so each row is that person's characteristic vector.
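
For context, a minimal sketch of how such a file could be loaded into a sparse matrix (assuming Python with pandas and scipy, and a hypothetical file name counts.csv, none of which appear in the original setup):

    import pandas as pd
    from scipy import sparse

    # Read the person-by-word count table; the first column holds the person labels.
    df = pd.read_csv("counts.csv", index_col=0)

    # Most counts are zero, so the 650 x 14400 matrix is stored sparsely.
    X = sparse.csr_matrix(df.values)
    print(X.shape, X.nnz)  # dimensions and number of non-zero counts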

I want to run an SVD on this beast, but the matrix seems too big to hold in memory for the operation. My questions are:

  • Should I reduce the number of columns by removing words with a column sum of, for example, 1, i.e. words that have been used only once (see the sketch after this list)? Do I bias the data too much with this approach?

  • I tried the RapidMiner approach: loading the CSV into a database and then reading it back in sequential batches for processing, as RapidMiner proposes. But MySQL can't store that many columns in a table, and if I transpose the data and re-transpose it on import, it takes ages.
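
To illustrate the first option, a minimal sketch of dropping rarely used words from the sparse count matrix (assuming Python with numpy/scipy; the function name and the min_count threshold are only illustrative):

    import numpy as np

    def drop_rare_words(X, min_count=2):
        """Drop columns (words) whose total count over all persons is below min_count."""
        col_sums = np.asarray(X.sum(axis=0)).ravel()
        keep = np.where(col_sums >= min_count)[0]
        return X[:, keep], keep

    # X_reduced, kept_columns = drop_rare_words(X, min_count=2)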

So, in general, I am asking for advice on how to perform an SVD on such a corpus.
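
One possible direction, as a minimal sketch: compute only a truncated SVD, so the full dense decomposition never has to fit in memory (assuming Python with scipy.sparse.linalg.svds and the sparse matrix X from above; k=100 is an arbitrary choice):

    from scipy.sparse.linalg import svds

    # Compute only the top k singular triplets; the dense U, s, Vt factors
    # stay small even though the input has 14400 columns.
    k = 100
    U, s, Vt = svds(X.asfptype(), k=k)

    # svds returns singular values in ascending order; reverse to the usual convention.
    order = s.argsort()[::-1]
    U, s, Vt = U[:, order], s[order], Vt[order, :]

    # Low-dimensional representation of each person (one row per person).
    person_vectors = U * s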
