Search Results

Search found 2 results on 1 page for 'plotti'.

  • Performing an SVD on tweets: memory problem

    - by plotti
    I have generated a huge CSV file as the output of my POS tagging and stemming. It looks like this:

        word1, word2, word3, ..., word14400
        person1 1 2 0 1
        person2 0 0 1 0
        ...
        person650

    It contains the word counts for each person, so each row gives me a characteristic vector for that person. I want to run an SVD on this beast, but the matrix seems to be too big to be held in memory for the operation.

    My question is: should I reduce the number of columns by removing words with a column sum of, for example, 1, i.e. words that have been used only once? Do I bias the data too much with this approach?

    I tried the RapidMiner approach: loading the CSV into the DB and then reading it back sequentially in batches for processing, as RapidMiner proposes. But MySQL can't store that many columns in a table. If I transpose the data and then transpose it back on import, that also takes ages.

    So, in general, I am asking for advice on how to perform an SVD on such a corpus.
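
The column-pruning idea from the question can be sketched in Ruby, the language used in the other thread, with the standard `csv` library. This is a minimal sketch under stated assumptions, not the poster's method. The helper names `column_totals` and `prune_rare_words` are hypothetical. The sketch assumes that the first header cell labels the person-name column. It makes two streaming passes, so only one row plus the column totals is held in memory at any time. For scale, a dense 650 × 14,400 matrix of 8-byte floats is about 75 MB.

```ruby
require 'csv'

# Pass 1 (hypothetical helper): sum each word column over all persons.
# Rows are streamed one at a time, so only the totals are kept in memory.
# Assumes the header row's first cell labels the person-name column.
def column_totals(path)
  totals = nil
  CSV.foreach(path, headers: true) do |row|
    counts = row.fields.drop(1).map(&:to_i) # skip the person name
    totals ||= Array.new(counts.size, 0)
    counts.each_with_index { |c, j| totals[j] += c }
  end
  totals || []
end

# Pass 2 (hypothetical helper): copy the file, keeping the name column plus
# only the word columns whose total count is at least min_total.
# min_total: 2 drops every word used at most once. Returns the number of
# word columns kept.
def prune_rare_words(in_path, out_path, min_total: 2)
  totals = column_totals(in_path)
  keep = totals.each_index.select { |j| totals[j] >= min_total }
  CSV.open(out_path, 'w') do |out|
    # Without headers: true the header row is yielded too, so it is
    # filtered with exactly the same column selection as the data rows.
    CSV.foreach(in_path) do |fields|
      out << [fields[0]] + keep.map { |j| fields[j + 1] }
    end
  end
  keep.size
end
```

Usage with hypothetical file names: `prune_rare_words("counts.csv", "counts_pruned.csv", min_total: 2)`. Raising `min_total` shrinks the matrix further. Whether that biases the analysis is the open question asked above.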

  • Plotting Tweets from DB in Ruby, grouping by hour.

    - by plotti
    Hey guys, I've got a couple of issues with my code. I suspect I am plotting the results very inefficiently, since the grouping by hour takes ages. The DB is very simple: it contains the tweets, their creation date and the username, and it is fed by the Twitter Gardenhose. Thanks for your help!

        require 'rubygems'
        require 'sequel'
        require 'gnuplot'

        DB = Sequel.sqlite("volcano.sqlite")
        tweets = DB[:tweets]

        # Count the tweets mentioning keyword per hour, measured from the first match.
        def get_values(keyword, tweets)
          # Order by creation time so that .first really is the earliest matching tweet.
          my_tweets = tweets.where(Sequel.like(:text, "%#{keyword}%")).order(:created_at)
          first = my_tweets.first
          return [[], []] if first.nil? # no tweet mentions this keyword
          start = first[:created_at]

          r = Hash.new(0) # hour offset => number of tweets
          my_tweets.each do |t|
            hour = ((t[:created_at] - start) / 3600).floor # whole hours since start
            r[hour] += 1
          end

          pairs = r.sort # sorted by hour
          [pairs.map(&:first), pairs.map(&:last)]
        end

        keywords = ["iceland", "island", "vulkan", "volcano"]
        values = {}
        keywords.each do |k|
          values[k] = get_values(k, tweets)
        end

        Gnuplot.open do |gp|
          Gnuplot::Plot.new(gp) do |plot|
            plot.terminal "png"
            plot.output "volcano.png"
            plot.data = []
            values.each do |k, v|
              plot.data << Gnuplot::DataSet.new([v[0], v[1]]) { |ds|
                ds.with = "linespoints"
                ds.title = k
              }
            end
          end
        end
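
The code above runs one `LIKE '%keyword%'` scan per keyword. A pattern with a leading wildcard generally cannot use an index, so each keyword means a full pass over the table. One possible speed-up is to read the tweets once and count every keyword in the same pass. The sketch below does this with a helper named `hourly_counts`, which is hypothetical and not part of Sequel. It makes several assumptions:

- Rows arrive ordered by `created_at`.
- `created_at` comes back as a Ruby `Time`.
- `downcase` is an adequate approximation of SQLite's ASCII case-insensitive `LIKE`.

Unlike the original, it measures hours from the earliest tweet overall, so all keywords share one x-axis.

```ruby
# Hypothetical helper: per-hour tweet counts for several keywords, computed
# in a single pass over rows (any object whose #each yields Hashes with
# :text and :created_at, e.g. a Sequel dataset).
# Assumes rows are ordered by :created_at and :created_at is a Time.
def hourly_counts(rows, keywords)
  needles = keywords.to_h { |k| [k, k.downcase] }
  counts = keywords.to_h { |k| [k, Hash.new(0)] } # keyword => {hour => n}
  start = nil
  rows.each do |t|
    start ||= t[:created_at]                       # earliest tweet overall
    hour = ((t[:created_at] - start) / 3600).floor # whole hours since start
    text = t[:text].to_s.downcase                  # case-insensitive match
    needles.each { |k, n| counts[k][hour] += 1 if text.include?(n) }
  end
  # Same [x, y] shape as get_values, with hours in ascending order.
  counts.transform_values do |per_hour|
    pairs = per_hour.sort
    [pairs.map(&:first), pairs.map(&:last)]
  end
end
```

With the names from the post, `values = hourly_counts(tweets.order(:created_at), keywords)` could replace the per-keyword loop, and the Gnuplot block would stay unchanged. The trade-off is that every tweet is pulled into Ruby once, instead of letting SQLite filter it four times.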
