Performing calculations by subsets of data in R

Posted by Vivi on Stack Overflow
Published on 2012-06-26T15:11:47Z Indexed on 2012/06/26 21:16 UTC

I want to perform calculations for each company number in the column PERMNO of my data frame, the summary of which can be seen here:

> summary(companydataRETS)
     PERMNO           RET           
 Min.   :10000   Min.   :-0.971698  
 1st Qu.:32716   1st Qu.:-0.011905  
 Median :61735   Median : 0.000000  
 Mean   :56788   Mean   : 0.000799  
 3rd Qu.:80280   3rd Qu.: 0.010989  
 Max.   :93436   Max.   :19.000000  

My solution so far was to create a variable holding every distinct company number:

compns <- companydataRETS[!duplicated(companydataRETS[,"PERMNO"]),"PERMNO"]

I then run a parallel foreach loop that calls my function get.rho(), which in turn performs the desired calculations:

rhos <- foreach (i=1:length(compns), .combine=rbind) %dopar% 
      get.rho(subset(companydataRETS[,"RET"],companydataRETS$PERMNO == compns[i]))
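One thing worth noting about the loop above: the comparison companydataRETS$PERMNO == compns[i] scans the whole PERMNO column once per company, so the total work grows with (number of companies) x (number of rows). A minimal sketch of an alternative that partitions the returns once with split() follows. The toy data and the body of get.rho() are assumptions made purely for illustration, since the real get.rho() is not shown here.

```r
# Toy data standing in for companydataRETS (values are made up for illustration).
companydataRETS <- data.frame(
  PERMNO = c(10000L, 10000L, 10000L, 10000L, 10001L, 10001L, 10001L, 10001L),
  RET    = c(0.01, -0.02, 0.03, 0.00, 0.00, 0.05, -0.01, 0.02)
)

# Hypothetical placeholder for get.rho(): lag-1 autocorrelation of the returns.
# The real get.rho() is not shown in the post, so this is only an assumption.
get.rho <- function(ret) {
  n <- length(ret)
  cor(ret[-n], ret[-1])
}

# Partition RET by PERMNO in one step instead of one full scan per company.
ret_by_company <- split(companydataRETS$RET, companydataRETS$PERMNO)

# Apply get.rho() to each company's returns and stack the results by row,
# similar in spirit to .combine = rbind in the foreach call.
rhos <- do.call(rbind, lapply(ret_by_company, get.rho))
```

The row names of rhos are the PERMNO values, so each result can be traced back to its company. Whether this is fast enough for 72 million rows would still need to be measured.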

I tested it on a subset of my data and it works. The problem is that I have 72 million observations, and even after leaving the computer running overnight, it still had not finished.

I am new to R, so I imagine my code structure can be improved and that there is a better (quicker, less computationally intensive) way to perform this same task, perhaps using apply or with, neither of which I understand. Any suggestions?
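For reference, here is a minimal sketch of what the apply family and with() look like for this kind of grouped calculation. It assumes get.rho() returns a single number per company. The toy data and the stand-in get.rho() are hypothetical and only illustrate the calling pattern.

```r
# Toy data standing in for companydataRETS (values are made up for illustration).
companydataRETS <- data.frame(
  PERMNO = c(10000L, 10000L, 10000L, 10000L, 10001L, 10001L, 10001L, 10001L),
  RET    = c(0.01, -0.02, 0.03, 0.00, 0.00, 0.05, -0.01, 0.02)
)

# Hypothetical stand-in for get.rho(); assumed to return one number per company.
get.rho <- function(ret) {
  n <- length(ret)
  cor(ret[-n], ret[-1])
}

# tapply(values, groups, fun) applies fun to the values within each group.
rhos <- tapply(companydataRETS$RET, companydataRETS$PERMNO, get.rho)

# with() evaluates the expression inside the data frame, so the columns
# can be referred to by name without the companydataRETS$ prefix.
rhos_with <- with(companydataRETS, tapply(RET, PERMNO, get.rho))
```

Both calls return a vector named by PERMNO. The two forms are equivalent, and with() only saves typing.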

© Stack Overflow or respective owner