Foreign-key-like merge in R
        Posted  
        
            by 
                skyl
            
        on Stack Overflow
        
        See other posts from Stack Overflow
        
            or by skyl
        
        
        
        Published on 2012-07-07T02:41:36Z
        Indexed on 
            2012/07/07
            3:15 UTC
        
        
        Read the original article
        Hit count: 189
        
r
I'm merging a bunch of csv with 1 row per id/pk/seqn.
> full = merge(demo, lab13am, by="seqn", all=TRUE)
> full = merge(full, cdq, by="seqn", all=TRUE)
> full = merge(full, mcq, by="seqn", all=TRUE)
> full = merge(full, cfq, by="seqn", all=TRUE)
> full = merge(full, diq, by="seqn", all=TRUE)
> print(length(full$ridageyr))
[1] 9965
> print(summary(full$ridageyr))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00   11.00   19.00   29.73   48.00   85.00 
Everything is great. But, I have another file which has multiple rows per id like:
"seqn","rxd030","rxd240b","nhcode","rxq250"
56,2,"","",NA,NA,""
57,1,"ACETAMINOPHEN","01200",2
57,1,"BUDESONIDE","08800",1
58,1,"99999","",NA
57 has two rows. So, if I naively try to merge this file, I have a ton more rows and my data gets all skewed up.
> full = merge(full, rxq, by="seqn", all=TRUE)
> print(length(full$ridageyr))
[1] 15643
> print(summary(full$ridageyr))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00   14.00   41.00   40.28   66.00   85.00 
Is there a normal idiomatic way to deal with data like this? Suppose I want a way to make a simple model like
MYSPECIAL_FACTOR <- somehow()
glm(MYSPECIAL_FACTOR ~ full$ridageyr, family=binomial)
where MYSPECIAL_FACTOR is, say, whether or not rxd240b == "ACETAMINOPHEN" for the observations which are unique by seqn. You can reproduce by running the first bit of this.
© Stack Overflow or respective owner