R optimization: How can I avoid a for loop in this situation?

Posted by chrisamiller on Stack Overflow, 2010-03-25.

I'm trying to do a simple genomic track intersection in R, and running into major performance problems, probably related to my use of for loops.

In this situation, I have pre-defined windows at 100 bp intervals, and I'm trying to calculate how much of each window is covered by the annotations in mylist. Graphically, it looks something like this:

          0    100   200    300    400   500   600  
windows: |-----|-----|-----|-----|-----|-----|

mylist:    |-|   |-----------|

So I wrote some code to do just that, but it's fairly slow and has become a bottleneck:

## coverage count for each 100-bp window
windows <- numeric(6)

## second track: a list of (start, end) intervals
mylist <- vector("list")
mylist[[1]] <- c(1, 20)
mylist[[2]] <- c(120, 320)

## do the intersection
for (i in seq_along(mylist)) {
  ## first and last windows this interval touches
  st <- floor(mylist[[i]][1] / 100) + 1
  sp <- floor(mylist[[i]][2] / 100) + 1
  for (j in st:sp) {
    ## clip the interval to window j and add the overlap length
    b <- max((j - 1) * 100, mylist[[i]][1])
    e <- min(j * 100, mylist[[i]][2])
    windows[j] <- windows[j] + e - b + 1
  }
}

print(windows)
[1]  20  81 101  21   0   0

Naturally, this is being used on data sets much larger than the example I provide here. Profiling shows that the bottleneck is in the nested for loops, but my clumsy attempt to vectorize them using *apply functions resulted in code that runs an order of magnitude more slowly.
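
Roughly, that attempt was structured like the following (a sketch of the approach rather than the exact code):

## Sketch of the *apply rewrite: compute each interval's overlap with
## every one of the 6 windows, then sum the per-interval vectors.
## Checking every window for every interval (plus the per-call overhead)
## is likely what makes this slower than the explicit loop above.
overlaps <- lapply(mylist, function(interval) {
  sapply(1:6, function(j) {
    b <- max((j - 1) * 100, interval[1])
    e <- min(j * 100, interval[2])
    max(e - b + 1, 0)  # zero when window j doesn't touch the interval
  })
})
windows2 <- Reduce(`+`, overlaps)  # same result as the loop: 20 81 101 21 0 0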

I suppose I could write something in C, but I'd like to avoid that if possible. Can anyone suggest another approach that will speed this calculation up?
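
For example, would something along the lines of Bioconductor's IRanges be a reasonable direction? A minimal sketch, assuming the IRanges package is available (note the boundary semantics differ slightly: per-base coverage counts each base exactly once, whereas my loop counts a base that falls exactly on a multiple of 100 toward both adjacent windows, which is why it reports 101/21 rather than 100/20):

## Minimal sketch with Bioconductor's IRanges (assumes it is installed)
library(IRanges)

ir  <- IRanges(start = c(1, 120), end = c(20, 320))  # the intervals in mylist
cov <- coverage(ir, width = 600)       # per-base coverage, run-length encoded
win <- successiveIRanges(rep(100, 6))  # six adjacent windows: 1-100, 101-200, ...
viewSums(Views(cov, win))              # covered bases per window: 20 81 100 20 0 0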
