Search Results

Search found 1 results on 1 pages for 'ptocquin'.

Page 1/1 | 1 

  • lapply slower than for-loop when used for a BiomaRt query. Is that expected?

    - by ptocquin
    I would like to query a database using BiomaRt package. I have loci and want to retrieve some related information, let say description. I first try to use lapply but was surprise by the time needed for the task to be performed. I thus tried a more basic for-loop and get a faster result. Is that expected or is something wrong with my code or with my understanding of apply ? I read other posts dealing with *apply vs for-loop performance (Here, for example) and I was aware that improved performance should not be expected but I don't understand why performance here is actually lower. Here is a reproducible example. 1) Loading the library and selecting the database : library("biomaRt") athaliana <- useMart("plants_mart_14") athaliana <- useDataset("athaliana_eg_gene",mart=athaliana) 2) Querying the database : loci <- c("at1g01300", "at1g01800", "at1g01900", "at1g02335", "at1g02790", "at1g03220", "at1g03230", "at1g04040", "at1g04110", "at1g05240" ) I create a function for the use in lapply : foo <- function(loci) { getBM("description","tair_locus",loci,athaliana) } When I use this function on the first element : > system.time(foo(cwp_loci[1])) utilisateur système écoulé 0.020 0.004 1.599 When I use lapply to retrieve the data for all values : > system.time(lapply(loci, foo)) utilisateur système écoulé 0.220 0.000 16.376 I then created a new function, adding a for-loop : foo2 <- function(loci) { for (i in loci) { getBM("description","tair_locus",loci[i],athaliana) } } Here is the result : > system.time(foo2(loci)) utilisateur système écoulé 0.204 0.004 10.919 Of course, this will be applied to a big list of loci, so the best performing option is needed. I thank you for assistance. EDIT Following recommendation of @MartinMorgan Simply passing the vector loci to getBM greatly improves the query efficiency. Simpler is better. > system.time(lapply(loci, foo)) utilisateur système écoulé 0.236 0.024 110.512 > system.time(foo2(loci)) utilisateur système écoulé 0.208 0.040 116.099 > system.time(foo(loci)) utilisateur système écoulé 0.028 0.000 6.193

    Read the article

1