Search Results

Search found 27 results on 2 pages for 'lapply'.


  • getSymbols and using lapply, Cl, and merge to extract close prices

    - by algotr8der
    I've been messing around with this for some time. I recently started using the quantmod package to perform analytics on stock prices. I have a ticker vector that looks like the following:

        > tickers
         [1] "SPY" "DIA" "IWM" "SMH" "OIH" "XLY" "XLP" "XLE" "XLI" "XLB" "XLK" "XLU" "XLV"
        [14] "QQQ"
        > str(tickers)
         chr [1:14] "SPY" "DIA" "IWM" "SMH" "OIH" "XLY" "XLP" "XLE" ...

    I wrote a function called myX to use in a lapply call to save prices for every stock in the vector tickers. It has the following code:

        myX <- function(tickers, start, end) {
          require(quantmod)
          getSymbols(tickers, from=start, to=end)
        }

    I call lapply by itself:

        library(quantmod)
        lapply(tickers,myX,start="2001-03-01", end="2011-03-11")

        > lapply(tickers,myX,start="2001-03-01", end="2011-03-11")
        [[1]] [1] "SPY"
        [[2]] [1] "DIA"
        [[3]] [1] "IWM"
        [[4]] [1] "SMH"
        [[5]] [1] "OIH"
        [[6]] [1] "XLY"
        [[7]] [1] "XLP"
        [[8]] [1] "XLE"
        [[9]] [1] "XLI"
        [[10]] [1] "XLB"
        [[11]] [1] "XLK"
        [[12]] [1] "XLU"
        [[13]] [1] "XLV"
        [[14]] [1] "QQQ"

    That works fine. Now I want to merge the Close prices for every stock into an object that looks like

        #            BCSI.Close WBSN.Close NTAP.Close FFIV.Close SU.Close
        # 2011-01-03      30.50      20.36      57.41     134.33    38.82
        # 2011-01-04      30.24      19.82      57.38     132.07    38.03
        # 2011-01-05      31.36      19.90      57.87     137.29    38.40
        # 2011-01-06      32.04      19.79      57.49     138.07    37.23
        # 2011-01-07      31.95      19.77      57.20     138.35    37.30
        # 2011-01-10      31.55      19.76      58.22     142.69    37.04

    Someone recommended I try something like the following:

        ClosePrices <- do.call(merge, lapply(tickers, function(x) Cl(get(x))))

    However I tried various combinations of this without any success. First I tried just calling lapply with Cl(x):

        >lapply(tickers,myX,start="2001-03-01", end="2011-03-11") Cl(myX)))
        > lapply(tickers,myX,start="2001-03-01", end="2011-03-11") Cl(x)))
        Error: unexpected symbol in "lapply(tickers,myX,start="2001-03-01", end="2011-03-11") Cl"
        >
        > lapply(tickers,myX(x),start="2001-03-01", end="2011-03-11") Cl(x)))
        Error: unexpected symbol in "lapply(tickers,myX(x),start="2001-03-01", end="2011-03-11") Cl"
        >
        > lapply(tickers,myX(start="2001-03-01", end="2011-03-11") Cl(x)
        Error: unexpected symbol in "lapply(tickers,myX(start="2001-03-01", end="2011-03-11") Cl"
        > lapply(tickers,myX(start="2001-03-01", end="2011-03-11") Cl(x))
        Error: unexpected symbol in "lapply(tickers,myX(start="2001-03-01", end="2011-03-11") Cl"
        >

    Any guidance would be kindly appreciated.

    Read the article

  • R: How to tell lapply to ignore an error and process the next thing in the list?

    - by John
    I have an example function below that reads in a date as a string and returns it as a date object. If it reads a string that it cannot convert to a date, it returns an error.

        testFunction <- function (date_in) {
          return(as.Date(date_in))
        }

        testFunction("2010-04-06")  # this works fine
        testFunction("foo")         # this returns an error

    Now, I want to use lapply and apply this function over a list of dates:

        dates1 = c("2010-04-06", "2010-04-07", "2010-04-08")
        lapply(dates1, testFunction)  # this works fine

    But if I want to apply the function over a list when one string in the middle of two good dates returns an error, what is the best way to deal with this?

        dates2 = c("2010-04-06", "foo", "2010-04-08")
        lapply(dates2, testFunction)

    I presume that I want a try catch in there, but is there a way to catch the error for the "foo" string whilst asking lapply to continue and read the third date?

    Read the article
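
    One possible approach to the question above, sketched with base R's tryCatch. The wrapper name safeDate is hypothetical, and returning NA for a bad string is an assumption about what "ignore" should mean here:

```r
# Hypothetical wrapper: returns an NA date instead of stopping when conversion fails.
safeDate <- function(date_in) {
  tryCatch(as.Date(date_in), error = function(e) as.Date(NA))
}

dates2 <- c("2010-04-06", "foo", "2010-04-08")
res <- lapply(dates2, safeDate)   # lapply carries on past the "foo" element
```

    Because the error is handled inside the function, lapply never sees it and processes every element.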

  • lapply slower than for-loop when used for a BiomaRt query. Is that expected?

    - by ptocquin
    I would like to query a database using the biomaRt package. I have loci and want to retrieve some related information, let's say the description. I first tried to use lapply but was surprised by the time needed for the task to be performed. I thus tried a more basic for-loop and got a faster result. Is that expected, or is something wrong with my code or with my understanding of apply? I read other posts dealing with *apply vs for-loop performance (Here, for example) and I was aware that improved performance should not be expected, but I don't understand why performance here is actually lower. Here is a reproducible example.

    1) Loading the library and selecting the database:

        library("biomaRt")
        athaliana <- useMart("plants_mart_14")
        athaliana <- useDataset("athaliana_eg_gene",mart=athaliana)

    2) Querying the database:

        loci <- c("at1g01300", "at1g01800", "at1g01900", "at1g02335", "at1g02790",
                  "at1g03220", "at1g03230", "at1g04040", "at1g04110", "at1g05240" )

    I create a function for use in lapply:

        foo <- function(loci) {
          getBM("description","tair_locus",loci,athaliana)
        }

    When I use this function on the first element:

        > system.time(foo(cwp_loci[1]))
           user  system elapsed
          0.020   0.004   1.599

    When I use lapply to retrieve the data for all values:

        > system.time(lapply(loci, foo))
           user  system elapsed
          0.220   0.000  16.376

    I then created a new function, adding a for-loop:

        foo2 <- function(loci) {
          for (i in loci) {
            getBM("description","tair_locus",loci[i],athaliana)
          }
        }

    Here is the result:

        > system.time(foo2(loci))
           user  system elapsed
          0.204   0.004  10.919

    Of course, this will be applied to a big list of loci, so the best performing option is needed. Thank you for your assistance.

    EDIT: Following the recommendation of @MartinMorgan, simply passing the vector loci to getBM greatly improves the query efficiency. Simpler is better.

        > system.time(lapply(loci, foo))
           user  system elapsed
          0.236   0.024 110.512
        > system.time(foo2(loci))
           user  system elapsed
          0.208   0.040 116.099
        > system.time(foo(loci))
           user  system elapsed
          0.028   0.000   6.193

    Read the article

  • can lapply not modify variables in a higher scope

    - by stevejb
    I often want to do essentially the following:

        mat <- matrix(0,nrow=10,ncol=1)
        lapply(1:10, function(i) { mat[i,] <- rnorm(1,mean=i)})

    I would expect mat to have 10 random numbers in it, but instead it still contains zeros. (I am not worried about the rnorm part. Clearly there is a right way to do that. I am worried about affecting mat from within an anonymous function of lapply.) Can I not affect matrix mat from inside lapply? Why not? Is there a scoping rule of R that is blocking this?

    Read the article
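
    A minimal sketch of the scoping behaviour asked about above. Deterministic values replace rnorm so the effect is easy to check; the names unchanged and mat2 are illustrative only:

```r
mat <- matrix(0, nrow = 10, ncol = 1)

# Assignment with <- inside the function modifies a local copy only.
invisible(lapply(1:10, function(i) { mat[i, ] <- i }))
unchanged <- all(mat == 0)

# Option 1: superassignment (<<-) reaches the enclosing environment.
invisible(lapply(1:10, function(i) { mat[i, ] <<- i }))

# Option 2 (usually easier to reason about): return values and build the result.
mat2 <- matrix(vapply(1:10, function(i) i * 2, numeric(1)), ncol = 1)
```

    The second option avoids side effects altogether, which is typically how the apply family is meant to be used.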

  • how to determine if a character vector is a valid numeric or integer vector

    - by Andrew Barr
    I am trying to turn a nested list structure into a dataframe. The list looks similar to the following (it is serialized data from parsed JSON read in using the httr package). myList <- list(object1 = list(w=1, x=list(y=0.1, z="cat")), object2 = list(w=2, x=list(y=0.2, z="dog"))) unlist(myList) does a great job of recursively flattening the list, and I can then use lapply to flatten all the objects nicely. flatList <- lapply(myList, FUN= function(object) {return(as.data.frame(rbind(unlist(object))))}) And finally, I can button it up using plyr::rbind.fill myDF <- do.call(plyr::rbind.fill, flatList) str(myDF) #'data.frame': 2 obs. of 3 variables: #$ w : Factor w/ 2 levels "1","2": 1 2 #$ x.y: Factor w/ 2 levels "0.1","0.2": 1 2 #$ x.z: Factor w/ 2 levels "cat","dog": 1 2 The problem is that w and x.y are now being interpreted as character vectors, which by default get parsed as factors in the dataframe. I believe that unlist() is the culprit, but I can't figure out another way to recursively flatten the list structure. A workaround would be to post-process the dataframe, and assign data types then. What is the best way to determine if a vector is a valid numeric or integer vector?

    Read the article
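
    One way to post-process such columns, sketched with base R's type.convert. The small data frame is made up to mimic the character values that unlist() produces; factor columns would presumably need as.character() first:

```r
# Values as they might come out of unlist(): everything is character.
df <- data.frame(w = c("1", "2"), x.y = c("0.1", "0.2"), x.z = c("cat", "dog"),
                 stringsAsFactors = FALSE)

# type.convert() picks logical, integer, numeric, or character per column;
# as.is = TRUE keeps non-numeric strings as character rather than factor.
df[] <- lapply(df, type.convert, as.is = TRUE)
classes <- vapply(df, class, character(1))
```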

  • How to create a column containing a string of stars to indicate levels of a factor in a data frame i

    - by PaulHurleyuk
    (second question today - must be a bad day) I have a dataframe with various columns, including a concentration column (numeric), a flag highlighting invalid results (boolean) and a description of the problem (character):

        dput(df)
        structure(list(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), rawconc = c(77.4,
        52.6, 86.5, 44.5, 167, 16.2, 59.3, 123, 1.95, 181), reason = structure(c(NA,
        NA, 2L, NA, NA, NA, 2L, 1L, NA, NA), .Label = c("Fails Acceptance Criteria",
        "Poor Injection"), class = "factor"), flag = c("False", "False", "True",
        "False", "False", "False", "True", "True", "False", "False")), .Names = c("x",
        "rawconc", "reason", "flag"), row.names = c(NA, -10L), class = "data.frame")

    I can create a column with the numeric level of the reason column:

        df$level<-as.numeric(df$reason)
        df
            x rawconc                    reason  flag level
        1   1   77.40                      <NA> False    NA
        2   2   52.60                      <NA> False    NA
        3   3   86.50            Poor Injection  True     2
        4   4   44.50                      <NA> False    NA
        5   5  167.00                      <NA> False    NA
        6   6   16.20                      <NA> False    NA
        7   7   59.30            Poor Injection  True     2
        8   8  123.00 Fails Acceptance Criteria  True     1
        9   9    1.95                      <NA> False    NA
        10 10  181.00                      <NA> False    NA

    And here's what I want to do to create a column with 'level' many stars, but it fails:

        df$stars<-paste(rep("*",df$level)sep="",collapse="")
        Error: unexpected symbol in "df$stars<-paste(rep("*",df$level)sep"
        df$stars<-paste(rep("*",df$level),sep="",collapse="")
        Error in rep("*", df$level) : invalid 'times' argument
        rep("*",df$level)
        Error in rep("*", df$level) : invalid 'times' argument
        df$stars<-paste(rep("*",pmax(df$level,0,na.rm=TRUE)),sep="",collapse="")
        Error in rep("*", pmax(df$level, 0, na.rm = TRUE)) : invalid 'times' argument

    It seems that rep needs to be fed one value at a time. I feel that this should be possible (and my gut says 'use lapply', but my apply-fu is v. poor). Anyone want to try?

    Read the article
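
    A possible way to build the stars column, assuming NA levels should become an empty string (the question does not say). The level vector is copied from the printed data frame above:

```r
level <- c(NA, NA, 2, NA, NA, NA, 2, 1, NA, NA)

# strrep() is vectorised over 'times'; NA levels are mapped to 0 stars.
stars <- strrep("*", ifelse(is.na(level), 0, level))

# Equivalent element-by-element version with vapply().
stars2 <- vapply(level, function(n) {
  if (is.na(n)) "" else paste(rep("*", n), collapse = "")
}, character(1))
```

    The vapply version is the "feed rep one value at a time" idea from the question; strrep simply does the same thing in one call.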

  • Subset a data.frame by list and apply function on each part, by rows

    - by aL3xa
    This may seem like a typical plyr problem, but I have something different in mind. Here's the function that I want to optimize (skip the for loop).

        # dummy data
        set.seed(1985)
        lst <- list(a=1:10, b=11:15, c=16:20)
        m <- matrix(round(runif(200, 1, 7)), 10)
        m <- as.data.frame(m)

        dfsub <- function(dt, lst, fun) {
          # check whether dt is `data.frame`
          stopifnot (is.data.frame(dt))
          # check if vectors in lst are "whole" / integer
          # vector elements should be column indexes
          is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol
          # fail if any non-integers in list
          idx <- rapply(lst, is.wholenumber)
          stopifnot(idx)
          # check for list length
          stopifnot(ncol(dt) == length(idx))
          # subset the data
          subs <- list()
          for (i in 1:length(lst)) {
            # apply function on each part, by row
            subs[[i]] <- apply(dt[ , lst[[i]]], 1, fun)
          }
          # preserve names
          names(subs) <- names(lst)
          # convert to data.frame
          subs <- as.data.frame(subs)
          # guess what =)
          return(subs)
        }

    And now a short demonstration... actually, I'm about to explain what I primarily intended to do. I wanted to subset a data.frame by vectors gathered in a list object. Since this is part of the code of a function that accompanies data manipulation in psychological research, you can consider m as the results of a personality questionnaire (10 subjects, 20 vars). Vectors in the list hold column indexes that define questionnaire subscales (e.g. personality traits). Each subscale is defined by several items (columns in the data.frame). If we presuppose that the score on each subscale is nothing more than the sum (or some other function) of row values (results on that part of the questionnaire for each subject), you could run:

        > dfsub(m, lst, sum)
            a  b  c
        1  46 20 24
        2  41 24 21
        3  41 13 12
        4  37 14 18
        5  57 18 25
        6  27 18 18
        7  28 17 20
        8  31 18 23
        9  38 14 15
        10 41 14 22

    I took a glance at this function and I must admit that this little loop isn't spoiling the code at all... BUT, if there's an easier/more efficient way of doing this, please let me know!

    Read the article
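
    The loop in the question above could be written with lapply, roughly as follows. This is only a sketch: the input checks are left out, and the name scores is illustrative:

```r
set.seed(1985)
lst <- list(a = 1:10, b = 11:15, c = 16:20)
m <- as.data.frame(matrix(round(runif(200, 1, 7)), 10))

# One column of scores per subscale; lapply keeps the names of lst.
scores <- as.data.frame(lapply(lst, function(cols) {
  apply(m[, cols, drop = FALSE], 1, sum)
}))
```

    For sums specifically, rowSums(m[, cols]) would give the same numbers without apply.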

  • Coding the R-ight way - avoiding the for loop

    - by mropa
    I am going through one of my .R files and by cleaning it up a little bit I am trying to get more familiar with writing the code the r-ight way. As a beginner, one of my favorite starting points is to get rid of the for() loops and try to transform the expression into a functional programming form. So here is the scenario: I am assembling a bunch of data.frames into a list for later usage.

        dataList <- list (dataA, dataB, dataC, dataD, dataE )

    Now I like to take a look at each data.frame's column names and substitute certain character strings. Eg I like to substitute each "foo" and "bar" with "baz". At the moment I am getting the job done with a for() loop which looks a bit awkward.

        colnames(dataList[[1]])
        [1] "foo" "code" "lp15" "bar" "lh15"
        colnames(dataList[[2]])
        [1] "a" "code" "lp50" "ls50" "foo"

        matchVec <- c("foo", "bar")
        for (i in seq(dataList)) {
          for (j in seq(matchVec)) {
            colnames (dataList[[i]])[grep(pattern=matchVec[j], x=colnames (dataList[[i]]))] <- c("baz")
          }
        }

    Since I am working here with a list I thought about the lapply function. My attempts handling the job with the lapply function all seem to look alright but only at first sight. If I write

        f <- function(i, xList) {
          gsub(pattern=c("foo"), replacement=c("baz"), x=colnames(xList[[i]]))
        }
        lapply(seq(dataList), f, xList=dataList)

    the last line prints out almost what I am looking for. However, if I take another look at the actual names of the data.frames in dataList:

        lapply (dataList, colnames)

    I see that no changes have been made to the initial character strings. So how can I rewrite the for() loop and transform it into a functional programming form? And how do I substitute both strings, "foo" and "bar", in an efficient way? Since the gsub() function takes as its pattern argument only a character vector of length one.

    Read the article
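
    A sketch of one functional rewrite, using made-up one-row data frames with the column names shown in the question. Like the original loop, it replaces the whole column name when the pattern matches:

```r
dataList <- list(
  data.frame(foo = 1, code = 2, lp15 = 3, bar = 4, lh15 = 5),
  data.frame(a = 1, code = 2, lp50 = 3, ls50 = 4, foo = 5)
)

# lapply returns new data frames; assign the result back to keep the change.
dataList <- lapply(dataList, function(df) {
  names(df)[grepl("foo|bar", names(df))] <- "baz"   # "foo|bar" matches either string
  df
})
```

    The regular-expression alternation covers both strings in one pattern, so no inner loop over matchVec is needed.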

  • What is the optimal way to run a set of regressions in R.

    - by stevejb
    Assume that I have sources of data X and Y that are indexable, say matrices. And I want to run a set of independent regressions and store the result. My initial approach would be

        results = matrix(nrow=nrow(X), ncol=(2))
        for(i in 1:nrow(X)) {
          results[i,] = coefficients(lm(Y[i,] ~ X[i,]))
        }

    But, loops are bad, so I could do it with lapply as

        out <- lapply(1:nrow(X), function(i) { coefficients(lm(Y[i,] ~ X[i,])) } )

    Is there a better way to do this?

    Read the article
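
    One alternative that returns a matrix directly is vapply. The sketch below uses made-up data with an exact linear relation so the coefficients are known in advance; coefs is an illustrative name:

```r
set.seed(1)
X <- matrix(rnorm(50), nrow = 5)   # illustrative data: 5 regressions, 10 points each
Y <- 2 * X + 1                     # exact linear relation: intercept 1, slope 2

# vapply checks that every fit returns exactly two coefficients.
coefs <- t(vapply(seq_len(nrow(X)), function(i) {
  coef(lm(Y[i, ] ~ X[i, ]))
}, numeric(2)))
```

    Each row of coefs holds the intercept and slope of one regression, which matches the layout of the results matrix in the question.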

  • What is the best way to run a loop of regressions in R?

    - by stevejb
    Assume that I have sources of data X and Y that are indexable, say matrices. And I want to run a set of independent regressions and store the result. My initial approach would be

        results = matrix(nrow=nrow(X), ncol=(2))
        for(i in 1:nrow(X)) {
          results[i,] = coefficients(lm(Y[i,] ~ X[i,]))
        }

    But, loops are bad, so I could do it with lapply as

        out <- lapply(1:nrow(X), function(i) { coefficients(lm(Y[i,] ~ X[i,])) } )

    Is there a better way to do this?

    Read the article

  • R problem with apply + rbind

    - by Carl
    I cannot seem to get the following to work directory <- "./" files.15x16 <- c("15x16-70d.out", "15x16-71d.out") data.15x16<-rbind( lapply( as.array(paste(directory, files.15x16, sep="")), FUN=read.csv, sep=" ", header=F) ) What it should be doing is pretty straightforward - I have a directory name, some file names, and actual files of data. I paste the directory and file names together, read the data from the files in, and then rbind them all together into a single chunk of data. Except the result of the lapply has the data in [[]] - i.e., accessing it occurs via a[[1]], a[[2]], etc which rbind doesn't seem to accept. Suggestions?

    Read the article
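
    The usual idiom for this is do.call, which hands each list element to rbind as a separate argument. A sketch with two small temporary files standing in for the real data (file names and contents are made up):

```r
# Two small space-separated files written to a temporary directory for illustration.
directory <- tempdir()
files <- c("a.out", "b.out")
writeLines(c("1 2", "3 4"), file.path(directory, files[1]))
writeLines(c("5 6"), file.path(directory, files[2]))

# lapply returns a list of data frames; do.call passes them to rbind one by one.
parts <- lapply(file.path(directory, files), read.csv, sep = " ", header = FALSE)
combined <- do.call(rbind, parts)
```

    Calling rbind(parts) directly binds the list itself as a single object, which is why the result in the question stays nested.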

  • function not working R

    - by user3722828
    I've never programmed before and am trying to learn. I'm following that "coursera" course that I've seen other people post about, a course offered by Johns Hopkins on R programming. Anyway, this was supposed to be my first function. Yet, it doesn't work! But when I type out all the steps individually, it runs just fine... Can anyone tell me why?

        > pollutantmean <- function(directory, pollutant, id = 1:332){
        +   x<- list.files("/Users/mike******/Desktop/directory", full.names=TRUE)
        +   y<- lapply(x, read.csv)
        +   z<- do.call(rbind.data.frame, y[id])
        +
        +   mean(z$pollutant, na.rm=TRUE)
        + }
        > pollutantmean(specdata,nitrate,1:10)
        [1] NA
        Warning message:
        In mean.default(z$pollutant, na.rm = TRUE) :
          argument is not numeric or logical: returning NA

        ####

        > x<- list.files("/Users/mike******/Desktop/specdata",full.names=TRUE)
        > y<- lapply(x,read.csv)
        > z<- do.call(rbind.data.frame,y[1:10])
        > mean(z$nitrate,na.rm=TRUE)
        [1] 0.7976266

    Read the article
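
    A likely explanation is that "directory" inside the quoted path and z$pollutant are both taken literally rather than as the function's arguments. A sketch of a corrected version, run on made-up files in a temporary folder (the folder name specdata_demo and the default id = 1:2 are assumptions for the demo):

```r
# Illustrative data in a temporary directory standing in for "specdata".
directory <- file.path(tempdir(), "specdata_demo")
dir.create(directory, showWarnings = FALSE)
write.csv(data.frame(nitrate = c(1, NA, 3)), file.path(directory, "001.csv"), row.names = FALSE)
write.csv(data.frame(nitrate = c(5, 7)), file.path(directory, "002.csv"), row.names = FALSE)

pollutantmean <- function(directory, pollutant, id = 1:2) {
  files <- list.files(directory, full.names = TRUE)  # use the argument, not a literal path
  z <- do.call(rbind, lapply(files[id], read.csv))
  mean(z[[pollutant]], na.rm = TRUE)  # z$pollutant would look for a column named "pollutant"
}

result <- pollutantmean(directory, "nitrate")
```

    Note that both the directory and the pollutant are passed as character strings here.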

  • Calculating a consecutive streak in data

    - by Jura25
    I’m trying to calculate the maximum winning and losing streak in a dataset (i.e. the highest number of consecutive positive or negative values). I’ve found a somewhat related question here on StackOverflow and even though that gave me some good suggestions, the angle of that question is different, and I’m not (yet) experienced enough to translate and apply that information to this problem. So I was hoping you could help me out; even a suggestion would be great. My data set looks like this:

        > subRes
           Instrument TradeResult.Currency.
        1         JPM                    -3
        2         JPM                   264
        3         JPM                   284
        4         JPM                    69
        5         JPM                   283
        6         JPM                  -219
        7         JPM                   -91
        8         JPM                   165
        9         JPM                   -35
        10        JPM                  -294
        11        KFT                    -8
        12        KFT                   -48
        13        KFT                   125
        14        KFT                  -150
        15        KFT                  -206
        16        KFT                   107
        17        KFT                   107
        18        KFT                    56
        19        KFT                   -26
        20        KFT                   189

        > split(subRes[,2],subRes[,1])
        $JPM
         [1]   -3  264  284   69  283 -219  -91  165  -35 -294

        $KFT
         [1]   -8  -48  125 -150 -206  107  107   56  -26  189

    In this case, the maximum (winning) streak for JPM is four (namely the 264, 284, 69 and 283 consecutive positive results) and for KFT this value is 3 (107, 107, 56). My goal is to create a function which gives the maximum winning streaks per instrument (i.e. JPM: 4, KFT: 3). To achieve that: R needs to compare the current result with the previous result, and if it is higher then there is a streak of at least 2 consecutive positive results. Then R needs to look at the next value, and if this is also higher: add 1 to the already found value of 2. If this value isn’t higher, R needs to move on to the next value, while remembering 2 as the intermediate maximum. I’ve tried cumsum and cummax in accordance with conditional summing (like cumsum(c(TRUE, diff(subRes[,2]) > 0))), which didn’t work out. Also rle in accordance with lapply (like lapply(rle(subRes$TradeResult.Currency.), function(x) diff(x) > 0)) didn’t work. How can I make this work?

    Read the article
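
    A sketch of one way to get the streaks with rle, applied to the sign of each result rather than to differences between results. The helper max_streak is hypothetical, and the second column is shortened to TradeResult for readability:

```r
subRes <- data.frame(
  Instrument = rep(c("JPM", "KFT"), each = 10),
  TradeResult = c(-3, 264, 284, 69, 283, -219, -91, 165, -35, -294,
                  -8, -48, 125, -150, -206, 107, 107, 56, -26, 189)
)

# Longest run of TRUE in a logical vector, 0 if there is none.
max_streak <- function(hit) {
  runs <- rle(hit)
  max(c(0, runs$lengths[runs$values]))
}

wins   <- tapply(subRes$TradeResult > 0, subRes$Instrument, max_streak)
losses <- tapply(subRes$TradeResult < 0, subRes$Instrument, max_streak)
```

    On this data the winning streaks come out as 4 for JPM and 3 for KFT, matching the values stated in the question.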

  • How to keep columns labels when numeric convert to character

    - by stata
        a<- data.frame(sex=c(1,1,2,2,1,1),bq=factor(c(1,2,1,2,2,2)))
        library(Hmisc)
        label(a$sex)<-"gender"
        label(a$bq)<-"xxx"
        str(a)
        b<-data.frame(lapply(a, as.character), stringsAsFactors=FALSE)
        str(b)

    When I convert the columns of dataframe a to character, the column labels disappear. My dataframe has many columns; the example here has only two. How can I keep the column labels when converting numeric columns to character? Thank you!

    Read the article

  • Why can't I rename a data frame column inside a list?

    - by Moreno Garcia
    I would like to rename some columns from CPU_Usage to the process name before I merge the dataframes in order to make it more legible.

        names(byProcess[[1]])
        # [1] "Time" "CPU_Usage"
        names(byProcess[1])
        # [1] "CcmExec_3344"
        names(byProcess[[1]][2]) <- names(byProcess[1])
        names(byProcess[[1]][2])
        # [1] "CPU_Usage"
        names(byProcess[[1]][2]) <- 'test'
        names(byProcess[[1]][2])
        # [1] "CPU_Usage"
        lapply(byProcess, names)
        # $CcmExec_3344
        # [1] "Time" "CPU_Usage"
        #
        # ... (removed several entries to make it more readable)
        #
        # $wrapper_1604
        # [1] "Time" "CPU_Usage"

    Read the article
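
    As the output in the question shows, assigning names to the subset byProcess[[1]][2] leaves the parent's column names as they were. Indexing the names vector itself appears to be the way around it; a sketch with a made-up one-element list:

```r
byProcess <- list(CcmExec_3344 = data.frame(Time = 1:2, CPU_Usage = c(0.5, 0.7)))

# Index into the names vector, not into the data frame.
names(byProcess[[1]])[2] <- names(byProcess)[1]

# The same for every element of the list.
byProcess <- Map(function(df, nm) { names(df)[2] <- nm; df },
                 byProcess, names(byProcess))
```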

  • Join and sum not compatible matrices through data.table

    - by leodido
    My goal is to "sum" two not compatible matrices (matrices with different dimensions) using (and preserving) row and column names. I've figured this approach: convert the matrices to data.table objects, join them and then sum columns vectors. An example:

        > M1
          1 3 4 5 7 8
        1 0 0 1 0 0 0
        3 0 0 0 0 0 0
        4 1 0 0 0 0 0
        5 0 0 0 0 0 0
        7 0 0 0 0 1 0
        8 0 0 0 0 0 0
        > M2
          1 3 4 5 8
        1 0 0 1 0 0
        3 0 0 0 0 0
        4 1 0 0 0 0
        5 0 0 0 0 0
        8 0 0 0 0 0
        > M1 %ms% M2
          1 3 4 5 7 8
        1 0 0 2 0 0 0
        3 0 0 0 0 0 0
        4 2 0 0 0 0 0
        5 0 0 0 0 0 0
        7 0 0 0 0 1 0
        8 0 0 0 0 0 0

    This is my code:

        M1 <- matrix(c(0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0), byrow = TRUE, ncol = 6)
        colnames(M1) <- c(1,3,4,5,7,8)
        M2 <- matrix(c(0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0), byrow = TRUE, ncol = 5)
        colnames(M2) <- c(1,3,4,5,8)
        # to data.table objects
        DT1 <- data.table(M1, keep.rownames = TRUE, key = "rn")
        DT2 <- data.table(M2, keep.rownames = TRUE, key = "rn")
        # join and sum of common columns
        if (nrow(DT1) > nrow(DT2)) {
          A <- DT2[DT1, roll = TRUE]
          A[, list(X1 = X1 + X1.1, X3 = X3 + X3.1, X4 = X4 + X4.1, X5 = X5 + X5.1, X7, X8 = X8 + X8.1), by = rn]
        }

    That outputs:

           rn X1 X3 X4 X5 X7 X8
        1:  1  0  0  2  0  0  0
        2:  3  0  0  0  0  0  0
        3:  4  2  0  0  0  0  0
        4:  5  0  0  0  0  0  0
        5:  7  0  0  0  0  1  0
        6:  8  0  0  0  0  0  0

    Then I can convert back this data.table to a matrix and fix row and column names. The questions are:

      - how to generalize this procedure? I need a way to automatically create list(X1 = X1 + X1.1, X3 = X3 + X3.1, X4 = X4 + X4.1, X5 = X5 + X5.1, X7, X8 = X8 + X8.1) because i want to apply this function to matrices which dimensions (and row/columns names) are not known in advance. In summary I need a merge procedure that behaves as described.
      - there are other strategies/implementations that achieve the same goal that are, at the same time, faster and generalized? (hoping that some data.table monster help me)
      - to what kind of join (inner, outer, etc. etc.) is assimilable this procedure?

    Thanks in advance.

    p.s.: I'm using data.table version 1.8.2

    EDIT - SOLUTIONS

    @Aaron solution. No external libraries, only base R. It works also on list of matrices.

        add_matrices_1 <- function(...) {
          a <- list(...)
          cols <- sort(unique(unlist(lapply(a, colnames))))
          rows <- sort(unique(unlist(lapply(a, rownames))))
          out <- array(0, dim = c(length(rows), length(cols)), dimnames = list(rows,cols))
          for (m in a) out[rownames(m), colnames(m)] <- out[rownames(m), colnames(m)] + m
          out
        }

    @MadScone solution. Used reshape2 package. It works only on two matrices per call.

        add_matrices_2 <- function(m1, m2) {
          m <- acast(rbind(melt(M1), melt(M2)), Var1~Var2, fun.aggregate = sum)
          mn <- unique(colnames(m1), colnames(m2))
          rownames(m) <- mn
          colnames(m) <- mn
          m
        }

    BENCHMARK (100 runs with microbenchmark package)

        Unit: microseconds
                    expr       min         lq    median         uq       max
        1 add_matrices_1   196.009   257.5865   282.027   291.2735   549.397
        2 add_matrices_2 13737.851 14697.9790 14864.778 16285.7650 25567.448

    No need to comment the benchmark: @Aaron solution wins. I'll continue to investigate a similar solution for data.table objects. I'll add other solutions eventually reported or discovered.

    Read the article

  • converting a matrix to a list

    - by andrewj
    Suppose I have a matrix foo as follows:

        foo <- cbind(c(1,2,3), c(15,16,17))
        > foo
             [,1] [,2]
        [1,]    1   15
        [2,]    2   16
        [3,]    3   17

    I'd like to turn it into a list that looks like

        [[1]]
        [1]  1 15

        [[2]]
        [1]  2 16

        [[3]]
        [1]  3 17

    You can do it as follows:

        lapply(apply(foo, 1, function(x) list(c(x[1], x[2]))), function(y) unlist(y))

    I'm interested in an alternative method that isn't as complicated. Note, if you just do apply(foo, 1, function(x) list(c(x[1], x[2]))), it returns a list within a list, which I'm hoping to avoid.

    Read the article
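
    Two shorter alternatives, sketched on the same matrix foo. The names rows and rows2 are illustrative:

```r
foo <- cbind(c(1, 2, 3), c(15, 16, 17))

# One list element per row.
rows <- lapply(seq_len(nrow(foo)), function(i) foo[i, ])

# Alternative: split the values by row index (unname drops the "1", "2", "3" names).
rows2 <- unname(split(foo, row(foo)))
```

    Both avoid the nested list that the apply call in the question produces.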

  • Error in R for missing object

    - by griffin
    I have several nested functions, some of which are called in lapply clauses. In the process, sometimes I don't set a default for a parameter and instead check whether it was supplied using the missing function. I'm getting a strange error right now: I'm not passing a value for a parameter and it has no default, yet missing resolves to FALSE. And when I try to use or check the parameter in any other way (using length, exists, etc.), I get an error:

        Error in try(length(x)) : argument "data" is missing, with no default

    Has anyone experienced this failure of the missing function before?

    Read the article

  • Avoid the use of loops (for) with R

    - by albergali
    Hi, I'm working with R and I have code like this:

        i<-1
        j<-1
        for (i in 1:10)
          for (j in 1:100)
            if (data[i] == paths[j,1])
              cluster[i,4] <- paths[j,2]

    where:

      - data is a vector with 100 rows and 1 column
      - paths is a matrix with 100 rows and 5 columns
      - cluster is a matrix with 100 rows and 5 columns

    My question is: how could I avoid the use of "for" loops to iterate through the matrix? I don't know whether apply functions (lapply, tapply...) are useful in this case. This is a problem when j=10000, for example, because execution time is very long. Thank you

    Read the article
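
    A lookup like this can usually be vectorised with match. The sketch below uses tiny made-up inputs; one difference to be aware of is that match returns the first matching row of paths, whereas the loop in the question keeps the last one when keys repeat:

```r
data    <- c("b", "z", "a")
paths   <- cbind(c("a", "b", "c"), c("pathA", "pathB", "pathC"))
cluster <- matrix(NA_character_, nrow = 3, ncol = 4)

idx <- match(data, paths[, 1])   # row of paths matching each element, NA if none
hit <- !is.na(idx)
cluster[hit, 4] <- paths[idx[hit], 2]
```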

  • Grab triangles within a lower triangle

    - by Tyler Rinker
    I need to grab all the three-element triangles that make up the lower triangle of a symmetric matrix. I cannot think of how to grab all these pieces in order, starting with the far left column working down, then the next column to the right, and so on. I know that the number of mini triangles inside the lower triangle is:

        n = x(x - 1)/2
        where: x = nrow(mats[[i]]) - 1

    Here I've created three matrices with letters (it's easier for me to conceptualize this way) and the elements in the order I'm looking for:

        FUN <- function(n) {
          matrix(LETTERS[1:(n*n)], n)
        }
        mats <- lapply(3:5, FUN)

    So this is the output I'd like to get (I put it in code rather than output format) for each of the matrices created above:

        list(c("B", "C", "F"))
        list(c("B", "C", "G"), c("C", "D", "H"), c("G", "H", "L"))
        list(c("B", "C", "H"), c("C", "D", "I"), c("D", "E", "J"), c("H", "I", "N"), c("I", "J", "O"), c("N", "O", "T"))

    How can I do this task in the fastest manner possible while staying in base R? Not sure if this visual of what I'm after is helpful but it may be: [figure not reproduced]

    Read the article
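
    A plain base R sketch that reproduces the ordering listed in the question. It is not claimed to be the fastest option, and the function name mini_triangles is hypothetical. Each triangle is taken as the elements at (i, j), (i+1, j) and (i+1, j+1), which is an interpretation inferred from the expected output:

```r
mini_triangles <- function(m) {
  n <- nrow(m)                       # assumes a square matrix with n >= 3
  out <- list()
  for (j in seq_len(n - 2)) {        # column of the triangle's left edge
    for (i in (j + 1):(n - 1)) {     # row of the triangle's top element
      out[[length(out) + 1]] <- c(m[i, j], m[i + 1, j], m[i + 1, j + 1])
    }
  }
  out
}

FUN <- function(n) matrix(LETTERS[1:(n * n)], n)
res <- lapply(3:5, function(n) mini_triangles(FUN(n)))
```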

  • R webscraping: interrogating for date and importance

    - by adam.888
    I am able to webscrape a table from a webpage containing news library(XML) webpage <- "http://www.tradingeconomics.com/calendar" tables <- readHTMLTable(webpage ) n.rows <- unlist(lapply(tables, function(t) dim(t)[1])) dfcal <- as.data.frame(tables$calendar) However I do not know how to interrogate for date or for importance. For example how could I webscrape news from Jan 2014? I am able to do this on the webpage by altering button settings, but how can I do it from within R? I was also not able to collect the importance column data. Also are there better ways for collecting economic news from within R? I have looked on http://www.rseek.org/ but could not find anything. Thank you for your help.

    Read the article

  • Best way to reduce consecutive NAs to single NA

    - by digEmAll
    I need to reduce the consecutive NA's in a vector to a single NA, without touching the other values. So, for example, given a vector like this:

        NA NA 8 7 NA NA NA NA NA 3 3 NA -1 4

    what I need to get is the following result:

        NA 8 7 NA 3 3 NA -1 4

    Currently, I'm using the following function:

        reduceConsecutiveNA2One <- function(vect){
          enc <- rle(is.na(vect))
          # helper func
          tmpFun <- function(i){
            if(enc$values[i]){
              data.frame(L=c(enc$lengths[i]-1, 1), V=c(TRUE,FALSE))
            }else{
              data.frame(L=enc$lengths[i], V=enc$values[i])
            }
          }
          Df <- do.call(rbind.data.frame,lapply(1:length(enc$lengths),FUN=tmpFun))
          return(vect[rep.int(!Df$V,Df$L)])
        }

    and it seems to work fine, but probably there's a simpler/faster way to accomplish this task. Any suggestions? Thanks in advance.

    Read the article
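
    A shorter vectorised sketch of the same idea: keep an element unless it is NA and the element just before it is NA too. The names na and reduced are illustrative:

```r
vect <- c(NA, NA, 8, 7, NA, NA, NA, NA, NA, 3, 3, NA, -1, 4)

na <- is.na(vect)
# Drop an NA only when the element just before it is also NA.
reduced <- vect[!(na & c(FALSE, head(na, -1)))]
```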

  • Storing an arbitrary R object onto HDD?

    - by Harokitty
    I understand that we can export data matrices to csv or xlsx files. What about complex objects like lm? For example, in my work I might have a list of length 1000, each with a single lm() object. Each time I load R I have to wait a long time to populate the 1000 length list with these lm objects with a for loop or a lapply. I would rather just save the list somewhere on my HDD at the end of a session and open it at the start of the next session.

    Read the article
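
    Base R's saveRDS and readRDS serialise arbitrary objects, including lists of lm fits. A small sketch using the built-in cars dataset; the file location is a placeholder:

```r
# Three small fits standing in for the list of 1000 lm objects.
fits <- lapply(1:3, function(i) lm(dist ~ speed, data = cars[-i, ]))

path <- file.path(tempdir(), "fits.rds")   # hypothetical file location
saveRDS(fits, path)                        # at the end of a session
restored <- readRDS(path)                  # at the start of the next session
```

    save() and load() are the related pair for storing several named objects in one file.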

  • Substitute values (for specific dates) from a second data frame to the first data frame

    - by user1665355
    I have two time series data frames. The first one:

        head(df1):
        GMT          MSCI ACWI     DJGlbl  Russell 1000  Russell Dev  S&P GSCI Industrial  S&P GSCI Precious
        1999-03-01  -0.7000000  0.2000000    -0.1000000   -1.5000000           -1.0000000         -0.4000000
        1999-03-02  -0.5035247  0.0998004    -0.7007007   -0.2030457            0.4040404         -0.3012048
        1999-03-03  -0.2024291  0.2991027     0.0000000   -0.6103764            0.1006036         -0.1007049
        1999-03-04   0.7099391  0.2982107     1.5120968   -0.1023541            0.5025126          0.4032258
        1999-03-05   2.4169184  0.8919722     2.1847071    2.7663934           -1.2000000          0.0000000
        1999-03-08   0.3933137  0.3929273     0.5830904   -0.0997009           -0.2024291          1.1044177

        tail(df1):
        GMT           MSCI ACWI       DJGlbl  Russell 1000  Russell Dev  S&P GSCI Industrial  S&P GSCI Precious
        2011-12-23   0.68241470   0.84790673     0.9441385    0.6116208            0.5822862         -0.2345300
        2011-12-26  -0.05213764   0.00000000     0.0000000    0.0000000            0.0000000          0.0000000
        2011-12-27   0.20865936   0.05254861     0.3117693    0.2431611            0.0000000         -0.7233273
        2011-12-28  -0.62467465  -1.20798319    -1.1655012   -0.9702850           -2.0414381         -2.4043716
        2011-12-29   0.52383447   0.47846890     0.8647799    0.5511329           -0.0933126         -1.2504666
        2011-12-30   0.26055237   1.03174603    -0.4676539    1.2180268            1.9613948          1.7388017

    The second one:

        head(df2):
        GMT           MSCI.ACWI      DJGlbl  Russell.1000  Russell.Dev  S.P.GSCI.Industrial  S.P.GSCI.Precious
        1999-06-01   0.00000000  0.24438520     0.0000000            0          -0.88465521        0.008522842
        1999-07-01   0.12630441  0.06755621     0.0000000            0           0.29394697        0.000000000
        1999-08-02   0.07441812  0.18922829     0.0000000            0           0.02697299       -0.107155063
        1999-09-01  -0.36952701  0.08684107     0.1117509            0           0.24520976        0.000000000
        1999-10-01   0.00000000  0.00000000     0.0000000            0           0.00000000        1.941266205
        1999-11-01   0.41879925  0.00000000     0.0000000            0           0.00000000       -0.197897901

        tail(df2):
        GMT           MSCI.ACWI     DJGlbl  Russell.1000  Russell.Dev  S.P.GSCI.Industrial  S.P.GSCI.Precious
        2011-07-01   0.00000000  0.0000000     0.0000000    0.0000000           0.00000000         -0.1141162
        2011-08-01   0.00000000  0.0000000     0.0000000    0.0000000           0.02627347          0.0000000
        2011-09-01  -0.02470873  0.2977585    -0.0911891    0.6367605           0.00000000          0.2830977
        2011-10-03   0.42495188  0.0000000     0.4200743   -0.4420027          -0.41012646          0.0000000
        2011-11-01   0.00000000  0.0000000     0.0000000   -0.6597739           0.00000000          0.0000000
        2011-12-01   0.50273034  0.0000000     0.0000000    0.6476393           0.00000000          0.0000000

    The first df contains daily observations. The second df contains only the forecasted values for the first day of each month. I would like to substitute the values from the second df into the first one. In other words, the "first day of each month" values in the first df should be replaced by the "first day of each month" values from the second df. I tried to write an lapply loop that substitutes the values, using only the match function, but I failed. I could not find a similar question on StackOverflow either... Grateful for any suggestions!

    Read the article
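
    A sketch of the substitution using match on the date column. The tiny data frames are made up, and the sketch assumes both frames use the same column names (in the question they differ slightly, spaces versus dots, so they would need aligning first):

```r
df1 <- data.frame(GMT = as.Date(c("1999-06-01", "1999-06-02", "1999-07-01")),
                  MSCI = c(1, 2, 3), DJ = c(4, 5, 6))
df2 <- data.frame(GMT = as.Date(c("1999-06-01", "1999-07-01")),
                  MSCI = c(10, 30), DJ = c(40, 60))

idx  <- match(df2$GMT, df1$GMT)      # row of df1 for each date in df2, NA if absent
ok   <- !is.na(idx)
cols <- setdiff(names(df2), "GMT")   # every column except the date
df1[idx[ok], cols] <- df2[ok, cols]
```

    Dates in df2 that do not occur in df1 are skipped rather than appended.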
