sapply - Developer IT

R: Using sapply on vector of POSIXct

- by Chris

I have what may be a very simple question. I want to process a column of POSIXct objects from a dataframe and generate a vector of datetime strings. I tried to use the following sapply call dt <- sapply(df$datetime, function(x) format(x,"%Y-%m-%dT%H:%M:%S")) but to no avail. I keep getting the following error Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, : invalid 'trim' argument When I apply this function to a single POSIXct object from the column, I have no problem. So I'm stumped at the moment about what the problem is. Do I need to do something special with POSIXct objects?

Read the article

sapply and concurrency in R

- by JSmaga

Good afternoon, Somebody asked me a question today and neither did I know the answer nor could I find it in the documentation. This person simply asked me if the sapply function in R was making concurrent calls to the function you want to apply to the list, or if the computation is done sequantially. Does anybody know the answer? What about rapply (the recursive version of this function)? Thanks, Jeremie

Read the article

why is `vapply` safer than `sapply`?

- by flodel

The documentation says vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer [...] to use. Could you please elaborate as to why it is generally safer, maybe provide examples? P.S.: I know the answer and I already tend to avoid sapply. I just wish there was a nice answer here on SO so I can point my coworkers to it. Please, no "read the manual" answer.

Read the article

R strsplit and vectorization

- by James

When creating functions that use strsplit, vector inputs do not behave as desired, and sapply needs to be used. This is due to the list output that strsplit produces. Is there a way to vectorize the process - that is, the function produces the correct element in the list for each of the elements of the input? For example, to count the lengths of words in a character vector: words <- c("a","quick","brown","fox") > length(strsplit(words,"")) [1] 4 # The number of words (length of the list) > length(strsplit(words,"")[[1]]) [1] 1 # The length of the first word only > sapply(words,function (x) length(strsplit(x,"")[[1]])) a quick brown fox 1 5 5 3 # Success, but potentially very slow Ideally, something like length(strsplit(words,"")[[.]]) where . is interpreted as the being the relevant part of the input vector.

Read the article

select rows with largest value of variable within a group in r

- by Misha

a.2<-sample(1:10,100,replace=T) b.2<-sample(1:100,100,replace=T) a.3<-data.frame(a.2,b.2) r<-sapply(split(a.3,a.2),function(x) which.max(x$b.2)) a.3[r,] returns the list index, not the index for the entire data.frame Im trying to return the largest value of b.2 for each subgroup of a.2. How can I do this efficiently?

Read the article

Repeat elements of vector in R

- by bshor

Hi, I'm trying to repeat the elements of vector a, b number of times. That is, a="abc" should be "aabbcc" if y = 2. Why doesn't either of the following code examples work? sapply(a, function (x) rep(x,b)) and from the plyr package, aaply(a, function (x) rep(x,b)) I know I'm missing something very obvious ...

Read the article

for (i in xxx) ggplot problem

- by Andreas

This is strange - I think? library(ggplot2) tf <- which(sapply(diamonds, is.factor)) diamonds.tf <- diamonds[,tf] So far so good. But next comes the trouble: pl.f <- ggplot(diamonds.tf, aes(x=diamonds.tf[,i]))+ geom_bar()+ xlab(names(diamonds.tf[i])) for (i in 1:ncol(diamonds.tf)) { ggsave(paste("plot.f",i,".png",sep=""), plot=pl.f, height=3.5, width=5.5) } This saves the plots in my working directory - but with the wrong x-label. I think this is strange since calling ggplot directly produces the right plot: i <- 2 ggplot(diamonds, aes(x=diamonds[,i]))+geom_bar()+xlab(names(diamonds)[i]) I don't really know how to describe this as a fitting title - suggestions as to a more descriptive question-title is most welcome. Thanks in advance

Read the article

R: How to get a stack trace from the snow package

- by Keith

How can I get a stack trace back from a snow node after an error occurs? I am using the snow package (version 0.3-3) on R 2.10.1 and I'm getting errors when I use parSapply that do not occur when I use sapply. Snow is nice enough to give me the error message but it would be much more useful for me to have the kind of stack trace you can get from traceback(). So far I have tried: options(showWarnCalls = T, showErrorCalls = T) setDefaultClusterOptions(outfile = "/dev/tty") and options(error=traceback) setDefaultClusterOptions(outfile = "/dev/tty") without luck. I'm currently just testing with a local cluster ie: makeSOCKcluster(c("localhost","localhost")) but I will eventually be using an MPI cluster. Thanks.

Read the article

R software : How to extract values from rasterstack with xy coordinates?

- by Eddie

I have a rasterstack(5 raster layers) that actually is a time series raster. r <- raster(nrow=20, ncol=200) s <- stack( sapply(1:5, function(i) setValues(r, rnorm(ncell(r), i, 3) )) ) > s class : RasterStack dimensions : 20, 200, 4000, 5 (nrow, ncol, ncell, nlayers) resolution : 1.8, 9 (x, y) extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax) coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 names : layer.1, layer.2, layer.3, layer.4, layer.5 min values : -9.012146, -9.165947, -9.707269, -7.829763, -5.332007 max values : 11.32811, 11.97328, 15.99459, 15.66769, 16.72236 My objective is to plot each pixel and explore their behavior over time. How could I extract each pixels together with their x,y coordinates and plot a time series curve?

Read the article

Trouble with applying a nested loop on a list

- by user1665355

I have a list consisting of 3 elements: datalist=list(a=datanew1,b=datanew2,c=datanew3) datalist$a : Inv_ret Firm size leverage Risk Liquidity Equity 17 0.04555968 17.34834 0.1323199 0.011292273 0.02471489 0 48 0.01405835 15.86315 0.6931730 0.002491093 0.12054914 0 109 0.04556252 16.91602 0.1714068 0.006235836 0.01194579 0 159 0.04753472 14.77039 0.3885720 0.007126830 0.06373028 0 301 0.03941040 16.94377 0.1805346 0.005450653 0.01723319 0 datalist$b : Inv_ret Firm size leverage Risk Liquidity Equity 31 0.04020832 18.13300 0.09326265 0.015235240 0.01579559 0.005025379 62 0.04439078 17.84086 0.11016402 0.005486982 0.01266566 0.006559096 123 0.04543250 18.00517 0.12215307 0.011154742 0.01531451 0.002282790 173 0.03960613 16.45457 0.10828643 0.011506857 0.02385191 0.009003780 180 0.03139643 17.57671 0.40063094 0.003447233 0.04530395 0.000000000 datalist$c : Inv_ret Firm size leverage Risk Liquidity Equity 92 0.03081029 19.25359 0.10513159 0.01635201 0.025760806 0.000119744 153 0.03280746 19.90229 0.11731517 0.01443786 0.006769735 0.011999005 210 0.04655847 20.12543 0.11622403 0.01418010 0.003125632 0.003802365 250 0.03301018 20.67197 0.13208234 0.01262499 0.009418828 0.021400052 282 0.04355975 20.03012 0.08588316 0.01918129 0.004213846 0.023657440 I am trying to create a cor.test on the datalist above : Cor.tests=sapply(datalist,function(x){ for(h in 1:length(names(x))){ for(i in 1:length(names(x$h[i]))){ for(j in 1:length(names(x$h[j]))){ cor.test(x$h[,i],x$h[,j])$p.value }}}}) But I get an error : Error in cor.test.default(x$h[, i], x$h[, j]) : 'x' must be a numeric vector Any suggestions about what I am doing wrong? P.S. If I simply have one dataframe, datanew1 : Inv_ret Firm size leverage Risk Liquidity Equity 17 0.04555968 17.34834 0.1323199 0.011292273 0.02471489 0 48 0.01405835 15.86315 0.6931730 0.002491093 0.12054914 0 109 0.04556252 16.91602 0.1714068 0.006235836 0.01194579 0 159 0.04753472 14.77039 0.3885720 0.007126830 0.06373028 0 301 0.03941040 16.94377 0.1805346 0.005450653 0.01723319 0 I use this loop : results=matrix(NA,nrow=6,ncol=6) for(i in 1:length(names(datanew1))){ for(j in 1:length(names(datanew1))){ results[i,j]<-cor.test(datanew1[,i],datanew1[,j])$p.value }} And the output is: results : [,1] [,2] [,3] [,4] [,5] [,6] [1,] 0.000000e+00 7.085663e-09 3.128975e-10 3.018239e-02 4.806400e-10 0.475139526 [2,] 7.085663e-09 0.000000e+00 2.141581e-21 0.000000e+00 2.247825e-20 0.454032499 [3,] 3.128975e-10 2.141581e-21 0.000000e+00 2.485924e-25 2.220446e-16 0.108643838 [4,] 3.018239e-02 0.000000e+00 2.485924e-25 0.000000e+00 5.870007e-15 0.006783324 [5,] 4.806400e-10 2.247825e-20 2.220446e-16 5.870007e-15 0.000000e+00 0.558827862 [6,] 4.751395e-01 4.540325e-01 1.086438e-01 6.783324e-03 5.588279e-01 0.000000000 Which is exactly what I want. But I want to get 3 matrices, one for each element of the datalist above.

Read the article

How to patch an S4 method in an R package?

- by Richie Cotton

If you find a bug in a package, it's usually possible to patch the problem with fixInNamespace, e.g. fixInNamespace("mean.default", "base"). For S4 methods, I'm not sure how to do it though. The method I'm looking at is in the gWidgetstcltk package. You can see the source code with getMethod(".svalue", c("gTabletcltk", "guiWidgetsToolkittcltk")) I can't find the methods with fixInNamespace. fixInNamespace(".svalue", "gWidgetstcltk") Error in get(subx, envir = ns, inherits = FALSE) : object '.svalue' not found I thought setMethod might do the trick, but setMethod(".svalue", c("gTabletcltk", "guiWidgetsToolkittcltk"), definition = function (obj, toolkit, index = NULL, drop = NULL, ...) { widget = getWidget(obj) sel <- unlist(strsplit(tclvalue(tcl(widget, "selection")), " ")) if (length(sel) == 0) { return(NA) } theChildren <- .allChildren(widget) indices <- sapply(sel, function(i) match(i, theChildren)) inds <- which(visible(obj))[indices] if (!is.null(index) && index == TRUE) { return(inds) } if (missing(drop) || is.null(drop)) drop = TRUE chosencol <- tag(obj, "chosencol") if (drop) return(obj[inds, chosencol, drop = drop]) else return(obj[inds, ]) }, where = "package:gWidgetstcltk" ) Error in setMethod(".svalue", c("gTabletcltk", "guiWidgetsToolkittcltk"), : the environment "gWidgetstcltk" is locked; cannot assign methods for function ".svalue" Any ideas?

Read the article

Optimizing a "set in a string list" to a "set as a matrix" operation

- by Eric Fournier

I have a set of strings which contain space-separated elements. I want to build a matrix which will tell me which elements were part of which strings. For example: "" "A B C" "D" "B D" Should give something like: A B C D 1 2 1 1 1 3 1 4 1 1 Now I've got a solution, but it runs slow as molasse, and I've run out of ideas on how to make it faster: reverseIn <- function(vector, value) { return(value %in% vector) } buildCategoryMatrix <- function(valueVector) { allClasses <- c() for(classVec in unique(valueVector)) { allClasses <- unique(c(allClasses, strsplit(classVec, " ", fixed=TRUE)[[1]])) } resMatrix <- matrix(ncol=0, nrow=length(valueVector)) splitValues <- strsplit(valueVector, " ", fixed=TRUE) for(cat in allClasses) { if(cat=="") { catIsPart <- (valueVector == "") } else { catIsPart <- sapply(splitValues, reverseIn, cat) } resMatrix <- cbind(resMatrix, catIsPart) } colnames(resMatrix) <- allClasses return(resMatrix) } Profiling the function gives me this: $by.self self.time self.pct total.time total.pct "match" 31.20 34.74 31.24 34.79 "FUN" 30.26 33.70 74.30 82.74 "lapply" 13.56 15.10 87.86 97.84 "%in%" 12.92 14.39 44.10 49.11 So my actual questions would be: - Where are the 33% spent in "FUN" coming from? - Would there be any way to speed up the %in% call? I tried turning the strings into factors prior to going into the loop so that I'd be matching numbers instead of strings, but that actually makes R crash. I've also tried going for partial matrix assignment (IE, resMatrix[i,x] <- 1) where i is the number of the string and x is the vector of factors. No dice there either, as it seems to keep on running infinitely.

Read the article

Using ddply() to Get Frequency of Certain IDs, by Appearance in Multiple Rows (in R)

- by EconomiCurtis

Goal If the following description is hard follow, please see the example "before" and "after" to see a straightforward example. I have bartering data, with unique trade ids, and two sides of the trade. Side1 and Side2 are baskets, lists of item ids that represent both sides of the barter transaction. I'd like to count the frequency each ITEM appears in TRADES. E.g, if item "001" appeared in 3 trades, I'd have a count of 3 (ignoring how many times the item appeared in each trade). Further, I'd like to do this with the plyr ddply function. (If you're interested as to my motivation, I working over many hundreds of thousands of transactions and am already using a ddply to calculate several other summary statistics. I'd like to add this to the ddply I'm already using, rather than calculate it after, and merge it into the ddply output.... sorry if that was difficult to follow.) In terms of pseudo code I'm working off of: merge each row of Side1 and Side2 by row, get unique() appearances of each item id apply table() function transpose and relabel output from table Example of the structure of my data, and the output I desire. Data Example (before): df <- data.frame(TradeID = c("01","02","03","04")) df$Side1 = list(c("001","001","002"), c("002","002","003"), c("001","004"), c("001","002","003","004")) df$Side2 = list(c("001"),c("007"),c("009"),c()) Desired Output (after): df.ItemRelFreq_byTradeID <- data.frame(ItemID = c("001","002","003","004","007","009"), RelFreq_byTrade = c(3,3,2,2,1,1)) One method to do this without ddply I've worked out one way to do this below. My problem is that I can't quite seem to get ddply to do this for me. temp <- table(unlist(sapply(mapply(c,df$Side1,df$Side2), unique))) df.ItemRelFreq_byTradeID <- data.frame(ItemID = names(temp), RelFreq_byTrade = temp[]) Thanks for any help you can offer! Curtis

Read the article

Calculating Growth-Rates by applying log-differences

- by mropa

I am trying to transform my data.frame by calculating the log-differences of each column and controlling for the rows id. So basically I like to calculate the growth rates for each id's variable. So here is a random df with an id column, a time period colum p and three variable columns: df <- data.frame (id = c("a","a","a","c","c","d","d","d","d","d"), p = c(1,2,3,1,2,1,2,3,4,5), var1 = rnorm(10, 5), var2 = rnorm(10, 5), var3 = rnorm(10, 5) ) df id p var1 var2 var3 1 a 1 5.375797 4.110324 5.773473 2 a 2 4.574700 6.541862 6.116153 3 a 3 3.029428 4.931924 5.631847 4 c 1 5.375855 4.181034 5.756510 5 c 2 5.067131 6.053009 6.746442 6 d 1 3.846438 4.515268 6.920389 7 d 2 4.910792 5.525340 4.625942 8 d 3 6.410238 5.138040 7.404533 9 d 4 4.637469 3.522542 3.661668 10 d 5 5.519138 4.599829 5.566892 Now I have written a function which does exactly what I want BUT I had to take a detour which is possibly unnecessary and can be removed. However, somehow I am not able to locate the shortcut. Here is the function and the output for the posted data frame: fct.logDiff <- function (df) { df.log <- dlply (df, "code", function(x) data.frame (p = x$p, log(x[, -c(1,2)]))) list.nalog <- llply (df.log, function(x) data.frame (p = x$p, rbind(NA, sapply(x[,-1], diff)))) ldply (list.nalog, data.frame) } fct.logDiff(df) id p var1 var2 var3 1 a 1 NA NA NA 2 a 2 -0.16136569 0.46472004 0.05765945 3 a 3 -0.41216720 -0.28249264 -0.08249587 4 c 1 NA NA NA 5 c 2 -0.05914281 0.36999681 0.15868378 6 d 1 NA NA NA 7 d 2 0.24428771 0.20188025 -0.40279188 8 d 3 0.26646102 -0.07267311 0.47041227 9 d 4 -0.32372771 -0.37748866 -0.70417351 10 d 5 0.17405309 0.26683625 0.41891802 The trouble is due to the added NA-rows. I don't want to collapse the frame and reduce it, which would be automatically done by the diff() function. So I had 10 rows in my original frame and am keeping the same amount of rows after the transformation. In order to keep the same length I had to add some NAs. I have taken a detour by transforming the data.frame into a list, add the NAs, and afterwards transform the list back into a data.frame. That looks tedious. Any ideas to avoid the data.frame-list-data.frame class transformation and optimize the function?

Search Results

Search found 14 results on 1 pages for 'sapply'.

Page 1/1 | 1

- by Chris

- by JSmaga

- by flodel

- by James

- by Misha

- by bshor

- by Andreas

- by Keith

- by Eddie

- by user1665355

- by Richie Cotton

- by Eric Fournier

- by EconomiCurtis

- by mropa