Search Results

Search found 100 results on 4 pages for 'ggplot2'.

Page 4/4

  • Trying to keep filled bars in a faceted plot

    - by John Horton
    Not sure what I'm doing wrong here. I have this plot: ggplot(data.PE5, aes(ybands,fill=factor(decide))) + geom_bar(position="dodge") which produces: Then I want to facet by a factor, creating two stacked plots w/ dodged, colored bars ggplot(data.PE5, aes(ybands,fill=factor(decide))) + geom_bar(position="dodge") + facet_grid(~group_label) However, I lose the factor-based coloring, which I want to keep:
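
    For reference, a self-contained sketch (mtcars stands in for the asker's data.PE5, which isn't shown) where dodged, fill-coloured bars do survive faceting; if this pattern works but the real data doesn't, the problem is likely in how decide or group_label is coded:

      library(ggplot2)
      # mtcars stands in for data.PE5; cyl plays the role of decide, am of group_label
      ggplot(mtcars, aes(x = factor(gear), fill = factor(cyl))) +
        geom_bar(position = "dodge") +
        facet_grid(~ am)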

    Read the article

  • ggplot geom_bar - too many bars

    - by Andreas
    I am sorry for the non-informative title. exstatus <- structure(list(org = structure(c(2L, 1L, 7L, 3L, 6L, 2L, 2L, 7L, 2L, 1L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 8L, 4L, 2L, 2L, 5L, 7L, 8L, 6L, 2L, 7L, 2L, 2L, 7L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 1L, 2L, 2L, 7L, 2L, 7L, 2L, 4L, 7L, 2L, 4L, 2L, 2L, 2L, 7L, 7L, 2L, 7L, 2L, 7L, 1L, 7L, 5L, 2L, 2L, 1L, 7L, 3L, 5L, 3L, 2L, 2L, 2L, 7L, 4L, 2L, 7L, 2L, 4L, 2L, 2L, 2L, 4L, 6L, 2L, 4L, 4L, 7L, 2L, 2L, 2L, 7L, 6L, 2L, 2L, 1L, 2L, 2L, 4L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 6L, 2L, 7L, 7L, 2L, 7L, 8L, 7L, 8L, 6L, 7L, 2L, 2L, 2L, 7L, 2L, 7L, 2L, 7L, 2L, 7L, 2L, 4L, 2L, 7L, 2L, 4L, 8L, 2L, 3L, 4L, 2L, 7L, 3L, 8L, 8L, 6L, 2L, 2L, 2L, 7L, 7L, 7L, 7L, 2L, 2L, 2L, 2L, 7L, 2L, 4L, 7L, 7L, 8L, 2L, 7L, 2L, 2L, 2L, 2L, 1L, 7L, 7L, 2L, 1L, 7L, 2L, 7L, 7L, 2L, 2L, 7L, 2L, 2L, 7L, 7L, 2L, 7L, 2L, 7L, 5L, 2L), .Label = c("gl", "il", "gm", "im", "gk", "ik", "tv", "tu"), class = "factor"), art = structure(c(2L, 1L, 3L, 1L, 2L, 2L, 2L, 3L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 3L, 2L, 2L, 2L, 1L, 3L, 3L, 2L, 2L, 3L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 3L, 2L, 3L, 1L, 3L, 1L, 2L, 2L, 1L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 2L, 3L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 3L, 2L, 2L, 3L, 2L, 1L, 2L, 2L, 3L, 1L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 3L, 3L, 3L, 2L, 3L, 2L, 2L, 2L, 2L, 1L, 3L, 3L, 2L, 1L, 3L, 2L, 3L, 3L, 2L, 2L, 3L, 2L, 2L, 3L, 3L, 2L, 3L, 2L, 3L, 1L, 2L), .Label = c("Finish", "Attending", "Something"), class = "factor"), type = structure(c(2L, 2L, 5L, 3L, 1L, 2L, 2L, 5L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 3L, 2L, 2L, 1L, 5L, 4L, 1L, 2L, 5L, 2L, 2L, 5L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 5L, 2L, 5L, 2L, 3L, 5L, 2L, 3L, 2L, 2L, 2L, 5L, 5L, 2L, 5L, 2L, 5L, 2L, 5L, 1L, 2L, 2L, 2L, 5L, 3L, 1L, 3L, 2L, 2L, 2L, 5L, 3L, 2L, 5L, 2L, 3L, 2L, 2L, 2L, 3L, 1L, 2L, 3L, 3L, 5L, 2L, 2L, 2L, 5L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 5L, 5L, 2L, 5L, 4L, 5L, 4L, 1L, 5L, 2L, 2L, 2L, 5L, 2L, 5L, 2L, 5L, 2L, 5L, 2L, 3L, 2L, 5L, 2L, 3L, 4L, 2L, 3L, 3L, 2L, 5L, 3L, 4L, 4L, 1L, 2L, 2L, 2L, 5L, 5L, 5L, 5L, 2L, 2L, 2L, 2L, 5L, 2L, 3L, 5L, 5L, 4L, 2L, 5L, 2L, 2L, 2L, 2L, 2L, 5L, 5L, 2L, 2L, 5L, 2L, 5L, 5L, 2L, 2L, 5L, 2L, 2L, 5L, 5L, 2L, 5L, 2L, 5L, 1L, 2L), .Label = c("short", "long", "between", "young", "old"), class = "factor")), .Names = c("org", "art", "type"), row.names = c(NA, -192L), class = "data.frame") and then the plot ggplot(exstatus, aes(x=type, fill=art))+ geom_bar(aes(y=..count../sum(..count..)),position="dodge") The problem is that the two rightmost bars ("young", "old") are too thick - "something" takes up the whole width - whcih is not what I intended. I am sorry that I can not explain it better.
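
    A hedged sketch of the usual fix: in more recent ggplot2 releases, position_dodge() has a preserve argument that keeps every bar the same width even when some type/art combinations are missing (stand-in data below; after_stat() needs ggplot2 >= 3.3, otherwise keep the ..count.. notation from the question):

      library(ggplot2)
      # stand-in data with unequal numbers of art levels per type
      df <- data.frame(
        type = c("short", "long", "long", "between", "between", "young", "old"),
        art  = c("Finish", "Finish", "Attending", "Attending", "Something",
                 "Attending", "Attending")
      )
      ggplot(df, aes(x = type, fill = art)) +
        geom_bar(aes(y = after_stat(count / sum(count))),
                 position = position_dodge(preserve = "single"))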

    Read the article

  • Changing ylim (axis limits) drops data falling outside range. How can this be prevented?

    - by Alex Holcombe
    df <- data.frame(age=c(10,10,20,20,25,25,25),veg=c(0,1,0,1,1,0,1)) g=ggplot(data=df,aes(x=age,y=veg)) g=g+stat_summary(fun.y=mean,geom="point") Points reflect mean of veg at each age, which is what I expected and want to preserve after changing axis limits with the command below. g=g+ylim(0.2,1) Changing axis limits with the above command unfortunately causes veg==0 subset to be dropped from the data, yielding "Warning message: Removed 4 rows containing missing values (stat_summary)" This is bad because now the data plot (stat_summary mean) omits the veg==0 points. How can this be prevented? I simply want to avoid showing the empty part of the plot- the ordinate from 0 to .2, but not drop the associated data from the stat_summary calculation.
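
    A sketch of the usual workaround: coord_cartesian() zooms the view without discarding rows, so stat_summary still sees the veg == 0 observations (unlike ylim()/scale limits, which remove them before the statistic is computed):

      library(ggplot2)
      df <- data.frame(age = c(10, 10, 20, 20, 25, 25, 25), veg = c(0, 1, 0, 1, 1, 0, 1))
      g <- ggplot(df, aes(x = age, y = veg)) +
        stat_summary(fun.y = mean, geom = "point") +
        coord_cartesian(ylim = c(0.2, 1))   # zooms without dropping data
      g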

    Read the article

  • Is it possible to plot a single density over a discrete variable?

    - by mattrepl
    The x-axis is time broken up into time intervals. There is an interval column in the data frame that specifies the time for each row. Plotting a histogram or line using geom_histogram and geom_freqpoly works great, but I'd like to use geom_density to get a filled area. Perhaps there is a better way to achieve this. Right now, if I use geom_density, curves are created for each discrete factor level instead of smoothing over all of them.
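
    One possible approach, sketched with made-up data (the real interval column isn't shown): map the interval factor to its underlying numeric codes so geom_density estimates a single curve across all levels and fills the area.

      library(ggplot2)
      # hypothetical stand-in: `interval` is a factor of ordered time bins
      df <- data.frame(interval = factor(sample(1:12, 500, replace = TRUE)))
      ggplot(df, aes(x = as.numeric(interval))) +
        geom_density(fill = "grey70", alpha = 0.5)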

    Read the article

  • How can I use spline() with ggplot?

    - by David
    I would like to fit my data using spline(y~x) but all of the examples that I can find use a spline with smoothing, e.g. lm(y~ns(x), df=_). I want to use spline() specifically because I am using this to do the analysis represented by the plot that I am making. Is there a simple way to use spline() in ggplot? I have considered the hackish approach of fitting a line using geom_smooth(aes(x = spline(y~x)$x, y = spline(y~x)$y)) but I would prefer not to have to resort to this. Thanks!
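
    A sketch of one way to do this (made-up data; note spline() takes x and y vectors rather than a formula): interpolate outside ggplot, then draw the result with geom_line() from its own data frame.

      library(ggplot2)
      df <- data.frame(x = 1:10, y = rnorm(10))          # stand-in data
      sp <- as.data.frame(spline(df$x, df$y, n = 200))   # interpolated x and y columns
      ggplot(df, aes(x, y)) +
        geom_point() +
        geom_line(data = sp, aes(x, y), colour = "blue")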

    Read the article

  • Issue with plotting daily data using ggplot

    - by user1723765
    I tried to plot daily data from 9 variables in ggplot, but the graph I get cannot handle the date variable properly. The x axis is unreadable and it's impossible to read the plot. I'm guessing there's an issue with the handling of dates. Here's the data: https://dl.dropbox.com/u/22681355/su.csv Here's the code I've been using: su=read.csv(file="su.csv", head=TRUE) meltdf=melt(su) ggplot(meltdf, aes(x=Date, y=value, colour=variable, group=variable))+geom_line() and here's the output: https://dl.dropbox.com/u/22681355/output.jpg Here's the same plot done in Excel; why does it look completely different?
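
    A hedged sketch of the likely cause and fix: after read.csv the Date column is character/factor, and melt() without id variables treats it as just another measurement; converting it to Date and melting with id.vars = "Date" gives a continuous time axis (the date format string is an assumption about the file, and melt is assumed to come from reshape2):

      library(reshape2)
      library(ggplot2)
      su <- read.csv("su.csv", header = TRUE)
      su$Date <- as.Date(su$Date, format = "%d/%m/%Y")   # format string is an assumption
      meltdf <- melt(su, id.vars = "Date")
      ggplot(meltdf, aes(x = Date, y = value, colour = variable, group = variable)) +
        geom_line()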

    Read the article

  • program R- in ggplot restrict y to be >0 in LOESS plot

    - by Nate
    Here's my code: qplot(data=sites, x, y, main="Site 349") (p <- qplot(data = sites, x, y, xlab = "", ylab = "")) (p1 <- p + geom_smooth(method = "loess",span=0.5, size = 1.5)) p1 + theme_bw() + opts(title = "Site 349") Some of the LOESS lines and confidence intervals go below zero, but I would like to restrict the graphics to 0 and positive numbers (because negative do not make sense). How can I do this?
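
    A sketch of one option (the column names x and y are taken from the question's qplot call): keep the loess fit on the full data but clip the display with coord_cartesian(), which limits the view without refitting or dropping points. (opts() was later replaced by ggtitle()/theme() in ggplot2, so the sketch uses those.)

      library(ggplot2)
      # columns x and y assumed, as in the question's qplot call
      p <- ggplot(sites, aes(x, y)) +
        geom_point() +
        geom_smooth(method = "loess", span = 0.5, size = 1.5) +
        coord_cartesian(ylim = c(0, max(sites$y, na.rm = TRUE))) +
        theme_bw() +
        ggtitle("Site 349")
      p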

    Read the article

  • R: Plotting a graph with different colors of points based on advanced criteria

    - by balconydoor
    What I would like to do is a plot (using ggplot), where the x axis represents years which have a different colour for the last three years in the plot than the rest. The last three years should also meet a certain criterion, and based on this the last three years can either be red or green. The criterion is that the mean of the last three years should be less (making it green) or more (making it red) than the 66%-percentile of the remaining years. So far I have made two different functions calculating the last three year mean: LYM3 <- function (x) { LYM3 <- tail(x,3) mean(LYM3$Data,na.rm=T) } And the 66%-percentile for the remaining: perc66 <- function(x) { percentile <- head(x,-3) quantile(percentile$Data, .66, names=F,na.rm=T) } Here are two sets of data that can be used in the calculations (plots), the first of which is an example from my real data where LYM3(df1) < perc66(df1), and the second is just made-up data where LYM3 > perc66. df1<- data.frame(Year=c(1979:2010), Data=c(347261.87, 145071.29, 110181.93, 183016.71, 210995.67, 205207.33, 103291.78, 247182.10, 152894.45, 170771.50, 206534.55, 287770.86, 223832.43, 297542.86, 267343.54, 475485.47, 224575.08, 147607.81, 171732.38, 126818.10, 165801.08, 136921.58, 136947.63, 83428.05, 144295.87, 68566.23, 59943.05, 49909.08, 52149.11, 117627.75, 132127.79, 130463.80)) df2 <- data.frame(Year=c(1979:2010), Data=c(sample(50,29,replace=T),75,75,75)) Here's my code for my plot so far: plot <- ggplot(df1, aes(x=Year, y=Data)) + theme_bw() + geom_point(size=3, aes(colour=ifelse(df1$Year<2008, "black",ifelse(LYM3(df1) < perc66(df1),"green","red")))) + geom_line() + scale_x_continuous(breaks=c(1980,1985,1990,1995,2000,2005,2010), limits=c(1978,2011)) plot As you notice, it doesn't really do what I want it to do. The only thing it does seem to do is that it turns the years before 2008 into one level and those after into another one, and bases the point colour on these two levels. Since I don't want this year to be stationary either, I made another tiny function: fun3 <- function(x) { df <- subset(x, Year==(max(Year)-2)) df$Year } So the previous code would have the same effect as: geom_point(size=3, aes(colour=ifelse(df1$Year<fun3(df1), "black","red"))) But it still does not care about my colours. Why does it make the years into levels? And how come an ifelse function doesn't work within another one in this case? How would it be possible to get the arguments to do what I like? I realise this might be a bit messy, asking for a lot at the same time, but I hope my description is pretty clear. It would be helpful if someone could at least point me in the right direction. I tried to put the code for the plot into a function as well so I wouldn't have to change the data frame in all the functions within the plot, but I can't get it to work. Thank you!
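
    A sketch of one way around this (reusing df1, LYM3() and perc66() from above): compute a grouping column before plotting, map colour to it, and pin the colours down with scale_colour_manual(). Strings built inside aes() are treated as data levels rather than literal colours, which is why the nested ifelse recolours arbitrarily.

      library(ggplot2)
      df1$flag <- "earlier"
      df1$flag[df1$Year >= max(df1$Year) - 2] <-
        if (LYM3(df1) < perc66(df1)) "last3 (below 66%)" else "last3 (above 66%)"
      ggplot(df1, aes(x = Year, y = Data)) +
        geom_line() +
        geom_point(aes(colour = flag), size = 3) +
        scale_colour_manual(values = c("earlier" = "black",
                                       "last3 (below 66%)" = "green3",
                                       "last3 (above 66%)" = "red")) +
        scale_x_continuous(breaks = c(1980, 1985, 1990, 1995, 2000, 2005, 2010),
                           limits = c(1978, 2011)) +
        theme_bw()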

    Read the article

  • What does size really mean in geom_point?

    - by Jonas Stein
    In both plots the points look different, but why? mya <- data.frame(a=1:100) ggplot() + geom_path(data=mya, aes(x=a, y=a, colour=2, size=seq(0.1,10,0.1))) + geom_point(data=mya, aes(x=a, y=a, colour=1, size=1)) + theme_bw() + theme(text=element_text(size=11)) ggplot() + geom_path(data=mya, aes(x=a, y=a, colour=2, size=1)) + geom_point(data=mya, aes(x=a, y=a, colour=1, size=1)) + theme_bw() + theme(text=element_text(size=11))
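
    A sketch illustrating what is going on: a value given inside aes() is treated as data and passed through the plot's shared size scale (which in the first plot is trained on the path's 0.1-10 values), whereas a literal size set outside aes() is used as-is, so the points below stay the same no matter what the path does.

      library(ggplot2)
      mya <- data.frame(a = 1:100)
      ggplot() +
        geom_path(data = mya, aes(x = a, y = a, size = a)) +   # size mapped, so rescaled
        geom_point(data = mya, aes(x = a, y = a),
                   size = 1, colour = "red") +                 # size set literally
        theme_bw()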

    Read the article

  • Reading a user input (character or string of letters) into ggplot command inside a switch statement or a nested ifelse (with functions in it)

    - by statisticalbeginner
    I have code like AA <- as.integer(readline("Select any number")) switch(AA, 1={ num <-as.integer(readline("Select any one of the options \n")) print('You have selected option 1') #reading user data var <- readline("enter the variable name \n") #aggregating the data based on required condition gg1 <- aggregate(cbind(get(var))~Mi+hours,a, FUN=mean) #Ploting ggplot(gg1, aes(x = hours, y = get(var), group = Mi, fill = Mi, color = Mi)) + geom_point() + geom_smooth(stat="smooth", alpha = I(0.01)) }, 2={ print('bar') }, { print('default') } ) The dataset is [dataset][1] I have loaded the dataset into object list a <- read.table(file.choose(), header=FALSE,col.names= c("Ei","Mi","hours","Nphy","Cphy","CHLphy","Nhet","Chet","Ndet","Cdet","DON","DOC","DIN","DIC","AT","dCCHO","TEPC","Ncocco","Ccocco","CHLcocco","PICcocco","par","Temp","Sal","co2atm","u10","dicfl","co2ppm","co2mol","pH")) I am getting error like source ("switch_statement_check.R") Select any one of the options 1 [1] "You have selected option 1" enter the variable name Nphy Error in eval(expr, envir, enclos) : (list) object cannot be coerced to type 'double' > gg1 is getting data that is fine. I dont know what to do to make the variable entered by user to work in that ggplot command. Please suggest any solution for this. The dput output structure(list(Ei = c(1L, 1L, 1L, 1L, 1L, 1L), Mi = c(1L, 1L, 1L, 1L, 1L, 1L), hours = 1:6, Nphy = c(0.1023488, 0.104524, 0.1064772, 0.1081702, 0.1095905, 0.110759), Cphy = c(0.6534707, 0.6448216, 0.6369597, 0.6299084, 0.6239005, 0.6191941), CHLphy = c(0.1053458, 0.110325, 0.1148174, 0.1187672, 0.122146, 0.1249877), Nhet = c(0.04994161, 0.04988347, 0.04982555, 0.04976784, 0.04971029, 0.04965285), Chet = c(0.3308593, 0.3304699, 0.3300819, 0.3296952, 0.3293089, 0.3289243), Ndet = c(0.04991916, 0.04984045, 0.04976363, 0.0496884, 0.04961446, 0.04954156), Cdet = c(0.3307085, 0.3301691, 0.3296314, 0.3290949, 0.3285598, 0.3280252), DON = c(0.05042275, 0.05085697, 0.05130091, 0.05175249, 0.05220978, 0.05267118 ), DOC = c(49.76304, 49.52745, 49.29323, 49.06034, 48.82878, 48.59851), DIN = c(14.9933, 14.98729, 14.98221, 14.9781, 14.97485, 14.97225), DIC = c(2050.132, 2050.264, 2050.396, 2050.524, 2050.641, 2050.758), AT = c(2150.007, 2150.007, 2150.007, 2150.007, 2150.007, 2150.007), dCCHO = c(0.964222, 0.930869, 0.8997098, 0.870544, 0.843196, 0.8175117), TEPC = c(0.1339044, 0.1652179, 0.1941872, 0.2210289, 0.2459341, 0.2690721), Ncocco = c(0.1040715, 0.1076058, 0.1104229, 0.1125141, 0.1140222, 0.1151228), Ccocco = c(0.6500288, 0.6386706, 0.6291149, 0.6213265, 0.6152447, 0.6108502), CHLcocco = c(0.1087667, 0.1164099, 0.1225822, 0.1273103, 0.1308843, 0.1336465), PICcocco = c(0.1000664, 0.1001396, 0.1007908, 0.101836, 0.1034179, 0.1055634), par = c(0, 0, 0.8695131, 1.551317, 2.777707, 4.814341), Temp = c(9.9, 9.9, 9.9, 9.9, 9.9, 9.9), Sal = c(31.31, 31.31, 31.31, 31.31, 31.31, 31.31), co2atm = c(370, 370, 370, 370, 370, 370), u10 = c(0.01, 0.01, 0.01, 0.01, 0.01, 0.01), dicfl = c(-2.963256, -2.971632, -2.980446, -2.989259, -2.997877, -3.005702), co2ppm = c(565.1855, 565.7373, 566.3179, 566.8983, 567.466, 567.9814), co2mol = c(0.02562326, 0.02564828, 0.0256746, 0.02570091, 0.02572665, 0.02575002 ), pH = c(7.879427, 7.879042, 7.878636, 7.878231, 7.877835, 7.877475)), .Names = c("Ei", "Mi", "hours", "Nphy", "Cphy", "CHLphy", "Nhet", "Chet", "Ndet", "Cdet", "DON", "DOC", "DIN", "DIC", "AT", "dCCHO", "TEPC", "Ncocco", "Ccocco", "CHLcocco", "PICcocco", "par", "Temp", "Sal", "co2atm", "u10", "dicfl", "co2ppm", 
"co2mol", "pH"), row.names = c(NA, 6L), class = "data.frame") As per the below suggestions I have tried a lot but it is not working. Summarizing I will say: var <- readline("enter a variable name") I cant use get(var) inside any command but not inside ggplot, it wont work. gg1$var it also doesnt work, even after changing the column names. Does it have a solution or should I just choose to import from an excel sheet, thats better? Tried with if else and functions fun1 <- function() { print('You have selected option 1') my <- as.character((readline("enter the variable name \n"))) gg1 <- aggregate(cbind(get(my))~Mi+hours,a, FUN=mean) names(gg1)[3] <- my #print(names(gg1)) ggplot (gg1,aes_string(x="hours",y=(my),group="Mi",color="Mi")) + geom_point() } my <- as.integer(readline("enter a number")) ifelse(my == 1,fun1(),"") ifelse(my == 2,print ("its 2"),"") ifelse(my == 3,print ("its 3"),"") ifelse(my != (1 || 2|| 3) ,print("wrong number"),"") Not working either...:(

    Read the article

  • ggplot add percentage labels based on x-axis variables

    - by eugeneyan
    I've a ggplot that shows the counts of tweets for some brands as well as a label for the overall percentage. This was done with much help from this link: ggplot: showing % instead of counts in charts of categorical variables # plot ggplot of brands ggplot(data = test, aes(x = brand, fill = brand)) + geom_bar() + stat_bin(aes(label = sprintf("%.02f %%", ..count../sum(..count..)*100)), geom = 'text', vjust = -0.3) Next, I would like to plot it based on brand and sentiment, with the labels for the bars of each brand totalling up to 100%. However, I have difficulty amending my code to do this. Would you be able to help please? Also, would it be possible to change the colours for neu to blue and pos to green? # plot ggplot of brands and sentiment ggplot(data = test, aes(x = brand, fill = factor(sentiment))) + geom_bar(position = 'dodge') + stat_bin(aes(label = sprintf("%.02f %%", ..count../sum(..count..)*100)), geom = 'text', position = position_dodge(width = 0.9), vjust=-0.3)
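
    A sketch of one way to get per-brand percentages (brand and sentiment are the columns named in the question; the "neg" level and the exact colours are assumptions): pre-compute the summary, draw bars and labels from it, and set the sentiment colours with scale_fill_manual().

      library(ggplot2)
      library(dplyr)
      # sentiment levels neg/neu/pos are assumed
      summ <- test %>%
        count(brand, sentiment) %>%
        group_by(brand) %>%
        mutate(pct = n / sum(n) * 100)      # percentages within each brand
      ggplot(summ, aes(x = brand, y = n, fill = factor(sentiment))) +
        geom_col(position = "dodge") +
        geom_text(aes(label = sprintf("%.02f %%", pct)),
                  position = position_dodge(width = 0.9), vjust = -0.3) +
        scale_fill_manual(values = c(neg = "red", neu = "blue", pos = "green3"))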

    Read the article

  • Storing parameters from a graph and applying to other graphs

    - by Braden
    I would like to store the xmin and xmax parameters from one geom_histogram and apply them to a second geom_histogram. I am putting both graphs on the same page using grid.arrange and would like them to have the same x range, while allowing the first graph to establish the range based on its data. The second graph is produced from a subset of the first graph's data, so it will not have data that falls outside of the x-range established by the first. But I don't want the range to shrink to fit the second graph.
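
    A sketch of the simplest approach (the data frame and column names below are placeholders): take the range from the first graph's data and hand the same limits to both histograms before arranging them.

      library(ggplot2)
      library(gridExtra)
      # df_full = full data, df_sub = subset, column x: names are assumptions
      rng <- range(df_full$x, na.rm = TRUE)   # range established by the full data
      p1 <- ggplot(df_full, aes(x)) + geom_histogram() + xlim(rng)
      p2 <- ggplot(df_sub,  aes(x)) + geom_histogram() + xlim(rng)
      grid.arrange(p1, p2, ncol = 1)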

    Read the article

  • Can one extract model fit parameters after a ggplot stat_smooth call?

    - by Alex Holcombe
    Using stat_smooth, I can fit models to data. E.g. g=ggplot(tips,aes(x=tip,y=as.numeric(unclass(factor(tips$sex))-1))) +facet_grid(time~.) g=g+ stat_summary(fun.y=mean,geom="point") g=g+ stat_smooth(method="glm", family="binomial") I would like to know the coefficients of the glm binomial fits. I could re-do the fit with dlply and get the coefficients with ldply, but I'd like to avoid such duplication. Calling str(g) reveals the hierarchy of objects that ggplot creates, perhaps there's some way to get to the coefficients through that?
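
    As far as I know ggplot2 does not hand back the fitted model object, so the usual workaround is to refit the same glm per panel group; a minimal sketch (tips is assumed to be the data set shipped with reshape2):

      # tips assumed to be reshape2::tips; one glm per level of time
      fits <- lapply(split(tips, tips$time), function(d)
        glm(factor(sex) ~ tip, data = d, family = binomial))
      lapply(fits, coef)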

    Read the article

  • How to enable `geom_text` to recognize `aes` in QPLOT (R programming)

    - by neversaint
    I have a data that looks like this ensg mirna_hgc time value perc id ENSG00000211521 MIR665 x 89 2.07612456747405 1 ENSG00000207787 MIR98 x 73 1.73010380622837 2 ... ENSG00000207827 MIR30A y 99 21.4532871972318 288 ENSG00000207757 MIR93 y 94 1.73010380622837 289 What I'm trying to do is to create a facet plot with label on top of it. The label can be easily called from the perc column. Using this code: dat.m <- read.delim("http://dpaste.com/1271039/plain/",header=TRUE,sep=" ") qplot(value, data=dat.m,facets=time~.,binwidth=1,main="")+ xlab("Value")+ ylab("Count")+ theme(legend.position="none")+ stat_bin(aes(value,label=sprintf("%.01f",perc)),geom="text") But it gave me this error: Error: geom_text requires the following missing aesthetics: label What I'm trying to do is to generate this plot:
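
    For the error message itself, a narrow sketch of the usual pattern: when stat_bin draws text, the label has to be built from a binned statistic such as ..count.. rather than a raw data column (whether that yields the intended per-bin percentages is a separate question):

      library(ggplot2)
      dat.m <- read.delim("http://dpaste.com/1271039/plain/", header = TRUE, sep = " ")
      qplot(value, data = dat.m, facets = time ~ ., binwidth = 1) +
        stat_bin(binwidth = 1, geom = "text", vjust = -0.5,
                 aes(label = ..count..))   # label must come from the binned stat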

    Read the article

  • How can I superimpose modified loess lines on a ggplot2 qplot?

    - by briandk
    Background Right now, I'm creating a multiple-predictor linear model and generating diagnostic plots to assess regression assumptions. (It's for a multiple regression analysis stats class that I'm loving at the moment :-) My textbook (Cohen, Cohen, West, and Aiken 2003) recommends plotting each predictor against the residuals to make sure that: The residuals don't systematically covary with the predictor The residuals are homoscedastic with respect to each predictor in the model On point (2), my textbook has this to say: Some statistical packages allow the analyst to plot lowess fit lines at the mean of the residuals (0-line), 1 standard deviation above the mean, and 1 standard deviation below the mean of the residuals....In the present case {their example}, the two lines {mean + 1sd and mean - 1sd} remain roughly parallel to the lowess {0} line, consistent with the interpretation that the variance of the residuals does not change as a function of X. (p. 131) How can I modify loess lines? I know how to generate a scatterplot with a "0-line,": # First, I'll make a simple linear model and get its diagnostic stats library(ggplot2) data(cars) mod <- fortify(lm(speed ~ dist, data = cars)) attach(mod) str(mod) # Now I want to make sure the residuals are homoscedastic qplot (x = dist, y = .resid, data = mod) + geom_smooth(se = FALSE) # "se = FALSE" Removes the standard error bands But does anyone know how I can use ggplot2 and qplot to generate plots where the 0-line, "mean + 1sd" AND "mean - 1sd" lines would be superimposed? Is that a weird/complex question to be asking?
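
    Not the textbook's exact construction, but one pragmatic stand-in sketched below: smooth the residuals for the centre line and add quantile-regression smooths near the 16th and 84th percentiles (roughly mean - 1 sd and mean + 1 sd under normality); geom_quantile needs the quantreg package installed.

      library(ggplot2)
      mod <- fortify(lm(speed ~ dist, data = cars))
      # 16th/84th percentiles approximate mean +/- 1 sd; needs quantreg
      ggplot(mod, aes(x = dist, y = .resid)) +
        geom_point() +
        geom_smooth(se = FALSE) +
        geom_quantile(quantiles = c(0.16, 0.84), linetype = "dashed")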

    Read the article

  • Convert ddply {plyr} to Oracle R Enterprise, or use with Embedded R Execution

    - by Mark Hornick
    The plyr package contains a set of tools for partitioning a problem into smaller sub-problems that can be more easily processed. One function within {plyr} is ddply, which allows you to specify subsets of a data.frame and then apply a function to each subset. The result is gathered into a single data.frame. Such a capability is very convenient. The function ddply also has a parallel option that if TRUE, will apply the function in parallel, using the backend provided by foreach. This type of functionality is available through Oracle R Enterprise using the ore.groupApply function. In this blog post, we show a few examples from Sean Anderson's "A quick introduction to plyr" to illustrate the correpsonding functionality using ore.groupApply. To get started, we'll create a demo data set and load the plyr package. set.seed(1) d <- data.frame(year = rep(2000:2014, each = 3),         count = round(runif(45, 0, 20))) dim(d) library(plyr) This first example takes the data frame, partitions it by year, and calculates the coefficient of variation of the count, returning a data frame. # Example 1 res <- ddply(d, "year", function(x) {   mean.count <- mean(x$count)   sd.count <- sd(x$count)   cv <- sd.count/mean.count   data.frame(cv.count = cv)   }) To illustrate the equivalent functionality in Oracle R Enterprise, using embedded R execution, we use the ore.groupApply function on the same data, but pushed to the database, creating an ore.frame. The function ore.push creates a temporary table in the database, returning a proxy object, the ore.frame. D <- ore.push(d) res <- ore.groupApply (D, D$year, function(x) {   mean.count <- mean(x$count)   sd.count <- sd(x$count)   cv <- sd.count/mean.count   data.frame(year=x$year[1], cv.count = cv)   }, FUN.VALUE=data.frame(year=1, cv.count=1)) You'll notice the similarities in the first three arguments. With ore.groupApply, we augment the function to return the specific data.frame we want. We also specify the argument FUN.VALUE, which describes the resulting data.frame. From our previous blog posts, you may recall that by default, ore.groupApply returns an ore.list containing the results of each function invocation. To get a data.frame, we specify the structure of the result. The results in both cases are the same, however the ore.groupApply result is an ore.frame. In this case the data stays in the database until it's actually required. This can result in significant memory and time savings whe data is large. R> class(res) [1] "ore.frame" attr(,"package") [1] "OREbase" R> head(res)    year cv.count 1 2000 0.3984848 2 2001 0.6062178 3 2002 0.2309401 4 2003 0.5773503 5 2004 0.3069680 6 2005 0.3431743 To make the ore.groupApply execute in parallel, you can specify the argument parallel with either TRUE, to use default database parallelism, or to a specific number, which serves as a hint to the database as to how many parallel R engines should be used. The next ddply example uses the summarise function, which creates a new data.frame. In ore.groupApply, the year column is passed in with the data. Since no automatic creation of columns takes place, we explicitly set the year column in the data.frame result to the value of the first row, since all rows received by the function have the same year. 
# Example 2 ddply(d, "year", summarise, mean.count = mean(count)) res <- ore.groupApply (D, D$year, function(x) {   mean.count <- mean(x$count)   data.frame(year=x$year[1], mean.count = mean.count)   }, FUN.VALUE=data.frame(year=1, mean.count=1)) R> head(res)    year mean.count 1 2000 7.666667 2 2001 13.333333 3 2002 15.000000 4 2003 3.000000 5 2004 12.333333 6 2005 14.666667 Example 3 uses the transform function with ddply, which modifies the existing data.frame. With ore.groupApply, we again construct the data.frame explicilty, which is returned as an ore.frame. # Example 3 ddply(d, "year", transform, total.count = sum(count)) res <- ore.groupApply (D, D$year, function(x) {   total.count <- sum(x$count)   data.frame(year=x$year[1], count=x$count, total.count = total.count)   }, FUN.VALUE=data.frame(year=1, count=1, total.count=1)) > head(res)    year count total.count 1 2000 5 23 2 2000 7 23 3 2000 11 23 4 2001 18 40 5 2001 4 40 6 2001 18 40 In Example 4, the mutate function with ddply enables you to define new columns that build on columns just defined. Since the construction of the data.frame using ore.groupApply is explicit, you always have complete control over when and how to use columns. # Example 4 ddply(d, "year", mutate, mu = mean(count), sigma = sd(count),       cv = sigma/mu) res <- ore.groupApply (D, D$year, function(x) {   mu <- mean(x$count)   sigma <- sd(x$count)   cv <- sigma/mu   data.frame(year=x$year[1], count=x$count, mu=mu, sigma=sigma, cv=cv)   }, FUN.VALUE=data.frame(year=1, count=1, mu=1,sigma=1,cv=1)) R> head(res)    year count mu sigma cv 1 2000 5 7.666667 3.055050 0.3984848 2 2000 7 7.666667 3.055050 0.3984848 3 2000 11 7.666667 3.055050 0.3984848 4 2001 18 13.333333 8.082904 0.6062178 5 2001 4 13.333333 8.082904 0.6062178 6 2001 18 13.333333 8.082904 0.6062178 In Example 5, ddply is used to partition data on multiple columns before constructing the result. Realizing this with ore.groupApply involves creating an index column out of the concatenation of the columns used for partitioning. This example also allows us to illustrate using the ORE transparency layer to subset the data. # Example 5 baseball.dat <- subset(baseball, year > 2000) # data from the plyr package x <- ddply(baseball.dat, c("year", "team"), summarize,            homeruns = sum(hr)) We first push the data set to the database to get an ore.frame. We then add the composite column and perform the subset, using the transparency layer. Since the results from database execution are unordered, we will explicitly sort these results and view the first 6 rows. BB.DAT <- ore.push(baseball) BB.DAT$index <- with(BB.DAT, paste(year, team, sep="+")) BB.DAT2 <- subset(BB.DAT, year > 2000) X <- ore.groupApply (BB.DAT2, BB.DAT2$index, function(x) {   data.frame(year=x$year[1], team=x$team[1], homeruns=sum(x$hr))   }, FUN.VALUE=data.frame(year=1, team="A", homeruns=1), parallel=FALSE) res <- ore.sort(X, by=c("year","team")) R> head(res)    year team homeruns 1 2001 ANA 4 2 2001 ARI 155 3 2001 ATL 63 4 2001 BAL 58 5 2001 BOS 77 6 2001 CHA 63 Our next example is derived from the ggplot function documentation. This illustrates the use of ddply within using the ggplot2 package. We first create a data.frame with demo data and use ddply to create some statistics for each group (gp). We then use ggplot to produce the graph. We can take this same code, push the data.frame df to the database and invoke this on the database server. The graph will be returned to the client window, as depicted below. 
# Example 6 with ggplot2 library(ggplot2) df <- data.frame(gp = factor(rep(letters[1:3], each = 10)),                  y = rnorm(30)) # Compute sample mean and standard deviation in each group library(plyr) ds <- ddply(df, .(gp), summarise, mean = mean(y), sd = sd(y)) # Set up a skeleton ggplot object and add layers: ggplot() +   geom_point(data = df, aes(x = gp, y = y)) +   geom_point(data = ds, aes(x = gp, y = mean),              colour = 'red', size = 3) +   geom_errorbar(data = ds, aes(x = gp, y = mean,                                ymin = mean - sd, ymax = mean + sd),              colour = 'red', width = 0.4) DF <- ore.push(df) ore.tableApply(DF, function(df) {   library(ggplot2)   library(plyr)   ds <- ddply(df, .(gp), summarise, mean = mean(y), sd = sd(y))   ggplot() +     geom_point(data = df, aes(x = gp, y = y)) +     geom_point(data = ds, aes(x = gp, y = mean),                colour = 'red', size = 3) +     geom_errorbar(data = ds, aes(x = gp, y = mean,                                  ymin = mean - sd, ymax = mean + sd),                   colour = 'red', width = 0.4) }) But let's take this one step further. Suppose we wanted to produce multiple graphs, partitioned on some index column. We replicate the data three times and add some noise to the y values, just to make the graphs a little different. We also create an index column to form our three partitions. Note that we've also specified that this should be executed in parallel, allowing Oracle Database to control and manage the server-side R engines. The result of ore.groupApply is an ore.list that contains the three graphs. Each graph can be viewed by printing the list element. df2 <- rbind(df,df,df) df2$y <- df2$y + rnorm(nrow(df2)) df2$index <- c(rep(1,300), rep(2,300), rep(3,300)) DF2 <- ore.push(df2) res <- ore.groupApply(DF2, DF2$index, function(df) {   df <- df[,1:2]   library(ggplot2)   library(plyr)   ds <- ddply(df, .(gp), summarise, mean = mean(y), sd = sd(y))   ggplot() +     geom_point(data = df, aes(x = gp, y = y)) +     geom_point(data = ds, aes(x = gp, y = mean),                colour = 'red', size = 3) +     geom_errorbar(data = ds, aes(x = gp, y = mean,                                  ymin = mean - sd, ymax = mean + sd),                   colour = 'red', width = 0.4)   }, parallel=TRUE) res[[1]] res[[2]] res[[3]] To recap, we've illustrated how various uses of ddply from the plyr package can be realized in ore.groupApply, which affords the user explicit control over the contents of the data.frame result in a straightforward manner. We've also highlighted how ddply can be used within an ore.groupApply call.

    Read the article

  • Interactive Charts for web application

    - by user227290
    We are working on a web-based application (implemented in Java) on commodity prices, and one part of it is interactive charting. I provide a simplified example here. We have a table in a MySQL database where we have information on commodity prices in US states and counties. One aspect of the application is to create interactive plots based on user choice. For example, if the user needs to see the price density in Oregon and Linn county, then she chooses it from the menu in a webpage and it is rendered on the fly with accompanying quantile information in a table. As the user changes state and county, these plots and the table change on the fly. For our computational needs we are using R (and use rJava to integrate it into our web application), and I know that if interactivity is not an issue this is a piece of cake in ggplot2, but I am not aware of any interactive version of an R graphics framework (like lattice or ggplot2). We are exploring the Google Visualization API, but I am not sure we can get the statistical power we need in some of the plots. Please help.

    Read the article

  • error when trying to import ps file by grImport in R

    - by lokheart
    I need to create a PDF file with several charts created by ggplot2 arranged on an A4 page, and repeat it 20-30 times. I export the ggplot2 chart into a PS file and try to PostScriptTrace it as instructed in grImport, but it just keeps giving me the error "Unrecoverable error, exit code 1". I ignore the error and try to import the generated XML file into an R object, which gives me another error: attributes construct error Couldn't find end of Start Tag text line 21 Premature end of data in tag picture line 3 Error: 1: attributes construct error 2: Couldn't find end of Start Tag text line 21 3: Premature end of data in tag picture line 3 What's wrong here? Thanks!
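
    For reference, a hedged sketch of the usual workflow: PostScriptTrace() shells out to Ghostscript, and "Unrecoverable error, exit code 1" typically means Ghostscript is missing or not found, which then leaves a truncated XML file behind and causes the second error (the Windows path below is only an example):

      library(grImport)
      # point R at Ghostscript if it is not on the PATH (path is an assumption)
      # Sys.setenv(R_GSCMD = "C:/Program Files/gs/gs9.50/bin/gswin64c.exe")
      PostScriptTrace("chart.ps", "chart.xml")
      pic <- readPicture("chart.xml")
      grid.picture(pic)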

    Read the article

  • How to market R at your institute?

    - by ran2
    Okay, I admit there are lots of threads on R vs. something. The strengths of R are obvious to most people here. Still, advertising R in an environment that has been preferring various kinds of other software for quite some time is not easy. Moreover, even in the limited time I've been dealing with R, it improved so dramatically that I would mention things among its strengths that I would not have listed when I started my personal R-evolution. So, what I am trying to do here is to collect the most recent and striking arguments that can be put in a nutshell and be presented easily. What I have on my list so far is: the Springer useR series ggplot2 and its documentation open source CRAN Rapache rcpp rsocket What can you add to this list? SO threads are also very welcome as answers. EDIT: so far, though indeed very helpful, most answers are arguments (pros) why one would want to use R. Do you have some specific hints that I could include in some kind of overview presentation? EDIT2: I wanted to add this link about R's future to the list...

    Read the article

  • Change color on single-series line chart using rCharts and Highcharts?

    - by Sharon
    Is it possible to change the color of a line using rCharts and Highcharts so that the line color changes depending on a factor? I've done this with ggplot2 but would like to make an interactive version if possible. I've tried h1 <- Highcharts$new() h1$chart(type="line") h1$series(data=mydf$myvalue, name="", groups = c("myfactor")) h1$xAxis(tickInterval = 4, categories = mydf$myXaxis) but that's not working; the line stays the same color. Sample data: myvalue <- c(16, 18, 5, 14, 10) myXaxis <- c(1,2,3,4,5) myfactor <- c("old", "old", "old", "new", "new") mydf <- data.frame(myvalue, myXaxis, myfactor) Thanks for any suggestions.
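
    A hedged sketch of one workaround (assuming, as in other rCharts examples, that repeated $series() calls each add a series): split the data by the factor so Highcharts assigns each group its own colour.

      library(rCharts)
      h1 <- Highcharts$new()
      h1$chart(type = "line")
      for (lev in unique(mydf$myfactor)) {
        d <- mydf[mydf$myfactor == lev, ]
        h1$series(data = d$myvalue, name = lev)   # one series per factor level
      }
      h1$xAxis(categories = mydf$myXaxis)
      h1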

    Read the article

  • How to create plots in multiple windows and keep them separate in R

    - by PaulHurleyuk
    Hello SOers, I'm sure this is an easy problem, but my google / help foo has failed me, so it's up to you. I have an R script that generates several plots, and I want to view all the plots on screen at once (in separate windows), but I can't work out how to open multiple graphics windows. I'm using ggplot2, but I feel this is a more basic problem, so I'm just using base graphics for this simple example: x<-c(1:10) y<-sin(x) z<-cos(x) dev.new() plot(y=y,x=x) dev.off() dev.new() plot(x=x,y=z) But this doesn't work. I'm on Windows if this matters (Windows + Eclipse + StatEt)
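
    The sketch below is the same example with the dev.off() call removed: dev.off() closes the window that was just drawn, so without it each dev.new() leaves a window open.

      x <- 1:10
      y <- sin(x)
      z <- cos(x)
      dev.new()
      plot(x, y)
      dev.new()   # opens a second device; both windows stay open
      plot(x, z)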

    Read the article

  • How to include multiple tables programmatically into a Sweave document using R

    - by PaulHurleyuk
    Hello, I want to have a sweave document that will include a variable number of tables in. I thought the example below would work, but it doesn't. I want to loop over the list foo and print each element as it's own table. % \documentclass[a4paper]{article} \usepackage[OT1]{fontenc} \usepackage{longtable} \usepackage{geometry} \usepackage{Sweave} \geometry{left=1.25in, right=1.25in, top=1in, bottom=1in} \listfiles \begin{document} <<label=start, echo=FALSE, include=FALSE>>= startt<-proc.time()[3] library(RODBC) library(psych) library(xtable) library(plyr) library(ggplot2) options(width=80) #Produce some example data, here I'm creating some dummy dataframes and putting them in a list foo<-list() foo[[1]]<-data.frame(GRP=c(rep("AA",10), rep("Aa",10), rep("aa",10)), X1=rnorm(30), X2=rnorm(30,5,2)) foo[[2]]<-data.frame(GRP=c(rep("BB",10), rep("bB",10), rep("BB",10)), X1=rnorm(30), X2=rnorm(30,5,2)) foo[[3]]<-data.frame(GRP=c(rep("CC",12), rep("cc",18)), X1=rnorm(30), X2=rnorm(30,5,2)) foo[[4]]<-data.frame(GRP=c(rep("DD",10), rep("Dd",10), rep("dd",10)), X1=rnorm(30), X2=rnorm(30,5,2)) @ \title{Docuemnt to test putting a variable number of tables into a sweave Document} \author{"Paul Hurley"} \maketitle \section{Text} This document was created on \today, with \Sexpr{print(version$version.string)} running on a \Sexpr{print(version$platform)} platform. It took approx \input{time} sec to process. <<label=test, echo=FALSE, results=tex>>= cat("Foo") @ that was a test, so is this <<label=table1test, echo=FALSE, results=tex>>= print(xtable(foo[[1]])) @ \newpage \subsection{Tables} <<label=Tables, echo=FALSE, results=tex>>= for(i in seq(foo)){ cat("\n") cat(paste("Table_",i,sep="")) cat("\n") print(xtable(foo[[i]])) cat("\n") } #cat("<<label=endofTables>>= ") @ <<label=bye, include=FALSE, echo=FALSE>>= endt<-proc.time()[3] elapsedtime<-as.numeric(endt-startt) @ <<label=elapsed, include=FALSE, echo=FALSE>>= fileConn<-file("time.tex", "wt") writeLines(as.character(elapsedtime), fileConn) close(fileConn) @ \end{document} Here, the table1test chunk works as expected, and produced a table based on the dataframe in foo[[1]], however the loop only produces Table(underscore)1.... Any ideas what I'm doing wrong ?
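
    A hedged sketch of the most likely culprit: the bare underscore in cat(paste("Table_", i)) is a LaTeX special character outside math mode, so compilation trips over the first label; escaping it (or dropping it) lets the loop emit every table.

      for (i in seq_along(foo)) {
        cat("\n")
        cat(paste("Table\\_", i, sep = ""))   # escape the underscore for LaTeX
        cat("\n")
        print(xtable(foo[[i]]))
        cat("\n")
      }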

    Read the article

  • R plot- SGAM plot counts vs. time - how do I get dates on the x-axis?

    - by Nate
    I'd like to plot this vs. time, with the actual dates (years actually, 1997,1998...2010). The dates are in a raw format, ala SAS, days since 1960 (hence as.date conversion). If I convert the dates using as.date to variable x, and do the GAM plot, I get an error. It works fine with the raw day numbers. But I want the plot to display the years (data are not equally spaced). structure(list(site = c(928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L), date = c(13493L, 13534L, 13566L, 13611L, 13723L, 13752L, 13804L, 13837L, 13927L, 14028L, 14082L, 14122L, 14150L, 14182L, 14199L, 16198L, 16279L, 16607L, 16945L, 17545L, 17650L, 17743L, 17868L, 17941L, 18017L, 18092L), y = c(7L, 7L, 17L, 18L, 17L, 17L, 10L, 3L, 17L, 24L, 11L, 5L, 5L, 3L, 5L, 14L, 2L, 9L, 9L, 4L, 7L, 6L, 1L, 0L, 5L, 0L)), .Names = c("site", "date", "y"), class = "data.frame", row.names = c(NA, -26L)) sgam1 <- gam(sites$y ~ s(sites$date)) sgam <- predict(sgam1, se=TRUE) plot(sites$date,sites$y,xaxt="n", xlab='Time', ylab='Counts') x<-as.Date(sites$date, origin="1960-01-01") axis(1, at=1:26,labels=x) lines(sites$date,sgam$fit, lty = 1) lines(sites$date,sgam$fit + 1.96* sgam$se, lty = 2) lines(sites$date,sgam$fit - 1.96* sgam$se, lty = 2) ggplot2 has a solution (it doesn't mind the as.date thing) but it gives me other problems...
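
    A sketch of one way to keep fitting on the raw day numbers while labelling the axis with years (gam() is assumed here to be mgcv::gam; SAS-style dates count days since 1960-01-01): compute the day number of 1 January for each year and put the ticks there.

      library(mgcv)   # assumption: gam() from mgcv
      sgam1 <- gam(y ~ s(date), data = sites)
      sgam  <- predict(sgam1, se.fit = TRUE)
      plot(sites$date, sites$y, xaxt = "n", xlab = "Time", ylab = "Counts")
      yr_at <- as.numeric(as.Date(paste0(1997:2010, "-01-01")) - as.Date("1960-01-01"))
      axis(1, at = yr_at, labels = 1997:2010)
      lines(sites$date, sgam$fit, lty = 1)
      lines(sites$date, sgam$fit + 1.96 * sgam$se.fit, lty = 2)
      lines(sites$date, sgam$fit - 1.96 * sgam$se.fit, lty = 2)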

    Read the article

  • 2D Histogram in R: Converting from Count to Frequency within a Column

    - by Jac
    Would appreciate help with generating a 2D histogram of frequencies, where frequencies are calculated within a column. My main issue: converting from counts to column based frequency. Here's my starting code: # expected packages library(ggplot2) library(plyr) # generate example data corresponding to expected data input x_data = sample(101:200,10000, replace = TRUE) y_data = sample(1:100,10000, replace = TRUE) my_set = data.frame(x_data,y_data) # define x and y interval cut points x_seq = seq(100,200,10) y_seq = seq(0,100,10) # label samples as belonging within x and y intervals my_set$x_interval = cut(my_set$x_data,x_seq) my_set$y_interval = cut(my_set$y_data,y_seq) # determine count for each x,y block xy_df = ddply(my_set, c("x_interval","y_interval"),"nrow") # still need to convert for use with dplyr # convert from count to frequency based on formula: freq = count/sum(count in given x interval) ################ TRYING TO FIGURE OUT ################# # plot results fig_count <- ggplot(xy_df, aes(x = x_interval, y = y_interval)) + geom_tile(aes(fill = nrow)) # count fig_freq <- ggplot(xy_df, aes(x = x_interval, y = y_interval)) + geom_tile(aes(fill = freq)) # frequency I would appreciate any help in how to calculate the frequency within a column. Thanks! jac EDIT: I think the solution will require the following steps 1) Calculate and store overall counts for each x-interval factor 2) Divide the individual bin count by its corresponding x-interval factor count to obtain frequency. Not sure how to carry this out though. .
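
    A sketch of the missing step, reusing xy_df from the code above: rename its third (count) column, then divide each cell's count by the total of its x interval with a second ddply pass.

      names(xy_df)[3] <- "count"   # the count column produced by ddply
      xy_df <- ddply(xy_df, "x_interval", transform, freq = count / sum(count))
      fig_freq <- ggplot(xy_df, aes(x = x_interval, y = y_interval)) +
        geom_tile(aes(fill = freq))
      fig_freq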

    Read the article

  • plot an item map (based on difficulties)

    - by Tyler Rinker
    I have a data set of item difficulties that correspond to items on a questionnaire that looks like this: item difficulty 1 ITEM_6: I DESTROY THINGS BELONGING TO OTHERS 2.31179818 2 ITEM_11: I PHYSICALLY ATTACK PEOPLE 1.95215238 3 ITEM_5: I DESTROY MY OWN THINGS 1.93479536 4 ITEM_10: I GET IN MANY FIGHTS 1.62610855 5 ITEM_19: I THREATEN TO HURT PEOPLE 1.62188759 6 ITEM_12: I SCREAM A LOT 1.45137544 7 ITEM_8: I DISOBEY AT SCHOOL 0.94255210 8 ITEM_3: I AM MEAN TO OTHERS 0.89941812 9 ITEM_20: I AM LOUDER THAN OTHER KIDS 0.72752197 10 ITEM_17: I TEASE OTHERS A LOT 0.61792597 11 ITEM_9: I AM JEALOUS OF OTHERS 0.61288399 12 ITEM_4: I TRY TO GET A LOT OF ATTENTION 0.39947791 13 ITEM_18: I HAVE A HOT TEMPER 0.32209970 14 ITEM_13: I SHOW OFF OR CLOWN 0.31707701 15 ITEM_7: I DISOBEY MY PARENTS 0.20902108 16 ITEM_2: I BRAG 0.19923607 17 ITEM_15: MY MOODS OR FEELINGS CHANGE SUDDENLY 0.06023317 18 ITEM_14: I AM STUBBORN -0.31155481 19 ITEM_16: I TALK TOO MUCH -0.67777282 20 ITEM_1: I ARGUE A LOT -1.15013758 I want to make an item map of these items that looks similar (not exactly) to this (I created this in word but it lacks true scaling as I just eyeballed the scale). It's not really a traditional statistical graphic and so I don't really know how to approach this. I don't care what graphics system this is done in but I am more familiar with ggplot2 and base. I would greatly appreciate a method of plotting this sort of unusual plot. Here's the data set (I'm including it as I was having difficulty using read.table on the dataframe above): DF <- structure(list(item = structure(c(17L, 3L, 16L, 2L, 11L, 4L, 19L, 14L, 13L, 9L, 20L, 15L, 10L, 5L, 18L, 12L, 7L, 6L, 8L, 1L ), .Label = c("ITEM_1: I ARGUE A LOT", "ITEM_10: I GET IN MANY FIGHTS", "ITEM_11: I PHYSICALLY ATTACK PEOPLE", "ITEM_12: I SCREAM A LOT", "ITEM_13: I SHOW OFF OR CLOWN", "ITEM_14: I AM STUBBORN", "ITEM_15: MY MOODS OR FEELINGS CHANGE SUDDENLY", "ITEM_16: I TALK TOO MUCH", "ITEM_17: I TEASE OTHERS A LOT", "ITEM_18: I HAVE A HOT TEMPER", "ITEM_19: I THREATEN TO HURT PEOPLE", "ITEM_2: I BRAG", "ITEM_20: I AM LOUDER THAN OTHER KIDS", "ITEM_3: I AM MEAN TO OTHERS", "ITEM_4: I TRY TO GET A LOT OF ATTENTION", "ITEM_5: I DESTROY MY OWN THINGS", "ITEM_6: I DESTROY THINGS BELONGING TO OTHERS", "ITEM_7: I DISOBEY MY PARENTS", "ITEM_8: I DISOBEY AT SCHOOL", "ITEM_9: I AM JEALOUS OF OTHERS" ), class = "factor"), difficulty = c(2.31179818110545, 1.95215237740899, 1.93479536058926, 1.62610855327073, 1.62188759115818, 1.45137543733965, 0.942552101641177, 0.899418119889782, 0.7275219669431, 0.617925967008653, 0.612883990709181, 0.399477905189577, 0.322099696946661, 0.31707700560997, 0.209021078266059, 0.199236065264793, 0.0602331732900628, -0.311554806052955, -0.677772822413495, -1.15013757942119)), .Names = c("item", "difficulty" ), row.names = c(NA, -20L), class = "data.frame") Thank you in advance.
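
    A bare-bones sketch of one way to draw such a map with the DF above: place each item at its difficulty on the vertical axis and print its label beside it (the binned scale on the left of the mock-up would still need manual tuning).

      library(ggplot2)
      ggplot(DF, aes(x = 0, y = difficulty, label = item)) +
        geom_point() +
        geom_text(aes(x = 0.05), hjust = 0, size = 3) +   # labels to the right of the points
        scale_x_continuous(limits = c(0, 2), breaks = NULL) +
        labs(x = NULL, y = "Item difficulty") +
        theme_bw()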

    Read the article
