Search Results

Search found 65 results on 3 pages for 'ggplot'.

Page 2/3 | < Previous Page | 1 2 3  | Next Page >

  • Convert ddply {plyr} to Oracle R Enterprise, or use with Embedded R Execution

    - by Mark Hornick
    The plyr package contains a set of tools for partitioning a problem into smaller sub-problems that can be more easily processed. One function within {plyr} is ddply, which allows you to specify subsets of a data.frame and then apply a function to each subset. The result is gathered into a single data.frame. Such a capability is very convenient. The function ddply also has a parallel option that if TRUE, will apply the function in parallel, using the backend provided by foreach. This type of functionality is available through Oracle R Enterprise using the ore.groupApply function. In this blog post, we show a few examples from Sean Anderson's "A quick introduction to plyr" to illustrate the correpsonding functionality using ore.groupApply. To get started, we'll create a demo data set and load the plyr package. set.seed(1) d <- data.frame(year = rep(2000:2014, each = 3),         count = round(runif(45, 0, 20))) dim(d) library(plyr) This first example takes the data frame, partitions it by year, and calculates the coefficient of variation of the count, returning a data frame. # Example 1 res <- ddply(d, "year", function(x) {   mean.count <- mean(x$count)   sd.count <- sd(x$count)   cv <- sd.count/mean.count   data.frame(cv.count = cv)   }) To illustrate the equivalent functionality in Oracle R Enterprise, using embedded R execution, we use the ore.groupApply function on the same data, but pushed to the database, creating an ore.frame. The function ore.push creates a temporary table in the database, returning a proxy object, the ore.frame. D <- ore.push(d) res <- ore.groupApply (D, D$year, function(x) {   mean.count <- mean(x$count)   sd.count <- sd(x$count)   cv <- sd.count/mean.count   data.frame(year=x$year[1], cv.count = cv)   }, FUN.VALUE=data.frame(year=1, cv.count=1)) You'll notice the similarities in the first three arguments. With ore.groupApply, we augment the function to return the specific data.frame we want. We also specify the argument FUN.VALUE, which describes the resulting data.frame. From our previous blog posts, you may recall that by default, ore.groupApply returns an ore.list containing the results of each function invocation. To get a data.frame, we specify the structure of the result. The results in both cases are the same, however the ore.groupApply result is an ore.frame. In this case the data stays in the database until it's actually required. This can result in significant memory and time savings whe data is large. R> class(res) [1] "ore.frame" attr(,"package") [1] "OREbase" R> head(res)    year cv.count 1 2000 0.3984848 2 2001 0.6062178 3 2002 0.2309401 4 2003 0.5773503 5 2004 0.3069680 6 2005 0.3431743 To make the ore.groupApply execute in parallel, you can specify the argument parallel with either TRUE, to use default database parallelism, or to a specific number, which serves as a hint to the database as to how many parallel R engines should be used. The next ddply example uses the summarise function, which creates a new data.frame. In ore.groupApply, the year column is passed in with the data. Since no automatic creation of columns takes place, we explicitly set the year column in the data.frame result to the value of the first row, since all rows received by the function have the same year. # Example 2 ddply(d, "year", summarise, mean.count = mean(count)) res <- ore.groupApply (D, D$year, function(x) {   mean.count <- mean(x$count)   data.frame(year=x$year[1], mean.count = mean.count)   }, FUN.VALUE=data.frame(year=1, mean.count=1)) R> head(res)    year mean.count 1 2000 7.666667 2 2001 13.333333 3 2002 15.000000 4 2003 3.000000 5 2004 12.333333 6 2005 14.666667 Example 3 uses the transform function with ddply, which modifies the existing data.frame. With ore.groupApply, we again construct the data.frame explicilty, which is returned as an ore.frame. # Example 3 ddply(d, "year", transform, total.count = sum(count)) res <- ore.groupApply (D, D$year, function(x) {   total.count <- sum(x$count)   data.frame(year=x$year[1], count=x$count, total.count = total.count)   }, FUN.VALUE=data.frame(year=1, count=1, total.count=1)) > head(res)    year count total.count 1 2000 5 23 2 2000 7 23 3 2000 11 23 4 2001 18 40 5 2001 4 40 6 2001 18 40 In Example 4, the mutate function with ddply enables you to define new columns that build on columns just defined. Since the construction of the data.frame using ore.groupApply is explicit, you always have complete control over when and how to use columns. # Example 4 ddply(d, "year", mutate, mu = mean(count), sigma = sd(count),       cv = sigma/mu) res <- ore.groupApply (D, D$year, function(x) {   mu <- mean(x$count)   sigma <- sd(x$count)   cv <- sigma/mu   data.frame(year=x$year[1], count=x$count, mu=mu, sigma=sigma, cv=cv)   }, FUN.VALUE=data.frame(year=1, count=1, mu=1,sigma=1,cv=1)) R> head(res)    year count mu sigma cv 1 2000 5 7.666667 3.055050 0.3984848 2 2000 7 7.666667 3.055050 0.3984848 3 2000 11 7.666667 3.055050 0.3984848 4 2001 18 13.333333 8.082904 0.6062178 5 2001 4 13.333333 8.082904 0.6062178 6 2001 18 13.333333 8.082904 0.6062178 In Example 5, ddply is used to partition data on multiple columns before constructing the result. Realizing this with ore.groupApply involves creating an index column out of the concatenation of the columns used for partitioning. This example also allows us to illustrate using the ORE transparency layer to subset the data. # Example 5 baseball.dat <- subset(baseball, year > 2000) # data from the plyr package x <- ddply(baseball.dat, c("year", "team"), summarize,            homeruns = sum(hr)) We first push the data set to the database to get an ore.frame. We then add the composite column and perform the subset, using the transparency layer. Since the results from database execution are unordered, we will explicitly sort these results and view the first 6 rows. BB.DAT <- ore.push(baseball) BB.DAT$index <- with(BB.DAT, paste(year, team, sep="+")) BB.DAT2 <- subset(BB.DAT, year > 2000) X <- ore.groupApply (BB.DAT2, BB.DAT2$index, function(x) {   data.frame(year=x$year[1], team=x$team[1], homeruns=sum(x$hr))   }, FUN.VALUE=data.frame(year=1, team="A", homeruns=1), parallel=FALSE) res <- ore.sort(X, by=c("year","team")) R> head(res)    year team homeruns 1 2001 ANA 4 2 2001 ARI 155 3 2001 ATL 63 4 2001 BAL 58 5 2001 BOS 77 6 2001 CHA 63 Our next example is derived from the ggplot function documentation. This illustrates the use of ddply within using the ggplot2 package. We first create a data.frame with demo data and use ddply to create some statistics for each group (gp). We then use ggplot to produce the graph. We can take this same code, push the data.frame df to the database and invoke this on the database server. The graph will be returned to the client window, as depicted below. # Example 6 with ggplot2 library(ggplot2) df <- data.frame(gp = factor(rep(letters[1:3], each = 10)),                  y = rnorm(30)) # Compute sample mean and standard deviation in each group library(plyr) ds <- ddply(df, .(gp), summarise, mean = mean(y), sd = sd(y)) # Set up a skeleton ggplot object and add layers: ggplot() +   geom_point(data = df, aes(x = gp, y = y)) +   geom_point(data = ds, aes(x = gp, y = mean),              colour = 'red', size = 3) +   geom_errorbar(data = ds, aes(x = gp, y = mean,                                ymin = mean - sd, ymax = mean + sd),              colour = 'red', width = 0.4) DF <- ore.push(df) ore.tableApply(DF, function(df) {   library(ggplot2)   library(plyr)   ds <- ddply(df, .(gp), summarise, mean = mean(y), sd = sd(y))   ggplot() +     geom_point(data = df, aes(x = gp, y = y)) +     geom_point(data = ds, aes(x = gp, y = mean),                colour = 'red', size = 3) +     geom_errorbar(data = ds, aes(x = gp, y = mean,                                  ymin = mean - sd, ymax = mean + sd),                   colour = 'red', width = 0.4) }) But let's take this one step further. Suppose we wanted to produce multiple graphs, partitioned on some index column. We replicate the data three times and add some noise to the y values, just to make the graphs a little different. We also create an index column to form our three partitions. Note that we've also specified that this should be executed in parallel, allowing Oracle Database to control and manage the server-side R engines. The result of ore.groupApply is an ore.list that contains the three graphs. Each graph can be viewed by printing the list element. df2 <- rbind(df,df,df) df2$y <- df2$y + rnorm(nrow(df2)) df2$index <- c(rep(1,300), rep(2,300), rep(3,300)) DF2 <- ore.push(df2) res <- ore.groupApply(DF2, DF2$index, function(df) {   df <- df[,1:2]   library(ggplot2)   library(plyr)   ds <- ddply(df, .(gp), summarise, mean = mean(y), sd = sd(y))   ggplot() +     geom_point(data = df, aes(x = gp, y = y)) +     geom_point(data = ds, aes(x = gp, y = mean),                colour = 'red', size = 3) +     geom_errorbar(data = ds, aes(x = gp, y = mean,                                  ymin = mean - sd, ymax = mean + sd),                   colour = 'red', width = 0.4)   }, parallel=TRUE) res[[1]] res[[2]] res[[3]] To recap, we've illustrated how various uses of ddply from the plyr package can be realized in ore.groupApply, which affords the user explicit control over the contents of the data.frame result in a straightforward manner. We've also highlighted how ddply can be used within an ore.groupApply call.

    Read the article

  • Trying to keep filled bars in a faceted plot

    - by John Horton
    Not sure what I'm doing wrong here. I have this plot: ggplot(data.PE5, aes(ybands,fill=factor(decide))) + geom_bar(position="dodge") which produces: Then I want to facet by a factor, creating two stacked plots w/ dodged, colored bars ggplot(data.PE5, aes(ybands,fill=factor(decide))) + geom_bar(position="dodge") + facet_grid(~group_label) However, I lose the factor-based coloring, which I want to keep:

    Read the article

  • What does size really mean in geom_point?

    - by Jonas Stein
    In both plots the points look different, but why? mya <- data.frame(a=1:100) ggplot() + geom_path(data=mya, aes(x=a, y=a, colour=2, size=seq(0.1,10,0.1))) + geom_point(data=mya, aes(x=a, y=a, colour=1, size=1)) + theme_bw() + theme(text=element_text(size=11)) ggplot() + geom_path(data=mya, aes(x=a, y=a, colour=2, size=1)) + geom_point(data=mya, aes(x=a, y=a, colour=1, size=1)) + theme_bw() + theme(text=element_text(size=11))

    Read the article

  • Annotating axis in ggplot2

    - by mpiktas
    I am looking for the way to annotate axis in ggplot2. The example of the problem can be found here: http://learnr.wordpress.com/2009/09/24/ggplot2-back-to-back-bar-charts. The y axis of the chart (example graph in the link) has an annotation: (million euro). Is there a way to create such types of annotations in ggplot2? Looking at the documentation there is no obvious way, since the ggplot does not explicitly let you put objects outside plotting area. But maybe there is some workaround? One of the possible workarounds I thought about is using scales: data=data.frame(x=1:10,y=1:10) qplot(x=x,y=y,data=data)+scale_y_continuous(breaks=10.1,label="Millions") But then how do I remove the tick? And it seems that since ggplot does not support multiple scales, I will need to grab the output of the scale_y_continuous, when it calculates the scales automaticaly and then add my custom break and label by hand. Maybe there is a better way?

    Read the article

  • Irrelevant legend information in ggplot2

    - by Dan Goldstein
    When running this code (go ahead, try it): library(ggplot2) (myDat <- data.frame(cbind(VarX=10:1, VarY=runif(10)), Descrip=sample(LETTERS[1:3], 10, replace=TRUE))) ggplot(myDat,aes(VarX,VarY,shape=Descrip,size=3)) + geom_point() ... the "size=3" statement does correctly set the point size. However it causes the legend to give birth to a little legend beneath it, entitled "3" and containing nothing but a big dot and the number 3. This does the same ggplot(myDat,aes(VarX,VarY,shape=Descrip)) + geom_point(aes(size=3)) Yes, it is funny. It would have driven me insane a couple hours ago if it weren't so funny. But now let's make it stop.

    Read the article

  • geom_tile heatmap with different high fill colours based on factor

    - by Michael
    I'm interested in building a heatmap with geom_tile in ggplot2 that uses a different gradient high color based on a factor. The plot below creates the plot where the individual tiles are colored blue or red based on the xy_type, but there is no gradient. ggplot() + geom_tile(data=mydata, aes(x=factor(myx), y=myy, fill=factor(xy_type))) + scale_fill_manual(values=c("blue", "red")) The plot below does not use the xy_type factor to choose the color, but I get a single group gradient based on the xy_avg_value. ggplot() + geom_tile(data=mydata, aes(x=factor(myx), y=myy, fill=xy_avg_value)) Is there a technique to blend these two plots? I can use a facet_grid(xy_type ~ .) to create separate plots of this data, with the gradient. As this is ultimately going to be a map (x~y coordinates), I'd like to find a way to display the different gradient together in a single geom_tile map.

    Read the article

  • How can I make a group bar plot in ggplot2?

    - by maximusyoda
    I have four of these kind of dataframes each with a different name (Apple,Ball,Cat) with different values of frequency but same 4 season names Seasons Frequency DJF 9886 JJA 5408 MAM 12876 SON 6932 And I am trying to make a group bar plot. The graph I'm looking for is like this, where c,d,e,f will be the names - Apple, Ball, Cat. Y-axis will be Frequency Each group will have 4 bars: DJF,JJA,MAM,SON Filled by seasons The number of Frequency written above the bar plot. How can I format the data to make it suitable for ggplot (cbind, melt etc) and use it in ggplot?

    Read the article

  • R: How to remove outliers from a smoother in ggplot2?

    - by John
    I have the following data set that I am trying to plot with ggplot2, it is a time series of three experiments A1, B1 and C1 and each experiment had three replicates. I am trying to add a stat which detects and removes outliers before returning a smoother (mean and variance?). I have written my own outlier function (not shown) but I expect there is already a function to do this, I just have not found it. I've looked at stat_sum_df("median_hilow", geom = "smooth") from some examples in the ggplot2 book, but I didn't understand the help doc from Hmisc to see if it removes outliers or not. Is there a function to remove outliers like this in ggplot, or where would I amend my code below to add my own function? library (ggplot2) data = data.frame (day = c(1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7), od = c( 0.1,1.0,0.5,0.7 ,0.13,0.33,0.54,0.76 ,0.1,0.35,0.54,0.73 ,1.3,1.5,1.75,1.7 ,1.3,1.3,1.0,1.6 ,1.7,1.6,1.75,1.7 ,2.1,2.3,2.5,2.7 ,2.5,2.6,2.6,2.8 ,2.3,2.5,2.8,3.8), series_id = c( "A1", "A1", "A1","A1", "A1", "A1", "A1","A1", "A1", "A1", "A1","A1", "B1", "B1","B1", "B1", "B1", "B1","B1", "B1", "B1", "B1","B1", "B1", "C1","C1", "C1", "C1", "C1","C1", "C1", "C1", "C1","C1", "C1", "C1"), replicate = c( "A1.1","A1.1","A1.1","A1.1", "A1.2","A1.2","A1.2","A1.2", "A1.3","A1.3","A1.3","A1.3", "B1.1","B1.1","B1.1","B1.1", "B1.2","B1.2","B1.2","B1.2", "B1.3","B1.3","B1.3","B1.3", "C1.1","C1.1","C1.1","C1.1", "C1.2","C1.2","C1.2","C1.2", "C1.3","C1.3","C1.3","C1.3")) > data day od series_id replicate 1 1 0.10 A1 A1.1 2 3 1.00 A1 A1.1 3 5 0.50 A1 A1.1 4 7 0.70 A1 A1.1 5 1 0.13 A1 A1.2 6 3 0.33 A1 A1.2 7 5 0.54 A1 A1.2 8 7 0.76 A1 A1.2 9 1 0.10 A1 A1.3 10 3 0.35 A1 A1.3 11 5 0.54 A1 A1.3 12 7 0.73 A1 A1.3 13 1 1.30 B1 B1.1 This is what I have so far and is working nicely, but outliers are not removed: r <- ggplot(data = data, aes(x = day, y = od)) r + geom_point(aes(group = replicate, color = series_id)) + # add points geom_line(aes(group = replicate, color = series_id)) + # add lines geom_smooth(aes(group = series_id)) # add smoother, average of each replicate

    Read the article

  • How can I suppress the vertical gridlines in a ggplot2 plot while retaining the x-axis labels?

    - by Tarek
    This is a follow-on from this question, in which I was trying to suppress the vertical gridlines. The solution, as provided by learnr, was to add scale_x_continuous(breaks = NA), but this had the side effect of also suppressing the x-axis labels, as well. I am totally happy to write the labels back in by hand, but it's not clear to me how to figure out where the labels should go. The other option is to suppress all gridlines (using opts( panel.grid.major = theme_blank()) or some such) and then drawing back in just the major horizontal gridlines. Again, the problem here is how to figure out what the breaks are in the plot to supply to geom_hline(). So, essentially, my options are: Suppress vertical gridlines and x-axis labels (using scale_x_continuous(breaks = NA) ) and add the x-axis labels back in. Suppress all gridlines (using opts( panel.grid.major = theme_blank()) ) and add the major horizontal gridlines back in using geom_hline(). Here are the two options: library(ggplot2) data <- data.frame(x = 1:10, y = c(3,5,2,5,6,2,7,6,5,4)) # suppressing vertical gridlines and x-axis labels # need to re-draw x-axis labels ggplot(data, aes(x, y)) + geom_bar(stat = 'identity') + scale_x_continuous(breaks = NA) + opts( panel.grid.major = theme_line(size = 0.5, colour = '#1391FF'), panel.grid.minor = theme_blank(), panel.background = theme_blank(), axis.ticks = theme_blank() ) # suppressing all gridlines # need to re-draw horizontal gridlines, probably with geom_hbar() ggplot(data, aes(x, y)) + geom_bar(stat = 'identity') + scale_x_continuous(breaks = NA) + opts( panel.grid.major = theme_blank(), panel.grid.minor = theme_blank(), panel.background = theme_blank(), axis.ticks = theme_blank() )

    Read the article

  • Group variables in a boxplot in R

    - by tao.hong
    I am trying to generate a boxplot whose data come from two scenarios. In the plot, I would like to group boxes by their names (So there will be two boxes per variable). I know ggplot would be a good choice. But I got errors which I could not figure out. Can anyone give me some suggestions? sensitivity_out1 structure(c(0.0522902104339716, 0.0521369824334004, 0.0520240345973737, 0.0519818337359876, 0.051935071418996, 0.0519089404325544, 0.000392698277338341, 0.000326135474295325, 0.000280863338343747, 0.000259631566041935, 0.000246594043996332, 0.000237923540393391, 0.00046732650331544, 0.000474448907808135, 0.000478287273678457, 0.000480194683464109, 0.000480631753078668, 0.000481760272726273, 0.000947965771207979, 0.000944821699830455, 0.000939631071343889, 0.000937186900570605, 0.000936007346568281, 0.000934756220144141, 0.00132442589501872, 0.00132658367774979, 0.00133334696220742, 0.00133622384928092, 0.0013381577476241, 0.00134005741746304, 0.0991622968751298, 0.100791399440082, 0.101946808417405, 0.102524244727408, 0.102920085260477, 0.103232984259916, 0.0305219507186844, 0.0304635269233494, 0.0304161055015213, 0.0303742106794513, 0.0303381888169022, 0.0302996157711171, 1.94268588634518e-05, 2.23991225564447e-05, 2.5756135487907e-05, 2.79997917298194e-05, 3.00753967077715e-05, 3.16270817369878e-05, 0.544701146678523, 0.542887331601984, 0.541632986366816, 0.541005610554556, 0.540617004208336, 0.540315690692195, 0.000453386694666078, 0.000448473414508756, 0.00044692043197248, 0.000444826296854332, 0.000445747996014684, 0.000444764303682453, 0.000127569551159321, 0.000128422491392669, 0.00012933662856487, 0.000129941842982939, 0.000129578971489026, 0.000131113075233758, 0.00684610571790029, 0.00686349387897349, 0.00687468164010565, 0.00687880720347743, 0.00688275579317197, 0.00687822247621936), .Dim = c(6L, 12L)) out2 structure(c(0.0189965816735366, 0.0189995096225103, 0.0190099362589894, 0.0190033523148514, 0.01900896721937, 0.0190099427513381, 0.00192043989797585, 0.00207303208721059, 0.00225931163225165, 0.0024049969048389, 0.00252310364086785, 0.00262940166568126, 0.00195164921633517, 0.00190079923515755, 0.00186139563778548, 0.00184188171395076, 0.00183248544676564, 0.00182492970673969, 1.83038731485927e-05, 1.98252671720347e-05, 2.14794764479231e-05, 2.30713122969332e-05, 2.4484220713564e-05, 2.55958833705284e-05, 0.0428066864455102, 0.0431686808647809, 0.0434411033615353, 0.0435883377765726, 0.0436690169266633, 0.0437340464360965, 0.145288252474567, 0.141488776430307, 0.138204532539654, 0.136281799717717, 0.134864952272761, 0.133738386148036, 0.0711728636959696, 0.072031388688795, 0.0727536853228245, 0.0731581966147734, 0.0734424337399303, 0.0736637270702609, 0.000605277151497094, 0.000617268349064968, 0.000632975679951382, 0.000643904422677427, 0.000653775268094148, 0.000662225067910141, 0.26735354610469, 0.267515415990146, 0.26753155165617, 0.267553498616325, 0.267532284594615, 0.267510330320289, 0.000334158771646756, 0.000319032383145857, 0.000306074699839994, 0.000299153278494114, 0.000293956197852583, 0.000290171804454218, 0.000645975219899115, 0.000637548672578787, 0.000632375486965757, 0.000629579821884212, 0.000624956458229123, 0.000622456283217054, 0.0645188290106884, 0.0651539609630352, 0.0656417364889907, 0.0658996698322889, 0.0660715073023965, 0.0662034341510152), .Dim = c(6L, 12L)) Melt data: group variable value 1 1 PLDKRT 0 2 1 PLDKRT 0 3 1 PLDKRT 0 4 1 PLDKRT 0 5 1 PLDKRT 0 6 1 PLDKRT 0 Code: #Data_source 1 sensitivity_1=rbind(sensitivity_out1,sensitivity_out2) sensitivity_1=data.frame(sensitivity_1) colnames(sensitivity_1)=main_l #variable names sensitivity_1$group=1 #Data_source 2 sensitivity_2=rbind(sensitivity_out1[3:4,],sensitivity_out2[3:4,]) sensitivity_2=data.frame(sensitivity_2) colnames(sensitivity_2)=main_l sensitivity_2$group=2 sensitivity_pool=rbind(sensitivity_1,sensitivity_2) sensitivity_pool_m=melt(sensitivity_pool,id.vars="group") ggplot(data = sensitivity_pool_m, aes(x = variable, y = value)) + geom_boxplot(aes( fill= group), width = 0.8) Error: "Error in unit(tic_pos.c, "mm") : 'x' and 'units' must have length > 0" Update Figure out the error. I should use geom_boxplot(aes( fill= factor(group)), width = 0.8) rather than fill= group

    Read the article

  • 2D Histogram in R: Converting from Count to Frequency within a Column

    - by Jac
    Would appreciate help with generating a 2D histogram of frequencies, where frequencies are calculated within a column. My main issue: converting from counts to column based frequency. Here's my starting code: # expected packages library(ggplot2) library(plyr) # generate example data corresponding to expected data input x_data = sample(101:200,10000, replace = TRUE) y_data = sample(1:100,10000, replace = TRUE) my_set = data.frame(x_data,y_data) # define x and y interval cut points x_seq = seq(100,200,10) y_seq = seq(0,100,10) # label samples as belonging within x and y intervals my_set$x_interval = cut(my_set$x_data,x_seq) my_set$y_interval = cut(my_set$y_data,y_seq) # determine count for each x,y block xy_df = ddply(my_set, c("x_interval","y_interval"),"nrow") # still need to convert for use with dplyr # convert from count to frequency based on formula: freq = count/sum(count in given x interval) ################ TRYING TO FIGURE OUT ################# # plot results fig_count <- ggplot(xy_df, aes(x = x_interval, y = y_interval)) + geom_tile(aes(fill = nrow)) # count fig_freq <- ggplot(xy_df, aes(x = x_interval, y = y_interval)) + geom_tile(aes(fill = freq)) # frequency I would appreciate any help in how to calculate the frequency within a column. Thanks! jac EDIT: I think the solution will require the following steps 1) Calculate and store overall counts for each x-interval factor 2) Divide the individual bin count by its corresponding x-interval factor count to obtain frequency. Not sure how to carry this out though. .

    Read the article

  • plot 4 different plots inside 1 pdf

    - by ifreak
    I have 2 data frames which i want to generate 3 plots from them and place them inside 1 pdf file as a single column. i want all the plots to have the same x-axis limits(basically the same x-axis) even thought they differ in the name and how they were obtained. the dataframes looks something like that: d1 X Y Z 0.04939317 -0.4622222 13651 0.03202451 -0.4261000 13401 0.09950793 -0.3233025 13151 0.11548556 -0.4637981 12486 0.09817597 -0.4751886 12236 0.15770701 -0.5819355 11986 and d2 V0 V1 V2 V3 sign 1 1 0.379 0.612 pos 2 1 0.378 0.620 pos 3 1 0.578 0.571 neg 4 1 0.978 0.561 pos 5 1 0.758 0.261 neg 6 1 0.378 0.126 neg P.S : both data frames are bigger than this, this is only a part of them V0, V1 and Z range from 1 to 20000 the plots that i created are : From d2 d2plot=ggplot(d1, aes(V0,V1, fill=sign)) + geom_tile()+ scale_fill_manual(values = c("neg" = "yellow", "pos"="red")) +geom_vline(xintercept =10000 ) +geom_text(mapping=aes(x=10000,y=0, label="Stop"), size=4, angle=90, vjust=-0.4, hjust=0) From d1 d1plot = ggplot(d2) + geom_errorbarh(aes(x=z,xmin=z-50,xmax=z+50, y=Y, height = 0.02),color="red")+ opts(legend.position = "none") +geom_vline(xintercept = 10000) +geom_text(mapping=aes(x=10000,y=-0.3, label="Stop"), size=4, angle=90, vjust=-0.4, hjust=0) i've tried grid.arrange(d1plot,d2plot,ncol=1) but the x-axis is different for each plot, i tried changing the aspect ratio, but this will change the y-axis ..i've also tried to use facet_wrap but the problem that my x-axis values have different values, i just want the limits and breaks to be the same and the plots all be aligned in 1 column based on 1 x-axis to comapre the value of the statistical methods in an easy way.

    Read the article

  • Show frequencies along with barplot in ggplot2

    - by aL3xa
    I'm trying to display frequencies within barplot ... well, I want them somewhere in the graph: under the bars, within bars, above bars or in the legend area. And I recall (I may be wrong) that it can be done in ggplot2. This is probably an easy one... at least it seems easy. Here's the code: p <- ggplot(mtcars) p + aes(factor(cyl)) + geom_bar() Is there any chance that I can get frequencies embedded in the graph?

    Read the article

  • R: ggplot2, why does my legend show faded colors?

    - by John
    Why is my legend faded in these examples below? Notice how the colours in the legend are not as vivid as the colours in the plot: library(ggplot2) r <- ggplot(data = diamonds, aes(x = carat, y = price, color = cut, group = cut)) r + geom_smooth() #(left) r + geom_smooth(size = 2) #(right)

    Read the article

  • ggplot2 pdf import in Adobe Illustrator missing font AdobePiStd

    - by Sander
    I created several simple ggplot2 plots and saved them to PDF files using the following commands: p <- ggplot(plotobject, aes(x=Pos, y=Pval),res=300) ggsave(plot=p,height=6,width=6,dpi=200, filename="~/example.pdf") If I now open this example.pdf in Adobe Illustrator I get the following error: The font AdobePiStd is missing. Affected text will be displayed using a substitute font. Is there a way in ggplot2 to specify a font (I presume this is for the dots/points) that Adobe will understand or otherwise is there a way to get this font working in Adobe?

    Read the article

  • Remove a layer from a ggplot2 chart

    - by Erik Shilts
    I'd like to remove a layer (in this case the results of geom_ribbon) from a ggplot2 created grid object. Is there a way I can remove it once it's already part of the object? library(ggplot2) dat <- data.frame(x=1:3, y=1:3, ymin=0:2, ymax=2:4) p <- ggplot(dat, aes(x=x, y=y)) + geom_ribbon(aes(ymin=ymin, ymax=ymax), alpha=0.3) + geom_line() # This has the geom_ribbon p # This overlays another ribbon on top p + geom_ribbon(aes(ymin=ymin, ymax=ymax, fill=NA))

    Read the article

  • R: Plotting a graph with different colors of points based on advanced criteria

    - by balconydoor
    What I would like to do is a plot (using ggplot), where the x axis represent years which have a different colour for the last three years in the plot than the rest. The last three years should also meet a certain criteria and based on this the last three years can either be red or green. The criteria is that the mean of the last three years should be less (making it green) or more (making it red) than the 66%-percentile of the remaining years. So far I have made two different functions calculating the last three year mean: LYM3 <- function (x) { LYM3 <- tail(x,3) mean(LYM3$Data,na.rm=T) } And the 66%-percentile for the remaining: perc66 <- function(x) { percentile <- head(x,-3) quantile(percentile$Data, .66, names=F,na.rm=T) } Here are two sets of data that can be used in the calculations (plots), the first which is an example from my real data where LYM3(df1) < perc66(df1) and the second is just made up data where LYM3 perc66. df1<- data.frame(Year=c(1979:2010), Data=c(347261.87, 145071.29, 110181.93, 183016.71, 210995.67, 205207.33, 103291.78, 247182.10, 152894.45, 170771.50, 206534.55, 287770.86, 223832.43, 297542.86, 267343.54, 475485.47, 224575.08, 147607.81, 171732.38, 126818.10, 165801.08, 136921.58, 136947.63, 83428.05, 144295.87, 68566.23, 59943.05, 49909.08, 52149.11, 117627.75, 132127.79, 130463.80)) df2 <- data.frame(Year=c(1979:2010), Data=c(sample(50,29,replace=T),75,75,75)) Here’s my code for my plot so far: plot <- ggplot(df1, aes(x=Year, y=Data)) + theme_bw() + geom_point(size=3, aes(colour=ifelse(df1$Year<2008, "black",ifelse(LYM3(df1) < perc66(df1),"green","red")))) + geom_line() + scale_x_continuous(breaks=c(1980,1985,1990,1995,2000,2005,2010), limits=c(1978,2011)) plot As you notice it doesn’t really do what I want it to do. The only thing it does seem to do is that it turns the years before 2008 into one level and those after into another one and base the point colour off these two levels. Since I don’t want this year to be stationary either, I made another tiny function: fun3 <- function(x) { df <- subset(x, Year==(max(Year)-2)) df$Year } So the previous code would have the same effect as: geom_point(size=3, aes(colour=ifelse(df1$Year<fun3(df1), "black","red"))) But it still does not care about my colours. Why does it make the years into levels? And how come an ifelse function doesn’t work within another one in this case? How would it be possible to the arguments to do what I like? I realise this might be a bit messy, asking for a lot at the same time, but I hope my description is pretty clear. It would be helpful if someone could at least point me in the right direction. I tried to put the code for the plot into a function as well so I wouldn’t have to change the data frame at all functions within the plot, but I can’t get it to work. Thank you!

    Read the article

  • R: ggplot2, how to get the parameters from a plotted linear model smoother?

    - by John
    I have a data.frame with 3 time series in it, shown below. When I plot them with a smoother time series, I want to be able to get the parameters of the linear model that I plot, but I can't see how to do that? > data day od series_id 1 1 0.10 A1 2 3 1.00 A1 3 5 0.50 A1 4 7 0.70 A1 5 1 1.70 B1 6 3 1.60 B1 7 5 1.75 B1 8 7 1.70 B1 9 1 2.10 C1 10 3 2.30 C1 11 5 2.50 C1 12 7 2.70 C1 data = data.frame (day = c(1,3,5,7,1,3,5,7,1,3,5,7), od = c(0.1,1.0,0.5,0.7 ,1.7,1.6,1.75,1.7 ,2.1,2.3,2.5,2.7), series_id = c("A1", "A1", "A1","A1", "B1", "B1","B1", "B1", "C1","C1", "C1", "C1")) r <- ggplot(data = data, aes(x = day, y = od)) r + stat_smooth(aes(group = series_id, color = series_id),method="lm")

    Read the article

  • Generate multiple graphics from within an R function

    - by William Doane
    I'd like to spawn several graphics windows from within a function in R using ggplot graphics... testf <- function(a, b) { devAskNewPage(TRUE) qplot(a, b); # grid.newpage(recording = TRUE) dev.new() qplot(a, a+a); # grid.newpage(recording = TRUE) dev.new() qplot(b, b+b); } library(ggplot2) x <- rnorm(50) y <- rnorm(50) testf(x, y) However, neither dev.new() nor grid.newpage() seems to flush the preceding plot. I know that, in R, functions normally only produce the last thing they evaluate, but I'd like to understand the process better and to learn of any possible workarounds. Thoughts?

    Read the article

  • possible bug in geom_ribbon

    - by tomw
    i was hoping to plot two time series and shade the space between the series according to which series is larger at that time. here are the two series-- first in a data frame with an indicator for whichever series is larger at that time d1 <- read.csv("https://dl.dropbox.com/s/0txm3f70msd3nm6/ribbon%20data.csv?dl=1") And this is the melted series. d2 <- read.csv("https://dl.dropbox.com/s/6ohwmtkhpsutpig/melted%20ribbon%20data.csv?dl=1") which I plot... ggplot() + geom_line(data = d2, aes(x = time, y = value, group = variable, color = variable)) + geom_hline(yintercept = 0, linetype = 2) + geom_ribbon(data = d1[d1$big == "B",], aes(x = time, ymin = csa, ymax = csb), alpha = .25, fill = "#9999CC") + geom_ribbon(data = d1[d1$big == "A",], aes(x = time, ymin = csb, ymax = csa), alpha = .25, fill = "#CC6666") + scale_color_manual(values = c("#CC6666" , "#9999CC")) which results in... why is there a superfluous blue band in the middle of the plot?

    Read the article

  • How to override ggplot2's axis formatting?

    - by Richie Cotton
    When you choose a log scale, ggplot2 formats the breaks like 10^x. I'd like it to not do that. For example, the code below should display a graph with ticks at 1, 2, 5 etc, not 10^0, 10^0.3, 10^0.69 etc. library(ggplot2) dfr <- data.frame(x = 1:100, y = rlnorm(100)) breaks <- as.vector(c(1, 2, 5) %o% 10^(-1:1)) p1 <- ggplot(dfr, aes(x, y)) + geom_point() + scale_y_log10(breaks = breaks) print(p1) I guess that adding a formatter argument to scale_y_log10 would do the trick, but I'm not sure what to put in the argument, or where the options might be documented.

    Read the article

  • from ggplot2 to OOo workflow?

    - by Andreas
    This is not really a programming question, but I try here none the less. I once used latex for my reports. But the people I work with needs to make small edits and do not have latex skillz. Openoffice is then the way to go. But saving ggplot images with dpi 100 makes for really ugly graphs. dpi = 600 is a no go (e.g. huge legend). So what to do? I currently save (still via ggsave) to eps - which openoffice can import. But performance is not good at all. Googling I found a bug for the poor eps performance in OOo, and also talk about a non-implemented svg feature. But none helps me right now. If you work with ggplot2 and OOo - What do you do? I have been unsuccesfull with pdf conversion for some reason.

    Read the article

  • What is the simplest method to fill the area under a geom_freqpoly line?

    - by mattrepl
    The x-axis is time broken up into time intervals. There is an interval column in the data frame that specifies the time for each row. The column is a factor, where each interval is a different factor level. Plotting a histogram or line using geom_histogram and geom_freqpoly works great, but I'd like to have a line, like that provided by geom_freqpoly, with the area filled. Currently I'm using geom_freqpoly like this: ggplot(quake.data, aes(interval, fill=tweet.type)) + geom_freqpoly(aes(group = tweet.type, colour = tweet.type)) + opts(axis.text.x=theme_text(angle=-60, hjust=0, size = 6)) I would prefer to have a filled area, such as provided by geom_density, but without smoothing the line: UPDATE: The geom_area has been suggested, is there any way to use a ggplot2-generated statistic, such as ..count.., for the geom_area's y-values? Or, does the count aggregation need to occur prior to using ggplot2?

    Read the article

  • Changing ylim (axis limits) drops data falling outside range. How can this be prevented?

    - by Alex Holcombe
    df <- data.frame(age=c(10,10,20,20,25,25,25),veg=c(0,1,0,1,1,0,1)) g=ggplot(data=df,aes(x=age,y=veg)) g=g+stat_summary(fun.y=mean,geom="point") Points reflect mean of veg at each age, which is what I expected and want to preserve after changing axis limits with the command below. g=g+ylim(0.2,1) Changing axis limits with the above command unfortunately causes veg==0 subset to be dropped from the data, yielding "Warning message: Removed 4 rows containing missing values (stat_summary)" This is bad because now the data plot (stat_summary mean) omits the veg==0 points. How can this be prevented? I simply want to avoid showing the empty part of the plot- the ordinate from 0 to .2, but not drop the associated data from the stat_summary calculation.

    Read the article

< Previous Page | 1 2 3  | Next Page >