Search Results

Search found 14 results on 1 pages for 'quantile'.

Page 1/1 | 1 

  • how I can convert the string output into data.frame

    - by user2968058
    i have a data set called SIZEDIST$AVG.µm., and i have fitted this data with a weibull curve. now i have generated the quantiles by using the quantile function and now i want to access the output of this function generated at the interval of p=0.01. fwbl<-fitdist(SIZEDIST$AVG.µm., "weibull",start=list(shape=0.8,scale=1)) fwbl quantwbl<-quantile(fwbl,probs=seq(.1,.99,.01)) quantwbl str(quantwbl) using str(quantwbl) i can visualize the output but i cant convert them into data.frame

    Read the article

  • Calculating Percentiles (Ruby).

    - by zxcvbnm
    My code is based on the methods described here and here. def fraction?(number) number - number.truncate end def percentile(param_array, percentage) another_array = param_array.to_a.sort r = percentage.to_f * (param_array.size.to_f - 1) + 1 if r <= 1 then return another_array[0] elsif r >= another_array.size then return another_array[another_array.size - 1] end ir = r.truncate another_array[ir] + fraction?((another_array[ir].to_f - another_array[ir - 1].to_f).abs) end Example usage: test_array = [95.1772, 95.1567, 95.1937, 95.1959, 95.1442, 95.061, 95.1591, 95.1195, 95.1065, 95.0925, 95.199, 95.1682] test_values = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] test_values.each do |value| puts value.to_s + ": " + percentile(test_array, value).to_s end Output: 0.0: 95.061 0.1: 95.1205 0.2: 95.1325 0.3: 95.1689 0.4: 95.1692 0.5: 95.1615 0.6: 95.1773 0.7: 95.1862 0.8: 95.2102 0.9: 95.1981 1.0: 95.199 The problem here is that the 80th percentile is higher than the 90th and the 100th. However, as far as I can tell my implementation is as described, and it returns the right answer for the example given (0.9). Is there an error in my code I'm not seeing? Or is there a better way to do this?

    Read the article

  • R: outlier cleaning for each column in a dataframe by using quantiles 0.05 and 0.95

    - by Rainer
    hi, I am a R-novice. I want to do some outlier cleaning and over-all-scaling from 0 to 1 before putting the sample into a random forest. g<-c(1000,60,50,60,50,40,50,60,70,60,40,70,50,60,50,70,10) If i do a simple scaling from 0 - 1 the result would be: > round((g - min(g))/abs(max(g) - min(g)),1) [1] 1.0 0.1 0.0 0.1 0.0 0.0 0.0 0.1 0.1 0.1 0.0 0.1 0.0 0.1 0.0 0.1 0.0 So my idea is to replace the values of each column that are greater than the 0.95-quantile with the next value smaller than the 0.95-quantile - and the same for the 0.05-quantile. So the pre-scaled result would be: g<-c(**70**,60,50,60,50,40,50,60,70,60,40,70,50,60,50,70,**40**) and scaled: > round((g - min(g))/abs(max(g) - min(g)),1) [1] 1.0 0.7 0.3 0.7 0.3 0.0 0.3 0.7 1.0 0.7 0.0 1.0 0.3 0.7 0.3 1.0 0.0 I need this formula for a whole dataframe, so the functional implementation within R should be something like: > apply(c, 2, function(x) x[x`<quantile(x, 0.95)]`<-max(x[x, ... max without the quantile(x, 0.95)) Can anyone help? Spoken beside: if there exists a function that does this job directly, please let me know. I already checked out cut and cut2. cut fails because of not-unique breaks; cut2 would work, but only gives back string values or the mean value, and I need a numeric vector from 0 - 1. for trial: a<-c(100,6,5,6,5,4,5,6,7,6,4,7,5,6,5,7,1) b<-c(1000,60,50,60,50,40,50,60,70,60,40,70,50,60,50,70,10) c<-cbind(a,b) c<-as.data.frame(c) Regards and thanks for help, Rainer

    Read the article

  • Russian Hydrodynamic Modeling, Prediction, and Visualization in Java

    - by Geertjan
    JSC "SamaraNIPIoil", located in Samara, Russia, provides the following applications for internal use. SimTools. Used to create & manage reservoir history schedule files for hydrodynamic models. The main features are that it lets you create/manage schedule files for models and create/manage well trajectory files to use with schedule files. DpSolver. Used to estimate permeability cubes using pore cube and results of well testing. Additionally, the user visualizes maps of vapor deposition polymerization or permeability, which can be visualized layer by layer. The base opportunities of the application are that it enables calculation of reservoir vertical heterogeneity and vertical sweep efficiency; automatic history matching of sweep efficiency; and calculations using Quantile-Quantile transformation and vizualization of permeability cube and other reservoir data. Clearly, the two applications above are NetBeans Platform applications.

    Read the article

  • in R, question about generate table

    - by alex
    a = matrix(1:25,5,5) B = capture.output(for (X in 1:5){ A = c(min(a[,X]),quantile(a[,X],0.25),median(a[,X]),quantile(a[,X],0.75),max(a[,X]),mean(a[,X]),sd(a[,X])/m^(1/2),var(a[,X])) cat(A,"\n") }) matrix(B,8,5) what i was trying to do is to generate a table which each column has those element in A and in that order. i try to use the matrix, but seems like it dont reli work here...can anyone help 1 2 3 4 5 min 1st quartile median SEM VAR THIS IS WHAT I WANT THE TABLE LOOKS LIKE ..

    Read the article

  • Create a table of Quantiles in R for multiple Subsets of Data

    - by user1489719
    I'm trying to create append a table of quantiles in R for multiple subsets of data. Right now, I have a vector of ids (p_ids) in table DATA, which are not consecutive. For each value in p_ids, I am looking to list the quantile. So far, I've tried variations of: i <- 1 n <- 1 for (i in p_ids) { while(n <= nrow(data)) { quantiles[n] <- quantile(subset(alldata$variableA, alldata$variableB == i),probs = c(0,1,2,3)/3) n <- n + 1 } } I know my issue lies somewhere in the index, but I can't seem to get where the index should go. Suggestions?

    Read the article

  • incremental way of counting quantiles for large set of data

    - by Gacek
    I need to count the quantiles for a large set of data. Let's assume we can get the data only through some portions (i.e. one row of a large matrix). To count the Q3 quantile one need to get all the portions of the data and store it somewhere, then sort it and count the quantile: List<double> allData = new List<double>(); foreach(var row in matrix) // this is only example. In fact the portions of data are not rows of some matrix { allData.AddRange(row); } allData.Sort(); double p = 0.75*allData.Count; int idQ3 = (int)Math.Ceiling(p) - 1; double Q3 = allData[idQ3]; Now, I would like to find a way of counting this without storing the data in some separate variable. The best solution would be to count some parameters od mid-results for first row and then adjust it step by step for next rows. Note: These datasets are really big (ca 5000 elements in each row) The Q3 can be estimated, it doesn't have to be an exact value. I call the portions of data "rows", but they can have different leghts! Usually it varies not so much (+/- few hundred samples) but it varies! This question is similar to this one: http://stackoverflow.com/questions/1058813/on-line-iterator-algorithms-for-estimating-statistical-median-mode-skewness But I need to count quantiles. ALso there are few articles in this topic, i.e.: http://web.cs.wpi.edu/~hofri/medsel.pdf http://portal.acm.org/citation.cfm?id=347195&dl But before I would try to implement these, I wanted to ask you if there are maybe any other, qucker ways of counting the 0.25/0.75 quantiles?

    Read the article

  • Interactive Charts for web application

    - by user227290
    We are working on a web based application (implemented in JAVA) on commodity prices and one part of it is interactive charting. I provide a simplified example here. We have a table in Mysql database where we have information on commodity prices in US states and counties. One aspect of the application is to create interactive plots based on user choice. For example, if the user needs to see the price density in Oregon and Linn county then she chooses it from the menu in a webpage and it is rendered on fly with accompanying quantile information in a table. As the user changes state and county these plots and table change on fly.For our computational need we are using R (and use rjava to integrate it to our web application) and I know that if interactivity is not an issue this is a piece of cake in ggplot2, but I am not aware of any interactive version of R graphics framework (like lattice, ggplot2). We are exploring google visualization API but I am not sure we can have the statistical power we need in some of the plots.Please help.

    Read the article

  • Calculating confidence intervals for a non-normal distribution

    - by Josiah
    Hi all, First, I should specify that my knowledge of statistics is fairly limited, so please forgive me if my question seems trivial or perhaps doesn't even make sense. I have data that doesn't appear to be normally distributed. Typically, when I plot confidence intervals, I would use the mean +- 2 standard deviations, but I don't think that is acceptible for a non-uniform distribution. My sample size is currently set to 1000 samples, which would seem like enough to determine if it was a normal distribution or not. I use Matlab for all my processing, so are there any functions in Matlab that would make it easy to calculate the confidence intervals (say 95%)? I know there are the 'quantile' and 'prctile' functions, but I'm not sure if that's what I need to use. The function 'mle' also returns confidence intervals for normally distributed data, although you can also supply your own pdf. Could I use ksdensity to create a pdf for my data, then feed that pdf into the mle function to give me confidence intervals? Also, how would I go about determining if my data is normally distributed. I mean I can currently tell just by looking at the histogram or pdf from ksdensity, but is there a way to quantitatively measure it? Thanks!

    Read the article

  • R: Plotting a graph with different colors of points based on advanced criteria

    - by balconydoor
    What I would like to do is a plot (using ggplot), where the x axis represent years which have a different colour for the last three years in the plot than the rest. The last three years should also meet a certain criteria and based on this the last three years can either be red or green. The criteria is that the mean of the last three years should be less (making it green) or more (making it red) than the 66%-percentile of the remaining years. So far I have made two different functions calculating the last three year mean: LYM3 <- function (x) { LYM3 <- tail(x,3) mean(LYM3$Data,na.rm=T) } And the 66%-percentile for the remaining: perc66 <- function(x) { percentile <- head(x,-3) quantile(percentile$Data, .66, names=F,na.rm=T) } Here are two sets of data that can be used in the calculations (plots), the first which is an example from my real data where LYM3(df1) < perc66(df1) and the second is just made up data where LYM3 perc66. df1<- data.frame(Year=c(1979:2010), Data=c(347261.87, 145071.29, 110181.93, 183016.71, 210995.67, 205207.33, 103291.78, 247182.10, 152894.45, 170771.50, 206534.55, 287770.86, 223832.43, 297542.86, 267343.54, 475485.47, 224575.08, 147607.81, 171732.38, 126818.10, 165801.08, 136921.58, 136947.63, 83428.05, 144295.87, 68566.23, 59943.05, 49909.08, 52149.11, 117627.75, 132127.79, 130463.80)) df2 <- data.frame(Year=c(1979:2010), Data=c(sample(50,29,replace=T),75,75,75)) Here’s my code for my plot so far: plot <- ggplot(df1, aes(x=Year, y=Data)) + theme_bw() + geom_point(size=3, aes(colour=ifelse(df1$Year<2008, "black",ifelse(LYM3(df1) < perc66(df1),"green","red")))) + geom_line() + scale_x_continuous(breaks=c(1980,1985,1990,1995,2000,2005,2010), limits=c(1978,2011)) plot As you notice it doesn’t really do what I want it to do. The only thing it does seem to do is that it turns the years before 2008 into one level and those after into another one and base the point colour off these two levels. Since I don’t want this year to be stationary either, I made another tiny function: fun3 <- function(x) { df <- subset(x, Year==(max(Year)-2)) df$Year } So the previous code would have the same effect as: geom_point(size=3, aes(colour=ifelse(df1$Year<fun3(df1), "black","red"))) But it still does not care about my colours. Why does it make the years into levels? And how come an ifelse function doesn’t work within another one in this case? How would it be possible to the arguments to do what I like? I realise this might be a bit messy, asking for a lot at the same time, but I hope my description is pretty clear. It would be helpful if someone could at least point me in the right direction. I tried to put the code for the plot into a function as well so I wouldn’t have to change the data frame at all functions within the plot, but I can’t get it to work. Thank you!

    Read the article

  • subset in geom_point SOMETIMES returns full dataset, instead of none.

    - by Andreas
    I ask the following in the hope that someone might come up with a generic description about the problem.Basically I have no idea whats wrong with my code. When I run the code below, plot nr. 8 turns out wrong. Specifically the subset in geom_point does not work the way it should. (update: With plot nr. 8 the whole dataset is plottet, instead of only the subset). If somebody can tell me what the problem is, I'll update this post. SOdata <- structure(list(id = 10:55, one = c(7L, 8L, 7L, NA, 7L, 8L, 5L, 7L, 7L, 8L, NA, 10L, 8L, NA, NA, NA, NA, 6L, 5L, 6L, 8L, 4L, 7L, 6L, 9L, 7L, 5L, 6L, 7L, 6L, 5L, 8L, 8L, 7L, 7L, 6L, 6L, 8L, 6L, 8L, 8L, 7L, 7L, 5L, 5L, 8L), two = c(7L, NA, 8L, NA, 10L, 10L, 8L, 9L, 4L, 10L, NA, 10L, 9L, NA, NA, NA, NA, 7L, 8L, 9L, 10L, 9L, 8L, 8L, 8L, 8L, 8L, 9L, 10L, 8L, 8L, 8L, 10L, 9L, 10L, 8L, 9L, 10L, 8L, 8L, 7L, 10L, 8L, 9L, 7L, 9L), three = c(7L, 10L, 7L, NA, 10L, 10L, NA, 10L, NA, NA, NA, NA, 10L, NA, NA, 4L, NA, 7L, 7L, 4L, 10L, 10L, 7L, 4L, 7L, NA, 10L, 4L, 7L, 7L, 7L, 10L, 10L, 7L, 10L, 4L, 10L, 10L, 10L, 4L, 10L, 10L, 10L, 10L, 7L, 10L), four = c(7L, 10L, 4L, NA, 10L, 7L, NA, 7L, NA, NA, NA, NA, 10L, NA, NA, 4L, NA, 10L, 10L, 7L, 10L, 10L, 7L, 7L, 7L, NA, 10L, 7L, 4L, 10L, 4L, 7L, 10L, 2L, 10L, 4L, 12L, 4L, 7L, 10L, 10L, 12L, 12L, 4L, 7L, 10L), five = c(7L, NA, 6L, NA, 8L, 8L, 7L, NA, 9L, NA, NA, NA, 9L, NA, NA, NA, NA, 7L, 8L, NA, NA, 7L, 7L, 4L, NA, NA, NA, NA, 5L, 6L, 5L, 7L, 7L, 6L, 9L, NA, 10L, 7L, 8L, 5L, 7L, 10L, 7L, 4L, 5L, 10L), six = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("2010-05-25", "2010-05-27", "2010-06-07"), class = "factor"), seven = c(0.777777777777778, 0.833333333333333, 0.333333333333333, 0.888888888888889, 0.5, 0.888888888888889, 0.777777777777778, 0.722222222222222, 0.277777777777778, 0.611111111111111, 0.722222222222222, 1, 0.888888888888889, 0.722222222222222, 0.555555555555556, NA, 0, 0.666666666666667, 0.666666666666667, 0.833333333333333, 0.833333333333333, 0.833333333333333, 0.833333333333333, 0.722222222222222, 0.833333333333333, 0.888888888888889, 0.666666666666667, 1, 0.777777777777778, 0.722222222222222, 0.5, 0.833333333333333, 0.722222222222222, 0.388888888888889, 0.722222222222222, 1, 0.611111111111111, 0.777777777777778, 0.722222222222222, 0.944444444444444, 0.555555555555556, 0.666666666666667, 0.722222222222222, 0.444444444444444, 0.333333333333333, 0.777777777777778), eight = c(0.666666666666667, 0.333333333333333, 0.833333333333333, 0.666666666666667, 1, 1, 0.833333333333333, 0.166666666666667, 0.833333333333333, 0.833333333333333, 1, 1, 0.666666666666667, 0.666666666666667, 0.333333333333333, 0.5, 0, 0.666666666666667, 0.5, 1, 0.666666666666667, 0.5, 0.666666666666667, 0.666666666666667, 0.666666666666667, 0.333333333333333, 0.333333333333333, 1, 0.666666666666667, 0.833333333333333, 0.666666666666667, 0.666666666666667, 0.5, 0, 0.833333333333333, 1, 0.666666666666667, 0.5, 0.666666666666667, 0.666666666666667, 0.5, 1, 0.833333333333333, 0.666666666666667, 0.833333333333333, 0.666666666666667), nine = c(0.307692307692308, NA, 0.461538461538462, 0.538461538461538, 1, 0.769230769230769, 0.538461538461538, 0.692307692307692, 0, 0.153846153846154, 0.769230769230769, NA, 0.461538461538462, NA, NA, NA, NA, 0, 0.615384615384615, 0.615384615384615, 0.769230769230769, 0.384615384615385, 0.846153846153846, 0.923076923076923, 0.615384615384615, 0.692307692307692, 0.0769230769230769, 0.846153846153846, 0.384615384615385, 0.384615384615385, 0.461538461538462, 0.384615384615385, 0.461538461538462, NA, 0.923076923076923, 0.692307692307692, 0.615384615384615, 0.615384615384615, 0.769230769230769, 0.0769230769230769, 0.230769230769231, 0.692307692307692, 0.769230769230769, 0.230769230769231, 0.769230769230769, 0.615384615384615), ten = c(0.875, 0.625, 0.375, 0.75, 0.75, 0.75, 0.625, 0.875, 1, 0.125, 1, NA, 0.625, 0.75, 0.75, 0.375, NA, 0.625, 0.5, 0.75, 0.875, 0.625, 0.875, 0.75, 0.625, 0.875, 0.5, 0.75, 0, 0.5, 0.875, 1, 0.75, 0.125, 0.5, 0.5, 0.5, 0.625, 0.375, 0.625, 0.625, 0.75, 0.875, 0.375, 0, 0.875), elleven = c(1, 0.8, 0.7, 0.9, 0, 1, 0.9, 0.5, 0, 0.8, 0.8, NA, 0.8, NA, NA, 0.8, NA, 0.4, 0.8, 0.5, 1, 0.4, 0.5, 0.9, 0.8, 1, 0.8, 0.5, 0.3, 0.9, 0.2, 1, 0.8, 0.1, 1, 0.8, 0.5, 0.2, 0.7, 0.8, 1, 0.9, 0.6, 0.8, 0.2, 1), twelve = c(0.666666666666667, NA, 0.133333333333333, 1, 1, 0.8, 0.4, 0.733333333333333, NA, 0.933333333333333, NA, NA, 0.6, 0.533333333333333, NA, 0.533333333333333, NA, 0, 0.6, 0.533333333333333, 0.733333333333333, 0.6, 0.733333333333333, 0.666666666666667, 0.533333333333333, 0.733333333333333, 0.466666666666667, 0.733333333333333, 1, 0.733333333333333, 0.666666666666667, 0.533333333333333, NA, 0.533333333333333, 0.6, 0.866666666666667, 0.466666666666667, 0.533333333333333, 0.333333333333333, 0.6, 0.6, 0.866666666666667, 0.666666666666667, 0.6, 0.6, 0.533333333333333)), .Names = c("id", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten", "elleven", "twelve"), class = "data.frame", row.names = c(NA, -46L)) iqr <- function(x, ...) { qs <- quantile(as.numeric(x), c(0.25, 0.5, 0.75), na.rm = T) names(qs) <- c("ymin", "y", "ymax") qs } magic <- function(y, ...) { high <- median(SOdata[[y]], na.rm=T)+1.5*sd(SOdata[[y]],na.rm=T) low <- median(SOdata[[y]], na.rm=T)-1.5*sd(SOdata[[y]],na.rm=T) ggplot(SOdata, aes_string(x="six", y=y))+ stat_summary(fun.data="iqr", geom="crossbar", fill="grey", alpha=0.3)+ geom_point(data = SOdata[SOdata[[y]] > high,], position=position_jitter(w=0.1, h=0),col="green", alpha=0.5)+ geom_point(data = SOdata[SOdata[[y]] < low,], position=position_jitter(w=0.1, h=0),col="red", alpha=0.5)+ stat_summary(fun.y=median, geom="point",shape=18 ,size=4, col="orange") } for (i in names(SOdata)[-c(1,7)]) { p<- magic(i) ggsave(paste("magig_plot_",i,".png",sep=""), plot=p, height=3.5, width=5.5) }

    Read the article

  • how to use ggplot conditional on data

    - by Andreas
    I asked this question and it seams ggplot2 currently has a bug with empty data.frames. Therefore I am trying to check if the dataframe is empty, before I make the plot. But what ever I come up with, it gets really ugly, and doesn't work. So I am asking for your help. example data: SOdata <- structure(list(id = 10:55, one = c(7L, 8L, 7L, NA, 7L, 8L, 5L, 7L, 7L, 8L, NA, 10L, 8L, NA, NA, NA, NA, 6L, 5L, 6L, 8L, 4L, 7L, 6L, 9L, 7L, 5L, 6L, 7L, 6L, 5L, 8L, 8L, 7L, 7L, 6L, 6L, 8L, 6L, 8L, 8L, 7L, 7L, 5L, 5L, 8L), two = c(7L, NA, 8L, NA, 10L, 10L, 8L, 9L, 4L, 10L, NA, 10L, 9L, NA, NA, NA, NA, 7L, 8L, 9L, 10L, 9L, 8L, 8L, 8L, 8L, 8L, 9L, 10L, 8L, 8L, 8L, 10L, 9L, 10L, 8L, 9L, 10L, 8L, 8L, 7L, 10L, 8L, 9L, 7L, 9L), three = c(7L, 10L, 7L, NA, 10L, 10L, NA, 10L, NA, NA, NA, NA, 10L, NA, NA, 4L, NA, 7L, 7L, 4L, 10L, 10L, 7L, 4L, 7L, NA, 10L, 4L, 7L, 7L, 7L, 10L, 10L, 7L, 10L, 4L, 10L, 10L, 10L, 4L, 10L, 10L, 10L, 10L, 7L, 10L), four = c(7L, 10L, 4L, NA, 10L, 7L, NA, 7L, NA, NA, NA, NA, 10L, NA, NA, 4L, NA, 10L, 10L, 7L, 10L, 10L, 7L, 7L, 7L, NA, 10L, 7L, 4L, 10L, 4L, 7L, 10L, 2L, 10L, 4L, 12L, 4L, 7L, 10L, 10L, 12L, 12L, 4L, 7L, 10L), five = c(7L, NA, 6L, NA, 8L, 8L, 7L, NA, 9L, NA, NA, NA, 9L, NA, NA, NA, NA, 7L, 8L, NA, NA, 7L, 7L, 4L, NA, NA, NA, NA, 5L, 6L, 5L, 7L, 7L, 6L, 9L, NA, 10L, 7L, 8L, 5L, 7L, 10L, 7L, 4L, 5L, 10L), six = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("2010-05-25", "2010-05-27", "2010-06-07"), class = "factor"), seven = c(0.777777777777778, 0.833333333333333, 0.333333333333333, 0.888888888888889, 0.5, 0.888888888888889, 0.777777777777778, 0.722222222222222, 0.277777777777778, 0.611111111111111, 0.722222222222222, 1, 0.888888888888889, 0.722222222222222, 0.555555555555556, NA, 0, 0.666666666666667, 0.666666666666667, 0.833333333333333, 0.833333333333333, 0.833333333333333, 0.833333333333333, 0.722222222222222, 0.833333333333333, 0.888888888888889, 0.666666666666667, 1, 0.777777777777778, 0.722222222222222, 0.5, 0.833333333333333, 0.722222222222222, 0.388888888888889, 0.722222222222222, 1, 0.611111111111111, 0.777777777777778, 0.722222222222222, 0.944444444444444, 0.555555555555556, 0.666666666666667, 0.722222222222222, 0.444444444444444, 0.333333333333333, 0.777777777777778), eight = c(0.666666666666667, 0.333333333333333, 0.833333333333333, 0.666666666666667, 1, 1, 0.833333333333333, 0.166666666666667, 0.833333333333333, 0.833333333333333, 1, 1, 0.666666666666667, 0.666666666666667, 0.333333333333333, 0.5, 0, 0.666666666666667, 0.5, 1, 0.666666666666667, 0.5, 0.666666666666667, 0.666666666666667, 0.666666666666667, 0.333333333333333, 0.333333333333333, 1, 0.666666666666667, 0.833333333333333, 0.666666666666667, 0.666666666666667, 0.5, 0, 0.833333333333333, 1, 0.666666666666667, 0.5, 0.666666666666667, 0.666666666666667, 0.5, 1, 0.833333333333333, 0.666666666666667, 0.833333333333333, 0.666666666666667), nine = c(0.307692307692308, NA, 0.461538461538462, 0.538461538461538, 1, 0.769230769230769, 0.538461538461538, 0.692307692307692, 0, 0.153846153846154, 0.769230769230769, NA, 0.461538461538462, NA, NA, NA, NA, 0, 0.615384615384615, 0.615384615384615, 0.769230769230769, 0.384615384615385, 0.846153846153846, 0.923076923076923, 0.615384615384615, 0.692307692307692, 0.0769230769230769, 0.846153846153846, 0.384615384615385, 0.384615384615385, 0.461538461538462, 0.384615384615385, 0.461538461538462, NA, 0.923076923076923, 0.692307692307692, 0.615384615384615, 0.615384615384615, 0.769230769230769, 0.0769230769230769, 0.230769230769231, 0.692307692307692, 0.769230769230769, 0.230769230769231, 0.769230769230769, 0.615384615384615), ten = c(0.875, 0.625, 0.375, 0.75, 0.75, 0.75, 0.625, 0.875, 1, 0.125, 1, NA, 0.625, 0.75, 0.75, 0.375, NA, 0.625, 0.5, 0.75, 0.875, 0.625, 0.875, 0.75, 0.625, 0.875, 0.5, 0.75, 0, 0.5, 0.875, 1, 0.75, 0.125, 0.5, 0.5, 0.5, 0.625, 0.375, 0.625, 0.625, 0.75, 0.875, 0.375, 0, 0.875), elleven = c(1, 0.8, 0.7, 0.9, 0, 1, 0.9, 0.5, 0, 0.8, 0.8, NA, 0.8, NA, NA, 0.8, NA, 0.4, 0.8, 0.5, 1, 0.4, 0.5, 0.9, 0.8, 1, 0.8, 0.5, 0.3, 0.9, 0.2, 1, 0.8, 0.1, 1, 0.8, 0.5, 0.2, 0.7, 0.8, 1, 0.9, 0.6, 0.8, 0.2, 1), twelve = c(0.666666666666667, NA, 0.133333333333333, 1, 1, 0.8, 0.4, 0.733333333333333, NA, 0.933333333333333, NA, NA, 0.6, 0.533333333333333, NA, 0.533333333333333, NA, 0, 0.6, 0.533333333333333, 0.733333333333333, 0.6, 0.733333333333333, 0.666666666666667, 0.533333333333333, 0.733333333333333, 0.466666666666667, 0.733333333333333, 1, 0.733333333333333, 0.666666666666667, 0.533333333333333, NA, 0.533333333333333, 0.6, 0.866666666666667, 0.466666666666667, 0.533333333333333, 0.333333333333333, 0.6, 0.6, 0.866666666666667, 0.666666666666667, 0.6, 0.6, 0.533333333333333)), .Names = c("id", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten", "elleven", "twelve"), class = "data.frame", row.names = c(NA, -46L)) And the plot iqr <- function(x, ...) { qs <- quantile(as.numeric(x), c(0.25, 0.5, 0.75), na.rm = T) names(qs) <- c("ymin", "y", "ymax") qs } magic <- function(y, ...) { high <- median(SOdata[[y]], na.rm=T)+1.5*sd(SOdata[[y]],na.rm=T) low <- median(SOdata[[y]], na.rm=T)-1.5*sd(SOdata[[y]],na.rm=T) ggplot(SOdata, aes_string(x="six", y=y))+ stat_summary(fun.data="iqr", geom="crossbar", fill="grey", alpha=0.3)+ geom_point(data = SOdata[SOdata[[y]] > high,], position=position_jitter(w=0.1, h=0),col="green", alpha=0.5)+ geom_point(data = SOdata[SOdata[[y]] < low,], position=position_jitter(w=0.1, h=0),col="red", alpha=0.5)+ stat_summary(fun.y=median, geom="point",shape=18 ,size=4, col="orange") } for (i in names(SOdata)[-c(1,7)]) { p<- magic(i) ggsave(paste("magig_plot_",i,".png",sep=""), plot=p, height=3.5, width=5.5) } The problem is that sometimes in the call to geom_point the subset returns an empty dataframe, which sometimes (!) causes ggplot2 to plot all the data instead of none of the data. geom_point(data = SOdata[SOdata[[y]] > high,], position=position_jitter(w=0.1, h=0),col="green", alpha=0.5)+ This is kindda of important to me, and I am really stuck trying to find a solution. Any help that will get me started is much appreciated. Thanks in advance.

    Read the article

  • What is wrong here (will update): subset in geom_point does not work as expected

    - by Andreas
    I ask the following in the hope that someone might come up with a generic description about the problem.Basically I have no idea whats wrong with my code. When I run the code below, plot nr. 8 turns out wrong. Specifically the subset in geom_point does not work the way it should. If somebody can tell me what the problem is, I'll update this post. SOdata <- structure(list(id = 10:55, one = c(7L, 8L, 7L, NA, 7L, 8L, 5L, 7L, 7L, 8L, NA, 10L, 8L, NA, NA, NA, NA, 6L, 5L, 6L, 8L, 4L, 7L, 6L, 9L, 7L, 5L, 6L, 7L, 6L, 5L, 8L, 8L, 7L, 7L, 6L, 6L, 8L, 6L, 8L, 8L, 7L, 7L, 5L, 5L, 8L), two = c(7L, NA, 8L, NA, 10L, 10L, 8L, 9L, 4L, 10L, NA, 10L, 9L, NA, NA, NA, NA, 7L, 8L, 9L, 10L, 9L, 8L, 8L, 8L, 8L, 8L, 9L, 10L, 8L, 8L, 8L, 10L, 9L, 10L, 8L, 9L, 10L, 8L, 8L, 7L, 10L, 8L, 9L, 7L, 9L), three = c(7L, 10L, 7L, NA, 10L, 10L, NA, 10L, NA, NA, NA, NA, 10L, NA, NA, 4L, NA, 7L, 7L, 4L, 10L, 10L, 7L, 4L, 7L, NA, 10L, 4L, 7L, 7L, 7L, 10L, 10L, 7L, 10L, 4L, 10L, 10L, 10L, 4L, 10L, 10L, 10L, 10L, 7L, 10L), four = c(7L, 10L, 4L, NA, 10L, 7L, NA, 7L, NA, NA, NA, NA, 10L, NA, NA, 4L, NA, 10L, 10L, 7L, 10L, 10L, 7L, 7L, 7L, NA, 10L, 7L, 4L, 10L, 4L, 7L, 10L, 2L, 10L, 4L, 12L, 4L, 7L, 10L, 10L, 12L, 12L, 4L, 7L, 10L), five = c(7L, NA, 6L, NA, 8L, 8L, 7L, NA, 9L, NA, NA, NA, 9L, NA, NA, NA, NA, 7L, 8L, NA, NA, 7L, 7L, 4L, NA, NA, NA, NA, 5L, 6L, 5L, 7L, 7L, 6L, 9L, NA, 10L, 7L, 8L, 5L, 7L, 10L, 7L, 4L, 5L, 10L), six = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("2010-05-25", "2010-05-27", "2010-06-07"), class = "factor"), seven = c(0.777777777777778, 0.833333333333333, 0.333333333333333, 0.888888888888889, 0.5, 0.888888888888889, 0.777777777777778, 0.722222222222222, 0.277777777777778, 0.611111111111111, 0.722222222222222, 1, 0.888888888888889, 0.722222222222222, 0.555555555555556, NA, 0, 0.666666666666667, 0.666666666666667, 0.833333333333333, 0.833333333333333, 0.833333333333333, 0.833333333333333, 0.722222222222222, 0.833333333333333, 0.888888888888889, 0.666666666666667, 1, 0.777777777777778, 0.722222222222222, 0.5, 0.833333333333333, 0.722222222222222, 0.388888888888889, 0.722222222222222, 1, 0.611111111111111, 0.777777777777778, 0.722222222222222, 0.944444444444444, 0.555555555555556, 0.666666666666667, 0.722222222222222, 0.444444444444444, 0.333333333333333, 0.777777777777778), eight = c(0.666666666666667, 0.333333333333333, 0.833333333333333, 0.666666666666667, 1, 1, 0.833333333333333, 0.166666666666667, 0.833333333333333, 0.833333333333333, 1, 1, 0.666666666666667, 0.666666666666667, 0.333333333333333, 0.5, 0, 0.666666666666667, 0.5, 1, 0.666666666666667, 0.5, 0.666666666666667, 0.666666666666667, 0.666666666666667, 0.333333333333333, 0.333333333333333, 1, 0.666666666666667, 0.833333333333333, 0.666666666666667, 0.666666666666667, 0.5, 0, 0.833333333333333, 1, 0.666666666666667, 0.5, 0.666666666666667, 0.666666666666667, 0.5, 1, 0.833333333333333, 0.666666666666667, 0.833333333333333, 0.666666666666667), nine = c(0.307692307692308, NA, 0.461538461538462, 0.538461538461538, 1, 0.769230769230769, 0.538461538461538, 0.692307692307692, 0, 0.153846153846154, 0.769230769230769, NA, 0.461538461538462, NA, NA, NA, NA, 0, 0.615384615384615, 0.615384615384615, 0.769230769230769, 0.384615384615385, 0.846153846153846, 0.923076923076923, 0.615384615384615, 0.692307692307692, 0.0769230769230769, 0.846153846153846, 0.384615384615385, 0.384615384615385, 0.461538461538462, 0.384615384615385, 0.461538461538462, NA, 0.923076923076923, 0.692307692307692, 0.615384615384615, 0.615384615384615, 0.769230769230769, 0.0769230769230769, 0.230769230769231, 0.692307692307692, 0.769230769230769, 0.230769230769231, 0.769230769230769, 0.615384615384615), ten = c(0.875, 0.625, 0.375, 0.75, 0.75, 0.75, 0.625, 0.875, 1, 0.125, 1, NA, 0.625, 0.75, 0.75, 0.375, NA, 0.625, 0.5, 0.75, 0.875, 0.625, 0.875, 0.75, 0.625, 0.875, 0.5, 0.75, 0, 0.5, 0.875, 1, 0.75, 0.125, 0.5, 0.5, 0.5, 0.625, 0.375, 0.625, 0.625, 0.75, 0.875, 0.375, 0, 0.875), elleven = c(1, 0.8, 0.7, 0.9, 0, 1, 0.9, 0.5, 0, 0.8, 0.8, NA, 0.8, NA, NA, 0.8, NA, 0.4, 0.8, 0.5, 1, 0.4, 0.5, 0.9, 0.8, 1, 0.8, 0.5, 0.3, 0.9, 0.2, 1, 0.8, 0.1, 1, 0.8, 0.5, 0.2, 0.7, 0.8, 1, 0.9, 0.6, 0.8, 0.2, 1), twelve = c(0.666666666666667, NA, 0.133333333333333, 1, 1, 0.8, 0.4, 0.733333333333333, NA, 0.933333333333333, NA, NA, 0.6, 0.533333333333333, NA, 0.533333333333333, NA, 0, 0.6, 0.533333333333333, 0.733333333333333, 0.6, 0.733333333333333, 0.666666666666667, 0.533333333333333, 0.733333333333333, 0.466666666666667, 0.733333333333333, 1, 0.733333333333333, 0.666666666666667, 0.533333333333333, NA, 0.533333333333333, 0.6, 0.866666666666667, 0.466666666666667, 0.533333333333333, 0.333333333333333, 0.6, 0.6, 0.866666666666667, 0.666666666666667, 0.6, 0.6, 0.533333333333333)), .Names = c("id", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten", "elleven", "twelve"), class = "data.frame", row.names = c(NA, -46L)) iqr <- function(x, ...) { qs <- quantile(as.numeric(x), c(0.25, 0.5, 0.75), na.rm = T) names(qs) <- c("ymin", "y", "ymax") qs } magic <- function(y, ...) { high <- median(SOdata[[y]], na.rm=T)+1.5*sd(SOdata[[y]],na.rm=T) low <- median(SOdata[[y]], na.rm=T)-1.5*sd(SOdata[[y]],na.rm=T) ggplot(SOdata, aes_string(x="six", y=y))+ stat_summary(fun.data="iqr", geom="crossbar", fill="grey", alpha=0.3)+ geom_point(data = SOdata[SOdata[[y]] > high,], position=position_jitter(w=0.1, h=0),col="green", alpha=0.5)+ geom_point(data = SOdata[SOdata[[y]] < low,], position=position_jitter(w=0.1, h=0),col="red", alpha=0.5)+ stat_summary(fun.y=median, geom="point",shape=18 ,size=4, col="orange") } for (i in names(SOdata)[-c(1,7)]) { p<- magic(i) ggsave(paste("magig_plot_",i,".png",sep=""), plot=p, height=3.5, width=5.5) }

    Read the article

  • Thematic map contd.

    - by jsharma
    The previous post (creating a thematic map) described the use of an advanced style (color ranged-bucket style). The bucket style definition object has an attribute ('classification') which specifies the data classification scheme to use. It's values can be one of {'equal', 'quantile', 'logarithmic', 'custom'}. We use logarithmic in the previous example. Here we'll describe how to use a custom algorithm for classification. Specifically the Jenks Natural Breaks algorithm. We'll use the Javascript implementation in geostats.js The sample code above needs a few changes which are listed below. Include the geostats.js file after or before including oraclemapsv2.js <script src="geostats.js"></script> Modify the bucket style definition to use custom classification Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;}    bucketStyleDef = {       numClasses : colorSeries[colorName].classes,       classification: 'custom', //'logarithmic',  // use a logarithmic scale       algorithm: jenksFromGeostats,       styles: theStyles,       gradient:  useGradient? 'linear' : 'off'     }; The function, which implements the custom classification scheme, is specified as the algorithm attribute value. It must accept two input parameters, an array of OM.feature and the name of the feature attribute (e.g. TOTPOP) to use in the classification, and must return an array of buckets (i.e. an array of or OM.style.Bucket  or OM.style.RangedBucket in this case). However the algorithm also needs to know the number of classes (i.e. the number of buckets to create). So we use a global to pass that info in. (Note: This bug/oversight will be fixed and the custom algorithm will be passed 3 parameters: the features array, attribute name, and number of classes). So createBucketColorStyle() has the following changes Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} var numClasses ; function createBucketColorStyle( colorName, colorSeries, rangeName, useGradient) {    var theBucketStyle;    var bucketStyleDef;    var theStyles = [];    //var numClasses ; numClasses = colorSeries[colorName].classes; ... and the function jenksFromGeostats is defined as Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} function jenksFromGeostats(featureArray, columnName) {    var items = [] ; // array of attribute values to be classified    $.each(featureArray, function(i, feature) {         items.push(parseFloat(feature.getAttributeValue(columnName)));    });    // create the geostats object    var theSeries = new geostats(items);    // call getJenks which returns an array of bounds    var theClasses = theSeries.getJenks(numClasses);    if(theClasses)    {     theClasses[theClasses.length-1]=parseFloat(theClasses[theClasses.length-1])+1;    }    else    {     alert(' empty result from getJenks');    }    var theBuckets = [], aBucket=null ;    for(var k=0; k<numClasses; k++)    {             aBucket = new OM.style.RangedBucket(             {low:parseFloat(theClasses[k]),               high:parseFloat(theClasses[k+1])             });             theBuckets.push(aBucket);     }     return theBuckets; } A screenshot of the resulting map with 5 classes is shown below. It is also possible to simply create the buckets and supply them when defining the Bucket style instead of specifying the function (algorithm). In that case the bucket style definition object would be    bucketStyleDef = {      numClasses : colorSeries[colorName].classes,      classification: 'custom',        buckets: theBuckets, //since we are supplying all the buckets      styles: theStyles,      gradient:  useGradient? 'linear' : 'off'    };

    Read the article

1