Search Results

Search found 1631 results on 66 pages for 'statistics'.

Page 15/66 | < Previous Page | 11 12 13 14 15 16 17 18 19 20 21 22  | Next Page >

  • Summarising grouped records in a dataframe in R

    - by monch1962
    Hello all, I have a data frame in R that looks like this: > TimeOffset, Source, Length > 0 1 1500 > 0.1 1 1000 > 0.2 1 50 > 0.4 2 25 > 0.6 2 3 > 1.1 1 1500 > 1.4 1 18 > 1.6 2 2500 > 1.9 2 18 > 2.1 1 37 > ... and I want to convert it to > TimeOffset, Source, Length > 0.2 1 2550 > 0.6 2 28 > 1.4 1 1518 > 1.9 2 2518 > ... Trying to put this into English, I want to group consecutive records with the same 'Source' together, then printing out a single record per group showing the highest time offset in that group, the source, and the sum of the lengths in that group. The TimeOffset values will always increase. I suspect this is possible in R, but I really don't know where to start. In a pinch I could export the data frame out and do it in e.g. Python, but I'd prefer to stay within R if possible. Thanks in advance for any assistance you can provide

    Read the article

  • How to change the icon in the title bar in R?

    - by Jared
    I just installed R 2.11.0-x64 onto my Windows 7 Professional machine. With my previous installations of R (2.10.1 32 bit was the most recent) the little icon that appeared in the title bar and in the taskbar at the bottom of windows was the R "R." Now however, the icon almost looks like a small windows Task Manager. I know this isn't a code issue, but it affects me as I flip between windows. Is there a way to put the "R" icon back in there? Would it be an R setting or a Windows setting?

    Read the article

  • How to generate correlated binary variables

    - by jonalm
    Dear All I need to generate a series of N random binary variables with a given correlation function. Let x = {x_i} be a series of binary variables (taking the value 0 or 1, i running form 1 to N). The marginal probability is given Pr(x_i = 1) = p, and the values should be correlated in the following way E[ x_i x_j ] = const * |i-j|^-alfa where alfa is a positive number. Is it possible to generate a series like this? preferably in python.

    Read the article

  • Is NoSQL ideal to store stats?

    - by Ivan
    I'm not terribly familiar with NoSQL systems, but I remember reading a while back that they are ideal to handle statistical data. Since I'm about to start writing code that will record data like "how many users were registered on each day", I was thinking I could use this as an opportunity to learn more about NoSQL if it fits the bill. If NoSQL is indeed ideal for this, could you provide me with some information as to why? And which specific systems are best suited for this particular need? Thanks!

    Read the article

  • Generate random number from an arbitrary weighted list

    - by Fernando
    Here's what I need to do, I'll be doing this both in PHP and JavaScript. I have a list of numbers that will range from 1 to 300-500 (I haven't set the limit yet). I will be running a drawing were 10 numbers will be picked at random from the given range. Here's the tricky part: I want some numbers to be less likely to be drawn up. A small set of those 300-500 will be flagged as "lucky numbers". For example, out of 100 drawings, most numbers have equal chances of being drawn, except for a few, that will only be picked once every 30-50 drawings. Basically I need to artificially set the probability of certain numbers to be picked while maintaining an even distribution with the rest of the numbers. The only similar thing I've found so far is this question: Generate A Weighted Random Number, the problem being that my spec has considerably more numbers (up to 500) so the weights would get very small and supposedly this could be a problem with that solution (Rejection Sampling). I'm still trying it, though, but I wonder if there other solutions. Math is not my thing so I appreciate any input. Thanks.

    Read the article

  • What percent of web sites use JavaScript?

    - by Claudiu
    I'm wondering just how pervasive JavaScript is. This article states that 73% of websites they tested rely on JavaScript for important functionality, but it seems to me that the number must be larger. Have any surveys been done on this topic? Maybe a better way to phrase this question is - are there any sites that don't use JavaScript? EDIT: By 'use', I don't necessarily mean "rely on for important functionality" - that was just the statistic that one article gave.

    Read the article

  • Statistical analysis on large data set to be published on the web

    - by dassouki
    I have a non-computer related data logger, that collects data from the field. This data is stored as text files, and I manually lump the files together and organize them. The current format is through a csv file per year per logger. Each file is around 4,000,000 lines x 7 loggers x 5 years = a lot of data. some of the data is organized as bins item_type, item_class, item_dimension_class, and other data is more unique, such as item_weight, item_color, date_collected, and so on ... Currently, I do statistical analysis on the data using a python/numpy/matplotlib program I wrote. It works fine, but the problem is, I'm the only one who can use it, since it and the data live on my computer. I'd like to publish the data on the web using a postgres db; however, I need to find or implement a statistical tool that'll take a large postgres table, and return statistical results within an adequate time frame. I'm not familiar with python for the web; however, I'm proficient with PHP on the web side, and python on the offline side. users should be allowed to create their own histograms, data analysis. For example, a user can search for all items that are blue shipped between week x and week y, while another user can search for sort the weight distribution of all items by hour for all year long. I was thinking of creating and indexing my own statistical tools, or automate the process somehow to emulate most queries. This seemed inefficient. I'm looking forward to hearing your ideas Thanks

    Read the article

  • Calculate Spearman's Rank Correlation Coefficient

    - by Arvigeus
    I'm trying to calculate correlations using this method. However, when I do the math by hand, I get one result, and when I do it with R - cor(dt, use="complete.obs", method="spearman") - I get totally different thing. Example data below: --------------------------------- AnswerPosID.1007 AnswerPosID.1008 --------------------------------- 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 --------------------------------- Manual result: 0.799777530589544; R cor: -0.07142857

    Read the article

  • Suggest a good book for Quantitative Methods & R Programming

    - by Rahul
    Hi folks, Please suggest a good book for beginner in Quantitative Methods/Techniques. Adding to this, a good book for beginners in R programming language, used in Quantitative Methods. And I've a few questions about this: ? Should I have to learn the other subjects like Probability, Statics, etc. before learning Quantitative Methods ? Is there any relation between Quantitative Methods & Data Mining

    Read the article

  • Howto Plot "Reverse" Cumulative Frequency Graph With ECDF

    - by neversaint
    I have no problem plotting the following cumulative frequency graph plot like this. library(Hmisc) pre.test <- rnorm(100,50,10) post.test <- rnorm(100,55,10) x <- c(pre.test, post.test) g <- c(rep('Pre',length(pre.test)),rep('Post',length(post.test))) Ecdf(x, group=g, what="f", xlab='Test Results', label.curves=list(keys=1:2)) But I want to show the graph in forms of the "reverse" cumulative frequency of values x ? (i.e. something equivalent to what="1-f"). Is there a way to do it? Other suggestions in R other than using Hmisc are also very much welcomed.

    Read the article

  • what is the best way to track unique visitors?

    - by dnkira
    hello what i want is to make user counter as true as possible. exluding bots, and clever users as much as possible. as 4 what i know, it can be done in several ways: ip (trouble with dinamic ones and proxy's) cookies (with session id maybe, but can be deleted or browser can be changed) flash cookies (not all users have it) any other ways? and what is the best?

    Read the article

  • What is the relationship between a R.V N(0,1) and others continuous random variables

    - by calejero
    Hello. I have a question. I need to kwon what is the relationship between a Random Variable with Normal distribuation (N(0,1)) and others continuous random variables. Can you write to me an example? Thank You Hola a todos. Necesito saber qué relación existe entre una variable aleatoria Normal (0,1) y cualquier variable aleatoria continua. Además, me vendría bien un ejemplo. Gracias por vuestra ayuda

    Read the article

  • What is the best way to store categorical references in SQL tables?

    - by jlafay
    I'm wanting to store a wide array of categorical data in MySQL database tables. Let's say that for instance I want to to information on "widgets" and want to categorize attributes in certain ways, i.e. shape category. For instance, the widgets could be classified as: round, square, triangular, spherical, etc. Should these categories be stored within a table to reference them best from an application? Another possibility, I would imagine, would be to add a column to widgets that contained a shape column that contained a tiny int. That way my application could search shapes by that and then use a coordinating enum type that would map the shape int meanings. Which would be best? Or is there another solution that I'm not thinking of yet?

    Read the article

  • Computing, storing, and retrieving values to and from an N-Dimensional matrix

    - by Adam S
    This question is probably quite different from what you are used to reading here - I hope it can provide a fun challenge. Essentially I have an algorithm that uses 5(or more) variables to compute a single value, called outcome. Now I have to implement this algorithm on an embedded device which has no memory limitations, but has very harsh processing constraints. Because of this, I would like to run a calculation engine which computes outcome for, say, 20 different values of each variable and stores this information in a file. You may think of this as a 5(or more)-dimensional matrix or 5(or more)-dimensional array, each dimension being 20 entries long. In any modern language, filling this array is as simple as having 5(or more) nested for loops. The tricky part is that I need to dump these values into a file that can then be placed onto the embedded device so that the device can use it as a lookup table. The questions now, are: What format(s) might be acceptable for storing the data? What programs (MATLAB, C#, etc) might be best suited to compute the data? C# must be used to import the data on the device - is this possible given your answer to #1?

    Read the article

  • Probability Question

    - by Juddling
    if i pick 10 numbers from a possible 80. what is the probablity for someone else picking the same 10 numbers as me? or just 4 numbers? or no numbers? i think it's no matches: 10/80 one match: 10*9/80*79 so the formula would be (10!/matches!)/(80!/matches!) is this right? i've only just started doing this at A-level and i need it for a game script i'm making.

    Read the article

  • Dimension Reduction in Categorical Data with missing values

    - by user227290
    I have a regression model in which the dependent variable is continuous but ninety percent of the independent variables are categorical(both ordered and unordered) and around thirty percent of the records have missing values(to make matters worse they are missing randomly without any pattern, that is, more that forty five percent of the data hava at least one missing value). There is no a priori theory to choose the specification of the model so one of the key tasks is dimension reduction before running the regression. While I am aware of several methods for dimension reduction for continuous variables I am not aware of a similar statical literature for categorical data (except, perhaps, as a part of correspondence analysis which is basically a variation of principal component analysis on frequency table). Let me also add that the dataset is of moderate size 500000 observations with 200 variables. I have two questions. Is there a good statistical reference out there for dimension reduction for categorical data along with robust imputation (I think the first issue is imputation and then dimension reduction)? This is linked to implementation of above problem. I have used R extensively earlier and tend to use transcan and impute function heavily for continuous variables and use a variation of tree method to impute categorical values. I have a working knowledge of Python so if something is nice out there for this purpose then I will use it. Any implementation pointers in python or R will be of great help. Thank you.

    Read the article

  • Determining the chances of an event occurring when it hasn't occurred yet

    - by sanity
    A user visits my website at time t, and they may or may not click on a particular link I care about, if they do I record the fact that they clicked the link, and also the duration since t that they clicked it, call this d. I need an algorithm that allows me to create a class like this: class ClickProbabilityEstimate { public void reportImpression(long id); public void reportClick(long id); public double estimateClickProbability(long id); } Every impression gets a unique id, and this is used when reporting a click to indicate which impression the click belongs to. I need an algorithm that will return a probability, based on how much time has past since an impression was reported, that the impression will receive a click, based on how long previous clicks required. Clearly one would expect that this probability will decrease over time if there is still no click. If necessary, we can set an upper-bound, beyond which we consider the click probability to be 0 (eg. if its been an hour since the impression occurred, we can be pretty sure there won't be a click). The algorithm should be both space and time efficient, and hopefully make as few assumptions as possible, while being elegant. Ease of implementation would also be nice. Any ideas?

    Read the article

  • Measuring the limit of a point on a smooth.spline in R

    - by Subtle Array
    I'm not sure if that's the right terminology. I've entered some data into R, and I've put a smoothingSpline through it using the following command. smoothingSpline = smooth.spline(year, rate, spar=0.35) plot(x,y) lines(smoothingSpline) Now I'd like to measure some limits (or where the curve is at a given y point), and maybe to some predictive analysis on points that extend beyond the graph. Are there commands in R for doing this?

    Read the article

  • Given a document select a relevant snippet.

    - by BCS
    When I ask a question here, the tool tips for the question returned by the auto search given the first little bit of the question, but a decent percentage of them don't give any text that is any more useful for understanding the question than the title. Does anyone have an idea about how to make a filter to trim out useless bits of a question? My first idea is to trim any leading sentences that contain only words in some list (for instance, stop words, plus words from the title, plus words from the SO corpus that have very weak correlation with tags, that is that are equally likely to occur in any question regardless of it's tags)

    Read the article

  • Howto plot two cumulative frequency graph together

    - by neversaint
    I have data that looks like this: #val Freq1 Freq2 0.000 178 202 0.001 4611 5300 0.002 99 112 0.003 26 30 0.004 17 20 0.005 15 20 0.006 11 14 0.007 11 13 0.008 13 13 ...many more lines.. Full data can be found here: http://dpaste.com/173536/plain/ What I intend to do is to have a cumulative graph with "val" as x-axis with "Freq1" & "Freq2" as y-axis, plot together in 1 graph. I have this code. But it creates two plots instead of 1. dat <- read.table("stat.txt",header=F); val<-dat$V1 freq1<-dat$V2 freq2<-dat$V3 valf1<-rep(val,freq1) valf2<-rep(val,freq2) valfreq1table<- table(valf1) valfreq2table<- table(valf2) cumfreq1=c(0,cumsum(valfreq1table)) cumfreq2=c(0,cumsum(valfreq2table)) plot(cumfreq1, ylab="CumFreq",xlab="Loglik Ratio") lines(cumfreq1) plot(cumfreq2, ylab="CumFreq",xlab="Loglik Ratio") lines(cumfreq2) What's the right way to approach this?

    Read the article

< Previous Page | 11 12 13 14 15 16 17 18 19 20 21 22  | Next Page >