Search Results

Search found 1631 results on 66 pages for 'statistics'.

Page 13/66 | < Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20 | Next Page >

Get percentiles of data-set with group by month

- by Cylindric

Hello, I have a SQL table with a whole load of records that look like this: | Date | Score | + -----------+-------+ | 01/01/2010 | 4 | | 02/01/2010 | 6 | | 03/01/2010 | 10 | ... | 16/03/2010 | 2 | I'm plotting this on a chart, so I get a nice line across the graph indicating score-over-time. Lovely. Now, what I need to do is include the average score on the chart, so we can see how that changes over time, so I can simply add this to the mix: SELECT YEAR(SCOREDATE) 'Year', MONTH(SCOREDATE) 'Month', MIN(SCORE) MinScore, AVG(SCORE) AverageScore, MAX(SCORE) MaxScore FROM SCORES GROUP BY YEAR(SCOREDATE), MONTH(SCOREDATE) ORDER BY YEAR(SCOREDATE), MONTH(SCOREDATE) That's no problem so far. The problem is, how can I easily calculate the percentiles at each time-period? I'm not sure that's the correct phrase. What I need in total is: A line on the chart for the score (easy) A line on the chart for the average (easy) A line on the chart showing the band that 95% of the scores occupy (stumped) It's the third one that I don't get. I need to calculate the 5% percentile figures, which I can do singly: SELECT MAX(SubQ.SCORE) FROM (SELECT TOP 45 PERCENT SCORE FROM SCORES WHERE YEAR(SCOREDATE) = 2010 AND MONTH(SCOREDATE) = 1 ORDER BY SCORE ASC) AS SubQ SELECT MIN(SubQ.SCORE) FROM (SELECT TOP 45 PERCENT SCORE FROM SCORES WHERE YEAR(SCOREDATE) = 2010 AND MONTH(SCOREDATE) = 1 ORDER BY SCORE DESC) AS SubQ But I can't work out how to get a table of all the months. | Date | Average | 45% | 55% | + -----------+---------+-----+-----+ | 01/01/2010 | 13 | 11 | 15 | | 02/01/2010 | 10 | 8 | 12 | | 03/01/2010 | 5 | 4 | 10 | ... | 16/03/2010 | 7 | 7 | 9 | At the moment I'm going to have to load this lot up into my app, and calculate the figures myself. Or run a larger number of individual queries and collate the results.

Read the article
MySQL Volleyball Standings

- by Torez

I have a database table full of game by game results and want to know if I can calculate the following: GP (games played) Wins Loses Points (2 points for each win, 1 point for each lose) Here is my table structure: CREATE TABLE `results` ( `id` int(10) unsigned NOT NULL auto_increment, `home_team_id` int(10) unsigned NOT NULL, `home_score` int(3) unsigned NOT NULL, `visit_team_id` int(10) unsigned NOT NULL, `visit_score` int(3) unsigned NOT NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=7 ; And a few testing results: INSERT INTO `results` VALUES(1, 1, 21, 2, 25); INSERT INTO `results` VALUES(2, 3, 21, 4, 17); INSERT INTO `results` VALUES(3, 1, 25, 3, 9); INSERT INTO `results` VALUES(4, 2, 7, 4, 22); INSERT INTO `results` VALUES(5, 1, 19, 4, 20); INSERT INTO `results` VALUES(6, 2, 24, 3, 26); Here is what a final table would look like: +-------------------+----+------+-------+--------+ | Team Name | GP | Wins | Loses | Points | +-------------------+----+------+-------+--------+ | Spikers | 4 | 4 | 0 | 8 | | Leapers | 4 | 2 | 2 | 6 | | Ground Control | 4 | 1 | 3 | 5 | | Touch Guys | 4 | 0 | 4 | 4 | +-------------------+----+------+-------+--------+

Read the article
Partial Least Squares Implementation for C/C++?

- by Colin

Does anyone know of an open-source implementation of a partial least squares algorithm in C or C++?

Read the article
postgresql weighted average?

- by milovanderlinden

say I have a postgresql table with the following values: id | value ---------- 1 | 4 2 | 8 3 | 100 4 | 5 5 | 7 If I use postgresql to calculate the average, it gives me an average of 24.8 because the high value of 100 has great impact on the calculation. While in fact I would like to find an average somewhere around 6 and eliminate the extreme(s). I am looking for a way to eliminate extremes and want to do this "statistically correct". The extreme's cannot be fixed. I cannot say; If a value is over X, it has to be eliminated. I have been bending my head on the postgresql aggregate functions but cannot put my finger on what is right for me to use. Any suggestions?

Read the article
Using ARIMA to model and forecast stock prices using user-friendly stats program

- by Brian

Hi people, Can anyone please offer some insight into this for me? I'm coming from a functional magnetic resonance imaging research background where I analyzed a lot of time series data, and I'd like to analyze the time series of stock prices (or returns) by: 1) modeling a successful stock in a particular market sector and then cross-correlating the time series of this historically successful stock with that of other newer stocks to look for significant relationships; 2) model a stock's price time series and use forecasting (e.g., exponential smoothing) to predict future values of it. I'd like to use non-linear modeling methods (ARIMA and ARCH) to do this. Several questions: How often do ARIMA and ARCH modeling methods (given that the individual who implements them does so accurately) actually fit the stock time series data they target, and what is the optimal fit I can expect? Is the extent to which this model fits the data commensurate with the extent to which it predicts this stock time series' future values? Rather than randomly selecting stocks to compare or model, if profit is my goal, what is an efficient approach, if any, to selecting the stocks I'm going to analyze? Which stats program is the most user-friendly for this? Any thoughts on this would be great and would go a long way for me. Thanks, Brian

Read the article
probability of subset sum after rolling dice 4 times

- by binger

If we roll 4 dices (fair), what is the probability of "sum of subset" being 5. e.g. 1432,1121, 2344, 2354 have a subset sum of 5. Can you illustrate how to calculate this.

Read the article
Histogram matching - image processing - c/c++

- by Raj

Hello I have two histograms. int Hist1[10] = {1,4,3,5,2,5,4,6,3,2}; int Hist1[10] = {1,4,3,15,12,15,4,6,3,2}; Hist1's distribution is of type multi-modal; Hist2's distribution is of type uni-modal with single prominent peak. My questions are Is there any way that i could determine the type of distribution programmatically? How to quantify whether these two histograms are similar/dissimilar? Thanks

Read the article
R: How to remove outliers from a smoother in ggplot2?

- by John

I have the following data set that I am trying to plot with ggplot2, it is a time series of three experiments A1, B1 and C1 and each experiment had three replicates. I am trying to add a stat which detects and removes outliers before returning a smoother (mean and variance?). I have written my own outlier function (not shown) but I expect there is already a function to do this, I just have not found it. I've looked at stat_sum_df("median_hilow", geom = "smooth") from some examples in the ggplot2 book, but I didn't understand the help doc from Hmisc to see if it removes outliers or not. Is there a function to remove outliers like this in ggplot, or where would I amend my code below to add my own function? library (ggplot2) data = data.frame (day = c(1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7), od = c( 0.1,1.0,0.5,0.7 ,0.13,0.33,0.54,0.76 ,0.1,0.35,0.54,0.73 ,1.3,1.5,1.75,1.7 ,1.3,1.3,1.0,1.6 ,1.7,1.6,1.75,1.7 ,2.1,2.3,2.5,2.7 ,2.5,2.6,2.6,2.8 ,2.3,2.5,2.8,3.8), series_id = c( "A1", "A1", "A1","A1", "A1", "A1", "A1","A1", "A1", "A1", "A1","A1", "B1", "B1","B1", "B1", "B1", "B1","B1", "B1", "B1", "B1","B1", "B1", "C1","C1", "C1", "C1", "C1","C1", "C1", "C1", "C1","C1", "C1", "C1"), replicate = c( "A1.1","A1.1","A1.1","A1.1", "A1.2","A1.2","A1.2","A1.2", "A1.3","A1.3","A1.3","A1.3", "B1.1","B1.1","B1.1","B1.1", "B1.2","B1.2","B1.2","B1.2", "B1.3","B1.3","B1.3","B1.3", "C1.1","C1.1","C1.1","C1.1", "C1.2","C1.2","C1.2","C1.2", "C1.3","C1.3","C1.3","C1.3")) > data day od series_id replicate 1 1 0.10 A1 A1.1 2 3 1.00 A1 A1.1 3 5 0.50 A1 A1.1 4 7 0.70 A1 A1.1 5 1 0.13 A1 A1.2 6 3 0.33 A1 A1.2 7 5 0.54 A1 A1.2 8 7 0.76 A1 A1.2 9 1 0.10 A1 A1.3 10 3 0.35 A1 A1.3 11 5 0.54 A1 A1.3 12 7 0.73 A1 A1.3 13 1 1.30 B1 B1.1 This is what I have so far and is working nicely, but outliers are not removed: r <- ggplot(data = data, aes(x = day, y = od)) r + geom_point(aes(group = replicate, color = series_id)) + # add points geom_line(aes(group = replicate, color = series_id)) + # add lines geom_smooth(aes(group = series_id)) # add smoother, average of each replicate

Read the article
Classifying captured data in unknown format?

- by monch1962

I've got a large set of captured data (potentially hundreds of thousands of records), and I need to be able to break it down so I can both classify it and also produce "typical" data myself. Let me explain further... If I have the following strings of data: 132T339G1P112S 164T897F5A498S 144T989B9B223T 155T928X9Z554T ... you might start to infer the following: possibly all strings are 14 characters long the 4th, 8th, 10th and 14th characters may always be alphas, while the rest are numeric the first character may always be a '1' the 4th character may always be the letter 'T' the 14th character may be limited to only being 'S' or 'T' and so on... As you get more and more samples of real data, some of these "rules" might disappear; if you see a 15 character long string, then you have evidence that the 1st "rule" is incorrect. However, given a sufficiently large sample of strings that are exactly 14 characters long, you can start to assume that "all strings are 14 characters long" and assign a numeric figure to your degree of confidence (with an appropriate set of assumptions around the fact that you're seeing a suitably random set of all possible captured data). As you can probably tell, a human can do a lot of this classification by eye, but I'm not aware of libraries or algorithms that would allow a computer to do it. Given a set of captured data (significantly more complex than the above...), are there libraries that I can apply in my code to do this sort of classification for me, that will identify "rules" with a given degree of confidence? As a next step, I need to be able to take those rules, and use them to create my own data that conforms to these rules. I assume this is a significantly easier step than the classification, but I've never had to perform a task like this before so I'm really not sure how complex it is. At a guess, Python or Java (or possibly Perl or R) are possibly the "common" languages most likely to have these sorts of libraries, and maybe some of the bioinformatic libraries do this sort of thing. I really don't care which language I have to use; I need to solve the problem in whatever way I can. Any sort of pointer to information would be very useful. As you can probably tell, I'm struggling to describe this problem clearly, and there may be a set of appropriate keywords I can plug into Google that will point me towards the solution. Thanks in advance

Read the article
Optimal two variable linear regression calculation

- by Dave Jarvis

Problem Am looking to apply the y = mx + b equation (where m is SLOPE, b is INTERCEPT) to a data set, which is retrieved as shown in the SQL code. The values from the (MySQL) query are: SLOPE = 0.0276653965651912 INTERCEPT = -57.2338357550468 SQL Code SELECT ((sum(t.YEAR) * sum(t.AMOUNT)) - (count(1) * sum(t.YEAR * t.AMOUNT))) / (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as SLOPE, ((sum( t.YEAR ) * sum( t.YEAR * t.AMOUNT )) - (sum( t.AMOUNT ) * sum(power(t.YEAR, 2)))) / (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as INTERCEPT, FROM (SELECT D.AMOUNT, Y.YEAR FROM CITY C, STATION S, YEAR_REF Y, MONTH_REF M, DAILY D WHERE -- For a specific city ... -- C.ID = 8590 AND -- Find all the stations within a 15 unit radius ... -- SQRT( POW( C.LATITUDE - S.LATITUDE, 2 ) + POW( C.LONGITUDE - S.LONGITUDE, 2 ) ) < 15 AND -- Gather all known years for that station ... -- S.STATION_DISTRICT_ID = Y.STATION_DISTRICT_ID AND -- The data before 1900 is shaky; insufficient after 2009. -- Y.YEAR BETWEEN 1900 AND 2009 AND -- Filtered by all known months ... -- M.YEAR_REF_ID = Y.ID AND -- Whittled down by category ... -- M.CATEGORY_ID = '001' AND -- Into the valid daily climate data. -- M.ID = D.MONTH_REF_ID AND D.DAILY_FLAG_ID <> 'M' GROUP BY Y.YEAR ORDER BY Y.YEAR ) t Data The data is visualized here: Question The following results (to calculate the start and end points of the line) appear incorrect. Why are the results off by ~10 degrees (e.g., outliers skewing the data)? (1900 * 0.0276653965651912) + (-57.2338357550468) = -4.66958228 (2009 * 0.0276653965651912) + (-57.2338357550468) = -1.65405406 I would have expected the 1900 result to be around 10 (not -4.67) and the 2009 result to be around 11.50 (not -1.65). Related Sites Least absolute deviations Robust regression Thank you!

Read the article
Using recode in R

- by celenius

I'm trying to use 'recode' in R (from the 'cars' package) and it is not working. I read in data from a .csv file into a data frame called 'results'. Then, I replace the values in the column 'Built_year', according to the following logic. recode(results$Built_year, "2 ='1950s';3='1960s';4='1970s';5='1980s';6='1990s';7='2000 or later'") When I check results$Built_year after doing this step, it appears to have worked. However, it does not store this value, and returns to its previous value. I don't understand why. Thanks. (at the moment something is going wrong and I can't see any of the icons for formatting)

Read the article
Workflow for statistical analysis and report writing

- by ws

Does anyone have any wisdom on workflows for data analysis related to custom report writing? The use-case is basically this: Client commissions a report that uses data analysis, e.g. a population estimate and related maps for a water district. The analyst downloads some data, munges the data and saves the result (e.g. adding a column for population per unit, or subsetting the data based on district boundaries). The analyst analyzes the data created in (2), gets close to her goal, but sees that needs more data and so goes back to (1). Rinse repeat until the tables and graphics meet QA/QC and satisfy the client. Write report incorporating tables and graphics. Next year, the happy client comes back and wants an update. This should be as simple as updating the upstream data by a new download (e.g. get the building permits from the last year), and pressing a "RECALCULATE" button, unless specifications change. At the moment, I just start a directory and ad-hoc it the best I can. I would like a more systematic approach, so I am hoping someone has figured this out... I use a mix of spreadsheets, SQL, ARCGIS, R, and Unix tools. Thanks! PS: Below is a basic Makefile that checks for dependencies on various intermediate datasets (w/ ".RData" suffix) and scripts (".R" suffix). Make uses timestamps to check dependencies, so if you 'touch ss07por.csv', it will see that this file is newer than all the files / targets that depend on it, and execute the given scripts in order to update them accordingly. This is still a work in progress, including a step for putting into SQL database, and a step for a templating language like sweave. Note that Make relies on tabs in its syntax, so read the manual before cutting and pasting. Enjoy and give feedback! http://www.gnu.org/software/make/manual/html%5Fnode/index.html#Top R=/home/wsprague/R-2.9.2/bin/R persondata.RData : ImportData.R ../../DATA/ss07por.csv Functions.R $R --slave -f ImportData.R persondata.Munged.RData : MungeData.R persondata.RData Functions.R $R --slave -f MungeData.R report.txt: TabulateAndGraph.R persondata.Munged.RData Functions.R $R --slave -f TabulateAndGraph.R report.txt

Read the article
How can I get the decreasing cumulative of frequency series using Perl?

- by neversaint

I have a data that looks like this: 3 2 1 5 What I want to get is the "decreasing" cumulative of this data yielding 11 8 6 5 0 What is the compact way of doing that in Perl?

Read the article
Smoothing Small Data Set With Second Order Quadratic Curve

- by Rev316

I'm doing some specific signal analysis, and I am in need of a method that would smooth out a given bell-shaped distribution curve. A running average approach isn't producing the results I desire. I want to keep the min/max, and general shape of my fitted curve intact, but resolve the inconsistencies in sampling. In short: if given a set of data that models a simple quadratic curve, what statistical smoothing method would you recommend? If possible, please reference an implementation, library, or framework. Thanks SO!

Read the article
Generating lognormally distributed random number from mean, coeff of variation

- by Richie Cotton

Most functions for generating lognormally distributed random numbers take the mean and standard deviation of the associated normal distribution as parameters. My problem is that I only know the mean and the coefficient of variation of the lognormal distribution. It is reasonably straight forward to derive the parameters I need for the standard functions from what I have: If mu and sigma are the mean and standard deviation of the associated normal distribution, we know that coeffOfVar^2 = variance / mean^2 = (exp(sigma^2) - 1) * exp(2*mu + sigma^2) / exp(mu + sigma^2/2)^2 = exp(sigma^2) - 1 We can rearrange this to sigma = sqrt(log(coeffOfVar^2 + 1)) We also know that mean = exp(mu + sigma^2/2) This rearranges to mu = log(mean) - sigma^2/2 Here's my R implementation rlnorm0 <- function(mean, coeffOfVar, n = 1e6) { sigma <- sqrt(log(coeffOfVar^2 + 1)) mu <- log(mean) - sigma^2 / 2 rlnorm(n, mu, sigma) } It works okay for small coefficients of variation r1 <- rlnorm0(2, 0.5) mean(r1) # 2.000095 sd(r1) / mean(r1) # 0.4998437 But not for larger values r2 <- rlnorm0(2, 50) mean(r2) # 2.048509 sd(r2) / mean(r2) # 68.55871 To check that it wasn't an R-specific issue, I reimplemented it in MATLAB. (Uses stats toolbox.) function y = lognrnd0(mean, coeffOfVar, sizeOut) if nargin < 3 || isempty(sizeOut) sizeOut = [1e6 1]; end sigma = sqrt(log(coeffOfVar.^2 + 1)); mu = log(mean) - sigma.^2 ./ 2; y = lognrnd(mu, sigma, sizeOut); end r1 = lognrnd0(2, 0.5); mean(r1) % 2.0013 std(r1) ./ mean(r1) % 0.5008 r2 = lognrnd0(2, 50); mean(r2) % 1.9611 std(r2) ./ mean(r2) % 22.61 Same problem. The question is, why is this happening? Is it just that the standard deviation is not robust when the variation is that wide? Or have a screwed up somewhere?

Read the article
Is it possible to do A/B testing by page rather than by individual?

- by mojones

Lets say I have a simple ecommerce site that sells 100 different t-shirt designs. I want to do some a/b testing to optimise my sales. Let's say I want to test two different "buy" buttons. Normally, I would use AB testing to randomly assign each visitor to see button A or button B (and try to ensure that that the user experience is consistent by storing that assignment in session, cookies etc). Would it be possible to take a different approach and instead, randomly assign each of my 100 designs to use button A or B, and measure the conversion rate as (number of sales of design n) / (pageviews of design n) This approach would seem to have some advantages; I would not have to worry about keeping the user experience consistent - a given page (e.g. www.example.com/viewdesign?id=6) would always return the same html. If I were to test different prices, it would be far less distressing to the user to see different prices for different designs than different prices for the same design on different computers. I also wonder whether it might be better for SEO - my suspicion is that Google would "prefer" that it always sees the same html when crawling a page. Obviously this approach would only be suitable for a limited number of sites; I was just wondering if anyone has tried it?

Read the article
Where can I find simple beta cdf implementation.

- by Gacek

I need to use beta distribution and inverse beta distribution in my project. There is quite good but complicated implementation in GSL, but I don't want to use such a big library only to get one function. I would like to either, implement it on my own or link some simple library. Do you know any sources that could help me? I'm looking for any books/articles about numerical approximation of beta PDF, libraries where it could be implemented. Any other suggestions would be also appreciated. Any programming language, but C++/C# preffered.

Read the article
Exporting Stata results

- by Max M.

I'm sure this is an issue anyone who uses Stata for publications or reports has run into: how do you conveniently export your output to something that can be parsed by a scripting language or Excel? There are a few ADO files that to this for specific commands (try findit tabout or findit outreg2). But what about exporting the output of the table command? Or the results of an anova? I'd love to hear about how Stata users address this problem for either specific commands or in general.

Read the article
Naive Bayes matlab, row classification

- by Jungle Boogie

How do you classify a row of seperate cells in matlab? Atm I can classify single coloums like so: training = [1;0;-1;-2;4;0;1]; % this is the sample data. target_class = ['posi';'zero';'negi';'negi';'posi';'zero';'posi']; % target_class are the different target classes for the training data; here 'positive' and 'negetive' are the two classes for the given training data % Training and Testing the classifier (between positive and negative) test = 10*randn(25, 1); % this is for testing. I am generating random numbers. class = classify(test,training, target_class, 'diaglinear') % This command classifies the test data depening on the given training data using a Naive Bayes classifier Unlike the above im looking at wanting to classify: A B C Row A | 1 | 1 | 1 = a house Row B | 1 | 2 | 1 = a garden Can anyone help? Here is a code example from matlabs site: nb = NaiveBayes.fit(training, class) nb = NaiveBayes.fit(..., 'param1',val1, 'param2',val2, ...) I dont understand what param1 is or what val1 etc should be?

Read the article
Mathematica, PDF Curves and Shading

- by Venerable Garbage Collector

I need to plot a normal distribution and then shade some specific region of it. Right now I'm doing this by creating a plot of the distribution and overlaying it with a RegionPlot. This is pretty convoluted and I'm certain there must be a more elegant way of doing it. I Googled, looked at the docs, found nothing. Help me SO! I guess Mathematica counts as programming? :D

Read the article
How can I structure and recode messy categorical data in R?

- by briandk

I'm struggling with how to best structure categorical data that's messy, and comes from a dataset I'll need to clean. The Coding Scheme I'm analyzing data from a university science course exam. We're looking at patterns in student responses, and we developed a coding scheme to represent the kinds of things students are doing in their answers. A subset of the coding scheme is shown below. Note that within each major code (1, 2, 3) are nested non-unique sub-codes (a, b, ...). What the Raw Data Looks Like I've created an anonymized, raw subset of my actual data which you can view here. Part of my problem is that those who coded the data noticed that some students displayed multiple patterns. The coders' solution was to create enough columns (reason1, reason2, ...) to hold students with multiple patterns. That becomes important because the order (reason1, reason2) is arbitrary--two students (like student 41 and student 42 in my dataset) who correctly applied "dependency" should both register in an analysis, regardless of whether 3a appears in the reason column or the reason2 column. How Can I Best Structure Student Data? Part of my problem is that in the raw data, not all students display the same patterns, or the same number of them, in the same order. Some students may do just one thing, others may do several. So, an abstracted representation of example students might look like this: Note in the example above that student002 and student003 both are coded as "1b", although I've deliberately shown the order as different to reflect the reality of my data. My (Practical) Questions Should I concatenate reason1, reason2, ... into one column? How can I (re)code the reasons in R to reflect the multiplicity for some students? Thanks I realize this question is as much about good data conceptualization as it is about specific features of R, but I thought it would be appropriate to ask it here. If you feel it's inappropriate for me to ask the question, please let me know in the comments, and stackoverflow will automatically flood my inbox with sadface emoticons. If I haven't been specific enough, please let me know and I'll do my best to be clearer.

Read the article
Datasets for Running Statistical Analysis on

- by Tal Galili

What datasets exist out on the internet that I can run statistical analysis on?

Read the article
How to use boost normal distribution classes?

- by David Alfonso

Hi all, I'm trying to use boost::normal_distribution in order to generate a normal distribution with mean 0 and sigma 1. The following code uses boost normal classes. Am I using them correctly? #include <boost/random.hpp> #include <boost/random/normal_distribution.hpp> int main() { boost::mt19937 rng; // I don't seed it on purpouse (it's not relevant) boost::normal_distribution<> nd(0.0, 1.0); boost::variate_generator<boost::mt19937&, boost::normal_distribution<> > var_nor(rng, nd); int i = 0; for (; i < 10; ++i) { double d = var_nor(); std::cout << d << std::endl; } } The result on my machine is: 0.213436 -0.49558 1.57538 -1.0592 1.83927 1.88577 0.604675 -0.365983 -0.578264 -0.634376 As you can see all values are not between -1 and 1. Thank you all in advance!

Read the article
need more complex index !

- by silversky

I'm sorry if it's not an appropriate question for this site, and if it's necesary I'll close this question. But maybe someone could give me an ideea: I'm trying to find a more complex index to make an hierarchy. For example: 5 votes from 6 = 83% AND 500 votes from 600 = 83%; 10 votes from 600 = 1.66% If I make a hierarchy with the %, first two will be on the same place, but I think that 83% from 600 it's more valuable than the first one. I could compare 5, 10, 500, but again it's not fair because the third case (10 votes) will be in front of the first case (5 votes), wich it's not fair beacuse the third case has only 1.66% Maybe someone could give me an ideea how to give more weight for the second case but in the same time let the let the new entries have a fair chance.

Read the article
Linear regression confidence intervals in SQL

- by Matt Howells

I'm using some fairly straight-forward SQL code to calculate the coefficients of regression (intercept and slope) of some (x,y) data points, using least-squares. This gives me a nice best-fit line through the data. However we would like to be able to see the 95% and 5% confidence intervals for the line of best-fit (the curves below). What these mean is that the true line has 95% probability of being below the upper curve and 95% probability of being above the lower curve. How can I calculate these curves? I have already read wikipedia etc. and done some googling but I haven't found understandable mathematical equations to be able to calculate this. Edit: here is the essence of what I have right now. --sample data create table #lr (x real not null, y real not null) insert into #lr values (0,1) insert into #lr values (4,9) insert into #lr values (2,5) insert into #lr values (3,7) declare @slope real declare @intercept real --calculate slope and intercept select @slope = ((count(*) * sum(x*y)) - (sum(x)*sum(y)))/ ((count(*) * sum(Power(x,2)))-Power(Sum(x),2)), @intercept = avg(y) - ((count(*) * sum(x*y)) - (sum(x)*sum(y)))/ ((count(*) * sum(Power(x,2)))-Power(Sum(x),2)) * avg(x) from #lr Thank you in advance.

Read the article

< Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20 | Next Page >