Search Results

Search found 1631 results on 66 pages for 'statistics'.

Page 12/66 | < Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19 | Next Page >

Plot 2 graphs in same plot in R?

- by Sandra Schlichting

I would like to plot y1 and y2 in the same plot. x <- seq(-2, 2, 0.05) y1 <- pnorm(x) y2 <- pnorm(x,1,1) plot(x,y1,type="l",col="red") plot(x,y2,type="l",col="green") But when I do it like this, they are not plotted in the same plot together. In Matlab can one do hold on, but does anyone know how to do this in R?

Read the article
naive bayesian spam filter question

- by Microkernel

Hi guys, I am planning to implement spam filter using Naive Bayesian classification model. Online I see a lot of info on Naive Bayesian classification, but the problem is its a lot of mathematical stuff, than clearly stating how its done. And the problem is I am more of a programmer than a mathematician (yes I had learnt Probability and Bayesian theorem back in school, but out of touch for a long long time, and I don't have luxury of learning it now (Have nearly 3 weeks to come-up with a working prototype)). So if someone can explain or point me to location where its explained for programmers than a mathematician, it would be a great help. PS: By the way I have to implement it in C, if you want to know. :( Regards, Microkernel

Read the article
simple Stata program

- by Cyrus S

I am trying to write a simple program to combine coefficient and standard error estimates from a set of regression fits. I run, say, 5 regressions, and store the coefficient(s) and standard error(s) of interest into vectors (Stata matrix objects, actually). Then, I need to do the following: Find the mean value of the coefficient estimates. Combine the standard error estimates according to the formula suggested for combining results from "multiple imputation". The formula is the square root of the formula for "T" on page 6 of the following document: http://bit.ly/b05WX3 I have written Stata code that does this once, but I want to write this as a function (or "program", in Stata speak) that takes as arguments the vector (or matrix, if possible, to combine multiple estimates at once) of regression coefficient estimates and the vector (or matrix) of corresponding standard error estimates, and then generates 1 and 2 above. Here is the code that I wrote: (breg is a 1x5 vector of the regression coefficient estimates, and sereg is a 1x5 vector of the associated standard error estimates) mat ones = (1,1,1,1,1) mat bregmean = (1/5)*(ones*breg’) scalar bregmean_s = bregmean[1,1] mat seregmean = (1/5)*(ones*sereg’) mat seregbtv = (1/4)*(breg - bregmean#ones)* (breg - bregmean#ones)’ mat varregmi = (1/5)*(sereg*sereg’) + (1+(1/5))* seregbtv scalar varregmi_s = varregmi[1,1] scalar seregmi = sqrt(varregmi_s) disp bregmean_s disp seregmi This gives the right answer for a single instance. Any pointers would be great! UPDATE: I completed the code for combining estimates in a kXm matrix of coefficients/parameters (k is the number of parameters, m the number of imputations). Code can be found here: http://bit.ly/cXJRw1 Thanks to Tristan and Gabi for the pointers.

Read the article
Fitting Gaussian KDE in numpy/scipy in Python

- by user248237

I am fitting a Gaussian kernel density estimator to a variable that is the difference of two vectors, called "diff", as follows: gaussian_kde_covfact(diff, smoothing_param) -- where gaussian_kde_covfact is defined as: class gaussian_kde_covfact(stats.gaussian_kde): def __init__(self, dataset, covfact = 'scotts'): self.covfact = covfact scipy.stats.gaussian_kde.__init__(self, dataset) def _compute_covariance_(self): '''not used''' self.inv_cov = np.linalg.inv(self.covariance) self._norm_factor = sqrt(np.linalg.det(2*np.pi*self.covariance)) * self.n def covariance_factor(self): if self.covfact in ['sc', 'scotts']: return self.scotts_factor() if self.covfact in ['si', 'silverman']: return self.silverman_factor() elif self.covfact: return float(self.covfact) else: raise ValueError, \ 'covariance factor has to be scotts, silverman or a number' def reset_covfact(self, covfact): self.covfact = covfact self.covariance_factor() self._compute_covariance() This works, but there is an edge case where the diff is a vector of all 0s. In that case, I get the error: File "/srv/pkg/python/python-packages/python26/scipy/scipy-0.7.1/lib/python2.6/site-packages/scipy/stats/kde.py", line 334, in _compute_covariance self.inv_cov = linalg.inv(self.covariance) File "/srv/pkg/python/python-packages/python26/scipy/scipy-0.7.1/lib/python2.6/site-packages/scipy/linalg/basic.py", line 382, in inv if info>0: raise LinAlgError, "singular matrix" numpy.linalg.linalg.LinAlgError: singular matrix What's a way to get around this? In this case, I'd like it to return a density that's essentially peaked completely at a difference of 0, with no mass everywhere else. thanks.

Read the article
Where can I find free and open data?

- by kitsune

Sooner or later, coders will feel the need to have access to "open data" in one of their projects, from knowing a city's zip to a more obscure information such as the axial tilt of Pluto. I know data.un.org which offers access to the UN's extensive array of databases that deal with human development and other socio-economic issues. The other usual suspects are NASA and the USGS for planetary data. There's an article at readwriteweb with more links. infochimps.org seems to stand out. Personally, I need to find historic commodity prices, stock values and other financial data. All these data sets seem to cost money however. Clarification To clarify, I'm interested in all kinds of open data, because sooner or later, I know I will be in a situation where I could need it. I will try to edit this answer and include the suggestions in a structured manners. A link for financial data was hidden in that readwriteweb article, doh! It's called opentick.com. Looks good so far! Update I stumbled over semantic data in another question of mine on here. There is opencyc ('the world's largest and most complete general knowledge base and commonsense reasoning engine'). A project called UMBEL provides a light-weight, distilled version of opencyc. Umbel has semantic data in rdf/owl/skos n3 syntax. The Worldbank also released a very nice API. It offers data from the last 50 years for about 200 countries

Read the article
How to know if two stocks move togheter?

- by Damiano

Hello, I have two stocks with their prices, example: STOCK1: 10.56 11.23 12.32 8.90 STOCK2: 1.26 5.80 3.26 10.3 I only found Pearson correlation, but, is there another method to know if two stocks move togheter? (esample: co-integration??) Thank you so much!

Read the article
Latent Dirichlet Allocation, pitfalls, tips and programs

- by Gregg Lind

I'm experimenting with Latent Dirichlet Allocation for topic disambiguation and assignment, and I'm looking for advice. Which program is the "best", where best is some combination of easiest to use, best prior estimation, fast How do I incorporate my intuitions about topicality. Let's say I think I know that some items in the corpus are really in the same category, like all articles by the same author. Can I add that into the analysis? Any unexpected pitfalls or tips I should know before embarking? I'd prefer is there are R or Python front ends for whatever program, but I expect (and accept) that I'll be dealing with C.

Read the article
How to get statistical distributions out of C++ Code?

- by Bader

I want some help in programming a random generator for different types of distribution using C++ language. for the following: Geometric distribution Hypergeometric distribution Weibull distribution Rayleigh distribution Erlang distribution Gamma distribution Poisson distribution Thanks.

Read the article
Student's t distribution in JavaScript

- by Sai Emrys

Google Spreadsheets currently does not support the standard function TDIST - i.e. the Student's t-distribution. This function is critical for calculating p-values. It seems that this is related to the fact that no integral-using functions (AFAICT) are implemented either. However, Google Docs allows people to add and publish their own scripts, in JavaScript. So ideally we should have something like: function tdist(t_value, degrees_of_freedom, two_tailed [defaults true]) {...} Anyone know of either an extant implementation of this (my google-fu has not turned up one, but may be weaker than yours) or a good idea for how to do it? I'd like to publish this together with some other useful functions that are currently calculable but a bit of a pain (like Student's t-test itself). Thanks!

Read the article
science.js’s loess() output is identical to input

- by user3710111

Rendered project available here. The line is supposed to be a trend line (as rendered with LOESS), but it merely follows each data point instead. I am no stats wonk, so maybe it makes sense that a LOESS function’s output would match the input as seen in the above example, but it strikes me as being wrong. Here is the relevant bit of code: var loess = science.stats.loess().bandwidth(.2); var xVal = data.map(function(d) { return d.date; }); var yVal = data.map(function(d) { return d.A; }); var loessData = loess([xVal], [yVal])[0]; console.log(yVal); console.log(loessData);

Read the article
Is there a Pair-Wise PostHoc Comparisons for the Chi-Square Test in R?

- by Tal Galili

Hi all, I am wondering if there exists in R a package/function to perform the: "Post Hoc Pair-Wise Comparisons for the Chi-Square Test of Homogeneity of Proportions" (or an equivalent of it) Which is described here: http://epm.sagepub.com/cgi/content/abstract/53/4/951 My situation is of just making a chi test, on a 2 by X matrix. I found a difference, but I want to know which of the columns is "responsible" for the difference. Thanks, Tal

Read the article
Non-linear regression models in PostgreSQL using R

- by Dave Jarvis

Background I have climate data (temperature, precipitation, snow depth) for all of Canada between 1900 and 2009. I have written a basic website and the simplest page allows users to choose category and city. They then get back a very simple report (without the parameters and calculations section): The primary purpose of the web application is to provide a simple user interface so that the general public can explore the data in meaningful ways. (A list of numbers is not meaningful to the general public, nor is a website that provides too many inputs.) The secondary purpose of the application is to provide climatologists and other scientists with deeper ways to view the data. (Using too many inputs, of course.) Tool Set The database is PostgreSQL with R (mostly) installed. The reports are written using iReport and generated using JasperReports. Poor Model Choice Currently, a linear regression model is applied against annual averages of daily data. The linear regression model is calculated within a PostgreSQL function as follows: SELECT regr_slope( amount, year_taken ), regr_intercept( amount, year_taken ), corr( amount, year_taken ) FROM temp_regression INTO STRICT slope, intercept, correlation; The results are returned to JasperReports using: SELECT year_taken, amount, year_taken * slope + intercept, slope, intercept, correlation, total_measurements INTO result; JasperReports calls into PostgreSQL using the following parameterized analysis function: SELECT year_taken, amount, measurements, regression_line, slope, intercept, correlation, total_measurements, execute_time FROM climate.analysis( $P{CityId}, $P{Elevation1}, $P{Elevation2}, $P{Radius}, $P{CategoryId}, $P{Year1}, $P{Year2} ) ORDER BY year_taken This is not an optimal solution because it gives the false impression that the climate is changing at a slow, but steady rate. Questions Using functions that take two parameters (e.g., year [X] and amount [Y]), such as PostgreSQL's regr_slope: What is a better regression model to apply? What CPAN-R packages provide such models? (Installable, ideally, using apt-get.) How can the R functions be called within a PostgreSQL function? If no such functions exist: What parameters should I try to obtain for functions that will produce the desired fit? How would you recommend showing the best fit curve? Keep in mind that this is a web app for use by the general public. If the only way to analyse the data is from an R shell, then the purpose has been defeated. (I know this is not the case for most R functions I have looked at so far.) Thank you!

Read the article
redirect user, then log his visit using php and mysql

- by Bart van Heukelom

I have a PHP redirect page to track clicks on links. Basically it does: - get url from $_GET - connect to database - create row for url, or update with 1 hit if it exists - redirect browser to url using Location: header I was wondering if it's possible to send the redirect to the client first, so it can get on with it's job, and then log the click. Would simply switching things around work?

Read the article
probability and relative frequency

- by Alexandru

If I use relative frequency to estimate the probability of an event, how good is my estimate based on the number of experiments? Is standard deviation a good measure? A paper/link/online book would be perfect. http://en.wikipedia.org/wiki/Frequentist

Read the article
Generation of an array with defined Min, Max, Mean and Stdev with given number of elements and error

- by Viet

I'd like to generate an array with defined Min, Max, Mean and Stdev with given number of elements and error level. Is there such a library in C, C++, PHP or Python to do so? Please kindly advise. Thanks!

Read the article
How is the iPad going to be classified - as a mobile platform or a desktop platform?

- by Tony Eichelberger

I sometimes use the following site to look at browser and OS trends http://gs.statcounter.com/. It got me thinking about how the iPad is going to be classified, as a mobile platform or a desktop platform, or is it going to spark a new category. Since it runs iPhone OS, it could be considered a mobile device, but I have a hard time with that because of the screen size. What should iPad be classified as: Mobile, Desktop, or Other (Try to come up with a good name for Other)?

Read the article
Multiple outliers for two variable linear regression

- by Dave Jarvis

Problem Building on my previous question, the "extreme" outliers in the following graph are somewhat obvious: Question Given: T - Set of all temperatures Y - Set of all years ST - Sum of temperatures. SY - Sum of years. N - Number of elements T(n) - Temperature of the nth element in the temperature set How would you implement an efficient MySQL stored procedure or user-defined function (UDF) to determine if T(n) is an outlier? (If such an implementation already exists, that would be good to know as well.) Related Sites I am slowly working through these sites to get a better understanding of the problem: Multiple Outliers Detection Procedures in Linear Regression M-estimator Measure of Surprise for Outlier Detection Ordinary Least Squares Linear Regression Many thanks!

Read the article
MySQL Volleyball Standings

- by Torez

I have a database table full of game by game results and want to know if I can calculate the following: GP (games played) Wins Loses Points (2 points for each win, 1 point for each lose) Here is my table structure: CREATE TABLE `results` ( `id` int(10) unsigned NOT NULL auto_increment, `home_team_id` int(10) unsigned NOT NULL, `home_score` int(3) unsigned NOT NULL, `visit_team_id` int(10) unsigned NOT NULL, `visit_score` int(3) unsigned NOT NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=7 ; And a few testing results: INSERT INTO `results` VALUES(1, 1, 21, 2, 25); INSERT INTO `results` VALUES(2, 3, 21, 4, 17); INSERT INTO `results` VALUES(3, 1, 25, 3, 9); INSERT INTO `results` VALUES(4, 2, 7, 4, 22); INSERT INTO `results` VALUES(5, 1, 19, 4, 20); INSERT INTO `results` VALUES(6, 2, 24, 3, 26); Here is what a final table would look like: +-------------------+----+------+-------+--------+ | Team Name | GP | Wins | Loses | Points | +-------------------+----+------+-------+--------+ | Spikers | 4 | 4 | 0 | 8 | | Leapers | 4 | 2 | 2 | 6 | | Ground Control | 4 | 1 | 3 | 5 | | Touch Guys | 4 | 0 | 4 | 4 | +-------------------+----+------+-------+--------+

Read the article
Beginner SQL question: arithmetic with multiple COUNT(*) results

- by polygenelubricants

Continuing with the spirit of using the Stack Exchange Data Explorer to learn SQL, (see: Can we become our own “Northwind” for teaching SQL / databases?), I've decided to try to write a query to answer a simple question (on meta): What % of stackoverflow users have over 10,000 rep?. Here's what I've done: Query#1 SELECT COUNT(*) FROM Users WHERE Users.Reputation >= 10000 Result: 556 Query#2 SELECT COUNT(*) FROM USERS Result: 227691 Now, how do I put them together into one query? What is this query idiom called? What do I need to write so I can get, say, a one-row three-column result like this: 556 227691 0,00244190592

Read the article
Regressing panel data in SAS.

- by John

Hey Guys, thanks to your help I succesfully managed all my databases! I am now looking at a panel data set on which I have to regress. Since I only started my Phd this semester together with the econometrics courses I am still new to many statistic applications and regression methods. I want to do a simple regression as in Y = x1 x2 x3 etc, now I already browsed through some literature and found that for panel data it's common to do a fixed effects regression. Also, my Y variable only has positive values so I was thinking in the direction of a Tobit model? I'm doing some research concerning the coverage of analysts in the financial business. My independent variable is the coverage of analysts on a certain firm, so per observation i have 1 analyst and 1 firm, together with different characteristics(market cap and betas etc) of the firm. All this data is monthly. As coverage cannot become negative (only 0) I was thinking of a Tobit model? Do you guys have any ideas what would be a good regression method? Or have some good sources (e books, written books, through university I have access to almost anything concerning my field of work) of information (cause I do have to learn these things for future research)? Thanks!

Read the article
Probability distribution for sms answer delays

- by Thomas Ahle

I'm writing an app using sms as communication. I have chosen to subscribe to an sms-gateway, which provides me with an API for doing so. The API has functions for sending as well as pulling new messages. It does however not have any kind of push functionality. In order to do my queries most efficient, I'm seeking data on how long time people wait before they answer a text message - as a probability function. Extra info: The application is interactive (as can be), so I suppose the times will be pretty similar to real life human-human communication. I don't believe differences in personal style will play a big impact on the right times and frequencies to query, so average data should be fine.

Read the article
How do you compare the "similarity" between two dendrograms (in R) ?

- by Tal Galili

I have two dendrograms which I wish to compare to each other in order to find out how "similar" they are. But I don't know of any method to do so (let alone a code to implement it, say, in R). Any leads ? Thanks, Tal

Read the article
Histogram matching - image processing - c/c++

- by Raj

Hello I have two histograms. int Hist1[10] = {1,4,3,5,2,5,4,6,3,2}; int Hist1[10] = {1,4,3,15,12,15,4,6,3,2}; Hist1's distribution is of type multi-modal; Hist2's distribution is of type uni-modal with single prominent peak. My questions are Is there any way that i could determine the type of distribution programmatically? How to quantify whether these two histograms are similar/dissimilar? Thanks

Read the article
Generation of an array of Random numbers with defined Min, Max, Mean and Stdev with given number of

- by Viet

I'd like to generate an array of Random numbers with defined Min, Max, Mean and Stdev with given number of elements and error level. Is there such a library in C, C++, PHP or Python to do so? Please kindly advise. Thanks!

Read the article
How can I loop through variables in SPSS? I want to avoid code duplication.

- by chucknelson

Is there a "native" SPSS way to loop through some variable names? All I want to do is take a list of variables (that I define) and run the same procedure for them: pseudo-code - not really a good example, but gets the point across... for i in varlist['a','b','c'] do FREQUENCIES VARIABLES=varlist[i] / ORDER=ANALYSIS. end I've noticed that people seem to just use R or Python SPSS plugins to achieve this basic array functionality, but I don't know how soon I can get those configured (if ever) on my installation of SPSS. SPSS has to have some native way to do this...right?

Read the article

Search Results

Search found 1631 results on 66 pages for 'statistics'.

Page 12/66 | < Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19 | Next Page >

- by Sandra Schlichting

- by Microkernel

- by Cyrus S

- by user248237

- by kitsune

- by Damiano

- by Gregg Lind

- by Bader

- by Sai Emrys

- by user3710111

- by Tal Galili

- by Dave Jarvis

- by Bart van Heukelom

- by Alexandru

- by Viet

- by Tony Eichelberger

- by Dave Jarvis

- by Torez

- by polygenelubricants

- by John

- by Thomas Ahle

- by Tal Galili

- by Raj

- by Viet

- by chucknelson

< Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19 | Next Page >