Non-linear regression models in PostgreSQL using R

Posted by Dave Jarvis on Stack Overflow
Published on 2010-05-28T05:07:23Z

Background

I have climate data (temperature, precipitation, snow depth) for all of Canada between 1900 and 2009. I have written a basic website whose simplest page lets users choose a category and a city. They then get back a very simple report (without the parameters and calculations section).

The primary purpose of the web application is to provide a simple user interface so that the general public can explore the data in meaningful ways. (A list of numbers is not meaningful to the general public, nor is a website that provides too many inputs.) The secondary purpose of the application is to provide climatologists and other scientists with deeper ways to view the data. (Using too many inputs, of course.)

Tool Set

The database is PostgreSQL with R (mostly) installed. The reports are written using iReport and generated using JasperReports.

Poor Model Choice

Currently, a linear regression model is applied against annual averages of daily data. The linear regression model is calculated within a PostgreSQL function as follows:

SELECT 
  regr_slope( amount, year_taken ),
  regr_intercept( amount, year_taken ),
  corr( amount, year_taken )
FROM
  temp_regression
INTO STRICT slope, intercept, correlation;
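
For readers who want to try the same calculation outside PL/pgSQL, a minimal self-contained sketch follows. It assumes PostgreSQL; the `daily` rows, the `taken` column, and the `annual` CTE are hypothetical stand-ins for the real climate tables and for `temp_regression`, whose definitions are not shown in this post.

```sql
-- Hypothetical daily readings; the real schema is not shown in this post.
WITH daily(taken, amount) AS (
  VALUES
    (DATE '2000-01-15', 4.0), (DATE '2000-07-15', 6.0),
    (DATE '2001-01-15', 5.0), (DATE '2001-07-15', 7.0),
    (DATE '2002-01-15', 6.5), (DATE '2002-07-15', 8.5)
),
-- Collapse the daily readings to one average per year
-- (an assumed stand-in for temp_regression).
annual AS (
  SELECT
    extract(year FROM taken) AS year_taken,
    avg(amount)              AS amount
  FROM daily
  GROUP BY 1
)
-- The regr_* aggregates take the dependent variable (Y) first, then X.
SELECT
  regr_slope(amount, year_taken)     AS slope,
  regr_intercept(amount, year_taken) AS intercept,
  corr(amount, year_taken)           AS correlation
FROM annual;
```

Inside a PL/pgSQL function, the same SELECT gains the `INTO STRICT` clause shown above so that the three values land in local variables.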

The results are returned to JasperReports using:

SELECT
  year_taken,
  amount,
  year_taken * slope + intercept,
  slope,
  intercept,
  correlation,
  total_measurements
INTO result;

JasperReports calls into PostgreSQL using the following parameterized analysis function:

SELECT
  year_taken,
  amount,
  measurements,
  regression_line,
  slope,
  intercept,
  correlation,
  total_measurements,
  execute_time
FROM
  climate.analysis(
    $P{CityId},
    $P{Elevation1},
    $P{Elevation2},
    $P{Radius},
    $P{CategoryId},
    $P{Year1},
    $P{Year2}
  )
ORDER BY year_taken

This is not an optimal solution: a single straight line gives the false impression that the climate is changing at a slow but steady rate, whatever the data actually do from decade to decade.
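
To make the concern concrete, here is a small sketch with made-up numbers (not the real climate data): the series is flat for several years and then rises quickly, yet a single fitted line reports one constant rate. It assumes PostgreSQL 9.4 or later for the `FILTER` clause; the `sample` rows and the 2004 split point are purely illustrative.

```sql
-- Made-up annual values: flat until 2003, then rising faster each year.
WITH sample(year_taken, amount) AS (
  VALUES
    (2000, 5.0), (2001, 5.0), (2002, 5.0), (2003, 5.0),
    (2004, 6.0), (2005, 8.0), (2006, 11.0), (2007, 15.0)
)
SELECT
  -- One line through everything: a single "steady" rate of change.
  regr_slope(amount, year_taken) AS overall_slope,
  -- The same aggregate restricted to each half of the series.
  regr_slope(amount, year_taken) FILTER (WHERE year_taken <  2004) AS early_slope,
  regr_slope(amount, year_taken) FILTER (WHERE year_taken >= 2004) AS late_slope
FROM sample;
```

Comparing `overall_slope` with `early_slope` and `late_slope` shows how one fitted line averages away a change in rate, which is the kind of false impression described above.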

Questions

Using functions that take two parameters (e.g., year [X] and amount [Y]), such as PostgreSQL's regr_slope:

  • What is a better regression model to apply?
  • What CRAN (R) packages provide such models? (Installable, ideally, using apt-get.)
  • How can the R functions be called within a PostgreSQL function?

If no such functions exist:

  • What parameters should I try to obtain for functions that will produce the desired fit?
  • How would you recommend showing the best fit curve?

Keep in mind that this is a web app for use by the general public. If the only way to analyse the data is from an R shell, then the purpose has been defeated. (I know this is not the case for most R functions I have looked at so far.)

Thank you!
