Representing Sparse Data in PostgreSQL

Posted by Chris S on Stack Overflow
Published on 2010-04-07T14:27:08Z Indexed on 2010/04/08 11:03 UTC

Filed under: postgresql | sparse-matrix | relational-database | sql

What's the best way to represent a sparse data matrix in PostgreSQL? The two obvious methods I see are:

1. Store the data in a single table with a separate column for every conceivable feature (potentially millions), with NULL as the default value for unused features. This is conceptually very simple, but I know that with most RDBMS implementations this is typically very inefficient, since NULL values usually take up some space. However, I read an article (I can't find the link, unfortunately) claiming that PG doesn't use storage space for NULL values, which would make it better suited to storing sparse data.
2. Create separate "row" and "column" tables, plus an intermediate table that links them and stores the value for a given column at a given row. I believe this is the more traditional RDBMS solution, but it involves more complexity and overhead.
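
To make the second option concrete, here is a rough sketch of the three-table layout. It is only an illustration under some assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in so it runs without a PostgreSQL server, and the table names (matrix_row, matrix_col, cell), the sample labels, and the features_for helper are all hypothetical names invented for the example.

```python
import sqlite3

# Assumption: SQLite (in-memory) stands in for PostgreSQL so the sketch is runnable as-is.
# All table names, column names, and sample values below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matrix_row (id INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE);
    CREATE TABLE matrix_col (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Link table: one record per populated cell; empty cells are simply absent.
    CREATE TABLE cell (
        row_id INTEGER NOT NULL REFERENCES matrix_row(id),
        col_id INTEGER NOT NULL REFERENCES matrix_col(id),
        value  REAL NOT NULL,
        PRIMARY KEY (row_id, col_id)
    );
""")

conn.executemany("INSERT INTO matrix_row (id, label) VALUES (?, ?)",
                 [(1, "sample_a"), (2, "sample_b")])
conn.executemany("INSERT INTO matrix_col (id, name) VALUES (?, ?)",
                 [(1, "feature_x"), (2, "feature_y"), (3, "feature_z")])
# Only 3 of the 6 possible (row, column) cells hold a value.
conn.executemany("INSERT INTO cell (row_id, col_id, value) VALUES (?, ?, ?)",
                 [(1, 1, 0.5), (1, 3, 2.0), (2, 2, 7.25)])


def features_for(conn, row_label):
    """Return {column name: value} for the populated cells of one matrix row."""
    rows = conn.execute(
        """
        SELECT c.name, cell.value
        FROM cell
        JOIN matrix_row r ON r.id = cell.row_id
        JOIN matrix_col c ON c.id = cell.col_id
        WHERE r.label = ?
        ORDER BY c.name
        """,
        (row_label,),
    ).fetchall()
    return dict(rows)


sparse_row = features_for(conn, "sample_a")
stored_cells = conn.execute("SELECT COUNT(*) FROM cell").fetchone()[0]
print(sparse_row, stored_cells)
```

The storage cost in this layout grows with the number of populated cells rather than with rows times columns, which is the trade-off against the extra joins needed to read a row back.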

I also found PostgreDynamic, which claims to better support sparse data, but I don't want to switch my entire database server to a PG fork just for this feature.

Are there any other solutions? Which one should I use?

© Stack Overflow or respective owner
