Statistical analysis on a large data set to be published on the web

Posted by dassouki on Stack Overflow, 2010-04-19

I have a non-computer-related data logger that collects data in the field. The data is stored as text files, which I manually lump together and organize. The current format is one CSV file per logger per year; each file is around 4,000,000 lines, and with 7 loggers over 5 years that adds up to a lot of data. Some of the data is organized into bins such as item_type, item_class, and item_dimension_class; other fields are unique per item, such as item_weight, item_color, date_collected, and so on.
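
Roughly, I picture the Postgres side as one wide table holding every logger row, bulk-loaded with COPY. This is only a sketch; the table, column, DSN, and file names are placeholders based on the fields above:

```python
import psycopg2

DDL = """
CREATE TABLE observations (
    logger_id            smallint,   -- which of the 7 loggers (assumed column)
    date_collected       timestamp,
    item_type            text,       -- binned / categorical fields
    item_class           text,
    item_dimension_class text,
    item_weight          real,       -- per-item fields
    item_color           text
)
"""

def load_csv(conn, path):
    # COPY is dramatically faster than row-by-row INSERTs for files this size.
    # Assumes each yearly CSV carries these columns in this order; if the
    # per-logger files lack a logger_id column, add one when lumping them.
    with conn.cursor() as cur, open(path) as f:
        cur.copy_expert(
            "COPY observations (logger_id, date_collected, item_type, "
            "item_class, item_dimension_class, item_weight, item_color) "
            "FROM STDIN WITH (FORMAT csv, HEADER true)",
            f,
        )
    conn.commit()

conn = psycopg2.connect("dbname=fielddata")  # hypothetical DSN
with conn.cursor() as cur:
    cur.execute(DDL)
conn.commit()
load_csv(conn, "logger1_2009.csv")  # hypothetical file name
```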

Currently, I do statistical analysis on the data using a Python/NumPy/matplotlib program I wrote. It works fine, but the problem is that I'm the only one who can use it, since both the program and the data live on my computer.

I'd like to publish the data on the web using a Postgres database; however, I need to find or implement a statistical tool that can take a large Postgres table and return results within an adequate time frame. I'm not familiar with Python for the web, but I'm proficient with PHP on the web side and Python on the offline side.
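
My understanding is that, at this scale (roughly 140 million rows), the aggregation has to happen inside Postgres rather than in PHP or Python, and the columns users filter and group on need indexes; something along these lines, with placeholder names matching the sketch above:

```python
import psycopg2

# Index the columns users will filter and group on, so ad-hoc aggregates
# over ~140M rows stay within an adequate time frame. Names are placeholders.
INDEXES = [
    "CREATE INDEX obs_date_idx  ON observations (date_collected)",
    "CREATE INDEX obs_type_idx  ON observations (item_type)",
    "CREATE INDEX obs_color_idx ON observations (item_color)",
]

conn = psycopg2.connect("dbname=fielddata")  # hypothetical DSN
with conn.cursor() as cur:
    for stmt in INDEXES:
        cur.execute(stmt)
    cur.execute("ANALYZE observations")  # refresh planner statistics
conn.commit()
```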

Users should be allowed to create their own histograms and data analyses. For example, one user could search for all blue items shipped between week x and week y, while another could look at the weight distribution of all items by hour across the whole year.
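
The second example would then boil down to a single parameterized GROUP BY, so only 24 aggregated rows cross the wire instead of millions of raw ones (the blue-items query maps to a similar WHERE clause). Again, this is a sketch against the placeholder schema above:

```python
import psycopg2

# Weight distribution by hour of day, aggregated inside Postgres.
QUERY = """
SELECT extract(hour FROM date_collected) AS hour_of_day,
       count(*)                          AS n_items,
       avg(item_weight)                  AS mean_weight
FROM observations
WHERE date_collected >= %s AND date_collected < %s
GROUP BY hour_of_day
ORDER BY hour_of_day
"""

conn = psycopg2.connect("dbname=fielddata")  # hypothetical DSN
with conn.cursor() as cur:
    cur.execute(QUERY, ("2009-01-01", "2010-01-01"))  # example date range
    for hour, n, mean_weight in cur.fetchall():
        print(f"{int(hour):02d}:00  {n:>9} items  mean weight {mean_weight:.2f}")
```

The returned rows are small enough to feed straight into the existing NumPy/matplotlib code for plotting.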

I was thinking of creating and indexing my own statistical tools, or somehow automating the process to emulate the most common queries, but that seems inefficient.

I'm looking forward to hearing your ideas.

Thanks
