Best way to statistically detect anomalies in data

Posted by reinier on Stack Overflow
Published on 2009-08-20T15:01:53Z

Hi,

Our webapp collects a huge amount of data about user actions, network activity, database load, and so on.

All of this data is stored in warehouses, and we have quite a lot of interesting views on it.

If something odd happens, chances are it shows up somewhere in the data.

However, detecting that something out of the ordinary is going on currently means continually combing through this data by hand, looking for oddities.

My question: what is the best way to detect changes in dynamic data that can be considered 'out of the ordinary'?
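For a first pass, one common statistical approach is a rolling z-score: flag any value that sits more than a few standard deviations from a recent moving average. A minimal sketch (the window size, warm-up count, and threshold here are arbitrary assumptions, and the function name is made up for illustration):

```python
from collections import deque
import math

def make_zscore_detector(window=100, threshold=3.0):
    """Return a callable that flags values more than `threshold`
    standard deviations away from the rolling mean of the last
    `window` observations."""
    history = deque(maxlen=window)

    def check(value):
        if len(history) >= 10:  # wait for a minimal baseline
            mean = sum(history) / len(history)
            var = sum((x - mean) ** 2 for x in history) / len(history)
            std = math.sqrt(var)
            anomalous = std > 0 and abs(value - mean) > threshold * std
        else:
            anomalous = False  # not enough data to judge yet
        history.append(value)
        return anomalous

    return check
```

For example, after feeding it fifty load samples hovering around 10, a sudden sample of 100 would be flagged, while normal jitter would not. The weakness of a plain z-score is that it assumes the "normal" level is roughly stationary within the window, which leads to the daily-curve issue described below.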

Are Bayesian filters (I've seen these mentioned when reading about spam detection) the way to go?

Any pointers would be great!

EDIT: To clarify: the data shows, for example, a daily curve of database load. This curve typically looks similar to yesterday's curve, but over time it may change slowly.

It would be nice if a warning could go off when the curve changes from one day to the next by more than some tolerance.
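One way to formalize that (a sketch under assumptions, not a definitive method): keep a per-time-slot baseline curve as an exponentially weighted average of past days, so the slow drift mentioned above is absorbed into the baseline, and raise a warning for any slot where today's curve deviates from the baseline by more than a relative tolerance. The function names, the smoothing factor `alpha`, and the 25% tolerance are all illustrative choices:

```python
def update_baseline(baseline, today, alpha=0.1):
    """Blend today's per-slot curve into the baseline (EWMA).
    Slow day-to-day drift is absorbed; sudden shifts are not."""
    if baseline is None:
        return list(today)  # first day seeds the baseline
    return [(1 - alpha) * b + alpha * t for b, t in zip(baseline, today)]

def compare_curves(baseline, today, tolerance=0.25):
    """Return indices of time slots where today's value deviates
    from the baseline by more than `tolerance` (relative)."""
    alerts = []
    for i, (b, t) in enumerate(zip(baseline, today)):
        ref = max(abs(b), 1e-9)  # avoid division by zero
        if abs(t - b) / ref > tolerance:
            alerts.append(i)
    return alerts
```

Run `compare_curves` against today's curve first, then fold the day into the baseline with `update_baseline`; updating only on non-anomalous days keeps a sustained outage from polluting the baseline.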

R
