Discover periodic patterns in a large data-set

Posted by Miner on Stack Overflow See other posts from Stack Overflow or by Miner
Published on 2010-04-06T01:27:39Z Indexed on 2010/04/06 1:33 UTC
Read the original article Hit count: 418

Filed under:

algorithm

|

algorithm-design

I have a large sequence of tuples on disk in the form (t1, k1) (t2, k2) ... (tn, kn)

ti is a monotonically increasing timestamp and ki is a key (assume a fixed length string if needed). Neither ti nor ki are guaranteed to be unique. However, the number of unique tis and kis is huge (millions). n itself is very large (100 Million+) and the size of k (approx 500 bytes) makes it impossible to store everything in memory.

I would like to find out periodic occurrences of keys in this sequence.

For example, if I have the sequence (1, a) (2, b) (3, c) (4, b) (5, a) (6, b) (7, d) (8, b) (9, a) (10, b)

The algorithm should emit (a, 4) and (b, 2). That is a occurs with a period of 4 and b occurs with a period of 2.

If I build a hash of all keys and store the average of the difference between consecutive timestamps of each key and a std deviation of the same, I might be able to make a pass, and report only the ones that have an acceptable std deviation(ideally, 0). However, it requires one bucket per unique key, whereas in practice, I might have very few really periodic patterns. Any better ways?

© Stack Overflow or respective owner

Related posts about algorithm

Jpeg Algorithm vs BMP Algorithm?

as seen on Super User - Search for 'Super User'
I'm just wonder, what the differences are between creating a BMP file algorithm and JPG file algorithm ? If you know the others images' format algorithm, please post them. Thanks. >>> More
word disambiguation algorithm (Lesk algorithm)

as seen on Stack Overflow - Search for 'Stack Overflow'
Hii.. Can anybody help me to find an algorithm in Java code to find synonyms of a search word based on the context and I want to implement the algorithm with WordNet database. For example, "I am running a Java program". From the context, I want to find the synonyms for the word "running", but the… >>> More
Search algorithm (with a sort algorithm already implemented)

as seen on Stack Overflow - Search for 'Stack Overflow'
Hello, Im doing a Java application and Im facing some doubts in which concerns performance. I have a PriorityQueue which guarantees me the element removed is the one with greater priority. That PriorityQueue has instances of class Event (which implements Comparable interface). Each Event is associated… >>> More
Is there any algorithm for finding LINES by PIXEL COLORS on picture?

as seen on Stack Overflow - Search for 'Stack Overflow'
So I have Image like this I want to get something like this (I hevent drawn all lines I want but I hope you can get my idea) I need algorithm for finding all straight lines on it by just reading colors of pixels. No hard math, no Haar, no Hough. Some algorithm which would be based on points… >>> More
collsion issues with quadtree [on hold]

as seen on Game Development - Search for 'Game Development'
So i implemented a Quad tree in Java for my 2D game and everything works fine except for when i run my collision detection algorithm, which checks if a object has hit another object and which side it hit.My problem is 80% of the time the collision algorithm works but sometimes the objects just go… >>> More

Related posts about algorithm-design

Solutions for exercises in The Algorithm Design Manual

as seen on Stack Overflow - Search for 'Stack Overflow'
Does anybody know where to find solutions for the exercises in the book The Algorithm Design Manual? >>> More
Algorithm design, "randomising" timetable schedule in Python although open to other languages.

as seen on Stack Overflow - Search for 'Stack Overflow'
Before I start I should add I am a musician and not a native programmer, this was undertook to make my life easier. Here is the situation, at work I'm given a new csv file each which contains a list of sound files, their length, and the minimum total amount of time they must be played. I create… >>> More
Algorithm for dynamic combinations

as seen on Stack Overflow - Search for 'Stack Overflow'
My code has a list called INPUTS, that contains a dynamic number of lists, let's call them A, B, C, .. N. These lists contain a dynamic number of Events I would like to call a function with each combination of Events. To illustrate with an example: INPUTS: A(0,1,2), B(0,1), C(0,1,2,3) I need to… >>> More
Disk Search / Sort Algorithm

as seen on Stack Overflow - Search for 'Stack Overflow'
Given a Range of numbers say 1 to 10,000, Input is in random order. Constraint: At any point only 1000 numbers can be loaded to memory. Assumption: Assuming unique numbers. I propose the following efficient , "When-Required-sort Algorithm". We write the numbers into files which are designated… >>> More
Dynamic programming practice problems with solutions

as seen on Stack Overflow - Search for 'Stack Overflow'
Hi, I have seen many questions on stackoverflow where dynamic programming technique can be used to make a exponential algorithm, a polynomial one. I have seen standard problems on dynamic programming. Is there a website or book that contains practice problems and solutions? Thanks Bala >>> More