Count unique visitors by group of visited places

Posted by Mathieu on Programmers See other posts from Programmers or by Mathieu
Published on 2014-06-12T14:37:35Z Indexed on 2014/06/12 15:38 UTC
Read the original article Hit count: 304

Filed under:
|
|

I'm facing the problem of counting the unique visitors of groups of places.

Here is the situation:

I have visitors that can visit places. For example, that can be internet users visiting web pages, or customers going to restaurants. A visitor can visit as much places as he wishes, and a place can be visited by several visitors. A visitor can come to the same place several times.

The places belong to groups. A group can obviously contain several places, and places can belong to several groups.

Given that, for each visitor, we can have a list of visited places, how can I have the number of unique visitors per group of places?

Example: I have visitors A, B, C and D; and I have places x, y and z.

I have these visiting lists:

[
 A -> [x,x,y,x],
 B -> [],
 C -> [z,z],
 D -> [y,x,x,z]
]

Having these number of unique visitors per place is quite easy:

[
 x -> 2, // A and D visited x
 y -> 2, // A and D visited y
 z -> 2  // C and D visited z
]

But if I have these groups:

[
 G1 -> [x,y,z],
 G2 -> [x,z],
 G3 -> [x,y]
]

How can I have this information?

[
 G1 -> 3, // A, C and D visited x or y or z
 G2 -> 3, // A, C and D visited x or z
 G3 -> 2  // A and D visited x or y
]

Additional notes :

  1. There are so many places that it is not possible to store information about every possible group;
  2. It's not a problem if approximation are made. I don't need 100% precision. Having a fast algorithm that tells me that there were 12345 visits in a group instead of 12543 is better than a slow algorithm telling the exact number. Let's say there can be ~5% deviation.

Is there an algorithm or class of algorithms that addresses this type of problem?

© Programmers or respective owner

Related posts about algorithms

Related posts about statistics