Count unique visitors by group of visited places
        Posted  
        
            by 
                Mathieu
            
        on Programmers
        
        See other posts from Programmers
        
            or by Mathieu
        
        
        
        Published on 2014-06-12T14:37:35Z
        Indexed on 
            2014/06/12
            15:38 UTC
        
        
        Read the original article
        Hit count: 389
        
I'm facing the problem of counting the unique visitors of groups of places.
Here is the situation:
I have visitors that can visit places. For example, that can be internet users visiting web pages, or customers going to restaurants. A visitor can visit as much places as he wishes, and a place can be visited by several visitors. A visitor can come to the same place several times.
The places belong to groups. A group can obviously contain several places, and places can belong to several groups.
Given that, for each visitor, we can have a list of visited places, how can I have the number of unique visitors per group of places?
Example: I have visitors A, B, C and D; and I have places x, y and z.
I have these visiting lists:
[
 A -> [x,x,y,x],
 B -> [],
 C -> [z,z],
 D -> [y,x,x,z]
]
Having these number of unique visitors per place is quite easy:
[
 x -> 2, // A and D visited x
 y -> 2, // A and D visited y
 z -> 2  // C and D visited z
]
But if I have these groups:
[
 G1 -> [x,y,z],
 G2 -> [x,z],
 G3 -> [x,y]
]
How can I have this information?
[
 G1 -> 3, // A, C and D visited x or y or z
 G2 -> 3, // A, C and D visited x or z
 G3 -> 2  // A and D visited x or y
]
Additional notes :
- There are so many places that it is not possible to store information about every possible group;
 - It's not a problem if approximation are made. I don't need 100% precision. Having a fast algorithm that tells me that there were 12345 visits in a group instead of 12543 is better than a slow algorithm telling the exact number. Let's say there can be ~5% deviation.
 
Is there an algorithm or class of algorithms that addresses this type of problem?
© Programmers or respective owner