What is a data structure for quickly finding non-empty intersections of a list of sets?

Posted by Andrey Fedorov on Stack Overflow
Published on 2010-04-06T22:36:25Z Indexed on 2010/04/06 23:13 UTC

Filed under: data-structures | sets | set-theory | optimization | mathematical-optimization

I have an ordered list of N items, each of which is a set of integers; call it I[1..N]. Given a candidate set, I need to find the subset of I whose members have a non-empty intersection with the candidate.

So, for example, if:

I = [{1,2}, {2,3}, {4,5}]

I'm looking to define valid_items(items, candidate), such that:

valid_items(I, {1}) == {1}
valid_items(I, {2}) == {1, 2}
valid_items(I, {3,4}) == {2, 3}
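
For reference, the specification above can be written as a direct scan over all items. This is only a baseline sketch in Python, not the approach the question is asking for; the name valid_items_naive is hypothetical, and items are numbered from 1 to match the examples.

```python
def valid_items_naive(items, candidate):
    """Return the 1-based indices of the sets in `items` that
    share at least one element with `candidate`.

    Baseline only: every call scans all N items.
    """
    return {
        i
        for i, item in enumerate(items, start=1)
        if not item.isdisjoint(candidate)
    }


I = [{1, 2}, {2, 3}, {4, 5}]

print(valid_items_naive(I, {1}))
print(valid_items_naive(I, {2}))
print(valid_items_naive(I, {3, 4}))
```

Each query costs time proportional to the number of items, which is what the cached structure in the rest of the question tries to avoid.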

I'm trying to optimize for one fixed I and a varying candidate set. Currently I am doing this by caching items_containing[n] = {the items which contain n}. In the above example, that would be:

items_containing = [{}, {1}, {1,2}, {2}, {3}, {3}]

That is, 0 is contained in no items, 1 is contained in item 1, 2 is contained in items 1 and 2, 3 is contained in item 2, and 4 and 5 are contained in item 3.

That way, I can define valid_items(I, candidate) = union(items_containing[n] for n in candidate).
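
A minimal Python sketch of this cache, under a few assumptions: the helper names build_items_containing and valid_items_indexed are hypothetical, and the cache is a dict keyed by integer rather than the list shown above, so integers that appear in no item simply have no entry.

```python
from collections import defaultdict


def build_items_containing(items):
    """Map each integer n to the set of 1-based indices of items containing n."""
    index = defaultdict(set)
    for i, item in enumerate(items, start=1):
        for n in item:
            index[n].add(i)
    return dict(index)


def valid_items_indexed(items_containing, candidate):
    """Union the cached index sets for every integer in `candidate`.

    Integers absent from the cache contribute nothing to the result.
    """
    return set().union(*(items_containing.get(n, set()) for n in candidate))


I = [{1, 2}, {2, 3}, {4, 5}]
items_containing = build_items_containing(I)

print(valid_items_indexed(items_containing, {1}))
print(valid_items_indexed(items_containing, {2}))
print(valid_items_indexed(items_containing, {3, 4}))
```

The cache is built once for a given I; each query then touches only the entries for the integers in the candidate, and the cache size is proportional to the total number of elements across all items.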

Is there a more efficient data structure (of reasonable size) for caching the result of this union? The obvious approach that takes 2^N space is not acceptable, but N or N*log(N) would be.

© Stack Overflow or respective owner
