Subset generation by rules
Posted by Sazug on Stack Overflow
        
        Published on 2009-12-19T21:10:05Z
        
Let's say we have 5000 users in a database. Each user row has a sex column, a birthplace column, and a status (married or not married) column.
How can I generate a random subset (say, 100 users) that satisfies these conditions:
- 40% should be male and 60% female
- 50% should be born in the USA, 20% in the UK, 20% in Canada, and 10% in Australia
- 70% should be married and 30% not married.
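For a 100-user subset these percentages translate directly into head counts per column (40 and 60, and so on). For other subset sizes the counts need rounding, so here is a minimal sketch of the conversion. The helper name `to_counts` is hypothetical, and it assumes the percentages are whole numbers that sum to 100; leftover places go to the labels with the largest remainders.

```python
def to_counts(shares, size):
    """Convert {label: percent} into whole-number quotas that sum to size."""
    # Integer arithmetic: quotient is the guaranteed count, remainder ranks the leftovers.
    parts = {label: divmod(size * pct, 100) for label, pct in shares.items()}
    counts = {label: quotient for label, (quotient, _) in parts.items()}
    leftover = size - sum(counts.values())
    # Hand the remaining places to the labels with the largest remainders.
    ranked = sorted(parts, key=lambda label: parts[label][1], reverse=True)
    for label in ranked[:leftover]:
        counts[label] += 1
    return counts


sex_quota = to_counts({"male": 40, "female": 60}, 100)
country_quota = to_counts({"USA": 50, "UK": 20, "Canada": 20, "Australia": 10}, 100)
status_quota = to_counts({"married": 70, "not married": 30}, 100)
```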
These conditions are independent of one another; that is, we cannot simply multiply the percentages to fix the size of each combined group, like this:
- (0.4 * 0.5 * 0.7) * 100 = 14 users who are male, born in the USA, and married
- (0.4 * 0.5 * 0.3) * 100 = 6 users who are male, born in the USA, and not married.
Is there an algorithm for generating such a subset?
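One possible approach, offered as a rough sketch and not a definitive answer: build one list of labels per column with the exact head counts, shuffle each list separately, and pair them up position by position. Each column then meets its quota exactly, while the size of each combined group changes from run to run instead of being fixed at 14, 6, and so on. If the database lacks enough users for some group, the pairing is reshuffled. The function name `sample_with_quotas`, the dict-per-user layout, the column keys, and the synthetic data are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def sample_with_quotas(users, quotas, rng, max_tries=1000):
    """Pick users so that every column meets its quota exactly.

    users  : list of dicts, one per user (hypothetical layout)
    quotas : {column: {value: count}}; each column's counts must sum to the subset size
    rng    : a random.Random instance
    """
    columns = list(quotas)

    # Group the available users by their combination of column values.
    pools = defaultdict(list)
    for user in users:
        pools[tuple(user[c] for c in columns)].append(user)

    # One list of labels per column, e.g. 40 "male" entries and 60 "female" entries.
    label_lists = [
        [value for value, count in quotas[c].items() for _ in range(count)]
        for c in columns
    ]

    for _ in range(max_tries):
        # Shuffling each column separately pairs the labels at random,
        # so the size of each combined group is not fixed in advance.
        for labels in label_lists:
            rng.shuffle(labels)
        wanted = Counter(zip(*label_lists))

        # Accept the pairing only if the data has enough users for every group.
        if all(len(pools[combo]) >= n for combo, n in wanted.items()):
            subset = []
            for combo, n in wanted.items():
                subset.extend(rng.sample(pools[combo], n))
            return subset

    raise ValueError("no feasible pairing found; the quotas may not fit the data")


quotas = {
    "sex": {"male": 40, "female": 60},
    "country": {"USA": 50, "UK": 20, "Canada": 20, "Australia": 10},
    "status": {"married": 70, "not married": 30},
}

rng = random.Random(0)
# Synthetic stand-in for the 5000-row user table.
users = [
    {
        "id": i,
        "sex": rng.choice(["male", "female"]),
        "country": rng.choice(["USA", "UK", "Canada", "Australia"]),
        "status": rng.choice(["married", "not married"]),
    }
    for i in range(5000)
]
subset = sample_with_quotas(users, quotas, rng)
```

Note that the group sizes produced this way still follow the product of the percentages on average; only the per-run counts vary. If the groups should follow the database's own mix instead, a different pairing step would be needed.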
© Stack Overflow or respective owner