removing duplicate strings from a massive array in java efficiently?
        Posted  
        
            by 
                Preator Darmatheon
            
        on Stack Overflow
        
        See other posts from Stack Overflow
        
            or by Preator Darmatheon
        
        
        
        Published on 2012-04-06T15:29:24Z
        Indexed on 
            2012/04/06
            23:29 UTC
        
        
        Read the original article
        Hit count: 429
        
java
|interview-questions
I'm considering the best possible way to remove duplicates from an (Unsorted) array of strings - the array contains millions or tens of millions of stringz..The array is already prepopulated so the optimization goal is only on removing dups and not preventing dups from initially populating!!
I was thinking along the lines of doing a sort and then binary search to get a log(n) search instead of n (linear) search. This would give me nlogn + n searches which althout is better than an unsorted (n^2) search = but this still seems slow. (Was also considering along the lines of hashing but not sure about the throughput)
Please help! Looking for an efficient solution that addresses both speed and memory since there are millions of strings involved without using Collections API!
© Stack Overflow or respective owner