How to keep only duplicates efficiently?

Posted by Marc Eaddy on Stack Overflow See other posts from Stack Overflow or by Marc Eaddy
Published on 2010-03-31T03:05:31Z Indexed on 2010/03/31 3:13 UTC
Read the original article Hit count: 413

Filed under:
|
|
|
|

Given an STL vector, I'd like an algorithm that outputs only the duplicates in sorted order, e.g.,

INPUT : { 4, 4, 1, 2, 3, 2, 3 }
OUTPUT: { 2, 3, 4 }

The algorithm is trivial, but the goal is to make it as efficient as std::unique(). My naive implementation modifies the container in-place:

My naive implementation:

void keep_duplicates(vector<int>* pv)
{
    // Sort (in-place) so we can find duplicates in linear time
    sort(pv->begin(), pv->end());

    vector<int>::iterator it_start = pv->begin();
    while (it_start != pv->end())
    {
        size_t nKeep = 0;

        // Find the next different element
        vector<int>::iterator it_stop = it_start + 1;
        while (it_stop != pv->end() && *it_start == *it_stop)
        {
            nKeep = 1; // This gets set redundantly
            ++it_stop;
        }

        // If the element is a duplicate, keep only the first one (nKeep=1).
        // Otherwise, the element is not duplicated so erase it (nKeep=0).
        it_start = pv->erase(it_start + nKeep, it_stop);
    }
}

If you can make this more efficient, elegant, or general, please let me know. For example, a custom sorting algorithm, or copy elements in the 2nd loop to eliminate the erase() call.

© Stack Overflow or respective owner

Related posts about c++

Related posts about stl