Sorting versus hashing

Posted by Paul Siegel on Programmers See other posts from Programmers or by Paul Siegel
Published on 2013-08-02T15:03:50Z Indexed on 2013/08/02 16:02 UTC
Read the original article Hit count: 330

Filed under:
|

My problem is as follows. I have an array of n strings with m < n of them distinct. I want to create a one-to-one function which assigns each of the m distinct strings to the numbers 0 ... m-1. For example, if my strings are:

Bob, Amy, Bob, Charlie, Amy

then the function:

Bob -> 0, Amy -> 1, Charlie -> 2

would meet my needs. I have thought of three possible approaches:

  1. Sort the list of strings, remove duplicates, and construct the function using a search algorithm.

  2. Create a hash table and check each string to see if it is already in the table before inserting it.

  3. Sort the list of strings, remove duplicates, and put the resulting list into a hash table.

My code will be written in Java, and I will likely use standard Java algorithms: merge sort for sorting, binary search for searching, and whatever the standard Java hash table algorithm is.

Question: Assume that after creating the function I will have to evaluate it on each of the n original strings. Which of the three approaches is fastest? Is there a better way?

Part of the problem is that I don't really know what's going on "under the hood" in standard hashing algorithms. Any help would be appreciated.

© Programmers or respective owner

Related posts about sorting

Related posts about hashing