Wednesday, March 23, 2011

Selecting the largest/smallest N numbers out of M numbers

Given sufficient amount of memory to hold all M.

  1. Your gut feeling that this can be a heap sort. O(n*log(n)).
    Don't be shy, say it out load, it is indeed correct and efficient. But also, do not settle.

  2. Your can also do this by using quick sort, by adjusting the next pivot accordingly after partition. Still O(n*log(n)), but in real world the gain here is actually quite significant as you can ignore at least one of the two partitions each time.

  3. In #2, you may have noticed the fact that you are not required to sort the N number. But taking that one step further. You can apply Selection Sort, which is O(n), to find the Nth number during one iteration, and you should have guessed what steps to follow when you have the Nth number at hand.
Even if you want to have the N number sorted, #3 is still the optimal solution as you can take the N number and sort them in O(Nlog(N)) time, which is smaller than O(Mlog(M)) as N.

Update: It seems that I was wrong about #2. #2 is actually linear. See here. Randomized-Select is the name, and it is introduced in Chapter 9 of "Introduction of Algorithms" 2nd Edition.

Read the book, as always...

No comments: