Sunday, September 9, 2007

Online selection algorithm. Random Selection in Linked Lists with Unknown Length

Question: Given a finite linked list with unknown length(in terms of nodes count), select one item(multiple nodes) with equal possibility for every node in the linked list.

* One item only.
Assuming the input list has k nodes. The possibility for each node to be selected is 1/k. We need to guarantee that after we visiting m nodes, the possibility for each node is 1/m.



Proof of correctness:
When i = 0, node 0 is selected with 100% possibility.
When i = 1, node 1 has 50% chance to replace node 0 as selection result. Both of node 1 and node 0 will have equal chance to be chosen.
When i = n(which means we are testing the (n+1)th node), before line 3 is executed, all nodes from node 0 to node n-1 have possibility of 1/n to be selected. The (n+1)th node has 1/(n+1) possibility to be selected at line 3. For the previous selected node, it has a possibility of n/(n+1) to survive this round. Multiplying with its chance to be selected: 1/n, we get 1/(n+1) as well,

*Extended: Multiple items.
Assuming we have k nodes in the list and we are targeting a result set of size m,
each node will have possibility of m/k to be added to the result set.


No comments: