Tuesday, August 23, 2011

find the only duplicate number in an array

Another interesting algorithm question. The question is this: we have an array $numbers with 1001 elements, all of them between 1 and 1000 inclusive. We know that exactly one number appears twice in this array, and we need to find out which number it is (we are not allowed to modify $numbers). Since only one value repeats, every number from 1 to 1000 appears exactly once, except the duplicate, which appears twice.

So the array looks like $numbers = array(1, 2, 3, ..., 1000) plus one extra copy of the duplicate, and count($numbers) = 1001. The code to generate this $numbers array is below:

<?php
$limit          = 1000;
$numbers        = array();
//randomly generate duplicate number
$duplicate      = mt_rand(1, $limit);
//randomly generate the position of the duplicate number
$randomPosition = mt_rand(0, $limit-1);

for($i = 0; $i < $limit; $i++) {
    $numbers[] = $i + 1;
    if ($i === $randomPosition) {
        $numbers[] = $duplicate;
    }
}

To see what the $numbers array looks like, we can set $limit = 10 to make it smaller and print it with print_r($numbers).
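
To try it with a small $limit, one option is to wrap the generator in a function and print the result. The name generateNumbers() is a hypothetical helper, not part of the original code; the logic inside is the same as above, and because it uses mt_rand() the output changes on every run.

```php
<?php
// Hypothetical helper: same logic as the generator above, wrapped in a
// function so $limit can be changed easily.
function generateNumbers(int $limit): array
{
    $numbers = array();
    // randomly pick the value that will appear twice
    $duplicate = mt_rand(1, $limit);
    // randomly pick the index after which the extra copy is inserted
    $randomPosition = mt_rand(0, $limit - 1);

    for ($i = 0; $i < $limit; $i++) {
        $numbers[] = $i + 1;
        if ($i === $randomPosition) {
            $numbers[] = $duplicate;
        }
    }
    return $numbers;
}

// With $limit = 10 the array has 11 elements.
// One possible output: 1,2,3,4,7,5,6,7,8,9,10
$small = generateNumbers(10);
echo implode(',', $small), "\n";
```

Note that the array is sorted except for the single inserted copy; the algorithms below do not rely on that order.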

I guess this is meant to be an interview question for C/C++ developers. When it comes to PHP, though, we have a very rich set of built-in array functions to lean on.

So first, let's see how we can do it quickly with the help of PHP's array functions.

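$unique = array_unique($numbers);
// array_unique() keeps the first copy (with its key), so the only key missing
// from $unique is the position of the second copy of the duplicate
$diff = array_diff_key($numbers, $unique);
echo reset($diff);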

If we ignore what array_unique() and array_diff_key() do internally and look only at our own PHP code, it takes a constant number of steps (three calls) to get it done, so the time complexity looks like O(1). This is really misleading: each of these functions has to walk the whole array internally, so the real cost is at least O(n).
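
Another route through PHP's built-in functions, not used in the original post, is array_count_values(), which maps each value to the number of times it occurs; array_search() then finds the value whose count is 2. This is only a sketch (the function name is hypothetical), and like the three-line version above it still walks the whole array internally.

```php
<?php
// Alternative built-in approach (not from the original post):
// array_count_values() returns [value => occurrences], and
// array_search() returns the first key whose count is exactly 2.
function findDuplicateByCounting(array $numbers)
{
    $counts = array_count_values($numbers);
    return array_search(2, $counts, true); // false if nothing repeats
}

$example = array(1, 2, 3, 4, 7, 5, 6, 7, 8, 9, 10);
echo findDuplicateByCounting($example), "\n"; // prints 7
```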

What if we are not allowed to use any special PHP array functions?

1. Use temporary storage

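$temp = array();
// $numbers has $limit + 1 elements, so $i runs from 0 to $limit
for ($i = 0; $i <= $limit; $i++) {
    $value = $numbers[$i];
    if (isset($temp[$value])) {
        // we have seen this value before: it is the duplicate
        echo $value;
        break;
    }
    // remember this value for later lookups
    $temp[$value] = $value;
}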

This way, the worst-case time complexity is O(n), but the worst-case space complexity is O(n) as well: we need another array as temporary storage, so this algorithm may use a lot of memory.
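
The same lookup idea can be packaged as a function that also reports how many elements it had to read, which makes the worst-case O(n) point concrete: it stops as soon as the second copy shows up, and reads the whole array only when that copy is the last element. The function name and the step counter are illustrative additions, not part of the original code.

```php
<?php
// Sketch of approach 1: remember every value already seen in $seen and stop
// at the first value that is already there. Returns [duplicate, elementsRead],
// or [null, elementsRead] if no value repeats.
function findDuplicateWithLookup(array $numbers): array
{
    $seen = array();
    $read = 0;
    foreach ($numbers as $value) {
        $read++;
        if (isset($seen[$value])) {
            return array($value, $read); // second copy found: stop early
        }
        $seen[$value] = true; // extra memory grows with every new value
    }
    return array(null, $read);
}

// Second copy near the front: only 3 elements are read.
print_r(findDuplicateWithLookup(array(1, 2, 2, 3, 4, 5)));
// Second copy at the very end: all 6 elements are read (worst case).
print_r(findDuplicateWithLookup(array(1, 2, 3, 4, 5, 5)));
```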

2. Another idea: compute the sum of 1 to 1000, say $s1, and the sum of the numbers in the $numbers array, say $s2. Since $numbers contains every number from 1 to 1000 once plus one extra copy of the duplicate, the duplicate must be $s2 - $s1. Because 1 to 1000 is an arithmetic sequence (http://en.wikipedia.org/wiki/Arithmetic_series), there is a simple formula for its sum.
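
Written out, with n standing for $limit and d for the duplicate value (both symbols introduced here for illustration), the argument is the following; the second line is a worked example for n = 1000 assuming the duplicate happened to be 7.

```latex
s_1 = \sum_{k=1}^{n} k = \frac{n(n+1)}{2}, \qquad s_2 = s_1 + d \quad\Longrightarrow\quad d = s_2 - s_1

n = 1000:\quad s_1 = \frac{1000 \cdot 1001}{2} = 500500; \qquad d = 7 \;\Rightarrow\; s_2 = 500507,\; s_2 - s_1 = 7
```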

// sum of the arithmetic series 1 + 2 + ... + $limit
$s1 = ($limit + 1) * $limit / 2;
$s2 = 0;
for ($i = 0; $i <= $limit; $i++) {
    $s2 += $numbers[$i];
}
echo $s2 - $s1;

For $s2, as I said, we are not allowed to use special PHP array functions, so array_sum() is not allowed here.
This approach always reads the whole array, so its time complexity is always O(n). The first approach is also O(n) in the worst case, but it stops as soon as it meets the second copy, so on average it reads fewer elements. However, the space complexity here is O(1): apart from two integers, we need practically no extra memory to get the task done.
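
A sketch of approach 2 as a reusable function, assuming, as in the problem statement, that every value from 1 to $limit appears exactly once apart from the single duplicate (if that assumption is broken, the result is meaningless). The function name is hypothetical. With $limit = 10, the values 1 to 10 sum to 55, so an array that repeats 7 sums to 62, and 62 - 55 = 7.

```php
<?php
// Sketch of approach 2: duplicate = (sum of the array) - (1 + 2 + ... + $limit).
// Assumes $numbers holds every value 1..$limit once plus one extra copy.
function findDuplicateBySum(array $numbers, int $limit): int
{
    // closed-form sum of the arithmetic series 1..$limit
    $s1 = intdiv($limit * ($limit + 1), 2);
    $s2 = 0;
    for ($i = 0; $i <= $limit; $i++) { // $limit + 1 elements in total
        $s2 += $numbers[$i];
    }
    return $s2 - $s1;
}

$example = array(1, 2, 3, 4, 7, 5, 6, 7, 8, 9, 10); // sums to 62
echo findDuplicateBySum($example, 10), "\n"; // 62 - 55, prints 7
```

Only two running integers are kept no matter how large the array is, which is where the O(1) space comes from.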
