Wednesday, October 31, 2012

Heap Sort

A few days ago I was reviewing the heap sort algorithm when I came across this piece of code:
 public void sort(int[] array) {
  for(int i = (array.length/2) - 1; i >= 0; i--) {
   siftDown(array, i, array.length);
  }

  int temp;

  for (int i = array.length - 1; i >= 1; i--) {
   temp = array[0];
   array[0] = array[i];
   array[i] = temp;
   siftDown(array, 0, i - 1);
  }
 }

 private void siftDown(int[] array, int root, int bottom) {
  boolean done = false;
  int maxChild;
  int temp;

  while ((root * 2 <= bottom) && (!done)) {
   if (root * 2 == bottom)
    maxChild = root * 2;
   else if (array[root * 2] > array[root * 2 + 1])
    maxChild = root * 2;
   else
    maxChild = root * 2 + 1;

   if (array[root] < array[maxChild]) {
    temp = array[root];
    array[root] = array[maxChild];
    array[maxChild] = temp;
    root = maxChild;
   } else {
    done = true;
   }
  }
 }
You're head's probably starting to hurt already.  What the heck is it doing?  Where's the documentation?

It should be noted that this is a modified version that I made to take advantage of Java's length property for arrays.  If you want to see what the original version looked like you can click here, but I don't think they were the original authors of it though. Unfortunately I've seen it being used a couple of times to teach Heap Sort.

If your not familiar with heap sort, or your memory is a little foggy, it's a sorting algorithm that uses heaps to do all the heavy lifting.  A heap is a complete binary tree, that means each level in the tree is filled from left to right.  The above piece of code uses an array to store elements in the tree, which is more efficient than using a linked structure of nodes.

There are two types of heaps; min heaps and max heaps.  The above version is a max heap, which means the root contains the maximum value in the tree.  Once the heap has been built, the algorithm swaps the first and last elements in the array.  The size of the heap is then truncated and rebuilt.  Again the first and "last" elements in the array are swapped.  The process is then repeated until all the elements have been sorted.

The above code also starts in the middle of the array because half the nodes in a complete binary tree are leaves.  Here's a simple visual to show this:

Let n = # of nodes
Let f(n) be function that returns the number of leaves in the heap.
f(n) = ceil(n/2)

With zero nodes, f(0) = ceil(0/2) = 0 leaves

With one node, f(1) = ceil(1/2) = ceil(0.5) = 1 leaf

With two nodes, f(2) = ceil(2/2) = ceil(1) = 1 leaf

With three nodes f(3) = ceil(3/2) = ceil(1.5) = 2 leaves

With four nodes f(4) = ceil(4/2) = ceil(2) = 2 leaves

Well that's the way it's supposed to work, but there's a bug with the above piece of code.  It throws an ArrayIndexOutOfBounds for even length arrays.  Another version I came across that's slightly better is:
 public void sort(int[] array) {
  heapify(array);
  int last = array.length - 1;

  while (last > 0) {
   swap(array, last, 0);
   last = last - 1;
   siftdown(array, 0, last);
  }
 }

 private void heapify(int[] array) {
  int start = array.length/2 - 1;

  while (start >= 0) {
   siftdown(array, start, array.length - 1);
   start = start - 1;
  }
 }

 private void siftdown(int[] array, int start, int end) {
  int root = start;
  int lchild, swap;

  while ((root * 2 + 1) <= end) {
   lchild = root * 2 + 1;
   swap = root;

   if (array[swap] < array[lchild]) {
    swap = lchild;
   }

   if (((lchild + 1) <= end) && (array[swap] < array[lchild + 1])) {
    swap = lchild + 1;
   }

   if (swap != root) {
    swap(array, root, swap);
    root = swap;
   }
  }
 }

 private void swap(int[] array, int x, int y) {
  int temp;

  temp = array[x];
  array[x] = array[y];
  array[y] = temp;
 }
Again this is a slightly modified version that takes advantage of Java's length property for arrays.  I also added a swap method.  The original version can be found here.

I say it's slightly better than the first because I can at least reason through the code.  I still don't understand why "(root * 2) == bottom" is being tested in the first implementation of heap sort or why the previous node's right child is being looked at instead of its own right child.

This second implementation has it's problems though.   For starters is the lack of decent variable names, like lchild instead of leftChild.  It also breaks with convention by using the word swap as a variable identifier.

Like the first attempt, this version also has a bug.  This time its an infinite loop.  If you try to sort the set { 73, 6, 57, 88, 60, 42, 83, 72, 48, 85 }, it will get stuck trying to siftdown the third node in the tree (whose number is 88).  The left and right child for this node happen to be 72 and 48.  Unfortunately this algorithm assumes all the nodes are in the wrong place unless its a leaf and so repeatedly tries to siftdown the value 88.

Here's the version that I came up with.  It uses recursion instead of a while-loop, making it that much more cleaner.
 public void sort(int[] array) {

  for (int i = (array.length / 2) - 1; i >= 0; i--) {
   siftdown(array, i, array.length - 1);
  }

  for (int i = array.length - 1; i >= 1; i--) {
   swap(array, 0, i);
   siftdown(array, 0, i - 1);
  }
 }

 public void siftdown(int[] array, int parent, int end) {

  int maxChild = parent;
  int leftChild = (parent * 2) + 1;
  int rightChild = leftChild + 1;

  if ((leftChild <= end) && (array[maxChild] < array[leftChild])) {
   maxChild = leftChild;
  }

  if ((rightChild <= end) && (array[maxChild] < array[rightChild])) {
   maxChild = rightChild;
  }

  if (parent != maxChild) {
   swap(array, parent, maxChild);
   siftdown(array, maxChild, end);
  }
 }

 private void swap(int array[], int x, int y) {
  int temp = array[x];
  array[x] = array[y];
  array[y] = temp;
 }

No comments:

Post a Comment