Red-black tree (Java)

From LiteratePrograms
Jump to: navigation, search
Other implementations: C | Java

A red-black tree is a type of self-balancing binary search tree typically used to implement associative arrays. It has O(log n) worst-case time for each operation and is quite efficient in practice. Unfortunately, it's also quite complex to implement, requiring a number of subtle cases for both insertion and deletion.

This article walks through a Java implementation of red-black trees, organized in a way to make correctness and completeness easier to understand. For all practical purposes you should use Java's TreeMap class instead.

Contents

[edit] Node structure and node relationships

A red-black tree is a type of binary search tree, so each node in the tree has a parent (except the root node) and at most two children. The tree as a whole will be identified by its root node. For ease of implementation, we will have each node retain a pointer to both its children as well as its parent node (null for the root). Keeping parent nodes costs space and is not strictly necessary, but makes it easy to follow the tree in any direction without maintaining an auxiliary stack. Our red-black tree will implement an associative array, and we will allow key and values of any class type:

<<RBTree.java>>=
enum Color { RED, BLACK }

class Node<K extends Comparable<? super K>,V>
{
    public K key;
    public V value;
    public Node<K,V> left;
    public Node<K,V> right;
    public Node<K,V> parent;
    public Color color;

    node constructor
    node relationships
}

public class RBTree<K extends Comparable<? super K>,V>
{
    constants
    public Node<K,V> root;

    create operation
    verify properties functions
    lookup operation
    rotation operations
    replace node operation
    insertion operation
    deletion operation
    print operation
    main method
}

Each node also stores its color, either red or black, using an enumeration. The role of the color bit will be explained in the properties. There is some internal fragmentation due to using a reference type to store a single bit, but we avoid optimizing this here for simplicity.

The parent of a node n is available by simply using n.parent. We're also interested in three other more complex relationships between nodes:

  • The grandparent of a node, its parent's parent. We use assertions to make sure that we don't attempt to access the grandparent of a node that doesn't have one, such as the root node or its children:
<<node relationships>>=
public Node<K,V> grandparent() {
    assert parent != null; // Not the root node
    assert parent.parent != null; // Not child of root
    return parent.parent;
}
  • The sibling of a node, defined as the other child of its parent. Note that the sibling may be null, if the parent has only one child.
<<node relationships>>=
public Node<K,V> sibling() {
    assert parent != null; // Root node has no sibling
    if (this == parent.left)
        return parent.right;
    else
        return parent.left;
}
  • The uncle of a node, defined as the sibling of its parent. The uncle may also be null, if the grandparent has only one child.
<<node relationships>>=
public Node<K,V> uncle() {
    assert parent != null; // Root node has no uncle
    assert parent.parent != null; // Children of root have no uncle
    return parent.sibling();
}

[edit] Properties

An example of a red-black tree

We will at all times enforce the following five properties, which provide a theoretical guarantee that the tree remains balanced. We will have a helper function verifyProperties() that asserts all five properties in a debug build, to help verify the correctness of our implementation and formally demonstrate their meaning. Note that many of these tests walk the tree, making them very expensive - for this reason we require the constant VERIFY_RBTREE to be true to turn them on.

As shown, the tree terminates in NIL leaves, which we represent using null (we set the child pointers of their parents to null). In an empty tree, the root pointer is null. This saves substantial space compared to explicit representation of leaves.

We will implement each of the testing methods as static methods of the RBTree class that take a Node argument. We could have implemented them as methods of Node; but sometimes we want to test properties on leaf nodes, which we represent with null; and we can't invoke a method on a null reference, so it is easier this way.

<<constants>>=
public static final boolean VERIFY_RBTREE = true;
<<verify properties functions>>=
public void verifyProperties() {
    if (VERIFY_RBTREE) {
        verifyProperty1(root);
        verifyProperty2(root);
        // Property 3 is implicit
        verifyProperty4(root);
        verifyProperty5(root);
    }
}

1. Each node is either red or black:

Technically speaking, because the way colors are represented by the Color enumeration, which only has the values RED and BLACK, the only way this can fail is if the color reference is null.

<<verify properties functions>>=
private static void verifyProperty1(Node<?,?> n) {
    assert nodeColor(n) == Color.RED || nodeColor(n) == Color.BLACK;
    if (n == null) return;
    verifyProperty1(n.left);
    verifyProperty1(n.right);
}

2. The root node is black.

<<verify properties functions>>=
private static void verifyProperty2(Node<?,?> root) {
    assert nodeColor(root) == Color.BLACK;
}

3. All leaves (shown as NIL in the above diagram) are black and contain no data. Since we represent these empty leaves using null, this property is implicitly assured by always treating null as black. To this end we create a nodeColor() helper function:

<<verify properties functions>>=
private static Color nodeColor(Node<?,?> n) {
    return n == null ? Color.BLACK : n.color;
}

4. Every red node has two children, and both are black (or equivalently, the parent of every red node is black).

<<verify properties functions>>=
private static void verifyProperty4(Node<?,?> n) {
    if (nodeColor(n) == Color.RED) {
        assert nodeColor(n.left)   == Color.BLACK;
        assert nodeColor(n.right)  == Color.BLACK;
        assert nodeColor(n.parent) == Color.BLACK;
    }
    if (n == null) return;
    verifyProperty4(n.left);
    verifyProperty4(n.right);
}

5. All paths from any given node to its leaf nodes contain the same number of black nodes. This one is the trickiest to verify; we do it by traversing the tree, incrementing a black node count as we go. The first time we reach a leaf we save the count. We return the count so that when we subsequently reach other leaves, we compare the count to the saved count.

<<verify properties functions>>=
private static void verifyProperty5(Node<?,?> root) {
    verifyProperty5Helper(root, 0, -1);
}

private static int verifyProperty5Helper(Node<?,?> n, int blackCount, int pathBlackCount) {
    if (nodeColor(n) == Color.BLACK) {
        blackCount++;
    }
    if (n == null) {
        if (pathBlackCount == -1) {
            pathBlackCount = blackCount;
        } else {
            assert blackCount == pathBlackCount;
        }
        return pathBlackCount;
    }
    pathBlackCount = verifyProperty5Helper(n.left,  blackCount, pathBlackCount);
    pathBlackCount = verifyProperty5Helper(n.right, blackCount, pathBlackCount);
    return pathBlackCount;
}

Properties 4 and 5 together guarantee that no path in the tree is more than about twice as long as any other path, which guarantees that it has O(log n) height.

[edit] Operations

[edit] Construction

The constructor of RBTree initializes an empty tree, which is represented by a tree with a null root.

<<create operation>>=
public RBTree() {
    root = null;
    verifyProperties();
}

We have a constructor for Node:

<<node constructor>>=
public Node(K key, V value, Color nodeColor, Node<K,V> left, Node<K,V> right) {
    this.key = key;
    this.value = value;
    this.color = nodeColor;
    this.left = left;
    this.right = right;
    if (left  != null)  left.parent = this;
    if (right != null) right.parent = this;
    this.parent = null;
}

[edit] Search

Read-only operations on a red-black tree, such as searching for a key and getting the corresponding value, require no modification from those used for binary search trees, because every red-black tree is a specialization of a simple binary search tree.

We begin by creating a helper function that gets a pointer to the node with a given key. If the key is not found, it returns null. This will be useful later for deletion:

<<lookup operation>>=
private Node<K,V> lookupNode(K key) {
    Node<K,V> n = root;
    while (n != null) {
        int compResult = key.compareTo(n.key);
        if (compResult == 0) {
            return n;
        } else if (compResult < 0) {
            n = n.left;
        } else {
            assert compResult > 0;
            n = n.right;
        }
    }
    return n;
}

We are using the natural ordering of the key type K, which we have required earlier to implement the Comparable interface.

Now looking up a value is straightforward, by finding the node and extracting the data if lookup succeeded. We return null if the key was not found (implying that null cannot be used as a value unless all lookups are expected to succeed).

<<lookup operation>>=
public V lookup(K key) {
    Node<K,V> n = lookupNode(key);
    return n == null ? null : n.value;
}

[edit] Rotations

Both insertion and deletion rely on a fundamental operation for reducing tree height called a rotation. A rotation locally changes the structure of the tree without changing the in-order order of the sequence of values that it stores.

Tree rotation.png

We create two helper functions, one to perform a left rotation and one to perform a right rotation; each takes the highest node in the subtree as an argument:

<<rotation operations>>=
private void rotateLeft(Node<K,V> n) {
    Node<K,V> r = n.right;
    replaceNode(n, r);
    n.right = r.left;
    if (r.left != null) {
        r.left.parent = n;
    }
    r.left = n;
    n.parent = r;
}

private void rotateRight(Node<K,V> n) {
    Node<K,V> l = n.left;
    replaceNode(n, l);
    n.left = l.right;
    if (l.right != null) {
        l.right.parent = n;
    }
    l.right = n;
    n.parent = l;
}

Here, replaceNode() is a helper function that cuts a node away from its parent, substituting a new node (or null) in its place. It simplifies consistent updating of parent and child pointers. It needs the tree passed in because it may change which node is the root.

<<replace node operation>>=
private void replaceNode(Node<K,V> oldn, Node<K,V> newn) {
    if (oldn.parent == null) {
        root = newn;
    } else {
        if (oldn == oldn.parent.left)
            oldn.parent.left = newn;
        else
            oldn.parent.right = newn;
    }
    if (newn != null) {
        newn.parent = oldn.parent;
    }
}

We'll find replaceNode() useful again later on when discussing insertion and deletion.

[edit] Insertion

When inserting a new value, we first insert it into the tree as we would into an ordinary binary search tree. If the key already exists, we just replace the value (since we're implementing an associative array). Otherwise, we find the place in the tree where the new pair belongs, then attach a newly created red node containing the value:

<<insertion operation>>= 
public void insert(K key, V value) {
    Node<K,V> insertedNode = new Node<K,V>(key, value, Color.RED, null, null);
    if (root == null) {
        root = insertedNode;
    } else {
        Node<K,V> n = root;
        while (true) {
            int compResult = key.compareTo(n.key);
            if (compResult == 0) {
                n.value = value;
                return;
            } else if (compResult < 0) {
                if (n.left == null) {
                    n.left = insertedNode;
                    break;
                } else {
                    n = n.left;
                }
            } else {
                assert compResult > 0;
                if (n.right == null) {
                    n.right = insertedNode;
                    break;
                } else {
                    n = n.right;
                }
            }
        }
        insertedNode.parent = n;
    }
    insertCase1(insertedNode);
    verifyProperties();
}

The problem is that the resulting tree may not satisfy our five red-black tree properties. The call to insertCase1() above begins the process of correcting the tree so that it satisfies the properties once more.

Case 1: In this case, the new node is now the root node of the tree. Since the root node must be black, and changing its color adds the same number of black nodes to every path, we simply recolor it black. Because only the root node has no parent, we can assume henceforth that the node has a parent.

<<insertion operation>>= 
private void insertCase1(Node<K,V> n) {
    if (n.parent == null)
        n.color = Color.BLACK;
    else
        insertCase2(n);
}

Case 2: In this case, the new node has a black parent. All the properties are still satisfied and we return.

<<insertion operation>>= 
private void insertCase2(Node<K,V> n) {
    if (nodeColor(n.parent) == Color.BLACK)
        return; // Tree is still valid
    else
        insertCase3(n);
}
Diagram of case 3

Case 3: In this case, the uncle node is red. We recolor the parent and uncle black and the grandparent red. However, the red grandparent node may now violate the red-black tree properties; we recursively invoke this procedure on it from case 1 to deal with this.

<<insertion operation>>= 
void insertCase3(Node<K,V> n) {
    if (nodeColor(n.uncle()) == Color.RED) {
        n.parent.color = Color.BLACK;
        n.uncle().color = Color.BLACK;
        n.grandparent().color = Color.RED;
        insertCase1(n.grandparent());
    } else {
        insertCase4(n);
    }
}
Diagram of case 4

Case 4: In this case, we deal with two cases that are mirror images of one another:

  • The new node is the right child of its parent and the parent is the left child of the grandparent. In this case we rotate left about the parent.
  • The new node is the left child of its parent and the parent is the right child of the grandparent. In this case we rotate right about the parent.

Neither of these fixes the properties, but they put the tree in the correct form to apply case 5.

<<insertion operation>>= 
void insertCase4(Node<K,V> n) {
    if (n == n.parent.right && n.parent == n.grandparent().left) {
        rotateLeft(n.parent);
        n = n.left;
    } else if (n == n.parent.left && n.parent == n.grandparent().right) {
        rotateRight(n.parent);
        n = n.right;
    }
    insertCase5(n);
}
Diagram of case 5

Case 5: In this final case, we deal with two cases that are mirror images of one another:

  • The new node is the left child of its parent and the parent is the left child of the grandparent. In this case we rotate right about the grandparent.
  • The new node is the right child of its parent and the parent is the right child of the grandparent. In this case we rotate left about the grandparent.

Now the properties are satisfied and all cases have been covered.

<<insertion operation>>= 
void insertCase5(Node<K,V> n) {
    n.parent.color = Color.BLACK;
    n.grandparent().color = Color.RED;
    if (n == n.parent.left && n.parent == n.grandparent().left) {
        rotateRight(n.grandparent());
    } else {
        assert n == n.parent.right && n.parent == n.grandparent().right;
        rotateLeft(n.grandparent());
    }
}

Note that inserting is actually in-place, since all the calls above use tail recursion. Moreover, it performs at most two rotations, since the only recursive call occurs before making any rotations.

[edit] Removal

We begin by finding the node to be deleted with lookupNode() and deleting it precisely as we would in a binary search tree. There are two cases for removal, depending on whether the node to be deleted has at most one, or two non-leaf children. A node with at most one non-leaf child can simply be replaced with its non-leaf child. When deleting a node with two non-leaf children, we copy the value from the in-order predecessor (the maximum or rightmost element in the left subtree) into the node to be deleted, and then we then delete the predecessor node, which has only one non-leaf child. This same procedure also works in a red-black tree without affecting any properties.

Error creating thumbnail: /bin/bash: rsvg: command not found
<<deletion operation>>= 
public void delete(K key) {
    Node<K,V> n = lookupNode(key);
    if (n == null)
        return;  // Key not found, do nothing
    if (n.left != null && n.right != null) {
        // Copy key/value from predecessor and then delete it instead
        Node<K,V> pred = maximumNode(n.left);
        n.key   = pred.key;
        n.value = pred.value;
        n = pred;
    }

    assert n.left == null || n.right == null;
    Node<K,V> child = (n.right == null) ? n.left : n.right;
    if (nodeColor(n) == Color.BLACK) {
        n.color = nodeColor(child);
        deleteCase1(n);
    }
    replaceNode(n, child);

    verifyProperties();
}

The maximumNode() helper function just walks right until it reaches the last non-leaf:

<<deletion operation>>= 
private static <K extends Comparable<? super K>,V> Node<K,V> maximumNode(Node<K,V> n) {
    assert n != null;
    while (n.right != null) {
        n = n.right;
    }
    return n;
}

However, before deleting the node, we must ensure that doing so does not violate the red-black tree properties. If the node we delete is black, and we cannot change its child from red to black to compensate, then we would have one less black node on every path through the child node. We must adjust the tree around the node being deleted to compensate.

Case 1: In this case, N has become the root node. The deletion removed one black node from every path, so no properties are violated.

<<deletion operation>>= 
private void deleteCase1(Node<K,V> n) {
    if (n.parent == null)
        return;
    else
        deleteCase2(n);
}
Diagram of case 2

Case 2: N has a red sibling. In this case we exchange the colors of the parent and sibling, then rotate about the parent so that the sibling becomes the parent of its former parent. This does not restore the tree properties, but reduces the problem to one of the remaining cases.

<<deletion operation>>= 
private void deleteCase2(Node<K,V> n) {
    if (nodeColor(n.sibling()) == Color.RED) {
        n.parent.color = Color.RED;
        n.sibling().color = Color.BLACK;
        if (n == n.parent.left)
            rotateLeft(n.parent);
        else
            rotateRight(n.parent);
    }
    deleteCase3(n);
}
Diagram of case 3

Case 3: In this case N's parent, sibling, and sibling's children are black. In this case we paint the sibling red. Now all paths passing through N's parent have one less black node than before the deletion, so we must recursively run this procedure from case 1 on N's parent.

<<deletion operation>>= 
private void deleteCase3(Node<K,V> n) {
    if (nodeColor(n.parent) == Color.BLACK &&
        nodeColor(n.sibling()) == Color.BLACK &&
        nodeColor(n.sibling().left) == Color.BLACK &&
        nodeColor(n.sibling().right) == Color.BLACK)
    {
        n.sibling().color = Color.RED;
        deleteCase1(n.parent);
    }
    else
        deleteCase4(n);
}
Diagram of case 4

Case 4: N's sibling and sibling's children are black, but its parent is red. We exchange the colors of the sibling and parent; this restores the tree properties.

<<deletion operation>>= 
private void deleteCase4(Node<K,V> n) {
    if (nodeColor(n.parent) == Color.RED &&
        nodeColor(n.sibling()) == Color.BLACK &&
        nodeColor(n.sibling().left) == Color.BLACK &&
        nodeColor(n.sibling().right) == Color.BLACK)
    {
        n.sibling().color = Color.RED;
        n.parent.color = Color.BLACK;
    }
    else
        deleteCase5(n);
}
Diagram of case 5

Case 5: There are two cases handled here which are mirror images of one another:

  • N's sibling S is black, S's left child is red, S's right child is black, and N is the left child of its parent. We exchange the colors of S and its left sibling and rotate right at S.
  • N's sibling S is black, S's right child is red, S's left child is black, and N is the right child of its parent. We exchange the colors of S and its right sibling and rotate left at S.

Both of these function to reduce us to the situation described in case 6.

<<deletion operation>>= 
private void deleteCase5(Node<K,V> n) {
    if (n == n.parent.left &&
        nodeColor(n.sibling()) == Color.BLACK &&
        nodeColor(n.sibling().left) == Color.RED &&
        nodeColor(n.sibling().right) == Color.BLACK)
    {
        n.sibling().color = Color.RED;
        n.sibling().left.color = Color.BLACK;
        rotateRight(n.sibling());
    }
    else if (n == n.parent.right &&
             nodeColor(n.sibling()) == Color.BLACK &&
             nodeColor(n.sibling().right) == Color.RED &&
             nodeColor(n.sibling().left) == Color.BLACK)
    {
        n.sibling().color = Color.RED;
        n.sibling().right.color = Color.BLACK;
        rotateLeft(n.sibling());
    }
    deleteCase6(n);
}
Diagram of case 6

Case 6: There are two cases handled here which are mirror images of one another:

  • N's sibling S is black, S's right child is red, and N is the left child of its parent. We exchange the colors of N's parent and sibling, make S's right child black, then rotate left at N's parent.
  • N's sibling S is black, S's left child is red, and N is the right child of its parent. We exchange the colors of N's parent and sibling, make S's left child black, then rotate right at N's parent.

This accomplishes three things at once:

  • We add a black node to all paths through N, either by adding a black S to those paths or by recoloring N's parent black.
  • We remove a black node from all paths through S's red child, either by removing P from those paths or by recoloring S.
  • We recolor S's red child black, adding a black node back to all paths through S's red child.

S's left child has become a child of N's parent during the rotation and so is unaffected.

<<deletion operation>>= 
private void deleteCase6(Node<K,V> n) {
    n.sibling().color = nodeColor(n.parent);
    n.parent.color = Color.BLACK;
    if (n == n.parent.left) {
        assert nodeColor(n.sibling().right) == Color.RED;
        n.sibling().right.color = Color.BLACK;
        rotateLeft(n.parent);
    }
    else
    {
        assert nodeColor(n.sibling().left) == Color.RED;
        n.sibling().left.color = Color.BLACK;
        rotateRight(n.parent);
    }
}

Again, the function calls all use tail recursion, so the algorithm is in-place. Additionally, no recursive calls will be made after a rotation, so no more than three rotations are made.

[edit] Printing

We also implement a method for printing the tree to standard output. This allows us to check that the tree looks as we expect, as well as providing a way to visualize the results of an operation. We print the right subtree before the left subtree so that the tree is displayed sideways. Only the keys are displayed.

<<constants>>=
private static final int INDENT_STEP = 4;

<<print operation>>=
public void print() {
    printHelper(root, 0);
}

private static void printHelper(Node<?,?> n, int indent) {
    if (n == null) {
        System.out.print("<empty tree>");
        return;
    }
    if (n.right != null) {
        printHelper(n.right, indent + INDENT_STEP);
    }
    for (int i = 0; i < indent; i++)
        System.out.print(" ");
    if (n.color == Color.BLACK)
        System.out.println(n.key);
    else
        System.out.println("<" + n.key + ">");
    if (n.left != null) {
        printHelper(n.left, indent + INDENT_STEP);
    }
}


[edit] Test driver

To ensure that all the cases of the complex insert and delete operations are exercised, we will perform a large number of operations on some simple integer data. All properties are verified after each operation, providing strong evidence of correctness.

<<main method>>=
public static void main(String[] args) {
    RBTree<Integer,Integer> t = new RBTree<Integer,Integer>();
    t.print();

    java.util.Random gen = new java.util.Random();

    for (int i = 0; i < 5000; i++) {
        int x = gen.nextInt(10000);
        int y = gen.nextInt(10000);

        t.print();
        System.out.println("Inserting " + x + " -> " + y);
        System.out.println();

        t.insert(x, y);
        assert t.lookup(x).equals(y);
    }
    for (int i = 0; i < 60000; i++) {
        int x = gen.nextInt(10000);

        t.print();
        System.out.println("Deleting key " + x);
        System.out.println();

        t.delete(x);
    }
}

We print the tree after each operation.

Download code
hijacker
hijacker
hijacker
hijacker