Skip list (C Plus Plus)

From LiteratePrograms
Jump to: navigation, search
Other implementations: C++ | Java

This article describes an implementation of the skip list data structure written in C++.

Traditionally balanced trees have been used to efficiently implement Set and HashMap style data structures. Balanced tree algorithms are characterized by the need to continually rebalance the tree as operations are performed. This is necessary in order to assure good performance.

Skip Lists are a data structure that was developed by William Pugh as a probabilistic alternative to balanced trees. The balancing of a Skip List is achieved through the use of a random number generator.

The benefit of the Skip List is that the algorithms for search, insertion and deletion are rather simple. This provides improved constant time overhead as compared to balanced trees. The algorithms involved are also relatively easy to understand.

In a singly linked list each node contains one pointer to the next node in the list. A node in a Skip List contains an array of pointers. The size of this array is chosen at random (between 0 and some MAX_LEVEL) at the time when the node is created. The size of this array determines the level of the node. For example, a level 3 node has 4 forward pointers, indexed from 0 to 3. The pointer at index 0 (called the level 0 forward pointer) always points to the immediate next node in the list. The other pointers however point to nodes further down the list, and as such if followed allow portions of the list to be skipped over. A level i pointer points to the next node in the list that has level >= i.

What follows is an implementation of a (sorted) Set data structure based on Skip Lists, written in C++.

<<skip_list.cpp>>=
Includes
Using namespace
Defines
Structure Definitions
Random Function
Random Level Function
Print Function
Search Function
Insert Function
Delete Function
Main Function
<<Includes>>=
#include <iostream>
#include <cstdlib>
<<Using namespace>>=
using namespace std;

Contents

[edit] Random Levels

Before getting to the main algorithms it is a good idea to tackle the problem of generating random levels for nodes. Each node that is created will have a random level between 0 and MAX_LEVEL inclusive. First we create a function that returns a float value between 0.0 and 1.0. The rand() function returns a pseudo-random integer between 0 and RAND_MAX. In order to get a float between 0.0 and 1.0 we divide this value by RAND_MAX.

<<Random Function>>=
float frand() {
    return (float) rand() / RAND_MAX;
}

The random number generator should be seeded with the current system time so that the same sequence of random numbers isn't generated every time the program is run.

<<Seed Random Generator>>=
srand( (unsigned)time( NULL ) );
<<Includes>>=
#include <ctime>

The desired function will return a random level between 0 and MAX_LEVEL. A probability distribution where half of the nodes that have level i pointers also have level i+1 pointers is used. This gives us a 50% chance of the random_level() function returning 0, a 25% chance of returning 1, a 12.5% chance of returning 2 and so on.

This function will seed the random number generator the first time it is called. This is achieved through the use of a static variable that keeps its value between function calls. This allows the function to seed the generator the first time it is called, and to remember not to seed it again on subsequent calls.

<<Defines>>=
const float P = 0.5;
<<Random Level Function>>=
int random_level() {
    static bool first = true;

    if (first) {
        Seed Random Generator
        first = false;
    }

    int lvl = (int)(log(frand())/log(1.-P));
    return lvl < MAX_LEVEL ? lvl : MAX_LEVEL;
} 
<<Includes>>=
#include <cmath>

[edit] Structure Definitions

Each element in the list is stored in a node. The level of the node is chosen randomly when the node is created. Each SkipNode stores an array of forward pointers as well as a value which represents the element stored in the node. The type of this element is templated. We will assume that this type has a default constructor and a < operator.

The maximum level index is 6. C arrays are indexed starting at 0, therefore each node will have between one and seven forward pointers.

<<Defines>>=
const int MAX_LEVEL = 6;
<<Structure Definitions>>=
template <class T>
struct SkipNode {
    T value;
    SkipNode<T> **forward; // array of pointers

    SkipNode Constructor
    SkipNode Destructor
};

To create a SkipNode we must first allocate memory for the node itself then allocate memory for the array of forward pointers.

The implementation of skip lists given by William Pugh states that the list should be terminated by a special NIL node that stores a value greater than any legal value. This stipulation was made so that the algorithms described in his paper would be very simple as they never had to explicitly check for pointers pointing at NIL. I will instead set the initial value of all pointer fields to the special C value NULL which is defined as 0.

A level 0 node will have one forward pointer, a level 1 node will have two forward pointers and so on. Therefore we need to add one to the level when allocating memory for the array of forward pointers.

<<SkipNode Constructor>>=
SkipNode(int level, const T &value) 
{
    forward = new SkipNode<T> * [level + 1];
    memset(forward, 0, sizeof(SkipNode<T>*)*(level + 1));
    this->value = value;
}
<<SkipNode Destructor>>=
~SkipNode() 
{
    delete [] forward;
}

A structure that represents a SkipSet is defined. It stores a pointer to a header node. The value stored in the header node is irrelevant and is never accessed. The current level of the set is also stored as this is needed by the insert, delete and search algorithms.

<<Structure Definitions>>=
template <class T>
struct SkipSet {
    SkipNode<T> *header;
    int level;

    SkipSet Constructor
    SkipSet Destructor
    void print() const;
    bool contains(const T &) const;
    void insert(const T &);
    void erase(const T &);
};


To create a SkipSet we must first allocate memory for the structure and then allocate memory for the header node. The initial level of the set is 0 because C arrays are indexed starting at 0.

<<SkipSet Constructor>>=
SkipSet() {
    header = new SkipNode<T>(MAX_LEVEL, T());
    level = 0;
}
<<SkipSet Destructor>>=
~SkipSet() {
    delete header;
}

[edit] Printing a SkipSet

Before getting to the algorithms for insert, delete and search, let us first start off with a function that will print the contents of a skip set to the console. This function simply traverses the level 0 pointers and visits every node while printing out the values.

<<Print Function>>=
template <class T>
void SkipSet<T>::print() const {
    const SkipNode<T> *x = header->forward[0];
    cout << "{";
    while (x != NULL) {
        cout << x->value;
        x = x->forward[0];
        if (x != NULL)
            cout << ",";
    }    
    cout << "}\n";
}

[edit] Search Algorithm

The search algorithm will return true if the given value is stored in the set, otherwise false.

The pointer x is set to the header node of the list. The search begins at the topmost pointer of the header node according to the current level of the list. The forward pointers at this level are traversed until either a NULL pointer is encountered or a value greater than the value being searched for is encountered. When this occurs we attempt to continue the search at the next lower level until we have traversed as far as possible. At this point x will point at the node with the greatest value that is less than the value being searched for. In the case that the search value is less than any value in the list, or if the list is empty, then x will still be pointing at the header node.

Finally the level 0 pointer is traversed once. So now there are three possibilities for x.

  • x is pointing at the node with the value we are searching for.
  • x is pointing at a node with value greater than the value we are searching for.
  • x is NULL.

In the first case the search was successful.

<<Find Node>>=
for (int i = level; i >= 0; i--) {
    while (x->forward[i] != NULL && x->forward[i]->value < search_value) {
        x = x->forward[i];
    }
}
x = x->forward[0];
<<Search Function>>=
template <class T>
bool SkipSet<T>::contains(const T &search_value) const {
    const SkipNode<T> *x = header;
    Find Node
    return x != NULL && x->value == search_value;
}


[edit] Insert Algorithm

To insert a value we first perform the same kind of search as in the search algorithm. But, in order to insert a new node into the list we must maintain an array of pointers to the nodes that must be updated.

<<Find and record updates>>=
for (int i = level; i >= 0; i--) {
    while (x->forward[i] != NULL && x->forward[i]->value < value) {
        x = x->forward[i];
    }
    update[i] = x; 
}
x = x->forward[0];

All of the pointers in the update array must be initialized to NULL.

<<Set update pointers to NULL>>=
memset(update, 0, sizeof(SkipNode<T>*)*(MAX_LEVEL + 1));

We need string.h for memset:

<<Includes>>=
#include <cstring>

If a node is found with the same value as the value that is to be inserted then nothing should be done (a mathematical set cannot contain duplicates). Otherwise we must create a new node and insert it into the list.

<<Insert Function>>=
template <class T>
void SkipSet<T>::insert(const T &value) {
    SkipNode<T> *x = header;	
    SkipNode<T> *update[MAX_LEVEL + 1];
    Set update pointers to NULL

    Find and record updates

    if (x == NULL || x->value != value) {        
        int lvl = random_level();
  
        Record header updates
        Insert new node
    }
}


If the new node is greater than any node already in the list then the header node must be updated and the level of the list must be set to the new level.

<<Record header updates>>=
if (lvl > level) {
    for (int i = level + 1; i <= lvl; i++) {
        update[i] = header;
    }
    level = lvl;
}

Two things must be done to insert the node. We must make the new node x point at what the nodes in the update vector are currently pointing at. Then we update the nodes in the |update| vector to point at x.

<<Insert new node>>=
x = new SkipNode<T>(lvl, value);
for (int i = 0; i <= lvl; i++) {
    x->forward[i] = update[i]->forward[i];
    update[i]->forward[i] = x;
}

[edit] Erase Algorithm

The deletion algorithm starts the same way as the insertion algorithm. We declare an update array and then search for the node to be deleted. If the node is found we set the nodes in the update array to point at what |x| is pointing at. This effectively removes x from the list and so the memory occupied by the node may be freed.

<<Delete Function>>=
template <class T>
void SkipSet<T>::erase(const T &value) {
    SkipNode<T> *x = header;	
    SkipNode<T> *update[MAX_LEVEL + 1];
    Set update pointers to NULL

    Find and record updates

    if (x->value == value) {
        Remove x from list
        delete x;
        Decrease list level
    }
}
<<Remove x from list>>=
for (int i = 0; i <= level; i++) {
    if (update[i]->forward[i] != x)
        break;
    update[i]->forward[i] = x->forward[i];
}

After deleting the node we must check to see if the level of the list must be lowered.

<<Decrease list level>>=
while (level > 0 && header->forward[level] == NULL) {
    level--;
}

[edit] Main function

For simple illustration purposes we create a SkipSet and perform some tests.

<<Main Function>>=
int main() {

    SkipSet<int> ss;
    ss.print();

    ss.insert(5);
    ss.insert(10);
    ss.insert(7);
    ss.insert(7);
    ss.insert(6);
    
    if (ss.contains(7)) {
        cout << "7 is in the list\n";
    }

    ss.print();

    ss.erase(7);
    ss.print();
    
    if (!ss.contains(7)) {
        cout << "7 has been deleted\n";
    }

    return 0;
}
Download code
hijacker
hijacker
hijacker
hijacker