Unrolled linked list (C Plus Plus)

From LiteratePrograms
Jump to: navigation, search

An unrolled linked list is a linked list that stores several data elements, up to some small fixed limit, in each node. It has benefits in cache performance and space usage over standard linked lists, especially for small elements such as bits, characters, words, and pointers.


This is an implementation of an unrolled linked list in C++. We will compare its performance for various element types to widely available implementations of standard list classes to show the benefit of the unrolled linked list.

This container class is fully compatible with the STL and the std::list interface, allowing it to be used in many cases as a drop-in replacement for std::list.

Contents

[edit] Internal representation

The list consists of a sequence of nodes, with each node containing up to some fixed number of elements. We choose this number at compile-time such that each node will be the size of a cache line (or a small multiple of the cache line size):

<<node structure>>=
struct node
{
    values_size computation

    node()
      : next(NULL), prev(NULL), count(0)
    {}

    node* next;
    node* prev;
    int   count;
    T values[values_size];
};

The next and prev pointers refer to the next and previous nodes in the list, or NULL if there isn't one. The count field determines how many entries of the values array are in use (the first count will be in use). Although we don't actually need to initialise each element of the array, it is necessary that T have a default constructor to write an implementation that allocates node structures in a straightforward manner.

The desired node structure size is passed in as a template parameter NODE_SIZE and we use it to statically compute the size of the values array. We choose a size that will keep the total structure size at most NODE_SIZE (assuming that no padding is done - it might be slightly larger). We also ensure that it is at least 1, regardless.

<<values_size computation>>=
static const int values_size_lb =
    (NODE_SIZE - 2*sizeof(node*) - sizeof(int))/sizeof(T);
static const int values_size =
    values_size_lb == 0 ? 1 : values_size_lb;

We need to know where our list starts, so our list needs a pointer to the head node of the list. We also need to keep track of the last node to facilitate rapid insertion at the end of the list (push_back()).

<<list private data>>=
node* _head;
node* _tail;

For simplicity, we never want head or tail to be NULL, so we represent the empty list with a single node containing no elements:

<<member initialisation>>=
_head(new node), _tail(_head) 

We also keep track of the number of elements in the list to permit a constant-time size() operation, which would otherwise take linear time:

<<list private data>>=
int _size;

Initially, the size will be 0:

<<member initialisation>>=
, _size(0)

[edit] Iterators

Iterators, objects that reference a particular element of a container, are an important concept in the STL library. Supporting iterators allow C++'s generic algorithms to work on your own data structures, provided that your iterator supports the necessary traits (discussed later).

[edit] Types

We'll provide a const_iterator, which does not allow the referenced element to be modified through it, as well as an iterator, which does allow this. We will first write the const_iterator and then derive iterator from it (see Singly linked list (C Plus Plus)#Types for the reasoning behind this). So we have this hierarchy:

<<iterators>>=
class const_iterator
  : public const_iterator baseclass
{
  public:
    const_iterator interface
  const_iterator state
};

class iterator
  : public const_iterator
{
  public:
    iterator
};

[edit] Traits

Another concept required to be fully compatible with STL are iterator traits. They help algorithms determine what operations an iterator supports. Some standard algorithms can use iterator traits to switch to a more efficient algorithm at compile-time when supplied with more powerful iterators.

Here are the traits for the const_iterator:

<<const_iterator interface>>=
typedef iterator category iterator_category;
typedef T         value_type;
typedef ptrdiff_t difference_type;
typedef T const*  pointer;
typedef T const&  reference;

Next we choose an iterator category. With next and prev pointers, we have efficient bidirectional movement but not random access, which although considerably faster than with a normal linked list still requires linear time. We choose a simple bidirectional iterator:

<<iterator category>>=
std::bidirectional_iterator_tag

Finally, our iterator must be derived from std::iterator:

<<const_iterator baseclass>>=
std::iterator<std::forward_iterator_tag, T, size_t, T const*, T const&>

[edit] State

To denote a single element of the list, our iterator must include both a pointer to the node that element resides in and the index of that element in the values array:

<<const_iterator state>>=
protected:
  node* _curnode;
  int _curindex;

If the iterator is pointing one element past the last element, _curnode will be the last node of the list and _curindex will be that node's count field (one past the index of the last element in the node). Construction is straightforward:

<<const_iterator interface>>=
const_iterator(node* cur_node = NULL, int cur_index = 0)
  : _curnode(cur_node), _curindex(cur_index)
{}

As we will see, the splitting and merging operations of unrolled linked lists make it impossible to guarantee that any existing iterator is not invalidated by such an operation. This is the primary conceptual price that we pay for unrolled linked lists.

[edit] Implementation

The remaining functions to do are element access and increment. These are straightforward to write, keeping in mind that we have to check the count field to determine when we've "used up" a node. Also, we will never "skip over" a node with zero count, because each node in the list must contain at least one element.

We also never allow _curnode to become NULL, as this would make backward movement of the iterator impossible (and is not the chosen representation of end() anyway). This check is not necessary for operator--, since it's invalid to move an iterator before the beginning of the list.

<<const_iterator interface>>=
reference operator*()
{
    return _curnode->values[_curindex];
}

pointer operator ->()
{
    return &(**this);
}

// The lack of arguments indicates that this is prefix ++
const_iterator& operator++()
{
    _curindex++;
    if (_curindex >= _curnode->count &&
        _curnode->next != NULL)
    {
        _curnode = _curnode->next;
        _curindex = 0;
    }
    return *this;
}

// The lack of arguments indicates that this is prefix --
const_iterator& operator--()
{
    _curindex--;
    if (_curindex < 0)
    {
        _curnode = _curnode->prev;
        _curindex = _curnode->count - 1;
    }
    return *this;
}

// The dummy argument indicates that this is the postfix ++
const_iterator operator++(int)
{
    const_iterator old_it = *this;
    ++(*this);
    return old_it;
}

// The dummy argument indicates that this is the postfix --
const_iterator operator--(int)
{
    const_iterator old_it = *this;
    --(*this);
    return old_it;
}

bool operator ==(const_iterator const& rhs)
{
    return _curnode == rhs._curnode &&
           _curindex == rhs._curindex;
}

bool operator !=(const_iterator const& rhs)
{
  return !(*this == rhs);
}

The implementation of the subclass iterator is identical except that we change constant-referencing types to non-constant-referencing types (for the more complex methods, we delegate to the superclass implementation):

<<iterator>>=
iterator(node* iter_node = NULL, int iter_index = 0)
  : const_iterator(iter_node, iter_index)
{}

typedef std::bidirectional_iterator_tag iterator_category;
typedef T      value_type;
typedef size_t difference_type;
typedef T*     pointer;
typedef T&     reference;

using const_iterator::_curnode;
using const_iterator::_curindex;

reference operator*()
{
    return _curnode->values[_curindex];
}

pointer operator ->()
{
    return &(**this);
}

// The lack of arguments indicates that this is prefix ++
iterator& operator++()
{
    const_iterator::operator++();
    return *this;
}

// The lack of arguments indicates that this is prefix --
iterator& operator--()
{
    const_iterator::operator--();
    return *this;
}

// The dummy argument indicates that this is the postfix ++
iterator operator++(int)
{
    iterator old_it = *this;
    ++(*this);
    return old_it;
}

// The dummy argument indicates that this is the postfix --
iterator operator--(int)
{
    iterator old_it = *this;
    --(*this);
    return old_it;
}

[edit] Implementation

[edit] Iteration

Now that we've got iterators, we should give the user of the class access to them. The standard functions for these are:

<<public interface>>=
iterator begin();
iterator end();
const_iterator begin() const;
const_iterator end() const;

For begin() we just use the list's head pointer and index zero. For end(), we use the tail node and the index one past its last element, the state reserved for this purpose.

<<list implementation>>=
template <typename T, int NODE_SIZE>
inline typename unrolled_list<T,NODE_SIZE>::iterator unrolled_list<T,NODE_SIZE>::begin()
{
    return iterator(_head, 0);
}

template <typename T, int NODE_SIZE>
inline typename unrolled_list<T,NODE_SIZE>::iterator unrolled_list<T,NODE_SIZE>::end()
{
    return iterator(_tail, _tail->count);
}

The const versions are similar:

<<list implementation>>=
template <typename T, int NODE_SIZE>
inline typename unrolled_list<T,NODE_SIZE>::const_iterator unrolled_list<T,NODE_SIZE>::begin() const
{
    return const_iterator(_head, 0);
}

template <typename T, int NODE_SIZE>
inline typename unrolled_list<T,NODE_SIZE>::const_iterator unrolled_list<T,NODE_SIZE>::end() const
{
    return const_iterator(_tail, _tail->count);
}

[edit] Insertion

We'll provide three methods that insert elements into lists:

<<public interface>>=
void push_front(T const& elt);
void push_back(T const& elt);
iterator insert(iterator where, T const& elt);

insert() is the most general method. It inserts an element right before the element referred to by the given iterator. Most of the time, this method is as simple as inserting the given value into the array of the current node at the right index:

<<list implementation>>=
template <typename T, int NODE_SIZE>
inline typename unrolled_list<T,NODE_SIZE>::iterator
    unrolled_list<T,NODE_SIZE>::insert(
        typename unrolled_list<T,NODE_SIZE>::iterator where, T const& elt)
{
    make more room in current node if node is full
    for(int i=where._curnode->count; i > where._curindex; i--)
    {
        where._curnode->values[i] = where._curnode->values[i-1];
    }
    where._curnode->values[where._curindex] = elt;
    where._curnode->count++;
    _size++;
    return where;
}

Note the typename keyword in the argument list. It is necessary here because the compiler cannot determine by itself that iterator is a type since it is dependent on the template parameter T.

To write the above, we need to ensure that the list has access to the iterator's internal pointers using a friend declaration. This violates encapsulation, but the iterator and list are tightly coupled, and it's better than exposing the pointer to everyone via a public method:

<<list friend declarations>>=
friend class unrolled_list<T,NODE_SIZE>::iterator;

Now we deal with the case where the node is already full. One strategy is to redistribute elements to surrounding nodes, but this doesn't always work, since these nodes may also be full or become full. A more reliable approach is to create a new node, insert it into the list after the current node, and copy the second half of the elements from the current node into the new node:

<<make more room in current node if node is full>>=
const int num_values = unrolled_list<T,NODE_SIZE>::node::values_size;
if (where._curnode->count == num_values) // if node is full
{
    node* n = new node;
    
    if (where._curindex != where._curnode->count)
    {
        move elements to new node
    }

    insert new node into node list
    update iterator
}

We only move elements to the new node if the current index is not at the end of the original node. This prevents us from leaving half the space in each node unused in the common scenario where elements are sequentially added to the end of the list. Moving the elements is straightforward:

<<move elements to new node>>=
for(int i=0; i < (num_values+1)/2; i++)
{
    n->values[i] = where._curnode->values[i + num_values/2];
}
where._curnode->count = num_values/2;
n->count = (num_values+1)/2;

Before we can continue the insertion, if the element our iterator is pointing to has moved into the new node we must update the iterator accordingly:

<<update iterator>>=
if (where._curindex >= where._curnode->count)
{
    where._curindex -= where._curnode->count;
    where._curnode = n;
}

Note that this code works even if our iterator is the end() iterator. It also gets run if we did not move any elements, and in this case has the effect of moving the iterator to the beginning of the new empty node.

Inserting the new node into the list is no different than inserting a node into a normal doubly-linked list:

<<insert new node into node list>>=
n->next = where._curnode->next;
n->prev = where._curnode;
if (where._curnode->next != NULL)
{
    where._curnode->next->prev = n;
}
where._curnode->next = n;
insert: update tail

Finally, we update _tail if the new node was inserted at the end of the list:

<<insert: update tail>>=
if (n->next == NULL)
{
    _tail = n;
}

Note that this, like any doubly-linked list insertion, is not threadsafe without modification. Additionally, it may invalidate iterators previously pointing to the second half of the elements in the node. Since an application does not know where node boundaries are, this effectively means that all iterators are invalidated except the one returned by insert (this is why we return this iterator).

Implementing push_front() and push_back() is now simple:

<<list implementation>>=
template <typename T, int NODE_SIZE>
inline void unrolled_list<T,NODE_SIZE>::push_front(T const& elt)
{
    insert(begin(), elt);
}

template <typename T, int NODE_SIZE>
inline void unrolled_list<T,NODE_SIZE>::push_back(T const& elt)
{
    insert(end(), elt);
}

[edit] Deletion

We supply four methods for deleting elements from the list:

<<public interface>>=
iterator erase(iterator where);
void pop_front();
void pop_back();
void clear();

pop_front() and pop_back() remove elements from the beginning and end of the list, respectively. clear() removes all elements. The most general, erase(), removes the element referenced by a given iterator and returns an iterator pointing at the following element. In the usual case, erase() involves only removing an element from an array and shifting all higher elements down by one:

<<list implementation>>=
template <typename T, int NODE_SIZE>
inline typename unrolled_list<T,NODE_SIZE>::iterator
  unrolled_list<T,NODE_SIZE>::erase(
    typename unrolled_list<T,NODE_SIZE>::iterator where)
{
    for(int i=where._curindex; i < where._curnode->count - 1; i++)
    {
        where._curnode->values[i] = where._curnode->values[i+1];
    }
    where._curnode->count--;
    _size--;
    if node is small enough, merge it
    update iterator if index too large
    return where;
}

To remain memory-efficient, we want to ensure that we don't reserve too much space — we want to keep the total space allocated per element low. There are many ways to do this, but the simple strategy we take here is to merge two adjacent nodes if their combined element counts fall below some waterline, such as 80% of the node size. The result is that the list uses at least 40% of its space for data (for comparison, a typical doubly linked list can use as little as 8% of its space for data, and even vectors can use as little as 50%). We don't want to use 100% of the node size as our waterline, because this creates the potential for a sequence of insert/remove operations that deallocate and reallocate nodes rapidly, like a normal list.

We also always merge nodes in one special case: when a node becomes empty. This is necessary to ensure that an iteration step remains constant time.

<<if node is small enough, merge it>>=
const int num_values = unrolled_list<T,NODE_SIZE>::node::values_size;
const int threshold = num_values >= 5 ? num_values*4/5 : num_values - 1;
if (where._curnode->prev != NULL &&
    (where._curnode->count == 0 ||
     where._curnode->count + where._curnode->prev->count <= threshold))
{
    where._curnode = where._curnode->prev;
    where._curindex += where._curnode->count;
    merge_node_with_next(where._curnode);
}
else if (where._curnode->next != NULL &&
         (where._curnode->count == 0 ||
          where._curnode->count + where._curnode->next->count <= threshold))
{
    merge_node_with_next(where._curnode);
}

Note that in the case where a single node remains, we may end up with an empty node; this is okay, as we permit the empty node in this limited case. The method merge_node_with_next() simply copies elements from one node onto the end of another, updates the count, and removes the old node from the list:

<<list helper methods>>=
void merge_node_with_next(node* node);

<<list implementation>>=
template <typename T, int NODE_SIZE>
inline void unrolled_list<T,NODE_SIZE>::merge_node_with_next(
  typename unrolled_list<T,NODE_SIZE>::node* node)
{
    typename unrolled_list<T,NODE_SIZE>::node* next = node->next;
    const int count = node->count;
    for(int i=0; i<next->count; i++)
    {
        node->values[i + count] = next->values[i];
    }
    node->count += next->count;
    node->next = node->next->next;
    merge: update tail
    delete next;
}

When we merge nodes, the tail node might change (the head node never changes):

<<merge: update tail>>=
if (node->next == NULL)
{
    _tail = node;
}

Finally, if we erase the last element in a node, it's possible the iterator will become invalid. To deal with this we move it to the next node if there is one (if not, it correctly represents the end iterator):

<<update iterator if index too large>>=
if (where._curindex >= where._curnode->count &&
    where._curnode->next != NULL)
{
    where._curindex -= where._curnode->count;
    where._curnode = where._curnode->next;
}

Implementing the remaining methods using erase() is simple:

<<list implementation>>=
template <typename T, int NODE_SIZE>
inline void unrolled_list<T,NODE_SIZE>::pop_front()
{
    erase(begin());
}

template <typename T, int NODE_SIZE>
inline void unrolled_list<T,NODE_SIZE>::pop_back()
{
    erase(--(end()));
}

We could implement clear() trivially using existing primitives:

<<slow clear implementation>>=
while(begin() != end())
{
    pop_back();
}

This is, however, very inefficient. A more practical approach is to empty out the first node and delete the rest in order, restoring the list to its original state:

<<list implementation>>=
/* Same effect as:
   <<slow clear implementation>>
*/
template <typename T, int NODE_SIZE>
inline void unrolled_list<T,NODE_SIZE>::clear()
{
    typename unrolled_list<T,NODE_SIZE>::node *n, *n_next;
    for (n = _head->next; n != NULL; n = n_next)
    {
        n_next = n->next;
        delete n;
    }
    _head->count = 0;
    _head->next = NULL;
    _tail = _head;
}

[edit] Assignment

We next implement the assignment operator, which can simply use the insertion operation and iteration without ever examining internal state. It must also use clear() to clean out any existing list elements in the target list.

<<public interface>>=
unrolled_list& operator =(unrolled_list<T,NODE_SIZE> const& rhs);

<<list implementation>>=
template <typename T, int NODE_SIZE>
unrolled_list<T,NODE_SIZE>& unrolled_list<T,NODE_SIZE>::operator =(unrolled_list<T,NODE_SIZE> const& rhs)
{
    clear();

    const_iterator rhs_it = rhs.begin();
    for (rhs_it = rhs.begin(); rhs_it != rhs.end(); ++rhs_it)
    {
        push_back(*rhs_it);
    }

    return *this;
}

[edit] List size

We supply two methods that query the number of elements in the list and determine whether it is empty:

<<public interface>>=
size_t size() const;
bool empty() const;

Their implementation is trivial:

<<list implementation>>=
template <typename T, int NODE_SIZE>
inline size_t unrolled_list<T,NODE_SIZE>::size() const
{
  return _size;
}

template <typename T, int NODE_SIZE>
inline bool unrolled_list<T,NODE_SIZE>::empty() const
{
  return _size == 0;
}

[edit] Construction and destruction

Finally, we wish to provide the following constructors and destructor:

<<construction interface>>=
unrolled_list();
unrolled_list(unrolled_list<T,NODE_SIZE> const& rhs);
~unrolled_list();

The default constructor creates an empty list, while the copy constructor and destructor are trivial to implement using the methods we have already implemented:

<<list implementation>>=
template <typename T, int NODE_SIZE>
inline unrolled_list<T,NODE_SIZE>::unrolled_list()
  : member initialisation
{}

template <typename T, int NODE_SIZE>
inline unrolled_list<T,NODE_SIZE>::unrolled_list(unrolled_list<T,NODE_SIZE> const& rhs)
  : member initialisation
{
    *this = rhs;
}

template <typename T, int NODE_SIZE>
inline unrolled_list<T,NODE_SIZE>::~unrolled_list()
{
    clear();
}

[edit] Files

All of the code is placed in header files, since this is a templated class. We separate declarations from definitions by placing one in a .h file and the other in a .hpp file:

<<unrolled_list.h>>=
#include <iterator>

template <typename T, int NODE_SIZE>
class unrolled_list
{
  private:
    node structure
    list private data
    list helper methods

  public:
    iterators

    construction interface
    public interface
    list friend declarations
};

#include "unrolled_list.hpp"

<<unrolled_list.hpp>>=
list implementation

[edit] Performance testing

We consider some tests of the list's performance. The iterator-based design makes it simple to run the same test on many different containers. We begin with a simple test that makes a large number of push_back() calls on a single value:

<<performance tests>>=
template <typename TList, typename TElement>
void test_push_back(TElement value)
{
    TList lst;
    for (int i=0; i<50000000; i++)
    {
        lst.push_back(value);
    }
}

We drive the test with some simple timing code on three different containers:

<<performance headers>>=
#include <vector>
#include <list>
#include "unrolled_list.h"

<<performance test timing>>=
void time_test_push_back()
{
    clock_t start;

    start = clock();
    test_push_back< std::list<int> >(0);
    printf("test_push_back< std::list<int> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC);

    start = clock();
    test_push_back< std::vector<int> >(0);
    printf("test_push_back< std::vector<int> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC);

    start = clock();
    test_push_back< unrolled_list<int,1024> >(0);
    printf("test_push_back< unrolled_list<int,1024> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC);

    start = clock();
    test_push_back< unrolled_list<int,128> >(0);
    printf("test_push_back< unrolled_list<int,128> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC);
}

<<run tests>>=
time_test_push_back();

Our next test is a worst case for vectors, adding a large number of nodes to the beginning of the list. We cannot use push_front(), since vector deliberately omits this call. We use insert() instead. This test is so slow on vectors that we had to drastically cut the number of iterations to reach completion in a reasonable time.

<<performance tests>>=
template <typename TList, typename TElement>
void test_push_front(TElement value)
{
    TList lst;
    for (int i=0; i<500000; i++)
    {
        lst.insert(lst.begin(), value);
    }
}

<<performance test timing>>=
void time_test_push_front()
{
    clock_t start;

    start = clock();
    test_push_front< std::list<int> >(0);
    printf("test_push_front< std::list<int> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC);

    start = clock();
    test_push_front< std::vector<int> >(0);
    printf("test_push_front< std::vector<int> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC);

    start = clock();
    test_push_front< unrolled_list<int,1024> >(0);
    printf("test_push_front< unrolled_list<int,1024> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC);

    start = clock();
    test_push_front< unrolled_list<int,128> >(0);
    printf("test_push_front< unrolled_list<int,128> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC);
}

<<run tests>>=
time_test_push_front();

Next, we create a simple iteration test that creates a large list and then iterates over it in order. Since we don't want to count the list creation part, we have the test return the start time:

<<performance tests>>=
template <typename TList, typename TElement>
clock_t test_iteration(TElement value)
{
    TList lst;
    for (int i=0; i<50000000; i++)
    {
        lst.push_back(value);
    }
    clock_t start = clock();
    typename TList::iterator it;
    for (it = lst.begin(); it != lst.end(); it++)
    {
        if (*it != value)
        {
            printf("Error!\n");
        }
    }
    return start;
}

<<performance test timing>>=
void time_test_iteration()
{
    clock_t start;

    start = test_iteration< std::list<int> >(0);
    printf("test_iteration< std::list<int> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC);

    start = test_iteration< std::vector<int> >(0);
    printf("test_iteration< std::vector<int> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC);

    start = test_iteration< unrolled_list<int,1024> >(0);
    printf("test_iteration< unrolled_list<int,1024> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC);

    start = test_iteration< unrolled_list<int,128> >(0);
    printf("test_iteration< unrolled_list<int,128> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC);
}

<<run tests>>=
time_test_iteration();

We place this code in its own performance testing source file:

<<performance.cpp>>=
performance headers

performance tests
performance test timing

int main()
{
    run tests
    return 0;
}

When this code was compiled by gcc with the "-O3" optimization switch against the standard g++ implementation of the STL and run on a server with a 2.80 GhZ CPU and 1 GB of RAM, we obtained the following results (the columns give CPU time in seconds):

Test list (s) vector (s) unrolled_list<1024> (s) unrolled_list<128> (s)
test_push_back 2.79 1.21 0.53 0.82
test_push_front 0.02 40.26 0.14 0.04
test_iteration 0.96 0.12 0.17 0.36

The unrolled linked list consistently performs comparable to or better than the better of vector or list. There is also a clear tradeoff in node size: nodes close to the cache line size do better for random or front insertions (like a list), while larger node sizes perform better for push_back() and iteration (like a vector). It's no surprise that it outperforms both on push_back(), since in this test list performs frequent allocations and vector performs a large amount of copying as it grows.

[edit] Test code

<<list_test.cpp>>=
#include <iostream>

#include "unrolled_list.h"

template <typename T, int NODE_SIZE>
void print_list(unrolled_list<T,NODE_SIZE> const& lst)
{
  typename unrolled_list<T,NODE_SIZE>::const_iterator it;
  for (it = lst.begin(); it != lst.end(); ++it)
    std::cout << *it << " ";
  std::cout << std::endl;
}

int main()
{
  unrolled_list<int,20> lst;
  lst.push_back(2);
  lst.push_back(3);
  lst.push_front(1);
  lst.push_back(4);
  unrolled_list<int,20> list2;
  list2 = lst;
  print_list(lst);
  unrolled_list<int,20>::iterator it;
  it = lst.begin();
  it++;
  ++it;
  it = lst.insert(it, 42);
  print_list(lst);
  ++it;
  it = lst.erase(it);
  print_list(lst);
  it = lst.begin();
  *it = 10;
  print_list(lst);
  lst.clear();
  print_list(lst);
  print_list(list2);
  std::fill(list2.begin(), list2.end(), 42);
  print_list(list2);
  return 0;
}

If you build and run this, you should see this:

<<list_test_output.txt>>=
1 2 3 4 
1 2 42 3 4 
1 2 42 4
10 2 42 4

1 2 3 4 
42 42 42 42 

[edit] Potential improvements

Unrolled linked lists have two primary benefits compared to ordinary linked lists: increased cache performance and decreased space requirements. We can decrease space requirements further in a number of ways:

  • We can use singly linked lists or XOR linked lists for the nodes. These complicate the implementation but saves a pointer per node.
  • We can create a memory pool for nodes. This is easy since they're of fixed size, and can save several bytes of allocation overhead per node. It also allows us to potentially replace pointers by smaller array indexes into the pool.
  • We can store count information in a much smaller field, such as a byte or a bitfield sharing space with the next and previous pointers.
  • We can dynamically expand nodes as we fill them. This is only really useful for large node sizes and can be accomplished by using C's realloc() augmented with code to update the pointers of the next and previous nodes if the node is moved. This imposes a time overhead for copying elements.
  • We can trade time for code space by providing a single generic implementation instead of generating a specialized implementation for each type (as C++ compilers typically do for templates).
  • If a struct has internal fragmentation due to padding, we can break it into parallel lists of smaller structs with less padding.
  • We can increase the node size. This also increases cache efficiency, but can raise the average operation time as array inserts and deletions become prohibitive.

However, given a large enough node size and a large enough number of nodes, the impact of most of the above techniques is negligible, since data occupies a large proportion of the total space. This is one of the benefits of unrolled linked lists: they amortize away the space required for their own bookkeeping information.

It would also be useful to create a template specialization of unrolled linked lists for booleans, much like vector<bool>. Unrolled linked lists can store these efficiently and it could be a useful way of representing structures such as compressed bitmaps and archives.

Download code
hijacker
hijacker
hijacker
hijacker