Unrolled linked list (C Plus Plus)
From LiteratePrograms
An unrolled linked list is a linked list that stores several data elements, up to some small fixed limit, in each node. It has benefits in cache performance and space usage over standard linked lists, especially for small elements such as bits, characters, words, and pointers.
This is an implementation of an unrolled linked list in C++. We will compare its performance for various element types to widely available implementations of standard list classes to show the benefit of the unrolled linked list.
This container class is fully compatible with the STL and the std::list interface, allowing it to be used in many cases as a drop-in replacement for std::list.
Contents |
Internal representation
The list consists of a sequence of nodes, with each node containing up to some fixed number of elements. We choose this number at compile-time such that each node will be the size of a cache line (or a small multiple of the cache line size):
<<node structure>>= struct node { values_size computation node() : next(NULL), prev(NULL), count(0) {} node* next; node* prev; int count; T values[values_size]; };
The next and prev pointers refer to the next and previous nodes in the list, or NULL if there isn't one. The count field determines how many entries of the values array are in use (the first count will be in use). Although we don't actually need to initialise each element of the array, it is necessary that T have a default constructor to write an implementation that allocates node structures in a straightforward manner.
The desired node structure size is passed in as a template parameter NODE_SIZE and we use it to statically compute the size of the values array. We choose a size that will keep the total structure size at most NODE_SIZE (assuming that no padding is done - it might be slightly larger). We also ensure that it is at least 1, regardless.
<<values_size computation>>= static const int values_size_lb = (NODE_SIZE - 2*sizeof(node*) - sizeof(int))/sizeof(T); static const int values_size = values_size_lb == 0 ? 1 : values_size_lb;
We need to know where our list starts, so our list needs a pointer to the head node of the list. We also need to keep track of the last node to facilitate rapid insertion at the end of the list (push_back()).
<<list private data>>= node* _head; node* _tail;
For simplicity, we never want head or tail to be NULL, so we represent the empty list with a single node containing no elements:
<<member initialisation>>= _head(new node), _tail(_head)
We also keep track of the number of elements in the list to permit a constant-time size() operation, which would otherwise take linear time:
<<list private data>>= int _size;
Initially, the size will be 0:
<<member initialisation>>= , _size(0)
Iterators
Iterators, objects that reference a particular element of a container, are an important concept in the STL library. Supporting iterators allow C++'s generic algorithms to work on your own data structures, provided that your iterator supports the necessary traits (discussed later).
Types
We'll provide a const_iterator, which does not allow the referenced element to be modified through it, as well as an iterator, which does allow this. We will first write the const_iterator and then derive iterator from it (see Singly linked list (C Plus Plus)#Types for the reasoning behind this). So we have this hierarchy:
<<iterators>>= class const_iterator : public const_iterator baseclass { public: const_iterator interface const_iterator state }; class iterator : public const_iterator { public: iterator };
Traits
Another concept required to be fully compatible with STL are iterator traits. They help algorithms determine what operations an iterator supports. Some standard algorithms can use iterator traits to switch to a more efficient algorithm at compile-time when supplied with more powerful iterators.
Here are the traits for the const_iterator:
<<const_iterator interface>>= typedef iterator category iterator_category; typedef T value_type; typedef ptrdiff_t difference_type; typedef T const* pointer; typedef T const& reference;
Next we choose an iterator category. With next and prev pointers, we have efficient bidirectional movement but not random access, which although considerably faster than with a normal linked list still requires linear time. We choose a simple bidirectional iterator:
<<iterator category>>= std::bidirectional_iterator_tag
Finally, our iterator must be derived from std::iterator:
<<const_iterator baseclass>>= std::iterator<std::forward_iterator_tag, T, size_t, T const*, T const&>
State
To denote a single element of the list, our iterator must include both a pointer to the node that element resides in and the index of that element in the values array:
<<const_iterator state>>= protected: node* _curnode; int _curindex;
If the iterator is pointing one element past the last element, _curnode will be the last node of the list and _curindex will be that node's count field (one past the index of the last element in the node). Construction is straightforward:
<<const_iterator interface>>= const_iterator(node* cur_node = NULL, int cur_index = 0) : _curnode(cur_node), _curindex(cur_index) {}
As we will see, the splitting and merging operations of unrolled linked lists make it impossible to guarantee that any existing iterator is not invalidated by such an operation. This is the primary conceptual price that we pay for unrolled linked lists.
Implementation
The remaining functions to do are element access and increment. These are straightforward to write, keeping in mind that we have to check the count field to determine when we've "used up" a node. Also, we will never "skip over" a node with zero count, because each node in the list must contain at least one element.
We also never allow _curnode to become NULL, as this would make backward movement of the iterator impossible (and is not the chosen representation of end() anyway). This check is not necessary for operator--, since it's invalid to move an iterator before the beginning of the list.
<<const_iterator interface>>= reference operator*() { return _curnode->values[_curindex]; } pointer operator ->() { return &(**this); } // The lack of arguments indicates that this is prefix ++ const_iterator& operator++() { _curindex++; if (_curindex >= _curnode->count && _curnode->next != NULL) { _curnode = _curnode->next; _curindex = 0; } return *this; } // The lack of arguments indicates that this is prefix -- const_iterator& operator--() { _curindex--; if (_curindex < 0) { _curnode = _curnode->prev; _curindex = _curnode->count - 1; } return *this; } // The dummy argument indicates that this is the postfix ++ const_iterator operator++(int) { const_iterator old_it = *this; ++(*this); return old_it; } // The dummy argument indicates that this is the postfix -- const_iterator operator--(int) { const_iterator old_it = *this; --(*this); return old_it; } bool operator ==(const_iterator const& rhs) { return _curnode == rhs._curnode && _curindex == rhs._curindex; } bool operator !=(const_iterator const& rhs) { return !(*this == rhs); }
The implementation of the subclass iterator is identical except that we change constant-referencing types to non-constant-referencing types (for the more complex methods, we delegate to the superclass implementation):
<<iterator>>= iterator(node* iter_node = NULL, int iter_index = 0) : const_iterator(iter_node, iter_index) {} typedef std::bidirectional_iterator_tag iterator_category; typedef T value_type; typedef size_t difference_type; typedef T* pointer; typedef T& reference; using const_iterator::_curnode; using const_iterator::_curindex; reference operator*() { return _curnode->values[_curindex]; } pointer operator ->() { return &(**this); } // The lack of arguments indicates that this is prefix ++ iterator& operator++() { const_iterator::operator++(); return *this; } // The lack of arguments indicates that this is prefix -- iterator& operator--() { const_iterator::operator--(); return *this; } // The dummy argument indicates that this is the postfix ++ iterator operator++(int) { iterator old_it = *this; ++(*this); return old_it; } // The dummy argument indicates that this is the postfix -- iterator operator--(int) { iterator old_it = *this; --(*this); return old_it; }
Implementation
Iteration
Now that we've got iterators, we should give the user of the class access to them. The standard functions for these are:
<<public interface>>= iterator begin(); iterator end(); const_iterator begin() const; const_iterator end() const;
For begin() we just use the list's head pointer and index zero. For end(), we use the tail node and the index one past its last element, the state reserved for this purpose.
<<list implementation>>= template <typename T, int NODE_SIZE> inline typename unrolled_list<T,NODE_SIZE>::iterator unrolled_list<T,NODE_SIZE>::begin() { return iterator(_head, 0); } template <typename T, int NODE_SIZE> inline typename unrolled_list<T,NODE_SIZE>::iterator unrolled_list<T,NODE_SIZE>::end() { return iterator(_tail, _tail->count); }
The const versions are similar:
<<list implementation>>= template <typename T, int NODE_SIZE> inline typename unrolled_list<T,NODE_SIZE>::const_iterator unrolled_list<T,NODE_SIZE>::begin() const { return const_iterator(_head, 0); } template <typename T, int NODE_SIZE> inline typename unrolled_list<T,NODE_SIZE>::const_iterator unrolled_list<T,NODE_SIZE>::end() const { return const_iterator(_tail, _tail->count); }
Insertion
We'll provide three methods that insert elements into lists:
<<public interface>>= void push_front(T const& elt); void push_back(T const& elt); iterator insert(iterator where, T const& elt);
insert() is the most general method. It inserts an element right before the element referred to by the given iterator. Most of the time, this method is as simple as inserting the given value into the array of the current node at the right index:
<<list implementation>>= template <typename T, int NODE_SIZE> inline typename unrolled_list<T,NODE_SIZE>::iterator unrolled_list<T,NODE_SIZE>::insert( typename unrolled_list<T,NODE_SIZE>::iterator where, T const& elt) { make more room in current node if node is full for(int i=where._curnode->count; i > where._curindex; i--) { where._curnode->values[i] = where._curnode->values[i-1]; } where._curnode->values[where._curindex] = elt; where._curnode->count++; _size++; return where; }
Note the typename keyword in the argument list. It is necessary here because the compiler cannot determine by itself that iterator is a type since it is dependent on the template parameter T.
To write the above, we need to ensure that the list has access to the iterator's internal pointers using a friend declaration. This violates encapsulation, but the iterator and list are tightly coupled, and it's better than exposing the pointer to everyone via a public method:
<<list friend declarations>>= friend class unrolled_list<T,NODE_SIZE>::iterator;
Now we deal with the case where the node is already full. One strategy is to redistribute elements to surrounding nodes, but this doesn't always work, since these nodes may also be full or become full. A more reliable approach is to create a new node, insert it into the list after the current node, and copy the second half of the elements from the current node into the new node:
<<make more room in current node if node is full>>= const int num_values = unrolled_list<T,NODE_SIZE>::node::values_size; if (where._curnode->count == num_values) // if node is full { node* n = new node; if (where._curindex != where._curnode->count) { move elements to new node } insert new node into node list update iterator }
We only move elements to the new node if the current index is not at the end of the original node. This prevents us from leaving half the space in each node unused in the common scenario where elements are sequentially added to the end of the list. Moving the elements is straightforward:
<<move elements to new node>>= for(int i=0; i < (num_values+1)/2; i++) { n->values[i] = where._curnode->values[i + num_values/2]; } where._curnode->count = num_values/2; n->count = (num_values+1)/2;
Before we can continue the insertion, if the element our iterator is pointing to has moved into the new node we must update the iterator accordingly:
<<update iterator>>= if (where._curindex >= where._curnode->count) { where._curindex -= where._curnode->count; where._curnode = n; }
Note that this code works even if our iterator is the end() iterator. It also gets run if we did not move any elements, and in this case has the effect of moving the iterator to the beginning of the new empty node.
Inserting the new node into the list is no different than inserting a node into a normal doubly-linked list:
<<insert new node into node list>>= n->next = where._curnode->next; n->prev = where._curnode; if (where._curnode->next != NULL) { where._curnode->next->prev = n; } where._curnode->next = n; insert: update tail
Finally, we update _tail if the new node was inserted at the end of the list:
<<insert: update tail>>= if (n->next == NULL) { _tail = n; }
Note that this, like any doubly-linked list insertion, is not threadsafe without modification. Additionally, it may invalidate iterators previously pointing to the second half of the elements in the node. Since an application does not know where node boundaries are, this effectively means that all iterators are invalidated except the one returned by insert (this is why we return this iterator).
Implementing push_front() and push_back() is now simple:
<<list implementation>>= template <typename T, int NODE_SIZE> inline void unrolled_list<T,NODE_SIZE>::push_front(T const& elt) { insert(begin(), elt); } template <typename T, int NODE_SIZE> inline void unrolled_list<T,NODE_SIZE>::push_back(T const& elt) { insert(end(), elt); }
Deletion
We supply four methods for deleting elements from the list:
<<public interface>>= iterator erase(iterator where); void pop_front(); void pop_back(); void clear();
pop_front() and pop_back() remove elements from the beginning and end of the list, respectively. clear() removes all elements. The most general, erase(), removes the element referenced by a given iterator and returns an iterator pointing at the following element. In the usual case, erase() involves only removing an element from an array and shifting all higher elements down by one:
<<list implementation>>= template <typename T, int NODE_SIZE> inline typename unrolled_list<T,NODE_SIZE>::iterator unrolled_list<T,NODE_SIZE>::erase( typename unrolled_list<T,NODE_SIZE>::iterator where) { for(int i=where._curindex; i < where._curnode->count - 1; i++) { where._curnode->values[i] = where._curnode->values[i+1]; } where._curnode->count--; _size--; if node is small enough, merge it update iterator if index too large return where; }
To remain memory-efficient, we want to ensure that we don't reserve too much space — we want to keep the total space allocated per element low. There are many ways to do this, but the simple strategy we take here is to merge two adjacent nodes if their combined element counts fall below some waterline, such as 80% of the node size. The result is that the list uses at least 40% of its space for data (for comparison, a typical doubly linked list can use as little as 8% of its space for data, and even vectors can use as little as 50%). We don't want to use 100% of the node size as our waterline, because this creates the potential for a sequence of insert/remove operations that deallocate and reallocate nodes rapidly, like a normal list.
We also always merge nodes in one special case: when a node becomes empty. This is necessary to ensure that an iteration step remains constant time.
<<if node is small enough, merge it>>= const int num_values = unrolled_list<T,NODE_SIZE>::node::values_size; const int threshold = num_values >= 5 ? num_values*4/5 : num_values - 1; if (where._curnode->prev != NULL && (where._curnode->count == 0 || where._curnode->count + where._curnode->prev->count <= threshold)) { where._curnode = where._curnode->prev; where._curindex += where._curnode->count; merge_node_with_next(where._curnode); } else if (where._curnode->next != NULL && (where._curnode->count == 0 || where._curnode->count + where._curnode->next->count <= threshold)) { merge_node_with_next(where._curnode); }
Note that in the case where a single node remains, we may end up with an empty node; this is okay, as we permit the empty node in this limited case. The method merge_node_with_next() simply copies elements from one node onto the end of another, updates the count, and removes the old node from the list:
<<list helper methods>>= void merge_node_with_next(node* node); <<list implementation>>= template <typename T, int NODE_SIZE> inline void unrolled_list<T,NODE_SIZE>::merge_node_with_next( typename unrolled_list<T,NODE_SIZE>::node* node) { typename unrolled_list<T,NODE_SIZE>::node* next = node->next; const int count = node->count; for(int i=0; i<next->count; i++) { node->values[i + count] = next->values[i]; } node->count += next->count; node->next = node->next->next; merge: update tail delete next; }
When we merge nodes, the tail node might change (the head node never changes):
<<merge: update tail>>= if (node->next == NULL) { _tail = node; }
Finally, if we erase the last element in a node, it's possible the iterator will become invalid. To deal with this we move it to the next node if there is one (if not, it correctly represents the end iterator):
<<update iterator if index too large>>= if (where._curindex >= where._curnode->count && where._curnode->next != NULL) { where._curindex -= where._curnode->count; where._curnode = where._curnode->next; }
Implementing the remaining methods using erase() is simple:
<<list implementation>>= template <typename T, int NODE_SIZE> inline void unrolled_list<T,NODE_SIZE>::pop_front() { erase(begin()); } template <typename T, int NODE_SIZE> inline void unrolled_list<T,NODE_SIZE>::pop_back() { erase(--(end())); }
We could implement clear() trivially using existing primitives:
<<slow clear implementation>>= while(begin() != end()) { pop_back(); }
This is, however, very inefficient. A more practical approach is to empty out the first node and delete the rest in order, restoring the list to its original state:
<<list implementation>>= /* Same effect as: <<slow clear implementation>> */ template <typename T, int NODE_SIZE> inline void unrolled_list<T,NODE_SIZE>::clear() { typename unrolled_list<T,NODE_SIZE>::node *n, *n_next; for (n = _head->next; n != NULL; n = n_next) { n_next = n->next; delete n; } _head->count = 0; _head->next = NULL; _tail = _head; }
Assignment
We next implement the assignment operator, which can simply use the insertion operation and iteration without ever examining internal state. It must also use clear() to clean out any existing list elements in the target list.
<<public interface>>= unrolled_list& operator =(unrolled_list<T,NODE_SIZE> const& rhs); <<list implementation>>= template <typename T, int NODE_SIZE> unrolled_list<T,NODE_SIZE>& unrolled_list<T,NODE_SIZE>::operator =(unrolled_list<T,NODE_SIZE> const& rhs) { clear(); const_iterator rhs_it = rhs.begin(); for (rhs_it = rhs.begin(); rhs_it != rhs.end(); ++rhs_it) { push_back(*rhs_it); } return *this; }
List size
We supply two methods that query the number of elements in the list and determine whether it is empty:
<<public interface>>= size_t size() const; bool empty() const;
Their implementation is trivial:
<<list implementation>>= template <typename T, int NODE_SIZE> inline size_t unrolled_list<T,NODE_SIZE>::size() const { return _size; } template <typename T, int NODE_SIZE> inline bool unrolled_list<T,NODE_SIZE>::empty() const { return _size == 0; }
Construction and destruction
Finally, we wish to provide the following constructors and destructor:
<<construction interface>>= unrolled_list(); unrolled_list(unrolled_list<T,NODE_SIZE> const& rhs); ~unrolled_list();
The default constructor creates an empty list, while the copy constructor and destructor are trivial to implement using the methods we have already implemented:
<<list implementation>>= template <typename T, int NODE_SIZE> inline unrolled_list<T,NODE_SIZE>::unrolled_list() : member initialisation {} template <typename T, int NODE_SIZE> inline unrolled_list<T,NODE_SIZE>::unrolled_list(unrolled_list<T,NODE_SIZE> const& rhs) : member initialisation { *this = rhs; } template <typename T, int NODE_SIZE> inline unrolled_list<T,NODE_SIZE>::~unrolled_list() { clear(); }
Files
All of the code is placed in header files, since this is a templated class. We separate declarations from definitions by placing one in a .h file and the other in a .hpp file:
<<unrolled_list.h>>= #include <iterator> template <typename T, int NODE_SIZE> class unrolled_list { private: node structure list private data list helper methods public: iterators construction interface public interface list friend declarations }; #include "unrolled_list.hpp" <<unrolled_list.hpp>>= list implementation
Performance testing
We consider some tests of the list's performance. The iterator-based design makes it simple to run the same test on many different containers. We begin with a simple test that makes a large number of push_back() calls on a single value:
<<performance tests>>= template <typename TList, typename TElement> void test_push_back(TElement value) { TList lst; for (int i=0; i<50000000; i++) { lst.push_back(value); } }
We drive the test with some simple timing code on three different containers:
<<performance headers>>= #include <vector> #include <list> #include "unrolled_list.h" <<performance test timing>>= void time_test_push_back() { clock_t start; start = clock(); test_push_back< std::list<int> >(0); printf("test_push_back< std::list<int> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC); start = clock(); test_push_back< std::vector<int> >(0); printf("test_push_back< std::vector<int> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC); start = clock(); test_push_back< unrolled_list<int,1024> >(0); printf("test_push_back< unrolled_list<int,1024> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC); start = clock(); test_push_back< unrolled_list<int,128> >(0); printf("test_push_back< unrolled_list<int,128> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC); } <<run tests>>= time_test_push_back();
Our next test is a worst case for vectors, adding a large number of nodes to the beginning of the list. We cannot use push_front(), since vector deliberately omits this call. We use insert() instead. This test is so slow on vectors that we had to drastically cut the number of iterations to reach completion in a reasonable time.
<<performance tests>>= template <typename TList, typename TElement> void test_push_front(TElement value) { TList lst; for (int i=0; i<500000; i++) { lst.insert(lst.begin(), value); } } <<performance test timing>>= void time_test_push_front() { clock_t start; start = clock(); test_push_front< std::list<int> >(0); printf("test_push_front< std::list<int> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC); start = clock(); test_push_front< std::vector<int> >(0); printf("test_push_front< std::vector<int> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC); start = clock(); test_push_front< unrolled_list<int,1024> >(0); printf("test_push_front< unrolled_list<int,1024> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC); start = clock(); test_push_front< unrolled_list<int,128> >(0); printf("test_push_front< unrolled_list<int,128> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC); } <<run tests>>= time_test_push_front();
Next, we create a simple iteration test that creates a large list and then iterates over it in order. Since we don't want to count the list creation part, we have the test return the start time:
<<performance tests>>= template <typename TList, typename TElement> clock_t test_iteration(TElement value) { TList lst; for (int i=0; i<50000000; i++) { lst.push_back(value); } clock_t start = clock(); typename TList::iterator it; for (it = lst.begin(); it != lst.end(); it++) { if (*it != value) { printf("Error!\n"); } } return start; } <<performance test timing>>= void time_test_iteration() { clock_t start; start = test_iteration< std::list<int> >(0); printf("test_iteration< std::list<int> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC); start = test_iteration< std::vector<int> >(0); printf("test_iteration< std::vector<int> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC); start = test_iteration< unrolled_list<int,1024> >(0); printf("test_iteration< unrolled_list<int,1024> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC); start = test_iteration< unrolled_list<int,128> >(0); printf("test_iteration< unrolled_list<int,128> >: %f\n", ((double)(clock() - start)) / CLOCKS_PER_SEC); } <<run tests>>= time_test_iteration();
We place this code in its own performance testing source file:
<<performance.cpp>>= performance headers performance tests performance test timing int main() { run tests<