# Median cut algorithm (C Plus Plus)

*If you're looking to find the median of a list of elements, see the `nth_element` standard library function or a selection algorithm.*

The median cut algorithm is a popular algorithm for color quantization, but can be applied to virtually any point clustering problem. In outline, it works as follows:

- Let B be a set of boxes containing points, initially containing only a single box containing all points.
- While the number of boxes in B is less than the desired number of clusters:
  - Find the box with the longest side among all boxes in B.
  - Cut that box into two boxes along its longest side in such a way that half the contained points fall into each new box.
  - Shrink the two new boxes so that they are just large enough to contain their points.
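Before walking through the full implementation, the outline above can be sketched in one dimension. The following is a toy illustration, not the article's code: `medianCut1D` is a hypothetical helper where "boxes" are intervals of a sorted array and the "longest side" is the widest value range.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Toy 1-D median cut: repeatedly split the interval with the widest value
// range at its median, then return each interval's mean as a "palette" entry.
std::vector<int> medianCut1D(std::vector<int> v, size_t k)
{
    std::sort(v.begin(), v.end());
    // Each "box" is a [begin, end) index pair into the sorted data.
    std::vector<std::pair<size_t, size_t> > intervals;
    intervals.push_back(std::make_pair((size_t)0, v.size()));
    while (intervals.size() < k)
    {
        // Find the interval with the largest value range (the "longest side").
        size_t widest = 0;
        int widestRange = -1;
        for (size_t i = 0; i < intervals.size(); i++)
        {
            int range = v[intervals[i].second - 1] - v[intervals[i].first];
            if (range > widestRange) { widestRange = range; widest = i; }
        }
        // Cut it at its median element, as the main algorithm does with boxes.
        size_t b = intervals[widest].first, e = intervals[widest].second;
        size_t mid = b + (e - b + 1) / 2;
        intervals[widest] = std::make_pair(b, mid);
        intervals.push_back(std::make_pair(mid, e));
    }
    // Representative of each interval: the arithmetic mean (integer division).
    std::vector<int> palette;
    for (size_t i = 0; i < intervals.size(); i++)
    {
        long sum = 0;
        for (size_t j = intervals[i].first; j < intervals[i].second; j++)
            sum += v[j];
        palette.push_back((int)(sum / (long)(intervals[i].second - intervals[i].first)));
    }
    return palette;
}
```

With two well-separated clusters of values, the two palette entries land near the cluster means, which is exactly the behavior the 3-D version exhibits on image colors.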

C++ includes a number of standard containers and algorithms that greatly facilitate the simple and efficient implementation of median cut.


## Point and Block data structures

We represent each point with a simple structure reserving an unsigned byte for each component:

```cpp
<<constants>>=
const int NUM_DIMENSIONS = 3;

<<class declarations>>=
struct Point
{
    unsigned char x[NUM_DIMENSIONS];
};
```

We don't include a comparison operator because we'll want to compare them in several different ways.

We represent a block by storing the points corresponding to two opposite corners, where one has all three coordinates as small as possible and the other as large as possible. For example, a block might have minCorner set to (1, 1, 1) and maxCorner set to (2, 2, 2) — this block has all sides of length one. Blocks also keep track of the input points that they contain. For efficiency reasons to be explained later, they track points using a pointer and a length:

```cpp
<<class declarations>>=
class Block
{
    Point minCorner, maxCorner;
    Point* points;
    int pointsLength;
public:
    <<Block constructor declaration>>
    <<Block public method declarations>>
private:
    <<Block helper method declarations>>
};
```

We'll see that it's necessary that we have random access to the points. The only constructor we will need is one that takes an array of points and initializes the box to the largest possible box:

```cpp
<<Block constructor declaration>>=
Block(Point* points, int pointsLength);

<<Block constructor definition>>=
Block::Block(Point* points, int pointsLength)
{
    this->points = points;
    this->pointsLength = pointsLength;
    for (int i = 0; i < NUM_DIMENSIONS; i++)
    {
        minCorner.x[i] = std::numeric_limits<unsigned char>::min();
        maxCorner.x[i] = std::numeric_limits<unsigned char>::max();
    }
}
```

We need the `<limits>` header for `std::numeric_limits<unsigned char>::min()` and `max()`:

```cpp
<<header files>>=
#include <limits>
```

We provide simple accessors for the points and points length:

```cpp
<<Block public method declarations>>=
Point* getPoints();
int numPoints() const;

<<Block public methods>>=
Point* Block::getPoints()
{
    return points;
}

int Block::numPoints() const
{
    return pointsLength;
}
```

In each step of the algorithm, we will need to find the block with the longest side. We first add a method that figures out which side of a block is longest:

```cpp
<<Block public method declarations>>=
int longestSideIndex() const;

<<Block public methods>>=
int Block::longestSideIndex() const
{
    int m = maxCorner.x[0] - minCorner.x[0];
    int maxIndex = 0;
    for (int i = 1; i < NUM_DIMENSIONS; i++)
    {
        int diff = maxCorner.x[i] - minCorner.x[i];
        if (diff > m)
        {
            m = diff;
            maxIndex = i;
        }
    }
    return maxIndex;
}
```

Now, we define a method to determine how long the block's longest side is:

```cpp
<<Block public method declarations>>=
int longestSideLength() const;

<<Block public methods>>=
int Block::longestSideLength() const
{
    int i = longestSideIndex();
    return maxCorner.x[i] - minCorner.x[i];
}
```

To avoid noncompliance issues in Visual C++, we refrain from using `std::max`, instead defining our own min and max methods:

```cpp
<<Block helper method declarations>>=
template <typename T>
static T min(const T a, const T b)
{
    if (a < b)
        return a;
    else
        return b;
}

template <typename T>
static T max(const T a, const T b)
{
    if (a > b)
        return a;
    else
        return b;
}
```

Next, we define a priority queue of blocks:

```cpp
<<header files>>=
#include <queue>

<<median cut declarations>>=
std::priority_queue<Block> blockQueue;
```

Finally, we define a less-than operator on `Block` that causes the block with the longest side to have maximum priority in the queue, meaning it will be selected first:

```cpp
<<Block public method declarations>>=
bool operator<(const Block& rhs) const;

<<Block public methods>>=
bool Block::operator<(const Block& rhs) const
{
    return this->longestSideLength() < rhs.longestSideLength();
}
```

Now, `blockQueue.top()` will rapidly yield the block we desire with no search. Old blocks (blocks being cut) are removed with `pop()`, and new blocks (the pieces resulting from the cut) can be added using `push()`:

```cpp
<<median cut algorithm main loop>>=
<<create initial block>>
blockQueue.push(initialBlock);
while (blockQueue.size() < desiredSize)
{
    Block longestBlock = blockQueue.top();
    blockQueue.pop();
    <<split longestBlock into block1 and block2>>
    blockQueue.push(block1);
    blockQueue.push(block2);
}
```
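To see why the queue always hands us the right block, here is a standalone illustration (a toy `Box` type, not the article's `Block`): `std::priority_queue` is a max-heap with respect to `operator<`, so the element that compares greatest, here the box with the longest side, always sits at `top()`.

```cpp
#include <queue>

// A stand-in for the article's Block, ordered by a single side length.
struct Box
{
    int sideLength;
    bool operator<(const Box& rhs) const { return sideLength < rhs.sideLength; }
};

// Push a few boxes and return the side length found at the top of the queue.
int longestAtTop()
{
    std::priority_queue<Box> q;
    q.push(Box{3});
    q.push(Box{7});
    q.push(Box{5});
    return q.top().sideLength;  // the box with the longest side wins
}
```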

Finally, we need a way to shrink a block so that it just barely contains its points; that is, its minimum and maximum coordinates are chosen according to the minimum and maximum coordinates of its points. We add a method `shrink()` for this, which computes the minimums and maximums of each coordinate of the points in a block and sets the two corners accordingly:

```cpp
<<Block public method declarations>>=
void shrink();

<<Block public methods>>=
void Block::shrink()
{
    int i, j;
    for (j = 0; j < NUM_DIMENSIONS; j++)
    {
        minCorner.x[j] = maxCorner.x[j] = points[0].x[j];
    }
    for (i = 1; i < pointsLength; i++)
    {
        for (j = 0; j < NUM_DIMENSIONS; j++)
        {
            minCorner.x[j] = min(minCorner.x[j], points[i].x[j]);
            maxCorner.x[j] = max(maxCorner.x[j], points[i].x[j]);
        }
    }
}
```

We compute the minimums and maximums in a single pass to maximize cache performance.

## Performing the median cut

Our median cut algorithm will take an array of image points as its input and produce a list of points denoting the optimized palette entries:

```cpp
<<median cut algorithm declaration>>=
std::list<Point> medianCut(Point* image, int numPoints, unsigned int desiredSize)

<<median cut algorithm>>=
<<median cut algorithm declaration>>
{
    <<median cut declarations>>
    <<median cut algorithm main loop>>
    std::list<Point> result;
    <<find a representative point for each block and add it to the result>>
    return result;
}

<<type definition headers>>=
#include <list>
```

Our initial block will wrap all the points:

```cpp
<<create initial block>>=
Block initialBlock(image, numPoints);
initialBlock.shrink();
```

To split a block, we use `nth_element`, which can simultaneously locate the median and rearrange the points list to separate the elements less than the median from those greater than the median:

```cpp
<<split longestBlock into block1 and block2>>=
Point* begin  = longestBlock.getPoints();
Point* median = longestBlock.getPoints() + (longestBlock.numPoints() + 1) / 2;
Point* end    = longestBlock.getPoints() + longestBlock.numPoints();
<<partition the points into two sublists using nth_element>>
Block block1(begin, median - begin), block2(median, end - median);
block1.shrink();
block2.shrink();
```

The two new blocks `block1` and `block2` wrap around the lower and upper halves of the rearranged list. We need the algorithm header for `nth_element`:

```cpp
<<header files>>=
#include <algorithm>
```
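The postcondition of `nth_element` is exactly what the split needs, and it can be demonstrated on plain integers. The helper `partitionAtMedian` below is a hypothetical illustration, not part of the article's code:

```cpp
#include <algorithm>
#include <vector>

// Partition v around its median position using nth_element and return the
// median value. Afterwards, elements before the median position compare <= it
// and elements after compare >= it, but neither half is fully sorted.
int partitionAtMedian(std::vector<int>& v)
{
    std::vector<int>::iterator median = v.begin() + v.size() / 2;
    std::nth_element(v.begin(), median, v.end());
    return *median;
}
```

This partial rearrangement is why `nth_element` runs in linear expected time, cheaper than a full sort, while still leaving each half of the range on the correct side of the median for the two new blocks.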

Depending on the direction we're cutting, we need to examine a different coordinate of the points when partitioning. One straightforward but slow way to do this is to use a comparator object that knows the block's longest side's index:

```cpp
<<slow: partition the points into two sublists using nth_element>>=
CoordinatePointComparator pointComparator(longestBlock.longestSideIndex());
std::nth_element(begin, median, end, pointComparator);

<<slow: class declarations>>=
class CoordinatePointComparator
{
    int index;
public:
    CoordinatePointComparator(int index)
    {
        this->index = index;
    }

    bool operator()(Point left, Point right)
    {
        return left.x[index] < right.x[index];
    }
};
```

If we can assume we're dealing with three dimensions, a more efficient approach is to use templates to specialize the comparator object to the particular index being compared, moving costly branches from the inner loop of the partition operation to an initial branch outside the calls:

```cpp
<<partition the points into two sublists using nth_element>>=
switch (longestBlock.longestSideIndex())
{
case 0: std::nth_element(begin, median, end, CoordinatePointComparator<0>()); break;
case 1: std::nth_element(begin, median, end, CoordinatePointComparator<1>()); break;
case 2: std::nth_element(begin, median, end, CoordinatePointComparator<2>()); break;
}

<<class declarations>>=
template <int index>
class CoordinatePointComparator
{
public:
    bool operator()(Point left, Point right)
    {
        return left.x[index] < right.x[index];
    }
};
```

Experiments show that this gives a 30% time improvement over the slower solution.
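The templated comparator can be exercised on its own. The sketch below reuses the article's `Point` and comparator definitions; `medianOfCoordinate` is a hypothetical helper added only for this demonstration:

```cpp
#include <algorithm>
#include <vector>

const int NUM_DIMENSIONS = 3;  // as in the article
struct Point { unsigned char x[NUM_DIMENSIONS]; };

// The article's templated comparator: the coordinate index is fixed at
// compile time, so the inner comparison carries no runtime branch on it.
template <int index>
class CoordinatePointComparator
{
public:
    bool operator()(Point left, Point right)
    {
        return left.x[index] < right.x[index];
    }
};

// Hypothetical helper: partition pts around the median of one coordinate
// and return that coordinate's median value.
template <int index>
unsigned char medianOfCoordinate(std::vector<Point>& pts)
{
    std::vector<Point>::iterator median = pts.begin() + pts.size() / 2;
    std::nth_element(pts.begin(), median, pts.end(),
                     CoordinatePointComparator<index>());
    return median->x[index];
}
```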

To find a representative point for each block, we merely compute the arithmetic mean (average) of all points in the cluster:

```cpp
<<find a representative point for each block and add it to the result>>=
while (!blockQueue.empty())
{
    Block block = blockQueue.top();
    blockQueue.pop();
    Point* points = block.getPoints();

    int sum[NUM_DIMENSIONS] = {0};
    for (int i = 0; i < block.numPoints(); i++)
    {
        for (int j = 0; j < NUM_DIMENSIONS; j++)
        {
            sum[j] += points[i].x[j];
        }
    }

    Point averagePoint;
    for (int j = 0; j < NUM_DIMENSIONS; j++)
    {
        averagePoint.x[j] = sum[j] / block.numPoints();
    }
    result.push_back(averagePoint);
}
```

## Files

Finally, we wrap up the implementation in source and header files for reuse:

```cpp
<<median_cut.h>>=
#ifndef _MEDIAN_CUT_H_
#define _MEDIAN_CUT_H_
<<type definition headers>>
<<constants>>
<<class declarations>>
<<median cut algorithm declaration>>;
#endif /* #ifndef _MEDIAN_CUT_H_ */
```

```cpp
<<median_cut.cpp>>=
<<header files>>
#include "median_cut.h"
<<Block constructor definition>>
<<Block public methods>>
<<median cut algorithm>>
```

## Sample code

We can try out the algorithm using a small command-line driver. We feed it a raw image file along with its dimensions and the desired palette size. For example, if you download Image:Amorphophallus titanum USBG small.raw, and assuming your compiled binary is named "median_cut_sample", you can run median cut on it like this:

```
./median_cut_sample Amorphophallus_titanum_USBG_small.raw 100 145 16
```

The output will be:

```
74 68 86
51 49 39
165 126 127
135 141 134
118 84 78
20 12 13
179 153 159
205 216 148
41 51 16
171 178 94
241 240 229
134 147 80
57 40 57
74 85 58
120 95 124
184 210 103
```

I reduced color depth in Photoshop using this custom palette. The resulting image is a bit muted compared to the proprietary color quantization used by Photoshop, but is nevertheless quite suited to the image:

```cpp
<<median_cut_sample.cpp>>=
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include "median_cut.h"

int main(int argc, char* argv[])
{
    FILE* raw_in;
    int numPoints = atoi(argv[2]) * atoi(argv[3]);
    Point* points = (Point*)malloc(sizeof(Point) * numPoints);

    raw_in = fopen(argv[1], "rb");
    for (int i = 0; i < numPoints; i++)
    {
        fread(&points[i], 3, 1, raw_in);
    }
    fclose(raw_in);

    std::list<Point> palette = medianCut(points, numPoints, atoi(argv[4]));

    std::list<Point>::iterator iter;
    for (iter = palette.begin(); iter != palette.end(); iter++)
    {
        std::cout << (int)iter->x[0] << " "
                  << (int)iter->x[1] << " "
                  << (int)iter->x[2] << std::endl;
    }
    return 0;
}
```

## Performance

Because it avoids sorting the points and is very simple, this implementation performs well. As the following table demonstrates, both the image size and palette size have an impact on runtime. The same photograph at various scales was used for each image size. The program was compiled on Linux with gcc using the `-O3` flag and run on a 2.8 GHz Pentium 4 processor. User space time is measured, including load time. User times were very consistent between runs, with only a few milliseconds of variation.

Time in milliseconds for a palette of the given size:

| Image pixels | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 196,608 | 24 | 28 | 32 | 40 | 48 | 48 | 56 | 56 | 64 | 76 | 84 |
| 393,132 | 48 | 56 | 68 | 76 | 92 | 96 | 112 | 116 | 132 | 144 | 164 |
| 786,432 | 88 | 100 | 120 | 152 | 172 | 188 | 208 | 232 | 252 | 288 | 308 |
| 1,572,528 | 168 | 200 | 236 | 292 | 324 | 356 | 408 | 448 | 492 | 548 | 612 |
| 3,145,728 | 336 | 404 | 480 | 580 | 640 | 712 | 832 | 876 | 972 | 1084 | 1184 |

Time scales roughly linearly with image size and logarithmically with palette size, as theory would predict.

According to profiling, runtime is split about evenly between the partition operation and the block shrinking operation (mainly the computing of minimums and maximums). A more efficient approach would combine these operations, finding minimums and maximums at the same time as it performs the partition instead of scanning the points all over again, but this would require opening up or reimplementing `nth_element`, so we don't do it here.
