CooleyTukey FFT algorithm (C)
This program is under development.
Please help to debug it. When debugging
is complete, remove the {{develop}} tag.
The CooleyTukey FFT algorithm is a popular fast Fourier transform algorithm for rapidly computing the discrete fourier transform of a sampled digital signal. It applies best to signal vectors whose lengths are highly composite, usually a power of 2.
Here we describe a C implementation of CooleyTukey. We begin with a straightforward implementation and then describe various performance improvements. Like most practical implementations, we will concentrate on the case where the vector length is a power of 2.
Contents 
[edit] Complex arithmetic
Although the newest version of C, C99, has library support for complex number datatypes and arithmetic, it is not widely supported. We write simple implementations of a few basic operations we need here, with little explanation; for a more detailed discussion, see Category:Complex numbers.
<<complex number datatype>>= typedef struct complex_t { double re; double im; } complex; <<complex number prototypes>>= complex complex_from_polar(double r, double theta_radians); double complex_magnitude(complex c); complex complex_add(complex left, complex right); complex complex_sub(complex left, complex right); complex complex_mult(complex left, complex right); <<complex number function definitions>>= complex complex_from_polar(double r, double theta_radians) { complex result; result.re = r * cos(theta_radians); result.im = r * sin(theta_radians); return result; } double complex_magnitude(complex c) { return sqrt(c.re*c.re + c.im*c.im); } complex complex_add(complex left, complex right) { complex result; result.re = left.re + right.re; result.im = left.im + right.im; return result; } complex complex_sub(complex left, complex right) { complex result; result.re = left.re  right.re; result.im = left.im  right.im; return result; } complex complex_mult(complex left, complex right) { complex result; result.re = left.re*right.re  left.im*right.im; result.im = left.re*right.im + left.im*right.re; return result; }
The use of sin()
, cos()
, and sqrt()
requires the header math.h:
<<complex number headers>>= #include <math.h>
[edit] Naïve evaluation of definition
First we discuss the simplest possible DFT solution in order to address some of the issues in a simpler setting. Suppose that our input vector is x and our output vector is X. Then the discrete Fourier transform is defined by the formula:
 For each ,
The quantity is most easily described in terms of its polar complex coordinates: it is on the unit circle (distance r=1 from the origin) and forms an angle θ of radians with the positive real axis. This enables us to evaluate the definition as follows:
<<FFT prototypes>>= complex* DFT_naive_1(complex* x, int N); <<Naive DFT>>= complex* DFT_naive_1(complex* x, int N) { complex* X = (complex*) malloc(sizeof(struct complex_t) * N); int k, n; for(k=0; k<N; k++) { X[k].re = X[k].im = 0.0; for(n=0; n<N; n++) { X[k] = complex_add(X[k], complex_mult(x[n], complex_from_polar(1, 2*PI*n*k/N))); } } return X; }
We require headers for memory management and need to define PI
:
<<FFT headers>>= #include <stdlib.h> <<FFT constants>>= #define PI 3.1415926535897932
This works well for very small inputs and does not place any requirements on N. At the expense of additional space, we can avoid repeatedly recomputing the factors . Instead, we compute only the N values for and use the fact that e^{2πik} = 1 to reduce to one of these:
 where and .
In code:
<<FFT prototypes>>= complex* DFT_naive_2(complex* x, int N); <<Naive DFT>>= complex* DFT_naive_2(complex* x, int N) { complex* X = (complex*) malloc(sizeof(struct complex_t) * N); complex* Nth_root = (complex*) malloc(sizeof(struct complex_t) * N); int k, n; for(k=0; k<N; k++) { Nth_root[k] = complex_from_polar(1, 2*PI*k/N); } for(k=0; k<N; k++) { X[k].re = X[k].im = 0.0; for(n=0; n<N; n++) { X[k] = complex_add(X[k], complex_mult(x[n], Nth_root[(n*k) % N])); } } free(Nth_root); return X; }
This considerably speeds the algorithm in practice. Unfortunately, even with this change, directly evaluating this sum for each k requires O(N^{2}) time overall, which is prohibitive for large N.
[edit] Radix2 decimation in time
The radix2 decimation in time algorithm uses a divideandconquer approach to improve efficiency. It begins by separating the input vector x into two smaller vectors d and e, with d containing the oddnumbered elements of the original vector and e containing the evennumbered elements. For example:
 x = (1, 2, 3, 4, 5, 6)
 d = (2, 4, 6)
 e = (1, 3, 5)
(Keep in mind that the indexing starts at zero). We then recursively compute the FFTs of d and e, call them D and E. We use them to construct X according to the following formula, which can be shown directly from the definition:
Thus, all we need to do to compute X is to compute D, E, and the N/2 complex roots of unity specified by for (called twiddle factors), then do some basic arithmetic. The additional work after computing D and E is linear, so the analysis is just like that of mergesort, requiring O(NlogN) worstcase time. Following is a straightforward implementation based on recursion using the above formulas:
<<FFT prototypes>>= complex* FFT_simple(complex* x, int N /* must be a power of 2 */); <<Simple FFT>>= complex* FFT_simple(complex* x, int N /* must be a power of 2 */) { complex* X = (complex*) malloc(sizeof(struct complex_t) * N); complex * d, * e, * D, * E; int k; if (N == 1) { X[0] = x[0]; return X; } e = (complex*) malloc(sizeof(struct complex_t) * N/2); d = (complex*) malloc(sizeof(struct complex_t) * N/2); for(k = 0; k < N/2; k++) { e[k] = x[2*k]; d[k] = x[2*k + 1]; } E = FFT_simple(e, N/2); D = FFT_simple(d, N/2); for(k = 0; k < N/2; k++) { /* Multiply entries of D by the twiddle factors e^(2*pi*i/N * k) */ D[k] = complex_mult(complex_from_polar(1, 2.0*PI*k/N), D[k]); } for(k = 0; k < N/2; k++) { X[k] = complex_add(E[k], D[k]); X[k + N/2] = complex_sub(E[k], D[k]); } free(D); free(E); return X; }
Although this works, is simple, and has good asymptotic performance, in practice the large number of allocations above degrade speed and triple memory use, which is particularly harmful on large inputs. Additionally, many of the complex_from_polar()
calls are redundant, since only Nth roots of unity are used anywhere in the recursion tree. Finally, this implementation has considerable function call overhead and would ideally be replaced by a simpler algorithm on small subvectors.

[edit] Files
We create a few separate modules: one for our basic complex number support, one for the FFT implementations, and one for the test driver and main()
, described in the next section. We include the complex number header in the FFT header for the complex
type.
<<complex_simple.h>>= #ifndef _COMPLEX_SIMPLE_H_ #define _COMPLEX_SIMPLE_H_ complex number datatype complex number prototypes #endif /* #ifndef _COMPLEX_SIMPLE_H_ */ <<complex_simple.c>>= #include "complex_simple.h" complex number headers complex number function definitions
<<fft.h>>= #ifndef _FFT_H_ #define _FFT_H_ #include "complex_simple.h" FFT prototypes #endif /* #ifndef _FFT_H_ */ <<fft.c>>= #include "fft.h" FFT headers FFT constants Naive DFT Simple FFT
[edit] Test driver

Download code 