I chose to implement the decimation-in-time (DIT) form of a FFT not because I thought it would result in an efficient implementation but rather because the algorithm seemed conceptually simpler. I have favored simplicity over efficiency since this is my first verilog design.
The definition of a discrete fourier transform (DFT) is
where \(x_n\) is a sequence of length \(N\) and \(X_k\) is the fourier transform of this sequence.
The most common FFT algorithm is the Cooley-Tukey FFT algorithm which takes advantage of the fact that a discrete fourier transform (DFT) can be expressed in terms of a combination of two smaller DFTs.
Let \(e_n\) be the even components of \(x_n\) (\(e_n = x_{2n}\)).
Let \(o_n\) be the odd components of \(x_n\) (\(o_n = x_{2n+1}\)).
Let \(E_k\) and \(O_k\) be their DFTs.
Since DFTs wrap around, \(E_k = E_{k-N/2}\), and \(e^{ix} = -e^{i(x+\pi)}\), we can write \(X_k\) as:
Equations 1 and 2
so that we only need to consider \(0 < k \le N/2\) to generate all of \(X_k\).
Using these equations the DFT is calculated in a series of stages where in each stage we calculate a new set of N complex numbers from the N complex number of the previous stage.
Figure 1 shows a schematic of the calculation of a DFT of size 8. The column of rectanges on the left represent \(x_n\), the input data to the DFT (the zeroth stage). The column of rectangles on the right represent \(X_k\), the output of the DFT (the third stage). In between are two intermediate stages. The data in the second stage is labelled with \(E_k\) and \(O_k\) so that we can see how to apply equations 1 and 2 to calculate the third stage from the second. The data in the first stage is labelled with \({EE}_k\), \({OE}_k\), \({EO}_k\), and \({OO}_k\), where \({OE}_k\) is the fourier transform of \({oe}_n\) which is the even components of \(o_n\), the odd components of \(x_n\). We can calculate \(E_k\), using equations 1 and 2, from \({EE}_k\) and \({EO}_k\), and similarly we calculate \(O_k\) from \({OE}_k\) and \({OO}_k\). The lines in Figure 1 show the dataflow from one stage to the next.
In general, stage \(s\) is an interleaving of \(S=N/2^s\) transforms of length \(M=N/S\). Let \(P_{s,i}\) be component \(i\) of stage \(s\). Let \(X_{s,j,k}\) be component \(k\) of transform \(j\) in stage \(s\). So in Figure 1 \(E_3\) is component 6 (we index from 0) of stage 2, \(P_{2, 6}\), and is also component 3 of transform 0 of stage 2, \(X_{2, 0, 3}\).
The DFT \(X_{s,j,k}\) (\(k\) is position in the DFT) was generated by combining two DFTs from the previous stage, \(X_{s-1,j,k}\) and \(X_{s-1,j+S,k}\). We then have,
Equation 3
where \(T_{n} = e^{-2 \pi i n/N}\).
And similarly for the second half of P we have,
Equation 4
This completes the derivation of the algorithm and in the next section we'll describe the verilog implementation.