Implementation of Efficient Fft Algorithms on Fused Multiply-Add Architectures
Abstract
The decimation-in-time radix-2, radix-4, split-radix, and radix-8 algorithms, presented in a paper by Linzer and Feig [5], are described in detail. These algorithms compute discrete Fourier transforms (DFT’s) on input sequences with lengths that are powers of 2 with fewer multiply-adds than traditional Cooley-Tukey algorithms. The descriptions given provide the needed details to implement these algorithms efficiently in a computer program that could compute DFT’s on a length 2m sequence for general m. We describe and give timing results for a radix-4 version that we have implemented on the RS/6000 workstation. The timing results show that a substantial saving in execution time is obtained when the new radix-4 FFT is used instead of a standard Cooley-Tukey radix-4 FFT. Finally, we present a set of experiments that suggest that numerical behavior of the new algorithms is slightly better than the numerical behavior of Cooley-Tukey FFT’s. © 1993 IEEE