Implementation and Benchmarking of FFT Algorithms on Multicore Platforms

Claudio Brunelli¹ and Roberto Airoldi²

¹Nokia Corporation, ²Tampere University of Technology


This paper analyzes the execution performance of several commonly used variants of the Fast Fourier Transform (FFT) algorithm. Starting from C implementations of these algorithms, we profiled their execution on a series of multicore platforms, both embedded and non-embedded. The work has several aims. First, we set out to determine how well different FFT algorithms map to different multicore processors. Second, we wanted to understand how well performance scales with the number of cores, and how effectively current compilers exploit the available hardware compared with handcrafted programs. The results show that the Radix-4 Cooley-Tukey FFT is, on average, the best performer among the algorithms considered.