This paper describes the implementation of an FFT on a system based on a general-purpose (GP) core and a reconfigurable coarse-grain accelerator. The entire system has been prototyped on an Altera Stratix II device. On the prototype, a 1024-point FFT executes in 400 μs, a 40× speed-up over the software implementation. Based on an ASIC synthesis of the coarse-grain array, the 1024-point FFT executes in 42 μs, compared with 104 μs for a DSP implementation.