The high computational demands of next-generation cellular and broadcast wireless systems require both higher efficiency and greater flexibility in baseband processing. This paper introduces a new DSP architecture optimized for baseband applications, especially those with heavy workloads of complex filtering, FFT and MIMO matrix operations. The Tensilica ConnX BaseBand Engine processor core implements a 3-way VLIW, 8-way SIMD architecture that can sustain 16 multiply-add operations per cycle and perform a full radix-4 FFT butterfly per cycle. At 400MHz, it provides almost 13GB per second of memory bandwidth and 1.6B complex FIR filter taps per second. It directly implements 8-way parallel division and 4-way parallel reciprocal square root operations. The rich programming environment, including vectorization of scalar C applications, allows easy deployment of the BaseBand Engine into cellular base-station, femto-cell and other software-agile radio applications, and into multi-standard broadcast receivers.
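
As a rough illustration of the figures quoted above, the following Python sketch relates the multiply-add rate to the complex FIR tap rate and spells out the arithmetic of a single radix-4 butterfly. It is a scalar reference model, not the processor's instruction set or intrinsics. The function and constant names are hypothetical, the 16 multiply-adds are read as a per-cycle figure, and the cost of four real multiply-adds per complex tap is an assumption about how the tap rate is counted.

```python
CLOCK_HZ = 400e6               # clock rate quoted in the abstract
MACS_PER_CYCLE = 16            # real multiply-adds, read here as a per-cycle figure
REAL_MACS_PER_COMPLEX_TAP = 4  # assumption: (a+jb)(c+jd) costs 4 real multiply-adds


def complex_taps_per_second(clock_hz=CLOCK_HZ, macs_per_cycle=MACS_PER_CYCLE):
    """Complex FIR taps per second implied by the real multiply-add rate."""
    return clock_hz * macs_per_cycle / REAL_MACS_PER_COMPLEX_TAP


def radix4_butterfly(x0, x1, x2, x3):
    """Forward 4-point DFT of four complex inputs (twiddle factors already applied).

    Uses W = exp(-2j*pi/4) = -1j, so the only multiplications are by -1j,
    which amount to swapping real and imaginary parts with a sign change.
    """
    a = x0 + x2
    b = x0 - x2
    c = x1 + x3
    d = -1j * (x1 - x3)
    return a + c, b + d, a - c, b - d


if __name__ == "__main__":
    print(complex_taps_per_second())
    print(radix4_butterfly(1, 2, 3, 4))
```

Under these assumptions, 400MHz times 16 multiply-adds per cycle divided by 4 gives the 1.6B complex taps per second stated above; the hardware performs the butterfly shown here once per cycle, whereas this model only documents the arithmetic.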