This paper describes the RTL-to-layout implementation of the PACT XPP-III Coarse-Grained Reconfigurable Architecture (CGRA). The implementation activity was strictly based on a hierarchical approach in order to exploit performance optimization at all levels, as well as guarantee maximum scalability and provide a portfolio of IP-blocks that could be reused to build different configurations and embodiments of the same CGRA template. The final result can be seamlessly introduced in any SoC design flow as embedded accelerator. It is designed in STMicroelectronics 90nm GP technology, occupies 42.5 mm2, delivers 13 16-bit GOPS (0.8 GOPS/mW, 10 MOPS/mW) and has a measured max frequency of 150 MHZ, requiring a measured 13 mW/MHz dynamic power, 93 mW static. A silicon prototype was also produced embedding XPP-III in a complex system-on-chip including an ARM processor as system controller as well as different ASIC blocks.