In this paper we present a new technique for automatically measuring the performance of tasks, functions or arbitrary parts of a program on a multiprocessor embedded system.
Task parallelism is annotated with OpenMP. The technique instruments the tasks described by the OpenMP annotations, while ad hoc pragmas in the source mark other pieces of code to profile. Both the annotations and the instrumentation are completely target-independent, so the same code can be measured on different target architectures, on simulators or on prototypes. We validate the approach on single-processor and dual-processor LEON 3 platforms synthesized on FPGA, demonstrating a low instrumentation overhead.
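To make the idea of instrumenting OpenMP tasks concrete, the following is a minimal hand-written sketch of what instrumented code could look like. It is not the instrumentation produced by the technique described here: the names `prof_region`, `prof_now`, `prof_begin`, `prof_end`, `filter` and `run_tasks` are hypothetical, and the standard `clock()` call stands in for whatever timer a real target would provide.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical profiling record: one per instrumented region. */
typedef struct {
    const char *name;   /* label of the region            */
    clock_t     start;  /* timestamp taken at region entry */
    clock_t     total;  /* accumulated time in the region  */
    unsigned    calls;  /* number of times it was executed */
} prof_region;

/* Hypothetical timer hook. Only this function would change per target
   (hardware cycle counter, simulator hook, ...); clock() is a placeholder. */
static clock_t prof_now(void) { return clock(); }

static void prof_begin(prof_region *r) { r->start = prof_now(); }

static void prof_end(prof_region *r) {
    r->total += prof_now() - r->start;
    r->calls += 1;
}

/* One record per task, so concurrent tasks never share a record. */
static prof_region region_a = { "task_a", 0, 0, 0 };
static prof_region region_b = { "task_b", 0, 0, 0 };

/* Stand-in for a task body: sums n integers and times itself. */
static long filter(prof_region *r, const int *data, size_t n) {
    long acc = 0;
    prof_begin(r);
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    prof_end(r);
    return acc;
}

/* Two OpenMP tasks, each wrapped by its own profiling record.
   Without OpenMP support the pragmas are ignored and the code runs serially. */
long run_tasks(const int *a, const int *b, size_t n) {
    long ra = 0, rb = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(ra)
        ra = filter(&region_a, a, n);
        #pragma omp task shared(rb)
        rb = filter(&region_b, b, n);
        #pragma omp taskwait
    }
    return ra + rb;
}
```

In this sketch the source-level annotations stay the same on every platform and only `prof_now` would need a target-specific implementation, which is one plausible way to obtain the target independence claimed above.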
We show how the information obtained with this technique can be easily exploited in a Hardware/Software design space exploration tool, by estimating, with good accuracy, the speed-up of a parallel application from profiling data collected on the single-processor prototype.
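As an illustration of how single-processor task timings can feed a speed-up estimate, the sketch below uses a simple greedy model. It is an assumption made for this example, not the estimator used in the exploration tool: tasks are taken to be independent, each one is assigned in order to the least-loaded processor, and the speed-up is the serial time divided by the resulting makespan. The function name `estimate_speedup` and the constant `MAX_PROCS` are hypothetical.

```c
#include <stddef.h>

#define MAX_PROCS 8  /* hypothetical upper bound on modelled processors */

/* Estimate the speed-up of n independent tasks on `procs` processors.
   task_time[i] is the time of task i measured on a single processor.
   Returns 0.0 for invalid arguments. */
double estimate_speedup(const double *task_time, size_t n, size_t procs) {
    double load[MAX_PROCS] = {0};
    double serial = 0.0, makespan = 0.0;

    if (procs == 0 || procs > MAX_PROCS || n == 0)
        return 0.0;

    for (size_t i = 0; i < n; i++) {
        /* Pick the processor with the smallest accumulated load. */
        size_t best = 0;
        for (size_t p = 1; p < procs; p++)
            if (load[p] < load[best])
                best = p;
        load[best] += task_time[i];
        serial += task_time[i];
    }

    /* The parallel time is the load of the busiest processor. */
    for (size_t p = 0; p < procs; p++)
        if (load[p] > makespan)
            makespan = load[p];

    return makespan > 0.0 ? serial / makespan : 0.0;
}
```

For example, task times of 4, 3, 2 and 1 on two processors give loads of 5 and 5, so the estimate is 10 / 5 = 2. A model of this kind ignores task dependencies, communication and memory contention, all of which a real estimator would have to account for.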