International Symposium on System-on-Chip

Tutorial at SoC 2010

Tuesday, September 28, 2010

Dependable Hardware and Software for Embedded Multicore Computing Platforms

Organized by the CRISP FP7 project in cooperation with the GETA graduate school.

Instructors:

  • Jari Nurmi, TUT, Finland
  • Gerard Rauwerda, Recore Systems, The Netherlands
  • Eric Verhulst, Altreonic, Belgium
  • Bart Vermeulen, NXP, The Netherlands
  • Dmitry Lachover, Freescale Semiconductor, Israel
  • Timon ter Braak, University of Twente, The Netherlands
  • Gerard Smit, University of Twente, The Netherlands

Room Sonaatti, Tampere Hall, from 9:00 to 17:00

Agenda

9:00 Opening

9:05  Trends in Embedded Computing - Scalability, Reconfigurability and Dependability
      J. Nurmi, Tampere University of Technology, FI

9:50  The CRISP Reconfigurable Many-Cores Architecture
      G. Rauwerda, Recore Systems, NL

10:35 Coffee Break

11:00 Formal Development vs. Formal Verification - The OpenComRTOS Example
      E. Verhulst, Altreonic, BE

11:45 CRISP Dependability Approach using Testing
      B. Vermeulen, NXP, NL


12:30 Lunch


13:30 An Architecture for Scalable Concurrent Embedded Software
      E. Verhulst, Altreonic, BE

14:15 Optimal Hardware/Software Partitioning on Multicore DSP
      D. Lachover, Freescale Semiconductor, IL

15:00 Coffee Break

15:30 Run-Time Mapping for Dependable Stream Processing
      T. ter Braak, University of Twente, NL

16:15 Streaming Applications and Multi-Core Architectures
      G. Smit, University of Twente, NL

17:00 End of tutorial

Abstracts of the lectures

Trends in Embedded Computing - Scalability, Reconfigurability and Dependability
Embedded computing in a system-on-chip (SoC) environment relies increasingly on multiple processors or processor cores on a single chip. The number of processing elements is expected to grow beyond one thousand in the near future. In such a scenario, the scalability of the hardware and software architecture becomes a highly important feature of the computation platform.
Energy efficiency is and will remain one of the main concerns in embedded system implementation. Completely software-programmable manycore systems cannot meet all the requirements, while fixed hardware implementations are not flexible enough. Reconfigurable architectures, especially coarse-grain reconfigurable computing elements, are a viable alternative, offering higher performance and lower power consumption than software-programmable implementations. At the same time, they retain much of the flexibility, as resources can be shared between a number of tasks.
One of the consequences of technology scaling - itself one of the enablers of scalable manycore systems - is that chips become less reliable. Part of the system may be faulty from fabrication, or permanent or occasional errors and transient faults may occur during operation. Schemes that improve the dependability of such systems need to be developed to monitor, analyze, and recover from errors, and even to repair faulty functionality on-chip at run time.

The CRISP Reconfigurable Many-Cores Architecture
Recore Systems develops semiconductor IP solutions for reconfigurable multi-core systems-on-chip (SoC) that are low power and low cost. The presented reconfigurable technology is used in the CRISP project to create a scalable reconfigurable many-core SoC. The many-core SoC includes digital signal processing cores and distributed memory resources:

  • The Xentium® is a programmable high-performance DSP processor core that is efficient and offers high precision.
  • The Memtium™ is a reconfigurable memory tile that enables distributed local random access memory in a multi-core architecture.
The presented reconfigurable SoC consists of programmable fixed-point digital signal processing cores that are connected by a network-on-chip (NoC). The NoC provides the bandwidth and flexibility that are required for streaming DSP applications. The communication bandwidth in the NoC scales with the number of cores. The NoC ensures predictable performance due to its point-to-point connections, in contrast to the unpredictability of a shared bus. Moreover, the presented multi-core architecture provides a solution for true scalability by extending the NoC across chip boundaries.
The concepts and ideas of the CRISP many-core architecture will be presented, and an outlook on future activities will be given.

Formal Development vs. Formal Verification - The OpenComRTOS Example
Formal verification is often put forward as the way to guarantee proven correctness of, e.g., a digital design or written software. However, this assumes that the architectural design was flawless, and it does not provide much information on the structural properties of the design. Using formal methods from the very beginning can, however, result in much cleaner, more efficient and hence also easier-to-verify software. This will be illustrated by the formal development of OpenComRTOS, which resulted in, e.g., a code size reduction by a factor of 5 to 10. The development also showed how formal methods help when they are combined with creative thinking.

CRISP Dependability Approach using Testing
Advanced CMOS technologies allow present-day systems-on-chip (SoCs) to contain multiple programmable processor cores and dedicated peripherals. Besides hardware functionality, they also contain a growing amount of embedded software. However, with every new generation of CMOS technology (i.e. 90 nm and beyond), electronic circuits become more susceptible to manufacturing defects, which, when left unaddressed, have a significant negative impact on the yield and reliability of manufactured chips. Furthermore, SoCs are increasingly used for safety-critical applications.
In this talk, we present the CRISP approach to improving dependability and yield of deep-submicron chips using new techniques for static and dynamic detection and localization of faults and for (dynamically) circumventing faulty hardware. For this, we first introduce the basic concepts and taxonomy of dependable systems, and explain important attributes such as reliability, maintainability, and availability. Subsequently, we derive the dependability requirements of our target application in terms of its basic functions and its relevant hardware elements. We then present in detail the Design-for-Dependability measures that have been taken in the Reconfigurable Fabric Device (RFD) design to improve its system dependability. Specifically, we focus on the functionality and design of the on-chip Dependability Manager and the dependability wrappers around the RFD processor tiles. Crucial in our approach is the interoperability with the network-on-chip and the off-chip run-time mapping software. We will conclude with experimental results of the validation of this dependability functionality.

An Architecture for Scalable Concurrent Embedded Software
If communication is the bottleneck, why develop the scheduler first? In this presentation we show how distributed or many-core real-time embedded systems depend on a real-time capable communication subsystem. As this subsystem is often a shared resource, a natural architecture is based on concurrency and packet-based communication. This idea is reflected in the formally developed OpenComRTOS, resulting in unprecedented scalability for embedded applications.

Optimal Hardware/Software Partitioning on Multicore DSP
To meet growing demand for advanced 3G and 4G services, wireless infrastructure equipment manufacturers increasingly require devices that offer exceptional performance and flexibility. Multi-standard devices are needed to fully support Base Band Physical Layer requirements for the target markets of WiMAX, WCDMA/HSPA, 3GPP-LTE, TDD-LTE and TD-SCDMA base stations. To enable these technologies, the device has to provide a low latency and high throughput communication solution at an affordable price.
In addition, a balance of high-performance, low-power processing with sufficient programmability is needed. In this presentation we explore several architectural alternatives and arrive at an optimal HW-SW partitioning that meets these requirements. We will demonstrate this partitioning in Freescale's highly integrated MSC8156/5 DSP device. The MSC8156/5 DSP is a six-core device based on SC3850 StarCore DSP core technology and delivers flexibility, integration and affordability while answering demand from wireless base station OEMs for ultra-high computational performance for baseband applications. The MSC8156/5 DSP is an ideal choice for applications ranging from multi-standard wireless base stations to radar systems and other industrial applications. The device offers a total of 48 GMACs (Giga Multiply and Accumulate) per second of DSP core performance. The embedded MAPLE-B/B2L baseband accelerator offers up to 900 Msps of FFT throughput, up to 630 Msps of DFT throughput, Turbo decoding with rate de-matching and HARQ-combining capacity of up to 330 Mbps at eight iterations, and up to 200 Mbps of Viterbi decoding with tail-biting and multi-iterations. It also provides CRC check or insertion with up to 10 Gbps throughput and turbo encoding with rate matching capabilities up to 900 Mbps.

Run-Time Mapping for Dependable Stream Processing
In this talk, an example application illustrates how to maintain a dependable system while offering more flexibility than conventional real-time systems. The example is a beamforming application. Beamforming is a signal processing technique for directional signal transmission or reception that uses a fixed array of antennas which can be steered in the software domain.
Safety-critical systems often have built-in fault detection mechanisms. The beamforming application also devotes a percentage of time to processing a known dataset to verify the correctness of the system. Tough requirements on the availability of the system demand a short repair time. Therefore, switching hardware components manually is not an option. If, upon detection of a faulty system, the faulty hardware can be localized and isolated, the system might continue its operation while circumventing the hardware issue.
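The link between availability and repair time can be made explicit with the standard steady-state availability formula. The symbols MTTF (mean time to failure) and MTTR (mean time to repair) are not taken from the abstract; they are introduced here only for illustration.

```latex
A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}
```

As a purely hypothetical example, with an MTTF of 1000 hours, an MTTR of 10 hours gives A ≈ 0.990, while an MTTR of 1 hour gives A ≈ 0.999. For a given failure rate, a shorter repair time is what raises availability.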
To be able to integrate this level of dependability management, some testing infrastructure is built into the platform. These hardware dependability features are used by the run-time resource manager to determine faulty components in the system. When the system is still workable, the resource manager allocates a new set of resources for the beamforming application, such that it may continue its operation. The degree to which such a system may repair itself depends highly on the flexibility exposed by the applications running on the system. A static resource assignment, often performed at design time, limits this flexibility. This tutorial shows how to model an application to allow dynamic resource management. We give some insight into the resource manager used to map an application to the available resources at run time. In doing so, other applications running on the system must not be disturbed, so that the real-time guarantees given to them are maintained. This talk is illustrated by a software demonstration of the sketched scenario with the beamforming application.
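The reallocation step described above can be sketched in a few lines. This is a minimal illustration and not the CRISP resource manager: the function name `remap_tasks`, the task and tile names, and the one-task-per-tile model are all assumptions made for the example.

```python
from typing import Dict, Optional, Set


def remap_tasks(
    mapping: Dict[str, str],
    tiles: Set[str],
    faulty: Set[str],
) -> Optional[Dict[str, str]]:
    """Move tasks off faulty tiles onto free healthy tiles.

    Hypothetical model: each tile runs at most one task. Tasks that
    already sit on a healthy tile keep their tile, so they are not
    disturbed. Returns None when too few healthy tiles remain.
    """
    healthy = tiles - faulty
    # Tasks on healthy tiles stay exactly where they are.
    new_mapping = {task: tile for task, tile in mapping.items() if tile in healthy}
    free = sorted(healthy - set(new_mapping.values()))
    displaced = sorted(task for task, tile in mapping.items() if tile not in healthy)
    if len(displaced) > len(free):
        return None  # the system is no longer workable
    for task, tile in zip(displaced, free):
        new_mapping[task] = tile
    return new_mapping


if __name__ == "__main__":
    tiles = {"t0", "t1", "t2", "t3", "t4"}
    mapping = {"fft": "t0", "beam": "t1", "sum": "t2"}
    print(remap_tasks(mapping, tiles, faulty={"t1"}))
```

In this sketch only the task on the faulty tile moves. A real resource manager would also have to account for communication routes and the real-time guarantees of other applications, which this model leaves out.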

Streaming Applications and Multi-Core Architectures
In this presentation we focus on streaming applications and reconfigurable multi-core architectures. System-on-Chip (SoC) design is gradually moving away from single-processor solutions towards multiprocessor (MP) SoC architectures. When processing and memory are combined in these processors (locality of reference), tasks can be executed efficiently. An important class of applications where tasks run in parallel is streaming applications. Due to the available parallelism in streaming applications, there is a good match with multi-core architectures. Examples of streaming applications are: wireless baseband processing (for HiperLAN/2, WiMAX, DAB, DRM, and DVB), multimedia processing (e.g. MPEG, MP3 coding/decoding), medical image processing, color image processing, sensor processing (e.g. remote surveillance cameras) and phased array radar systems.
We present an iterative hierarchical approach to map an application onto a heterogeneous multiprocessor SoC architecture. The streaming applications are modeled as a set of communicating processes. The optimization objective is to minimize the energy consumption of the application, while still providing the required Quality of Service (e.g. throughput or latency). The approach is flexible and scalable, and its performance is promising.
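The optimization objective can be illustrated with a small sketch. It uses exhaustive search rather than the iterative hierarchical approach of the talk, and it ignores communication cost. The function name `best_mapping`, the cost table, and the throughput model (the busiest processor must finish within `max_period`) are assumptions made for the example.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# cost[process][processor] = (energy per iteration, time per iteration)
Cost = Dict[str, Dict[str, Tuple[float, float]]]


def best_mapping(
    cost: Cost, processors: List[str], max_period: float
) -> Optional[Tuple[float, Dict[str, str]]]:
    """Return (energy, mapping) with minimal energy, or None if infeasible.

    A mapping is feasible when every processor's summed execution time
    per iteration is at most max_period (a simple throughput constraint).
    """
    processes = sorted(cost)
    best: Optional[Tuple[float, Dict[str, str]]] = None
    for assignment in product(processors, repeat=len(processes)):
        load = {p: 0.0 for p in processors}
        energy = 0.0
        supported = True
        for proc, cpu in zip(processes, assignment):
            if cpu not in cost[proc]:
                supported = False  # this process cannot run on this processor
                break
            e, t = cost[proc][cpu]
            energy += e
            load[cpu] += t
        if not supported or max(load.values()) > max_period:
            continue
        if best is None or energy < best[0]:
            best = (energy, dict(zip(processes, assignment)))
    return best


if __name__ == "__main__":
    # Hypothetical numbers: a DSP tile is cheaper and faster than a GPP.
    cost = {
        "filter": {"dsp": (1.0, 2.0), "gpp": (3.0, 4.0)},
        "fft": {"dsp": (2.0, 3.0), "gpp": (5.0, 6.0)},
    }
    print(best_mapping(cost, ["dsp", "gpp"], max_period=4.0))
```

Exhaustive search grows exponentially with the number of processes, which is one reason a hierarchical or heuristic method is needed for realistic applications.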