Tutorial at SoC 2010
Tuesday, September 28, 2010
Dependable Hardware and Software for Embedded Multicore
Computing Platforms
Organized by CRISP FP7 project in cooperation with GETA graduate school.
Instructors:
- Jari Nurmi, TUT, Finland
- Gerard Rauwerda, Recore Systems, The Netherlands
- Eric Verhulst, Altreonic, Belgium
- Bart Vermeulen, NXP, The Netherlands
- Dmitry Lachover, Freescale Semiconductor, Israel
- Timon ter Braak, University of Twente, The Netherlands
- Gerard Smit, University of Twente, The Netherlands
Room Sonaatti, Tampere Hall, from 9:00 to 17:00
Agenda
9:00 Opening
9:05 Trends in Embedded Computing - Scalability, Reconfigurability and Dependability
J. Nurmi, Tampere University of Technology, FI
9:50 The CRISP Reconfigurable Many-Cores Architecture
G. Rauwerda, Recore Systems, NL
10:35 Coffee Break
11:00 Formal Development vs. Formal Verification - The OpenComRTOS Example
E. Verhulst, Altreonic, BE
11:45 CRISP Dependability Approach using Testing
B. Vermeulen, NXP, NL
12:30 Lunch
13:30 An Architecture for Scalable Concurrent Embedded Software
E. Verhulst, Altreonic, BE
14:15 Optimal Hardware/Software Partitioning on Multicore DSP
D. Lachover, Freescale Semiconductor, IL
15:00 Coffee Break
15:30 Run-Time Mapping for Dependable Stream Processing
T. ter Braak, University of Twente, NL
16:15 Streaming Applications and Multi-Core Architectures
G. Smit, University of Twente, NL
17:00 End of tutorial
Abstracts of the lectures
Trends in Embedded Computing - Scalability, Reconfigurability and Dependability
Embedded computing in system-on-chip (SoC) environment relies
increasingly on the use of multiple processors or processor cores on a
single chip. The number of processing elements is expected to grow
beyond one thousand in the near future. In such a scenario, the
scalability of the hardware and software architecture becomes a highly
important feature of the computation platform.
Energy efficiency is and will remain one of the main concerns in
embedded system implementation. Fully software-programmable
manycore systems cannot meet all the requirements, while fixed
hardware implementations are not flexible enough. Reconfigurable
architectures, especially coarse-grain reconfigurable computing
elements, are a viable alternative, offering higher performance and
lower power consumption than software-programmable implementations.
At the same time, they retain enough flexibility, since resources can
be shared between a number of tasks.
One of the consequences of technology scaling - one of the enablers of
scalable manycore systems - is that chips become less reliable. Part
of the system may be faulty from fabrication onwards, and permanent or
transient faults may occur during operation. Schemes that improve the
dependability of such systems need to be developed to monitor,
analyze, and recover from errors, and even to repair faulty
functionality on-chip at runtime.
The CRISP Reconfigurable Many-Cores Architecture
Recore Systems develops semiconductor IP solutions for reconfigurable
multi-core systems-on-chip (SoC) that are low power and low cost. The
presented reconfigurable technology is used in the CRISP project to
create a scalable reconfigurable many-core SoC. The many-core SoC
includes digital signal processing cores and distributed memory
resources:
- The Xentium® is a programmable high-performance DSP
processor core that is efficient and offers high precision.
- The Memtium™ is a reconfigurable memory tile that enables distributed
local random access memory in a multi-core architecture.
The presented reconfigurable SoC consists of programmable fixed-point
digital signal processing cores that are connected by a NoC. The NoC
provides the bandwidth and flexibility that is required for streaming
DSP applications. The communication bandwidth in the NoC scales with
the number of cores. The NoC ensures predictable performance due to
its point-to-point connections, in contrast to the unpredictability of
a shared bus. Moreover, the proposed multi-core architecture
provides a solution for true scalability by extending the NoC across
the chip boundaries.
The concepts and ideas of the CRISP many-core architecture are
presented, and an outlook on future activities is given.
Formal Development vs. Formal Verification - The OpenComRTOS Example
Formal verification is often put forward as the way to guarantee
proven correctness of, e.g., a digital design or written software.
However, this assumes that the architectural design was flawless, and
it does not provide much information on the structural properties of
the design. Using formal methods from the very beginning can, however,
result in much cleaner, more efficient, and hence also easier-to-verify
software. This is illustrated by the formal development of OpenComRTOS,
which resulted in, e.g., a code size reduction by a factor of 5 to 10.
It also showed how marrying formal methods with creative thinking can
help.
CRISP Dependability Approach using Testing
Advanced CMOS technologies allow present-day systems-on-chip (SoCs)
to contain multiple programmable processor cores, and dedicated
peripherals. Besides hardware functionality, they also contain a
growing amount of embedded software. However, with every new
generation of CMOS technology (i.e. 90 nm and beyond), electronic
circuits become more susceptible to manufacturing defects, which, when
left unaddressed, have a significant negative impact on the yield and
reliability of manufactured chips. Furthermore, SoCs are increasingly
used for safety-critical applications.
In this talk, we present the CRISP approach to improving dependability
and yield of deep-submicron chips using new techniques for static and
dynamic detection and localization of faults and (dynamically)
circumventing faulty hardware. For this, we first introduce basic
concepts and taxonomy of dependable systems, and explain important
attributes such as reliability, maintainability, and
availability. Subsequently, we derive the dependability requirements
of our target application in terms of its basic functions, and its
relevant hardware elements. We then present in detail the
Design-for-Dependability measures that have been taken in the
Reconfigurable Fabric Device (RFD) design to improve its system
dependability. Specifically we focus on the functionality and design
of the on-chip Dependability Manager and the dependability wrappers
around the RFD processor tiles. Crucial in our approach is the
interoperability with the on-chip network and the off-chip
run-time mapping software. We will conclude with experimental results
of the validation of this dependability functionality.
An Architecture for Scalable Concurrent Embedded Software
If communication is the bottleneck, why develop the scheduler first? In this
presentation we show how distributed or many-core realtime embedded systems
are dependent on a realtime capable communication subsystem. As this
subsystem is often a shared resource a natural architecture is based on
concurrency and packet based communication. This idea is reflected in the
formally developed OpenComRTOS resulting in unprecedented scalability for
embedded applications.
Optimal Hardware/Software Partitioning on Multicore DSP
To meet growing demand for advanced 3G and 4G services, wireless
infrastructure equipment manufacturers increasingly require devices
that offer exceptional performance and flexibility. Multi-standard
devices are needed to fully support Base Band Physical Layer
requirements for the target markets of WiMAX, WCDMA/HSPA, 3GPP-LTE,
TDD-LTE and TD-SCDMA base stations. To enable these technologies, the
device has to provide a low latency and high throughput communication
solution at an affordable price.
In addition, a balance of high-performance, low-power processing with
sufficient programmability is needed. In this presentation we explore
several architectural alternatives and arrive at an optimal HW/SW
partitioning that meets these requirements. We will demonstrate this
partitioning in Freescale's highly integrated MSC8156/5 DSP device.
The MSC8156/5 is a six-core device based on SC3850 StarCore DSP core
technology; it delivers flexibility, integration, and affordability
while answering demand from wireless base station OEMs for ultra-high
computational performance for baseband applications. The MSC8156/5 is
an ideal choice for applications ranging from multi-standard wireless
base stations to radar systems and other industrial applications. The device
offers a total of 48 GMACs (Giga Multiply and Accumulate) per second
of DSP core performance. The embedded MAPLE-B/B2L baseband accelerator
offers up to 900 Msps of FFT throughput, up to 630 Msps DFT, Turbo
decoding with rate de-matching and HARQ-combining capacity of up to
330 Mbps at eight iterations and up to 200 Mbps of Viterbi decoding
with tail-biting and multi-iterations. It also provides CRC check or
insertion with up to 10 Gbps throughput and turbo encoding with rate
matching capabilities up to 900 Mbps.
Run-Time Mapping for Dependable Stream Processing
In this talk, an example application is used to illustrate how to
maintain a dependable system while offering additional flexibility
compared to conventional real-time systems. The example used is a
beamforming application; beamforming is a signal processing technique
for directional signal transmission or reception, using a fixed array
of antennas that can be steered in software.
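As a rough illustration of software-steered beamforming, the sketch below implements a narrowband delay-and-sum beamformer for a uniform linear array. The array size, element spacing, angles, and function names are illustrative assumptions, not details of the tutorial's demonstrator.

```python
# Minimal delay-and-sum beamformer sketch (hypothetical parameters).
# Steering a uniform linear array toward a chosen angle amounts to
# applying one complex phase weight per antenna, entirely in software.
import math
import cmath

def steering_weights(n_antennas, spacing_wl, angle_deg):
    """Phase weights for a uniform linear array.

    spacing_wl: element spacing in wavelengths (e.g. 0.5).
    angle_deg:  desired look direction, measured from broadside.
    """
    theta = math.radians(angle_deg)
    return [cmath.exp(-2j * math.pi * spacing_wl * k * math.sin(theta))
            for k in range(n_antennas)]

def beamform(snapshots, weights):
    """Combine one complex sample per antenna into one output sample."""
    return sum(w.conjugate() * x for w, x in zip(weights, snapshots))

# A plane wave arriving exactly from the steering direction has the same
# phase profile as the weights, so all antennas combine coherently:
w = steering_weights(4, 0.5, 30.0)
out = beamform(w, w)
print(abs(out))  # ~4.0: four antennas add in phase
```

Steering to a different direction only requires recomputing the weights, which is the flexibility the talk refers to: no hardware change is needed to redirect the beam.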
Safety critical systems often have built-in fault detection
mechanisms. The beamforming application also devotes a percentage of
time to processing a known dataset to verify the correctness of the
system. Tough requirements on the availability of the system demand a
short repair time; therefore, switching hardware components manually
is not an option. If, upon detection of a fault, the faulty
hardware can be localized and isolated, the system may continue its
operation while circumventing the hardware issue.
To integrate this level of dependability management, testing
infrastructure is built into the platform. These hardware
dependability features are used by the run-time resource manager to
locate faulty components in the system. When the system is still
workable, the resource manager allocates a new set of resources for
the beamforming application so that it may continue its operation.
The degree to which such a system may repair itself highly depends on
the flexibility exposed by the applications running on the system. A
static resource assignment, often performed at design-time, limits
this flexibility. This tutorial shows how to model an application to
allow dynamic resource management. We give some insight into the
resource manager used to map an application to the available resources
at run-time; in doing so, it must not interfere with other
applications running on the system, so that their real-time guarantees
are maintained. This talk is illustrated by a software demonstration of the
sketched scenario with the beamforming application.
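The remap-around-a-fault step described above can be sketched roughly as follows. The first-fit policy, task loads, and tile capacities are purely illustrative assumptions, not the actual CRISP resource manager.

```python
# Hypothetical sketch of run-time resource mapping: tasks with load
# requirements are placed on processor tiles, tiles reported faulty by
# the dependability infrastructure are skipped, and the application is
# remapped onto the remaining healthy tiles.

def map_tasks(tasks, tiles, faulty=frozenset()):
    """First-fit decreasing mapping.

    tasks = {task_name: load}, tiles = {tile_name: capacity}.
    Returns {task: tile}, or None if the tasks no longer fit.
    """
    free = {t: c for t, c in tiles.items() if t not in faulty}
    mapping = {}
    for task, load in sorted(tasks.items(), key=lambda kv: -kv[1]):
        tile = next((t for t, c in free.items() if c >= load), None)
        if tile is None:
            return None  # system can no longer host the application
        mapping[task] = tile
        free[tile] -= load
    return mapping

tiles = {"tile0": 100, "tile1": 100, "tile2": 100}
tasks = {"fft": 60, "filter": 50, "combine": 40}

healthy = map_tasks(tasks, tiles)
degraded = map_tasks(tasks, tiles, faulty={"tile1"})  # circumvent fault
print(healthy, degraded)
```

The key point mirrors the talk: as long as enough spare capacity exists, a detected fault only triggers a new resource assignment, and the application keeps running in a degraded but correct configuration.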
Streaming Applications and Multi-Core Architectures
In this presentation we focus on streaming applications and
reconfigurable multi-core architectures. System-on-chip (SoC) design
is gradually moving away from single-processor solutions towards
multiprocessor (MP) SoC architectures. When processing and memory are
combined in these processors (locality of reference), tasks can be
executed efficiently. An important class of applications in which
tasks run in parallel is streaming applications. Due to the available
parallelism in streaming applications, there is a good match with
multi-core architectures. Examples of streaming applications are:
wireless baseband processing (for HiperLAN/2, WiMax, DAB, DRM, and
DVB), multimedia processing (e.g. MPEG, MP3 coding/decoding), medical
image processing, color image processing, sensor processing
(e.g. remote surveillance cameras) and phased array radar systems.
We present an iterative hierarchical approach to map an application
onto a heterogeneous multiprocessor SoC architecture. The streaming
applications are modeled as a set of communicating processes. The
optimization objective is to minimize the energy consumption of the
application, while still providing the required Quality of Service
(e.g. throughput or latency). The approach is flexible and scalable,
and its performance is promising.
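A minimal way to picture "a set of communicating processes" is a pipeline of stages connected by queues, where each stage can be assigned to a different core. The stages and the queue-based scheme below are an illustrative assumption, not the tutorial's actual application model or mapping tool.

```python
# Sketch of a streaming application as communicating processes: each
# stage consumes items from its input queue and produces items on its
# output queue, so stages are independent units a mapper could place
# on separate cores.
import threading
from queue import Queue

SENTINEL = object()  # marks end of the stream

def stage(func, inq, outq):
    """Apply func to every item from inq, forwarding results to outq."""
    while True:
        item = inq.get()
        if item is SENTINEL:
            outq.put(SENTINEL)
            return
        outq.put(func(item))

src, mid, sink = Queue(), Queue(), Queue()
threads = [
    threading.Thread(target=stage, args=(lambda x: x * 2, src, mid)),
    threading.Thread(target=stage, args=(lambda x: x + 1, mid, sink)),
]
for t in threads:
    t.start()

for sample in [1, 2, 3]:  # feed the stream
    src.put(sample)
src.put(SENTINEL)

results = []
while (item := sink.get()) is not SENTINEL:
    results.append(item)
for t in threads:
    t.join()
print(results)  # [3, 5, 7]
```

Because each stage only touches its own queues, the communication structure is explicit, which is exactly what makes such process networks amenable to the kind of energy-aware mapping the abstract describes.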