International Symposium on System-on-Chip

Design Technologies for Heterogeneous Architectures


Jan Rabaey, UC Berkeley

In the future, multimedia terminals and network computers will need to support multi-modal communications and to display flexibility and adaptivity in the peripheral devices to accommodate changing conditions in communication bandwidth and data sources. For instance, different video decompression schemes can be used, graphics resolution and encoding scheme can change depending upon the available bandwidth and the application, a variety of security and encryption techniques can be used based on the type of data transmitted, and the radio-modem protocols depend upon the location (cellular GSM, PCS, micro- or pico-cellular). It is therefore believed necessary to base the design of such a terminal around programmable and reconfigurable components.

The goal of the PLEIADES project is to demonstrate that ultra-low power implementations of programmable components are attainable using a well-established and reusable low-power design methodology. The resulting devices will have an energy efficiency that is orders of magnitude better than presently available solutions (though still not as high as the fully dedicated, application-specific approach). While general computing (including task scheduling, interrupt handling, and overall control functionality) is best implemented on a general-purpose processor, this is presently done at a high energy cost per operation. Multimedia computation, on the other hand, has intrinsic properties that make it more amenable to high-performance, low-energy implementation: it contains a large amount of inherent concurrency and is centered around a few regular kernels of computation that are executed over and over again and can be optimized for energy efficiency. While an application-specific implementation would be the optimal approach to exploit these properties for power reduction, significant power savings approaching the dedicated approach can be obtained by implementing the multimedia kernels on heterogeneous, dedicated co-processors that are optimized for the task. This leads to the concept of domain-specific processors, as advocated in this presentation.

Reconfigurable Digital Communication Systems On A Chip

Ravi Subramanian, MorphICs Inc.

It has happened in every era: when the communication pipes went broadband, there were major discontinuities in the way microelectronics was used to deliver high-performance and cost-competitive signal processing platforms. In this talk, we will take a look at how, as designers of communications systems on silicon, we have to deal with systems driven by three fast-changing parameters: the exploding algorithmic complexity in 2.5G and 3G wireless systems, the semiconductor industry marching to Moore's Law, and the increasing gap between raw MIPS and efficient MIPS in a broadband mobile communications environment.

We begin by examining the gap we call "Shannon vs Moore," and the battles being fought in the world of instruction-set processors, programmable logic, and ASICs. From these battles, we illustrate the dominant design challenges in four regimes of digital signal processing for communications that are beginning to dominate the design complexity equation. At the heart of this complexity lies a critical question: how to deliver efficient application-specific horsepower with a flexible programmer's model. The answer, of course, lies in the definition of the words "efficient" and "flexible." We close by showing that there is a spectrum of techniques in the world of reconfigurability that offer compelling price/performance solutions.

Modeling of Embedded Processors with LISA for Architecture Exploration and System-Level Simulation

Heinrich Meyr and Stefan Pees, RWTH Aachen

Designers of today's telecommunication products such as cellular phones, modems, and networking devices are facing a rapidly growing system complexity. Driven by the advances in semiconductor technology and the need for new applications, the amount of system functionality that is realized on a single chip is increasing enormously. This leads to a shift towards heterogeneous SOC designs, combining ASIC hardware with programmable components (DSPs, microcontrollers) on a single die. (See the companion lecture of Jan Rabaey on "Pleiades".)

To eliminate the overhead introduced by the usage of "general purpose" DSPs in terms of power consumption and die size, companies are increasingly starting to build their own application-specific programmable devices (ASIPs). These architectures need customized tools such as a processor simulator, assembler, HLL compiler, and HW/SW-cosimulation interface.

The machine description language LISA (language for instruction set architecture) for the generation of bit- and cycle-accurate models of processors was developed at ISS (Institute for Integrated Signal Processing Systems). Based on a behavioral operation description, the architectural details and pipeline operations of modern processors can be covered. Beyond the behavioral model, LISA descriptions include other architecture-related information such as the instruction set. The information provided by LISA models enables automatic generation of simulators, assemblers, linkers, and HW/SW-cosimulation interfaces. When designing a new architecture for a given application, critical parts of the design have to be discovered and eliminated. This language-based approach gives the designer the flexibility to build performance models and insert profiling information at any point and on any issue desired in the architecture; for example, estimated power consumption can be attributed to functional units, or the usage of processor resources by certain parts of the application can be monitored. Moreover, the designer can easily react to changes in the architecture by just changing the LISA description.

LISA machine descriptions allow models of programmable architectures to be specified on various abstraction levels, meeting the needs of the respective application domain. This enables the designer to trade simulation speed for accuracy: cycle/phase accuracy for joint simulation of hardware and software components via a cosimulation interface on the one hand, and instruction-level accuracy with a limited visibility of the internal states on the other hand. Furthermore, simulation speed of the generated simulator is essential when designing a new architecture, verifying its implementation, and performing profiling on applications. We therefore apply the compiled simulation technique, which was developed at ISS and which increases the performance of the generated simulators by a factor of 50-150 compared to commercial simulators currently on the market. We will present a cycle-accurate model of the Texas Instruments TMS320C6201 DSP and show benchmarks of the generated tools.
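The general idea behind compiled simulation can be illustrated with a minimal sketch. It is not LISA and not the ISS implementation: the four-instruction toy ISA, its 32-bit encoding, and all function names below are assumptions made up for illustration. The interpretive simulator decodes every instruction word each time it executes, while the compiled variant decodes each word once, ahead of time, into a specialized host-level step.

```python
# Hypothetical toy ISA: opcode (8 bits) | rd (4 bits) | rs (4 bits) | imm (16 bits).
LOADI, ADD, DJNZ, HALT = range(4)


def encode(op, rd=0, rs=0, imm=0):
    """Pack one toy instruction into a 32-bit word."""
    return (op << 24) | (rd << 20) | (rs << 16) | (imm & 0xFFFF)


def decode(word):
    """Unpack a 32-bit word into (opcode, rd, rs, imm)."""
    return (word >> 24) & 0xFF, (word >> 20) & 0xF, (word >> 16) & 0xF, word & 0xFFFF


def run_interpretive(program, num_regs=4):
    """Decode and dispatch every instruction each time it is executed."""
    regs = [0] * num_regs
    pc = 0
    while True:
        op, rd, rs, imm = decode(program[pc])  # repeated on every visit
        if op == LOADI:
            regs[rd] = imm
            pc += 1
        elif op == ADD:
            regs[rd] += regs[rs]
            pc += 1
        elif op == DJNZ:  # decrement rd, jump to imm if rd is not zero
            regs[rd] -= 1
            pc = imm if regs[rd] != 0 else pc + 1
        elif op == HALT:
            return regs
        else:
            raise ValueError(f"unknown opcode {op}")


def translate(word, pc):
    """Decode once and return a specialized step: regs -> next pc (None = halt)."""
    op, rd, rs, imm = decode(word)
    nxt = pc + 1
    if op == LOADI:
        def step(regs):
            regs[rd] = imm
            return nxt
    elif op == ADD:
        def step(regs):
            regs[rd] += regs[rs]
            return nxt
    elif op == DJNZ:
        def step(regs):
            regs[rd] -= 1
            return imm if regs[rd] != 0 else nxt
    elif op == HALT:
        def step(regs):
            return None
    else:
        raise ValueError(f"unknown opcode {op}")
    return step


def run_compiled(program, num_regs=4):
    """Translate the whole program up front, then run only the prepared steps."""
    steps = [translate(word, pc) for pc, word in enumerate(program)]
    regs = [0] * num_regs
    pc = 0
    while pc is not None:
        pc = steps[pc](regs)  # no decoding or opcode dispatch at run time
    return regs


# r0 = 3 added 5 times, using r2 as the loop counter.
PROGRAM = [
    encode(LOADI, rd=0, imm=0),
    encode(LOADI, rd=1, imm=3),
    encode(LOADI, rd=2, imm=5),
    encode(ADD, rd=0, rs=1),
    encode(DJNZ, rd=2, imm=3),
    encode(HALT),
]
```

Both simulators produce the same register state for `PROGRAM`; the difference is only where the decode work happens. The sketch makes no claim about speedup, which depends on the host and on how the translation is performed.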

Cycle and Phase Accurate DSP Modeling and Integration for HW/SW Co-Verification of SoC

Vojin Zivojnovic, Axys Design Automation Inc.

This talk presents practical experience in the modeling and integration of cycle/phase-accurate instruction set architecture (ISA) models of digital signal processors (DSPs) with other hardware and software components. A common approach to the modeling of processors for HW/SW co-verification relies on instruction-accurate ISA models combined (i.e. wrapped) with bus interface models (BIMs) that generate the clock/phase-accurate timing at the component's interface pins. However, for DSPs and new microprocessors with complex architectural features, this approach is, from our perspective, not acceptable. The additional extensive modeling of the pipeline and other architectural details in the BIM would force us to develop two detailed processor models with a complex BIM API between them. AXYS has proposed an alternative approach in which the processor ISAs themselves are modeled in a fully cycle/phase-accurate fashion. The bus interface model is then reduced to just modeling the connection to the pins. Our models have been integrated into a number of cycle-based and event-driven system simulation environments. We present our experience in incorporating these models into a VHDL environment. The accuracy has been verified cycle-by-cycle against the gate/RTL level models. Multi-processor debugging and observability into the precise cycle-accurate processor state is provided. The use of co-verification models in place of the RTL resulted in system speedups of up to 10 times, with the cycle-accurate ISA models themselves reaching performances of up to 123K cycles/sec.

Retargetable Performance Estimation using Parameterized Processor Architecture Model

Naji Ghazal, UC Berkeley

Given the recent wave of innovation and diversification in digital signal processor (DSP) architecture, the need for quickly evaluating the true potential of considered architectural choices for a given application has been rising. We propose a new scheme, called Retargetable Estimation, that involves analysis of a high-level description of a DSP application, with aggressive optimization search, to provide a performance estimate of its optimal implementation on the architectures considered. With this scheme, we present a new parameterized architecture model that allows quick retargeting to a wide range of architectural choices, and that emphasizes capturing an architecture's salient optimizing features.

With the growing number of choices for DSP architecture, it has become increasingly difficult for designers to determine the appropriate architecture for the intended application and design constraints. Unfortunately, even for a single DSP processor, there has been little development support to reach optimal implementation from a behavioral description of an application. Due to the irregularity in the architectures, even the latest DSP compilers are not sufficient, without the use of the designer's in-depth knowledge of the architecture. New retargetable DSP compilers suffer from the same lack of optimizing technology. So it is challenging and time-consuming to explore the true potential of different architectural choices.

Most DSP applications, however, have special characteristics (predictive data access patterns, execution time locality, etc.) that potentially allow for valuable estimation of run-time behavior on a given architecture, from behavioral specification. With such estimation, optimizing uses of a processor's special architectural features, and hence the true potential of architectural choices, can be exposed to the designer without the need for in-depth expertise in programming the processor.
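To make the notion of a parameterized architecture model concrete, here is a minimal sketch. It is not the Retargetable Estimation scheme itself: the parameters (issue slots per resource class, zero-overhead loop support, loop overhead cycles), the names `ArchModel` and `estimate_cycles`, and the two example architectures are all hypothetical. The estimate is a simple resource-bound count: each loop iteration takes at least as many cycles as its most contended resource class requires.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class ArchModel:
    """Hypothetical architecture parameters used only for this illustration."""
    name: str
    units: dict                    # resource class -> operations issued per cycle
    zero_overhead_loops: bool      # hardware loop support
    loop_overhead_cycles: int = 2  # assumed cost per iteration without it


def estimate_cycles(kernel_ops, iterations, arch):
    """Resource-bound cycle estimate for a loop kernel.

    kernel_ops maps a resource class to the operations needed per iteration.
    """
    per_iteration = 0
    for resource, count in kernel_ops.items():
        if resource not in arch.units:
            raise ValueError(f"{arch.name} has no resource class {resource!r}")
        # The most contended resource class bounds the iteration time.
        per_iteration = max(per_iteration, ceil(count / arch.units[resource]))
    if not arch.zero_overhead_loops:
        per_iteration += arch.loop_overhead_cycles
    return per_iteration * iterations


# One FIR tap per iteration: one multiply-accumulate and two operand loads.
FIR_TAP = {"mac": 1, "load": 2}

# Two made-up design points, not real processors.
ARCH_A = ArchModel("A", units={"mac": 1, "load": 1}, zero_overhead_loops=False)
ARCH_B = ArchModel("B", units={"mac": 2, "load": 2}, zero_overhead_loops=True)

cycles_a = estimate_cycles(FIR_TAP, iterations=64, arch=ARCH_A)  # 256
cycles_b = estimate_cycles(FIR_TAP, iterations=64, arch=ARCH_B)  # 64
```

Retargeting in this sketch means only swapping the `ArchModel` instance; the kernel description stays unchanged. A realistic estimator would also have to account for scheduling, data dependencies, and memory layout, which this bound ignores.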

From Application Specifications to System in Silicon

Oz Levia, Improv Systems, Inc.

SOC is a reality. For many design teams this reality is a mixed blessing. While silicon capacity allows and demands integration of an ever larger portion of the overall system onto a single die, the task of designing, verifying, and maintaining such a design is increasingly taxing and complex.

In this discussion we present a new approach to embedded system design for consumer electronics: the Programmable System Architecture (PSA™). Using the PSA, system applications are designed using VirtualIP™ objects and a framework for composing a complete application from a collection and hierarchy of VirtualIP objects. Once the system application is specified, it can be verified and mapped, using an advanced compiler, onto a system chip platform. The PSA is a pre-made architecture that can be configured before manufacturing and programmed after manufacturing. This approach allows system design to focus on the system specifications without concern for HW/SW co-design and partitioning. System applications are written in Java and can be verified using a Java native environment. Once the application is mapped to the PSA, the result is an application-specific IC. The PSA approach has the advantages of very rapid time to market, high performance, and low design cost and risk.

The focus of this discussion is to illustrate a system application designer's point of view and how the PSA is used in application development for embedded consumer ICs.

Transport Triggered Architectures (TTA)

Henk Corporaal, Delft Technical University

Embedded systems surround us everywhere. Processors for these systems often have specific requirements such as low cost, high performance, and low power consumption. Off-the-shelf processors cannot always fulfil these requirements simultaneously. A templated processor architecture, which can be tuned for a certain application (domain), offers a solution. This presentation highlights such a templated architecture, called transport triggered architecture (TTA). This architecture combines extreme flexibility, modularity, and scalability with simplicity (which makes it easy to generate automatically), while offering a good cost-performance ratio.

TTAs resemble VLIW (very long instruction word) architectures, but their programming model is completely different: instead of specifying the operations, the transports between function units and to the register files are programmed explicitly. The operations occur as a side effect of these transports.
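The transport-triggered programming model can be sketched with a small, sequential toy simulator. This is an illustration only, not the MOVE framework: the port names (`o` for operand, `t` for trigger, `r` for result), the `FunctionUnit` and `Machine` classes, and the move syntax are assumptions chosen for this example. A program is a list of moves; writing a value to a unit's trigger port is what causes the operation to execute.

```python
class FunctionUnit:
    """Toy function unit with an operand port, a trigger port, and a result port."""

    def __init__(self, op):
        self.op = op        # two-argument function implemented by this unit
        self.operand = 0
        self.result = 0

    def write(self, port, value):
        if port == "o":
            self.operand = value  # plain storage, nothing executes
        elif port == "t":
            # Writing the trigger port fires the operation as a side effect.
            self.result = self.op(self.operand, value)
        else:
            raise ValueError(f"cannot write port {port!r}")


class Machine:
    """Executes a list of (source, destination) moves one after another."""

    def __init__(self, units, num_regs=4):
        self.units = units
        self.regs = {f"r{i}": 0 for i in range(num_regs)}

    def read(self, src):
        if isinstance(src, int):   # immediate value
            return src
        if src in self.regs:       # register file
            return self.regs[src]
        unit, port = src.split(".")
        if port != "r":
            raise ValueError(f"cannot read port {port!r}")
        return self.units[unit].result

    def move(self, src, dst):
        value = self.read(src)
        if dst in self.regs:
            self.regs[dst] = value
        else:
            unit, port = dst.split(".")
            self.units[unit].write(port, value)

    def run(self, program):
        for src, dst in program:
            self.move(src, dst)


def demo():
    machine = Machine({
        "add": FunctionUnit(lambda a, b: a + b),
        "mul": FunctionUnit(lambda a, b: a * b),
    })
    # Computes (3 + 4) * 5 using moves only; no opcode appears in the program.
    program = [
        (3, "add.o"),
        (4, "add.t"),         # triggers the addition
        ("add.r", "mul.o"),   # result goes straight to the multiplier, not via a register
        (5, "mul.t"),         # triggers the multiplication
        ("mul.r", "r0"),
    ]
    machine.run(program)
    return machine.regs["r0"]
```

Calling `demo()` returns 35. A real TTA issues several moves in parallel per instruction over multiple buses; this sketch serializes them to keep the triggering mechanism visible.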

Giving the compiler control over all the internal data transports opens up a number of new types of (transport-level) optimizations. These optimizations are exploited by our compiler. They result in a register traffic reduction of at least 50%. Therefore, the number of register ports can be reduced substantially. Furthermore, the connectivity between function units (needed for bypassing results) can be reduced by up to 80%, depending on the type of application.

TTAs have been used in several commercial applications; some of them were automatically generated using our MOVE framework set of tools. The presentation will introduce the TTA concept, explain how we perform architecture exploration using our MOVE framework, and highlight some of the TTA advantages with experimental results. Furthermore we pay attention to recent developments within our MOVE project.