Enabling Technologies for Reconfigurable Computing part 1: Reconfigurable Computing (RC)

- Exploding design costs and shrinking product life cycles of ASICs create demand for reconfigurable arrays (RAs) to extend product longevity.
- Performance is only one part of the story. The time has come to fully exploit their flexibility, supporting turn-around times of minutes instead of months for real-time in-system debugging, profiling, verification, tuning, field maintenance, and field upgrades.
- A new “soft machine” paradigm and language framework is available, enabling novel compilation techniques that cope with new market structures in which synthesis moves from the vendor to the customer.

Reconfigurable Computing (RC)

- The blank sheet of paper: FPGA
- Auto design of a basic system: Tensilica
- Standardized, committee designed components*, cells, and custom IP
- Standard components including more application specific processors **, IP add-ons and custom
- One chip does it all: SMOP ***

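<table>
<thead>
<tr>
<th>product</th>
<th>strategy</th>
<th>vendor</th>
</tr>
</thead>
<tbody>
<tr>
<td>FPGA</td>
<td>"sea of uncommitted gate arrays"</td>
<td>Xilinx, Altera</td>
</tr>
<tr>
<td></td>
<td>compile a system-unique processor for every application</td>
<td>Tensilica</td>
</tr>
<tr>
<td>systolic array</td>
<td>many pipelined or parallel processors + custom</td>
<td></td>
</tr>
<tr>
<td>DSP, VLIW</td>
<td>special-purpose processor cores + custom</td>
<td>TI</td>
</tr>
<tr>
<td>processor + RAM, ASICs</td>
<td>general-purpose cores, specialized by I/O, etc.</td>
<td>IBM, Intel, ...</td>
</tr>
<tr>
<td>universal micro</td>
<td>multiprocessor array, programmable I/O</td>
<td>Cradle</td>
</tr>
</tbody>
</table>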

A Decade of Research in Reconfigurable Computing

- Thanks to numerous research projects throughout the 1990s, the breakthrough in commercialization has started, and a fairly comprehensive methodology is already available.
- Dear colleague, the RC scene welcomes your contributions to improve it and to push for its inclusion in contemporary CS&E curricula.
- One goal of this talk is to stimulate you with highlights and by introducing some key issues.

Reconfigurable Computing Architectures and Methodologies for System-on-Chip;

Reiner Hartenstein, Monday, November 19, 10:15 - 11:00 hrs.
Reiner Hartenstein, University of Kaiserslautern, Germany
http://hartenstein.de

Enabling Technologies for System-on-Chip Development,
November 19-20, 2001, Tampere, Finland
http://www.cs.tut.fi/soc/

No longer a strange niche area

• was "Hardware" design for a strange platform
  - CAD, but no Compilation
• Emerging awareness:
  - New mind set
  - New curricular embedding
• Coming dichotomy of CS
  - SW <-> CW
  - HW <-> FW
  - computing in time <-> computing in space

flexibility/versatility trade-off

[Figure: trade-off between flexibility and efficiency; FPGA, KressArray Xplorer, dedicated (specific optimization, specific efficiency)]

RAs are heading for the mainstream
... become indispensable for SoC products?

ASPP (application-specific programmable product):
  • application-specific standard product, plus
  • embedded programmable logic

CSoC (configurable SoC):
  • an industry-standard microprocessor
  • embedded reconfigurable array
  • memory, dedicated system bus

SoPC: System on a Programmable Chip

Reconfigurable Logic going Mainstream

• Fine grain: FPGAs killing the ASIC market
• Fastest growing segment of semiconductor market
• Substantially improved design flow and libraries
• Coarse grain: several startups
• Comprehensive Methodology
• Please lobby for new curricula.
• One of the goals of this talk: to motivate you with key issues and visionary highlights.

Designer-oriented Innovation stalled?

• EDA industry: about $7 billion
• leverages a semiconductor industry of > $200 billion
• FPGAs ($7 billion): fastest-growing segment
• EDA industry constantly redefining itself
  "except logic synthesis, no really significant innovation in the past decade"
• CAD developers can’t deliver their ideas effectively
• CAD developers personally don’t appreciate the real problems facing designers

EDA: the main bottleneck

“Simulator-of-the-Year” Phenomenon

[Figure: digital simulators in use, early 1990s]

"...Adoption of VHDL was one of the biggest mistakes in the history of design automation, causing users and EDA vendors to waste hundreds of millions of dollars..." — Joe Costello, Cadence Design Systems, 1995
The Impact of Makimoto’s Paradigm Shifts

Paradigm Shift

- History
- Paradigm Shift
- Coarse Grain: why?
- Coarse Grain Architectures
- Reconfiguration Architecture

Sequential vs. structural RAM

Changing Models of Computing

The Microprocessor is a Methuselah

9 technology generations...

- 1st: 4004
- 2nd: 8008
- 3rd: 8086
- 4th: 80286
- 5th: 80386
- 6th: 80486
- 7th: Pentium (P5)
- 8th: Pentium Pro / Pentium II
- 9th: Pentium III

Decline of Wintel Business Model

[Figure: billion subscribers worldwide; million devices delivered in the U.S.]

Basics of Binding Time

- time of “instruction fetch”
- run time
- loading time
- compile time

Reconfigurable Computing

Binding Time vs. Computing Domain

- time domain
  - procedural
  - at run time
  - at compile time

- space domain
  - structural
  - before fabrication
  - after fabrication

- hybrid
  - time & space
  - at run time
  - at compile time
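
To make the time-domain/space-domain distinction concrete, here is a toy Python model built on assumptions of our own. `run_procedural` plays the role of a von Neumann processor: it fetches and decodes an instruction at run time for every data item. `configure_pipeline` binds the same operator chain once, before any data arrives (roughly, at loading time), and `stream` then only pushes data through the pre-bound stages. All names and the example program are hypothetical illustrations, not part of any RC tool.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical operator set shared by both models: (value, immediate) -> value.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda x, k: x + k,
    "sub": lambda x, k: x - k,
    "mul": lambda x, k: x * k,
}

Program = List[Tuple[str, int]]  # sequence of (opcode, immediate operand)


def run_procedural(program: Program, data: Iterable[int]) -> Tuple[List[int], int]:
    """Computing in time: each instruction is fetched and decoded at run time,
    once per data item. Returns the results and the number of fetches."""
    results: List[int] = []
    fetches = 0
    for x in data:
        pc = 0                              # program counter
        while pc < len(program):
            opcode, k = program[pc]         # instruction fetch
            fetches += 1
            x = OPS[opcode](x, k)           # decode (dict lookup) and execute
            pc += 1
        results.append(x)
    return results, fetches


def configure_pipeline(program: Program) -> List[Callable[[int], int]]:
    """Computing in space: opcodes are resolved once, before data arrives,
    yielding a fixed chain of stages (default arguments freeze f and k)."""
    return [(lambda x, f=OPS[op], k=k: f(x, k)) for op, k in program]


def stream(stages: List[Callable[[int], int]], data: Iterable[int]) -> List[int]:
    """Push every data item through the pre-bound stages; nothing is fetched."""
    out: List[int] = []
    for x in data:
        for stage in stages:
            x = stage(x)
        out.append(x)
    return out
```

For `program = [("mul", 3), ("add", 1), ("sub", 2)]` both models compute 3x - 1; on four input values the procedural version performs 12 instruction fetches, while the configured pipeline performs none once binding is done.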

Dataquest Predicts Programmability to be Predominant in SOC

- Application-specific programmable products (ASPPs) will be the next best thing in semiconductor technology
- With programmability as a standard feature, ASPPs will be predominant system-on-a-chip products in five years
  - Jordan Selburn, principal analyst, ASICs and system-level integration, Dataquest Inc.'s Semiconductors Group

Applications

- next-generation wireless
- network processors
- many other areas

Applications (2)

- Image Processing:
  - for smart car (collision avoidance, others ...)
  - Smart traffic pilots, robotics, fast material inspection,
  - smart stub finders, motion detection (MPEG-4, ...)
- Signal Processing, Speech Processing, Software Radio,
- Correlation, Encryption, Comm. Switching / Protocols,
- Innovative consumer electronics:
  - super smart cards, smart handies (mobile phones), wearables,
  - portable, set-top, laptop, desktop, embedded, ...
- many others, ...

Applications

- new cellular standard with up to 2 Mbit/s: the new CDMA standard needs > 500 MIPS just for the RF receiver part
- wide variety of end-user devices: smart handies, Palm Pilots, laptops, games, camcorder-like devices, the internet car, and many new types of devices to come ...
- increasingly wide variety of services available from network providers: download just what a particular customer has subscribed to
- expert group (Vissers): > 20% of it will be accelerator code*
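
The first bullet implies a tight per-bit processing budget. A back-of-the-envelope check, assuming (our assumption; the slide does not say so explicitly) that the > 500 MIPS receiver load applies at the full 2 Mbit/s rate; the function name is ours:

```python
def instructions_per_bit(mips: float, mbit_per_s: float) -> float:
    """Instructions available per received bit. Both rates are in units of
    10**6 per second, so the scale factors cancel."""
    return mips / mbit_per_s


# Figures from the slide: > 500 MIPS for the RF receiver part, up to 2 Mbit/s.
budget = instructions_per_bit(500, 2)
print(f"> {budget:.0f} instructions per received bit")  # prints "> 250 instructions per received bit"
```

Under that assumption the receiver alone needs more than 250 instructions for every bit it delivers, before any application processing.
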
Why coarse grain?

Shannon's Law

It's a Paradigm Shift!

It's a General Paradigm Shift!

Fine-grained vs. coarse-grained

---

**Reconfigurability Overhead**

- Area used by application
- Partly for configuration code storage
- Resources needed for reconfigurability
- "Hidden RAM" not shown

**Principle of a Typical FPGA**

---

**Routing Overhead in FPGAs**

Routing Congestion (DeHon): often 50% or less of CLBs used

- All transistors at each switching point
- ~1000 transistors at each switch point

---

**Why Coarse Grain instead of FPGA?**

- Physical vs. logical
- Transistors per area
- FPGA vs. HICP
- Reduced reconfigurability overhead by up to ~100x
- Much faster loading
- Many more benefits

---

**Configurable Computing Systems**

- Combine programmable sequential processor with Flexware (structurally programmable “hard”ware)
- Capitalize on the strengths of both flexware and software.
- Early 1960s: Estrin (UCLA); the enabling technology was not yet available
- 1990s: significant increase of research activities (DARPA...)
- FPGAs: Not the enabling technology: hardware skills needed
- Verilog or VHDL based systems often result in poor performance

---

**Extremely high efficiency**

1. Avoiding address computation overhead
2. Avoiding instruction fetch and interpretation overhead
3. High parallelism, massively multiple deep pipelines
4. Much less configuration memory
5. No routing areas to configure functions from CLBs

---

Platforms available

- Soft Data Path Arrays
  - KressArray
  - Xtreme (PACT)
  - ACM (Quicksilver Tech)
  - CHESS Array (Elixent)
  - others

- Compilation techniques feasibility studies:
  - Partitioning Co-Compiler
  - Design Space Explorer
  - others

Also as an autonomous Machine

- New Machine Paradigm (Xputer)
- is the counterpart of the so-called von Neumann paradigm
  - CONS: confuses customers (paradigm switch: the brain hurts)
  - PROS: strong guidance for EDA tool development
  - more effective hardware/software APIs
  - compilation techniques similar to traditional compilation
  - better Application Development Tools accepting C or Java
- easy to teach: simple machine principles
  - scan patterns (data counter) similar to control flow (program counter)
  - general model of hardware/software co-design
  - fascination for freak effect: opening up a new R&D discipline
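
The analogy between a data counter and a program counter can be illustrated with a small generator. In this sketch (our own simplification, not the Xputer's actual address generator), a hypothetical `scan_pattern` steps a data counter through a rectangular window of a 2-D memory in row-major order; `origin`, `size` and `stride` are assumed parameters. A second hypothetical function applies one fixed operator at each scan position, so all sequencing comes from the data counter rather than from an instruction stream.

```python
from typing import Iterator, List, Tuple

Address = Tuple[int, int]


def scan_pattern(origin: Address, size: Tuple[int, int],
                 stride: Tuple[int, int] = (1, 1)) -> Iterator[Address]:
    """Data counter: yield (row, col) addresses of a rectangular window in
    row-major order, much as a program counter yields instruction addresses."""
    r0, c0 = origin
    rows, cols = size
    dr, dc = stride
    for i in range(rows):
        for j in range(cols):
            yield (r0 + i * dr, c0 + j * dc)


def apply_along_scan(memory: List[List[int]], addresses: Iterator[Address]) -> List[int]:
    """At each scanned address, apply a fixed operator (here a hypothetical
    3-tap horizontal sum) to the word and its left and right neighbours."""
    return [memory[r][c - 1] + memory[r][c] + memory[r][c + 1] for r, c in addresses]
```

For example, `apply_along_scan([[1, 2, 3, 4], [5, 6, 7, 8]], scan_pattern((0, 1), (2, 2)))` visits (0, 1), (0, 2), (1, 1), (1, 2) and returns [6, 9, 18, 21]. Changing the scan pattern changes the computation's data order without touching the operator.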

Coarse Grain Architectures

- History
- Paradigm Shift
- Coarse Grain: why?
- **Coarse Grain Architectures**
- Reconfiguration Architecture

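<table>
<thead>
<tr>
<th>Company</th>
<th>Architecture</th>
<th>Business Model</th>
<th>Markets</th>
</tr>
</thead>
<tbody>
<tr>
<td>Adaptive Silicon</td>
<td>Not disclosed</td>
<td>Sell Cores</td>
<td>Embedded DSP</td>
</tr>
<tr>
<td>Chameleon Systems</td>
<td>32 bit datapath array</td>
<td>Sell Chips</td>
<td>Networking</td>
</tr>
<tr>
<td>Malleable</td>
<td>Not disclosed</td>
<td>Sell Chips</td>
<td>Voice over IP</td>
</tr>
<tr>
<td>Morphics</td>
<td>Not disclosed</td>
<td>Sell Cores</td>
<td>Wireless Commun.</td>
</tr>
<tr>
<td>Silicon Spice</td>
<td>Not disclosed</td>
<td>Sell Solutions</td>
<td>Networking</td>
</tr>
<tr>
<td>Syntel</td>
<td>8-bit serial systolic array</td>
<td>Sell Cores</td>
<td>Signal Conditioning</td>
</tr>
<tr>
<td>TRISERAD</td>
<td>System on Chip</td>
<td>Sell Chips</td>
<td>Embedded Systems</td>
</tr>
</tbody>
</table>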

Some Players in Silicon Valley and ….

- ...11 Players
- Clock: Xilinx's largest Customer

Commercial rDPAs

- Xtreme Processor Platform (XPP): a family of IP cores built from high-speed, data-stream-capable, scalable, reconfigurable clusters of arrays of 32-bit DPUs with embedded memories and high-speed I/O ports
  - Application development support software featuring a flow-graph-style algorithm mapping language, to minimize training requirements.
  - XPP's fabrics feature automatic dataflow synchronization and a flagged event network to dynamically configure the execution flow.
  - Supports dynamic RTR (run-time reconfiguration): hierarchical configuration managers free the designer from chip-level details and ensure that configurations are loaded independently and in exactly the intended order.
  - Automatic event-based task swapping along with data streams: released resources are reconfigured immediately

PACT Corp
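
Automatic dataflow synchronization, as listed for XPP above, means that an operator fires only when all its input operands are present, so no central controller has to sequence the array. The sketch below models that firing rule with token queues. It is a generic dataflow illustration under our own assumptions, not PACT's actual XPP protocol; the `Node` class and `run` function are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class Node:
    """Hypothetical dataflow operator with one FIFO of tokens per input port."""

    def __init__(self, name: str, fn: Callable[..., int], n_inputs: int) -> None:
        self.name = name
        self.fn = fn
        self.inputs: List[Deque[int]] = [deque() for _ in range(n_inputs)]
        self.successors: List[Tuple["Node", int]] = []  # (target node, target port)

    def ready(self) -> bool:
        # Firing rule: the node may fire only if every input port holds a token.
        return all(len(q) > 0 for q in self.inputs)

    def fire(self) -> int:
        # Consume one token per port, compute, and forward the result downstream.
        result = self.fn(*(q.popleft() for q in self.inputs))
        for target, port in self.successors:
            target.inputs[port].append(result)
        return result


def run(nodes: List[Node]) -> Dict[str, List[int]]:
    """Fire ready nodes until none can fire; no global schedule is used."""
    produced: Dict[str, List[int]] = {n.name: [] for n in nodes}
    progress = True
    while progress:
        progress = False
        for n in nodes:
            while n.ready():
                produced[n.name].append(n.fire())
                progress = True
    return produced
```

For example, wiring `add = Node("add", lambda x, y: x + y, 2)` into port 0 of `mul = Node("mul", lambda x, y: x * y, 2)`, feeding `add` the streams [1, 2, 3] and [10, 20, 30] and giving `mul` only two factor tokens [2, 2], yields `{"add": [11, 22, 33], "mul": [22, 44]}`: the third sum simply waits in `mul`'s queue until a matching token arrives.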

**Super Pipe Networks**

The key is *mapping*, rather than architecture.

<table>
<thead>
<tr>
<th>array</th>
<th>applications</th>
<th>pipeline properties</th>
<th>mapping</th>
<th>scheduling (data stream formation)</th>
</tr>
</thead>
<tbody>
<tr>
<td>systolic array</td>
<td>regular data dependencies only</td>
<td>linear only, uniform only</td>
<td>linear projection or algebraic synthesis</td>
<td></td>
</tr>
<tr>
<td>super-systolic RA</td>
<td>no restrictions</td>
<td></td>
<td>simulated annealing or P&amp;R algorithm</td>
<td>(e.g., force-directed) scheduling algorithm</td>
</tr>
</tbody>
</table>

© 2001, reiner@hartenstein.de
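
For the systolic-array row, “linear projection” assigns each point of a regular index space to a processor and a time step by linear functions. The sketch uses one common textbook space-time mapping for matrix-vector multiplication y = A·x, chosen by us for illustration: processor p = i and time t = i + j. It checks that no processor receives two index points in the same step and that each x value moves to the neighbouring processor once per clock, which is what makes the array systolic. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]      # index point (i, j) of y[i] += A[i][j] * x[j]
Slot = Tuple[int, int]       # (processor, time step)


def project(n_rows: int, n_cols: int) -> Dict[Point, Slot]:
    """Linear space-time mapping: processor p = i, time t = i + j."""
    return {(i, j): (i, i + j) for i in range(n_rows) for j in range(n_cols)}


def is_conflict_free(mapping: Dict[Point, Slot]) -> bool:
    """No processor may be assigned two index points in the same time step."""
    slots = list(mapping.values())
    return len(slots) == len(set(slots))


def x_flows_systolically(mapping: Dict[Point, Slot]) -> bool:
    """x[j] used by processor p at time t must next be needed by p + 1 at
    t + 1, so each x value moves one neighbour per clock."""
    for (i, j), (p, t) in mapping.items():
        nxt = mapping.get((i + 1, j))
        if nxt is not None and nxt != (p + 1, t + 1):
            return False
    return True


def simulate(A: List[List[int]], x: List[int]) -> List[int]:
    """Execute the mapped index points in time order; processor p owns y[p]."""
    mapping = project(len(A), len(x))
    y = [0] * len(A)
    for (i, j), (p, _t) in sorted(mapping.items(), key=lambda kv: kv[1][1]):
        y[p] += A[i][j] * x[j]
    return y
```

Such a closed-form projection only exists because the data dependencies are regular; the super-systolic row instead relies on search-based placement and routing.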

---

**Communication Resource Requirements**

... often Functional Resources are not the Throughput Bottleneck

In some application areas, such as wireless communication, reconfigurable computing arrays need extraordinarily rich and powerful communication resources.

The Solution: Generators for Domain-specific RA Platforms

---

**SNN filter KressArray Mapping Example**

http://kressarray.de

array size: 10 x 16 = 160 rDPUs
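
Assuming “SNN” here denotes the symmetric nearest neighbour smoothing filter (the slide does not expand the acronym), the per-pixel computation that the mapped rDPUs evaluate in parallel can be sketched sequentially. For each of the four symmetric pixel pairs in a 3×3 window, the pair member closer in value to the centre pixel is kept, and the kept values are averaged. Variants differ in whether the centre joins the average and how borders are treated; excluding the centre and copying border pixels unchanged are our choices, and `snn_filter` is a hypothetical name.

```python
from typing import List

# The four symmetric pixel pairs of a 3x3 window, as (row, col) offsets.
PAIRS = [((-1, 0), (1, 0)), ((0, -1), (0, 1)),
         ((-1, -1), (1, 1)), ((-1, 1), (1, -1))]


def snn_filter(img: List[List[int]]) -> List[List[float]]:
    """Symmetric nearest neighbour smoothing of interior pixels; borders copied."""
    h, w = len(img), len(img[0])
    out = [[float(v) for v in row] for row in img]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            centre = img[r][c]
            chosen = []
            for (dr1, dc1), (dr2, dc2) in PAIRS:
                a = img[r + dr1][c + dc1]
                b = img[r + dr2][c + dc2]
                # keep whichever pair member is closer in value to the centre
                chosen.append(a if abs(a - centre) <= abs(b - centre) else b)
            out[r][c] = sum(chosen) / len(chosen)
    return out
```

Because each pair contributes only its member nearer the centre value, an outlier such as an edge pixel tends not to be averaged in, which is why this filter smooths while largely preserving edges.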

---

**Xplorer Plot: SNN Filter Example**

http://kressarray.de

---

**KressArray: try out yourself!**

- You may experiment yourself
- You may use it over the internet
- Map an application onto a KressArray
- Start with a simple example
- Visit http://kressarray.de
- Click the link to Xplorer
- ... does not run on Internet Explorer ...
- ... since Bill Gates does not like Java 😊

---

Michael Herz

Dissertation
Michael Herz: *Agilent, Sindelfingen*
- ... on mapping parallel memory architectures for stream-based arrays onto KressArrays
- ... also transformation of storage schemes to optimize memory bandwidth
- (MoM scan pattern transformations)

Ulrich Nageldinger

Dissertation
Ulrich Nageldinger: *infineon technologies, Munich*
- ... on mapping applications onto KressArrays
- ... simultaneous routing and placement by simulated annealing
- Supporting a huge family of KressArrays
- fuzzy logic improvement proposal generator
- profiling
- design space exploration
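
Nageldinger's mapper places and routes simultaneously by simulated annealing. The sketch below covers only the placement half of that idea, under simplifying assumptions of our own: operators go onto a hypothetical rows × cols grid of rDPUs, the cost is the total Manhattan distance between connected operators (a crude stand-in for routing cost), and a move swaps the contents of two grid cells. It is not the KressArray Xplorer's actual algorithm or cost function; all names and default parameters are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
Edge = Tuple[str, str]


def wirelength(pos: Dict[str, Cell], edges: List[Edge]) -> int:
    """Cost: total Manhattan distance between connected operators."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1]) for a, b in edges)


def _swap(grid: Dict[Cell, str], pos: Dict[str, Cell], c1: Cell, c2: Cell) -> None:
    """Exchange the contents of two cells (either may be empty); self-inverse."""
    o1, o2 = grid.pop(c1, None), grid.pop(c2, None)
    if o1 is not None:
        grid[c2], pos[o1] = o1, c2
    if o2 is not None:
        grid[c1], pos[o2] = o2, c1


def anneal_placement(ops: List[str], edges: List[Edge], rows: int, cols: int,
                     t0: float = 5.0, alpha: float = 0.95, moves_per_t: int = 100,
                     t_min: float = 0.01, seed: int = 0) -> Dict[str, Cell]:
    """Place operators on a rows x cols array by simulated annealing."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    if len(ops) > len(cells) or len(cells) < 2:
        raise ValueError("array too small for this operator graph")
    start = cells[:]
    rng.shuffle(start)
    grid: Dict[Cell, str] = dict(zip(start, ops))        # random initial placement
    pos: Dict[str, Cell] = {op: cell for cell, op in grid.items()}
    cost = wirelength(pos, edges)
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            c1, c2 = rng.sample(cells, 2)
            if c1 not in grid and c2 not in grid:
                continue                                   # two empty cells: no-op
            _swap(grid, pos, c1, c2)
            delta = wirelength(pos, edges) - cost
            # Metropolis criterion: always accept improvements; accept a
            # worsening move with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost += delta
            else:
                _swap(grid, pos, c1, c2)                    # undo the move
        t *= alpha                                         # geometric cooling
    return pos
```

Accepting some worsening moves while the temperature is high lets the search escape poor local arrangements; as `t` falls, the placement settles. A real mapper would also route each connection and fold routing resources into the cost.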

Rainer Kress

Dissertation
Rainer Kress: *infineon technologies, Munich*
- ... on mapping applications onto his* KressArray
- DPSS datapath synthesis system
- Including a data scheduler
- (data stream scheduler)
- Generalization of the Systolic Array
- (KressArray is a super systolic array)
- 32 bit design via Eurochip support

Jürgen Becker

Dissertation
Jürgen Becker: *Professor at Univ. Karlsruhe*
- ... Automatically partitioning Co-*compiler*
- (configware / software co-*compilation*)
- Resource-parameter-driven, retargetable
- Profiler-driven optimization
- Accepts the HLL “ALE-X” (extended C subset)
- (subset: pointers not supported)

Karin Schmidt

Dissertation
Karin Schmidt: *DaimlerChrysler Research*
- Compilation Techniques for Xputers
- modified loop transformations
- Parts of the implementation were modified and reused for Jürgen Becker's Ph.D. thesis

• The CS2112 is the industry’s first Reconfigurable Communications Processor (RCP), a streaming data processor.
• It combines a RISC processor with an array of 108 arithmetic processing units; each of these 32-bit processing cores runs at 125 MHz.
• The vendor claims a performance of 20 billion 16-bit operations per second and 2.4 billion 16-bit multiply-accumulates per second, plus 1.6 GBytes/sec for its programmable I/O (PIO) banks.
• It also has a PCI interface.
• Tool suite G-SIDE for developing, verifying and optimizing.
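
The quoted figures can be normalised to the stated 125 MHz clock as a quick consistency check. The snippet only rearranges the slide's own numbers; the variable names are ours, and any reading of the resulting ratios (for example, how many 16-bit operations a 32-bit unit completes per cycle) is an assumption the slide does not settle.

```python
CLOCK_HZ = 125e6      # stated core clock
UNITS = 108           # stated number of 32-bit arithmetic processing units
OPS_PER_S = 20e9      # claimed 16-bit operations per second
MACS_PER_S = 2.4e9    # claimed 16-bit multiply-accumulates per second

ops_per_cycle = OPS_PER_S / CLOCK_HZ          # whole array, per clock
ops_per_unit_cycle = ops_per_cycle / UNITS    # per processing unit, per clock
macs_per_cycle = MACS_PER_S / CLOCK_HZ        # whole array, per clock

print(f"{ops_per_cycle:.0f} ops/cycle, {ops_per_unit_cycle:.2f} ops/unit/cycle, "
      f"{macs_per_cycle:.1f} MACs/cycle")
```

About 160 operations per clock spread over 108 units gives roughly 1.5 operations per unit per cycle, so at least some units must complete more than one 16-bit operation per clock for the claim to hold; the slide does not say how.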

<table>
<thead>
<tr>
<th>Project</th>
<th>Source</th>
<th>Granularity (bits)</th>
</tr>
</thead>
<tbody>
<tr>
<td>KressArray</td>
<td>U. Kaiserslautern</td>
<td>Variable</td>
</tr>
<tr>
<td>Garp</td>
<td>UC Berkeley</td>
<td>16 &amp; 32</td>
</tr>
<tr>
<td>CHESS</td>
<td>Hewlett Packard</td>
<td>16 &amp; analog</td>
</tr>
<tr>
<td>Rain</td>
<td>M.I.T.</td>
<td>8</td>
</tr>
<tr>
<td>Copy</td>
<td>Virginia Tech</td>
<td>16 &amp; 16</td>
</tr>
<tr>
<td>DREAM</td>
<td>UC Irvine</td>
<td>16 &amp; 8</td>
</tr>
<tr>
<td>REMARC</td>
<td>Biokar</td>
<td>16</td>
</tr>
<tr>
<td>Research</td>
<td>Sikun Space</td>
<td>16 &amp; analog</td>
</tr>
<tr>
<td>CALSTO</td>
<td>Brunels</td>
<td>16 &amp; 16</td>
</tr>
<tr>
<td>RECA</td>
<td>Multiple</td>
<td>16 &amp; 16</td>
</tr>
<tr>
<td>CHESS,16 &amp; 32</td>
<td>Salomon</td>
<td>16 &amp; 32</td>
</tr>
<tr>
<td>GPP</td>
<td>Tandem</td>
<td>8</td>
</tr>
<tr>
<td>PADDI</td>
<td>UC Berkeley</td>
<td>16</td>
</tr>
<tr>
<td>PADDI-2</td>
<td>UC Berkeley</td>
<td>16</td>
</tr>
</tbody>
</table>

Primarily Mesh-based ....

Crossbar-based Architectures

1990: UC Berkeley (Jan Rabaey)
1993: PADDI-II (Jan Rabaey)
1997: Pleiades (mesh & crossbar)
32 bit
