Torrent Vector Microprocessor Update


From the ICSI newsletter, Fall 1995, Volume 8, Number 2

The Realization Group's CNS project is entering a new and exciting phase as complete, fully functioning SPERT-II systems are deployed. SPERT-II is an SBus-compatible workstation accelerator built around the Torrent-0 (T0) vector microprocessor developed in a joint project between the Realization Group at ICSI and U.C. Berkeley. Researchers at ICSI and U.C. Berkeley immediately put boards to use training neural networks for speech recognition, and several other projects are investigating the board's capabilities.

[T0 Wafer]
One of the first wafers of T0 chips.

The Spring 1995 issue of the ICSI Newsletter featured an article written just after the Torrent team had completed successful testing of three wafers of T0 chips. The next stage was to saw these wafers into individual die, then mount the die onto SPERT-II boards for further testing.

SPERT-II takes an unusual approach to packaging the T0 processor: there is no package! Instead, the processor die is attached to the printed circuit board (PCB) using chip-on-board (COB) technology. The bare silicon is glued directly to the PCB, and wire bonds are made between circuit pads on the die and copper traces on the PCB. While COB has traditionally been used for mounting inexpensive semiconductors, such as in wristwatches and electronic greeting cards, ICSI engineer James Beck saw the potential for providing a high-performance chip interconnect at a very attractive price. "COB gives us low inductance connections, which is particularly important for a chip with a 128-bit data bus," said Beck. The mounted die is coated with a silicone gel to provide protection from atmospheric contaminants and covered with a metal lid to protect the fragile bond wires and shield the circuitry from ambient light.

[T0 COB mounting site]
T0 die mounted on SPERT-II circuit board using Chip-on-Board technology.

The use of COB for a die that is large and power-hungry with many off-chip connections stretches the limits of current PCB fabrication technology. One problem encountered was the tendency of the glue holding the PCB layers together to seep out over the T0 bond area during board fabrication. Another was the warping of fabricated boards during the repeated temperature cycling needed for the assembly process. It took the PCB manufacturers three attempts to successfully fabricate the first run of SPERT-II boards.

On May 1, 1995, eleven SPERT-II boards mounted with T0 die were returned to ICSI for a second round of testing to catch any defects introduced by the mounting process. This testing stage made use of a remarkable connector material developed by Fuji Polymer Industries. The material, Fujipoly, is a plastic sheet membrane that conducts only across its short dimension, that is, through the thickness of the sheet. A test rig, custom engineered by Beck, provides pressure to squeeze a sandwich of a test connector, the Fujipoly and the SPERT-II board under test. The resulting connections allow the mounted die to be exercised using the special test circuitry built into each T0 without damaging the SPERT-II board.

[SPERT-II board in test jig]
SPERT-II board, bare except for T0 die under metal cover, about to be clamped into place in Fujipoly test rig.

After this second test stage, five boards were selected and sent to the assembly house to have the remaining components mounted. At the end of May, 1995, completed SPERT-II boards were back at ICSI ready for final testing. Until that point, all tests had been of the T0 die in isolation; it was time to test T0 talking to other components on the board, in particular the SRAM external memory and the Xilinx reconfigurable logic device that provides the interface to the host workstation.

To allow maximum flexibility, the SPERT-II board design has software programmable clock generators. The processor clock frequency and the timing of the phases within a memory access can be individually tweaked to allow different types of SRAM to be cycled at their maximum speed. The timing of communications between T0 and the Xilinx is also under software control to allow for adjustments due to delays within the reconfigurable part. Unfortunately, the clock generators are not completely independent and have limited resolution. Much of the first week of testing was spent carefully adjusting the clock parameters to find combinations that would work. ICSI software engineer David Johnson quickly put together a graphical tool to allow interactive manipulation of these various clocking parameters.
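
The kind of search described above, over clock settings that are coupled and available only in coarse steps, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the step size, the rule that the two phases must fit within one cycle, the SRAM timing figures, and every function name are hypothetical, and `board_passes` is a stand-in for running a memory test on real hardware. None of it is taken from the SPERT-II design or from the graphical tool mentioned above.

```python
from itertools import product

# Hypothetical model: every name and number below is an illustrative
# assumption, not a SPERT-II register or a measured value.
STEP_NS = 2.5  # assumed resolution of each programmable delay


def candidate_settings(max_steps=8):
    """Yield (cycle, address_phase, data_phase) tuples in nanoseconds."""
    steps = [i * STEP_NS for i in range(1, max_steps + 1)]
    for cycle, addr, data in product(steps, repeat=3):
        # Assumed coupling between the generators: both phases of a
        # memory access must fit inside a single clock cycle.
        if addr + data <= cycle:
            yield cycle, addr, data


def board_passes(cycle, addr, data, sram_access_ns=12.0, setup_ns=4.0):
    """Stand-in for a hardware memory test at the given settings."""
    return data >= sram_access_ns and addr >= setup_ns


def fastest_working_setting(**sram_timing):
    """Return the working setting with the shortest cycle, or None."""
    working = [s for s in candidate_settings() if board_passes(*s, **sram_timing)]
    # Tuples compare element by element, so min() picks the shortest cycle.
    return min(working, default=None)


if __name__ == "__main__":
    print(fastest_working_setting())
```

In this toy model a faster SRAM (a smaller `sram_access_ns`) lets the search settle on a shorter cycle, while a part slower than the longest available delay yields no working setting at all.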

On June 9, the first program was successfully run from external SRAM on the SPERT-II board. On June 12, the first real application program was run: a complete neural net training of a speech net. The board was still only running at 15 MHz, whereas conservative timing analysis performed towards the end of the T0 design effort had predicted a maximum clock frequency of 33 MHz.

In subsequent days, as work proceeded on the Xilinx design and the settings of the clock generators, the attainable system clock speed rose dramatically. One of the SPERT-II boards had been built with smaller, faster SRAM parts especially to find the maximum processor speed. By the end of June, it was clear that the maximum T0 die speed had been found, a gratifying 46 MHz!

More impressively, during system test and bring up, and continuing through production use of the SPERT-II boards, not a single bug was found in the T0 processor design. This is a rare achievement for a die of this complexity, and pays tribute to the diligence of the T0 design team of Krste Asanovic, Brian Kingsbury and Bertrand Irissou.

Mid-August saw another important milestone: the first public presentation of T0 at the Hot Chips VII conference held at Stanford University. Hot Chips is renowned as a showcase for leading industry microprocessors, and T0 was presented alongside processors such as Sun's UltraSPARC, HP's PA-8000, and MIPS' R10000.

Today, the first SPERT-II systems are in daily production use at ICSI. The primary intended application for SPERT-II systems is error backpropagation training of neural networks for use in speech recognition. Performance measurements show speedups of as much as 20 times over extensively tuned code running on a Sparcstation-20/61 workstation, and as much as 5 times over a high-end IBM RS/6000-590 workstation. It is interesting to note that the processor cores on these two workstations respectively contain 4.5 and 33 times as many transistors as T0, and are clocked 1.5 to 2 times as fast. Planned improvements to the T0 code and SPERT-II I/O subsystem are expected to further increase performance on this code.
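
To put the clock-rate comparison in perspective, the quoted figures can be combined into a rough per-clock-cycle speedup. The short Python sketch below does only that arithmetic. It assumes the full 1.5 to 2 range of clock ratios could apply to either workstation, since the text does not say which ratio belongs to which machine, and it uses the peak ("as much as") speedups, so the results are upper-end estimates rather than measurements.

```python
# Peak speedups quoted in the article, and the stated range of
# workstation-to-T0 clock ratios.
speedup = {"Sparcstation-20/61": 20.0, "IBM RS/6000-590": 5.0}
clock_ratio_range = (1.5, 2.0)  # workstation clock / T0 clock


def per_clock_speedup(wall_clock_speedup, clock_ratio):
    """Speedup per clock cycle, given that T0 runs at the slower clock."""
    return wall_clock_speedup * clock_ratio


if __name__ == "__main__":
    for name, s in speedup.items():
        low, high = (per_clock_speedup(s, r) for r in clock_ratio_range)
        print(f"{name}: {low:g}x to {high:g}x per clock")
```

Under these assumptions, T0 does roughly 30 to 40 times as much of this work per clock cycle as the Sparcstation-20/61, and roughly 7.5 to 10 times as much as the RS/6000-590.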

[SPERT-II board in Sparcstation]
SPERT-II board mounted inside a Sparcstation at ICSI.

There has been much interest in harnessing the capabilities of T0 to accelerate other application areas. Widening an existing ICSI collaboration, the PET algorithm for resilient video transmission, developed within ICSI's Theory and Networks Groups and described in the Fall 1994 issue of the ICSI Newsletter, is being implemented for T0 by U.C. Berkeley computer science graduate student John Hauser. Initial results indicate a 9-fold speedup over a workstation implementation. Arno Formella, an ICSI postdoc who visited in 1993/94 from the University of Saarland, Germany, was an early T0 experimenter, porting pieces of the MPEG video decompression algorithm and observing speedups of 16 to 52 times over a Sparcstation-2. U.C. Berkeley computer science graduate student Todd Hodes is developing vectorized additive sound synthesis code for T0 that will accelerate a sound synthesis system developed by UCB's Center for New Music and Audio Technologies. Paolo Moretto, an ICSI postdoc visitor from Alenia Elsag Sistemi Navali, SSG, Italy, ported the ICSI RASTA adaptive speech filter code to T0. Chris Bregler, an ICSI alumnus who is now a UCB computer science graduate student, has been developing T0 image processing code to speed his own research in computer vision. Other projects in progress are mapping FFT algorithms and speech utterance decoding to T0. This range of applications is testimony to the flexibility of the Torrent architecture. "We're continually amazed by the range of applications for which we see substantial speedups over commercial workstations," comments chief architect Krste Asanovic.

There has been strong demand for SPERT-II systems, and a second production run of 22 SPERT-II systems is underway using the first batch of T0 die. These will be installed at eight locations across Europe and the USA, principally at sites collaborating in speech recognition research with the Realization Group.

To meet future demand, a shrink of the T0 design to a newer, submicron process technology is under consideration. The shrink would reduce the cost per die, and should also allow a significant boost in clock speed while reducing power consumption.

While the hardware side of the SPERT project winds down, the software support effort has only just begun. "A zero bug processor is a hard act to follow," bemoaned David Johnson as he frantically tested yet another T0 library routine.

To find out more about T0 and SPERT-II, follow the links from the T0 home page at http://www.icsi.berkeley.edu/real/spert/t0-intro.html.


David Johnson <davidj@ICSI.Berkeley.EDU>
$Date: 2000/08/16 01:37:58 $