Torrent Vector Microprocessor Makes Debut

From the ICSI newsletter, Spring 1995, Volume 8, Number 1

Early on the morning of April 14, 1995, a new vector microprocessor architecture, developed at ICSI as a joint project with U.C. Berkeley, achieved a major milestone. At 12:01 a.m., the first chips implementing this new architecture, called Torrent, ran a series of tests with 100% success. This event marked the end of the silicon design process for the project and a shift in emphasis to systems development and deployment.

[T0 die]
Micrograph of a T0 die.

ICSI's Realization Group will provide boards and systems using these Torrent chips to researchers at ICSI and elsewhere working with compute-intensive algorithms. Of particular interest to the Realization Group's speech researchers and their collaborators are speech recognition training algorithms that use artificial neural networks.

The design of the Torrent architecture began in late 1992 when the idea of combining a vector execution unit with a RISC processor was first proposed by U.C. Berkeley Ph.D. candidate Krste Asanovic, who has worked with the Realization Group since 1989. From his earlier studies of how to customize processors to execute neural network algorithms efficiently, Asanovic concluded that a narrow "application-specific" approach would severely limit the flexibility that researchers typically require for their work. On the other hand, the general-purpose architecture of a conventional RISC processor would not offer a performance advantage over commercial workstations. Asanovic's solution, to incorporate both general and specific processing elements on a single silicon die, is the key feature of the Torrent design.

The first Torrent chip design, known internally at ICSI as T0 (T-Zero), implements this architecture in 1 micron CMOS with two metal layers on a 17 x 17 mm die. Included on the die are a MIPS-compatible scalar RISC processor, a vector coprocessor customized for compute-intensive tasks and a 1KB instruction cache. The vector unit features a vector register file containing 16 vector registers of 32 elements each, one vector memory pipeline, and two vector arithmetic pipelines. Each pipeline contains eight parallel datapaths, so it can produce up to eight results per cycle. Since the Torrent architecture specification is envisioned as a blueprint for a family of code-compatible processor chips, larger and faster devices can be designed later using more advanced silicon fabrication technology. All devices in the Torrent family will execute the same instruction set.
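The vector unit figures quoted above lend themselves to a quick back-of-the-envelope calculation. The sketch below is only illustrative: the constants come from this article, while the function names are hypothetical and the model assumes that every pipeline can be kept busy at once and that startup latency can be ignored.

```python
# Figures taken from the article's description of the T0 vector unit.
ELEMENTS_PER_REGISTER = 32    # elements in each vector register
DATAPATHS_PER_PIPELINE = 8    # parallel datapaths in each pipeline
ARITHMETIC_PIPELINES = 2
MEMORY_PIPELINES = 1


def cycles_per_vector(vector_length=ELEMENTS_PER_REGISTER,
                      lanes=DATAPATHS_PER_PIPELINE):
    """Cycles one pipeline needs to stream a whole vector.

    Assumption: startup latency is ignored, so this is a lower bound.
    """
    return -(-vector_length // lanes)  # ceiling division


def peak_results_per_cycle(pipelines, lanes=DATAPATHS_PER_PIPELINE):
    """Upper bound on results per cycle if all given pipelines are busy."""
    return pipelines * lanes


if __name__ == "__main__":
    print("cycles per full-length vector:", cycles_per_vector())
    print("peak arithmetic results per cycle:",
          peak_results_per_cycle(ARITHMETIC_PIPELINES))
    print("peak results per cycle, all pipelines:",
          peak_results_per_cycle(ARITHMETIC_PIPELINES + MEMORY_PIPELINES))
```

Under these assumptions a full 32-element vector occupies a pipeline for four cycles, which leaves the scalar processor several cycles in which to issue the next vector instruction.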

One particular task targeted for T0 is the popular back-propagation algorithm for training connectionist nets. Simulations indicate that T0 will be able to perform up to 50 million updates per second on this task for moderate-sized networks. This performance is somewhat higher than ICSI currently achieves using a four-board Ring Array Processor for the same problem (ICSI's Winter 1989 Newsletter featured an article on the RAP board).

The core design team that "realized" T0 includes Asanovic, Brian Kingsbury and Bertrand Irissou, all students of U.C. Berkeley Computer Science Professor John Wawrzynek. Each team member contributed his unique experience to the project.

[Others with first T0 wafers]
Brian, Krste and Morgan with the first T0 wafers.

[JohnW with first T0 wafers]
John Wawrzynek with the first T0 wafers.

The arithmetic multiplier in T0 is the brainchild of Ph.D. student Brian Kingsbury, who implemented the single-cycle, 16-bit x 16-bit integer unit. The 32-bit result from the multiplier matches the 32-bit width of the other vector functional units including adders, shifters and the vector register file. Kingsbury was also primarily responsible for managing the CAD tools, and wrote several utilities specifically for the T0 project.
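Why a 16-bit x 16-bit multiplier pairs naturally with 32-bit datapaths can be checked with a little arithmetic. The sketch below is an illustration only, not a description of Kingsbury's circuit: the article does not say whether the operands are signed or unsigned, so both cases are checked, and the helper names are hypothetical.

```python
def operand_range(bits, signed):
    """Smallest and largest integer representable in `bits` bits."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def product_range(bits, signed):
    """Extreme products of two `bits`-bit operands.

    The extremes of a product over a box of inputs occur at its corners,
    so only the four corner combinations need to be examined.
    """
    lo, hi = operand_range(bits, signed)
    corners = [a * b for a in (lo, hi) for b in (lo, hi)]
    return min(corners), max(corners)


def fits(bits_in, bits_out, signed):
    """True if every bits_in x bits_in product fits in bits_out bits."""
    p_lo, p_hi = product_range(bits_in, signed)
    r_lo, r_hi = operand_range(bits_out, signed)
    return r_lo <= p_lo and p_hi <= r_hi


if __name__ == "__main__":
    for signed in (True, False):
        kind = "signed" if signed else "unsigned"
        print(kind, "16x16 fits in 32 bits:", fits(16, 32, signed))
        print(kind, "16x16 fits in 31 bits:", fits(16, 31, signed))
```

In both cases every product fits in 32 bits but not in 31, so a 32-bit result is the narrowest width that never overflows.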

[T0 Multiplier Bit]
Micrograph of one full adder cell within the multiplier array.

Bertrand Irissou, a recent Master's graduate from U.C. Berkeley's computer science department, designed the critical high-speed vector register file for T0. He was also the lead designer for the tricky "analog" parts of the digital chip, including the power distribution grid, the clock buffer and the I/O drivers.

Along the way, Irissou wrote utility programs for automatically routing wires and had responsibility for putting all the pieces together for the final design. He has since taken a position as a VLSI designer at Integration Associates in Mountain View, California.

Aside from his responsibilities as chief architect, Asanovic was in charge of the overall testing strategy. He was also the liaison with the software development effort, led by David Johnson (see page one for a profile of Johnson and his work on T0 and other ICSI projects).

[T0 Faces Logo]
Micrograph of the T0 logo.
The VLSI design team made sure their faces were etched onto every die!

The design of the T0 chip was completed - "taped out" in the parlance of VLSI designers - on February 14, 1995. On that day the project files were turned over to MOSIS, a silicon fabrication broker which arranged to have the chips built. MOSIS, in turn, sent the project files to Hewlett-Packard, where the fruits of over two years of hard work by ICSI's design team were converted into silicon.

To mark the end of the design phase, the three chip designers were honored at a Realization Group lunch on March 15. Krste Asanovic, Brian Kingsbury and Bertrand Irissou were presented with plaques marking their achievements. In addition, the de rigueur project T-shirt was unveiled and modeled by the team.

[Torrent T-shirts]
Krste, Bertrand, and Brian model the Torrent T-shirt at the Realization Group picnic.

[Team members at picnic]
Krste, Bertrand, and Morgan at the Realization Group picnic.

The die arrived back at ICSI on April 4, and testing started late in the day on April 13 at Digital Testing Services (DTS) in Santa Clara, California. The testing process involved running 22 different programs on each of the fabricated die to check for correct functionality. After some false starts and a bit of head scratching, the testing team of Asanovic, ICSI engineer James Beck and Prof. John Wawrzynek, with help from Dr. Sassan Raissi of DTS, managed to test the first completely functional device just after midnight on April 14. In all, 48 good die were found out of 120 fabricated, for an excellent yield of 40%.

[Test team at DTS]
Dr. Sassan Raissi, Prof. John Wawrzynek, and Krste Asanovic during the testing process.

The next step is to mount each working die onto a printed circuit board, along with memory, clock chips and an interface to a host processor. The first implementation of this board, called SPERT, is SBus-compatible, for use with Sun Microsystems' line of workstations. The progress of the debugging and system integration process can be followed via the ICSI World Wide Web page at: http://WWW.ICSI.Berkeley.EDU/real/spert/spert-intro.html

David Johnson <davidj@ICSI.Berkeley.EDU>
$Date: 2000/08/16 01:39:03 $