Processor Type | Thin | Thin 2 | Wide | Wide |
Clock Speed | 66 MHz | 66 MHz | 66 MHz | 66 MHz |
Peak Megaflops | 266 | 266 | 266 | 266 |
Memory Cards | 2 | 2 | 2 | 4 or 8 |
Memory | 64-512 MB | 64-512 MB | 64-512 MB | 64-2048 MB |
Memory Bus | 64 bit | 128 bit | 128 bit | 256 bit |
Data Cache | 64 KB | 128 KB | 128 KB | 256 KB |
Proc. to Data Cache Bus | 128 bit | 256 bit | 256 bit | 256 bit |
Instruction Cache | 32 KB | 32 KB | 32 KB | 32 KB |
Disk | 1-9 GB | 1-9 GB | 1-18 GB | 1-18 GB |
Microchannel Adapter Slots | 4 | 4 | 8 | 8 |
Level 2 Cache | 0-1 MB | 0-2 MB | n/a | n/a |
The POWER2 processor consists of eight units: an Instruction Cache Unit (ICU), a Fixed-Point Unit (FXU), a Floating-Point Unit (FPU), four Data Cache Units (DCU), and a Storage Control Unit (SCU).
The FXU contains two execution units and handles all integer arithmetic, storage references and logical operations. It also contains the general purpose registers, a data cache directory and a data translation look-aside buffer. Each execution unit contains an adder and a logical functional unit. The second unit also contains a multiply and divide unit. Two instructions can be executed per clock cycle, except that two multiply or divide instructions cannot be paired, because only the second unit can execute them. A multiply takes two clock cycles, while a divide takes 13 to 17 cycles.
The FPU also contains two double-precision (64-bit) execution units, together with the floating-point registers. Both execution units are identical and conform to the IEEE 754 binary floating-point standard. To speed up execution, a multiply-add (A×B+C) instruction is available. The multiply-add takes one cycle, so with both units in use, four floating-point operations per clock cycle can be achieved. A hardware square-root instruction is also available, further improving calculation performance. A separate unit for normalizing store data means that a floating-point store effectively costs zero FPU cycles.
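The peak floating-point rate implied by two multiply-add units can be checked with a short calculation. The sketch below is illustrative only: the function name and its parameters are hypothetical, and it assumes each multiply-add counts as two floating-point operations and that each unit completes one multiply-add per cycle.

```python
def peak_mflops(clock_mhz, fp_units=2, flops_per_fma=2):
    """Peak megaflops = clock (MHz) x FP units x flops per multiply-add.

    Assumes every unit completes one multiply-add (2 flops) each cycle.
    """
    return clock_mhz * fp_units * flops_per_fma


if __name__ == "__main__":
    # With the 66 MHz clock from the table: 66 * 2 * 2
    print(peak_mflops(66))
```

With exactly 66 MHz the product is 264 megaflops, slightly below the 266 listed in the table. This suggests the tabulated clock speed is a rounded value, although the source does not state the exact frequency.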
As an example, in small systems (up to 64-way, i.e. 64 nodes), only one switch board is required per 16 nodes.
An SP2 node connects to the switch board through an intelligent Micro Channel adapter. The adapter has an onboard microprocessor (Intel i860 XR) that offloads some of the work associated with moving messages between nodes. The adapter can move messages to and from processor memory directly via direct memory access (DMA), thus reducing the overhead on the processor node for message processing and significantly improving the sustainable bandwidth. Generation and checking of message cyclic redundancy check (CRC) codes are also done by the adapter to detect errors in the links, further reducing the overhead on the SP2 node.
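The idea of CRC generation on the sending side and checking on the receiving side can be sketched in a few lines. This is not the adapter's actual scheme: the source does not give the CRC polynomial or the packet layout, so the example assumes the standard CRC-32 from Python's `zlib` module and a hypothetical framing in which the 4-byte CRC trails the payload.

```python
import struct
import zlib


def frame(payload: bytes) -> bytes:
    """Sender side: append a big-endian CRC-32 of the payload (assumed layout)."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def check(packet: bytes) -> bool:
    """Receiver side: recompute the CRC and compare it with the trailer."""
    if len(packet) < 4:
        return False  # too short to contain a CRC trailer
    payload, trailer = packet[:-4], packet[-4:]
    (expected,) = struct.unpack(">I", trailer)
    return zlib.crc32(payload) == expected


if __name__ == "__main__":
    packet = frame(b"message from node 3")
    print(check(packet))                      # intact packet

    damaged = bytearray(packet)
    damaged[0] ^= 0x01                        # flip one bit, as a link error might
    print(check(bytes(damaged)))              # damaged packet
```

A packet that fails the check is simply treated as lost by the receiver in this sketch; how the real adapter reports link errors is not described in the source.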
The switch always contains at least one stage more than is necessary for full connectivity. Since the basic switching element is a 4 × 4 bidirectional crossbar, this extra stage guarantees that there are at least four different paths between every pair of nodes. The redundant paths allow recovery in the presence of failures and also reduce congestion in the switch.
The communication subsystem software complements the hardware capability to provide transparent recovery of lost or corrupted messages. The communication protocol supports end-to-end packet acknowledgment. For every packet sent by a source node, there is a returned acknowledgment after the packet has reached and been received by the destination node. Thus the loss of a packet will be detected by the source node. The communication subsystem software automatically resends packets if an acknowledgment is not received within a preset interval of time.
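The acknowledge-and-resend behaviour described above can be illustrated with a small simulation. It is a sketch under stated assumptions, not the SP2 protocol itself: it sends one packet at a time (stop-and-wait), models an expired timer as a `False` return from the channel, and does not model lost acknowledgments. The names `LossyChannel` and `send_reliably` are hypothetical.

```python
class LossyChannel:
    """Simulated link that loses a fixed number of transmissions per packet."""

    def __init__(self, drops):
        self.drops = dict(drops)   # sequence number -> transmissions to lose
        self.delivered = []        # packets that reached the destination

    def transmit(self, seq, data):
        """Return True if an acknowledgment comes back, False on timeout."""
        if self.drops.get(seq, 0) > 0:
            self.drops[seq] -= 1
            return False           # packet lost: no ack, sender's timer expires
        self.delivered.append((seq, data))
        return True                # destination received it and acknowledged


def send_reliably(packets, transmit, max_retries=5):
    """Send each packet, resending until acknowledged or retries run out.

    Returns a dict mapping sequence number to the number of attempts used.
    """
    attempts = {}
    for seq, data in enumerate(packets):
        for attempt in range(1, max_retries + 1):
            if transmit(seq, data):
                attempts[seq] = attempt
                break
        else:
            raise RuntimeError(f"packet {seq} unacknowledged after {max_retries} tries")
    return attempts


if __name__ == "__main__":
    channel = LossyChannel({1: 2})             # lose packet 1 twice
    result = send_reliably([b"a", b"b", b"c"], channel.transmit)
    print(result)                              # → {0: 1, 1: 3, 2: 1}
    print(channel.delivered)
```

The application sees every packet delivered in order even though packet 1 was lost twice, which is the sense in which the recovery is transparent.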
For high-performance I/O requirements, the SP2 allows I/O and file servers to be integrated into the system by configuring some of the nodes as I/O and file servers. Raw I/O capacity and bandwidth can be arbitrarily increased simply by adding more I/O server nodes.
Because the underlying architecture is message-passing, a message-passing programming style gives the best performance on the SP2. Several message-passing libraries callable from FORTRAN and C are supported on the SP2. The SP2 also supports the data parallel programming model with High Performance FORTRAN.
IBM has made no fundamental change to the base RISC System/6000 processor and the AIX operating system for the SP2. This means that any of the major hardware or software options available on the base RISC System/6000 workstations can be installed on an SP2 node. Similarly, several thousand RISC System/6000 applications are available immediately to an SP2 customer.
Benchmark | Thin node | Thin node 2 | Wide node | Units |
SPECint92 | 114 | 122 | 122 | SPEC units |
SPECfp92 | 205 | 251 | 260 | SPEC units |
Table 3 shows a comparison with the established competition - Cray Research's T3D, Thinking Machines' CM5 and Intel's Paragon.
SP2 Wide nodes | SP2 Thin nodes | Cray T3D | TMC CM5 | Intel Paragon |
12.1 | 9.2 | 6.4 | 3.8 | 2.0 |