【Tsukuba University】CP-PACS Supercomputer

CP-PACS was a massively parallel distributed memory supercomputer with a theoretical peak performance of 614 gigaFLOPS. Tsukuba University developed CP-PACS in collaboration with Hitachi. The system consisted of 2,048 processing units (PU) and 128 input/output units (IOU), which handled input and output in a distributed manner. These units were connected by a 3D hyper crossbar network. The system set a world record of 368.2 gigaFLOPS on the Linpack benchmark in September 1996 and was listed as the most powerful supercomputer site in the world by the TOP500 project in November 1996.

CP-PACS went into operation at the university’s Center for Computational Physics (today, the Center for Computational Sciences) in October 1996 and remained in service for nine years until September 2005. During this period, CP-PACS produced internationally recognized results in numerical research in particle physics, condensed matter physics, and astrophysics, among other fields.

In the 1970s and the first half of the 1980s, Tsukuba University developed the PACS/PAX series of parallel computers for scientific computing, which enabled world-leading research that calculated actual neutron diffusion in nuclear reactor cores. The fifth machine of the series, the QCDPAX, ran at 13.75 gigaFLOPS, placing it among the world’s most powerful computers when it was completed in 1989. The QCDPAX was used in particle physics research for ten years. Building on the results obtained with the QCDPAX, the university began studying the development of the CP-PACS massively parallel supercomputer in the summer of 1991 to continue its physics research.

The CP-PACS project was officially launched after being selected as a theme for the Ministry of Education, Science and Culture’s 1992 Program for New Technology Development. This coincided with the establishment of the university’s Center for Computational Physics. Development proceeded for five years until fiscal year 1996, receiving a total of 2.2 billion yen in subsidies under the Ministry’s Grant in Aid for Scientific Research program. The first stage, with 1,024 processing units and a theoretical peak performance of 307 gigaFLOPS, was completed and went into operation in March 1996. The expansion to 2,048 processing units was completed at the end of September 1996, bringing the massively parallel supercomputer system to its full theoretical peak performance of 614 gigaFLOPS.

Each CP-PACS processing unit (PU) consisted of 64 MB of main memory and a superscalar RISC processor based on the PA-RISC 1.1 architecture. The peak performance of each PU was 300 megaFLOPS, giving a peak performance of 614 gigaFLOPS when 2,048 PUs were used in parallel. To counter the typical decline in computation performance of RISC processors caused by insufficient data cache memory during large-scale scientific computing, the developers introduced a pseudo vector function called PVP-SW (pseudo vector processor based on slide-windowed registers). With PVP-SW, a superscalar processor can achieve highly efficient vector processing.
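The performance figures quoted in this article can be cross-checked with simple arithmetic. The following is a minimal sketch using only numbers stated in the text and the specification table; the function and variable names are illustrative, and the "operations per clock cycle" value is derived here from the quoted peak and clock frequency rather than taken from a published specification.

```python
# Figures quoted in this article.
PU_PEAK_MFLOPS = 300      # theoretical peak per PU
CLOCK_MHZ = 150           # PU clock frequency (specification table)
PU_MEMORY_MB = 64         # main memory per PU
LINPACK_GFLOPS = 368.2    # Linpack result of September 1996


def system_peak_gflops(num_pus: int) -> float:
    """Aggregate theoretical peak in gigaFLOPS (1 GFLOPS = 1,000 MFLOPS)."""
    return num_pus * PU_PEAK_MFLOPS / 1000


def total_memory_gb(num_pus: int) -> float:
    """Aggregate main memory in GB (1 GB = 1,024 MB)."""
    return num_pus * PU_MEMORY_MB / 1024


# Derived value (an inference, not a quoted specification):
# floating-point operations per clock cycle implied by peak / clock.
flops_per_cycle = PU_PEAK_MFLOPS / CLOCK_MHZ

first_stage_peak = system_peak_gflops(1024)   # March 1996 configuration
full_system_peak = system_peak_gflops(2048)   # September 1996 configuration
full_system_memory = total_memory_gb(2048)

# Fraction of theoretical peak achieved on the Linpack benchmark.
linpack_efficiency = LINPACK_GFLOPS / full_system_peak

print(f"First stage peak:   {first_stage_peak:.1f} GFLOPS")
print(f"Full system peak:   {full_system_peak:.1f} GFLOPS")
print(f"Full system memory: {full_system_memory:.0f} GB")
print(f"Implied FLOPs per cycle: {flops_per_cycle:.0f}")
print(f"Linpack efficiency: {linpack_efficiency:.1%}")
```

The results agree with the rounded values in the text: 307 and 614 gigaFLOPS for the two configurations and 128 GB of total memory. The Linpack result corresponds to roughly 60% of the theoretical peak.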

The Center for Computational Physics was reorganized and expanded in April 2004 and became the Center for Computational Sciences. The PACS-CS, the successor to the CP-PACS, has been in operation since July 2006.

Hitachi developed and sold the SR2201 massively parallel distributed memory supercomputer in 1996, based on technology it jointly developed in the CP-PACS project.


Main specifications of the CP-PACS supercomputer

System
    No. of PUs: 2,048
    No. of IOUs: 128
    Total memory capacity: 128 GB
    Theoretical peak performance: 614 gigaFLOPS
    Operating system: UNIX

Processing units (PU)
    Processor: Based on the PA-RISC 1.1 specification with a PVP-SW function
    Clock frequency: 150 MHz
    Theoretical peak performance: 300 megaFLOPS
    Memory capacity: 64 MB
    Cache memory, primary: 16 KB (instruction), 16 KB (data)
    Cache memory, secondary: 512 KB (instruction), 512 KB (data)

Bidirectional connection network
    Topology: 3D hyper crossbar network
    Peak transfer performance: 300 MB/s per link
    Transfer method: Wormhole + remote DMA

Input/output units (IOU)
    Total memory capacity: 529 GB (distributed disks in a RAID 5 configuration)
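
The 3D hyper crossbar topology listed above can be illustrated with a short sketch. In a hyper crossbar, nodes sit on a grid and all nodes that differ in only one coordinate share a crossbar switch, so a message needs at most one crossbar traversal per dimension. The grid shape used below (8 x 17 x 16) is an assumption not stated in this article; it is chosen because its product, 2,176, equals 2,048 PUs plus 128 IOUs. The dimension-order routing and the function names are likewise illustrative and do not describe the actual CP-PACS routing hardware.

```python
from math import prod

# Assumed grid shape (not stated in this article):
# 8 * 17 * 16 = 2,176 nodes = 2,048 PUs + 128 IOUs.
DIMS = (8, 17, 16)


def crossbar_hops(src, dst):
    """Count crossbar traversals needed between two grid nodes.

    Each traversal corrects exactly one differing coordinate, so the
    count is the number of coordinates in which src and dst differ.
    """
    return sum(1 for s, d in zip(src, dst) if s != d)


def route(src, dst):
    """Return the nodes visited when coordinates are fixed in x, y, z order.

    Dimension-order routing is an illustrative choice, not a documented
    property of the machine.
    """
    path = [tuple(src)]
    current = list(src)
    for axis in range(len(current)):
        if current[axis] != dst[axis]:
            current[axis] = dst[axis]   # one traversal of this axis's crossbar
            path.append(tuple(current))
    return path


total_nodes = prod(DIMS)
far_corner = tuple(n - 1 for n in DIMS)
example_path = route((0, 0, 0), far_corner)

print("Total nodes:", total_nodes)
print("Path between opposite corners:", example_path)
print("Crossbar traversals:", crossbar_hops((0, 0, 0), far_corner))
```

Under these assumptions, even the two most distant nodes are separated by only three crossbar traversals, regardless of how many nodes lie along each axis.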