Brief History

Scientific and engineering calculations continually demand faster computers. Development of high-performance supercomputers for scientific and engineering applications began in the United States in the 1960s, alongside the development of general-purpose computers (mainframes). In 1964, Control Data Corporation (CDC) began shipping the CDC 6600, which could perform mathematical calculations three times faster than IBM's largest mainframe, the Stretch. Its floating-point performance of 4 MFLOPS put it in a different league from any other computer of the time. As a result, the CDC 6600 has been called either the first supercomputer or the forerunner of the supercomputer.

Vector Supercomputers

Because scientific and engineering calculations often involve floating-point operations on large data arrays, vector processors were developed to operate on vector data arranged regularly in memory by streaming it through pipelines. In 1969, IBM commercialized the IBM 2938 array processor, an add-on processor connected to the I/O channels of a System/360 computer. The first fully independent vector computers were CDC's STAR-100 and Texas Instruments' ASC, both announced in 1972.

Seymour Cray, who developed the CDC 6600 and CDC 7600, founded Cray Research, Inc. (CRI) in 1972 and announced the Cray-1, with a peak performance of 160 MFLOPS, in 1976. Thanks to its vector register architecture and a FORTRAN compiler that vectorized code automatically, the Cray-1 was widely adopted for large scientific and engineering calculations and made the name "supercomputer" famous. Many later vector machines were based on the Cray-1 architecture, and CRI continued to lead the industry with the Cray X-MP, the Cray-2, and the Cray Y-MP.

In Japan, Fujitsu completed the FACOM 230-75APU with vector processors in 1977. Hitachi, NEC, and Mitsubishi Electric soon followed, each commercializing integrated array processors (IAPs) that added vector processing functions to large mainframe computers. Hitachi completed the HITAC M-180 IAP (over 10 MFLOPS) in 1978, the M-200H IAP (48 MFLOPS) in 1979, and the M-280-H IAP (67 MFLOPS) in 1982. NEC developed the ACOS-1000 IAP (28 MFLOPS) in 1982, and Mitsubishi Electric developed the MELCOM COSMO 700 III IAP.

From 1982 on, Japanese computer makers entered the supercomputer market: Fujitsu announced vector supercomputers in its VP series (VP-100 and VP-200), Hitachi in its S-810 series, and NEC in its SX series (SX-1 and SX-2). Hitachi started shipping the S-810/20 (630 MFLOPS) in 1983, the same year Fujitsu started shipping the VP-200 (570 MFLOPS). In 1985, NEC started shipping the SX-2 (1,300 MFLOPS), which became the first machine in the world to exceed 1 GFLOPS on Loop 7 of the Livermore Fortran Kernels. Although all these supercomputers used the same vector register architecture as the Cray-1, they had multiple pipelines per processor and powerful hardware that substantially increased both vector register capacity and memory capacity. They also remained compatible with their makers' mainframes so that mainframe users could adopt them easily. In the next round of upgrades, Hitachi announced the S-820 (3 GFLOPS) in 1987 and Fujitsu announced the VP-2600 (5 GFLOPS) in 1988. In 1989, NEC announced the SX-3 (5.5 GFLOPS per processor) with up to four processors; the SX-3 was a true shared-memory parallel vector processor.

In February 1993, the National Aerospace Laboratory of Japan (now part of the Japan Aerospace Exploration Agency) developed the Numerical Wind Tunnel (NWT) in partnership with Fujitsu. The NWT used a distributed-memory parallel vector architecture, connecting 166 processing elements through crossbar switches. Its CPU boards used three types of LSIs: bipolar CMOS, emitter-coupled logic, and gallium arsenide. With a peak performance of 280 GFLOPS, the NWT remained the world's fastest computer until 1995.

In 1993, Hitachi developed the S-3800 (8 GFLOPS per processor), a shared-memory vector computer with up to four processors built from bipolar CMOS LSIs, and Fujitsu developed the VPP500 (1.6 GFLOPS per processor), a distributed-memory, highly parallel vector computer with up to 222 processors that drew on the NWT technology Fujitsu had developed with the National Aerospace Laboratory of Japan. Fujitsu also shipped the VPP300 (2.2 GFLOPS per processor), a CMOS version of the VPP500 with up to 16 processors, in 1995, and the high-end VPP700 (2.2 GFLOPS per processor) with up to 512 processors in 1996. NEC shipped the SX-4 (2 GFLOPS per processor), a vector supercomputer with up to 512 CMOS processing elements and up to 32 GB of shared memory, in 1995, and the SX-5 (8 GFLOPS per processor, up to 512 processors) in 1998.

Parallel Supercomputers

The University of Illinois first proposed a parallel processing architecture as an alternative to vector architectures. In 1972, the University of Illinois and the Burroughs Corporation jointly developed the ILLIAC IV, a parallel computer with 64 processing elements and a peak performance of 50 MFLOPS. In the 1980s, advances in VLSI technology led to massively parallel systems such as the Goodyear Massively Parallel Processor (MPP), which connected 16,384 one-bit processing elements in a two-dimensional grid, and Thinking Machines' Connection Machine, which connected 65,536 processors in a hypercube network. Development of scalar-type highly parallel and massively parallel supercomputers continued in the United States.

In Japan, Fujitsu announced the AP1000 (50 MFLOPS per processor), a scalar parallel supercomputer with 16 to 1,024 processing elements, in 1992, and the AP3000, based on the UltraSPARC architecture with four to 1,024 nodes, in 1996. NEC developed the Cenju-2 in 1993, the Cenju-3 (50 MFLOPS per processor, up to 256 processing elements) in 1994, and the Cenju-4 (400 MFLOPS per processor, up to 1,024 processing elements) in 1997.

Hitachi announced the SR2001 in 1994, with eight to 128 processing elements and a maximum performance of 23 GFLOPS. In September 1996, the University of Tsukuba, in partnership with Hitachi, completed CP-PACS, a distributed-memory massively parallel supercomputer with a peak performance of 614 GFLOPS. CP-PACS connected 2,048 arithmetic units and 128 I/O units through a three-dimensional network. It set a world record of 368.2 GFLOPS on the Linpack benchmark in September 1996 and was ranked the most powerful supercomputer in the world on the TOP500 list in November 1996.

In 1996, Hitachi started shipping the SR2201 (0.3 GFLOPS per processor, up to 2,048 processors), a RISC-based pseudo-vector computer with distributed memory built on CP-PACS technology. Its successor, the SR8000 (eight processors per node, up to 128 nodes, 1,024 GFLOPS), began shipping in 1998. From then on, Hitachi moved to scalar massively parallel architectures, NEC stayed on its parallel vector path, and Fujitsu later also moved to massively parallel architectures.

