An Introduction to HPC
Definition of HPC
There is no strict definition, but HPC usually means any compute system
delivering over 1 Gigaflop/s or, alternatively, any computer offering an
order of magnitude or greater performance than current top-end
workstations.
The need for HPC
Application areas include:
- large scientific, engineering, and medical applications.
- financial modelling in business and commerce.
- database applications for fast retrieval of large quantities of data.
- virtual reality.
Why the parallel approach?
To get higher performance from contemporary sequential computers
requires:
- Increasingly sophisticated chip design, with a complex memory hierarchy
and multiple functional units.
This is expensive!
Economic issues
There are at most about 200 sales of supercomputers worldwide per year,
so vendors must leverage the chip designs developed for PCs and workstations.
Parallel Processing
Technological and economic constraints on single-processor computers have
driven the approach towards parallel computation. This is more
cost-effective - in terms of hardware - since commodity state-of-the-art
chips may be used. The major features of parallel computers are:
- Multiple processors that can perform operations simultaneously.
- An interconnection network that allows processors to cooperate and
share data.
Machine Classification
Here we do not classify according to the structure of the machine, but
rather according to how the machine relates its instructions to the data
being processed.
MIMD (Multiple Instruction, Multiple Data)
- Tightly coupled - shared memory, e.g. SGI PowerChallenge, Sun SPARCserver 1000.
- Loosely coupled - distributed memory, e.g. Meiko CS-2, IBM SP-2, Cray T3D.
- Workstation cluster - e.g. Local Area Networks (LANs) and Wide Area Networks (WANs) running, say, PVM or MPI.
SIMD (Single Instruction, Multiple Data)
Examples include the MasPar MP-2, Thinking Machines CM-2, and others.
MIMD systems
In these systems:
- Different processors can execute different instructions at the same time.
- There is no central controller; any synchronization is achieved through software or specialized hardware.
MIMD - tightly coupled
In shared memory systems:
- At least part of the memory is shared between processors.
- The hardware and operating system are designed to ensure that multiple
processors do not erroneously write to the same address simultaneously
and so corrupt memory contents.
- Interprocessor communication is possible through reservation of
memory locations (usually control blocks) visible to all processes.
- May not scale well for large numbers of processors due to memory
contention bottleneck.
Examples: SGI PowerChallenge, Sun SPARCserver 1000.
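As an illustration of the shared-memory style - a minimal sketch using
present-day POSIX threads rather than anything specific to the machines
above - several threads update a counter held in memory visible to all
of them, with a lock guarding against the simultaneous writes described
above:

    #include <pthread.h>
    #include <stdio.h>

    /* Shared state: one counter visible to all threads, protected by a lock. */
    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);    /* serialize access to the shared word */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[4];
        for (int i = 0; i < 4; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++)
            pthread_join(t[i], NULL);
        printf("counter = %ld\n", counter);   /* 400000 when access is serialized */
        return 0;
    }

Without the lock, two threads could read and update the same address at
the same time and increments would be lost - the kind of corruption of
shared memory contents described above.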
MIMD - loosely coupled
In distributed memory systems:
- Each processor or node has its own local memory.
- Communication between processors is achieved by sending messages using
library calls.
- Scaling to a much larger number of nodes is possible while still
maintaining system balance.
Examples: Meiko CS-2, IBM SP-2, Thinking Machines CM-5, and Cray T3D.
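A minimal sketch of the message-passing style, written against the
standard MPI C interface (the program itself is illustrative and not
drawn from any of the machines above): process 0 sends an integer to
process 1, which receives it with a matching library call.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* No shared memory: the value travels as an explicit message. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("process 1 received %d from process 0\n", value);
        }

        MPI_Finalize();
        return 0;
    }

The program must be started with at least two processes, e.g. with
mpirun -np 2.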
MIMD - workstation cluster
In these distributed memory systems:
- Each processor or node may be any compute resource on a LAN or WAN.
- Communication is achieved through exchange of data, using message-passing
library calls, e.g. PVM, MPI, and others.
- Scaling to a much larger number of nodes is possible while still
maintaining system balance.
Example: CERN HP workstation cluster for High Energy Physics event processing.
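A complementary sketch using the PVM library named above (the "worker"
executable is hypothetical, named only to indicate the flavour of the
calls): a parent task spawns one worker somewhere on the virtual machine
and receives an integer back from it.

    #include <pvm3.h>
    #include <stdio.h>

    int main(void)
    {
        int worker_tid, result;

        (void) pvm_mytid();                  /* enrol this task in PVM */

        /* Start one copy of the (hypothetical) "worker" program on any
           free host in the virtual machine. */
        pvm_spawn("worker", NULL, PvmTaskDefault, "", 1, &worker_tid);

        pvm_recv(worker_tid, 1);             /* wait for message tag 1 */
        pvm_upkint(&result, 1, 1);           /* unpack one integer */
        printf("worker returned %d\n", result);

        pvm_exit();
        return 0;
    }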
SIMD systems
In these systems:
- All processors execute the same instruction at the same time.
- Each processor has its own local memory, and communicates with other
processors using special instructions.
Examples include Thinking Machines CM-2, and MasPar MP-2.
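Written here in ordinary sequential C purely for illustration, the loop
below shows the data-parallel pattern a SIMD machine exploits: a single
instruction (the add) applied across many data elements. On a machine
such as the CM-2, each processing element would hold one a[i] and b[i]
in its local memory, and all elements would be added in the same step.

    #include <stdio.h>

    #define N 8

    int main(void)
    {
        float a[N], b[N], c[N];

        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

        /* One operation, many data elements: on a SIMD machine every
           processing element performs this add simultaneously; the
           sequential loop merely spells it out one element at a time. */
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        for (int i = 0; i < N; i++)
            printf("c[%d] = %g\n", i, c[i]);
        return 0;
    }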
Interconnections
Mechanisms for communication between processors are provided by:
- Point-to-point - a dedicated connection between two processors.
- Buses - several processors physically sharing a common path.
Physical Node topologies
Processors or nodes may be connected in a variety of topologies. These include:
- Connect each node to every other node; example, BBN Butterfly.
- Connect each node to its neighbour in a ring.
- Lattice in two or three dimensions; example, Cray T3D.
- Hypercube; example, the nCUBE range.
- Fat tree; example, Thinking Machines CM-5.
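As a small aside on the hypercube topology (a general property of
hypercubes, not specific to any machine listed): in a d-dimensional
hypercube of 2^d nodes, the neighbours of a node are found by flipping
each of the d bits of its node number in turn, so any two nodes are at
most d hops apart.

    #include <stdio.h>

    /* Print the neighbours of a node in a d-dimensional hypercube.
       Two nodes are connected exactly when their numbers differ in one bit. */
    void hypercube_neighbours(int node, int d)
    {
        for (int k = 0; k < d; k++)
            printf("node %d <-> node %d\n", node, node ^ (1 << k));
    }

    int main(void)
    {
        hypercube_neighbours(5, 3);   /* node 5 of an 8-node (3-D) hypercube */
        return 0;
    }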
A non-computing example of High Performance processing
Construction of a wall. This illustrates:
- Domain decomposition - dividing the problem (of building the wall)
between processors (bricklayers).
- Size of the domain will determine speedup.
- Communication at boundaries.
- Difficulties encountered in irregular domains (load balancing).
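Translating the analogy into code - a minimal sketch in which the wall
is treated as a hypothetical one-dimensional row of N bricks - domain
decomposition divides the bricks as evenly as possible among P workers,
each taking a contiguous section; the section boundaries are where the
workers must communicate.

    #include <stdio.h>

    /* Split N bricks into P contiguous sections, one per worker.  When N
       does not divide evenly, the first (N % P) workers get one brick extra. */
    void my_section(int N, int P, int worker, int *first, int *count)
    {
        int base = N / P, extra = N % P;
        *count = base + (worker < extra ? 1 : 0);
        *first = worker * base + (worker < extra ? worker : extra);
    }

    int main(void)
    {
        int first, count;
        for (int w = 0; w < 4; w++) {           /* 4 bricklayers, 10 bricks */
            my_section(10, 4, w, &first, &count);
            printf("worker %d lays bricks %d..%d\n", w, first, first + count - 1);
        }
        return 0;
    }

Keeping the sections comparable in size is the load-balancing problem:
an irregular wall would leave some workers with more to do than others.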
Summary of different types of parallelism in constructing the wall
The construction of the wall also illustrates different forms of
parallelism found in computing.
- Data parallelism. This is when each bricklayer works on a different
section of the wall, but all perform essentially the same task.
- Pipelining. This was shown when the task of building the wall was
decomposed into horizontal sections.
- Task parallelism. Although this was not illustrated we can envisage
construction of the wall by assigning different tasks to different
workers. For instance, one worker may be delivering bricks, another
mixing cement, and another laying the bricks.
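A final sketch in the message-passing style used earlier (the functions
deliver_bricks, mix_cement, and lay_bricks are hypothetical stand-ins,
named only to mirror the analogy): task parallelism means that each
process is given a different job.

    #include <mpi.h>
    #include <stdio.h>

    /* Hypothetical stand-ins for the three different jobs in the analogy. */
    static void deliver_bricks(void) { printf("delivering bricks\n"); }
    static void mix_cement(void)     { printf("mixing cement\n"); }
    static void lay_bricks(void)     { printf("laying bricks\n"); }

    int main(int argc, char **argv)
    {
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Task parallelism: different processes perform different tasks. */
        if (rank == 0)
            deliver_bricks();
        else if (rank == 1)
            mix_cement();
        else
            lay_bricks();

        MPI_Finalize();
        return 0;
    }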
Submitted by Mark Johnston,
last updated on 9 November 1994.