Abstract |
---|
This paper describes a system that enables parallel programs written using the BSPlib communications library to migrate processes among a network of workstations. Not only does the system provide fault tolerance for BSPlib jobs, but by utilising a load manager that maintains an approximation of the global load of the system, it can also continually schedule the migration of BSP processes onto the least loaded machines in a network. Results are provided for an industrial electro-magnetics application showing that we can achieve throughput on a publicly available collection of workstations similar to that of a dedicated NOW (network of workstations). |
Contact |
Jonathan Hill, Oxford University Computing Laboratory, Wolfson Building, Parks Road, Oxford, OX1 3QD, Jonathan.Hill@comlab.ox.ac.uk |
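
The scheduling idea in the abstract is to move a BSP process onto the least loaded machine, as judged by the load manager's approximate view of global load. A minimal C sketch of that decision follows. The `load_view` structure, the `least_loaded` and `migration_target` functions, and the `threshold` margin are all hypothetical illustrations. They are not BSPlib calls and are not the paper's load manager. The threshold is an assumed hysteresis term, added so that small or transient load differences do not trigger a migration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of the load manager's approximate view of the
 * network: one load figure per workstation (e.g. a run-queue average). */
typedef struct {
    const double *load;   /* approximate load of each host */
    size_t        nhosts; /* number of hosts in the network */
} load_view;

/* Return the index of the least loaded host in the current approximation. */
size_t least_loaded(const load_view *v)
{
    assert(v->nhosts > 0);
    size_t best = 0;
    for (size_t i = 1; i < v->nhosts; i++)
        if (v->load[i] < v->load[best])
            best = i;
    return best;
}

/* Decide where a BSP process currently on host `current` should run.
 * Returns the destination host; returning `current` means "stay put".
 * `threshold` is an assumed hysteresis margin: the process moves only
 * when its host is more loaded than the best host by more than this. */
size_t migration_target(const load_view *v, size_t current, double threshold)
{
    assert(current < v->nhosts);
    size_t best = least_loaded(v);
    if (v->load[current] - v->load[best] > threshold)
        return best;
    return current;
}
```

For example, with approximate loads `{1.5, 0.2, 3.0, 0.9}` and a threshold of `0.5`, a process on host 2 would be scheduled to move to host 1, while a process already on host 1 stays. Because every process in a BSP superstep waits at the barrier for the slowest one, moving a process off a heavily loaded host is what lets the whole job's throughput approach that of a dedicated machine set. The sketch only shows the choice of destination, not checkpointing or restarting the process.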