Kalaiselvi, S and Rajaraman, V (1999) A checkpointing algorithm for an SCI based distributed shared memory system. In: Microprocessors and Microsystems, 22 (9). pp. 515-522.
Distributed Shared Memory (DSM) systems combine the ease of programming of shared-memory parallel computers with the scalability of message-passing multicomputers. The IEEE has proposed an interface standard, known as the SCI (Scalable Coherent Interface) standard, for constructing DSM systems. As the number of processors in a parallel computer increases, it becomes imperative to build in fault tolerance. This article presents an algorithm for checkpointing and rollback recovery of an SCI-based DSM system using the provisions of the standard. It is shown that this checkpointing and rollback recovery procedure judiciously combines the features of both shared-memory and message-passing distributed-memory systems. © 1999 Elsevier Science B.V. All rights reserved.
Item Type: Journal Article
Additional Information: Copyright of this article belongs to Elsevier Science B.V.
Keywords: Fault tolerant computing; Checkpointing; Rollback recovery; SCI standard
Department/Centre: Division of Information Sciences > Supercomputer Education & Research Centre
Date Deposited: 02 Jan 2009 16:48
Last Modified: 19 Sep 2010 04:58