Vayrynen, Mikael and Singh, Virendra and Larsson, Erik (2009) Fault-Tolerant Average Execution Time Optimization for General Purpose Multi-Processor System-on-Chips. In: International Conference on Design Automation and Test in Europe (DATE), 20-24 April 2009, Nice.
Because of developments in semiconductor technology, fault tolerance is important not only for safety-critical systems but also for general-purpose (non-safety-critical) systems. However, instead of guaranteeing that deadlines are always met, general-purpose systems should minimize the average execution time (AET) while ensuring fault tolerance. For a given job and a given soft (transient) error probability, we define mathematical formulas for AET that include bus communication overhead for both voting (active replication) and rollback-recovery with checkpointing (RRC). For a given multi-processor system-on-chip (MPSoC), we define integer linear programming (ILP) models that minimize AET, including bus communication overhead, when: (1) selecting the number of checkpoints for RRC, (2) finding the number of processors and the job-to-processor assignment for voting, and (3) selecting the fault-tolerance scheme (voting or RRC) for each job and defining how it is used. Experiments demonstrate significant savings in AET.
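The abstract does not reproduce the paper's formulas or ILP models. As a rough illustration only, the sketch below uses a simplified model that is an assumption, not the authors' model: soft errors are independent, a job of error-free length `T` completes without error with probability `P`, RRC splits the job into `n` equal segments that are each retried until error-free, and voting succeeds in a round when at least two of `m` replicas are error-free. Checkpointing and bus communication costs are lumped into a hypothetical overhead `tau`. Instead of ILP it uses brute-force enumeration for a single job, so it ignores the processor-count and job-to-processor assignment constraints that the ILP models handle. All function names (`aet_rrc`, `best_checkpoints`, `aet_voting`, `choose_scheme`) are hypothetical.

```python
def _check_probability(P: float) -> None:
    if not 0.0 < P <= 1.0:
        raise ValueError("P must lie in (0, 1]")


def aet_rrc(T: float, P: float, n: int, tau: float) -> float:
    """AET under rollback-recovery with n equidistant checkpoints (assumed model).

    T:   error-free execution time of the job
    P:   probability that the whole job runs without a soft error
    n:   number of checkpoints, i.e. segments (n >= 1)
    tau: hypothetical per-segment overhead (checkpointing + bus transfer)
    """
    _check_probability(P)
    p_seg = P ** (1.0 / n)  # per-segment success probability (independent errors)
    # Each segment is retried until error-free: geometric, mean 1 / p_seg attempts.
    return n * (T / n + tau) / p_seg


def best_checkpoints(T: float, P: float, tau: float, n_max: int = 100) -> int:
    """Brute-force search for the checkpoint count that minimizes aet_rrc."""
    return min(range(1, n_max + 1), key=lambda n: aet_rrc(T, P, n, tau))


def aet_voting(T: float, P: float, m: int, tau: float) -> float:
    """AET under voting with m >= 2 replicas (assumed model).

    A round succeeds when at least two replicas are error-free (faulty results
    are assumed never to agree); otherwise all m replicas re-execute.
    tau is a hypothetical per-replica bus overhead, so a round costs T + m * tau.
    """
    _check_probability(P)
    if m < 2:
        raise ValueError("voting needs at least two replicas")
    q = 1.0 - P
    p_round = 1.0 - q ** m - m * P * q ** (m - 1)  # P(at least two error-free)
    return (T + m * tau) / p_round


def choose_scheme(T: float, P: float, tau_rrc: float, tau_vote: float, m_max: int):
    """Pick the cheaper scheme for one job by enumeration (not the paper's ILP)."""
    n = best_checkpoints(T, P, tau_rrc)
    rrc = aet_rrc(T, P, n, tau_rrc)
    m = min(range(2, m_max + 1), key=lambda k: aet_voting(T, P, k, tau_vote))
    vote = aet_voting(T, P, m, tau_vote)
    return ("RRC", n, rrc) if rrc <= vote else ("voting", m, vote)
```

For example, `choose_scheme(100.0, 0.5, 50.0, 0.0, 5)` selects voting with five replicas: with a high error probability and costly checkpoints, replication wins under these assumed parameters, while with `P = 1.0` and equal overheads RRC with a single segment is cheaper. The trade-off between checkpoint overhead and re-execution cost is what makes the number of checkpoints worth optimizing.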
Item Type: Conference Paper
Additional Information: Copyright 2009 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Department/Centre: Division of Information Sciences > Supercomputer Education & Research Centre
Date Deposited: 16 Dec 2011 07:01
Last Modified: 16 Dec 2011 07:01