Fault-Tolerance Techniques for Low-Power Microprocessor Design

Figure 1: Soft-error mechanisms in logic: delay violations, transients in combinational logic, and upsets in sequential gates.

Digital logic circuits are most energy-efficient when operated at very low supply voltages, near or below the transistor threshold voltage. However, many severely energy-constrained applications, such as implanted medical devices, also require high reliability, and the rate of radiation-induced soft errors increases significantly at low voltages [1]. Existing techniques for improving soft-error resilience incur significant power overhead. The purpose of this project is to investigate error-detection and error-correction mechanisms, for both memory and logic, that are specifically optimized for micropower, low-voltage systems.

Figure 2: BCH decoding, showing the separate phases of error detection and correction.

Soft-error events affect both combinational and sequential logic gates, as shown in Figure 1. Because power-supply voltage margins are necessarily tight at low voltage, power-supply droop can also cause errors by delaying signals so that they arrive after the clock edge. Others have previously demonstrated flip-flop and latch designs capable of detecting all of these errors [2], [3]. However, that work has focused on high-performance processors with significant speculative state, in which recovery can be as simple as flushing speculative instructions from the pipeline. Micropower processors have little or no speculative state, so we are developing alternative error-recovery mechanisms.
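The detection principle behind such flip-flop designs can be illustrated with a simplified behavioral model of double sampling: a main flip-flop samples its input at the clock edge, and a shadow latch samples the same input slightly later, so a mismatch reveals that the input changed during that window. This is a generic sketch, not a model of the specific circuits in [2] or [3]; the class name, the waveform helpers, and the 2 ns detection window are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A waveform maps a time in ns to a logic value (0 or 1).
Waveform = Callable[[float], int]


def late_transition(old: int, new: int, arrival_ns: float) -> Waveform:
    """Input that switches from `old` to `new` at `arrival_ns` (e.g. slowed by supply droop)."""
    return lambda t: new if t >= arrival_ns else old


def transient(value: int, start_ns: float, width_ns: float) -> Waveform:
    """Stable input hit by a single-event transient that inverts it for `width_ns`."""
    return lambda t: 1 - value if start_ns <= t < start_ns + width_ns else value


@dataclass
class DoubleSamplingFF:
    clock_edge_ns: float    # instant at which the main flip-flop samples
    shadow_delay_ns: float  # hypothetical extra delay before the shadow latch samples

    def capture(self, d: Waveform) -> Tuple[int, int, bool]:
        main = d(self.clock_edge_ns)
        shadow = d(self.clock_edge_ns + self.shadow_delay_ns)
        # Differing samples mean the input changed between the two sampling instants.
        return main, shadow, main != shadow


ff = DoubleSamplingFF(clock_edge_ns=10.0, shadow_delay_ns=2.0)
print(ff.capture(late_transition(0, 1, 9.0)))   # (1, 1, False): arrives on time
print(ff.capture(late_transition(0, 1, 11.0)))  # (0, 1, True): 1 ns late, detected
print(ff.capture(transient(0, 9.8, 0.5)))       # (1, 0, True): transient at the edge
print(ff.capture(late_transition(0, 1, 13.0)))  # (0, 0, False): too late, missed
```

A mismatch flags both a late-arriving transition and a transient that overlaps the main sampling instant, while a transition arriving after the shadow sample goes undetected. The flag only detects the error; the processor must still restore correct state, which in a micropower design cannot rely on flushing a speculative pipeline.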

SRAMs make up the majority of the area of most microprocessor chips and must be continuously powered to retain data, so designing SRAMs for low-voltage operation is particularly important. However, lowering the power-supply voltage not only increases susceptibility to radiation-induced soft errors but also degrades bit-cell stability because of device variation. Simple single-error-correcting, double-error-detecting (SECDED) Hamming codes are quite effective at recovering from radiation-induced errors, but they can correct only one bit per word, so a word that contains both a variation-induced bit-cell failure and a radiation-induced upset cannot be corrected. We are therefore exploring higher-order BCH codes capable of correcting multiple bits per word, in order to address both radiation-induced and bit-cell-variation-induced errors (Figure 2).
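As an illustration of the SECDED principle, the sketch below extends a Hamming(7,4) code with an overall parity bit, which yields the smallest single-error-correcting, double-error-detecting code. The 4-bit data word and the function names are assumptions chosen for brevity; memory arrays commonly use wider codes such as (72,64).

```python
from typing import List, Optional, Tuple


def secded_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    assert len(data) == 4
    cw = [0] * 8  # index 0 = overall parity; indices 1..7 = Hamming(7,4) positions
    for pos, bit in zip((3, 5, 6, 7), data):
        cw[pos] = bit
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        cw[p] = sum(cw[i] for i in range(1, 8) if i & p and i != p) % 2
    cw[0] = sum(cw[1:]) % 2  # overall parity makes the whole word even
    return cw


def secded_decode(cw: List[int]) -> Tuple[Optional[List[int]], str]:
    syndrome = 0
    for i in range(1, 8):
        if cw[i]:
            syndrome ^= i  # XOR of the indices of all set bits
    overall = sum(cw) % 2  # 1 if an odd number of bits flipped
    word = list(cw)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume one error at index `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Even overall parity with a nonzero syndrome: two bits flipped.
        return None, "double error detected"
    return [word[i] for i in (3, 5, 6, 7)], status
```

Flipping any one of the eight bits is corrected, while flipping any two is only flagged. That second case is the limitation that a multi-bit-correcting code removes.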

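The two decoding phases in Figure 2 can be sketched with a double-error-correcting binary BCH(15,7) code over GF(2^4), assuming the primitive polynomial x^4 + x + 1. The code length, the 7-bit data word, and the names `bch_encode` and `bch_decode` are illustrative assumptions, not this project's SRAM parameters; a practical memory word would likely use a longer, possibly shortened, code. For the correction phase this sketch substitutes Peterson's closed-form solution for t = 2 in place of the more general Berlekamp-Massey algorithm, followed by a Chien search.

```python
# GF(2^4) log/antilog tables built from the primitive polynomial x^4 + x + 1.
PRIM = 0b10011
EXP = [0] * 30
LOG = [0] * 16
_x = 1
for _i in range(15):
    EXP[_i] = EXP[_i + 15] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x10:
        _x ^= PRIM


def gf_mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a: int, b: int) -> int:  # b must be nonzero
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 15]


# g(x) = m1(x) * m3(x) = (x^4+x+1)(x^4+x^3+x^2+x+1) = x^8+x^7+x^6+x^4+1
G = 0b111010001
N, K = 15, 7


def poly_mod(a: int, g: int) -> int:
    """Remainder of GF(2) polynomial a divided by g (bit i = coefficient of x^i)."""
    while a.bit_length() >= g.bit_length():
        a ^= g << (a.bit_length() - g.bit_length())
    return a


def bch_encode(data: int) -> int:
    """Systematic encoding: 7 data bits in x^8..x^14, 8 parity bits below."""
    shifted = data << (N - K)
    return shifted | poly_mod(shifted, G)


def syndrome(r: int, j: int) -> int:
    """S_j = r(alpha^j)."""
    s = 0
    for i in range(N):
        if (r >> i) & 1:
            s ^= EXP[(i * j) % 15]
    return s


def bch_decode(r: int) -> tuple:
    # Phase 1, detection: for a binary code S2 = S1^2 and S4 = S1^4, so
    # S1 and S3 suffice; both zero means r is a valid codeword.
    s1, s3 = syndrome(r, 1), syndrome(r, 3)
    if s1 == 0 and s3 == 0:
        return r >> (N - K), "ok"
    if s1 == 0:
        return None, "uncorrectable"  # S1 = 0 with S3 != 0 implies > 2 errors
    # Phase 2, correction: locator sigma(x) = 1 + sigma1*x + sigma2*x^2.
    sigma1 = s1
    sigma2 = gf_div(s3 ^ gf_mul(s1, gf_mul(s1, s1)), s1)
    # Chien search: an error at bit i makes alpha^(-i) a root of sigma(x).
    positions = []
    for i in range(N):
        x = EXP[(-i) % 15]
        if (1 ^ gf_mul(sigma1, x) ^ gf_mul(sigma2, gf_mul(x, x))) == 0:
            positions.append(i)
    if len(positions) != (2 if sigma2 else 1):
        return None, "uncorrectable"
    for i in positions:
        r ^= 1 << i
    return r >> (N - K), f"corrected {len(positions)}"
```

Any one or two flipped bits are located and corrected. A word with three or more errors may be flagged as uncorrectable or silently miscorrected, since the code's minimum distance is 5.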

References
  1. T. Heijmen, et al., “A comprehensive study on the soft-error rate of flip-flops from 90-nm production libraries,” IEEE Transactions on Device and Materials Reliability, vol. 7, pp. 84-96, Mar. 2007.
  2. S. Das, et al., “RazorII: In situ error detection and correction for PVT and SER tolerance,” IEEE Journal of Solid-State Circuits, vol. 44, pp. 32-48, Jan. 2009.
  3. K. Bowman, et al., “Energy-efficient and metastability-immune resilient circuits for dynamic variation tolerance,” IEEE Journal of Solid-State Circuits, vol. 44, pp. 49-63, Jan. 2009.

Microsystems Technology Laboratories | Massachusetts Institute of Technology | 60 Vassar Street, 39-321 | Cambridge, MA 02139 | http://www.mtl.mit.edu
Copyright © Massachusetts Institute of Technology.