Lockstep Processor Design: Achieving High-Integrity Computing
Lockstep is a hardware redundancy technique in which two or more processor cores execute the identical instruction stream in tight synchronization, allowing a comparator to detect any divergence in their behavior. It is one of the foundational architectures used to meet functional safety standards such as ISO 26262 (automotive), IEC 61508 (industrial), and EN 50129 (railway), where random hardware faults must be detected with very high diagnostic coverage. This article examines how lockstep works, how it differs from voting-based redundancy, and the practical trade-offs an SoC architect must weigh.
Quick Summary
| Aspect | Summary |
|---|---|
| Goal | Detect random hardware faults with high diagnostic coverage (often >99%) |
| Mechanism | Two cores run the same code; outputs are compared cycle-by-cycle |
| Limitation | DCLS detects faults but does not correct them; correction needs TMR or recovery |
The Lockstep Concept
A single processor core, however well designed, cannot reliably detect its own faults. A stuck bit in the ALU, a soft error in a register, or a defective gate can corrupt a result, and the core has no independent reference against which to check itself. Lockstep solves this by introducing a redundant checker core that performs exactly the same computation. If the two cores ever produce different results, a fault has occurred.
The key principle is deterministic execution: given identical reset state, identical clocks, and identical inputs, two correctly functioning cores must produce bit-identical outputs every cycle. Any deviation is, by definition, a fault. This requires the design to eliminate sources of non-determinism — uninitialized memory, asynchronous events that arrive at different times, and free-running counters must all be carefully controlled.
What Gets Compared
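Rather than comparing internal register state directly, lockstep comparators monitor the core's external interface signals. These capture every effect the core can have on the rest of the system, so an internal fault is caught as soon as it propagates to one of them: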
- Bus transactions: AXI/AHB address, data, and control signals on both read and write channels
- Coprocessor and peripheral accesses: all outbound requests
- Status and exception outputs: interrupt acknowledges, fault signals, debug state
- Memory protection decisions: MPU/MMU permission results
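As a rough illustration of comparing at the interface rather than on internal state, the Python sketch below bundles one cycle of a core's outbound signals into a record and reports which signal groups differ. The `InterfaceSnapshot` fields and the `compare_interfaces` helper are hypothetical names chosen for this example. A real comparator does the same job in hardware, in parallel, every cycle.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class InterfaceSnapshot:
    """One cycle of a core's outbound interface (hypothetical, simplified field set)."""
    bus_addr: int     # AXI/AHB address
    bus_wdata: int    # write data
    bus_ctrl: int     # packed control bits (valid, write, size, ...)
    irq_ack: int      # interrupt acknowledge lines
    fault_out: int    # fault / exception / debug status outputs
    mpu_allow: bool   # MPU/MMU permission decision

def compare_interfaces(primary: InterfaceSnapshot, checker: InterfaceSnapshot) -> list[str]:
    """Return the names of all signal groups on which the two cores disagree."""
    return [f.name for f in fields(InterfaceSnapshot)
            if getattr(primary, f.name) != getattr(checker, f.name)]

p = InterfaceSnapshot(0x2000_0000, 0xDEAD_BEEF, 0b101, 0, 0, True)
c = InterfaceSnapshot(0x2000_0000, 0xDEAD_BEEF, 0b101, 0, 0, False)
print(compare_interfaces(p, c))   # ['mpu_allow']
print(compare_interfaces(p, p))   # []
```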
Dual-Core Lockstep (DCLS)
Dual-Core Lockstep (DCLS) is the most common configuration. It pairs a primary core, whose outputs drive the system, with a checker core whose outputs are used only for comparison. The checker's bus interface is typically disconnected from the system fabric so that only the primary actually performs transactions; the checker shadows the computation purely to validate it.
Synchronization and Inputs
Both cores receive the same inputs. Read data returning from memory is fanned out to both cores simultaneously, and interrupts are synchronized so they are observed on the same cycle by both cores. Because the two cores share a single view of memory and I/O, their internal state remains identical as long as neither is faulty.
The Diagnostic Trade-off
DCLS provides fault detection but not fault correction. When the comparator flags a mismatch, it knows that one of the two cores is wrong but cannot determine which one. The safe response is therefore to signal an error and force the system into a defined safe state — for example, de-energizing actuators or handing control to a redundant channel. DCLS is ideal for "fail-safe" systems where a detected fault can be handled by entering a safe state.
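The toy model below, a sketch under simplifying assumptions, shows why DCLS detects a fault but cannot localize it. `core_step` is a hypothetical stand-in for one clock cycle of a real core: a deterministic update whose output changes whenever its state changes. Flipping the same state bit in either core gives exactly the same comparator result, so the mismatch alone cannot say which core is wrong.

```python
from typing import Iterable, Optional

def core_step(state: int, inp: int) -> tuple[int, int]:
    """Toy deterministic core (hypothetical): returns (next_state, output).
    Multiplying by an odd constant and xor-shifting are both invertible,
    so any difference in state always shows up in the output."""
    nxt = (state * 31 + inp) & 0xFFFF_FFFF
    return nxt, nxt ^ (nxt >> 7)

def run_dcls(inputs: Iterable[int], flip_core: Optional[int] = None,
             flip_cycle: Optional[int] = None, flip_bit: int = 0) -> Optional[int]:
    """Run primary (index 0) and checker (index 1) on the same input stream.
    Optionally flip one state bit in one core to model a soft error.
    Returns the first cycle with a mismatch, or None if the outputs always agree."""
    states = [0, 0]                              # identical reset state
    for cycle, inp in enumerate(inputs):
        if flip_core is not None and cycle == flip_cycle:
            states[flip_core] ^= 1 << flip_bit
        outs = []
        for i in (0, 1):                         # both cores see the same input
            states[i], out = core_step(states[i], inp)
            outs.append(out)
        if outs[0] != outs[1]:
            return cycle                         # mismatch, but no hint of WHICH core
    return None

print(run_dcls(range(100)))                                           # None
print(run_dcls(range(100), flip_core=0, flip_cycle=10, flip_bit=3))   # 10
print(run_dcls(range(100), flip_core=1, flip_cycle=10, flip_bit=3))   # 10
```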
Delayed Lockstep and Temporal Diversity
A naive lockstep design clocks both cores on exactly the same cycle. The problem is that a single transient event — a power-supply glitch, an electromagnetic disturbance, or a clock-tree disruption — can affect both cores simultaneously and identically. Both would produce the same wrong answer, the comparator would see agreement, and the fault would go undetected. This is a common cause failure (CCF).
Delayed lockstep mitigates this by offsetting the checker core's execution by a fixed number of clock cycles (commonly 1.5 to 2 cycles, as in the ARM Cortex-R family). Input signals to the checker are delayed by N cycles, and the primary's outputs are delayed by the same N cycles before comparison. The result is temporal diversity: a transient that strikes at a given instant hits the two cores at different points in their respective computations, so it produces different (and therefore detectable) error signatures.
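To see how the cycle offset defeats a common-cause transient, the sketch below gives a similar toy core the two delay pipelines just described. The glitch XORs the same mask into both cores at the same absolute cycle. With `delay=0` the cores are at the same point in the computation, are corrupted identically, and the comparator never fires. With a nonzero delay they hold different states when the glitch lands, and the divergence is caught. Only whole-cycle delays are modeled (not half-cycle offsets), and `core_step` and `run_lockstep` are hypothetical names.

```python
from collections import deque
from typing import Optional, Sequence

def core_step(state: int, inp: int) -> tuple[int, int]:
    """Toy deterministic core (hypothetical): returns (next_state, output)."""
    nxt = (state * 31 + inp) & 0xFFFF_FFFF
    return nxt, nxt ^ (nxt >> 7)

def run_lockstep(inputs: Sequence[int], delay: int,
                 glitch_time: Optional[int] = None, glitch_mask: int = 0) -> Optional[int]:
    """Return the first cycle at which the comparator sees a mismatch, or None."""
    p_state = c_state = 0                  # identical reset state
    in_pipe = deque([None] * delay)        # delays inputs to the checker by `delay` cycles
    out_pipe = deque([None] * delay)       # delays primary outputs by the same amount
    for t in range(len(inputs) + delay):
        if t == glitch_time:               # common-cause transient hits both cores now
            p_state ^= glitch_mask
            c_state ^= glitch_mask
        p_in = inputs[t] if t < len(inputs) else None
        p_out = c_out = None
        if p_in is not None:
            p_state, p_out = core_step(p_state, p_in)
        in_pipe.append(p_in)
        c_in = in_pipe.popleft()           # the input the primary saw `delay` cycles ago
        if c_in is not None:
            c_state, c_out = core_step(c_state, c_in)
        out_pipe.append(p_out)
        p_out_delayed = out_pipe.popleft() # primary output for the same step as c_out
        if p_out_delayed is not None and c_out is not None and p_out_delayed != c_out:
            return t
    return None

inputs = list(range(1, 50))
print(run_lockstep(inputs, delay=0, glitch_time=10, glitch_mask=0x10))  # None: missed
print(run_lockstep(inputs, delay=2, glitch_time=10, glitch_mask=0x10))  # 10: detected
```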
The Comparator / Checker Logic
The comparator is the heart of a lockstep system — and, critically, it is itself a potential single point of failure. A comparator that is stuck reporting "match" would silently defeat the entire redundancy scheme. Good designs therefore make the comparator self-checking.
Self-Checking Comparators
- Dual-rail / two-rail checkers: The comparator output is encoded so that the "no error" state is a complementary pair (e.g., 01 or 10). A rail stuck at 0 or 1 collapses the pair to 00 or 11 as soon as the data drives the output to the other code word, and that non-code pair is itself flagged as an error.
- Redundant comparators: Two independent comparators check the cores, and their results are cross-monitored.
- Periodic self-test: Deliberate fault injection (forcing a known mismatch) verifies at startup and at intervals that the comparator can still detect divergence. This complements Logic Built-In Self-Test (LBIST), which applies pseudo-random test patterns to the logic itself to uncover latent faults.
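The two-rail idea can be sketched at bit level. Each compared bit is encoded as the pair (primary bit, inverted checker bit), which is complementary when the cores agree and 00 or 11 when they differ; classic two-rail checker cells then reduce all pairs to one output pair. The function names are hypothetical, and the cells are chained here for brevity where a real design would use a balanced tree. Note that the matched output alternates between 01 and 10 with the data, which is what lets a stuck rail show up as a non-code word.

```python
def two_rail_cell(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    """Classic two-rail checker cell. A pair (a, a_n) is a valid code word when
    a_n == 1 - a; the output pair is valid if and only if both input pairs are."""
    (a, a_n), (b, b_n) = x, y
    z = (a & b) | (a_n & b_n)
    z_n = (a & b_n) | (a_n & b)
    return z, z_n

def lockstep_two_rail(primary: int, checker: int, width: int) -> tuple[int, int]:
    """Encode bit i as (primary_i, NOT checker_i) and reduce all pairs to one."""
    pairs = [((primary >> i) & 1, 1 - ((checker >> i) & 1)) for i in range(width)]
    out = pairs[0]
    for pair in pairs[1:]:
        out = two_rail_cell(out, pair)
    return out

def is_error(pair: tuple[int, int]) -> bool:
    return pair[0] == pair[1]                      # 00 or 11 is a non-code word

print(lockstep_two_rail(0b00, 0b00, 2))            # (1, 0): match
print(lockstep_two_rail(0b01, 0b01, 2))            # (0, 1): match, other code word
print(is_error(lockstep_two_rail(0xA5, 0xA4, 8)))  # True: bit 0 differs
```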
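The comparator also determines how quickly a fault is reported. ISO 26262 defines the Fault Tolerant Time Interval (FTTI) as the minimum time span from the occurrence of a fault to the possible occurrence of a hazardous event if no safety mechanism acts, so fault detection plus the reaction that reaches a safe state must complete within it. Once a fault reaches the compared outputs, cycle-by-cycle comparison flags it within a few clock cycles, well inside typical FTTI budgets.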
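A back-of-the-envelope check of that budget is sketched below. Every number is hypothetical: a 300 MHz core clock, a 2-cycle lockstep delay, one comparator pipeline stage, 10 cycles to route the error signal, 50 µs for the safe-state reaction, and a 10 ms FTTI. It also assumes the fault has already propagated to the compared outputs. Even so, it shows the hardware detection latency is negligible next to the reaction path.

```python
def fault_handling_time_s(f_clk_hz: float, lockstep_delay_cycles: int,
                          compare_stages: int, signal_cycles: int,
                          reaction_time_s: float) -> float:
    """Detection latency (clock cycles converted to seconds) plus reaction time.
    Assumes the fault has already reached the compared interface signals."""
    detection_cycles = lockstep_delay_cycles + compare_stages + signal_cycles
    return detection_cycles / f_clk_hz + reaction_time_s

FTTI_S = 10e-3                                    # hypothetical FTTI for this safety goal
fht = fault_handling_time_s(300e6, 2, 1, 10, 50e-6)
print(f"fault handling time = {fht * 1e6:.3f} us, within FTTI: {fht <= FTTI_S}")
```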
Fault Detection vs. Fault Correction
It is essential to distinguish the two capabilities, because they drive completely different architectures:
- Fault detection identifies that an error has occurred. Two redundant channels (DCLS) are sufficient, since disagreement reveals a fault. The system then fails safe.
- Fault correction identifies which channel is wrong and continues operating correctly. This requires a majority — at least three channels — so the two agreeing channels outvote the faulty one. This enables "fail-operational" behavior.
Choosing between them is a system-level safety decision. An engine or motor-control ECU that can safely cut torque may only need detection; a flight control surface that must keep functioning needs correction.
TMR vs. Lockstep
Triple Modular Redundancy (TMR) runs three identical cores and feeds their outputs into a majority voter. If one core diverges, the other two outvote it, and the system continues with the correct result — correction without interruption. TMR is favored in aerospace and space, where mission continuity is mandatory and the area cost is acceptable.
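A bitwise 2-out-of-3 majority voter is short enough to sketch directly. The function name and the convention of returning the index of the outvoted channel (or -1 when more than one channel disagrees, which TMR cannot correct) are hypothetical choices for this example; the per-bit expression `(a & b) | (b & c) | (a & c)` is the standard majority function.

```python
from typing import Optional

def tmr_vote(a: int, b: int, c: int) -> tuple[int, Optional[int]]:
    """Bitwise 2-out-of-3 majority vote over three channel outputs.
    Returns (voted_value, faulty_channel): the index of the single channel that
    disagrees with the vote, None if all agree, or -1 if several disagree."""
    voted = (a & b) | (b & c) | (a & c)
    disagree = [i for i, v in enumerate((a, b, c)) if v != voted]
    if not disagree:
        return voted, None
    if len(disagree) == 1:
        return voted, disagree[0]                # correctable: outvoted channel
    return voted, -1                             # multiple faults: not correctable

print(tmr_vote(0x1234, 0x1234, 0x1236))   # (4660, 2): channel 2 outvoted
print(tmr_vote(5, 5, 5))                  # (5, None)
```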
| Attribute | Single Core | DCLS (Dual Lockstep) | TMR (Triple) |
|---|---|---|---|
| Cores Required | 1 | 2 | 3 |
| Fault Detection | None (no reference) | Yes (mismatch) | Yes (minority vote) |
| Fault Correction | No | No (fail-safe only) | Yes (fail-operational) |
| Logic Area Overhead | Baseline (1x) | ~2x + comparator | ~3x + voter |
| Performance Impact | None | Negligible (parallel) | Negligible (parallel) |
| Typical Use | Non-safety / QM | Automotive, industrial | Aerospace, space |
The lockstep approach trades the ability to continue operating for roughly one-third less core logic than TMR (two copies instead of three, i.e., one redundant copy instead of two). Many automotive systems combine the two ideas: DCLS cores handle safety functions inside each ECU, while a software-level "1oo2" or "2oo3" voting scheme runs across multiple ECUs at the vehicle level.
Common Cause Failure Mitigation: Physical Diversity
Redundancy is only effective if a single root cause cannot disable both copies at once. Common cause failures — shared power rails, clock disturbances, thermal hotspots, manufacturing defects in adjacent silicon, or correlated radiation strikes — threaten that independence. Lockstep designs counter CCF with deliberate diversity:
- Temporal diversity: the cycle delay described earlier, so transients hit the cores at different execution points.
- Spatial / physical diversity: the checker core is placed at a physical distance from the primary, often rotated or mirrored in the floorplan, so a localized defect or particle strike is unlikely to corrupt both identically.
- Layout diversity: some designs synthesize the checker with a different placement or even different gate-level implementation, reducing the chance of a shared systematic defect.
- Supply and clock isolation: separate decoupling and balanced clock trees limit shared electrical disturbances.
Recovery and Reset
Detecting a fault is only half the job; the system must respond within the FTTI. Lockstep error handling typically distinguishes transient from permanent faults:
- Safe-state entry: on a mismatch, hardware immediately asserts an error to the safety mechanism, which can disable outputs or switch to a redundant channel.
- Lockstep recovery: for transient (soft) errors, the cores can be re-synchronized by saving a known-good context, resetting both cores, and restoring state — resuming execution if the fault does not recur.
- Permanent fault response: if mismatches persist after recovery, the fault is treated as permanent and the system stays in the safe state, often raising a diagnostic trouble code.
- Re-synchronization: both cores must be brought back to bit-identical state from the same reset point before lockstep can resume, which is why deterministic reset behavior is a hard design requirement.
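The transient-versus-permanent policy above can be sketched as a small state machine. The class, its states, the limit of one recovery attempt, and the `restore_context` callback are hypothetical; in a real part much of this is handled by dedicated error-management hardware, with software performing the context save and restore.

```python
from enum import Enum, auto
from typing import Callable

class LockstepState(Enum):
    RUNNING = auto()
    SAFE_STATE = auto()          # outputs disabled, recovery may be attempted
    PERMANENT_FAULT = auto()     # latched: stay safe, diagnostic trouble code raised

class LockstepErrorHandler:
    """Hypothetical transient-vs-permanent fault policy for one lockstep pair."""

    def __init__(self, max_recoveries: int = 1) -> None:
        self.max_recoveries = max_recoveries
        self.recoveries = 0
        self.state = LockstepState.RUNNING
        self.dtc_raised = False

    def on_mismatch(self) -> None:
        # Enter the safe state immediately; classify the fault afterwards.
        self.state = LockstepState.SAFE_STATE
        if self.recoveries >= self.max_recoveries:
            # Mismatch came back after recovery: treat the fault as permanent.
            self.state = LockstepState.PERMANENT_FAULT
            self.dtc_raised = True

    def try_recover(self, restore_context: Callable[[], None]) -> bool:
        """Reset both cores from the same point, restore known-good context, resume."""
        if self.state is not LockstepState.SAFE_STATE:
            return False
        self.recoveries += 1
        restore_context()        # hypothetical: reset both cores, reload saved context
        self.state = LockstepState.RUNNING
        return True

handler = LockstepErrorHandler()
handler.on_mismatch()                     # first mismatch: assume transient
handler.try_recover(lambda: None)         # back to RUNNING
handler.on_mismatch()                     # recurs after recovery
print(handler.state, handler.dtc_raised)  # LockstepState.PERMANENT_FAULT True
```

One design choice left open here is whether the recovery counter decays after a period of fault-free operation; without that, two unrelated transients hours apart would be classified as a permanent fault.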
Real-World Examples
ARM Cortex-R Family
The ARM Cortex-R5, R52, and R52+ cores offer an optional DCLS configuration aimed at automotive and industrial safety. The redundant core executes with a fixed clock-cycle delay (temporal diversity), and comparison logic monitors the core interfaces. These cores also integrate ECC on memories and caches and support ASIL D safety levels when deployed with the appropriate safety mechanisms.
Infineon AURIX (TriCore)
The Infineon AURIX TC3xx family implements lockstep on selected TriCore CPUs. Each lockstep pair consists of a primary core and a delayed checker core placed with physical and temporal diversity. AURIX targets ISO 26262 ASIL D and is widely used in powertrain, braking, and steering ECUs. Other notable safety processors include the TI Hercules (RM4x/TMS570) series, which pairs Cortex-R cores in lockstep with ECC and built-in self-test.
Performance and Area Overhead
A frequent misconception is that lockstep halves throughput. It does not: both cores run in parallel at full speed, so instruction throughput is essentially unchanged. The checker simply shadows the primary's work. The real costs are:
- Silicon area: roughly doubling the CPU logic (the checker core), plus comparator, delay pipelines, and self-test logic. Note that shared resources — caches, memories, and the bus fabric — are not duplicated; they are protected by ECC instead, so the full-chip overhead is well below 2x.
- Power: the checker core draws dynamic power equivalent to the primary, increasing CPU power consumption accordingly.
- Determinism constraints: non-deterministic features (some speculative behaviors, asynchronous interfaces) must be controlled, which can modestly reduce achievable performance compared to a non-lockstep variant.
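The full-chip area figure can be estimated with simple arithmetic. In the sketch below, `cpu_fraction` (the share of the original die taken by the lockstepped core logic) and `checker_extras` (comparator, delay pipelines and self-test logic expressed as a fraction of one core) are hypothetical inputs, and the model assumes caches, RAMs and the bus fabric are not duplicated.

```python
def dcls_die_area_ratio(cpu_fraction: float, checker_extras: float) -> float:
    """Die area of the DCLS design relative to the same design without lockstep."""
    # Original die = 1.0. Add one checker core (cpu_fraction) plus the extra
    # comparator / delay / self-test logic, sized relative to that core.
    return 1.0 + cpu_fraction * (1.0 + checker_extras)

print(f"{dcls_die_area_ratio(0.15, 0.05):.4f}")   # 1.1575
print(f"{dcls_die_area_ratio(1.00, 0.05):.4f}")   # 2.0500 (a die that is all CPU logic)
```

Only when the CPU logic is the whole die does the ratio reach 2x or more; with memories and interconnect dominating, the overhead stays well below that.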
Implementation Best Practices
- Enforce determinism: initialize all state from reset, synchronize interrupts to both cores on the same cycle, and eliminate uncontrolled asynchronous inputs.
- Use temporal diversity: apply a fixed cycle delay between primary and checker to break simultaneous common-cause transients.
- Apply physical diversity: separate and rotate the checker core in the floorplan; isolate its supply and clock distribution where practical.
- Make the comparator self-checking: use dual-rail encoding and periodic fault injection so a comparator failure cannot mask divergence.
- Protect shared memory with ECC: since caches and RAM are not duplicated, SECDED ECC must cover them to maintain end-to-end coverage.
- Define the safe state and FTTI: specify the safe state and ensure detection-to-reaction latency fits inside the Fault Tolerant Time Interval.
- Plan recovery semantics: distinguish transient from permanent faults and implement deterministic re-synchronization from reset.
- Verify the safety mechanism, not just the function: run fault-injection campaigns to measure diagnostic coverage and validate the Failure Modes, Effects, and Diagnostic Analysis (FMEDA).
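A fault-injection campaign can be prototyped at model level before moving to RTL or gate-level tools. The toy sketch below (hypothetical core model, fault-site names and helpers) injects single bit flips either into the primary's internal state or into read data fanned out to both cores, then classifies each run against a golden run as safe, dangerous-detected, or dangerous-undetected. Faults in the shared read data corrupt both cores identically and escape the comparator, which is why shared memories need their own ECC coverage.

```python
import random
from typing import Optional, Sequence

def core_step(state: int, inp: int) -> tuple[int, int]:
    """Toy deterministic core (hypothetical): returns (next_state, output)."""
    nxt = (state * 31 + inp) & 0xFFFF_FFFF
    return nxt, nxt ^ (nxt >> 7)

def run(inputs: Sequence[int], site: Optional[str] = None,
        cycle: Optional[int] = None, bit: int = 0) -> tuple[list[int], bool]:
    """One run with an optional single bit flip. Returns (primary outputs, detected)."""
    p = c = 0
    outs, detected = [], False
    for t, inp in enumerate(inputs):
        if t == cycle and site == "primary_reg":
            p ^= 1 << bit                  # soft error inside the primary core only
        if t == cycle and site == "shared_read_data":
            inp ^= 1 << bit                # both cores receive the same corrupted word
        p, p_out = core_step(p, inp)
        c, c_out = core_step(c, inp)
        outs.append(p_out)                 # only the primary drives the system
        detected = detected or p_out != c_out
    return outs, detected

def campaign(inputs: Sequence[int], n_faults: int, seed: int = 0) -> dict[str, int]:
    rng = random.Random(seed)
    golden, _ = run(inputs)
    tally = {"safe": 0, "dd": 0, "du": 0}
    for _ in range(n_faults):
        site = rng.choice(["primary_reg", "shared_read_data"])
        outs, detected = run(inputs, site, rng.randrange(len(inputs)), rng.randrange(32))
        if outs == golden:
            tally["safe"] += 1             # no effect on system-visible outputs
        elif detected:
            tally["dd"] += 1               # dangerous, detected by the comparator
        else:
            tally["du"] += 1               # dangerous, undetected by lockstep
    return tally

result = campaign(list(range(1, 65)), n_faults=1000)
print(result, "DC =", result["dd"] / (result["dd"] + result["du"]))
```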
Diagnostic Coverage and Safety Metrics
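Diagnostic Coverage: $$\mathrm{DC} = \frac{\lambda_{DD}}{\lambda_{DD} + \lambda_{DU}}$$
Single-Point Fault Metric: $$\mathrm{SPFM} = 1 - \frac{\sum (\lambda_{SPF} + \lambda_{RF})}{\sum \lambda}$$
Where $\lambda_{DD}$ = detected dangerous failure rate, $\lambda_{DU}$ = undetected dangerous failure rate, $\lambda_{SPF}$ = single-point fault rate, $\lambda_{RF}$ = residual fault rate, and $\lambda$ = total failure rate, with the sums taken over all safety-related hardware elements. Lockstep drives DC above 99%, supporting ASIL D / SIL 3 targets.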
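Both metrics can be evaluated directly from FMEDA data. Every element name and failure rate below (in FIT) is made up for illustration; the 99% threshold is the ISO 26262 SPFM target for ASIL D.

```python
from dataclasses import dataclass

@dataclass
class Element:
    name: str
    lam: float       # total failure rate of the safety-related element (FIT)
    lam_spf: float   # single-point fault rate (FIT)
    lam_rf: float    # residual fault rate (FIT)

def diagnostic_coverage(lam_dd: float, lam_du: float) -> float:
    return lam_dd / (lam_dd + lam_du)

def spfm(elements: list[Element]) -> float:
    total = sum(e.lam for e in elements)
    uncovered = sum(e.lam_spf + e.lam_rf for e in elements)
    return 1.0 - uncovered / total

fmeda = [                                    # hypothetical FIT values
    Element("cpu_pair", lam=200.0, lam_spf=0.0, lam_rf=1.0),
    Element("sram_ecc", lam=300.0, lam_spf=0.0, lam_rf=1.5),
]
print(f"DC   = {diagnostic_coverage(lam_dd=99.5, lam_du=0.5):.3%}")
print(f"SPFM = {spfm(fmeda):.3%}  (ASIL D target: >= 99%)")
```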
Conclusion
Lockstep is the workhorse of high-integrity computing. DCLS delivers high diagnostic coverage for fail-safe systems at roughly twice the CPU logic, while TMR adds a third channel to achieve fail-operational correction for mission-critical applications. The effectiveness of either approach hinges on disciplined determinism, a self-checking comparator, and deliberate temporal and physical diversity to defeat common cause failures.
Choosing the right architecture is fundamentally a safety-goal decision: detect-and-fail-safe versus detect-correct-and-continue, balanced against area, power, and the Fault Tolerant Time Interval. Real-world parts such as the ARM Cortex-R, Infineon AURIX, and TI Hercules show how these principles translate into shipping safety silicon.
Vcores offers safety-processor and fault-tolerant IP — including lockstep-ready core wrappers, self-checking comparators, ECC-protected memory subsystems, and FMEDA support — to help you reach ISO 26262 and IEC 61508 targets in your FPGA and ASIC designs.