IEC 61508: Functional Safety for Industrial Control Systems
IEC 61508 is the international umbrella standard for the functional safety of electrical, electronic, and programmable electronic (E/E/PE) systems. It defines a rigorous, risk-based framework for ensuring that a safety function performs correctly when demanded, and that residual risk is reduced to a tolerable level. For VLSI and embedded engineers building industrial control systems, IEC 61508 governs everything from random hardware failure rates to systematic development rigor. This guide provides the technical depth needed to design, verify, and certify safety-related hardware and IP.
Quick Summary
| Aspect | Summary |
|---|---|
| Scope | Functional safety of E/E/PE safety-related systems across all industries |
| Core Metric | Safety Integrity Level (SIL 1 to SIL 4) quantifying risk reduction |
| Failure Targets | PFDavg for low demand; PFH for high/continuous demand |
| Two Pillars | Control of random hardware failures + avoidance of systematic faults |
IEC 61508 Standard Overview
IEC 61508 is organized into seven parts, addressing requirements, hardware and software, definitions, techniques, and worked examples. It serves as the basic safety publication from which sector-specific standards are derived. Its central thesis is that no system is failure-free, so safety must be engineered, measured, and managed throughout the entire lifecycle.
- Part 1: General requirements and the overall safety lifecycle
- Part 2: Requirements for E/E/PE safety-related hardware
- Part 3: Software requirements (development rigor by SIL)
- Part 4: Definitions and abbreviations
- Part 5: Methods for determining SIL (risk graph, LOPA)
- Part 6: Guidelines for applying Parts 2 and 3
- Part 7: Overview of techniques and measures
The standard distinguishes two fundamentally different failure classes. Random hardware failures arise from physical wear-out, defects, and environmental stress, and are quantifiable using failure rates, typically expressed in FIT (failures per 10⁹ device-hours). Systematic failures stem from errors in specification, design, or implementation, and cannot be quantified probabilistically; they are controlled by process rigor instead.
Safety Integrity Levels (SIL 1 to SIL 4)
A Safety Integrity Level (SIL) is a discrete measure of the risk reduction provided by a safety function. SIL 4 represents the highest integrity (greatest risk reduction) and SIL 1 the lowest. The required SIL is determined by hazard and risk analysis, and each level imposes both a quantitative failure target and qualitative process requirements.
| SIL | PFDavg (Low Demand) | PFH (High Demand, per hour) | Risk Reduction Factor |
|---|---|---|---|
| SIL 1 | ≥10⁻² to <10⁻¹ | ≥10⁻⁶ to <10⁻⁵ | 10 to 100 |
| SIL 2 | ≥10⁻³ to <10⁻² | ≥10⁻⁷ to <10⁻⁶ | 100 to 1,000 |
| SIL 3 | ≥10⁻⁴ to <10⁻³ | ≥10⁻⁸ to <10⁻⁷ | 1,000 to 10,000 |
| SIL 4 | ≥10⁻⁵ to <10⁻⁴ | ≥10⁻⁹ to <10⁻⁸ | 10,000 to 100,000 |
SIL 4 is rarely applied outside of nuclear and rail signaling because of the extreme cost and difficulty of achieving such low failure probabilities. Most industrial process safety functions target SIL 1 to SIL 3.
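As a rough illustration of how the bands in the table are applied, the following Python sketch maps a calculated PFDavg or PFH value to the SIL band that contains it. The function and constant names are hypothetical, and the band limits are simply copied from the table above.

```python
# Band limits copied from the SIL table: (SIL, inclusive lower bound, exclusive upper bound).
PFD_BANDS = [(4, 1e-5, 1e-4), (3, 1e-4, 1e-3), (2, 1e-3, 1e-2), (1, 1e-2, 1e-1)]
PFH_BANDS = [(4, 1e-9, 1e-8), (3, 1e-8, 1e-7), (2, 1e-7, 1e-6), (1, 1e-6, 1e-5)]


def sil_from_measure(value, bands):
    """Return the SIL whose band contains value, or None if no band contains it."""
    for sil, low, high in bands:
        if low <= value < high:
            return sil
    return None


def risk_reduction_factor(pfd_avg):
    """Risk reduction factor for a low-demand function is the reciprocal of PFDavg."""
    return 1.0 / pfd_avg


if __name__ == "__main__":
    print(sil_from_measure(5e-3, PFD_BANDS))   # PFDavg inside the SIL 2 band
    print(sil_from_measure(2e-8, PFH_BANDS))   # PFH inside the SIL 3 band
    print(risk_reduction_factor(5e-3))
```

Values outside every band return None in this sketch, so a result better than the SIL 4 band is not silently reported as SIL 4.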
PFD vs PFH: Low Demand vs High Demand Modes
IEC 61508 specifies failure targets differently depending on how often the safety function is called upon to act. Selecting the correct mode is essential, as it changes which metric you must calculate and verify.
Low Demand Mode (PFDavg)
Used when the safety function is demanded no more than once per year. The metric is the Average Probability of Failure on Demand (PFDavg), a dimensionless number. A classic example is an emergency shutdown system that may sit dormant for years before a single demand. Because the function is rarely exercised, undetected dangerous failures accumulate between proof tests, so the proof-test interval dominates the calculation.
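The influence of the proof-test interval can be seen with a widely used first-order approximation for a single channel (1oo1): PFDavg ≈ λDU × TI / 2. The sketch below assumes a constant failure rate, a perfect proof test, and a negligible contribution from detected failures and repair time; it is not the full calculation from the standard, and the function name and the 200 FIT figure are hypothetical.

```python
HOURS_PER_YEAR = 8760


def pfd_avg_1oo1(lambda_du_per_hour, proof_test_interval_hours):
    """First-order PFDavg for one channel: undetected dangerous rate times half the test interval."""
    if lambda_du_per_hour < 0 or proof_test_interval_hours <= 0:
        raise ValueError("rate must be non-negative and interval must be positive")
    return lambda_du_per_hour * proof_test_interval_hours / 2.0


if __name__ == "__main__":
    lambda_du = 200e-9  # hypothetical: 200 FIT of undetected dangerous failures
    for years in (1, 2, 5):
        pfd = pfd_avg_1oo1(lambda_du, years * HOURS_PER_YEAR)
        print(f"proof test every {years} y: PFDavg = {pfd:.2e}")
```

With these assumed numbers, a one-year interval gives roughly 8.8 × 10⁻⁴, and doubling the interval doubles the result, which is enough to move the same hardware from one SIL band to the next.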
High Demand or Continuous Mode (PFH)
Used when the safety function is demanded more frequently than once per year, or operates continuously. The metric is PFH, defined in Edition 2 as the average frequency of a dangerous failure per hour (the abbreviation is retained from the earlier wording, "probability of dangerous failure per hour"). Examples include a motor-drive safe-torque-off function or a continuously active machine-guarding interlock. Here the failure rate itself, not the test interval, governs integrity.
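A minimal sketch of the mode decision and the unit conversion that usually precedes a PFH estimate is shown below. It assumes the once-per-year threshold described above and treats the PFH of a single channel as approximately equal to its undetected dangerous failure rate, which is a common first-order simplification. All names are hypothetical.

```python
def demand_mode(demands_per_year, continuous=False):
    """Classify the operating mode using the once-per-year threshold."""
    if continuous or demands_per_year > 1:
        return "high demand or continuous"
    return "low demand"


def fit_to_per_hour(fit):
    """Convert FIT (failures per 1e9 hours) to failures per hour."""
    return fit * 1e-9


def pfh_1oo1(lambda_du_fit):
    """First-order PFH for one channel: the undetected dangerous failure rate per hour."""
    return fit_to_per_hour(lambda_du_fit)


if __name__ == "__main__":
    print(demand_mode(0.1))        # emergency shutdown demanded once per decade
    print(demand_mode(500))        # guard interlock opened many times per year
    print(f"{pfh_1oo1(50):.1e}")   # hypothetical 50 FIT of undetected dangerous failures
```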
The Safety Lifecycle
IEC 61508 is built around the overall safety lifecycle, a structured sequence of 16 phases that span from concept to decommissioning. Every phase has defined inputs, outputs, and verification activities, ensuring that safety requirements are traceable end to end.
- Concept & Scope: Define the equipment under control (EUC) and its boundaries
- Hazard & Risk Analysis: Identify hazards and quantify risk reduction needed
- Safety Requirements Allocation: Derive Safety Functions and assign target SILs
- Realization: Design and implement E/E/PE hardware and software
- Installation & Commissioning: Validate against the safety requirements specification
- Operation & Maintenance: Proof testing, repair, and failure data collection
- Modification & Decommissioning: Impact analysis and controlled retirement
A central principle is functional safety management (FSM): competence, configuration management, verification, and validation must be planned and documented. A functional safety assessment (FSA) is also required, and the assessor must be increasingly independent of the project as the SIL rises.
Hardware Fault Tolerance (HFT)
Hardware Fault Tolerance (HFT) describes how many dangerous faults a subsystem can withstand while still performing its safety function. An HFT of N means that N+1 faults are required to cause loss of the safety function.
- HFT = 0: A single fault can defeat the safety function (e.g., 1oo1 architecture)
- HFT = 1: Two faults are needed (e.g., 1oo2 redundant or 2oo3 voting architecture)
- HFT = 2: Three faults are needed (e.g., 1oo3 architecture)
Increasing HFT through redundancy raises the maximum SIL that the architecture can claim, but it also increases cost and complexity, and 1ooN arrangements raise the spurious trip rate. Redundant channels must also be protected against common cause failures (quantified by the beta factor) through diversity and physical separation.
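For an MooN architecture (M of N channels must work for the safety function to be performed), the HFT is N minus M. The sketch below computes it and also shows, under a commonly used simplified 1oo2 approximation, why the beta factor matters: the common cause term often dominates the independent term. The formula, function names, and example numbers are assumptions for illustration, not the full model from the standard.

```python
def hft_moon(m, n):
    """Hardware fault tolerance of an MooN architecture: channels that may fail dangerously."""
    if not 1 <= m <= n:
        raise ValueError("require 1 <= m <= n")
    return n - m


def pfd_avg_1oo2(lambda_du, interval_hours, beta):
    """Simplified 1oo2 PFDavg split into (independent term, common cause term)."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie between 0 and 1")
    independent = ((1.0 - beta) * lambda_du * interval_hours) ** 2 / 3.0
    common_cause = beta * lambda_du * interval_hours / 2.0
    return independent, common_cause


if __name__ == "__main__":
    for m, n in ((1, 1), (1, 2), (2, 2), (1, 3)):
        print(f"{m}oo{n}: HFT = {hft_moon(m, n)}")
    ind, ccf = pfd_avg_1oo2(200e-9, 8760, beta=0.10)  # hypothetical values
    print(f"independent = {ind:.2e}, common cause = {ccf:.2e}")
```

With the assumed 10% beta factor, the common cause contribution is about two orders of magnitude larger than the independent one, which is why diversity and separation are needed to benefit from redundancy.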
Safe Failure Fraction (SFF)
The Safe Failure Fraction (SFF) is the proportion of a subsystem's total failure rate that either tends toward the safe state or is detected by diagnostics before it can cause harm. It is a key determinant of the SIL a given hardware architecture can claim.
Safe Failure Fraction Formula
SFF = (λS + λDD) / λtotal
Where: λtotal = λS + λDD + λDU
λS = safe failure rate, λDD = detected dangerous failure rate, λDU = undetected dangerous failure rate
Only the undetected dangerous failures (λDU) erode safety integrity, because they leave the system silently impaired. Improving diagnostics moves failures from λDU into λDD, raising SFF and allowing a higher SIL claim for the same HFT.
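The effect of improved diagnostics on SFF can be checked numerically. The following sketch implements the formula above; the function name and the FIT values are hypothetical.

```python
def safe_failure_fraction(lambda_s, lambda_dd, lambda_du):
    """SFF = (safe + dangerous detected) / (safe + dangerous detected + dangerous undetected)."""
    total = lambda_s + lambda_dd + lambda_du
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return (lambda_s + lambda_dd) / total


if __name__ == "__main__":
    # Hypothetical element, rates in FIT: 300 safe, 600 dangerous detected, 100 dangerous undetected.
    before = safe_failure_fraction(300, 600, 100)
    # Better diagnostics move 90 FIT from undetected to detected; the total is unchanged.
    after = safe_failure_fraction(300, 690, 10)
    print(f"SFF before = {before:.3f}, after = {after:.3f}")
```

Because SFF is a ratio, any consistent unit (FIT or failures per hour) can be used for the three rates.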
Diagnostic Coverage (DC)
Diagnostic Coverage (DC) is the fraction of dangerous failures detected by automatic on-line diagnostics, expressed as DC = λDD / (λDD + λDU). IEC 61508 classifies DC into bands:
- Low: 60% to 90%
- Medium: 90% to 99%
- High: ≥99%
In VLSI safety designs, DC is raised using techniques such as dual-core lockstep, ECC on memories, CRC on configuration, watchdogs, logic and memory BIST, and parity on data paths. Higher DC reduces λDU, directly improving both SFF and the PFH/PFD result.
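A short sketch of the DC calculation and band classification follows. The names are hypothetical, and the "none" label for coverage below 60% is added here to complete the classification; it is not one of the bands listed above.

```python
def diagnostic_coverage(lambda_dd, lambda_du):
    """DC = dangerous detected / (dangerous detected + dangerous undetected)."""
    dangerous = lambda_dd + lambda_du
    if dangerous <= 0:
        raise ValueError("dangerous failure rate must be positive")
    return lambda_dd / dangerous


def dc_band(dc):
    """Map a DC fraction (0.0 to 1.0) to its band name."""
    if dc >= 0.99:
        return "high"
    if dc >= 0.90:
        return "medium"
    if dc >= 0.60:
        return "low"
    return "none"  # assumption: label for coverage below the lowest listed band


if __name__ == "__main__":
    dc = diagnostic_coverage(690, 10)  # hypothetical FIT values
    print(f"DC = {dc:.3f} ({dc_band(dc)})")
```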
Architectural Constraints: Route 1H and Route 2H
Even if the probabilistic target is met, IEC 61508 imposes architectural constraints that cap the maximum SIL based on hardware structure. Edition 2 of the standard provides two alternative routes.
Route 1H (SFF-Based)
The traditional approach: maximum SIL is determined by the combination of SFF, HFT, and element type. Type A elements are simple, well-characterized components with predictable failure modes; Type B elements are complex (e.g., microcontrollers, ASICs) where failure behavior is harder to fully characterize, so they require higher SFF for the same SIL.
| SFF (Type B) | HFT = 0 | HFT = 1 | HFT = 2 |
|---|---|---|---|
| <60% | Not allowed | SIL 1 | SIL 2 |
| 60% to 90% | SIL 1 | SIL 2 | SIL 3 |
| 90% to 99% | SIL 2 | SIL 3 | SIL 4 |
| ≥99% | SIL 3 | SIL 4 | SIL 4 |
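The Type B table above can be expressed as a small lookup, which is convenient when screening candidate architectures. This is a sketch that encodes only the rows shown here; the function name is hypothetical and Type A elements are not covered.

```python
def max_sil_type_b(sff, hft):
    """Maximum claimable SIL for a Type B element under Route 1H, or None if not allowed.

    sff is a fraction (0.0 to 1.0); hft must be 0, 1, or 2.
    """
    if hft not in (0, 1, 2):
        raise ValueError("this table only covers HFT 0, 1, and 2")
    if not 0.0 <= sff <= 1.0:
        raise ValueError("sff must be a fraction between 0 and 1")
    if sff < 0.60:
        row = (None, 1, 2)
    elif sff < 0.90:
        row = (1, 2, 3)
    elif sff < 0.99:
        row = (2, 3, 4)
    else:
        row = (3, 4, 4)
    return row[hft]


if __name__ == "__main__":
    print(max_sil_type_b(0.95, 0))  # single channel, SFF in the 90% to 99% row
    print(max_sil_type_b(0.95, 1))  # same SFF with one level of redundancy
```

The claimable SIL for a safety function is still limited by the probabilistic result and by systematic capability, so this lookup gives an upper bound rather than a final answer.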
Route 2H (Reliability-Data-Based)
An alternative that relies on field-feedback reliability data and confidence levels rather than SFF. It requires that the failure rates used in the calculation be supported by operational experience with a statistical confidence of at least 90%, plus a minimum HFT of 1 for SIL 3 and 2 for SIL 4. Route 2H is useful for proven-in-use components where extensive field data exists.
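To give a feel for what a 90% confidence requirement implies, the sketch below computes a one-sided upper bound on a constant failure rate when zero dangerous failures have been observed over a given number of device-hours. The exponential model, the zero-failure restriction, and the function name are assumptions made for illustration; the exact statistical treatment required by the standard is not reproduced here.

```python
import math


def lambda_upper_zero_failures(device_hours, confidence=0.90):
    """Upper bound on a constant failure rate given zero observed failures.

    Solves exp(-lambda * T) = 1 - confidence for lambda.
    """
    if device_hours <= 0:
        raise ValueError("device_hours must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -math.log(1.0 - confidence) / device_hours


if __name__ == "__main__":
    bound = lambda_upper_zero_failures(1e7)  # hypothetical: 10 million device-hours, no failures
    print(f"90% upper bound: {bound:.2e} per hour ({bound * 1e9:.0f} FIT)")
```

Under these assumptions, even 10 million failure-free device-hours only support a claim of roughly 230 FIT, which illustrates why Route 2H needs extensive field data.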
Derived Sector-Specific Standards
As the foundational standard, IEC 61508 spawns tailored standards for individual industries. These inherit its core concepts (SIL, lifecycle, fault tolerance) while adapting terminology and constraints to their domain.
- IEC 61511: Functional safety for the process industry (oil, gas, chemical). Focuses on Safety Instrumented Systems (SIS) and adopts SIL directly.
- ISO 26262: Road vehicle functional safety. Replaces SIL with ASIL (A to D) derived from severity, exposure, and controllability.
- IEC 62061: Functional safety of machinery control systems.
- EN 50128 / EN 50129: Railway signaling and electronic systems.
- IEC 60601: Medical electrical equipment safety.
Implementation Best Practices
- Start with risk analysis: Derive the target SIL from hazard and risk assessment before fixing any architecture. Never reverse-engineer the SIL to fit a chosen design.
- Build the FMEDA early: Perform Failure Modes, Effects, and Diagnostic Analysis on the hardware to quantify λS, λDD, and λDU and drive design decisions.
- Maximize diagnostic coverage: Add lockstep, ECC, BIST, watchdogs, and parity to convert undetected dangerous failures into detected ones.
- Manage common cause failures: Use diversity, separation, and a documented beta-factor analysis for any redundant channels.
- Enforce systematic capability (SC): Apply the development rigor, reviews, and techniques required for the target SIL to avoid systematic faults.
- Maintain full traceability: Link every safety requirement to design, verification, and validation evidence in the safety case.
- Plan proof testing: Define and document proof-test intervals and coverage, as they directly affect low-demand PFDavg.
- Engage independent assessment: Schedule functional safety assessment with the appropriate level of independence for the SIL.
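The FMEDA step in the list above can be sketched as a simple roll-up: each block's failure rate is split into safe, dangerous detected, and dangerous undetected parts, and the parts are summed. The block names, FIT values, dangerous fractions, and DC figures below are all hypothetical, and a real FMEDA works at failure-mode granularity with justified data sources.

```python
from typing import NamedTuple


class FmedaRow(NamedTuple):
    name: str
    fit: float                 # total failure rate of the block, in FIT
    dangerous_fraction: float  # share of failures that are dangerous (0.0 to 1.0)
    dc: float                  # diagnostic coverage of the dangerous failures (0.0 to 1.0)


def roll_up(rows):
    """Sum per-block rates into (lambda_s, lambda_dd, lambda_du), all in FIT."""
    lambda_s = lambda_dd = lambda_du = 0.0
    for row in rows:
        dangerous = row.fit * row.dangerous_fraction
        lambda_s += row.fit - dangerous
        lambda_dd += dangerous * row.dc
        lambda_du += dangerous * (1.0 - row.dc)
    return lambda_s, lambda_dd, lambda_du


if __name__ == "__main__":
    rows = [
        FmedaRow("SRAM with ECC", 400, 0.5, 0.99),
        FmedaRow("Lockstep CPU pair", 300, 0.5, 0.99),
        FmedaRow("Clock monitor", 100, 0.5, 0.60),
    ]
    s, dd, du = roll_up(rows)
    sff = (s + dd) / (s + dd + du)
    print(f"lambda_S={s:.1f} lambda_DD={dd:.1f} lambda_DU={du:.1f} FIT, SFF={sff:.4f}")
```

In this made-up example the weakly covered clock monitor contributes most of the undetected dangerous rate, which is the kind of finding an early FMEDA is meant to surface.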
Conclusion
IEC 61508 transforms safety from an afterthought into a measurable, engineered property. By combining quantitative failure targets (PFDavg and PFH), architectural constraints (HFT, SFF, Route 1H/2H), and systematic process rigor across the full safety lifecycle, it provides a defensible path to certifiable safety integrity.
For VLSI and embedded teams, the practical takeaways are clear: invest in diagnostic coverage, document failure-mode analysis through FMEDA, and treat the safety lifecycle as a first-class part of the design flow. Doing so not only satisfies certification bodies but produces genuinely more robust industrial control systems.