Does SIL live up to expectations?

Author: Roger Short
Day: Aspect Day One

Safety Session

The relationship between unsafe failure rate and SIL has always
been somewhat equivocal. On the one hand, the relevant standards
insist that SIL is concerned with systematic failures, whose
occurrence is inherently unquantifiable, especially where software
is concerned. On the other hand, the standards contain tables
which align tolerable functional failure rates with the
corresponding SIL to be attributed with regard to systematic
failures. This leads to the tacit expectation that if, for example, a
system is developed according to the requirements for SIL4,
unsafe functional failures due to design errors in software or
hardware will manifest themselves at a rate in the region of 10⁻⁸
to 10⁻⁹ per hour.

For years it was possible to regard this as an
abstruse academic problem, since empirical confirmation of such
performance would require evidence of hundreds of thousands of
equipment-years of operation. However, tens of thousands of
individual items of equipment claimed to meet SIL4 requirements
are now in service in railway signalling and train control
applications. They include interlockings, radio block centres, axle
counters, ETCS on-board units, and wayside and on-board CBTC
subsystems.

When making assessments of overall railway system
risk, it is usual to assume that the contribution to risk resulting
from unsafe failure of SIL4 subsystems or components is zero.
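A back-of-envelope sketch shows why that zero-contribution assumption deserves attention at scale. If every unit in a scheme actually achieved a constant unsafe failure rate at the SIL4 figures quoted above, the expected number of unsafe failures per year across the scheme would simply be units x rate x hours per year. The scheme size of 500 units, the function name, and the model of independent, identical, constant-rate failures are all illustrative assumptions, not figures from any real scheme.

```python
HOURS_PER_YEAR = 8766.0  # 365.25 days x 24 hours


def expected_failures_per_year(units: int, rate_per_hour: float) -> float:
    """Expected unsafe failures per year across a fleet (hypothetical helper).

    Assumes every unit fails independently at the same constant rate,
    which is an idealisation rather than a property of real equipment.
    """
    return units * rate_per_hour * HOURS_PER_YEAR


if __name__ == "__main__":
    units = 500  # hypothetical scheme size, for illustration only
    for rate in (1e-8, 1e-9):
        per_year = expected_failures_per_year(units, rate)
        print(f"rate {rate:.0e}/h: {per_year:.4f} unsafe failures per year, "
              f"about one every {1.0 / per_year:,.0f} years")
```

Under these assumptions the sketch gives roughly 0.044 unsafe failures per year at 10⁻⁸ per hour (about one every 23 years) and roughly 0.0044 at 10⁻⁹ per hour. These figures are small but not zero, and they would grow in proportion if the actual rates, including systematic failures, exceeded the SIL4 targets.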
There is a need to consider whether this assumption continues to
be justified for applications such as major CBTC or ETCS Level 2
schemes, which may involve hundreds of individual SIL4 units.

This
paper will discuss the feasibility of estimating the actual unsafe
failure rate, including systematic failures, of the current
population of SIL4 units, taking account of all the gaps and
uncertainties in the relevant data. The validity of the very notion
of a numerical value for the rate of systematic failure of a large
population of disparate products will be critically examined. In the
course of this analysis, the distinction between random and
systematic failures will be challenged, and it will be argued that all
failures have both systematic and random aspects.

A major problem in determining actual failure rates for high-integrity
systems is the sparsity of data relating to systematic failures. The
paper will look at the mathematical techniques available for
handling sparse data, and will also consider suitably conservative
assumptions to make in the absence of data.
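One widely used technique for sparse failure data, shown here only as an illustration and not necessarily one the paper adopts, is the classical one-sided upper confidence bound on a constant (Poisson) failure rate. Given r failures in T unit-hours, the bound is the Poisson mean m satisfying P(N <= r; m) = 1 - C, divided by T. With r = 0 this reduces to T = -ln(1 - C) / rate for the failure-free exposure needed to support a given rate, which yields exposures of the order of hundreds of thousands of equipment-years for 10⁻⁹ per hour. The sketch assumes a constant rate and pools exposure across units as if they were identical, which is exactly the kind of assumption the paper questions for a population of disparate products. The function names and the failure record in the usage lines are hypothetical.

```python
import math

HOURS_PER_YEAR = 8766.0  # 365.25 days x 24 hours


def poisson_cdf(r: int, mean: float) -> float:
    """P(N <= r) for N ~ Poisson(mean); adequate for small r and moderate mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, r + 1):
        term *= mean / k
        total += term
    return total


def rate_upper_bound(failures: int, unit_hours: float, confidence: float) -> float:
    """One-sided upper confidence bound on a constant failure rate (hypothetical helper).

    Solves P(N <= failures; m) = 1 - confidence for the Poisson mean m by
    bisection, then divides by the exposure. This matches the classical
    chi-squared form chi2_{confidence}(2r + 2) / (2T).
    """
    target = 1.0 - confidence
    lo, hi = 0.0, 1.0
    while poisson_cdf(failures, hi) > target:  # widen until m is bracketed
        hi *= 2.0
    for _ in range(200):  # the CDF falls as the mean rises
        mid = 0.5 * (lo + hi)
        if poisson_cdf(failures, mid) > target:
            lo = mid
        else:
            hi = mid
    return hi / unit_hours


def hours_for_zero_failure_demo(rate: float, confidence: float) -> float:
    """Failure-free exposure (unit-hours) needed to support `rate` at `confidence`."""
    return -math.log(1.0 - confidence) / rate


if __name__ == "__main__":
    for rate in (1e-8, 1e-9):
        years = hours_for_zero_failure_demo(rate, 0.95) / HOURS_PER_YEAR
        print(f"{rate:.0e}/h at 95%: {years:,.0f} failure-free equipment-years")
    # Hypothetical record: one unsafe failure in 2e9 unit-hours of service.
    print(f"95% upper bound: {rate_upper_bound(1, 2e9, 0.95):.2e} per hour")
```

At 95% confidence this gives roughly 34,000 failure-free equipment-years for 10⁻⁸ per hour and roughly 340,000 for 10⁻⁹ per hour. A single failure in the hypothetical 2 x 10⁹ unit-hours gives an upper bound of about 2.4 x 10⁻⁹ per hour. Bayesian methods with a conservative prior are a common alternative when even this much exposure data is unavailable.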