Keywords: Requirements, Scenarios, Exceptions, Search, Dissonance
There is an important interface between requirements and safety engineering, but its very existence causes a problem: it belongs to neither discipline. This paper looks at identifying failure modes from a requirements perspective, and considers how the two disciplines can most effectively collaborate.
Requirements techniques have evolved rapidly in the last 10 years from the traditional hit-and-miss list of “shall” statements to a mix of approaches that can be applied systematically, including the analysis of goals and scenarios.
Many of the failure modes hardest to identify derive from the increasingly wide range of functions carried out within systems, mostly by software. Fortunately, these functions are designed to satisfy requirements, often both discovered and organised using scenarios. Hence, scenario-directed search offers a powerful and systematic means of discovering functional failures. But ‘dissonance’, failure caused by undesired interactions within systems, presents a deeper problem.
Safety engineering is one of several inter-related disciplines within systems engineering, and one of several interdependent streams of activity within projects. These streams must feed into each other for projects to work efficiently. One such connection is the way that failure modes are identified from requirements work.
The quality of the subsequent safety analysis will be no better than the quality (completeness, correctness) of the requirements work on which it is based.
This paper describes requirements techniques that identify failure modes early in system development, and considers the adequacy of these techniques.
Scenarios, especially as use cases, are replacing traditional functional requirements (but not qualities and constraints, eg dependability and interface requirements) in current practice.
Most of the concepts described are not new. Indeed, all human planning, all human thought, is based on the stone-age technique of thinking out scenarios “around the camp fire”, imagining hazards, and coming to grief on hoof, horn, or tusk only in one’s mind and to the laughter of one’s hunting partners, rather than in cold grey reality the next morning.
The overall RE process involves eliciting, prioritising, formalising and validating requirements. Eliciting is a complex activity conducted by engineers and system stakeholders. It discovers and plays back candidate requirements using many techniques including interviewing, scenario workshops, and prototyping.
We will use the following terms. A Stakeholder is a person or legal entity with a valid interest in a system. A Goal is a result (however vague) desired by a stakeholder. A Scenario is a story-like sequence of system interactions or Steps meant to achieve a goal. A Requirement is a verifiable request for a feature or quality of a system.
Scenarios define the context of functions, and so are ideal for discovering functional failure modes. A scenario-based approach is clearly most useful when innovation creates new types of system (or greatly extends old systems), and hence introduces the risk that traditional hazard identification, eg based on lists of known hazard types, may fail. That risk is most acute with undesired interactions, so-called ‘dissonance’. A wider range of techniques will be needed to discover those before they occur.
The remainder of this paper is structured as follows:
Section 2 lists requirements-based search techniques.
Section 3 considers how requirements can help.
Section 4 illustrates the search for dissonance, using an automotive example.
Section 5 looks at related work.
Section 6 discusses some conclusions.
Several search techniques based on requirements practice are possible in system projects.
a) Identify system stakeholders.
b) Elicit system goals from stakeholders.
c) Decompose goals down to functions.
d) Elicit scenarios to attain each functional goal.
e) Identify exception events which could block each step of those scenarios.
f) Decide if those exceptions could affect safety.
g) Define (exception-handling) scenarios to handle those exceptions.
h) Define each scenario step as a (possibly safety-related) functional requirement.
This general scheme permits systematic search:
    for all goals
        for all scenarios
            for all steps
                identify exceptions,
                and decide if those affect safety.
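The nested search above can be sketched in Python. This is a minimal illustration only: the `Goal`, `Scenario` and `Step` classes, the function `systematic_search`, and the two callbacks that stand in for workshop judgements are hypothetical names introduced here, and the sample data is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    text: str


@dataclass
class Scenario:
    name: str
    steps: list = field(default_factory=list)


@dataclass
class Goal:
    name: str
    scenarios: list = field(default_factory=list)


def systematic_search(goals, identify_exceptions, affects_safety):
    """Visit every step of every scenario of every goal.

    identify_exceptions(step) -> list of exception descriptions
    affects_safety(exception) -> bool
    Both callbacks stand in for judgements made by people in a workshop.
    """
    findings = []
    for goal in goals:
        for scenario in goal.scenarios:
            for step in scenario.steps:
                for exception in identify_exceptions(step):
                    findings.append(
                        (goal.name, scenario.name, step.text,
                         exception, affects_safety(exception))
                    )
    return findings


# Illustrative data only; real answers would come from stakeholders.
goals = [
    Goal("Keep the engine healthy", [
        Scenario("Warn of engine fault", [
            Step("Car gives engine warning"),
            Step("Driver stops the car"),
        ]),
    ]),
]
workshop_answers = {
    "Car gives engine warning": ["Driver does not notice the warning"],
    "Driver stops the car": [],
}

findings = systematic_search(
    goals,
    identify_exceptions=lambda step: workshop_answers.get(step.text, []),
    affects_safety=lambda exception: "warning" in exception,
)
for finding in findings:
    print(finding)
```

The loop structure is the systematic part; the quality of the result still depends entirely on what the two callbacks (in practice, people) contribute at each step.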
However, “Identify exceptions” is itself a broad task.
A general-purpose set of techniques for identifying exceptions, in the absence of more specific knowledge, involves:
Asking “What could go wrong here?” at each step.
This can be done in a focussed, systematic Workshop by people knowledgeable in the domain.
Such a workshop can be held by identifying and inviting a balanced team of Stakeholders [3] (an “Integrated Project Team” or IPT).
That is a proven approach, but involving everyone on an IPT in the analysis of every scenario step (as implicitly advocated by Maiden [10]) may not be cost-effective.
The following techniques therefore begin with step (2.1e), refining the search for exceptions, and thus for failure modes.
a) Define the type of interaction in each scenario step (from 2.1e).
b) Map a library of generic or domain-specific Exception Classes on to the interactions (eg in a human-human interaction, misunderstanding, disagreement and other exceptions are possible).
c) For each scenario step, ask stakeholders to confirm if the predicted kinds of exception could happen, and if so, how these should be handled (including whether they need to be handled): this generates exception-handling goals and scenarios.
d) Proceed from step 2.1f.
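Steps (a) to (c) can be illustrated with a small lookup, sketched here in Python. The dictionary `EXCEPTION_CLASSES` and the function `predicted_exceptions` are hypothetical; only the human-human entries come from the text above, and the machine-human entries and the sample steps are assumptions made for the example.

```python
# Hypothetical library: interaction type -> generic exception classes.
EXCEPTION_CLASSES = {
    "human-human": ["misunderstanding", "disagreement"],
    "machine-human": ["message not noticed", "message misread"],  # assumed
}


def predicted_exceptions(steps):
    """Map exception classes on to typed scenario steps.

    steps: iterable of (step_text, interaction_type) pairs (step a).
    Returns (step_text, exception_class) pairs (step b), which
    stakeholders are then asked to confirm or reject (step c).
    """
    return [
        (text, exception)
        for text, kind in steps
        for exception in EXCEPTION_CLASSES.get(kind, [])
    ]


steps = [
    ("Car gives engine warning to driver", "machine-human"),
    ("Driver asks passenger to call garage", "human-human"),
]
candidates = predicted_exceptions(steps)
for text, exception in candidates:
    print(f"Could '{exception}' happen at step: {text}?")
```

A step whose interaction type is missing from the library yields no candidates, which is one reason the approach still needs a general "what could go wrong here?" question as a fallback.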
Interaction-directed search can be seen to be an intensive, systematic, recursive search through the whole hierarchy of system scenarios, both normal course and exception-handling.
Dependability is here used to mean the class of required qualities dealing with the ability of a system to continue functioning in the presence of problems of any kind. It includes safety, security, reliability, survivability and similar qualities. Dependability threats can stem from human or inanimate causes.
a) Identify failure cases that could threaten system goals. These can often be localised to specific scenarios or scenario steps (from 2.1e).
b) Identify exception events at which such failure cases could be detected.
c) Identify safety hazards that could be caused by those failure cases (dependability threats).
d) Identify exception-handling scenarios to handle those events.
The Misuse Case technique (with synonyms such as Abuse Case) [1, 11, 12] is a variant of this, aimed mainly at Security threats. It adds a search for negative stakeholders (hostile roles of any kind) to help discover intentional threats. These include most if not all security threats, which in turn lead to safety and survivability concerns.
The analysis of dependability threats can discover hazards caused by external events, whether intentional or not (eg attacks, bad weather), and failures caused by internal system events (functional failures, feature interactions, etc).
An important element in the failure/misuse case technique is precisely that it is not specifically scenario-directed. Rather, it is stakeholder- or goal-directed (operating at the level of whole Use Cases, not scenario steps: a use case’s title is its functional goal). Failure/misuse case-directed search for security threats or for failure modes can thus be independent both of traditional techniques and of scenario-directed search.
A scenario-based approach closer to traditional (Functional) Hazard Analysis, Hazard Class-directed Search, can achieve a similar result.
a) Define the type of interaction in each scenario step (from 2.1e, and as in 2.2a).
b) Map a library of generic or domain-specific hazard classes on to the scenario steps and their interactions.
c) Identify exceptions, relevant hazards and exception-handling scenarios from steps 2.2c onwards.
From the point of view of safety engineering, the interest in requirements-directed search processes lies in whether they are capable of discovering merely the same safety issues as eg traditional HAZOP work, or whether they can find more by coming at the problem from other directions.
It is not the intention here to suggest yet another classification, but several broad categories are relevant:
i. Failure of essential functions, eg a car’s steering, leading directly to safety effects (up to catastrophic).
ii. External threats and hazards, such as intentional attack and bad weather. These may affect specific functions, or may have multiple effects on systems.
iii. Failure of interactions between parts of the system, including decision-making roles (whether human or machine: there are many possible categories here). This “dissonance” or dysfunctional interaction between system components is an important, complex, and still poorly-handled source of system failures.
Scenarios offer a systematic way of identifying type (i) functional failures, as proposed by Allenby & Kelly [4], assuming the architecture is known. (Contrary to dogma, few requirements can be written in the absence of some knowledge of the design or “domain”.) The method is essentially to apply some form of the “what can go wrong here?” question to all steps. But failures with a single functional cause form only part of the search space.
Type (ii) hazards can be discovered by thinking creatively through scenarios. For well-understood domains there will be few new (external) hazards to discover.
For new domains, the technique offers a way to discover possible hazards, independent of traditional safety engineering techniques. Even if no better than traditional means, it may discover different subsets of the possible hazards. But the technique is at best only partially systematic.
Type (iii) “dissonance” is of special interest. Dissonance is becoming understood as a significant problem, both for correct system functioning in the presence of increasing numbers of processors, and for safety.
Dissonance has a literal engineering meaning: the undesired oscillations between coupled bodies
“caused when two major systems resonate at similar natural frequencies” [15].
This can cause “unpleasant vibration” and possibly mechanical failure. Here we are concerned with undesired interactions of any kind between features or subsystems, including humans participating in system operations.
One class of dissonance is known by telecommunications engineers as “feature interaction” [9], where functional features like “call waiting”, “call redirect”, and automatic answering could conflict. Unfortunately, dissonance is not limited to interactions between intended system features.
Let us consider an automotive example. According to Houdek and Zink at the DaimlerChrysler Research and Technology Centre, Ulm, with whom the author has been collaborating, the Mercedes-Benz S-class contains
“more than 50 Electronic Control Units (ECUs) embodying more than 600,000 lines of code. These are connected by three buses with several hundred bus messages. There are gateways between the buses, and functions are heavily interconnected. Via wireless interfaces, subsystems can connect to external systems…” [7].
This is evidently an environment in which many undesired interactions are possible. Any such failures which affect control of the vehicle, either electronically or via the driver, could cause an accident (by creating hazards at the system boundary).
Dissonances are side-effects. Like drug side-effects, they may be hard to predict, and hard to discover by testing. In the shape of “feature interactions” they are a recognized problem in telecommunications systems (where correct functionality rather than safety is the main concern: in domains where safety is involved, the two are closely intertwined) [9]. Side-effects are awkward, both because they are unexpected, and because there is a very large search space if any two (or more) of a large set of functional “features” can interact adversely.
Several observations can be made:
Purely functionally-oriented techniques (scenario or otherwise) will certainly not discover dissonance, simply by virtue of focussing too narrowly on single functions.
Scenario-directed search can discover dissonance by at least 2 routes:
o Systematic search (as in section 2.2);
o Creative identification of failure/misuse cases (as in section 2.3)
Let us consider these 2 routes in turn.
The basic approach is to scan through directly-participating roles in each scenario step to identify expected types of interaction, and hence expected classes of exception. Unfortunately, stray interactions from other functions are not considered explicitly.
For example, the driver of a car may need to attend to an engine warning, but his attention may be distracted by e.g. the traffic situation outside the car, by a conversation inside the car, or by music, navigation, or telephony using features of the car’s integrated telematics system.
Such problems are caused ultimately by conflicts between valid goals for the car, such as to provide music and to control the car. These both contribute to ‘make the car attractive’, which contributes to the commercial goal ‘sell cars’. But the conflicts must be identified and mitigated. Problems can be identified by a stakeholder team workshop given the task of answering the question
Q1: “What could go wrong when the car gives an engine warning to the driver?”.
A stock reply for machine-to-human communication:
A1: “The <human> driver is too distracted to notice the <message> engine warning from the <machine> car”
can be proposed automatically as a possible problem. However, additional creativity is required to translate that into specific safety requirements.
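How such a stock reply might be proposed automatically can be sketched as a template filled in with the roles of a scenario step. The names `STOCK_REPLIES` and `stock_reply` are hypothetical; the template wording is taken from answer A1 above.

```python
# Hypothetical template library keyed by (sender kind, receiver kind).
STOCK_REPLIES = {
    ("machine", "human"): (
        "The <human> {receiver} is too distracted to notice "
        "the <message> {message} from the <machine> {sender}"
    ),
}


def stock_reply(sender_kind, receiver_kind, sender, receiver, message):
    """Fill the stock reply for this kind of interaction with concrete roles."""
    template = STOCK_REPLIES[(sender_kind, receiver_kind)]
    return template.format(sender=sender, receiver=receiver, message=message)


a1 = stock_reply("machine", "human",
                 sender="car", receiver="driver", message="engine warning")
print(a1)
```

The output is only a prompt for the workshop; deciding whether the problem is real, and what requirement follows from it, remains a human task.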
Identification of dissonance is possible with misuse cases. Given answer A1, a misuse case workshop can ask:
Q2: “What could cause the driver to be too distracted to notice an engine warning?”
The workshop needs to combine creative thinking with a measure of systematic search. A brainstorming session can yield candidate answers to question Q2 such as:
Children. Other passengers.
Anything on the road. Cars. Trucks. Puddles. Rain. Accidents on the other carriageway. Advertising hoardings. Pedestrians.
Anything in the driver’s head. The driver’s boss. The business meeting he’s about to attend. His wife.
Anything in the car. Drinks. Sandwiches. Biscuits. Reaching over to the glove-box. Things that rattle or make a noise. Wasps. Music.
Anything specifically in the Telematics system. CD player. Telephone. Navigation. Radio. Other caution and warning messages from the car.
From this wealth of creative suggestions, the workshop can home in on factors under system control, and perhaps work to mitigate some that are not. Clearly other features in the Telematics system could be important distractions (and telephoning is inherently risky for drivers).
For example, CD and Radio can be muted for an audible engine warning, but should the Telephone be? The driver is already busy driving as well as giving attention to the telephone conversation. If the engine is about to seize for lack of oil, then the engine warning must take immediate priority: other warnings might be delayed for a few minutes until the telephone conversation ends.
A plausible scenario (perhaps it is a Misuse Case) has Radio interrupted at intervals by Traffic Warnings, voiced Navigation commands, a Telephone conversation, and an Engine Warning all operating at once; conceivably there could be a telephone Call Waiting as well. A hierarchy of priorities is evidently necessary in this situation.
Table 1: Requirements Techniques to Discover Failure Modes

| Requirements Technique | Search Type† | Effort* Needed | Failure Modes Discovered |
|---|---|---|---|
| Use Cases (scenarios + exceptions) | S | ∝ N×E | Functional Failures, with single causes |
| Misuse / Failure Cases (with brainstorming) | Mainly C | As for use cases | Any, but in practice mainly Functional with few causes |
| Exception Classes | S | ∝ N×C | Functional, with 1 or 2 causes (eg human-machine interaction) |
| Feature List (needs other techniques) | n/a; (if reuse, then S) | n/a | n/a; (if reuse, then all known failure modes) |
| Prioritisation (of Features) | S | ∝ N | Functional, with 2 causes (may also prevent undiscovered problems) |
| Feature Interaction Matrix | S | ∝ N(N-1)/2 | Functional, Dissonance with 2 causes |
| ? | ? | ≥ N(N-1)/2 | Dissonance with Multiple causes |

†where S=Systematic, C=Creative
*where N=number of functional steps, E=number of exceptions, C=number of relevant exception classes.
Such priorities might need to be dynamic, eg an engine warning’s priority could rise with time. The resulting list effectively creates a large number of pairwise rules for handling possible conflicts on the driver’s auditory channel.
Better, once the team is aware of the possibility of conflicts of this kind (eg, the radio shall be played at the level set by the user, except when…), there is no further need to discover dissonances of this particular class creatively. Systematic prioritisation of all demands for the driver’s auditory attention is then sufficient.
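A systematic prioritisation of this kind, including a priority that rises with time, might look like the following Python sketch. The class `Demand`, the function `arbitrate`, and all the numeric priority values are assumptions chosen for illustration; a real scheme would be set by the stakeholders and safety engineers.

```python
from dataclasses import dataclass


@dataclass
class Demand:
    """One demand on the driver's auditory channel."""
    source: str
    base_priority: int             # higher wins; values are illustrative
    rise_per_minute: float = 0.0   # how fast priority grows while waiting
    waiting_minutes: float = 0.0

    def priority(self):
        return self.base_priority + self.rise_per_minute * self.waiting_minutes


def arbitrate(demands):
    """Return the demand that gets the channel; the rest are muted or deferred."""
    return max(demands, key=Demand.priority)


radio = Demand("Radio", base_priority=1)
phone = Demand("Telephone", base_priority=5)
minor = Demand("Minor warning", base_priority=3, rise_per_minute=1.0)

print(arbitrate([radio, phone, minor]).source)   # telephone call continues

minor.waiting_minutes = 3                        # deferred warning has waited
print(arbitrate([radio, phone, minor]).source)   # warning now outranks the call

oil = Demand("Engine oil warning", base_priority=10)
print(arbitrate([radio, phone, minor, oil]).source)
```

A single ordered scale replaces the many pairwise rules with one comparison, but it only covers the class of conflict it was designed for, here contention for the auditory channel.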
The problem is rather the possible existence of unsuspected classes of dissonance. Known possibilities include problems like electromagnetic interference and contention on a data network. In software especially, it is hard to demonstrate that the myriad rules and switches will not sometimes interfere dangerously with each other.
For example, in the Jaguar XJ, S-type, and XK models, it was discovered that if oil pressure in the transmission system became very low, a warning light would come on, and “In this condition it is possible that reverse [gear] could be selected by mistake”, according to a Jaguar spokesman. 68,000 vehicles were recalled. A pairwise analysis of all feature-to-feature interactions – already a sizeable task – could probably have prevented this result. However, such a feature interaction matrix approach would only address failure modes with exactly 2 causes.
Unfortunately, failure modes can arise from multiple causes including software, physical factors, electronics design, and the environment. Given that development engineers have worked to mitigate simple hazards over the years, the remaining hazards will often be complex. A further battery of techniques is therefore needed to detect such dissonance.
The opportunities for dissonance rise much faster than the number of features: at least with N(N-1)/2 for pairwise effects, and faster for more complex ones. Since the number of system features is also rising faster than linearly with time in automotive and other domains, dissonance will become a major problem unless it is adequately controlled. Given this combinatorial explosion, reuse of past experience (eg history of accidents) becomes a poor guide to the future: more and more dissonance hazards will become possible.
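The growth can be made concrete with a short Python calculation. The function `interaction_counts` is a hypothetical helper, and treating each of 50 control units as a single "feature" is a simplifying assumption; the feature names in the small matrix are taken from the telematics example above.

```python
from itertools import combinations
from math import comb


def interaction_counts(n_features, max_order=4):
    """Number of possible k-way interactions among n features, for k = 2..max_order."""
    return {k: comb(n_features, k) for k in range(2, max_order + 1)}


# Pairwise count equals N(N-1)/2; higher orders grow much faster.
counts = interaction_counts(50)
for order, count in counts.items():
    print(f"{order}-way interactions among 50 features: {count}")

# The cells of a pairwise feature interaction matrix for a small feature set.
features = ["CD player", "Telephone", "Navigation", "Radio", "Engine warning"]
pairs = list(combinations(features, 2))
print(len(pairs), "pairs to examine, eg", pairs[0])
```

Under these assumptions, 50 features give 1,225 pairs but 19,600 triples and 230,300 four-way combinations, which is why a pairwise matrix is feasible while exhaustive search for multi-cause dissonance is not.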
To sum up, requirements techniques can help to identify many kinds of safety issue (Table 1). They can help even with dissonance, but as yet not always systematically.
Scenarios have always been used by planners to identify possible threats. The currently fashionable way to organise scenarios in system development, the Use Case, derives from a proposal made by Jacobson [8] and greatly elaborated by Cockburn [6]. Jacobson essentially suggested a simple sequence of steps. Cockburn transformed this into an analytic software specification structure, with among other features discrete “Extension” scenarios (variously known as Alternative Paths/Courses, etc), launched when “Extension Conditions” were met. Alexander [2] reframed this in system engineering terms, with Exception-handling Scenarios launched when Exception Events occurred.
Misuse Cases were initially proposed by Sindre & Opdahl to help surface Security threats and hence Security-handling requirements [12]. McDermott & Fox similarly proposed “Abuse Cases” [11]. Misuse Cases have been applied by Alexander to discover and validate a range of problems, and to analyse the trade-offs involved in mitigating those [1].
A similar technique which could be called ‘Failure Cases’ was proposed for Safety requirements by Allenby & Kelly [4]. This formed a component of their systematic approach to Functional Hazard Analysis, corresponding to “Dependability Threat-directed Search” (see section 2.3 above).
An equally systematic application of scenarios to discover Exceptions using Exception Classes is described by Maiden [10]. This is the basis of the “Interaction-directed search” described here, though without special emphasis on safety.
Requirements stem ultimately from people – system stakeholders. Goals offer an alternative route for discovering requirements, eg Antón & Potts [5]. A taxonomy of stakeholders can, suggests Alexander, serve as a template for discovering stakeholders and their goals [3]. Cockburn notes however that use case titles are functional goals [6].
That systems and safety engineering are closely related should not be a surprise. Requirements “engineering” – a set of overlapping techniques for discovering, validating, and structuring requirements, starting from an undefined footing – is itself a component of systems (and software) engineering.
Functional requirements have for too long been treated by some authors (especially software developers) as if they were the primary or even the only targets worth considering.
However, requirements engineers and product managers know that the core needs on which people choose eg cars are not functions but qualities such as reliability, comfort, elegance, sporty performance, and indeed safety. These act as top-level system goals. When decomposed into verifiable (often functional) goals, they profoundly influence specifications for essentially every function and component.
Safety, requirements and other system disciplines can be seen as aspects: cross-cutting points of view on systems. “Safety requirements” are both causes and effects of systems engineering. Quality, goals, requirements, functions, design features, hazards – not to mention security policies, ergonomics, marketing, even politics – are intimately intertwined in system development.
The goal of systems engineering is “Coping with complexity”, as the title of a well-known textbook has it [13]. A goal of safety engineering in the same vein might be “Coping safely with complexity”, or indeed “Coping safely with systems engineering”. Dissonance is a perfect example of the unwanted effects of complexity. A combination of structured and creative scenario techniques helps to identify classes of dissonance. It looks as though a battery of new techniques could be devised to discover dissonance, starting from these foundations.
Discovering and mitigating the problems caused by the ways systems currently work is not the only system-wide option for improving safety. An alternative is to redesign systems to work in ways that are inherently safer. In the automotive domain
“Driving accidents usually have both a human cause and a human victim. To certain engineers – especially those who build robots – that [has] an obvious solution: replace the easily distracted, readily fatigued driver with an ever attentive, never tiring machine.” [14]
On the other hand, automation has its ironies, à la Lisanne Bainbridge. Human adaptability prevents most system hazards from becoming accidents. In other words, you can choose your failure modes, just as you can choose your requirements. Of course, they go together.
I would like to thank Carl Sandom for reading this paper and guiding me in the use of safety terminology. The remaining errors are of course mine.
[1] Ian Alexander. Initial Industrial Experience of Misuse Cases in Trade-Off Analysis, Proceedings of IEEE Joint International Requirements Engineering Conference, Essen, pp 61-68, 9-13 September 2002.
[2] Ian Alexander & Neil Maiden. Scenarios, Stories, Use Cases, John Wiley, 2004.
[3] Ian Alexander. A Taxonomy of Stakeholders, International Journal of Technology and Human Interaction, Volume 1, 1, pp 23-59, 2005.
[4] Karen Allenby & Tim Kelly. Deriving Safety Requirements Using Scenarios, Proceedings of the 5th International Symposium on Requirements Engineering, pp 228-235, Toronto, Canada, 27-31 August 2001.
[5] Annie Antón & Colin Potts. The Use of Goals to Surface Requirements for Evolving Systems, Proceedings of the 20th International Conference on Software Engineering (ICSE '98), Kyoto, Japan, pp 157-166, 19-25 April 1998.
[6] Alistair Cockburn. Writing Effective Use Cases, Addison-Wesley, 2001.
[7] Frank Houdek & Thomas Zink. Story Use and Reuse in Automotive Systems Engineering, in [2], chapter 16.
[8] Ivar Jacobson et al. Object-Oriented Software Engineering: A Use Case Driven Approach, Addison-Wesley, 1992.
[9] Evan H. Magill. Feature Interaction: Old Hat or Deadly New Menace? Chapter 13 in Service Provision, John Wiley, 2004.
[10] Neil Maiden. Systematic Scenario Walkthroughs with Art-Scene, in [2], chapter 9.
[11] John McDermott & Chris Fox. Using Abuse Case Models for Security Requirements Analysis, 15th IEEE Annual Computer Security Applications Conference, pp 55-66, 1999.
[12] Guttorm Sindre & Andreas L. Opdahl. Eliciting Security Requirements by Misuse Cases, Proceedings of TOOLS Pacific 2000, pp 120-131, 20-23 November 2000.
[13] Richard Stevens et al. Systems Engineering: Coping with Complexity, Prentice-Hall, 1998.
[14] W. Wayt Gibbs. Innovations from a Robot Rally, Scientific American, pp 50-57, January 2006.
[15] World Auto Steel, ULSAB Structural Performance, http://www.worldautosteel.org/ulsab/OverviewReport/ULSAB_Performance.pdf. Downloaded February 2006.