J. Robert Taylor
Engineering Systems Division, Department of Management Engineering, Technical University of Denmark, Building 424, 2800 Kongens Lyngby, Denmark
e-mail: roberttayloritsa@gmail.com

Organizational Failure Analysis for Industrial Safety

Organizational and management errors and failures represent a very significant causal influence in industrial accidents. A method, organizational failure analysis (OFA), is described for in-depth identification of organizational deficiencies and failures that can lead to accidents. The method was developed on the basis of an extensive data collection from safety management audits, accident and incident investigations, and emergency training exercises led by the author over a period of 26 years. From these data, a number of models of organizational performance and management behavior are derived. The models allow a semi-automated application of the method, which is important for application to large organizations. The method is described with examples, and the results of several studies aimed at validating the method are given. [DOI: 10.1115/1.4044945]

Keywords: human error, management error, organizational failure, safety management audit

Introduction

Management and organizational factors are identified as contributory causes in the majority of industrial accidents. In a review of major accidents from the author's own experience in the chemical, oil, and gas industries, 77% of the accidents involved management errors and omissions and 35% involved organizational deficiencies. Of 22 accidents with fatal consequences, all but one had a management error or an organizational failure as a direct cause or major causal influence [1]. Failures were mostly those of omission, but about 25% of the management errors were those of commission. (These statistics were taken from follow-up studies of 103 risk analyses and safety management audits made by the author or his team over a period of 40 years. Examples in this paper are taken from these follow-up reviews or from the audit reports, unless otherwise stated.)

Failures can also arise in organizations themselves, rather than from individual managerial errors. The interactions between two workers can fail, and a work team can fail because of a misunderstanding or poor team coordination. The organization can also be defective in not providing a communication mechanism. As an example of a deficient organization, two gas companies were separated only by a chain link fence. A team from one company was replacing a valve. They vented the gas from the short unrelieved pipe section. The gas contained hydrogen sulfide. A team in the other company was testing and calibrating instruments nearby, and was exposed to the gas. There was no organizational mechanism to ensure coordination between work teams in the two companies. As another example, in the Flixborough accident, a major explosion occurred because a plant pressure piping modification was made without the post of plant mechanical engineer being manned [2].

As an example of a defective organizational structure, a major process engineering company audited had a well-developed management of change system to cover design changes made during plant commissioning. However, a safety management audit led by the author revealed that this system was used only for costing and charging for changes. There was no safety assessment of any design changes.

Of course, it is possible to recognize management and organizational deficiencies in almost any accident.
It is the nature of accident investigation to seek to prevent similar accidents in the future, and therefore to devise methods to do so, including those of organizational change. To prevent this study from devolving into a search into an ever-expanding set of problems, with an ever-expanding definition of organizational failure, the term safety-related organizational failure is taken here to mean a deviation from the requirements of the U.S. Code of Federal Regulations (CFR) Part 1910.119, process safety management of highly hazardous chemicals. The regulation is here applied to a full range of accident scenarios, not just to the highly hazardous ones.

This paper describes a method aimed at identifying potential organizational failures and managerial errors. It seeks to analyze the potential as close to the root causes as possible in order to enable preventive measures to be proposed. It is primarily based on direct observations in the oil, gas, and chemical industries from safety management audits. Examples of application of the method and validation studies are described.

There are few accidents directly caused by management failure, as can be seen by reviewing the accident investigation reports published by the U.S. Chemical Safety Board [3]. Where such accidents do occur, it has generally been due to managers issuing a direct order or a prohibition in a hazardous situation. For the most part, organizational failures and managerial errors have adverse effects by causing operator, maintenance, and other work errors. For this reason, organizational failures will, for the most part, only cause accidents by functioning as error-inducing or error-forcing events and conditions at the hands-on level of plant or system operation (Fig. 1). Consequently, organizational failure analysis (OFA) will only be meaningful as an adjunct to operator or maintenance error analysis, and these in turn depend on plant and control risk analyses.

Motivation—Why Make Organizational Failure Analyses?

Because organizational failure is so central to accident causality, it is of practical interest to be able to understand how managerial and organizational failures occur, and to determine ways in which the errors and failures can be prevented. Most of the methods developed for organizational failure analysis have, however, been focused on general understanding after accidents or on prediction in order to make risk analyses more complete.

It would be logical to include organizational failure in risk analysis methods. However, outside the field of nuclear power, even operator and maintenance error analyses are not included in standard procedures such as Ref. [4], despite the methods for operator error analysis being well developed and validated [5]. Attempts were made by some to include a "management factor" into risk analysis, by Hurst et al. [6]. However, from professional experience as a third-party reviewer of risk analyses, this approach was not well liked by the managements paying for the analyses. Also, the results were often rejected by authorities as too likely to be subject to change.
The area where organizational failure assessment can definitely be used is in safety management auditing, as is shown by the examples in the sections on validation and reflections below. Such audits are standard practice in many companies and generally involve a team which includes a leader from an external organization, and may include members from other divisions in the same organization. The team performs a review of the organization, its work procedures, the actual performance of work, and documentation of work done. An audit generally produces a report, and a presentation is made which should include recommendations for improvement. The work generally requires a very diplomatic approach in order to achieve success in improving safety. Also, for success, the audit process requires strong support from the most senior management.

This work is intended to support safety management auditing and identification of possibilities for risk reduction by organizational improvement.

Earlier Work—Safety Management Auditing

An approach to organizational and management safety problems arose through the 1970s, for use in loss prevention. The international safety rating system (ISRS) is a systematic approach to safety developed by Frank Bird. His approach is an audit method which includes study of the management [7] and is offered to companies as an auditing service by DNV-GL. It includes many of the issues described in this paper, but has a much wider scope and less depth (259 audit issues). ISRS has been applied in a large number of companies worldwide.

An extended audit checklist for organizational and management phenomena affecting risk is given as Appendix D of the API Publication 581 risk-based inspection base resource document [8]. This also gives a scoring method producing a factor which can be used to modify a risk analysis value at the physical level. The method is based on engineering judgment rather than objective evidence.

A more theoretically based approach to auditing was developed by Reason in the form of the TRIPOD method [9]. Tripod-DELTA is a checklist-based approach to carrying out safety "health checks" in process plant. The issues addressed are: hardware, design, maintenance management, procedures, error enforcing conditions, housekeeping, incompatible goals, organization, communication, training, and defenses. Quoting from the manual [10], "Tripod delta is a scientifically developed and proven means of measuring performance and determining which areas of the business are vulnerable to incidents." Comprising a database of 1500 questions, it does this by asking companies to answer a random selection of 275 of these. Each question is about the occurrence of an unsafe act, and when responses are examined, it is possible to determine in which basic risk factors, i.e., which organizational issues, the organization is performing well and where improvement is needed, based on the answers to these questions. This method is wider in scope than that of this paper, though not as focused on organizational error. An important feature taken from the method [11] is to regard accidents as arising from:

- fallible decisions and omissions at the managerial level, leading to
- latent failures, leading to
- line management deficiencies, leading to
- precursors (forcing and inducing conditions) for unsafe acts, leading to
- unsafe acts, and
- accidents if defenses are inadequate (which can also result from errors at the management level).
The tripod-delta method has been developed further by Gibb and coworkers [12] in the form of the incident cause analysis method (ICAM), for application to transport organizations.

Fig. 1 Overall analysis model with the topic of this paper highlighted
Reason has also written two books on organizational failure with many examples of organizationally caused accidents [13,14], which have proved useful in checking the completeness of the work here.

Earlier Work—Organizational Factors in Risk Analysis

There has been extensive work on incorporating organizational factors into risk analysis. Only a few of these methods have direct relevance to the oil, gas, and chemical industries, not due to any deficiencies in the methods, but due to the fact that current guidelines for risk analyses in these industries do not require any human or organizational component (see, e.g., Ref. [8]). An important exception is the Norwegian work summarized below (Refs. [15-17]).

Organizational influences on operator error were incorporated into Swain and Guttman's technique for human error rate prediction (THERP) as performance-shaping factors [15]. Organizational factors such as degree of training, quality of administrative controls, team size, workload, staffing level, and communications were included as mathematical factors ("performance-shaping factors") used to multiply baseline operator error probabilities.

A tradition arose during the 1990s of improving human error analysis for risk analysis purposes. Most of these methods either included or focused on organizational issues. Embrey developed the MACHINE method for incorporating organizational factors into probabilistic safety assessment [16]. The method uses influence diagrams to support quantification of the influence of policy deficiencies on error-inducing factors and of error-inducing factors on human (operator) error. The error-inducing factors are [inadequate] training and experience, distractions, procedures, fatigue, workplace environment, and responsibilities and supervision; the policy deficiencies recognized are [inadequate] project management, safety policy, safety culture, risk management, design of instructions, training policy, and communication systems.

Davoudian et al. [17] developed the work process analysis model for analyzing the impact of organizational phenomena on safety. The method takes a conventional process plant fault tree as a starting point and, for each cut set, selects candidate parameter groups which affect the basic events in the cut set. Note that each cut set, in combination with the fault tree top event, defines an accident or a system failure scenario. The candidate parameter groups are failure to restore (RE), miscalibration (MC), unavailability due to maintenance (UM), failure to function on demand (FR), common cause failures not due to human error (CCF), and time available for recovery (TR). The candidate parameter groups depend on organizational factors:

- Centralization
- Communication—external
- Communication—interdepartmental
- Communication—intradepartmental
- Coordination of work
- Formalization
- Goal prioritization
- Organizational culture
- Organizational knowledge
- Organizational learning
- Ownership [of issues or problems]
- Performance evaluation
- Personnel selection
- Problem identification
- Resource allocation
- Roles and responsibilities
- Technical knowledge
- Time urgency
- Training

Ratings for each of these factors are used to adjust basic event probabilities. The success likelihood index (SLIM) method [18] is used to estimate conditional probabilities based on expert judgment.

Cooper et al. developed a technique for human event analysis (ATHEANA) [19,20].
The error mode identification in this method addresses failures to respond correctly to emergency simulations, rather than errors during normal plant operation. The causal mechanisms are very similar to those in Rasmussen's human error taxonomy [21]. There is an emphasis, however, on quality of training, quality of procedures, time pressures, workload, crew dynamics, and resources available, i.e., on organizational influences. The method also distinguishes between error-inducing conditions and performance-shaping factors.

Mohaghegh et al. [22] described the fundamental principles of modeling for use in organizational failure, setting important principles for relations between causal factors in a model. They then presented a method, socio-technical risk analysis, which relates safety culture, organizational structure, safety attitudes, and safety performance.

One of the problems in developing analysis methods for organizational failure is that of obtaining field data adequate to support and validate the method. Norwegian teams developed the barrier and operational risk analysis (BORA) [23,24] and risk-organizational, human, technical (Risk-OMT) methods. Risk-OMT [25] uses the fault tree/event tree methodology to model accident event sequences and causal Bayesian networks to model managerial and organizational risk-inducing factors. In applying the method to offshore maintenance, the team made use of actual incident (leakage) data from the Norwegian offshore industry. They also carried out interviews of persons involved in maintenance and studied surveys reported to the Petroleum Safety Authority of Norway from 2002 onward. The quantity of data is such that conditional probabilities could be derived. The risk-inducing factors considered were those identified as important to the offshore maintenance tasks and the occurrence of leaks. The factors are:

Management competence: Competence
Management information: Disposable work descriptions; Governing documents; Technical documents
Management technical: Design; Human machine interface (HMI)
Management general: Communication
Management task: Supervision; Time pressure; Workload; Work motivation

Work very similar in purpose and evidence-based methodology was carried out by Pence et al. [26]. They describe a method for causal modeling and a way of data mining from large quantities of textual data in incident reports.

A review covering organizational factors in human reliability analysis was published by Alvarenga et al. [27]. The review distinguishes between systemic methods (ones which model the organizational system) and ones which are primarily causal factor analysis without any underlying system, and provides an in-depth review of the most widely used methods.

Earlier Work—Hazard and Operability Analysis-Based Methods for Organizational Problem Identification

Kennedy and Kirwan [28] developed a method called safety culture hazard and operability analysis (HAZOP) in order to address the managerial contribution to accident causality.
They applied a modification of the traditional HAZOP guide words "missing, skipped, mistimed, more, less, wrong, as well as, other" to the parameters "person/skill, action, procedure/specification, training, information, resources, detail, protection, decision, control, communication."

The resilience-based integrated process systems hazard analysis [29] is a method based on HAZOP which integrates analysis at the plant equipment level with failure analysis of management functions. It especially focuses on resilience methods such as plasticity, early detection, error-tolerant design, and recoverability in order to mitigate or prevent accidents.

Earlier Work—Systems Modeling

More recently, i.e., since the late 1990s, another tradition has arisen with a focus on identification and solving of safety problems in organizations, rather than support for risk assessment.

Systems theoretic process analysis (STPA) [30] is a method for describing organizations as a nested set of control systems. Functional failure analysis based on control system concepts is then applied in order to identify possible function failures or deviations. The consequences of failure are then evaluated by failure effect propagation tracing in the same way as is done in failure mode and effects analysis (FMEA), HAZOP, or cause consequence analysis. The results, in terms of failure modes and significant consequences, are tabulated, and proposals are made for prevention or mitigation. The functional failure modes considered in STPA are "not providing a function," "providing a function," "incorrect timing or sequence," "stopped too soon," and "provided too long." In Leveson and Thomas's latest version [31], no attempt is made to investigate the causes of functional failure modes. STPA has been enormously successful in providing safety analyses for a wide range of systems in process plants, medical, military, and space systems.

Hollnagel's functional resonance analysis method (FRAM) [32,33] was developed as a method for describing the complex organizational interactions in an organization which can lead to accidents. It focuses especially on the fact that small deviations from nominal behavior can accumulate and interact (Hollnagel uses the term resonance) to produce large accidental consequences. The start of an analysis is a functional description of an activity, preferably as it is carried out rather than as it is imagined by the analysts or the managers. The analysis then involves a functional failure analysis taking into account the inputs for a function, the resulting output, the dependency on preconditions, on resources, on controls, and on timing and sequence. The method has been widely applied.

Functional resonance analysis method is an important development in functional failure analysis, quite apart from its value as a method for organizational analysis. First, it recognizes the importance of multiple small deviations with cumulative effects. This differs from most analysis methods, such as FMEA, functional failure analysis, and fault tree analysis, which regard failures as all-or-nothing events. Second, it incorporates preconditions, resources, controls, and timing factors into functional failure analysis, providing significant additional guidance in finding causes of functional failure.
These concepts can be applied to technical systems failure just as well as to organizational failure.

Another publication of note is Reason's "Managing the Risks of Organizational Accidents" [25], which gives a broad description of organizational defects.

Organizational Failure Analysis

The origin of this work was the development of the action error analysis method for operator errors [34]. This method uses an extended functional failure analysis to identify error modes (see Table 1) and an error causal analysis for error mechanisms based on Rasmussen's skill-rules-knowledge model of operator performance [35,36]. The method was validated qualitatively by using it in the design of a small chemical plant [37], and then following incidents in the plant over a number of years, with reasonable agreement between the predicted errors and the actual near misses observed.

Application of the action error analysis method to oil, gas, and chemical plants, military command and control systems, and the International Space Station over the following years necessitated extending the operator error causal mechanisms to communications in work groups and to management commands [38]. Collection of human error probability data to support the method was carried out by review of 103 risk analyses performed or led by the author over a period of 36 years. A review was also made of accidents which subsequently occurred in these plants [5]. Many of the operator error causes which were found important in this study were actually due to error-forcing conditions arising from management error and organizational deficiencies.

This work can be regarded as an extension of the work on action error analysis. The method has been used for oil, gas, and chemical plants and for aerospace and military systems analysis. It has been reasonably widely applied in Scandinavia.
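As a rough illustration of the enumeration step at the core of action error analysis, and of the organizational failure analysis procedure described next, the following minimal sketch applies a checklist of functional failure modes to each step of a procedure to generate candidate errors for later causal analysis. The step names, the abbreviated mode list, and the data structures are hypothetical examples for illustration, not the author's implementation.

# Minimal sketch: applying a functional failure mode checklist to each
# step of an operating procedure, to generate candidate action errors
# for further consequence and causal analysis. Names are illustrative.
from dataclasses import dataclass

@dataclass
class CandidateError:
    step: str
    failure_mode: str

FAILURE_MODES = [            # abbreviated selection, in the spirit of Table 1
    "No function", "Partial function", "Premature function",
    "Delayed function", "Wrong sequence", "Wrong object of function",
]

def enumerate_action_errors(procedure_steps):
    """Return one candidate error per (step, failure mode) pair."""
    return [CandidateError(step, mode)
            for step in procedure_steps
            for mode in FAILURE_MODES]

if __name__ == "__main__":
    steps = ["Isolate pipe section", "Vent residual gas", "Remove valve"]
    for err in enumerate_action_errors(steps):
        print(f"{err.step}: {err.failure_mode}")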
Action error analysis is important as a background to the following organizational failure analysis and involves:

(1) establishing the series of steps in an operating procedure, either from written procedures or by observation,
(2) applying a rather long series of action error modes (including normal action in the presence of latent hazards) to each step to derive the starting points for analysis,

Table 1 Functional failure modes for action error analysis [5]

Functional failure mode | Deviation: action | Deviation: check | Deviation: decision
No function | Failure to act | Failure to check | No decision
Partial function | Action incomplete | Partial check | —
Inadequate function | Too little action | — | —
Excessive function | Too much action | — | —
Imprecise function | Imprecise action, wrong direction | — | —
Spurious (unwanted) function | Unintended or unwanted act | — | Decision when none required
Premature function | Action too early | Premature check, check at wrong time | Premature decision
Delayed function | Action too late | Delayed check | Delay in decision
Wrong sequence | Action in wrong sequence | — | —
Forget then remember function later | Forget then remember action | — | —
Unwanted repeated function | Duplicated action | — | —
Wrong function | Wrong action | Wrong check | Erroneous decision
Wrong parameter value | Wrong action value | Check with wrong value, wrong criterion | Wrong value in choice
Wrong object of function | Action on wrong object | Check of wrong object | —
Correct function but precondition not satisfied | Correct action with latent problem or hazard | Wrong setup for check | Decision made ignoring important preconditions
Correct function but with latent hazard or deviation or unwanted side effect | Correct action with latent problem or hazard or unwanted side effect | Check made with unwanted side effect of checking | Decision made ignoring latent hazard or side effect

(3) for each action error, determining the series of events which could result from the action error,
(4) determining the possibilities for recovery or mitigation of the error, and
(5) for those action errors leading to accidents or other significant effects, determining the possible error-forcing or error-influencing conditions from the error cause taxonomy.

Organizational failure analysis extends action error analysis into the higher levels of operations management and organization. The purpose of organizational failure analysis is to allow deep causes of organizational failure at the psychological and social levels to be identified. As a starting point, a description of the organization is needed as a basis for the analysis. The methods of representation used are organization organograms, functional block diagrams, or procedural flowcharts. The organizational failure analysis procedure is then:

(1) From interviews, organizational documents, and procedure descriptions, create a functional block diagram and, if necessary, procedural flowcharts for the organization.
(2) Select parts of the organization which are particularly significant for safety.
(3) Confirm the block diagram and the procedures for the organization by direct observation, by review of organizational products such as work orders, completion records, quality control reports, and safety inspection reports, and especially by interviews with workers focused on any difficulties they may have.
(4) Revise or annotate the block diagrams or flowcharts.
(5) Perform a functional failure analysis or action error analysis for the parts of the organization selected for analysis.
The checklist of functional failure modes is given in Table 1.
(6) Trace the consequences of the functional failures using conventional disturbance propagation analysis procedures, together with the modification for small deviations described below.
(7) For those functional failure modes which have significant consequences, such as accidents, identify organizational failure causes using the checklist of organizational failure phenomena given in the Appendix.
(8) Identify safeguards which can mitigate or prevent the consequences of the functional failures or which can reduce the effects of organizational failure causes.
(9) Report the results and any recommendations for risk reduction, and add these to the risk management action list.

Linear and Nonlinear Accident Models

One criticism which has been made of hazard identification methods is that they are "linear," while actual accidents are often "nonlinear." By linear is meant that the accident scenarios modeled start with an initiating event and proceed through a sequence of intermediate events to a significant consequence. Along the path, the scenarios may involve the failure or failed state of one or more safety barriers. Linear accident scenarios can be described graphically using the "Swiss cheese" model of Reason [11].

This criticism is not appropriate for all methods. The fault tree analysis method was never "linear." The HAZID-bow tie method requires fault tree like branching for every safety barrier, to record the potential causes of safety barrier failure, the safety critical activities which are intended to assure safety barrier integrity, and the deficiencies which can arise in the safety critical procedures. Nevertheless, the criticism is justified in that none of the standard risk analysis methods address interactions across an organization, with interference between apparently unrelated functions. The usual functional failure analysis procedure follows intended functional paths and identifies deviations along these paths. This in turn means that it does not identify many accidents of the type called "normal accidents" by Perrow [39], or the steady drift into poor performance described by Dekker [40]. A modification to the procedure which allows analysis of "nonlinear accident event sequences" is to record side effects of failures and errors continuously through the course of an analysis, and to take these into account once the primary analysis has been completed, by extending the search for causes to include these side effects.

The previously mentioned procedure does not identify the effects of multiple small deviations from standard procedures of the kind identified as important by Hollnagel [33]. For example, when analyzing an emergency organization, delays in response may not just be large ones which prevent an effective response. They can be small ones at different stages in the emergency response which collectively can result in disaster. Failures, errors, and relatively innocuous "happenstances" can interact to cause an accident.

As an example, consider one accident investigated by the author in which a team was performing maintenance inside a distillation column. They were using breathing air supplied by hose from a battery of compressed air bottles. Part way through the task, additional complexities arose, prolonging the maintenance task. A message was sent to fetch a new air cylinder battery. However, three things occurred.
First, one of the air-lines in use by another team began to leak, increasing air demand. Second, there were two additional persons performing the work, and there was no "breathable air watchman" to monitor air use, as this was not a standard practice for the company. Third, the truck bringing the air battery was delayed by fallen scaffolding blocking the road. Air ran out, and the team had to be rescued by emergency responders using self-contained breathing apparatus.

This kind of problem can be addressed by extending the range of consequences in an analysis to include small deviations in the consequence analysis. This can be very difficult to carry out manually, but the semi-automated methods described below can make the analysis practical.

Organizational Failure Phenomena

Various organizational phenomena leading to deviations from standard operating procedures were identified from 32 safety management audits led by the author over a period of 30 years for plants in the oil, gas, and chemical industries. The companies varied in performance from that of high-integrity organizations to a few with decidedly poor performance. The audits were generally extensive, involving up to 10 persons and lasting from one to several weeks. Also included were observations from accident investigations. Some examples from these studies are illustrated in case histories in Ref. [5].

Any empirical study of this kind will produce a large number of organizational failure phenomena, and any checklist will be very difficult to administer. The list was therefore structured according to the influences imposed on the organization, using ideas drawn from Leveson and Thomas [31]. The model used is shown in Fig. 2. The structured list of organizational failure phenomena is shown in the Appendix. Note that STPA is not the only possible structuring, but it is convenient when there is a hierarchical organization. Other organizational structures are described below.

The organizational phenomena listed in the Appendix may be direct causes of accidents, but they may also be "causes of causes." Figure 3 gives an example of a chain of causes identified during accident investigation for a permit to work (PTW) system involving a permit to work department.

Using this method requires some industrial experience on the part of the analyst. The requirement is minimized by provision of a guide book which explains the individual terms and gives examples.

Organizational Structures

The most obvious organizational structure to consider in an organizational failure analysis is that of the formal management hierarchy. There are, though, other structures that provide an
informal organization parallel to the management hierarchy. One type is that of cliques, which are groups of persons with tight social links, generally with each member having some form of relationship to all the others. These can be social links such as clubs, religious ties, and nationality, and can also be work-related ties, such as cross-departmental projects [41]. Cliques can have a very beneficial effect, allowing easy communication, division of labor, and mutual support at work in a cohesive group. Cliques can also be pernicious, propagating erroneous beliefs and enabling nepotism. During safety management audits, it is sometimes possible to identify such cliques.

Fig. 2 Organizational failure influences
Fig. 3 Causal influence chains of organizational function deviations leading to a failure of the permit to work system

Another structure found to be important is that of a functional or logistic chain. These are subgroups, not necessarily in the same organization, each of which carries out a part of a task. An
example is supply of spare parts, which requires scheduling, ordering, payment, supply, import, transport, warehousing, and distribution. Organizational failure can occur in any step of the chain, and also in the feedback along the chain needed for clarifications and for correction of any deviations, such as substitution for out-of-stock items.

Emergent Phenomena in Organizations

The term emergent phenomena has many different meanings, for example, in the mathematics of complex systems and in the biology of exotic diseases. Here, emergent phenomena are defined as those which emerge from a system without arising from any part of the system alone, but as a result of interactions between parts. Examples identified from the safety management audits are:

- overload, log jam, and deadlock affecting resources, and excessive demand on budgets or equipment,
- organizational drift and decay of functions,
- myth generation and decision making on the basis of myth,
- channeling and keyholing of activities, for example, by demands on an understaffed IT department due to expanding needs from multiple departments or sudden problems,
- chasing the latest fad at the expense of considered decisions,
- overpromising and under-budgeting,
- scope creep,
- chasing key performance indices (KPIs) at the expense of balanced performance,
- chasing quarterly performance bonuses,
- staffing policies which do not take human and knowledge capital into account, and
- unmotivated efficiency rounds and staff reductions.

There are undoubtedly many more such phenomena. In order to identify such phenomena, OFA needs to look at the organization as a whole and not just at individual functions.

Processing Audit Data

In order to make systematic use of audit data, the accident anatomy method was used [42]; incidents were described in the form of cause consequence diagrams [43], with the sequence of events in any accident or near miss included. This allows much more information to be gathered than, for example, simple tabulation of incident types. It allows several cause-consequence event pairs to be identified in any incident scenario. At the same time, it allows identification of parallels between different event subsequences in incidents at the detailed level.

In many cases, the diagrams needed to be extended by adding the accident potential up to the final consequence (damage, injury, fatality). Then similar accidents were grouped according to accident type, and the groups were consolidated into one cause consequence diagram for each group. Causal influences which could have led to incidents but did not were marked onto the diagram, along with any recovery measures taken which prevented incidents. The relative importance of the different event subsequences and causal influences was then obtained simply by counting the number of occurrences.

An example is shown in Fig. 4. The case was the explosion of an "empty" gasoline storage tank when very hot oil was pumped to it from a fluid catalytic cracker. The explosion did not cause injury, but it did spread heavy oil across the manager's car park. The figure is simplified for presentation here and is just one of a large dataset expressed in the event language ELAN (see the Automation of the Method section).

This data analysis technique can be used for determining probabilities of causal links if the actual frequency for just a few incident types can be found (anchor points) and the other frequencies then inferred from the relative number of occurrences.
Anchor points have been found for just a few incident types of managerial failure. A complete database of operator error probabilities was built up using this approach [5]. It is worth noting that when this is done, the cause consequence diagrams are populated with conditional probabilities and become mathematically equivalent to causal Bayesian networks [44].

Fig. 4 Example of a single scenario analysis with management deficiencies marked
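The counting and anchor-point scaling described above can be sketched in a few lines. The incident types, counts, anchor frequency, and link counts below are invented for illustration only; they are not data from the author's database.

# Sketch of anchor-point scaling: relative occurrence counts from the
# consolidated cause-consequence diagrams are converted to absolute
# frequencies using one incident type whose frequency is assumed known.
observation_counts = {                       # hypothetical counts
    "hot oil routed to tank with residual gasoline": 3,
    "tank overfilled during transfer": 12,
    "transfer routed through wrong line-up": 6,
}

anchor_event = "tank overfilled during transfer"
anchor_frequency_per_year = 0.05             # assumed known anchor point

scale = anchor_frequency_per_year / observation_counts[anchor_event]
estimated_frequency = {event: count * scale
                       for event, count in observation_counts.items()}

# Conditional probability of a causal link: the fraction of occurrences
# of a cause that went on to the consequence, as counted on the diagrams.
def link_probability(n_cause_and_consequence, n_cause):
    return n_cause_and_consequence / n_cause

print(estimated_frequency)
print(link_probability(2, 6))   # e.g. 2 of 6 wrong line-ups reached the tank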
Automation of the Method

The work in OFA can be quite onerous. If the organizational failures are to be related to specific accident potential, the work is equivalent to about twice that needed for a traditional HAZOP analysis.

The HAZEX program [45] was developed as a tool for recording the results of HAZOP, FMEA, functional failure analysis, and action error analysis studies. It includes facilities for semi-automated analysis in which the program asks questions about the plant to be analyzed and then proposes causes and consequences of failures and operational upsets. It asks what safeguards exist and proposes further safeguards if those existing are inadequate. The data for the program are drawn from just over 100 earlier risk analyses and from a database of earlier accidents.

The database was extended to cover organizational failure analysis by adding the OFA checklists and causal relations tabulated in the Appendix to the HAZEX database. The motivation for this was to allow validation of the method while minimizing any bias which might result from the analysts' experience. This is necessary when validating a method because an experienced analyst can often incorporate more experience into results than that which arises from the methodology alone.

The causal structure for the organizational phenomena is an important part of the HAZEX database. The structure describes the cause and effect relationships between the management and organizational phenomena, operator error, and consequences in the physical system (as in Fig. 3). The individual causal relationships are expressed as statements in the ELAN language [45] of the form

event; condition; condition => event; event; event

The event on the left-hand side is the causal event, and the effects are dependent on zero or more conditions. The event or events on the right-hand side are the effects. Such statements can be chained by matching output events in one functional block to input events in the next, so as to construct a complete sequence of events forming an accident scenario. Side effects can be identified by following all events on the right-hand side of the above event transfer function, and multiple causes can be found by tracing causal chains for the conditions on the left-hand side. The method of chaining together event sequences can be applied for the physical system (plant), the control systems, the operators, and the management levels. Algorithms for this are given in Ref. [45]. The HAZEX program allows actual case histories and accident reports to be retrieved rapidly in order to support judgments on the relevance and importance of failure phenomena identified by the algorithms.

Table 2 shows some of the questions which the software may ask concerning organizational failure. As can be seen, the questions are designed for use during safety audits. In all, over 300 questions are answered by the analyst, but most of these need to be posed only once for any analysis.

Modeling to Support Automated Analysis

The list of failure and error phenomena in the Appendix is evidence driven, coming from just over 2400 observations in 32 safety management audits. Classification of the observations, though, requires an underlying model in order to achieve consistency. Good models also allow a degree of prediction for new interaction possibilities.
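Before turning to the underlying models, the chaining of ELAN-style transfer statements described under Automation of the Method can be illustrated with a short sketch. The statement representation, the example events and conditions, and the propagation routine are simplified assumptions for illustration; the actual ELAN syntax and the HAZEX chaining algorithms are those given in Ref. [45].

# Simplified sketch of chaining transfer statements of the form
# "event; conditions => events" into an accident event sequence.
RULES = [
    # (cause event, required conditions, resulting events) - illustrative
    ("route request", {"erroneous knowledge"}, ["incorrect route selected"]),
    ("incorrect route selected", set(), ["hot oil pumped to gasoline tank"]),
    ("hot oil pumped to gasoline tank", {"residual gasoline in tank"},
     ["tank explosion"]),
]

def propagate(initial_events, active_conditions):
    """Chain matching statements to build the resulting event sequence."""
    sequence, frontier, seen = [], list(initial_events), set()
    while frontier:
        event = frontier.pop(0)
        if event in seen:          # guard against cyclic rule sets
            continue
        seen.add(event)
        sequence.append(event)
        for cause, conditions, effects in RULES:
            if cause == event and conditions <= active_conditions:
                frontier.extend(effects)
    return sequence

print(propagate(["route request"],
                {"erroneous knowledge", "residual gasoline in tank"}))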
The starting point of modeling for the method here was Rasmussen's stepladder skill-rules-knowledge (S-R-K) model for operator performance and error. Industrial application of the model quickly showed the need for model extension to allow for communication between persons, and the value of Rasmussen's variant in which knowledge was explicitly registered as a functional module supporting diagnosis, decision making, and planning. Rasmussen himself proposed the use of multiple S-R-K stepladder models to represent levels of organization (in internal working notes). Industrial application to management error also necessitated model extension to cover a much wider and deeper modeling of activity and function types than those included in the S-R-K model, such as checking, goal setting, policy definition, priority setting, leadership, guidance, training, supervision, manning, etc. The list is long. The need for more detailed models also arose in order to be able to capture details in the audit observations.

The raw material for an OFA analysis is a functional block diagram of the organization as it works normally. The notation for the diagram could be simple blocks with inputs and outputs, but there are advantages in using the structured analysis and design technique (SADT) addition of controls and resources [46]. The actual diagramming used is often an extension of the FRAM notation [24]. The form of the functional blocks is shown in Fig. 5.

Fig. 5 An OFA functional block

Table 2 Examples of the questions posed by the HAZEX analysis support tool

Are there personnel problems with the work? | Guidance on answering the questions during interviews
Is there frustration with the work which could lead to impatience and unapproved procedures or bypassing of steps? | This is difficult, and can only be determined after establishing a good working relationship with personnel. Save all questions until relaxing over coffee, and preferably ask no questions at all.
Do the workers feel that there are stupid rules for working? | This can be determined by discussion with workers about how to improve rules.
Are there problems of communication intended to ensure job protection? | Discuss procedures with key workers. If there is reluctance, this may be job protection, or it may indicate a need for protection of industrial secrets.
Is there distrust of management? | Information of this kind is difficult to obtain explicitly unless a good rapport is built up. It can be determined from the tone of discussions.
Are there cases of counterproductive blame? | Inspect accident and near miss reports and dismissals.

Of course, block diagrams of this type could be developed with an ever-increasing number of types of links (interactions and interference) between blocks, so each link type should be justified. Input and output are obviously necessary, and control and resources, from SADT, have proved very useful through the years. Hollnagel uses TIME to describe time pressure influences, but as a link it is useful for representing task interactions arising from races and delays. Precondition links are from the FRAM
notation and are useful to represent the links between earlier steps in a task, and functions in another task, which are preparatory. FEEDBACK links are used to represent monitoring aspects of control, as in STAMP [31]. ACCESS links are probably specific to maintenance and construction activities, and are useful for representing spatial interference between tasks. KNOWLEDGE is included because lack of knowledge is important in a large fraction of organizational failures, but also because the communication of erroneous information between tasks represents a common cause of failure. SIDE EFFECTS are important for several reasons. Activation of latent hazards can occur in a normal functional sequence, but can also arise as a side effect of normal functioning. Side effects can also cause interference between unrelated tasks, such as the release of toxic materials to a drain while others are working close by. SIDE EFFECTS often link to INTERFERENCE, either directly or by creating adverse latent conditions.

The arrangement of the block diagrams can vary. In many cases, a management hierarchy is suitable, as in STAMP, but in many cases an SADT arrangement is preferable, particularly for detailed submodels. A model corresponding to seven cases similar to that in Fig. 4 is shown in Fig. 6. A part of the ELAN model for "determine transfer route" is:

NORMAL FUNCTION
IN -> ROUTE REQUEST => OUT -> ROUTE SELECTED

FAILURES AND ERRORS
IN -> ROUTE REQUEST, ERRONEOUS KNOWLEDGE => OUT -> INCORRECT ROUTE SELECTED
PIPING RECENTLY MODIFIED, KNOWLEDGE NOT TRANSFERRED TO MIMIC DISPLAY => ERRONEOUS KNOWLEDGE
ENGINEERING DEPARTMENT OVERSIGHT => KNOWLEDGE NOT TRANSFERRED TO MIMIC DISPLAY
GAP IN ORGANIZATION NO UPDATE SOP => KNOWLEDGE NOT TRANSFERRED TO MIMIC DISPLAY

The full model for this function comprises 22 transfer function elements, and that for the full task over 200 lines. Fortunately, it is possible to build up a library of functional and functional failure models, so that the modeling effort only needs to be made once [45].

Quantification

Deriving tables of probabilities for organizational failure and management error is difficult in the oil and gas industry. The number of occurrences of the different error and failure modes and the number of causal mechanisms can be counted. For example, at least one case of each of the causes listed in the Appendix was found. However, the number of opportunities for error and failure is very difficult to determine. It would require detailed recording of managerial activities over a period of years. This means that probabilities per managerial action are not generally available.

In the absence of a full set of conditional probabilities, the relative importance of the different organizational and managerial failures is determined according to the number of cases arising in the organizational failure database (at present just over 2400 observations), together with the severity of the potential accidents which could be caused and of the 91 accidents which were actually caused. This was made easier in actual applications because extensive major hazards risk analyses were available for all of the plants studied.

Quantitative risk analysis is possible in special cases when the organizational tasks are highly standardized and where the importance of the failures justifies the effort. It is relevant to do this when the cost or difficulty of introducing risk reduction measures is high.
This was done, for example, in two of the audits used as a background for this development: once to help determine the level of staffing needed in the permit to work organization, and once to help determine the risk involved in running formal training in parallel with on-the-job training for new maintenance technicians.

For quantitative calculations, the different causes of organizational failure and management error are regarded as independent error-forcing or error-inducing conditions, similar to the treatment in the operator error analysis method ATHEANA. The contributions from each cause are combined using fault tree calculation methods, with OR logic for the different causes. Very often, for example, an organizational weakness does not have a significant consequence until adverse conditions arise, and AND logic is used to combine the probability of the weakness or failure with the probability of the adverse conditions. One such combination, which occurred several times in incidents recorded in the audits studied, is the case of operators who have not completed training working under a lead operator recently transferred to a new unit and facing an unexpected plant disturbance. This case involves two normal deviations from ideal performance combined with a not unusual plant state deviation.

Fig. 6 Model for petroleum products tank yard transfer management

Examples of Application and Validation of the Method

The demonstration application of the method was for a PTW system, one well known to the author. The purpose of a permit to work system is to ensure that when work, particularly construction or maintenance work, is to be carried out, the conditions for safe work exist. The method involves the supervisor of the work team filling out a detailed questionnaire and performing or updating a task risk assessment. The PTW application and task risk analysis or job safety analysis are submitted to a permit to work officer, and this officer checks both the application and the work site. An area or overall work authority (a person) checks the permit to work application and ensures that all applications are compatible and do not involve simultaneous operations problems. The supervisor then instructs his or her team, and posts the permit to work so that it can be seen by all. The permit is usually approved for one shift or one day. A site safety officer helps to ensure that new hazards do not arise during the course of the day. A block diagram for the PTW organization is shown in Fig. 7. It is in STPA format, as is natural for this case.

An example of the results of the analysis is shown in Table 3. In the analysis, 57 error or failure modes were identified, with typically 3-7 significant organizational failure causes for each failure mode. The analysis was checked against 17 cases described in Trevor Kletz's book "What Went Wrong?" [47]. All of the 17 cases were included in the predictive analysis.

A second test case was a management of change system for engineering modifications. The procedure for this included nine steps. The analysis was carried out manually by students as a case study in a short course at the Master's level. Forty-eight causes of organizational error with potential accidents as a result were identified. These were compared with cases described by Sanders [48]. The analysis covered 80% of the cases in this book.
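Returning briefly to the quantification approach described above, the fault-tree style combination of independent error-forcing conditions can be sketched as follows. The probabilities and the per-shift framing are invented for illustration only and are not values from the study.

# Sketch of the combination logic described under Quantification:
# independent causes of a failure mode are combined with OR logic,
# while a latent weakness that only matters when adverse conditions
# arise is combined with those conditions using AND logic.
def p_or(probabilities):
    """P(at least one of several independent causes occurs)."""
    q = 1.0
    for p in probabilities:
        q *= (1.0 - p)
    return 1.0 - q

def p_and(probabilities):
    """P(all independent conditions occur together)."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result

# Illustrative case: crew not fully trained AND lead operator newly
# transferred AND an unexpected plant disturbance during the shift.
p_weakness = p_and([0.2, 0.1])            # assumed per-shift probabilities
p_disturbance = 0.05
p_error_forcing_state = p_and([p_weakness, p_disturbance])

# Several independent error-forcing conditions contributing to the same
# failure mode are then combined with OR logic.
print(p_or([p_error_forcing_state, 0.001, 0.0005]))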
The method was applied, using the HAZEX program and generic ELAN models, to the organization of plant isolation for maintenance. This is a procedure in which pipes are sealed off so that persons can open manholes or flanges without hazardous substances entering the plant section concerned. It was similarly applied to the process for gas testing before vessel entry, to lock-out and tag-out for work on electrical systems, and to emergency response in the case of hazardous materials release. Again, the results were checked against the case histories for these tasks given by Kletz [47] and by Sanders [48]. All of the incidents recorded for these tasks in these references were predicted automatically.

A fourth analysis was for an emergency response procedure for an offshore oil production facility. This was observed during a full-scale exercise involving several hundred persons, with both prestationed observers and video recording. The results from this test are not as clear as those from the others. All of the deviations from the written procedure which were observed were also predicted by the automatic analysis. However, there were also deficiencies in the written procedures which were overlooked.

Risk Reduction

The main objective in using OFA has been to aid in proposals for risk reduction. Associated with each operational failure cause is a list of possible risk reduction measures. Many of these are specific to the individual causes, but some generic methods are given in Table 4.

Observations and Conclusions

The cases studied indicate that the method is applicable and produces useful results. It revealed some important findings, for example:

- The most senior management of companies are usually the only ones with authority to call on mutual aid or external aid in an emergency, but these persons are generally the least trained and are sometimes untrained. As a result, they become overinvolved in details of the response and often do not carry out their overview and contingency readiness functions.
- In PTW systems for large projects, the workload on the project safety authority is excessive, and this interferes with performance. However, splitting the work between persons introduces problems of coordinating simultaneous operations. Improved procedures are needed.
- The function of gas testing which is needed before entry into confined spaces may be subject to delays, and this can lead to frustration in work teams.

The work described here is a first step in developing organizational failure analysis. Because the method is based on empirical observations, it is unlikely to be complete. Further work is needed to identify possible gaps or weaknesses. Such work is currently being performed in order to determine the applicability of the methods in other areas of industry and other activity types.

Reflections

The method described here resembles several others derived from the risk analysis tradition; in particular, there is a structural similarity to the work of Sklet et al. [24,25] and a similarity in data processing to that of Pence et al. [26]. The main differences here are that the risk-inducing factors used are based on direct observation in safety management audits and incident investigations, and there is a large difference in the degree of detail. The question arises: is it necessary to investigate at the high level of detail described here?
From 1978 through to 2014, the author led a total of 105 risk analyses, mostly of large oil, gas, and chemical plants, with just four of mechanical production processes. Subsequently, 92 follow-up studies were made up to the end of recording in 2016. For the analyses carried out up to 1994, there were subsequently a total of 20 major hazards accidents arising from management failing to implement recommendations, with a total of 164 fatalities to follow [ ]. The companies involved were obviously well intentioned (they paid for the analyses), and some could be classified as high-safety-integrity companies. We had naively assumed that all that was needed for accident prevention was a good risk analysis and a good set of well-presented recommendations.

The management errors were in one case due to simple refusal to implement PTW (refusal to consider implementing PTW because of lack of space for a PTW office on the platform complex), where the accident was directly the result of the lack of PTW and caused 11 fatalities. One was a case involving a very long communications chain (4 companies, 11 departments). All the others were due to postponement of implementation followed by forgetfulness.

After 1994, the problem was recognized, and we made a change in the approach to result presentation. The potential accidents were illustrated with case histories and photographs of relevant earlier accidents, and the risk reduction recommendations were accompanied by conceptual stage designs for the safety measures. Since 1994, only one recommendation has failed to be implemented, and only three further fatalities have occurred. The actual residual risk for plants analyzed after 1994 was more than an order of magnitude lower than for those analyzed prior to 1994.

The relevance of this history for OFA is that all of the accidents involved managerial omissions and less than adequate performance at the detailed level, such as inadequacies in the plant section isolation procedures (preparing for maintenance). The deficiencies were not those of generally poor safety management; the companies had, in all but three cases, good safety management. The problems arose in nearly all cases at the detailed level, due to specific deficiencies.

Whether detailed analysis, as described here, is needed, or whether just a few key management performance factors can adequately reflect the risk, is still an open question. However, there is enough evidence to make detailed management and organizational failure analysis a worthwhile precautionary measure.

The other question arising is whether the analysis is truly necessary.
Fig. 7 A block diagram for a PTW system
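As a rough illustration of the kind of functional decomposition captured in a block diagram such as Fig. 7, the sketch below models a PTW workflow as an ordered chain of functions with responsible agents, and flags unassigned functions, one of the "gaps between functions" problems listed in the Appendix. The step names and roles are assumptions for illustration and are not taken from the figure.

```python
# Sketch: a PTW workflow as a chain of functions with responsible agents.
# Step names and roles are illustrative assumptions, not the actual Fig. 7 content.

ptw_workflow = [
    {"function": "Prepare work description and hazard identification", "agent": "Work supervisor"},
    {"function": "Check PTW application against plant status",         "agent": "PTW officer"},
    {"function": "Perform site check of the work area",                "agent": "PTW officer"},
    {"function": "Authorize permit and define validity period",        "agent": "Area authority"},
    {"function": "Carry out the work under permit conditions",         "agent": "Work team"},
    {"function": "Sign off permit and return plant to operation",      "agent": None},  # deliberately left unassigned
]

def find_gaps(workflow):
    """Return functions with no responsible agent, i.e. organizational gaps
    of the 'not my responsibility' type."""
    return [step["function"] for step in workflow if not step["agent"]]

if __name__ == "__main__":
    for gap in find_gaps(ptw_workflow):
        print("Unassigned function:", gap)
```

A representation of this kind is what makes semi-automated screening possible: each function in the block diagram becomes a point at which the generic failure causes can be applied.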
From our own experience of application of the method since 2010, the analysis is very useful. If carried out generically, it can serve to improve understanding and the quality of auditing. When applied to a specific plant, it can serve to prioritize recommendations. Prioritization is often needed: over the years, we have found that presenting senior management with risk reduction plans involving more than ten issues is counterproductive (when there are more than ten recommendations, they need to be packaged into groups). The analysis allows prioritization of recommendations to be made objectively. It also allows the recommendations to be related to earlier accident reports from around the world.

To support this view, the findings from three safety management audits were studied, and the source of the recommendations was determined. In a safety management audit, three types of review can be made [49]:

- review of safety documentation, such as safety policy, training records, and management of change records;
- review of organizational procedures for completeness and good working order;
- in-field mechanical integrity audit (a general audit of equipment and operation).

The methods described here supplement these approaches, allowing detailed organizational assessment. Results from three audits are given in Table 5.

The results in Table 5 do not provide an absolute measure of the success of the method, because the relative importance of the recommendations is not known, and because results will differ from company to company. The results do show, however, that the method has some value in supporting safety management audit.

Further Work

Current activities include documenting the data collection for this study in a form that can be used by other researchers, expanding the dataset beyond the oil, gas, and chemical industries, and applying the method to the design and construction stages. There is also a need to investigate the degree to which the method and the causal factors described here are sufficiently complete.

Whether sufficient field data can be obtained to allow the method to quantify probabilities "per opportunity," for example per operation, is an open question. Calculation on the basis of frequency per company per year is already possible, but quantification per task type requires more data. Extending the data collection to include publicly reported accidents, such as those reported in the Loss Prevention Bulletin, may provide a way forward.
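To make the distinction between the two forms of quantification concrete, the following sketch shows how each would be computed; all counts and exposure figures are invented solely for illustration and are not data from this study.

```python
# Sketch: two ways of quantifying organizational failure frequency.
# All numbers below are invented for illustration only.

# Hypothetical observations: failures of a "PTW application check" function
observed_failures = 6

# Exposure measured per company per year
company_years = 40          # e.g., 8 companies observed over 5 years
rate_per_company_year = observed_failures / company_years

# Exposure measured per opportunity (per operation or task execution),
# which requires knowing how many permits were actually processed
permits_processed = 12_000
probability_per_opportunity = observed_failures / permits_processed

print(f"Rate per company-year:       {rate_per_company_year:.3f}")
print(f"Probability per opportunity: {probability_per_opportunity:.2e}")
```

The per-opportunity figure is the one needed if OFA results are to feed into quantitative risk analysis at the task level, and it is the denominator, the number of opportunities, that is difficult to obtain from audit data alone.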
The most important question for the intended use of the method is whether the OFA approach, with its relatively theoretical background, can provide an improvement when compared with traditional methods for audit support such as ISRS, Tripod-Delta, and the incident causation matrix. It is also important to know whether the introduction of recent concepts such as a system-theoretic approach, nonlinear accident scenarios, side effects and task interference, and emergent hazards has a significant impact on results. Comparative studies applying several methods to a reference example are planned.

Table 3 An extract from the analysis of a PTW system, showing part of one function

Agent: PTW officer

Failure mode: Fails to perform PTW application check
  Causes: work overload; insufficient manning; spike in the workload due to new problems; PTW officer mistake; chaotic filing in the PTW office
  Consequence: some aspects of the work may be unchecked and defective; possible accident
  Safeguards: area authority performs overall site check; site safety officer may observe the hazardous condition

Failure mode: Fails to perform site check
  Cause: work overload leads the PTW officer to sign off without a site check
  Consequence: as above

Failure mode: Signs off a condition on the form incorrectly
  Causes: lack of knowledge; insufficient training; poor recruiting or promotion policy
  Consequence: as above
  Safeguard: improved training

Failure mode: Wrong area checked
  Causes: lack of knowledge; poor plot plan; error in markup of the plot plan by the supervisor; plot plan for the work not updated; poor supervision; lack of situational awareness by management; complacency
  Consequence: as above
  Safeguards: good plot plan on wall display; periodic audit of PTW system performance; cross-check by PTW applicant required

Failure mode: Delay in checking
  Cause: work overload
  Consequence: supervisor becomes tired of waiting and commences work without a PTW
  Safeguard: site safety officer checks for lack of PTW or out-of-date PTW

Table 4 Examples of risk reduction measures

Problem: Organization structure problem
  Risk reduction measure: redesign of the organization, new functions, integration of functions, standardization of procedures

Problem: Lack of knowledge
  Risk reduction measure: training, high-quality manuals, videos, "canned" toolbox talks

Problem: Overloading
  Risk reduction measure: reassessment of workloads; identification of peak loading during periods of high demand and in response to emergencies or unusual events; devise a supplementary staffing strategy

Table 5 Sources of significant recommendations in three safety management audits (number of recommendations by approach used to identify the issue)

Plant type                     Document review   Procedure audit   Mech. integrity   OFA
Refinery                       22                4                 41                7
LNG plant                      7                 1                 23                21
Gas treatment and NGL plant    14                1                 32                11

Acknowledgment

The author would like to thank Dr. I. Kozine for useful comments and help in writing this paper, and for support during student exercises applying the method in risk analysis case studies. Thanks are also due to the audit team members and the students, who provided useful comments on the application of the method.
Appendix

Organizational failure potential causes. Each entry below gives an organizational failure cause, grouped by organizational failure group, together with its possible effects.

Organization
  Problematic goals – Undesired function
  Conflicting goals – Conflicting functions, intermittent or inconsistent functions
  Missing function – Function not performed
  Under-resourced function – Poor performance of function
  Misunderstood function – Poor performance of function
  Outdated or declining function – Poor performance of function
  New and still-learning function – Poor performance of function
  Duplicated or overlapping function – Possible chaotic functioning, interference, blockage
  Competing functions, uncoordinated functioning, rivalry – Possible chaotic functioning, interference, blockage, "stealing resources"
  Gaps between functions, unclear responsibilities – Necessary functions not performed due to "not my responsibility" effects
  Degraded function – Function gradually becomes substandard through uncorrected drift
  Silo organization – Departments do not communicate and take decisions independently
  Uncoordinated procedures, multiple incompatible systems – Difficulties in information transfer; communication errors
  Multiple incompatible procedures – Errors on transfer of staff between departments

Leadership relations
  Authoritarian – Poor error correction
  Democratic
  Paternal – Problematic if there are gaps in functioning
  Professional – Generally ideal
  Tightly linked professional – Ideal, but can give group error
  Absence of leadership – Loss of control; departments veer in their own chosen direction

Attitude to leadership and employees
  Respect – Good performance, initiative
  Tolerance
  Contempt – Poor performance, low initiative
  Animosity – Hidden or overt disobedience

Decision making (decisions made by work groups or the organization as a whole)
  Under-informed decisions, poor feedback – Erroneous decisions
  Lack of knowledge required for decision – Erroneous decisions
  Erratic, inconsistent decision making – Erroneous decisions
  Tunnel vision – Erroneous decisions
  Fixation – Erroneous decisions
  Over-hurried decision making – Erroneous decisions
  Competing opinions – Erroneous decisions
  Animosity in discussions – Delays, lack of consensus, poor coordination

Feedback
  Missing – Continued erroneous functioning
  Incomplete – Continued erroneous functioning
  Inaccurate – Erroneous decisions, incorrect functioning
  Biased – Erroneous decisions, incorrect functioning
  Fraudulent – Erroneous decisions, incorrect functioning

Safety attitude
  Systematic and professional – Generally no bad consequences
  Well-intentioned but uninformed – Lack of knowledge can cause accidents
  Complacent – Management does not worry too much about safety, just provides
  Macho – Risk taking, bypassing safety rules
  In denial – Management denies that risks are real or significant
  Dysfunctional – Management does not care; workers perform according to their own convenience

Supervision
  Strict and continuous – A degree of safety is provided in preventing initiating events or in activating safety measures
  Strict but sporadic or intermittent – Risk reduction in proportion to time present, plus encouragement to staff
  Absent – Encourages possibly harmful improvisation, no on-the-job training
  Complacent – As above

Communication (organization does not consider)
  Simple forgetting to communicate – Function not carried out
  Incomplete or erroneous circulation list – Function may be incomplete
  Language problem – Erroneous function
  Use of argot may be misunderstood – Erroneous function
  Communication abbreviated – Erroneous function
  Communication ambiguous – Erroneous function
  Erroneous reference (to item name or number, etc.) – Erroneous function
  Model mismatch between sender and recipient – Erroneous function
  Wrong recipient – Erroneous function
  Mixed or interfering messages – Erroneous function
  Communication channel overloaded – Omission of function
  Priority error in message processing – Omission of critical function
  Communication channel breaks – Function not carried out or incomplete
  Noise – Erroneous function
  Deliberate misinformation – Erroneous function

Instructions (organization does not prevent)
  Communication error as above – Erroneous or incomplete function
  Wrong instruction – Erroneous or incomplete function
  Incomplete instruction list – Erroneous or incomplete function
  Preconditions not specified – Erroneous or incomplete function
  Erroneous information – Erroneous or incomplete function
  Unreported essential background – Erroneous or incomplete function
  Conflicting instructions – Erroneous or incomplete function
  Work overload – Erroneous or incomplete function
  Priority error in work processing – Erroneous or incomplete function

Knowledge
  Essential knowledge not known, by the work group or by the organization (basic physics or chemistry; engineering; equipment working, generic; specific equipment knowledge; plant or installation knowledge; procedures and standard operating procedures (SOPs)) – Erroneous or incomplete function, or omission of function
  Misteaching – As above
  Mislearning – As above
  Knowledge not remembered or not recalled, slip of the mind – As above
  Knowledge which is inappropriate for the actual equipment or installation – Direct error, wrong functioning

Situational awareness (both individuals, such as managers, and the organization as a whole can lack awareness)
  Loss of awareness due to concentration – Required response not made
  No awareness due to tunnel vision – Required response not made
  Fixation – Wrong response to situation
  Loss of mode awareness – Inappropriate response
  Poor mode display – Inappropriate response

Work situation (organization does not prevent, or forces)
  High workload – Errors or omissions in function
  High peak workload – Errors or omissions in function
  High work intensity – Errors or omissions in function
  High work complexity – Errors or omissions in function
  Low workload, boredom – Errors or omissions in function
  Inadequate or poor resources (tools, materials, equipment, HMI, access) – Errors or omissions in function

Work environment (organization does not prevent the effects of)
  Noise; interruptions; distractions; too high or low temperature; too high or low humidity; draft; poor lighting; rain; wind; snow; ice; exposure, height – each: Errors or omissions in function

Work demand (organization does not consider)
  Concentration, precision, speed, balance, strength, personal size, endurance – Requirements exceed capability
  Organization induces: chronic work stress; stress during an emergency – Requirements exceed capability
Work pattern (organization does not sufficiently consider the effects of)
  Shift work – Reduced reliability and speed
  High work intensity – Reduced reliability and speed
  Shift length – Reduced reliability and speed
  Lack of sleep – Reduced reliability and speed
  Non-work loads on time, e.g., transport – Reduced reliability and speed

Personal issues
  Health – Reduced reliability and speed
  Economy – Reduced reliability and speed
  Family – Reduced reliability and speed
  Frustration – Reduced reliability and speed

Self-confidence
  Lack of confidence – Failure to perform function
  Overconfidence – Performance without needed checks; risk-taking behavior

Responsibilities
  Unclear – Omission of function
  Unspecified – Omission of function
  Split responsibility – Omission of function
  Inadequate support or backup – Inconsistent performance

Procedures
  Nonexistent – Errors of omission and commission
  Out of date – See Human error chapter
  Incomplete
  Incorrect
  Poor style, difficult to understand
  Procedural drift – Incorrect procedure

Training
  Nonexistent – Errors of omission and commission in function
  Incomplete scope – Omissions
  Long training cycle; persons must function while waiting for training – Errors of omission and commission in function
  Training backlog
  Poor style, low retention
  Inadequate refreshers
  Erroneous training

QA/QC
  Inadequate, gaps – Mistakes and failures not caught
  Limited scope – Mistakes and failures not caught
  Under-dimensioned – Mistakes and failures not caught
  Unreliable – Mistakes and failures not caught
  Hidden problems – Mistakes and failures not caught

Cultural
  Racial, religious, or orientation prejudice – Tension and lack of cooperation
  National and racial norms – Deviation from expected behavior, not necessarily bad, but can lead to mistakes

Violations (violations can occur at all levels in an organizational hierarchy)
  Substance abuse – Errors and omissions
  Hiding serious health problems – Slowness, unreliability
  Smoking in hazardous areas – Fire
  Lack of cooperation for job protection – Slow functioning, lack of knowledge for replacement staff
  Intergroup rivalry, misreporting, and accusations – Tension and lack of cooperation
  Work-to-rule – Delays in function
  Rogue contractors – Theft, fraud, poor work, hazardous equipment or product
  Fake parts or materials – Premature failure or failure to work on demand
  Fake work – Omission of possibly necessary functions
  Theft – Possible loss of safety equipment
  Fraud – Inadequate equipment supplied
  Sabotage – Direct cause of accidents
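The taxonomy above lends itself to machine-readable encoding, which is what makes the semi-automated screening sketched earlier possible. The fragment below shows one possible encoding of a small part of the taxonomy; the data structure and lookup function are assumptions for illustration, not the format used by the HAZEX/ELAN tools.

```python
# Sketch: a machine-readable encoding of part of the organizational
# failure taxonomy from the Appendix. The format is illustrative only.

taxonomy = {
    "Communication": [
        ("Simple forgetting to communicate", "Function not carried out"),
        ("Incomplete or erroneous circulation list", "Function may be incomplete"),
        ("Communication channel overloaded", "Omission of function"),
    ],
    "Responsibilities": [
        ("Unclear", "Omission of function"),
        ("Split responsibility", "Omission of function"),
        ("Inadequate support or backup", "Inconsistent performance"),
    ],
}

def causes_with_effect(effect_keyword):
    """List taxonomy entries whose possible effect mentions a keyword,
    e.g. to pull out all causes that can lead to an omitted function."""
    return [
        (group, cause)
        for group, entries in taxonomy.items()
        for cause, effect in entries
        if effect_keyword.lower() in effect.lower()
    ]

if __name__ == "__main__":
    for group, cause in causes_with_effect("omission"):
        print(f"{group}: {cause}")
```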
References

[1] Taylor, J. R., 2016, "Does QRA Help to Reduce Risk? A Study of 92 Risk Analyses Over 36 Years," Chem. Eng. Trans., 48, pp. 811–817.
[2] Health and Safety Executive, 1975, "The Flixborough Disaster: Report of the Court of Inquiry," HMSO, Richmond, UK.
[3] Chemical Safety Board, 2002, "This is the U.S. National Organisation Providing in Depth Accident Investigation Reports for Industry."
[4] TNO, 1996, "Methods for the Calculation of the Effects of the Escape of Dangerous Materials," Dutch Ministry of Labour.
[5] Taylor, J. R., 2015, Human Error in Process Plant Design and Operations, CRC Press/Taylor and Francis, Boca Raton, FL.
[6] Hurst, N. W., Bellamy, L. J., and Geyer, T. A. W., 1990, "Organisational Management and Human Factors in Quantified Risk Assessment, a Theoretical and Empirical Basis for Modification of Risk Estimates," Safety and Reliability in the 90's (SARRS '90), Water and Cox, eds., Elsevier Applied Science, Amsterdam, The Netherlands.
[7] Bird, F., 1974, Management Guide to Loss Control, Institute Press, Atlanta, GA.
[8] API, 2000, "Risk Based Inspection Base Resource Document," American Petroleum Institute, Washington, DC, Standard No. API PUBL 581.
[9] Thomas, P., Hudson, W., Reason, J. T., Bentley, P. D., and Primrose, M., 1994, "Tripod Delta: Proactive Approach to Enhanced Safety," J. Pet. Technol., 46(1), epub.
[10] Tripod Foundation, 2008, "Tripod Beta User Guide," Stichting Tripod Foundation, London.
[11] Reason, J. T., 1990, Human Error, Cambridge University Press, Cambridge, UK.
[12] De Landre, J., Gibb, G., and Walters, N., 1983, "Using Incident Investigation Tools Proactively for Incident Prevention," Safety Wise Solutions Pty Ltd, US Nuclear Regulatory Commission, Washington, DC, https://www.asasi.org/papers/2006/Payne_Stewart_Learjet_Investigation_De%20Landre_Gibb_Walters_DOC.pdf
[13] Reason, J. T., 1997, Managing the Risks of Organizational Accidents, Ashgate, Aldershot, Hants, UK.
[14] Reason, J. T., 2015, Organisational Accidents Revisited, CRC Press, London.
[15] Swain, A. D., and Guttman, H. E., 1983, "Handbook of Human Reliability Analysis With Emphasis on Nuclear Power Plant Applications," U.S. Nuclear Regulatory Commission, Washington, DC, Report No. NUREG/CR-1278.
[16] Embrey, D. E., 1992, "Incorporating Management and Organisational Factors Into Probabilistic Safety Assessment," Reliab. Eng. Syst. Saf., 38, pp. 199–208.
[17] Davoudian, K., Wu, J.-S., and Apostolakis, G., 1994, "Incorporating Organizational Factors Into Risk Assessment Through the Analysis of Work Processes," Reliab. Eng. Syst. Saf., 45(1–2), pp. 85–105.
[18] Embrey, D. E., Humphreys, P. C., Rosa, E. A., Kirwan, B., and Rea, K., 1984, "SLIM-MAUD: An Approach to Assessing Human Error Probabilities Using Structured Expert Judgment," U.S. Nuclear Regulatory Commission, Washington, DC, Report No. NUREG/CR-3518.
[19] Cooper, S. E., Ramey-Smith, A. M., and Wreathall, J. A., 1996, "A Technique for Human Error Analysis (ATHEANA)," U.S. Nuclear Regulatory Commission, Rockville, MD.
[20] Forester, J. A., Bley, D. C., Cooper, S., Kolakzowski, A. M., Thompson, C., Ramey-Smith, A., and Wreathall, J., 2000, "A Description of the Revised ATHEANA (A Technique for Human Event Analysis)," Report No. NUREG-1624, Rev. 1, U.S. Nuclear Regulatory Commission, Washington, DC.
[21] Rasmussen, J., 1982, "Human Errors: A Taxonomy for Describing Human Malfunction in Industrial Installations," J. Occup. Accid., 4(2–4), pp. 311–333.
[22] Mohaghegh, Z., Kazemi, R., and Mosleh, A., 2009, "Incorporating Organizational Factors Into Probabilistic Risk Assessment (PRA) of Complex Socio-Technical Systems: A Hybrid Technique Formalization," Reliab. Eng. Syst. Saf., 94(5), pp. 1000–1018.
[23] Aven, T., Sklet, S., and Vinnem, J. E., 2006, "Barrier and Operational Risk Analysis of Hydrocarbon Releases (BORA-Release), Part I: Method Description," J. Hazard. Mater., 137(2), pp. 681–691.
[24] Sklet, S., Vinnem, J. E., and Aven, T., 2006, "Barrier and Operational Risk Analysis of Hydrocarbon Releases (BORA-Release), Part II: Results From a Case Study," J. Hazard. Mater., 137(2), pp. 692–708.
[25] Gran, B., Rolf, B., Nyheim, O. M., Okstad, E. H., Seljelid, J., Sklet, S., Vatn, J., and Vinnem, J. E., 2012, "Evaluation of the Risk OMT Model for Maintenance Work on Major Offshore Process Equipment," J. Loss Prev. Process Ind., 25(3), pp. 582–593.
[26] Pence, J., Mohaghegh, Z., Ostroff, C., Kee, E., Yilmaz, Z., Grantom, R., and Johnson, D., 2014, "Toward Monitoring Organizational Safety Indicators by Integrating Probabilistic Risk Assessment, Socio-Technical Systems Theory, and Big Data Analytics," Probabilistic Safety Assessment & Management (PSAM 12), Honolulu, HI, 2014; CreateSpace Independent Publishing Platform, 2016.
[27] Alvarenga, M. A. B., Frutuoso e Melo, P. F. F., and Fonseca, R. A., 2014, "A Critical Review of Methods and Models for Evaluating Organizational Factors in Human Reliability Analysis," Prog. Nucl. Energy, 75, pp. 25–41.
[28] Kennedy, R., and Kirwan, B., 1998, "Development of a Hazard and Operability-Based Method for Identifying Safety Management Vulnerabilities in High Risk Systems," Saf. Sci., 30, pp. 249–274.
[29] Jain, P., Rogers, W., Pasman, H., and Mannan, M. S., 2018, "A Resilience-Based Integrated Process Systems Hazard Analysis (RIPSHA) Approach, Part II: Management System Layer," Process Saf. Environ. Prot., 118, pp. 115–124.
[30] Leveson, N. G., 2013, "An STPA Primer, Version 1."
[31] Leveson, N. G., and Thomas, J. P., 2018, "STPA Handbook," accessed Nov. 30, 2018, http://psas.scripts.mit.edu/home
[32] Hollnagel, E., 2004, Barriers and Accident Prevention, Ashgate, Aldershot, UK.
[33] Hollnagel, E., 2012, FRAM: The Functional Resonance Analysis Method: Modelling Complex Socio-Technical Systems, Ashgate, Farnham, UK.
[34] Taylor, J. R., and Rasmussen, J., 1976, "Notes on Human Factors Problems in Process Plant Reliability and Safety Prediction," Risø National Laboratory, Roskilde, Denmark.
[35] Taylor, J. R., 1994, Risk Analysis for Process Plant, Pipelines and Transport, Taylor and Francis/Spon, London.
[36] Rasmussen, J., 1974, "The Human Data Processor as a System Component: Bits and Pieces of a Model," accessed Nov. 30, 2018, http://orbit.dtu.dk/en/publications/
[37] Taylor, J. R., Hansen, O., Jensen, C., Jacobsen, O. F., Justesen, M., and Kjærgaard, S., 1982, "Risk Analysis of a Distillation Unit," Technical University of Denmark, Lyngby, Denmark, accessed Oct. 3, 2019, http://orbit.dtu.dk/files/88560585/ris
[38] Hollnagel, E., Rosness, R., and Taylor, R. J., 1990, "Human Reliability and the Reliability of Cognition," Third International Conference on Human Machine Interaction and Artificial Intelligence in Aeronautics and Space, Blagnac, France, Sept. 26–28.
[39] Perrow, C., 1984, Normal Accidents: Living With High Risk Technologies, Princeton University Press, Princeton, NJ.
[40] Dekker, S., 2011, Drift Into Failure, Ashgate, Farnham, Surrey, UK.
[41] Tichy, N., 1973, "An Analysis of Clique Formation and Structure in Organizations," Administ. Sci. Q., 18(2), pp. 194–208.
[42] Bruun, O., Rasmussen, A., and Taylor, J. R., 1979, "Cause Consequence Reporting for Accident Reduction: The Accident Anatomy Method," accessed Oct. 3, 2019, http://orbit.dtu.dk/en/publications/cause-consequence-reporting-for-accident-reduction-the-accident-anatomy-method
[43] Nielsen, D., 1975, "Use of Cause-Consequence Charts in Practical Systems Analysis," Reliability and Fault Tree Analysis: Theoretical and Applied Aspects of System Reliability and Safety Assessment, Society for Industrial and Applied Mathematics, Philadelphia, PA.
[44] Verma, T., and Pearl, J., 1990, "Equivalence and Synthesis of Causal Models," Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence, Boston, MA, July 27–29, Morgan Kaufmann, pp. 220–227.
[45] Taylor, J. R., 2017, "Automated HAZOP Revisited," Process Saf. Environ. Prot., 111, pp. 635–651.
[46] Marca, D. A., and McGowan, C. L., 1988, SADT: Structured Analysis and Design Technique, McGraw-Hill, New York.
[47] Kletz, T., 2009, What Went Wrong?, 5th ed., Elsevier, Amsterdam, The Netherlands.
[48] Sanders, R. E., 2016, Chemical Process Safety: Learning From Case Histories, 3rd ed., Butterworth-Heinemann, Oxford, UK.
[49] Deighton, M. G., 2016, Facility Integrity Management, Elsevier, Amsterdam, The Netherlands.