J. Robert Taylor
Engineering Systems Division, Department of Management Engineering, Technical University of Denmark, Building 424, 2800 Kongens Lyngby, Denmark
e-mail: roberttayloritsa@gmail.com

Organizational Failure Analysis for Industrial Safety

Organizational and management errors and failures represent a very significant causal influence in industrial accidents. A method, organizational failure analysis (OFA), is described for in-depth identification of organizational deficiencies and failures that can lead to accidents. The method was developed on the basis of an extensive data collection from safety management audits, accident and incident investigations, and emergency training exercises led by the author over a period of 26 years. From these data, a number of models of organizational performance and management behavior are derived. The models allow a semi-automated application of the method, which is important for application to large organizations. The method is described with examples, and the results of several studies aimed at validating the method are given. [DOI: 10.1115/1.4044945]

Keywords: human error, management error, organizational failure, safety management audit

Introduction

Management and organizational factors are identified as contributory causes in the majority of industrial accidents. In a review of major accidents from the author's own experience in the chemical, oil, and gas industries, 77% of the accidents involved management errors and omissions and 35% involved organizational deficiencies. Of 22 accidents with fatal consequences, all but one had a management error or an organizational failure as a direct cause or major causal influence [1]. Failures were mostly those of omission, but about 25% of the management errors were those of commission. (These statistics were taken from follow-up studies of 103 risk analyses and safety management audits made by the author or his team over a period of 40 years. Examples in this paper are taken from these follow-up reviews or from the audit reports, unless otherwise stated.)

Failures can also arise in organizations themselves, rather than from individual managerial errors. The interactions between two workers can fail, and a work team can fail because of a misunderstanding or poor team coordination. The organization can also be defective in not providing a communication mechanism. As an example of a deficient organization, two gas companies were separated only by a chain link fence. A team from one company was replacing a valve. They vented the gas from the short unrelieved pipe section. The gas contained hydrogen sulfide. A team in the other company was testing and calibrating instruments nearby, and was exposed to the gas. There was no organizational mechanism to ensure coordination between work teams in the two companies. As another example, in the Flixborough accident, a major explosion occurred because a plant pressure piping modification was made without the post of plant mechanical engineer being manned [2].

As an example of a defective organizational structure, a major process engineering company audited had a well-developed management of change system to cover design changes made during plant commissioning. However, a safety management audit led by the author revealed that this system was used only for costing and charging for changes. There was no safety assessment of any design changes.

Of course, it is possible to recognize management and organizational deficiencies in almost any accident.
It is the nature of accident investigation to seek to prevent similar accidents in the future, and therefore to devise methods to do so, including those of organizational change. To prevent this study from devolving into a search into an ever-expanding set of problems, with an ever-expanding definition of organizational failure, the term safety-related organizational failure is taken here to mean a deviation from the requirements of the U.S. Code of Federal Regulations (CFR) Part 1910.119, process safety management of highly hazardous chemicals. The regulation is here applied to a full range of accident scenarios, not just to the highly hazardous ones.

This paper describes a method aimed at identifying potential organizational failures and managerial errors. It seeks to analyze the potential as close to the root causes as possible in order to enable preventive measures to be proposed. It is primarily based on direct observations in the oil, gas, and chemical industries from safety management audits. Examples of application of the method and validation studies are described.

There are few accidents directly caused by management failure, as can be seen by reviewing the accident investigation reports published by the U.S. Chemical Safety Board [3]. Where such accidents do occur, it has generally been due to managers issuing a direct order or a prohibition in a hazardous situation. For the most part, organizational failures and managerial errors have adverse effects by causing operator, maintenance, and other work errors. For this reason, organizational failures will, for the most part, only cause accidents by functioning as error-inducing or error-forcing events and conditions at the hands-on level of plant or system operation (Fig. 1). Consequently, organizational failure analysis (OFA) will only be meaningful as an adjunct to operator or maintenance error analysis, and these in turn depend on plant and control risk analyses.

Motivation—Why Make Organizational Failure Analyses?

Because organizational failure is so central to accident causality, it is of practical interest to be able to understand how managerial and organizational failures occur, and to determine ways in which the errors and failures can be prevented. Most of the methods developed for organizational failure analysis have, however, been focused on general understanding after accidents or on prediction in order to make risk analyses more complete.

It would be logical to include organizational failure in risk analysis methods. However, outside the field of nuclear power, even operator and maintenance error analyses are not included in standard procedures such as Ref. [4], despite the methods for operator error analysis being well developed and validated [5]. Attempts were made by some to include a "management factor" into risk analysis, by Hurst et al. [6]. However, from professional experience as a third-party reviewer of risk analyses, this approach was not well liked by the managements paying for the analyses. Also, the results were often rejected by authorities as too likely to be subject to change.
The area where organizational failure assessment can definitely be used is in safety management auditing, as is shown by the examples in the sections on validation and reflections below. Such audits are standard practice in many companies and generally involve a team which includes a leader from an external organization, and may include members from other divisions in the same organization. The team performs a review of the organization, its work procedures, the actual performance of work, and documentation of work done. An audit generally produces a report, and a presentation is made which should include recommendations for improvement. The work generally requires a very diplomatic approach in order to achieve success in improving safety. Also, for success, the audit process requires strong support from the most senior management.

This work is intended to support safety management auditing and identification of possibilities for risk reduction by organizational improvement.

Earlier Work—Safety Management Auditing

An approach to organizational and management safety problems arose through the 1970s, for use in loss prevention. The international safety rating system (ISRS) is a systematic approach to safety developed by Frank Bird. His approach is an audit method which includes study of the management [7] and is offered to companies as an auditing service by DNV-GL. It includes many of the issues described in this paper, but has a much wider scope and less depth (259 audit issues). ISRS has been applied in a large number of companies worldwide.

An extended audit checklist for organizational and management phenomena affecting risk is given as Appendix D of the API Publication 581 risk-based inspection base resource document [8]. This also gives a scoring method producing a factor which can be used to modify a risk analysis value at the physical level. The method is based on engineering judgment rather than objective evidence.

A more theoretically based approach to auditing was developed by Reason in the form of the TRIPOD method [9]. Tripod-DELTA is a checklist-based approach to carrying out safety "health checks" in process plant. The issues addressed are: hardware, design, maintenance management, procedures, error enforcing conditions, housekeeping, incompatible goals, organization, communication, training, and defenses. Quoting from the manual [10], "Tripod delta is a scientifically developed and proven means of measuring performance and determining which areas of the business are vulnerable to incidents." Comprising a database of 1500 questions, it does this by asking companies to answer a random selection of 275 of these. Each question is about the occurrence of an unsafe act, and when responses are examined, it is possible to determine in which basic risk factors, i.e., which organizational issues, the organization is performing well and where improvement is needed, based on the answers to these questions. This method is wider in scope than that of this paper, though not as focused on organizational error. An important feature taken from the method [11] is to regard accidents as arising from:

- fallible decisions and omissions at the managerial level, leading to
- latent failures, leading to
- line management deficiencies, leading to
- precursors (forcing and inducing conditions) for unsafe acts, leading to
- unsafe acts, and
- accidents if defenses are inadequate (which can also result from errors at the management level).
The tripod-delta method has been developed further by Gibb and coworkers [12] in the form of the incident cause analysis method (ICAM), for application to transport organizations.

Fig. 1 Overall analysis model with the topic of this paper highlighted
Reason has also written two books on organizational failure with many examples of organizationally caused accidents [13,14], which have proved useful in checking the completeness of the work here.

Earlier Work—Organizational Factors in Risk Analysis

There has been extensive work on incorporating organizational factors into risk analysis. Only a few of these methods have direct relevance to the oil, gas, and chemical industries, not due to any deficiencies in the methods, but due to the fact that current guidelines for risk analyses in these industries do not require any human or organizational component (see, e.g., Ref. [8]). An important exception is the Norwegian work summarized below (Refs. [15-17]).

Organizational influences on operator error were incorporated into Swain and Guttman's technique for human error rate prediction (THERP) as performance-shaping factors [15]. Organizational factors such as degree of training, quality of administrative controls, team size, workload, staffing level, and communications were included as mathematical factors ("performance-shaping factors") used to multiply baseline operator error probabilities.

A tradition arose during the 1990s of improving human error analysis for risk analysis purposes. Most of these methods either included or focused on organizational issues. Embrey developed the MACHINE method for incorporating organizational factors into probabilistic safety assessment [16]. The method uses influence diagrams to support quantification of the influence of policy deficiencies on error-inducing factors and of error-inducing factors on human (operator) error. The error-inducing factors are [inadequate] training and experience, distractions, procedures, fatigue, workplace environment, and responsibilities and supervision; the policy deficiencies recognized are [inadequate] project management, safety policy, safety culture, risk management, design of instructions, training policy, and communication systems.

Davoudian et al. [17] developed the work process analysis model for analyzing the impact of organizational phenomena on safety. The method takes a conventional process plant fault tree as a starting point and, for each cut set, selects candidate parameter groups which affect the basic events in the cut set. Note that each cut set, in combination with the fault tree top event, defines an accident or a system failure scenario. The candidate parameter groups are failure to restore (RE), miscalibration (MC), unavailability due to maintenance (UM), failure to function on demand (FR), common cause failures not due to human error (CCF), and time available for recovery (TR). The candidate parameter groups depend on organizational factors:

- Centralization
- Communication—external
- Communication—interdepartmental
- Communication—intradepartmental
- Coordination of work
- Formalization
- Goal prioritization
- Organizational culture
- Organizational knowledge
- Organizational learning
- Ownership [of issues or problems]
- Performance evaluation
- Personnel selection
- Problem identification
- Resource allocation
- Roles and responsibilities
- Technical knowledge
- Time urgency
- Training

Ratings for each of these factors are used to adjust basic event probabilities. The success likelihood index (SLIM) method [18] is used to estimate conditional probabilities based on expert judgment.

Cooper et al. developed a technique for human event analysis (ATHEANA) [19,20].
The error mode identification in this method addresses failures to respond correctly to emergency simulations, rather than errors during normal plant operation. The causal mechanisms are very similar to those in Rasmussen's human error taxonomy [21]. There is an emphasis, however, on quality of training, quality of procedures, time pressures, workload, crew dynamics, and resources available, i.e., on organizational influences. The method also distinguishes between error-inducing conditions and performance-shaping factors.

Mohaghegh et al. [22] described the fundamental principles of modeling for use in organizational failure, setting important principles for relations between causal factors in a model. They then presented a method, socio-technical risk analysis, which relates safety culture, organizational structure, safety attitudes, and safety performance.

One of the problems in developing analysis methods for organizational failure is that of obtaining field data adequate to support and validate the method. Norwegian teams developed the barrier and operational risk analysis (BORA) [23,24] and risk-organizational, human, technical (Risk-OMT) methods. Risk-OMT [25] uses the fault tree/event tree methodology to model accident event sequences and causal Bayesian networks to model managerial and organizational risk-inducing factors. In applying the method to offshore maintenance, the team made use of actual incident (leakage) data from the Norwegian offshore industry. They also carried out interviews of persons involved in maintenance and studied surveys reported to the Petroleum Safety Authority of Norway from 2002 onward. The quantity of data is such that conditional probabilities could be derived. The risk-inducing factors considered were those identified as important to the offshore maintenance tasks and the occurrence of leaks. The factors are:

Management competence: Competence
Management information: Disposable work descriptions; Governing documents; Technical documents
Management technical: Design; Human machine interface (HMI)
Management general: Communication
Management task: Supervision; Time pressure; Workload; Work motivation

Work very similar in purpose and evidence-based methodology was carried out by Pence et al. [26]. They describe a method for causal modeling and a way of data mining from large quantities of textual data in incident reports.

A review covering organizational factors in human reliability analysis was published by Alvarenga et al. [27]. The review distinguishes between systemic methods (ones which model the organizational system) and ones which are primarily causal factor analysis without any underlying system, and provides an in-depth review of the most widely used methods.

Earlier Work—Hazard and Operability Analysis-Based Methods for Organizational Problem Identification

Kennedy and Kirwan [28] developed a method called safety culture hazard and operability analysis (HAZOP) in order to address the managerial contribution to accident causality.
They applied a modification of the traditional HAZOP guide words "missing, skipped, mistimed, more, less, wrong, as well as, other" to the parameters "person/skill, action, procedure/specification, training, information, resources, detail, protection, decision, control, communication."

The resilience-based integrated process systems hazard analysis [29] is a method based on HAZOP which integrates analysis at the plant equipment level with failure analysis of management functions. It especially focuses on resilience methods such as plasticity, early detection, error-tolerant design, and recoverability in order to mitigate or prevent accidents.

Earlier Work—Systems Modeling

More recently, i.e., since the late 1990s, another tradition has arisen with a focus on identification and solving of safety problems in organizations, rather than support for risk assessment.

Systems theoretic process analysis (STPA) [30] is a method for describing organizations as a nested set of control systems. Functional failure analysis based on control system concepts is then applied in order to identify possible function failures or deviations. The consequences of failure are then evaluated by failure effect propagation tracing in the same way as is done in failure mode and effects analysis (FMEA), HAZOP, or cause consequence analysis. The results, in terms of failure modes and significant consequences, are tabulated, and proposals are made for prevention or mitigation. The functional failure modes considered in STPA are "not providing a function," "providing a function," "incorrect timing or sequence," "stopped too soon," and "provided too long." In Leveson and Thomas's latest version [31], no attempt is made to investigate the causes of functional failure modes. STPA has been enormously successful in providing safety analyses for a wide range of systems in process plants, medical, military, and space systems.

Hollnagel's functional resonance analysis method (FRAM) [32,33] was developed as a method for describing the complex organizational interactions in an organization which can lead to accidents. It focuses especially on the fact that small deviations from nominal behavior can accumulate and interact (Hollnagel uses the term resonance) to produce large accidental consequences. The start of an analysis is a functional description of an activity, preferably as it is carried out rather than as it is imagined by the analysts or the managers. The analysis then involves a functional failure analysis taking into account the inputs for a function, the resulting output, the dependency on preconditions, on resources, on controls, and on timing and sequence. The method has been widely applied.

Functional resonance analysis method is an important development in functional failure analysis, quite apart from its value as a method for organizational analysis. First, it recognizes the importance of multiple small deviations with cumulative effects. This differs from most analysis methods, such as FMEA, functional failure analysis, and fault tree analysis, which regard failures as all-or-nothing events. Second, it incorporates preconditions, resources, controls, and timing factors into functional failure analysis, providing significant additional guidance in finding causes of functional failure.
These concepts can be applied to technical systems failure just as well as to organizational failure.

Another publication of note is Reason's "Managing the Risks of Organizational Accidents" [25], which gives a broad description of organizational defects.

Organizational Failure Analysis

The origin of this work was the development of the action error analysis method for operator errors [34]. This method uses an extended functional failure analysis to identify error modes (see Table 1) and an error causal analysis for error mechanisms based on Rasmussen's skill-rules-knowledge model of operator performance [35,36]. The method was validated qualitatively by using it in the design of a small chemical plant [37], and then following incidents in the plant over a number of years, with reasonable agreement between the predicted errors and the actual near misses observed.

Application of the action error analysis method to oil, gas, and chemical plants, military command and control systems, and the International Space Station over the following years necessitated extending the operator error causal mechanisms to communications in work groups and to management commands [38]. Collection of human error probability data to support the method was carried out by review of 103 risk analyses performed or led by the author over a period of 36 years. A review was also made of accidents which subsequently occurred in these plants [5]. Many of the operator error causes which were found important in this study were actually due to error-forcing conditions arising from management error and organizational deficiencies.

This work can be regarded as an extension of the work on action error analysis. The method has been used for oil, gas, and chemical plants and for aerospace and military systems analysis. It has been reasonably widely applied in Scandinavia.
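As a rough illustration of the enumeration step at the core of action error analysis, and of the organizational failure analysis procedure described next, the following minimal sketch applies a checklist of functional failure modes to each step of a procedure to generate candidate errors for later causal analysis. The step names, the abbreviated mode list, and the data structures are hypothetical examples for illustration, not the author's implementation.

# Minimal sketch: applying a functional failure mode checklist to each
# step of an operating procedure, to generate candidate action errors
# for further consequence and causal analysis. Names are illustrative.
from dataclasses import dataclass

@dataclass
class CandidateError:
    step: str
    failure_mode: str

FAILURE_MODES = [            # abbreviated selection, in the spirit of Table 1
    "No function", "Partial function", "Premature function",
    "Delayed function", "Wrong sequence", "Wrong object of function",
]

def enumerate_action_errors(procedure_steps):
    """Return one candidate error per (step, failure mode) pair."""
    return [CandidateError(step, mode)
            for step in procedure_steps
            for mode in FAILURE_MODES]

if __name__ == "__main__":
    steps = ["Isolate pipe section", "Vent residual gas", "Remove valve"]
    for err in enumerate_action_errors(steps):
        print(f"{err.step}: {err.failure_mode}")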
Action error analysis is important as a background to the following organizational failure analysis and involves:

(1) establishing the series of steps in an operating procedure, either from written procedures or by observation,
(2) applying a rather long series of action error modes (including normal action in the presence of latent hazards) to each step to derive the starting points for analysis,

Table 1 Functional failure modes for action error analysis [5]

Functional failure mode | Deviation: action | Deviation: check | Deviation: decision
No function | Failure to act | Failure to check | No decision
Partial function | Action incomplete | Partial check | —
Inadequate function | Too little action | — | —
Excessive function | Too much action | — | —
Imprecise function | Imprecise action, wrong direction | — | —
Spurious (unwanted) function | Unintended or unwanted act | — | Decision when none required
Premature function | Action too early | Premature check, check at wrong time | Premature decision
Delayed function | Action too late | Delayed check | Delay in decision
Wrong sequence | Action in wrong sequence | — | —
Forget then remember function later | Forget then remember action | — | —
Unwanted repeated function | Duplicated action | — | —
Wrong function | Wrong action | Wrong check | Erroneous decision
Wrong parameter value | Wrong action value | Check with wrong value, wrong criterion | Wrong value in choice
Wrong object of function | Action on wrong object | Check of wrong object | —
Correct function but precondition not satisfied | Correct action with latent problem or hazard | Wrong setup for check | Decision made ignoring important preconditions
Correct function but with latent hazard or deviation or unwanted side effect | Correct action with latent problem or hazard or unwanted side effect | Check made with unwanted side effect of checking | Decision made ignoring latent hazard or side effect

(3) for each action error, determining the series of events which could result from the action error,
(4) determining the possibilities for recovery or mitigation of the error, and
(5) for those action errors leading to accidents or other significant effects, determining the possible error-forcing or error-influencing conditions from the error cause taxonomy.

Organizational failure analysis extends action error analysis into the higher levels of operations management and organization. The purpose of organizational failure analysis is to allow deep causes of organizational failure at the psychological and social levels to be identified. As a starting point, a description of the organization is needed as a basis for the analysis. The methods of representation used are organization organograms, functional block diagrams, or procedural flowcharts. The organizational failure analysis procedure is then:

(1) From interviews, organizational documents, and procedure descriptions, create a functional block diagram and, if necessary, procedural flowcharts for the organization.
(2) Select parts of the organization which are particularly significant for safety.
(3) Confirm the block diagram and the procedures for the organization by direct observation, by review of organizational products such as work orders, completion records, quality control reports, and safety inspection reports, and especially by interviews with workers focused on any difficulties they may have.
(4) Revise or annotate the block diagrams or flowcharts.
(5) Perform a functional failure analysis or action error analysis for the parts of the organization selected for analysis.
The checklist of functional failure modes is given in Table 1.
(6) Trace the consequences of the functional failures using conventional disturbance propagation analysis procedures, together with the modification for small deviations described below.
(7) For those functional failure modes which have significant consequences, such as accidents, identify organizational failure causes using the checklist of organizational failure phenomena given in the Appendix.
(8) Identify safeguards which can mitigate or prevent the consequences of the functional failures or which can reduce the effects of organizational failure causes.
(9) Report the results and any recommendations for risk reduction, and add these to the risk management action list.

Linear and Nonlinear Accident Models

One criticism which has been made of hazard identification methods is that they are "linear," while actual accidents are often "nonlinear." By linear is meant that the accident scenarios modeled start with an initiating event and proceed through a sequence of intermediate events to a significant consequence. Along the path, the scenarios may involve the failure or failed state of one or more safety barriers. Linear accident scenarios can be described graphically using the "Swiss cheese" model of Reason [11].

This criticism is not appropriate for all methods. The fault tree analysis method was never "linear." The HAZID-bow tie method requires fault tree like branching for every safety barrier, to record the potential causes of safety barrier failure, the safety critical activities which are intended to assure safety barrier integrity, and the deficiencies which can arise in the safety critical procedures. Nevertheless, the criticism is justified in that none of the standard risk analysis methods address interactions across an organization, with interference between apparently unrelated functions. The usual functional failure analysis procedure follows intended functional paths and identifies deviations along these paths. This in turn means that it does not identify many accidents of the type called "normal accidents" by Perrow [39], or the steady drift into poor performance described by Dekker [40]. A modification to the procedure which allows analysis of "nonlinear accident event sequences" is to record side effects of failures and errors continuously through the course of an analysis, and to take these into account once the primary analysis has been completed, by extending the search for causes to include these side effects.

The previously mentioned procedure does not identify the effects of multiple small deviations from standard procedures of the kind identified as important by Hollnagel [33]. For example, when analyzing an emergency organization, delays in response may not just be large ones which prevent an effective response. They can be small ones at different stages in the emergency response which collectively can result in disaster. Failures, errors, and relatively innocuous "happenstances" can interact to cause an accident.

As an example, consider one accident investigated by the author in which a team was performing maintenance inside a distillation column. They were using breathing air supplied by hose from a battery of compressed air bottles. Part way through the task, additional complexities arose, prolonging the maintenance task. A message was sent to fetch a new air cylinder battery. However, three things occurred.
First, one of the air-lines in use by another team began to leak, increasing air demand. Second, there were two additional persons performing the work, and there was no "breathable air watchman" to monitor air use, as this was not a standard practice for the company. Third, the truck bringing the air battery was delayed by fallen scaffolding blocking the road. Air ran out, and the team had to be rescued by emergency responders using self-contained breathing apparatus.

This kind of problem can be addressed by extending the range of consequences in an analysis to include small deviations in the consequence analysis. This can be very difficult to carry out manually, but the semi-automated methods described below can make the analysis practical.

Organizational Failure Phenomena

Various organizational phenomena leading to deviations from standard operating procedures were identified from 32 safety management audits led by the author over a period of 30 years for plants in the oil, gas, and chemical industries. The companies varied in performance from that of high-integrity organizations to a few with decidedly poor performance. The audits were generally extensive, involving up to 10 persons and lasting from one to several weeks. Also included were observations from accident investigations. Some examples from these studies are illustrated in case histories in Ref. [5].

Any empirical study of this kind will produce a large number of organizational failure phenomena, and any checklist will be very difficult to administer. The list was therefore structured according to the influences imposed on the organization, using ideas drawn from Leveson and Thomas [31]. The model used is shown in Fig. 2. The structured list of organizational failure phenomena is shown in the Appendix. Note that STPA is not the only possible structuring, but it is convenient when there is a hierarchical organization. Other organizational structures are described below.

The organizational phenomena listed in the Appendix may be direct causes of accidents, but they may also be "causes of causes." Figure 3 gives an example of a chain of causes identified during accident investigation for a permit to work (PTW) system involving a permit to work department.

Using this method requires some industrial experience on the part of the analyst. The requirement is minimized by provision of a guide book which explains the individual terms and gives examples.

Organizational Structures

The most obvious organizational structure to consider in an organizational failure analysis is that of the formal management hierarchy. There are, though, other structures that provide an
informal organization parallel to the management hierarchy. One type is that of cliques, which are groups of persons with tight social links, generally with each member having some form of relationship to all the others. These can be social links such as clubs, religious ties, and nationality, and can also be work-related ties, such as cross-departmental projects [41]. Cliques can have a very beneficial effect, allowing easy communication, division of labor, and mutual support at work in a cohesive group. Cliques can also be pernicious, propagating erroneous beliefs and enabling nepotism. During safety management audits, it is sometimes possible to identify such cliques.

Fig. 2 Organizational failure influences
Fig. 3 Causal influence chains of organizational function deviations leading to a failure of the permit to work system

Another structure found to be important is that of a functional or logistic chain. These are subgroups, not necessarily in the same organization, each of which carries out a part of a task. An
example is supply of spare parts, which requires scheduling, ordering, payment, supply, import, transport, warehousing, and distribution. Organizational failure can occur in any step of the chain, and also in the feedback along the chain needed for clarifications and for correction of any deviations, such as substitution for out-of-stock items.

Emergent Phenomena in Organizations

The term emergent phenomena has many different meanings, for example, in the mathematics of complex systems and in the biology of exotic diseases. Here, emergent phenomena are defined as those which emerge from a system without arising from any part of the system alone, but as a result of interactions between parts. Examples identified from the safety management audits are:

- overload, log jam, and deadlock affecting resources, and excessive demand on budgets or equipment,
- organizational drift and decay of functions,
- myth generation and decision making on the basis of myth,
- channeling and keyholing of activities, for example, by demands on an understaffed IT department due to expanding needs from multiple departments or sudden problems,
- chasing the latest fad at the expense of considered decisions,
- overpromising and under-budgeting,
- scope creep,
- chasing key performance indices (KPIs) at the expense of balanced performance,
- chasing quarterly performance bonuses,
- staffing policies which do not take human and knowledge capital into account, and
- unmotivated efficiency rounds and staff reductions.

There are undoubtedly many more such phenomena. In order to identify such phenomena, OFA needs to look at the organization as a whole and not just at individual functions.

Processing Audit Data

In order to make systematic use of audit data, the accident anatomy method was used [42]; incidents were described in the form of cause consequence diagrams [43], with the sequence of events in any accident or near miss included. This allows much more information to be gathered than, for example, simple tabulation of incident types. It allows several cause-consequence event pairs to be identified in any incident scenario. At the same time, it allows identification of parallels between different event subsequences in incidents at the detailed level.

In many cases, the diagrams needed to be extended by adding the accident potential up to the final consequence (damage, injury, fatality). Then similar accidents were grouped according to accident type, and the groups were consolidated into one cause consequence diagram for each group. Causal influences which could have led to incidents but did not were marked onto the diagram, along with any recovery measures taken which prevented incidents. The relative importance of the different event subsequences and causal influences was then obtained simply by counting the number of occurrences.

An example is shown in Fig. 4. The case was the explosion of an "empty" gasoline storage tank when very hot oil was pumped to it from a fluid catalytic cracker. The explosion did not cause injury, but it did spread heavy oil across the manager's car park. The figure is simplified for presentation here and is just one of a large dataset expressed in the event language ELAN (see the Automation of the Method section).

This data analysis technique can be used for determining probabilities of causal links if the actual frequency for just a few incident types can be found (anchor points) and the other frequencies then inferred from the relative number of occurrences.
Anchor points have been found for just a few incident types of managerial failure. A complete database of operator error probabilities was built up using this approach [5]. It is worth noting that when this is done, the cause consequence diagrams are populated with conditional probabilities and become mathematically equivalent to causal Bayesian networks [44].

Fig. 4 Example of a single scenario analysis with management deficiencies marked
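The counting and anchor-point scaling described above can be sketched in a few lines. The incident types, counts, anchor frequency, and link counts below are invented for illustration only; they are not data from the author's database.

# Sketch of anchor-point scaling: relative occurrence counts from the
# consolidated cause-consequence diagrams are converted to absolute
# frequencies using one incident type whose frequency is assumed known.
observation_counts = {                       # hypothetical counts
    "hot oil routed to tank with residual gasoline": 3,
    "tank overfilled during transfer": 12,
    "transfer routed through wrong line-up": 6,
}

anchor_event = "tank overfilled during transfer"
anchor_frequency_per_year = 0.05             # assumed known anchor point

scale = anchor_frequency_per_year / observation_counts[anchor_event]
estimated_frequency = {event: count * scale
                       for event, count in observation_counts.items()}

# Conditional probability of a causal link: the fraction of occurrences
# of a cause that went on to the consequence, as counted on the diagrams.
def link_probability(n_cause_and_consequence, n_cause):
    return n_cause_and_consequence / n_cause

print(estimated_frequency)
print(link_probability(2, 6))   # e.g. 2 of 6 wrong line-ups reached the tank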
Automation of the Method

The work in OFA can be quite onerous. If the organizational failures are to be related to specific accident potential, the work is equivalent to about twice that needed for a traditional HAZOP analysis.

The HAZEX program [45] was developed as a tool for recording the results of HAZOP, FMEA, functional failure analysis, and action error analysis studies. It includes facilities for semi-automated analysis in which the program asks questions about the plant to be analyzed and then proposes causes and consequences of failures and operational upsets. It asks what safeguards exist and proposes further safeguards if those existing are inadequate. The data for the program are drawn from just over 100 earlier risk analyses and from a database of earlier accidents.

The database was extended to cover organizational failure analysis by adding the OFA checklists and causal relations tabulated in the Appendix to the HAZEX database. The motivation for this was to allow validation of the method while minimizing any bias which might result from the analysts' experience. This is necessary when validating a method because an experienced analyst can often incorporate more experience into results than that which arises from the methodology alone.

The causal structure for the organizational phenomena is an important part of the HAZEX database. The structure describes the cause and effect relationships between the management and organizational phenomena, operator error, and consequences in the physical system (as in Fig. 3). The individual causal relationships are expressed as statements in the ELAN language [45] of the form

event; condition; condition => event; event; event

The event on the left-hand side is the causal event, and the effects are dependent on zero or more conditions. The event or events on the right-hand side are the effects. Such statements can be chained by matching output events in one functional block to input events in the next, so as to construct a complete sequence of events forming an accident scenario. Side effects can be identified by following all events on the right-hand side of the above event transfer function, and multiple causes can be found by tracing causal chains for the conditions on the left-hand side. The method of chaining together event sequences can be applied for the physical system (plant), the control systems, the operators, and the management levels. Algorithms for this are given in Ref. [45]. The HAZEX program allows actual case histories and accident reports to be retrieved rapidly in order to support judgments on the relevance and importance of failure phenomena identified by the algorithms.

Table 2 shows some of the questions which the software may ask concerning organizational failure. As can be seen, the questions are designed for use during safety audits. In all, over 300 questions are answered by the analyst, but most of these need to be posed only once for any analysis.

Modeling to Support Automated Analysis

The list of failure and error phenomena in the Appendix is evidence driven, coming from just over 2400 observations in 32 safety management audits. Classification of the observations, though, requires an underlying model in order to achieve consistency. Good models also allow a degree of prediction for new interaction possibilities.
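Before turning to the underlying models, the chaining of ELAN-style transfer statements described under Automation of the Method can be illustrated with a short sketch. The statement representation, the example events and conditions, and the propagation routine are simplified assumptions for illustration; the actual ELAN syntax and the HAZEX chaining algorithms are those given in Ref. [45].

# Simplified sketch of chaining transfer statements of the form
# "event; conditions => events" into an accident event sequence.
RULES = [
    # (cause event, required conditions, resulting events) - illustrative
    ("route request", {"erroneous knowledge"}, ["incorrect route selected"]),
    ("incorrect route selected", set(), ["hot oil pumped to gasoline tank"]),
    ("hot oil pumped to gasoline tank", {"residual gasoline in tank"},
     ["tank explosion"]),
]

def propagate(initial_events, active_conditions):
    """Chain matching statements to build the resulting event sequence."""
    sequence, frontier, seen = [], list(initial_events), set()
    while frontier:
        event = frontier.pop(0)
        if event in seen:          # guard against cyclic rule sets
            continue
        seen.add(event)
        sequence.append(event)
        for cause, conditions, effects in RULES:
            if cause == event and conditions <= active_conditions:
                frontier.extend(effects)
    return sequence

print(propagate(["route request"],
                {"erroneous knowledge", "residual gasoline in tank"}))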
The starting point of modeling for the method here was Rasmussen's stepladder skill-rules-knowledge (S-R-K) model for operator performance and error. Industrial application of the model quickly showed the need for model extension to allow for communication between persons, and the value of Rasmussen's variant in which knowledge was explicitly registered as a functional module supporting diagnosis, decision making, and planning. Rasmussen himself proposed the use of multiple S-R-K stepladder models to represent levels of organization (in internal working notes). Industrial application to management error also necessitated model extension to cover a much wider and deeper modeling of activity and function types than those included in the S-R-K model, such as checking, goal setting, policy definition, priority setting, leadership, guidance, training, supervision, manning, etc. The list is long. The need for more detailed models also arose in order to be able to capture details in the audit observations.

The raw material for an OFA analysis is a functional block diagram of the organization as it works normally. The notation for the diagram could be simple blocks with inputs and outputs, but there are advantages in using the structured analysis and design technique (SADT) addition of controls and resources [46]. The actual diagramming used is often an extension of the FRAM notation [24]. The form of the functional blocks is shown in Fig. 5.

Fig. 5 An OFA functional block

Table 2 Examples of the questions posed by the HAZEX analysis support tool

Are there personnel problems with the work? | Guidance on answering the questions during interviews
Is there frustration with the work which could lead to impatience and unapproved procedures or bypassing of steps? | This is difficult, and can only be determined after establishing a good working relationship with personnel. Save all questions until relaxing over coffee, and preferably ask no questions at all.
Do the workers feel that there are stupid rules for working? | This can be determined by discussion with workers about how to improve rules.
Are there problems of communication intended to ensure job protection? | Discuss procedures with key workers. If there is reluctance, this may be job protection, or it may indicate a need for protection of industrial secrets.
Is there distrust of management? | Information of this kind is difficult to obtain explicitly unless a good rapport is built up. It can be determined from the tone of discussions.
Are there cases of counterproductive blame? | Inspect accident and near miss reports and dismissals.

Of course, block diagrams of this type could be developed with an ever-increasing number of types of links (interactions and interference) between blocks, so each link type should be justified. Input and output are obviously necessary, and control and resources, from SADT, have proved very useful through the years. Hollnagel uses TIME to describe time pressure influences, but as a link it is useful for representing task interactions arising from races and delays. Precondition links are from the FRAM
notation and are useful to represent the links between earlier steps in a task, and functions in another task, which are preparatory. FEEDBACK links are used to represent monitoring aspects of control, as in STAMP [31]. ACCESS links are probably specific to maintenance and construction activities, and are useful for representing spatial interference between tasks. KNOWLEDGE is included because lack of knowledge is important in a large fraction of organizational failures, but also because the communication of erroneous information between tasks represents a common cause of failure. SIDE EFFECTS are important for several reasons. Activation of latent hazards can occur in a normal functional sequence, but can also arise as a side effect of normal functioning. Side effects can also cause interference between unrelated tasks, such as the release of toxic materials to a drain while others are working close by. SIDE EFFECTS often link to INTERFERENCE, either directly or by creating adverse latent conditions.

The arrangement of the block diagrams can vary. In many cases, a management hierarchy is suitable, as in STAMP, but in many cases an SADT arrangement is preferable, particularly for detailed submodels. A model corresponding to seven cases similar to that in Fig. 4 is shown in Fig. 6. A part of the ELAN model for "determine transfer route" is:

NORMAL FUNCTION
IN -> ROUTE REQUEST => OUT -> ROUTE SELECTED

FAILURES AND ERRORS
IN -> ROUTE REQUEST, ERRONEOUS KNOWLEDGE => OUT -> INCORRECT ROUTE SELECTED
PIPING RECENTLY MODIFIED, KNOWLEDGE NOT TRANSFERRED TO MIMIC DISPLAY => ERRONEOUS KNOWLEDGE
ENGINEERING DEPARTMENT OVERSIGHT => KNOWLEDGE NOT TRANSFERRED TO MIMIC DISPLAY
GAP IN ORGANIZATION NO UPDATE SOP => KNOWLEDGE NOT TRANSFERRED TO MIMIC DISPLAY

The full model for this function comprises 22 transfer function elements, and that for the full task over 200 lines. Fortunately, it is possible to build up a library of functional and functional failure models, so that the modeling effort only needs to be made once [45].

Quantification

Deriving tables of probabilities for organizational failure and management error is difficult in the oil and gas industry. The number of occurrences of the different error and failure modes and the number of causal mechanisms can be counted. For example, at least one case of each of the causes listed in the Appendix was found. However, the number of opportunities for error and failure is very difficult to determine. It would require detailed recording of managerial activities over a period of years. This means that probabilities per managerial action are not generally available.

In the absence of a full set of conditional probabilities, the relative importance of the different organizational and managerial failures is determined according to the number of cases arising in the organizational failure database (at present just over 2400 observations), together with the severity of the potential accidents which could be caused and of the 91 accidents which were actually caused. This was made easier in actual applications because extensive major hazards risk analyses were available for all of the plants studied.

Quantitative risk analysis is possible in special cases when the organizational tasks are highly standardized and where the importance of the failures justifies the effort. It is relevant to do this when the cost or difficulty of introducing risk reduction measures is high.
This was done, for example, in two of the audits used as a background for this development: once to help determine the level of staffing needed in the permit to work organization, and once to help determine the risk involved in running formal training in parallel with on-the-job training for new maintenance technicians.

For quantitative calculations, the different causes of organizational failure and management error are regarded as independent error-forcing or error-inducing conditions, similar to the treatment in the operator error analysis method ATHEANA. The contributions from each cause are combined using fault tree calculation methods, with OR logic for the different causes. Very often, for example, an organizational weakness does not have a significant consequence until adverse conditions arise, and AND logic is used to combine the probability of the weakness or failure with the probability of the adverse conditions. One such combination, which occurred several times in incidents recorded in the audits studied, is the case of operators who have not completed training working under a lead operator recently transferred to a new unit and facing an unexpected plant disturbance. This case involves two normal deviations from ideal performance combined with a not unusual plant state deviation.

Fig. 6 Model for petroleum products tank yard transfer management

Examples of Application and Validation of the Method

The demonstration application of the method was for a PTW system, one well known to the author. The purpose of a permit to work system is to ensure that when work, particularly construction or maintenance work, is to be carried out, the conditions for safe work exist. The method involves the supervisor of the work team filling out a detailed questionnaire and performing or updating a task risk assessment. The PTW application and task risk analysis or job safety analysis are submitted to a permit to work officer, and this officer checks both the application and the work site. An area or overall work authority (a person) checks the permit to work application and ensures that all applications are compatible and do not involve simultaneous operations problems. The supervisor then instructs his or her team, and posts the permit to work so that it can be seen by all. The permit is usually approved for one shift or one day. A site safety officer helps to ensure that new hazards do not arise during the course of the day. A block diagram for the PTW organization is shown in Fig. 7. It is in STPA format, as is natural for this case.

An example of the results of the analysis is shown in Table 3. In the analysis, 57 error or failure modes were identified, with typically 3-7 significant organizational failure causes for each failure mode. The analysis was checked against 17 cases described in Trevor Kletz's book "What Went Wrong?" [47]. All of the 17 cases were included in the predictive analysis.

A second test case was a management of change system for engineering modifications. The procedure for this included nine steps. The analysis was carried out manually by students as a case study in a short course at the Master's level. Forty-eight causes of organizational error with potential accidents as a result were identified. These were compared with cases described by Sanders [48]. The analysis covered 80% of the cases in this book.
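Returning briefly to the quantification approach described above, the fault-tree style combination of independent error-forcing conditions can be sketched as follows. The probabilities and the per-shift framing are invented for illustration only and are not values from the study.

# Sketch of the combination logic described under Quantification:
# independent causes of a failure mode are combined with OR logic,
# while a latent weakness that only matters when adverse conditions
# arise is combined with those conditions using AND logic.
def p_or(probabilities):
    """P(at least one of several independent causes occurs)."""
    q = 1.0
    for p in probabilities:
        q *= (1.0 - p)
    return 1.0 - q

def p_and(probabilities):
    """P(all independent conditions occur together)."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result

# Illustrative case: crew not fully trained AND lead operator newly
# transferred AND an unexpected plant disturbance during the shift.
p_weakness = p_and([0.2, 0.1])            # assumed per-shift probabilities
p_disturbance = 0.05
p_error_forcing_state = p_and([p_weakness, p_disturbance])

# Several independent error-forcing conditions contributing to the same
# failure mode are then combined with OR logic.
print(p_or([p_error_forcing_state, 0.001, 0.0005]))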
The method was applied, using the HAZEX program and generic ELAN models, to the organization of plant isolation for maintenance. This is a procedure in which pipes are sealed off so that persons can open manholes or flanges without hazardous substances entering the plant section concerned. It was similarly applied to the process for gas testing before vessel entry, to lock-out and tag-out for work on electrical systems, and to emergency response in the case of hazardous materials release. Again, the results were checked against the case histories for these tasks given by Kletz [47] and by Sanders [48]. All of the incidents recorded for these tasks in these references were predicted automatically.

A fourth analysis was for an emergency response procedure for an offshore oil production facility. This was observed during a full-scale exercise involving several hundred persons, with both prestationed observers and video recording. The results from this test are not as clear as those from the others. All of the deviations from the written procedure which were observed were also predicted by the automatic analysis. However, there were also deficiencies in the written procedures which were overlooked.

Risk Reduction

The main objective in using OFA has been to aid in proposals for risk reduction. Associated with each operational failure cause is a list of possible risk reduction measures. Many of these are specific to the individual causes, but some generic methods are given in Table 4.

Observations and Conclusions

The cases studied indicate that the method is applicable and produces useful results. It revealed some important findings, for example:

- The most senior management of companies are usually the only ones with authority to call on mutual aid or external aid in an emergency, but these persons are generally the least trained and are sometimes untrained. As a result, they become overinvolved in details of the response and often do not carry out their overview and contingency readiness functions.
- In PTW systems for large projects, the workload on the project safety authority is excessive, and this interferes with performance. However, splitting the work between persons introduces problems of coordinating simultaneous operations. Improved procedures are needed.
- The function of gas testing which is needed before entry into confined spaces may be subject to delays, and this can lead to frustration in work teams.

The work described here is a first step in developing organizational failure analysis. Because the method is based on empirical observations, it is unlikely to be complete. Further work is needed to identify possible gaps or weaknesses. Such work is currently being performed in order to determine the applicability of the methods in other areas of industry and other activity types.

Reflections

The method described here resembles several others derived from the risk analysis tradition; in particular, there is a structural similarity to the work of Sklet et al. [24,25] and a similarity in data processing to that of Pence et al. [26]. The main differences here are that the risk-inducing factors used are based on direct observation in safety management audits and incident investigations, and there is a large difference in the degree of detail. The question arises: is it necessary to investigate at the high level of detail described here?
From 1978 through to 2014, the author led a total of 105 risk analyses, mostly of large oil, gas, and chemical plants, with just four of mechanical production processes. Subsequently, 92 follow-up studies were made up to the end of recording in 2016. For the analyses carried out up to 1994, there were subsequently a total of 20 major hazards accidents arising from management failing to implement recommendations, with a total of 164 fatalities to follow [ ]. The companies involved were obviously well intentioned (they paid for the analyses), and some could be classified as high-safety-integrity companies. We had naively assumed that all that was needed for accident prevention was a good risk analysis and a good set of well-presented recommendations.

The management errors were in one case due to simple refusal to implement PTW (refusal to consider implementing PTW because of lack of space for a PTW office on the platform complex), where the accident was directly the result of the lack of PTW and caused 11 fatalities. One was a case involving a very long communications chain (4 companies, 11 departments). All the others were due to postponement of implementation followed by forgetfulness.

After 1994, the problem was recognized, and we made a change in the approach to result presentation. The potential accidents were illustrated with case histories and photographs of relevant earlier accidents, and the risk reduction recommendations were accompanied by conceptual stage designs for the safety measures. Since 1994, only one recommendation has failed to be implemented, and only three further fatalities have occurred. The actual residual risk for plants analyzed after 1994 was more than an order of magnitude lower than for those analyzed prior to 1994.

The relevance of this history for OFA is that all of the accidents involved managerial omissions and less than adequate performance at the detailed level, such as inadequacies in the plant section isolation procedures (preparing for maintenance). The deficiencies were not those of generally poor safety management; the companies had, in all but three cases, good safety management. The problems arose in nearly all cases at the detailed level, due to specific deficiencies.

Whether detailed analysis, as described here, is needed, or whether just a few key management performance factors can adequately reflect the risk, is still an open question. However, there is enough evidence to make detailed management and organizational failure analysis a worthwhile precautionary measure.

The other question arising is whether the analysis is truly necessary.
Fig. 7 A block diagram for a PTW system
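As a rough illustration of the kind of functional decomposition captured in a block diagram such as Fig. 7, the sketch below models a PTW workflow as an ordered chain of functions with responsible agents, and flags unassigned functions, one of the "gaps between functions" problems listed in the Appendix. The step names and roles are assumptions for illustration and are not taken from the figure.

```python
# Sketch: a PTW workflow as a chain of functions with responsible agents.
# Step names and roles are illustrative assumptions, not the actual Fig. 7 content.

ptw_workflow = [
    {"function": "Prepare work description and hazard identification", "agent": "Work supervisor"},
    {"function": "Check PTW application against plant status",         "agent": "PTW officer"},
    {"function": "Perform site check of the work area",                "agent": "PTW officer"},
    {"function": "Authorize permit and define validity period",        "agent": "Area authority"},
    {"function": "Carry out the work under permit conditions",         "agent": "Work team"},
    {"function": "Sign off permit and return plant to operation",      "agent": None},  # deliberately left unassigned
]

def find_gaps(workflow):
    """Return functions with no responsible agent, i.e. organizational gaps
    of the 'not my responsibility' type."""
    return [step["function"] for step in workflow if not step["agent"]]

if __name__ == "__main__":
    for gap in find_gaps(ptw_workflow):
        print("Unassigned function:", gap)
```

A representation of this kind is what makes semi-automated screening possible: each function in the block diagram becomes a point at which the generic failure causes can be applied.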
From our own experience of application of the method since 2010, the analysis is very useful. If carried out generically, it can serve to improve understanding and the quality of auditing. When applied to a specific plant, it can serve to prioritize recommendations. Prioritization is often needed: over the years, we have found that presenting senior management with risk reduction plans involving more than ten issues is counterproductive (when there are more than ten recommendations, they need to be packaged into groups). The analysis allows prioritization of recommendations to be made objectively. It also allows the recommendations to be related to earlier accident reports from around the world.

To support this view, the findings from three safety management audits were studied, and the source of the recommendations was determined. In a safety management audit, three types of review can be made [49]:

- review of safety documentation, such as safety policy, training records, and management of change records;
- review of organizational procedures for completeness and good working order;
- in-field mechanical integrity audit (a general audit of equipment and operation).

The methods described here supplement these approaches, allowing detailed organizational assessment. Results from three audits are given in Table 5.

The results in Table 5 do not provide an absolute measure of the success of the method, because the relative importance of the recommendations is not known, and because results will differ from company to company. The results do show, however, that the method has some value in supporting safety management audit.

Further Work

Current activities include documenting the data collection for this study in a form that can be used by other researchers, expanding the dataset beyond the oil, gas, and chemical industries, and applying the method to the design and construction stages. There is also a need to investigate the degree to which the method and the causal factors described here are sufficiently complete.

Whether sufficient field data can be obtained to allow the method to quantify probabilities "per opportunity," for example per operation, is an open question. Calculation on the basis of frequency per company per year is already possible, but quantification per task type requires more data. Extending the data collection to include publicly reported accidents, such as those reported in the Loss Prevention Bulletin, may provide a way forward.
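To make the distinction between the two forms of quantification concrete, the following sketch shows how each would be computed; all counts and exposure figures are invented solely for illustration and are not data from this study.

```python
# Sketch: two ways of quantifying organizational failure frequency.
# All numbers below are invented for illustration only.

# Hypothetical observations: failures of a "PTW application check" function
observed_failures = 6

# Exposure measured per company per year
company_years = 40          # e.g., 8 companies observed over 5 years
rate_per_company_year = observed_failures / company_years

# Exposure measured per opportunity (per operation or task execution),
# which requires knowing how many permits were actually processed
permits_processed = 12_000
probability_per_opportunity = observed_failures / permits_processed

print(f"Rate per company-year:       {rate_per_company_year:.3f}")
print(f"Probability per opportunity: {probability_per_opportunity:.2e}")
```

The per-opportunity figure is the one needed if OFA results are to feed into quantitative risk analysis at the task level, and it is the denominator, the number of opportunities, that is difficult to obtain from audit data alone.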
The most important question for the intended use of the method is whether the OFA approach, with its relatively theoretical background, can provide an improvement when compared with traditional methods for audit support such as ISRS, Tripod-Delta, and the incident causation matrix. It is also important to know whether the introduction of recent concepts such as a system-theoretic approach, nonlinear accident scenarios, side effects and task interference, and emergent hazards has a significant impact on results. Comparative studies applying several methods to a reference example are planned.

Table 3 An extract from the analysis of a PTW system, showing part of one function

Agent: PTW officer

Failure mode: Fails to perform PTW application check
  Causes: work overload; insufficient manning; spike in the workload due to new problems; PTW officer mistake; chaotic filing in the PTW office
  Consequence: some aspects of the work may be unchecked and defective; possible accident
  Safeguards: area authority performs overall site check; site safety officer may observe the hazardous condition

Failure mode: Fails to perform site check
  Cause: work overload leads the PTW officer to sign off without a site check
  Consequence: as above

Failure mode: Signs off a condition on the form incorrectly
  Causes: lack of knowledge; insufficient training; poor recruiting or promotion policy
  Consequence: as above
  Safeguard: improved training

Failure mode: Wrong area checked
  Causes: lack of knowledge; poor plot plan; error in markup of the plot plan by the supervisor; plot plan for the work not updated; poor supervision; lack of situational awareness by management; complacency
  Consequence: as above
  Safeguards: good plot plan on wall display; periodic audit of PTW system performance; cross-check by PTW applicant required

Failure mode: Delay in checking
  Cause: work overload
  Consequence: supervisor becomes tired of waiting and commences work without a PTW
  Safeguard: site safety officer checks for lack of PTW or out-of-date PTW

Table 4 Examples of risk reduction measures

Problem: Organization structure problem
  Risk reduction measure: redesign of the organization, new functions, integration of functions, standardization of procedures

Problem: Lack of knowledge
  Risk reduction measure: training, high-quality manuals, videos, "canned" toolbox talks

Problem: Overloading
  Risk reduction measure: reassessment of workloads; identification of peak loading during periods of high demand and in response to emergencies or unusual events; devise a supplementary staffing strategy

Table 5 Sources of significant recommendations in three safety management audits (number of recommendations by approach used to identify the issue)

Plant type                     Document review   Procedure audit   Mech. integrity   OFA
Refinery                       22                4                 41                7
LNG plant                      7                 1                 23                21
Gas treatment and NGL plant    14                1                 32                11

Acknowledgment

The author would like to thank Dr. I. Kozine for useful comments and help in writing this paper, and for support during student exercises applying the method in risk analysis case studies. Thanks are also due to the audit team members and the students, who provided useful comments on the application of the method.
Appendix

Organizational failure potential causes. Each entry below gives an organizational failure cause, grouped by organizational failure group, together with its possible effects.

Organization
  Problematic goals – Undesired function
  Conflicting goals – Conflicting functions, intermittent or inconsistent functions
  Missing function – Function not performed
  Under-resourced function – Poor performance of function
  Misunderstood function – Poor performance of function
  Outdated or declining function – Poor performance of function
  New and still-learning function – Poor performance of function
  Duplicated or overlapping function – Possible chaotic functioning, interference, blockage
  Competing functions, uncoordinated functioning, rivalry – Possible chaotic functioning, interference, blockage, "stealing resources"
  Gaps between functions, unclear responsibilities – Necessary functions not performed due to "not my responsibility" effects
  Degraded function – Function gradually becomes substandard through uncorrected drift
  Silo organization – Departments do not communicate and take decisions independently
  Uncoordinated procedures, multiple incompatible systems – Difficulties in information transfer; communication errors
  Multiple incompatible procedures – Errors on transfer of staff between departments

Leadership relations
  Authoritarian – Poor error correction
  Democratic
  Paternal – Problematic if there are gaps in functioning
  Professional – Generally ideal
  Tightly linked professional – Ideal, but can give group error
  Absence of leadership – Loss of control; departments veer in their own chosen direction

Attitude to leadership and employees
  Respect – Good performance, initiative
  Tolerance
  Contempt – Poor performance, low initiative
  Animosity – Hidden or overt disobedience

Decision making (decisions made by work groups or the organization as a whole)
  Under-informed decisions, poor feedback – Erroneous decisions
  Lack of knowledge required for decision – Erroneous decisions
  Erratic, inconsistent decision making – Erroneous decisions
  Tunnel vision – Erroneous decisions
  Fixation – Erroneous decisions
  Over-hurried decision making – Erroneous decisions
  Competing opinions – Erroneous decisions
  Animosity in discussions – Delays, lack of consensus, poor coordination

Feedback
  Missing – Continued erroneous functioning
  Incomplete – Continued erroneous functioning
  Inaccurate – Erroneous decisions, incorrect functioning
  Biased – Erroneous decisions, incorrect functioning
  Fraudulent – Erroneous decisions, incorrect functioning

Safety attitude
  Systematic and professional – Generally no bad consequences
  Well-intentioned but uninformed – Lack of knowledge can cause accidents
  Complacent – Management does not worry too much about safety, just provides
  Macho – Risk taking, bypassing safety rules
  In denial – Management denies that risks are real or significant
  Dysfunctional – Management does not care; workers perform according to their own convenience

Supervision
  Strict and continuous – A degree of safety is provided in preventing initiating events or in activating safety measures
  Strict but sporadic or intermittent – Risk reduction in proportion to time present, plus encouragement to staff
  Absent – Encourages possibly harmful improvisation, no on-the-job training
  Complacent – As above

Communication (organization does not consider)
  Simple forgetting to communicate – Function not carried out
  Incomplete or erroneous circulation list – Function may be incomplete
  Language problem – Erroneous function
  Use of argot may be misunderstood – Erroneous function
  Communication abbreviated – Erroneous function
  Communication ambiguous – Erroneous function
  Erroneous reference (to item name or number, etc.) – Erroneous function
  Model mismatch between sender and recipient – Erroneous function
  Wrong recipient – Erroneous function
  Mixed or interfering messages – Erroneous function
  Communication channel overloaded – Omission of function
  Priority error in message processing – Omission of critical function
  Communication channel breaks – Function not carried out or incomplete
  Noise – Erroneous function
  Deliberate misinformation – Erroneous function

Instructions (organization does not prevent)
  Communication error as above – Erroneous or incomplete function
  Wrong instruction – Erroneous or incomplete function
  Incomplete instruction list – Erroneous or incomplete function
  Preconditions not specified – Erroneous or incomplete function
  Erroneous information – Erroneous or incomplete function
  Unreported essential background – Erroneous or incomplete function
  Conflicting instructions – Erroneous or incomplete function
  Work overload – Erroneous or incomplete function
  Priority error in work processing – Erroneous or incomplete function

Knowledge
  Essential knowledge not known, by the work group or by the organization (basic physics or chemistry; engineering; equipment working, generic; specific equipment knowledge; plant or installation knowledge; procedures and standard operating procedures (SOPs)) – Erroneous or incomplete function, or omission of function
  Misteaching – As above
  Mislearning – As above
  Knowledge not remembered or not recalled, slip of the mind – As above
  Knowledge which is inappropriate for the actual equipment or installation – Direct error, wrong functioning

Situational awareness (both individuals, such as managers, and the organization as a whole can lack awareness)
  Loss of awareness due to concentration – Required response not made
  No awareness due to tunnel vision – Required response not made
  Fixation – Wrong response to situation
  Loss of mode awareness – Inappropriate response
  Poor mode display – Inappropriate response

Work situation (organization does not prevent, or forces)
  High workload – Errors or omissions in function
  High peak workload – Errors or omissions in function
  High work intensity – Errors or omissions in function
  High work complexity – Errors or omissions in function
  Low workload, boredom – Errors or omissions in function
  Inadequate or poor resources (tools, materials, equipment, HMI, access) – Errors or omissions in function

Work environment (organization does not prevent the effects of)
  Noise; interruptions; distractions; too high or low temperature; too high or low humidity; draft; poor lighting; rain; wind; snow; ice; exposure, height – each: Errors or omissions in function

Work demand (organization does not consider)
  Concentration, precision, speed, balance, strength, personal size, endurance – Requirements exceed capability
  Organization induces: chronic work stress; stress during an emergency – Requirements exceed capability
Work pattern (organization does not sufficiently consider the effects of)
  Shift work – Reduced reliability and speed
  High work intensity – Reduced reliability and speed
  Shift length – Reduced reliability and speed
  Lack of sleep – Reduced reliability and speed
  Non-work loads on time, e.g., transport – Reduced reliability and speed

Personal issues
  Health – Reduced reliability and speed
  Economy – Reduced reliability and speed
  Family – Reduced reliability and speed
  Frustration – Reduced reliability and speed

Self-confidence
  Lack of confidence – Failure to perform function
  Overconfidence – Performance without needed checks; risk-taking behavior

Responsibilities
  Unclear – Omission of function
  Unspecified – Omission of function
  Split responsibility – Omission of function
  Inadequate support or backup – Inconsistent performance

Procedures
  Nonexistent – Errors of omission and commission
  Out of date – See Human error chapter
  Incomplete
  Incorrect
  Poor style, difficult to understand
  Procedural drift – Incorrect procedure

Training
  Nonexistent – Errors of omission and commission in function
  Incomplete scope – Omissions
  Long training cycle; persons must function while waiting for training – Errors of omission and commission in function
  Training backlog
  Poor style, low retention
  Inadequate refreshers
  Erroneous training

QA/QC
  Inadequate, gaps – Mistakes and failures not caught
  Limited scope – Mistakes and failures not caught
  Under-dimensioned – Mistakes and failures not caught
  Unreliable – Mistakes and failures not caught
  Hidden problems – Mistakes and failures not caught

Cultural
  Racial, religious, or orientation prejudice – Tension and lack of cooperation
  National and racial norms – Deviation from expected behavior, not necessarily bad, but can lead to mistakes

Violations (violations can occur at all levels in an organizational hierarchy)
  Substance abuse – Errors and omissions
  Hiding serious health problems – Slowness, unreliability
  Smoking in hazardous areas – Fire
  Lack of cooperation for job protection – Slow functioning, lack of knowledge for replacement staff
  Intergroup rivalry, misreporting, and accusations – Tension and lack of cooperation
  Work-to-rule – Delays in function
  Rogue contractors – Theft, fraud, poor work, hazardous equipment or product
  Fake parts or materials – Premature failure or failure to work on demand
  Fake work – Omission of possibly necessary functions
  Theft – Possible loss of safety equipment
  Fraud – Inadequate equipment supplied
  Sabotage – Direct cause of accidents
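The taxonomy above lends itself to machine-readable encoding, which is what makes the semi-automated screening sketched earlier possible. The fragment below shows one possible encoding of a small part of the taxonomy; the data structure and lookup function are assumptions for illustration, not the format used by the HAZEX/ELAN tools.

```python
# Sketch: a machine-readable encoding of part of the organizational
# failure taxonomy from the Appendix. The format is illustrative only.

taxonomy = {
    "Communication": [
        ("Simple forgetting to communicate", "Function not carried out"),
        ("Incomplete or erroneous circulation list", "Function may be incomplete"),
        ("Communication channel overloaded", "Omission of function"),
    ],
    "Responsibilities": [
        ("Unclear", "Omission of function"),
        ("Split responsibility", "Omission of function"),
        ("Inadequate support or backup", "Inconsistent performance"),
    ],
}

def causes_with_effect(effect_keyword):
    """List taxonomy entries whose possible effect mentions a keyword,
    e.g. to pull out all causes that can lead to an omitted function."""
    return [
        (group, cause)
        for group, entries in taxonomy.items()
        for cause, effect in entries
        if effect_keyword.lower() in effect.lower()
    ]

if __name__ == "__main__":
    for group, cause in causes_with_effect("omission"):
        print(f"{group}: {cause}")
```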
References

[1] Taylor, J. R., 2016, "Does QRA Help to Reduce Risk? A Study of 92 Risk Analyses Over 36 Years," Chem. Eng. Trans., 48, pp. 811–817.
[2] Health and Safety Executive, 1975, "The Flixborough Disaster: Report of the Court of Inquiry," HMSO, Richmond, UK.
[3] Chemical Safety Board, 2002, "This is the U.S. National Organisation Providing in Depth Accident Investigation Reports for Industry."
[4] TNO, 1996, "Methods for the Calculation of the Effects of the Escape of Dangerous Materials," Dutch Ministry of Labour.
[5] Taylor, J. R., 2015, Human Error in Process Plant Design and Operations, CRC Press/Taylor and Francis, Boca Raton, FL.
[6] Hurst, N. W., Bellamy, L. J., and Geyer, T. A. W., 1990, "Organisational Management and Human Factors in Quantified Risk Assessment, a Theoretical and Empirical Basis for Modification of Risk Estimates," Safety and Reliability in the 90's (SARRS '90), Water and Cox, eds., Elsevier Applied Science, Amsterdam, The Netherlands.
[7] Bird, F., 1974, Management Guide to Loss Control, Institute Press, Atlanta, GA.
[8] API, 2000, "Risk Based Inspection Base Resource Document," American Petroleum Institute, Washington, DC, Standard No. API PUBL 581.
[9] Thomas, P., Hudson, W., Reason, J. T., Bentley, P. D., and Primrose, M., 1994, "Tripod Delta: Proactive Approach to Enhanced Safety," J. Pet. Technol., 46(1), epub.
[10] Tripod Foundation, 2008, "Tripod Beta User Guide," Stichting Tripod Foundation, London.
[11] Reason, J. T., 1990, Human Error, Cambridge University Press, Cambridge, UK.
[12] De Landre, J., Gibb, G., and Walters, N., 1983, "Using Incident Investigation Tools Proactively for Incident Prevention," Safety Wise Solutions Pty Ltd, US Nuclear Regulatory Commission, Washington, DC, https://www.asasi.org/papers/2006/Payne_Stewart_Learjet_Investigation_De%20Landre_Gibb_Walters_DOC.pdf
[13] Reason, J. T., 1997, Managing the Risks of Organizational Accidents, Ashgate, Aldershot, Hants, UK.
[14] Reason, J. T., 2015, Organisational Accidents Revisited, CRC Press, London.
[15] Swain, A. D., and Guttman, H. E., 1983, "Handbook of Human Reliability Analysis With Emphasis on Nuclear Power Plant Applications," U.S. Nuclear Regulatory Commission, Washington, DC, Report No. NUREG/CR-1278.
[16] Embrey, D. E., 1992, "Incorporating Management and Organisational Factors Into Probabilistic Safety Assessment," Reliab. Eng. Syst. Saf., 38, pp. 199–208.
[17] Davoudian, K., Wu, J.-S., and Apostolakis, G., 1994, "Incorporating Organizational Factors Into Risk Assessment Through the Analysis of Work Processes," Reliab. Eng. Syst. Saf., 45(1–2), pp. 85–105.
[18] Embrey, D. E., Humphreys, P. C., Rosa, E. A., Kirwan, B., and Rea, K., 1984, "SLIM-MAUD: An Approach to Assessing Human Error Probabilities Using Structured Expert Judgment," U.S. Nuclear Regulatory Commission, Washington, DC, Report No. NUREG/CR-3518.
[19] Cooper, S. E., Ramey-Smith, A. M., and Wreathall, J. A., 1996, "A Technique for Human Error Analysis (ATHEANA)," U.S. Nuclear Regulatory Commission, Rockville, MD.
[20] Forester, J. A., Bley, D. C., Cooper, S., Kolakzowski, A. M., Thompson, C., Ramey-Smith, A., and Wreathall, J., 2000, "A Description of the Revised ATHEANA (A Technique for Human Event Analysis)," Report No. NUREG-1624, Rev. 1, U.S. Nuclear Regulatory Commission, Washington, DC.
[21] Rasmussen, J., 1982, "Human Errors: A Taxonomy for Describing Human Malfunction in Industrial Installations," J. Occup. Accid., 4(2–4), pp. 311–333.
[22] Mohaghegh, Z., Kazemi, R., and Mosleh, A., 2009, "Incorporating Organizational Factors Into Probabilistic Risk Assessment (PRA) of Complex Socio-Technical Systems: A Hybrid Technique Formalization," Reliab. Eng. Syst. Saf., 94(5), pp. 1000–1018.
[23] Aven, T., Sklet, S., and Vinnem, J. E., 2006, "Barrier and Operational Risk Analysis of Hydrocarbon Releases (BORA-Release), Part I: Method Description," J. Hazard. Mater., 137(2), pp. 681–691.
[24] Sklet, S., Vinnem, J. E., and Aven, T., 2006, "Barrier and Operational Risk Analysis of Hydrocarbon Releases (BORA-Release), Part II: Results From a Case Study," J. Hazard. Mater., 137(2), pp. 692–708.
[25] Gran, B., Rolf, B., Nyheim, O. M., Okstad, E. H., Seljelid, J., Sklet, S., Vatn, J., and Vinnem, J. E., 2012, "Evaluation of the Risk OMT Model for Maintenance Work on Major Offshore Process Equipment," J. Loss Prev. Process Ind., 25(3), pp. 582–593.
[26] Pence, J., Mohaghegh, Z., Ostroff, C., Kee, E., Yilmaz, Z., Grantom, R., and Johnson, D., 2014, "Toward Monitoring Organizational Safety Indicators by Integrating Probabilistic Risk Assessment, Socio-Technical Systems Theory, and Big Data Analytics," Probabilistic Safety Assessment & Management (PSAM 12), Honolulu, HI, 2014; CreateSpace Independent Publishing Platform, 2016.
[27] Alvarenga, M. A. B., Frutuoso e Melo, P. F. F., and Fonseca, R. A., 2014, "A Critical Review of Methods and Models for Evaluating Organizational Factors in Human Reliability Analysis," Prog. Nucl. Energy, 75, pp. 25–41.
[28] Kennedy, R., and Kirwan, B., 1998, "Development of a Hazard and Operability-Based Method for Identifying Safety Management Vulnerabilities in High Risk Systems," Saf. Sci., 30, pp. 249–274.
[29] Jain, P., Rogers, W., Pasman, H., and Mannan, M. S., 2018, "A Resilience-Based Integrated Process Systems Hazard Analysis (RIPSHA) Approach, Part II: Management System Layer," Process Saf. Environ. Prot., 118, pp. 115–124.
[30] Leveson, N. G., 2013, "An STPA Primer, Version 1."
[31] Leveson, N. G., and Thomas, J. P., 2018, "STPA Handbook," accessed Nov. 30, 2018, http://psas.scripts.mit.edu/home
[32] Hollnagel, E., 2004, Barriers and Accident Prevention, Ashgate, Aldershot, UK.
[33] Hollnagel, E., 2012, FRAM: The Functional Resonance Analysis Method: Modelling Complex Socio-Technical Systems, Ashgate, Farnham, UK.
[34] Taylor, J. R., and Rasmussen, J., 1976, "Notes on Human Factors Problems in Process Plant Reliability and Safety Prediction," Risø National Laboratory, Roskilde, Denmark.
[35] Taylor, J. R., 1994, Risk Analysis for Process Plant, Pipelines and Transport, Taylor and Francis/Spon, London.
[36] Rasmussen, J., 1974, "The Human Data Processor as a System Component: Bits and Pieces of a Model," accessed Nov. 30, 2018, http://orbit.dtu.dk/en/publications/
[37] Taylor, J. R., Hansen, O., Jensen, C., Jacobsen, O. F., Justesen, M., and Kjærgaard, S., 1982, "Risk Analysis of a Distillation Unit," Technical University of Denmark, Lyngby, Denmark, accessed Oct. 3, 2019, http://orbit.dtu.dk/files/88560585/ris
[38] Hollnagel, E., Rosness, R., and Taylor, R. J., 1990, "Human Reliability and the Reliability of Cognition," Third International Conference on Human Machine Interaction and Artificial Intelligence in Aeronautics and Space, Blagnac, France, Sept. 26–28.
[39] Perrow, C., 1984, Normal Accidents: Living With High Risk Technologies, Princeton University Press, Princeton, NJ.
[40] Dekker, S., 2011, Drift Into Failure, Ashgate, Farnham, Surrey, UK.
[41] Tichy, N., 1973, "An Analysis of Clique Formation and Structure in Organizations," Administ. Sci. Q., 18(2), pp. 194–208.
[42] Bruun, O., Rasmussen, A., and Taylor, J. R., 1979, "Cause Consequence Reporting for Accident Reduction: The Accident Anatomy Method," accessed Oct. 3, 2019, http://orbit.dtu.dk/en/publications/cause-consequence-reporting-for-accident-reduction-the-accident-anatomy-method
[43] Nielsen, D., 1975, "Use of Cause-Consequence Charts in Practical Systems Analysis," Reliability and Fault Tree Analysis: Theoretical and Applied Aspects of System Reliability and Safety Assessment, Society for Industrial and Applied Mathematics, Philadelphia, PA.
[44] Verma, T., and Pearl, J., 1990, "Equivalence and Synthesis of Causal Models," Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence, Boston, MA, July 27–29, Morgan Kaufmann, pp. 220–227.
[45] Taylor, J. R., 2017, "Automated HAZOP Revisited," Process Saf. Environ. Prot., 111, pp. 635–651.
[46] Marca, D. A., and McGowan, C. L., 1988, SADT: Structured Analysis and Design Technique, McGraw-Hill, New York.
[47] Kletz, T., 2009, What Went Wrong?, 5th ed., Elsevier, Amsterdam, The Netherlands.
[48] Sanders, R. E., 2016, Chemical Process Safety: Learning From Case Histories, 3rd ed., Butterworth-Heinemann, Oxford, UK.
[49] Deighton, M. G., 2016, Facility Integrity Management, Elsevier, Amsterdam, The Netherlands.