The Risk is in the Management
Do you use a risk management program to conserve asset resources? Does your employer foster a
site environment where risk management is a routine part of job planning, preparation, and
execution?
Did you answer No to the above questions?
Risk Management was once thought to be the sole responsibility of the site safety department.
Maintenance and Operation professionals now understand the importance of a risk management
process to aid in protecting, conserving, and extending the reliability of critical assets. Failure to
effectively manage the risks of asset failure can add costs to an operating unit at any plant, site,
or installation.
Managing risks related to asset maintenance and operation requires good judgment and professional expertise because it is both an art, or vocation, and a science with its own well-developed technological hierarchy. The objective of managing risk is not to remove all risk but to eliminate unnecessary or avoidable risk. The process must therefore allow individuals to make informed decisions about what risks to accept at each operational level. Managers should compare standard Risk Management principles with historical asset data and their personal experience, then consider how, when, and why those principles apply to specific situations within their area of functional responsibility.
Both Managers and Craft/Techs manage risk on a daily basis. Craft/Techs continuously search for hazards within their areas of expertise during daily job performance and routinely recommend the proper controls to reduce risks. Potential hazards and resulting risks vary as operating circumstances and parameters change. Management knowledge, gained from these experienced Craft/Techs, coupled with additional subject matter training can influence the extent and success of risk reduction measures.
Have you ever heard of SFMEA, RCFA, Maintenance Optimization, or RCM? These are all tools that
can be employed to help a site preserve asset resources. Programs like these can
provide the means to identify, assess, and implement controls of risks and potential hazards to
critical assets. Specific parts of these tools also help compile information necessary for making
decisions to help balance PM/PdM program costs with increased operating benefits. What does
each have in common with the others? They all ask the same questions as the basic Risk
Management model. In the following table note the similarities of each process step, or decision
level.
Step | BASIC RISK MANAGEMENT | SFMEA | RCFA | MAINTENANCE OPTIMIZATION | RCM |
Step 1 | 1. Identify the failure hazards. | Potential Failure Mode; Potential Effects of Failure; Severity; Potential Causes | 1. Define the Problem | 1. Determine Operational Criticality | 1. What are the functions and associated desired standards of performance of the asset in its present operating context (functions)? 2. In what ways can it fail to fulfil its functions (functional failures)? |
Step 2 | 2. Assess failure hazards to determine risks to site operations based on probability of occurrence. | Probability of Occurrence | 2. Analyze the Problem | 2. Understand and Predict Equipment Behavior | 3. What causes each functional failure (failure modes)? 4. What happens when each failure occurs (failure effects)? 5. In what way does each failure matter (failure consequences)? |
Step 3 | 3. Develop controls to reduce or avoid failures and make decisions regarding the level of acceptable risk. | Detection | 3. Develop Solutions | 3. Develop Maintenance Solutions to Improve Future Behavior | 6. What should be done to predict or prevent each failure (proactive tasks and task intervals)? |
Step 4 | 4. Implement controls. | Improvements | 4. Implement Solutions | 4. Implement Solutions | 7. What should be done if a suitable proactive task cannot be found (default actions)? |
Step 5 | 5. Supervise performance and evaluate. | Current Process Controls | 5. Monitor Results | 5. Monitor Results and Adjust Maintenance Tactics | (Auditing the continuous improvement of proactive failure reduction tasks.) |
Figure 1 - Program Comparison
Most of the processes referenced above also have a big “M” in their acronym. Its meaning varies from one individual to another. The commonality of these programs points to the real definition of that big “M”: all require Management. The acute risk to our plant, site, or installation critical assets is failing to use a process to manage them.
The Risk Management Process is composed of five (5) basic tasks or process steps.
1. Identify Failure Hazards,
2. Assess Failure Hazards,
3. Develop Controls and Make Risk Decisions,
4. Implement Controls,
5. Supervise and Evaluate (performance of the control measures).
Tasks 1 and 2 comprise the risk assessment. In Task 1, Managers and Craft/Techs
identify the failure modes and hazards which may be encountered during operation of
plant, site, or installation critical assets. Task 2 is a determination of impact of each failure
incident and resulting loss of operational function.
Tasks 3 through 5 are activities to help the Manager effectively reduce the occurrence,
mitigate the consequences, and manage risk incidents. In these steps, managers balance asset
failure risks against costs of performing RBI (risk-based inspections), increased-frequency PM
procedures, and expanded PdM programs. They also implement the appropriate actions required
to eliminate unnecessary failure risks during asset operation. The planning, preparation, and
performance of repair, replacement and preventive maintenance activities are carefully evaluated
during these steps along the risk management path. Lastly, control activities are monitored and
evaluated for their effectiveness and valuable lessons learned are collected for use by others.
To apply the Basic Risk Management model:
1. Identify the Failure Hazards - A hazard is a condition or potential condition where
the failure results in loss of an operating function, damage to, or loss of an asset and related
components found in an operational environment.
2. Assess the Failure Hazards - Asset risk is defined as the combination of probability
of failure and the consequences (severity) of that occurrence. We can define probability as the
likelihood of a failure occurring, and severity as a measure of the impact of the failure to the
plant, site, or installation operating functions. Asset risk calculations increase as a result of
higher probability rates and greater impact to an operation.
A Risk Assessment requires each potential failure incident, hazard, or mode be evaluated
in relation to the probability of an incident occurring, and the severity (or impact upon the plant,
site, or installation) of that incident or failure.
This activity is heavily dependent upon the use of asset history, lessons learned in the
field, intuitive analysis, the Manager’s and Craft/Tech’s experience and sound judgment.
Incomplete, inaccurate, undependable, or contradictory information creates doubt and
uncertainty when determining the probability and severity of a failure incident. Assessment of
risk requires good judgment.
Figures 2 and 3 are tools that can be employed to perform an asset risk assessment. Risk
Assessment Tool 1A is a simplified matrix which can be used by the Manager, or Craft/Tech, to
enter the estimated degree of severity and probability for each failure incident or hazard.
Numerical values have been assigned to each of the standardized descriptors. Multiplying the
severity number by the probability number will yield a product between 1 and 25. Comparing that
product to the key below the matrix will indicate the estimated risk of failure. The larger the
number, the higher the risk.
Risk Assessment Tool 1B is a similarly designed table that can be used by the Manager,
or Craft/Tech, in much the same manner. Estimate the level of severity and probability of
occurrence, then read right and up. The point where the failure severity row and probability of
occurrence column intersect defines the level of failure risk for a particular asset.
Defining the levels of Probability of Failure occurrence:
Frequent - Failures happen often.
Likely - A failure will occur several times during the functional life of the asset.
Occasionally - Sporadic incidents of failure.
Seldom - Remote chance of an isolated failure.
Unlikely - An asset failure is not impossible but highly improbable.
The degrees of Failure Severity are:
Catastrophic - Total loss of asset functionality. Implied threat to related assets, systems, and property.
Critical - Significant reduction in asset, system, or plant operational capability. Significant collateral damage to adjacent assets, components, property, or environmental systems.
Marginal - Possibility of minor impact upon plant, site, or installation operational activities and requirements.
Negligible - Little or no impact on asset, system, or plant operation or capability. Little or no collateral asset, property, or environmental damage.
None - No impact.
The risk assessment tool examines potential failure occurrences in terms of probability and severity to determine the level of risk.
Assessing The Risk of Failure
Probability of Failure Occurrence (value in parentheses)
Value | Level of Failure Severity | Frequent (5) | Likely (4) | Occasionally (3) | Seldom (2) | Unlikely (1) |
5 | Catastrophic | 25 | 20 | 15 | 10 | 5 |
4 | Critical | 20 | 16 | 12 | 8 | 4 |
3 | Marginal | 15 | 12 | 9 | 6 | 3 |
2 | Negligible | 10 | 8 | 6 | 4 | 2 |
1 | None | 5 | 4 | 3 | 2 | 1 |
Very High Risk | ≥ 15 |
High Risk | ≥ 10 |
Moderate Risk | ≥ 5 |
Low Risk | < 5 |
Figure 2 – Risk Assessment Tool 1A
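The Tool 1A arithmetic can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original tool: the function names and the RISK_BANDS cut-offs are assumptions, with the key read as 15 and above for Very High, 10 and above for High, 5 and above for Moderate, and anything lower as Low.

```python
# Numeric values for the standardized descriptors, as listed in Tool 1A.
SEVERITY = {"Catastrophic": 5, "Critical": 4, "Marginal": 3, "Negligible": 2, "None": 1}
PROBABILITY = {"Frequent": 5, "Likely": 4, "Occasionally": 3, "Seldom": 2, "Unlikely": 1}

# ASSUMPTION: lower bound of each band, checked from highest to lowest.
RISK_BANDS = [(15, "Very High"), (10, "High"), (5, "Moderate")]


def risk_score(severity: str, probability: str) -> int:
    """Multiply the severity value by the probability value (range 1 to 25)."""
    return SEVERITY[severity] * PROBABILITY[probability]


def risk_level(score: int) -> str:
    """Map a Tool 1A product onto a risk band using the assumed cut-offs."""
    for lower_bound, label in RISK_BANDS:
        if score >= lower_bound:
            return label
    return "Low"


if __name__ == "__main__":
    score = risk_score("Critical", "Seldom")
    print(score, risk_level(score))
```

Under these assumed cut-offs, a Critical failure (4) that is Seldom expected (2) scores 8 and falls in the Moderate band.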
Assessing The Risk of Failure
Probability of Failure Occurrence
Level of Failure Severity | Frequent | Likely | Occasionally | Seldom | Unlikely |
Catastrophic | VH | VH | VH | H | M |
Critical | VH | H | H | M | L |
Marginal | VH | H | M | M | L |
Negligible | H | M | M | L | L |
None | M | L | L | L | L |
Very High Risk | VH |
High Risk | H |
Moderate Risk | M |
Low Risk | L |
Figure 3 – Risk Assessment Tool 1B
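Tool 1B works as a straight table lookup, which can be sketched as follows. The cell values are copied from Figure 3; the names TOOL_1B, PROBABILITIES, and lookup_risk are illustrative assumptions and not part of the original tool.

```python
# Column order of the probability descriptors in Tool 1B.
PROBABILITIES = ["Frequent", "Likely", "Occasionally", "Seldom", "Unlikely"]

# One row per severity level; each entry lines up with PROBABILITIES.
TOOL_1B = {
    "Catastrophic": ["VH", "VH", "VH", "H", "M"],
    "Critical":     ["VH", "H", "H", "M", "L"],
    "Marginal":     ["VH", "H", "M", "M", "L"],
    "Negligible":   ["H", "M", "M", "L", "L"],
    "None":         ["M", "L", "L", "L", "L"],
}


def lookup_risk(severity: str, probability: str) -> str:
    """Return the risk code where the severity row meets the probability column."""
    column = PROBABILITIES.index(probability)
    return TOOL_1B[severity][column]


if __name__ == "__main__":
    print(lookup_risk("Critical", "Likely"))
```

Because Tool 1B is read cell by cell rather than computed, a given cell may not match a band derived from the Tool 1A product, depending on how the Tool 1A key is read. Critical/Likely, for instance, has a product of 16 but reads H here.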
3. Develop Controls and Make Risk Decisions
After identifying and assessing each failure hazard, Managers and Craft/Techs must
develop one or more risk controls that will aid in avoiding, preventing, or reducing the risk
(probability and/or severity) of a failure incident. While developing controls, Managers must
consider the reason for the failure, not just the incident or its impact on asset functions and
operation.
Failure controls generally fall into three (3) categories: risk avoidance, reliability-based technology, and educational. Risk Avoidance may include engineering and/or redesign of the asset installation and operational profile to remove a risk threat from operation and use of the equipment. Reliability-based activities can include optimized PM procedures, PdM technologies, RCFA (Root Cause Failure Analysis), and SFMEA (Simplified Failure Mode Effects Analysis). RBI (Risk-Based Inspection) is an application of basic risk principles to manage inspection programs for critical plant, site, or installation assets. Educational and training controls provide knowledge- and skill-based programs to ensure implemented procedures and tasks are performed to specific standards.
To make a meaningful Risk Decision, a Risk Assessment should be conducted soon after development and implementation of the above referenced program controls. These results are then used to aid the decision making process pertaining to the amount of risk the Manager is willing to accept for the operation of a critical asset or system. A key activity of this task is to specify Who, What, Where, When, and How each control is to be used.
4. Implement Risk Controls
The number of higher failure risk assets is generally a small percentage of total plant
assets. Implement the new or additional PM and PdM tasks when and where needed and focus
efforts on the most critical items. Institute a formalized pro-active planning and scheduling
function to ensure all resources required to perform the newly implemented activities will be
available. The site CMMS should be configured to record and report KPIs (key performance
indicators) required for implementation and continuance of a risk reduction or avoidance
program. Do not discount or neglect interaction with MRO. Improve the skills of the workforce
through asset, maintenance and reliability training.
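One way to find that small group of higher-risk assets is to rank every asset by its severity times probability product. The sketch below assumes each asset has already been given a severity value and a probability value on the 1 to 5 scales of Tool 1A; the function name and the asset tags are hypothetical.

```python
def prioritize(assets: dict[str, tuple[int, int]], top_n: int) -> list[str]:
    """Return the top_n asset tags ordered from highest to lowest risk score.

    assets maps an asset tag to (severity value, probability value).
    """
    scores = {tag: severity * probability
              for tag, (severity, probability) in assets.items()}
    # sorted() is stable, so assets with equal scores keep their input order.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    # Hypothetical asset tags and ratings, for illustration only.
    plant_assets = {"P-101": (5, 4), "F-201": (2, 2), "C-301": (4, 3)}
    print(prioritize(plant_assets, top_n=2))
```

The resulting short list is where new or additional PM and PdM tasks would be applied first.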
5. Supervise and Evaluate
The Manager is responsible for evaluating the effectiveness of the implemented controls
and programs in reducing or removing the failure potential.
Managers and first line supervision must ensure that subordinates understand how to
execute risk controls. Craft/Techs continuously assess risks during the workday and should
maintain communication with Managers. Both groups should guard against complacency to
ensure that risk control and mitigation standards are not relaxed, circumvented, or violated.
Managers must continuously supervise and monitor asset PM/PdM and other inspection
activities to ensure they are effective and can keep risks at an acceptable level. Use the asset
history from the site CMMS as a source of information to indicate which controls failed and
why. Often, a completely different procedure may prove more effective and require
implementation.
The level of failure risk remaining for each asset after implementation of best practice
controls is called residual risk. As new controls for failure hazards are identified and selected, a
risk assessment is again performed and levels of asset risk revised. The process can be repeated
until the risk of asset failure is acceptable or cannot be reduced. Management must be fully
committed to continuous improvement of the plant, site, or installation’s failure risk reduction
efforts.
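The reassessment cycle described above can be sketched as a simple loop. This is an illustration under stated assumptions, not a prescribed method: each control is modeled as lowering the severity or probability value by a fixed number of points, and the Control class, the residual_risk function, and the acceptable_score parameter are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Control:
    name: str
    severity_cut: int = 0      # points removed from the severity value
    probability_cut: int = 0   # points removed from the probability value


def residual_risk(severity: int, probability: int,
                  controls: list[Control], acceptable_score: int):
    """Reassess after each control until risk is acceptable or controls run out.

    Returns the residual score and the names of the controls that were applied.
    """
    score = severity * probability
    applied = []
    for control in controls:
        if score <= acceptable_score:
            break  # risk is already at an acceptable level
        new_severity = max(1, severity - control.severity_cut)
        new_probability = max(1, probability - control.probability_cut)
        new_score = new_severity * new_probability
        if new_score < score:  # keep only controls that actually lower the risk
            severity, probability, score = new_severity, new_probability, new_score
            applied.append(control.name)
    return score, applied
```

Starting from a severity of 4 and a probability of 4 (score 16), a control that removes two probability points brings the score to 8, and a further one-point reduction brings it to 4. If the acceptable score is 4, the loop stops there and 4 is the residual risk.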
Risk Management must not be thought of as an add-on to the maintenance management
function, but as an integral part of departmental work planning, preparation, and execution.
Asset risk management is a well-defined, sustainable process, not a one-time staged event.