CMMS Edge - Always master the basics of asset reliability
Written by David Berger, P.Eng. (Alta.) Friday, 22 May 2009
As assets become more and more complex, and our dependency on them increases, it's not surprising that companies and regulatory bodies have a growing interest in asset reliability. Although assets are getting "smarter," they're also becoming more costly. As a result, companies are looking for ways to reduce operating costs, such as through improved asset reliability. Here's a primer on how to reduce maintenance costs through the establishment of a reliability management program, including basic features and functions to look for in a CMMS system.
Criticality analysis: By carefully examining the end-to-end processes within your operations, determine the criticality of each piece of equipment and its component parts. Then identify the critical inputs and outputs of each process and its sub-processes. Also, identify the points where a potential component or part failure would disrupt critical inputs/outputs.
Many CMMS packages are able to record the criticality as a coded field on the equipment-master file. Preventive and predictive tasks can then be defined to avoid failure of assets flagged as having a higher criticality. As well, the user can record corrective tasks required in the event of mechanical breakdown. In some cases, redundancy or "mirroring" can be a relatively inexpensive way to minimize asset downtime.
Failure analysis: Coded fields on the CMMS system greatly simplify data collection and force consistent reporting of failures by narrowing the choices. Descriptive fields are also available on most CMMS packages for more detailed explanations. A "problem" code refers to how a breakdown is reported. It's usually tied to a given asset class, such as motors, pumps or rooms. For example, in a facilities operation, a tenant might report that a room is excessively hot or cold.
A maintenance worker investigates the problem and a "cause" code is determined. Some of the more advanced CMMS systems tie a set of cause codes to a given problem code of a certain asset class. This creates a hierarchy of codes. In the example above, possible causes of a hot or cold room may be a failed thermostat, a blown circuit breaker or an inoperative fan.
Once the problem is fixed, the "action" code records what work was completed. In the example above, action codes might include repaired the fan, reset the circuit breaker or replaced the thermostat. Finally, a "delay" code explains why operations have temporarily ceased, such as awaiting raw materials, operator break, or product changeover. Identifying the most frequent and time-consuming reasons for delay provides valuable insight into the priority of the problems that need to be addressed.
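The problem, cause and action codes described above can be pictured as a small lookup structure. The following is a minimal sketch in Python, not the schema of any particular CMMS package; the code names (TOO_HOT, THERMOSTAT_FAILED and so on) and the function names are hypothetical, chosen to mirror the hot-or-cold-room example.

```python
# Hypothetical code tables; a real CMMS would keep these in its database.
# Cause codes are keyed by (asset class, problem code), forming the hierarchy.
CAUSES_BY_PROBLEM = {
    ("room", "TOO_HOT"): ["THERMOSTAT_FAILED", "BREAKER_BLOWN", "FAN_INOPERATIVE"],
    ("room", "TOO_COLD"): ["THERMOSTAT_FAILED", "BREAKER_BLOWN", "FAN_INOPERATIVE"],
}

# Action codes offered for each cause code.
ACTIONS_BY_CAUSE = {
    "THERMOSTAT_FAILED": ["REPLACED_THERMOSTAT"],
    "BREAKER_BLOWN": ["RESET_BREAKER"],
    "FAN_INOPERATIVE": ["REPAIRED_FAN"],
}


def allowed_causes(asset_class, problem_code):
    """Return the cause codes a worker may pick for this problem."""
    return CAUSES_BY_PROBLEM.get((asset_class, problem_code), [])


def is_valid_entry(asset_class, problem_code, cause_code, action_code):
    """Check that a work-order entry follows the code hierarchy."""
    return (cause_code in allowed_causes(asset_class, problem_code)
            and action_code in ACTIONS_BY_CAUSE.get(cause_code, []))
```

Narrowing the choices this way is what forces consistent reporting: a worker closing a "room too hot" work order can only pick from causes that make sense for a room.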
Pareto analysis: Failures can be prioritized in terms of impact on safety, operations output and cost. Use statistical analysis of equipment history to determine the frequency of high-impact problems, their underlying causes and most cost-effective actions. Pareto analysis is one such tool. A Pareto chart is nothing more than a frequency distribution of problem codes that can be plotted on a simple spreadsheet. The most sophisticated CMMS packages can assist with this kind of analysis.
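As a rough illustration of how little is needed for a Pareto table, the sketch below counts problem codes from equipment history and adds a cumulative percentage column. The function name and the sample codes are hypothetical; the input is assumed to be a plain list with one problem code per reported failure.

```python
from collections import Counter


def pareto_table(problem_codes):
    """Return (code, count, cumulative percent) rows, most frequent first."""
    counts = Counter(problem_codes)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for code, count in counts.most_common():
        cumulative += count
        rows.append((code, count, round(100 * cumulative / total, 1)))
    return rows


# Hypothetical history: one problem code per reported failure.
history = ["TOO_HOT"] * 5 + ["TOO_COLD"] * 3 + ["NO_POWER"] * 2
for code, count, cum_pct in pareto_table(history):
    print(f"{code:10s} {count:3d} {cum_pct:6.1f}%")
```

The cumulative column is what shows how few problem codes account for most of the failures, which is where brainstorming effort is best spent.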
Root cause analysis: A Fishbone Diagram, also referred to as a Cause and Effect Diagram, is probably the most popular tool used to find the root cause. It's a simple manual tool used in brainstorming sessions to focus discussion on possible causes of the higher frequency problems suggested by the Pareto chart. The CMMS system can help link problem and cause-code occurrences, with the corrective action required. Various predictive maintenance (PdM) and preventive maintenance (PM) tasks should then be explored to prevent a problem from occurring in the first place.
Diagnostic analysis: The most advanced CMMS software is moving away from simply reporting on coded history. Far more useful is a knowledge-based or rules-based troubleshooting database for identifying the best course of action for a given problem. If, for example, a motor fails in a given piece of equipment, the diagnostic tool determines the statistical likelihood of each cause code, and suggests corresponding actions to consider.
Additionally, correlations can be made with equipment or part vendors to determine if there's a higher failure rate originating from a given vendor. This allows you to take preventive or predictive steps to minimize costly downtime, and/or approach the vendor to fix the problem. Such a knowledge-based diagnostic tool could also be used for predicting failures in similar parts, components and equipment, once a pattern is determined. This would lead to monitoring the condition of key components that had not yet failed, but were deemed statistically likely to do so, in order to catch a problem before it happens.
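The statistical side of such a diagnostic tool can be approximated from coded history alone. The sketch below is a simplified illustration, not how any specific product works: it estimates the likelihood of each cause code for a given problem code from past work orders, and computes a failure rate per vendor. The function names, code names and the shape of the input data are assumptions.

```python
from collections import Counter


def cause_likelihood(history, problem_code):
    """Estimate how often each cause explained a given problem.

    history: iterable of (problem_code, cause_code) pairs from closed work orders.
    Returns (cause_code, fraction) pairs, most likely cause first.
    """
    causes = Counter(cause for problem, cause in history if problem == problem_code)
    total = sum(causes.values())
    if total == 0:
        return []  # no history for this problem code
    return [(cause, n / total) for cause, n in causes.most_common()]


def failure_rate_by_vendor(installed, failures):
    """Failures per installed unit for each vendor.

    installed: {vendor: units installed}; failures: {vendor: failures recorded}.
    """
    return {vendor: failures.get(vendor, 0) / units
            for vendor, units in installed.items() if units > 0}
```

A call such as cause_likelihood(history, "MOTOR_FAILED") would return the causes ranked by how often they occurred, which is the list a troubleshooting screen could present alongside the matching action codes.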
Status change analysis: Status fields on a good CMMS system can be used to track the cycle time of various activities and delays. Status-field options for a work order could include pending approval, waiting for arrival of parts, assigned to maintenance worker, etc. The equipment or component status-field options might include commissioning, warranty repair, third-party repair and others.
Studying the history of status codes may provide valuable insight into how to improve asset reliability. Problems, such as long lead times and inadequate authorization, may suggest obvious corrective action. Additionally, the difference between cycle time (i.e. elapsed time, including delays) and "touch time" (actual hands-on productive time) highlights problems with the responsiveness of the maintenance department.
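The gap between cycle time and touch time can be computed directly from a work order's status history. The sketch below is one possible approach, assuming the history is available as a time-ordered list of (timestamp, status) changes; the status names and the choice of which statuses count as hands-on work are hypothetical.

```python
from datetime import datetime

# Assumption: only this status represents hands-on work.
TOUCH_STATUSES = {"IN_PROGRESS"}


def hours_by_status(status_log, closed_at):
    """Total hours spent in each status.

    status_log: list of (timestamp, status) tuples, sorted by timestamp.
    closed_at: timestamp at which the work order was closed.
    """
    totals = {}
    ends = status_log[1:] + [(closed_at, None)]
    for (start, status), (end, _) in zip(status_log, ends):
        hours = (end - start).total_seconds() / 3600
        totals[status] = totals.get(status, 0.0) + hours
    return totals


def cycle_and_touch_hours(status_log, closed_at):
    """Return (cycle time, touch time) in hours for one work order."""
    totals = hours_by_status(status_log, closed_at)
    cycle = sum(totals.values())
    touch = sum(h for status, h in totals.items() if status in TOUCH_STATUSES)
    return cycle, touch


log = [
    (datetime(2009, 5, 4, 8, 0), "PENDING_APPROVAL"),
    (datetime(2009, 5, 4, 12, 0), "WAITING_PARTS"),
    (datetime(2009, 5, 5, 12, 0), "IN_PROGRESS"),
]
print(cycle_and_touch_hours(log, datetime(2009, 5, 5, 14, 0)))  # (30.0, 2.0)
```

In this made-up example only 2 of the 30 elapsed hours were hands-on, and the per-status totals show that waiting for parts accounts for most of the delay.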
Asset performance analysis: One effective way to focus the attention of both operations and the maintenance department on asset care is to show the relationship between equipment reliability and operational productivity. This can be accomplished by tracking simple measures on the CMMS system, such as maintenance cost per unit of output, or operations cost per minute of equipment downtime. More important than the actual value of each measure is the trend over time.
Analysis of other measures: CMMS software vendors are looking for other useful measures to add to their toolboxes. Two such measures that are rising in importance are mean time between failures (MTBF) and mean time to repair (MTTR). By tracking MTBF and MTTR for each critical asset, the maintenance department will have a good sense of whether progress is being made in improving reliability (i.e. whether failures are becoming less frequent and are resolved faster).
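Both measures are simple ratios. The sketch below uses the common definitions (MTBF as operating time divided by the number of failures, MTTR as total repair time divided by the number of repairs); the function name and the sample figures are hypothetical, and a given CMMS package may define the measures slightly differently.

```python
def mtbf_mttr(operating_hours, repair_hours):
    """Compute MTBF and MTTR for one asset over a reporting period.

    operating_hours: total hours the asset was running in the period.
    repair_hours: list with the repair duration of each failure, in hours.
    Returns (MTBF, MTTR), or (None, None) if there were no failures.
    """
    failures = len(repair_hours)
    if failures == 0:
        return None, None
    mtbf = operating_hours / failures
    mttr = sum(repair_hours) / failures
    return mtbf, mttr


# Hypothetical asset: 900 operating hours, three failures in the period.
print(mtbf_mttr(900, [2, 4, 6]))  # (300.0, 4.0)
```

As with the other measures, the trend matters more than any single value: a rising MTBF together with a falling MTTR from period to period indicates progress.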
Condition monitoring and analysis: This is fast becoming an important feature of every CMMS system. The simplest packages allow users to manually input data, such as set points or equipment-usage meter readings for triggering PM routines. The more sophisticated CMMS systems are connected online to automated data-collection devices, either directly or through integration with third-party software. The software then analyzes incoming data to ensure that trends are within user-defined control limits. When data stray outside the control limits, users are "alarmed" and/or action is taken (e.g. issuing a work order or sending a pre-defined email or text message).
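The control-limit check itself is straightforward. The following sketch assumes readings arrive as (timestamp, value) pairs and that the response to an alarm is supplied as a callback; the function names are hypothetical, and the callback stands in for whatever a real system would do, such as issuing a work order or sending a message.

```python
def classify(value, lower_limit, upper_limit):
    """Return "LOW" or "HIGH" if the value is outside the limits, else None."""
    if value < lower_limit:
        return "LOW"
    if value > upper_limit:
        return "HIGH"
    return None


def monitor(readings, lower_limit, upper_limit, on_alarm):
    """Check each reading and call on_alarm for every out-of-limit value.

    readings: iterable of (timestamp, value) pairs.
    on_alarm: callable taking (timestamp, value, state).
    Returns the number of alarms raised.
    """
    alarms = 0
    for timestamp, value in readings:
        state = classify(value, lower_limit, upper_limit)
        if state is not None:
            on_alarm(timestamp, value, state)
            alarms += 1
    return alarms
```

Keeping the response in a callback separates the user-defined limits from the user-defined action, so the same check can alert a planner for one asset and raise a work order for another.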
PM cost analysis: Everyone agrees that a planned environment is far superior to the constant "fire" fighting associated with a reactive environment. What's not clear, however, is where the balance between these two worlds lies. For example, how often should you change the oil in your car? How frequently should an airstrip be inspected? Should seals be replaced on a regular basis, or only when a leak is evident?
There's a technique, sometimes referred to as Weibull analysis, that graphs the decreasing cost of planned maintenance against the increasing cost of unplanned maintenance as the PM interval increases. The optimal PM interval is the one that minimizes the sum of the two costs, which is often close to the point where the two curves cross.
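One way to make the trade-off concrete is to compute both cost curves and their sum over a range of candidate intervals. The sketch below rests on a model the article does not specify: PM is done at a fixed interval, failures between PMs are repaired without changing the asset's age, and the expected number of failures in an interval follows a Weibull cumulative hazard with shape beta and scale eta. All parameter values are hypothetical. Under these assumptions the lowest total cost need not fall exactly where the two curves cross, so the sketch searches for the minimum of the sum directly.

```python
def cost_rates(interval, pm_cost, failure_cost, beta, eta):
    """Planned and unplanned cost per operating hour for a given PM interval.

    Assumed model: one PM per interval, and an expected (interval / eta) ** beta
    failures per interval (Weibull cumulative hazard).
    """
    planned = pm_cost / interval
    unplanned = failure_cost * (interval / eta) ** beta / interval
    return planned, unplanned


def best_interval(candidates, pm_cost, failure_cost, beta, eta):
    """Return the candidate interval with the lowest total cost per hour."""
    return min(candidates,
               key=lambda t: sum(cost_rates(t, pm_cost, failure_cost, beta, eta)))


# Hypothetical figures: $100 per PM, $1,000 per breakdown, wear-out behaviour.
candidates = range(50, 1001, 10)
best = best_interval(candidates, pm_cost=100, failure_cost=1000, beta=3, eta=1000)
print(best, cost_rates(best, 100, 1000, 3, 1000))
```

The shape parameter matters: with beta at or below 1 the unplanned cost per hour does not rise with the interval, and this model then gives no cost reason to do time-based PM at all.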
PdM analysis: A simple example of this is comparing the history of engine failures with the condition of the lubrication oil prior to the failure. It may then be possible to predict the need to replace the oil, replace the rings, and so on, given the trends in oil temperature, viscosity and the amount and type of particulate in it.
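A very simple form of this kind of prediction is to fit a straight line to recent readings and estimate when the trend will reach a limit. The sketch below does that with an ordinary least-squares fit; the linear trend, the limit and the sample readings are all assumptions for illustration, and real oil-analysis programs typically combine several indicators.

```python
def hours_until_limit(hours, values, limit):
    """Estimate operating hours left before a rising trend reaches a limit.

    hours: operating-hour meter readings at which samples were taken.
    values: measured values (e.g. particulate in ppm) at those readings.
    Returns None if the fitted trend is not rising.
    """
    n = len(hours)
    mean_x = sum(hours) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in hours)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # not trending toward the upper limit
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - hours[-1])


# Hypothetical oil samples: particulate rising 10 ppm every 100 hours.
print(hours_until_limit([0, 100, 200, 300], [10, 20, 30, 40], limit=60))
```

An estimate like this could be used to schedule the oil change or ring replacement as a planned job before the limit is reached.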
Lifecycle analysis: One of the key decisions in any reliability management program is when to repair and when to replace a given asset. Suppose in my earlier example that the problem of "insufficient heat" had been caused by a failed thermostat in, say, 80 percent of the cases reported in the equipment's history file. The average cost of repairing the unit may have been $225 for parts and labour.
Further analysis reveals that replacing all of the thermostats would cost only $125/unit. Moreover, preventing failure would ensure that building tenants aren't left in the cold, especially during extended cold spells. As a result, repair/replace decisions can be justified based on statistical analysis of equipment history and cost data.
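The arithmetic behind such a decision can be laid out in a few lines. The sketch below uses the $225 repair cost and $125 replacement cost from the example; the number of units and the number of expected failures are hypothetical, since the example does not give them, and it assumes that replacement prevents those failures over the period considered.

```python
REPAIR_COST = 225   # average cost per thermostat failure, from the example
REPLACE_COST = 125  # cost per unit to replace proactively, from the example


def reactive_cost(expected_failures, repair_cost=REPAIR_COST):
    """Cost of waiting for failures and repairing each one."""
    return expected_failures * repair_cost


def proactive_cost(units, replace_cost=REPLACE_COST):
    """Cost of replacing every unit before it fails."""
    return units * replace_cost


def break_even_failure_fraction(repair_cost=REPAIR_COST, replace_cost=REPLACE_COST):
    """Fraction of units that must be expected to fail for replacement to pay off."""
    return replace_cost / repair_cost


# Hypothetical building: 100 units, 70 thermostat failures expected over the period.
print(reactive_cost(70))    # 15750
print(proactive_cost(100))  # 12500
print(round(break_even_failure_fraction(), 2))  # 0.56
```

On these assumptions, blanket replacement pays for itself once more than roughly 56 percent of the units would otherwise be expected to fail, before counting the cost of tenants' discomfort.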
David Berger, P.Eng. (Alta.) is PEM's production/operations editor and a principal with Western Management Consultants. He's also the founding president of the Plant Engineering and Maintenance Association of Canada (PEMAC). For more information call (416) 362-6863 ext. 237 or visit www.wmc.on.ca.
(SCREEN SHOT COURTESY TERO CONSULTING.)
Last modified on Wednesday, 07 October 2009 15:58