John Moubray wrote the book on RCM — literally.

His book entitled Reliability-Centred Maintenance (now in its second edition) lays out the essentials of this all-encompassing maintenance philosophy. In this article, he takes aim at the growing popularity of 'streamlining' the RCM process. There are no shortcuts to RCM, Moubray argues, and attempts to abridge the strategy to save time or money are fundamentally flawed. Not only does the author dismantle some of the leading methods of implementing a reduced version of RCM, he also argues that doing so is fraught with legal, even ethical, peril. True RCM is designed to identify the absolute, safe minimum of what must be done to preserve the functions of physical assets.

The importance of this quality of RCM is manifest, argues Moubray. "Society is growing increasingly intolerant of industrial accidents and seeks to hold individuals, as well as corporations, to account," he writes. "Under these circumstances, everyone involved in the management of physical assets needs to take greater care than ever to ensure that every step they take in executing their official duties is beyond reproach. It is becoming professionally suicidal to do otherwise."

True RCM demands that all information and decisions made by maintenance managers be documented in such a way as to make the information and the decisions fully available to any third party. In the event of an accident, true RCM, if properly implemented, could be the maintenance managers' best alibi in the face of legal scrutiny, even criminal charges. (Ed.)

Reliability-centred maintenance (RCM) is a process used to ensure that any physical asset or system continues to function properly. It was first used by the commercial aviation industry as early as the 1960s. Driven by the need to improve reliability while containing the cost of maintenance, the industry developed a comprehensive process for deciding what maintenance work was needed to keep aircraft airborne.

In 1978, the U.S. Department of Defense took notice of RCM after F. Stanley Nowlan and Howard Heap of United Airlines wrote a report for the Pentagon entitled Reliability-Centered Maintenance. It formed the basis of the maintenance strategy formulation process named for the Air Transport Association of America's Maintenance Steering Group - 3 Task Force (MSG3). MSG3 is used to this day by the international commercial aviation industry. In the years following their landmark report, Nowlan and Heap's RCM began to influence industries outside of aviation. Its growing popularity among maintenance professionals was based on its ability to identify the true, safe minimum of what must be done to preserve the functions of physical assets.
Today, RCM is used by thousands of organizations in nearly every major industrial field.

Throughout its rise to acceptance, RCM has spawned a series of derivatives. Some of these variations are refinements and enhancements of Nowlan and Heap's original RCM process. However, less rigorous derivatives have also emerged, most of which are attempts to 'streamline' the maintenance strategy formulation process.

Towards a standard
Many of these abridged processes either omit key steps of the process described by Nowlan and Heap, or change their sequence, or both. Consequently, despite claims to the contrary made by the proponents of these processes, the output differs markedly from what would be obtained by conducting a full, rigorous RCM analysis.

A growing awareness of these differences led to demands for a standard that set out the criteria any process must comply with in order to be called RCM. The Society of Automotive Engineers (SAE) responded to these calls in 1999 with the formulation of an RCM standard.

Section 5 of the standard, JA1011 - Evaluation Criteria for Reliability-Centred Maintenance, summarizes the key attributes of any RCM process as follows:

"Any RCM process shall ensure that all of the following seven questions are answered satisfactorily and are answered in the sequence shown below:
a. What are the functions and associated desired standards of performance of the asset in its present operating context (functions)?
b. In what ways can it fail to fulfill its functions (functional failures)?
c. What causes each functional failure (failure modes)?
d. What happens when each failure occurs (failure effects)?
e. In what way does each failure matter (failure consequences)?
f. What should be done to predict or prevent each failure (proactive tasks and task intervals)?
g. What should be done if a suitable proactive task cannot be found (default actions)?

All information and decisions shall be documented in a way which makes the information and the decisions fully available to and acceptable to the owner or user of the asset."

Subsequent sections of the standard list the issues that any true RCM process must address in order to answer each of the seven questions "satisfactorily". However, the key words in Section 5 of the standard are in the first sentence. They are: 'any', 'all' and 'in the sequence shown below'. They mean that if any process does not answer all the questions in the sequence shown (or does not answer them satisfactorily in compliance with the rest of the standard), then that process is not RCM.
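The 'all' and 'in the sequence shown' tests lend themselves to a mechanical check. The following is a minimal sketch in Python, not part of the SAE standard: the function name `check_process` and the idea of representing a process as an ordered list of the questions it answers are assumptions made purely for illustration. It cannot judge whether each answer is "satisfactory"; that depends on the later sections of the standard.

```python
# The seven questions of JA1011 Section 5, in the order the standard requires.
SEVEN_QUESTIONS = (
    "functions",
    "functional failures",
    "failure modes",
    "failure effects",
    "failure consequences",
    "proactive tasks and task intervals",
    "default actions",
)


def check_process(steps_answered):
    """Return the reasons a candidate process fails the 'all' / 'in sequence' test.

    `steps_answered` is the ordered list of questions the process actually
    answers (a hypothetical representation). An empty result means all seven
    questions are answered in the required order.
    """
    problems = []
    missing = [q for q in SEVEN_QUESTIONS if q not in steps_answered]
    if missing:
        problems.append("omits: " + ", ".join(missing))
    # Compare the order of the recognised steps with the standard's order.
    recognised = [s for s in steps_answered if s in SEVEN_QUESTIONS]
    expected = [q for q in SEVEN_QUESTIONS if q in recognised]
    if recognised != expected:
        problems.append("answers questions out of sequence")
    return problems


# A process that starts from failure modes and then works through the
# last three steps only.
retroactive = [
    "failure modes",
    "failure consequences",
    "proactive tasks and task intervals",
    "default actions",
]
print(check_process(retroactive))
```

A process passes this sketch only if the returned list is empty; any omission or reordering is reported rather than silently tolerated.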

None of the streamlined "RCM" processes comply fully with the requirements of Section 5 of the SAE standard.

Regulatory issues
The reaction of society as a whole to equipment failures is an aspect of physical asset management that is changing at warp speed as we move into the 21st century.

Physical asset management is evolving. Society is increasingly looking for transparency in industry, especially following high-profile equipment failures that lead to injuries or fatalities. Yet this trend has attracted surprisingly little comment within the maintenance community.

The changes began with sweeping legislation governing industrial safety, mainly in the 1970s. Among the best known examples of such legislation are the Occupational Safety and Health Act of 1970 in the United States and the Health and Safety at Work Act of 1974 in the United Kingdom. These Acts are fairly general in nature, and similar laws have been passed in nearly all the major industrialized countries. Their intent is to ensure that employers provide a generally safe working environment for their employees.

These Acts were followed by a series of more specific safety-oriented laws and regulations such as OSHA Regulation No. 1910.119: "Process Safety Management of Highly Hazardous Chemicals" in the United States and the "Control of Substances Hazardous to Health Regulations" in the United Kingdom. Both of these regulations were first enacted in the early to mid-1990s. They are noteworthy examples of a then-new requirement for the users of hazardous materials to perform formal analyses or assessments of the associated systems, and to document the analyses for subsequent inspection if necessary by regulators.

As a result of this increasing regulation, physical asset managers have been subject to a steady increase in legal requirements to demonstrate responsible custodianship of the assets under their control. These laws place heavy burdens on the managers of the assets concerned. However, they reflect the steadily rising expectations of society in terms of industrial safety, and industry has no choice but to comply as best it can.

The late 1990s have seen even more changes, this time concerning the sanctions that society now wishes to impose if things go wrong. Until the mid-90s, if a failure occurred whose consequences were serious enough to warrant criminal proceedings, these proceedings usually ended with a substantial fine imposed on the organization found to be at fault. Occasionally, the organization's permit to operate was withdrawn, as in the case of the ValuJet airline after the crash in Florida on 11 May 1996, effectively putting the airline out of business.

Following recent industrial disasters, however, a movement is now developing not only to punish the organizations concerned, but also to impose criminal sanctions on individual managers. In other words, under certain circumstances, individual managers can be sent to prison in connection with equipment failures that result in injury or death.

In the wake of a fatal 1999 rail crash in the UK, British law-makers introduced a new homicide designation: "corporate killing". Executives found guilty of such offenses can be imprisoned. In the U.S., following recent SUV accidents allegedly caused by faulty tires, national laws were amended to include prison sentences of up to 15 years for vehicle manufacturer executives presiding over companies that commit specified offenses in connection with vehicle failures that cause injuries or death.

There is considerable controversy about the reasonableness of these initiatives, and even some doubt about their ultimate enforceability. However, from the point of view of people involved in the management of physical assets, the issue is not what is reasonable, but that we are increasingly being held personally accountable for actions that we take on behalf of our employers. Not only that, but if we are called to account in the event of a serious incident, it will be in circumstances that could culminate in jail sentences.

Perhaps the most startling legislative developments of all were triggered by a deadly gas plant explosion in Longford, Australia. Following the disaster, legislators amended the criminal code in the case of industrial disasters to suspend attorney/client confidentiality for the purposes of the Longford (and subsequent) official inquiries.

Furthermore, the state governments of Victoria and Queensland are also considering legislation to deal with "industrial manslaughter" and "corporate culpability" respectively, as both governments believe that their current legislation does not deal adequately with industrial incidents causing death or serious injury. These proposed laws go further than the laws in the UK and the US, in that the concept of "aggregation of negligence" is introduced. This allows the aggregation of actions and omissions of a group of employees and managers to establish that an organization is negligent. Both governments have made it clear that if managers and/or a management system fails to prevent workplace death or serious injury, then the responsible manager and/or management team is likely to face criminal prosecution. If the legislation proceeds, penalties of over $500,000 and seven years imprisonment are proposed.

The message to us all is that society is intolerant of industrial accidents and seeks to hold individuals, as well as corporations, to account. It is prepared to alter well-established principles of jurisprudence to do so. Under these circumstances, everyone involved in the management of physical assets needs to take greater care than ever to ensure that every step they take in executing their official duties is beyond reproach. It is becoming professionally suicidal to do otherwise.

Streamlined RCM
My associates and I have helped companies to apply true RCM on more than 1,200 sites spanning 41 countries and nearly every form of organized human endeavour. We have found that when true RCM has been correctly applied by well-trained individuals working on clearly defined and properly managed projects, the analyses have usually paid for themselves in between two weeks and two months. This is a very rapid payback indeed.

However, despite this rapid payback, some individuals and organizations have expended a great deal of energy on attempts to reduce the time and resources needed to apply the RCM process. The results of these attempts are generally known as 'streamlined' RCM techniques.

In all cases, the proponents of these techniques claim their principal advantage is that they achieve similar results to something which they call 'classical' RCM, but that they do so in much less time and at much lower cost. However, not only is this claim questionable, but all of the streamlined techniques have other drawbacks, some quite serious.

Retroactive approaches
An article by C. Bookless and M. Sharkey published in the UK magazine Maintenance in 2000 described the process of 'streamlining' RCM in the British nuclear industry. The article signaled a growing trend towards implementing abbreviated versions of RCM. The most popular method of 'streamlining' RCM starts not by defining the functions of the asset (as specified in the SAE Standard), but with the existing maintenance tasks. Users of this approach try to identify the failure mode that each task is supposed to be preventing, and then work forward again through the last three steps of the RCM decision process to re-examine the consequences of each failure. This supposedly helps to identify a more cost-effective failure management policy. K.S. Jacobs, in his 1997 presentation at the ASNE Fleet Maintenance Symposium in San Diego, CA, described this approach as "backfit" RCM; others use the term "RCM in reverse".

Retroactive approaches are superficially very appealing, so much so that I tried them myself on numerous occasions when I was new to RCM. However, in reality they are also among the most dangerous of the streamlined methodologies, for the following reasons:

- They assume that existing maintenance programs cover just about all the failure modes that are reasonably likely to require some sort of preventive maintenance. In the case of every maintenance program that I have encountered to date, this assumption is simply not valid. If RCM is applied correctly, it transpires that nowhere near all of the failure modes that actually require PM are covered by existing maintenance tasks. As a result, a considerable number of tasks have to be added. Most of the tasks that are added apply to protective devices, as discussed below. (Other tasks are eliminated because they are found to be unnecessary, or the type of task is changed, or the frequency is changed. The net effect is usually a reduction in perceived overall PM workloads, typically by between 40 and 70 percent.)

- When applying retroactive RCM, it is often very difficult to identify exactly what failure cause motivated the selection of a particular task, so much so that either inordinate amounts of time are wasted trying to establish the real connection, or sweeping assumptions are made that very often prove to be wrong. These two problems alone make this approach an extremely shaky foundation upon which to build a maintenance program.

- In reassessing the consequences of each failure mode, it is still necessary to ask whether "the loss of function caused by the failure mode will become evident to the operating crew under normal circumstances". This question can only be answered by establishing what function is actually lost when the failure occurs. This in turn means that the people doing the analysis have to start identifying functions anyway, but they are now trying to do so on an ad hoc basis halfway through the analysis (and they are not usually trained in how to identify functions correctly in the first place because this approach considers the function identification step to be unnecessary). If they do not, they start making even more sweeping — and hence often incorrect — assumptions that add to the shakiness of the results.

- Retroactive approaches are particularly weak on specifying appropriate maintenance for protective devices. As I stated in my book entitled Reliability-Centred Maintenance: "...at the time of writing, many existing maintenance programs provide for fewer than one third of protective devices to receive any attention at all (and then usually at inappropriate intervals). The people who operate and maintain the plant covered by these programs are aware that another third of these devices exist but pay them no attention, while it is not unusual to find that nobody even knows that the final third exist. This lack of awareness and attention means that most of the protective devices in industry — our last line of protection when things go wrong — are maintained poorly or not at all."

So if one uses a retroactive approach to RCM, in most cases a great many protective devices will continue to receive no attention in the future because no tasks were specified for them in the past. Given the enormity of the risks associated with unmaintained protective devices, this weakness of retroactive RCM alone makes it completely indefensible. Some variants of this approach address this problem by specifying that protective systems should be analyzed separately, often outside the RCM framework. This gives rise to the absurd situation that two analytical processes have to be applied in order to compensate for the deficiencies created by attempts to streamline one of them.

- More so than any of the other streamlined versions of RCM, retroactive approaches focus on maintenance workload reduction rather than plant performance improvement (which is the primary goal of function-oriented true RCM). Since the returns generated by using RCM purely as a tool to reduce maintenance costs are usually lower than the returns generated by using it to improve reliability, the use of the ostensibly cheaper retroactive approach becomes self-defeating on economic grounds, in that it virtually guarantees much lower returns than true RCM.

Use of generic analyses
A fairly widely-used shortcut in the application of RCM entails applying an analysis performed on one system to technically identical systems. In fact, one or two organizations even sell such generic analyses, on the grounds that it is cheaper to buy an analysis that has already been performed by someone else than it is to perform your own. The following paragraphs explain why generic analyses should be treated with great caution:

- Operating context: In reality, technically identical systems often require completely different maintenance programs if the operating context is different. For example, consider three pumps A, B and C that are technically identical (same make, model, drives, pipework, valvegear, switchgear, and pumping the same liquid against the same head). The generic mind-set suggests that a maintenance program developed for one pump should apply to the other two.

However, pump A stands alone, so if it fails, operations will be affected sooner or later. As a result, the users and/or maintainers of pump A are likely to make some effort to anticipate or prevent its failure. (How hard they try will be governed both by the effect on operations and by the severity and frequency of the failures of the pump.)

By contrast, if pump B fails, the operators simply switch to pump C, so the only consequence of the failure of pump B is that it must be repaired. As a result, it is likely that the operators of B would at least consider letting it run to failure (especially if the failure of B does not cause significant secondary damage). On the other hand, if pump C fails while pump B is still working (for instance if someone cannibalizes a part from C), it is likely that the operators will not even know that C has failed unless or until B also fails. To guard against this possibility, a sensible maintenance strategy might be to run C from time to time to find out whether it has failed.

This example shows how three identical assets can have three totally different maintenance policies because the operating context is different in each case. A generic program would have specified only one policy for all three pumps.

Apart from redundancy, many other factors affect the operating context and hence affect the maintenance programs that could be applied to technically identical assets. These include whether the asset is part of a peak load or base load operation, cyclic fluctuations in market demand and/or raw material supplies, the availability of spares, quality and other performance standards that apply to the asset, the skills of the operators and maintainers, and so on.

- Maintenance tasks: Different organizations, or even different parts of the same organization, seldom employ people with identical skill-sets. This means that people working on one asset may prefer to use one type of proactive technology (say high-tech condition monitoring), while another group working on an identical asset may be more comfortable using another (say a combination of performance monitoring and the human senses). It is surprising how often this difference does not matter, as long as the techniques chosen are cost-effective. In fact, many maintenance organizations are starting to realize that there is often more to be gained from ensuring that the people doing the work are comfortable with what they are doing than from compelling everyone to do the same thing. (The validity of different tasks is also affected by the operating context of each asset. For instance, think how background noise levels affect checks for noise.) Because generic analyses necessarily incorporate a "one size fits all" approach to maintenance tasks, they do not cater to these differences and hence have a significantly reduced chance of acceptance by the people who have to do the tasks.

These two points mean that special care must be taken to ensure that the operating context, functions and desired standards of performance, failure modes, failure consequences and the skills of the operators and maintainers are all effectively identical before applying a maintenance policy designed for one asset to another. They also mean that an RCM analysis performed on one system should never be applied to another without any further thought just because the two systems happen to be technically identical.
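The three-pump example can be expressed as a small sketch in Python. It is a deliberate simplification rather than the RCM decision logic itself: the `Pump` class, its two flags and the `suggest_policy` function are hypothetical names introduced here, and real operating context involves many more factors than redundancy.

```python
from dataclasses import dataclass


@dataclass
class Pump:
    name: str
    has_standby: bool = False  # another pump takes over if this one fails
    is_standby: bool = False   # normally idle, so its failure is hidden


def suggest_policy(pump):
    """Map operating context to the kind of policy the pump example describes."""
    if pump.is_standby:
        # Nobody knows it has failed unless the duty pump also fails.
        return "failure-finding: run it from time to time"
    if pump.has_standby:
        # The only consequence of failure is the repair itself.
        return "consider run-to-failure"
    # Stand-alone pump: failure affects operations sooner or later.
    return "anticipate or prevent failure"


# Same make and model, three different contexts.
pumps = [
    Pump("A"),
    Pump("B", has_standby=True),
    Pump("C", is_standby=True),
]
policies = {p.name: suggest_policy(p) for p in pumps}
```

Nothing in the technical description of the pumps appears in `suggest_policy`; the policy is driven entirely by context, which is exactly the information a generic analysis does not have.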

Use of generic lists of failure modes
Generic lists of failure modes are lists of failure modes, or sometimes entire failure modes and effects analyses (FMEAs), prepared by third parties. They may cover entire systems, but more often cover individual assets or even single components. These 'generic' lists are touted as another method of speeding up or 'streamlining' this part of the maintenance program development process. In fact, they should also be approached with great caution, for all the reasons discussed in the previous section of this article, and for the following additional reasons:

- The level of analysis may be inappropriate: It is possible to 'drill down' almost any number of levels when seeking to identify failure modes (or causes of failure). The point at which this process should stop is the level at which it is possible to identify an appropriate failure management policy, and this can vary enormously depending on the operating context of the system. In other words, when establishing causes of failure for technically identical assets, it may be appropriate in one context to ask "why does it fail?" only once, while in another it may be necessary to ask "why?" seven or eight times. However, if a generic list is used, this decision will already have been made in advance of the RCM analysis. For instance, all the failure modes in the generic list may have been identified as a result of asking "why?" four or five times, when all that may be needed is level 1. This means that far from streamlining the process, the generic list would condemn the user to analyzing far more failure modes than necessary. Conversely, the generic list may focus on level 3 or 4 in a situation where some of the failure modes really ought to be analyzed at level 5 or 6. This would result in an analysis that is too superficial and possibly dangerous.

- The operating context may be different: The operating context of your asset may have features which make it susceptible to failure modes that do not appear in the generic list. Conversely, some of the modes in the generic list might be extremely improbable (if not impossible) in your context.

- Performance standards may differ: Your asset may operate to standards of performance which mean that its entire definition of failure is completely different from that used to develop the generic FMEA.

These three points mean that if a generic list of failure modes is used at all, it should only ever be used to supplement a context-specific FMEA, and never used on its own as a definitive list.
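One way to picture 'supplement, never definitive' is sketched below in Python. The function `build_fmea`, the `is_credible_here` callback and every failure mode string are invented for illustration; in practice the credibility judgment is made by the people who know the asset and its operating context, not by a function.

```python
def build_fmea(context_modes, generic_modes, is_credible_here):
    """Start from the context-specific list; use the generic list only as a prompt.

    Failure modes found by analyzing the asset in its own context are always
    kept. A generic mode is added only if it is not already listed and the
    reviewers judge it credible in this context.
    """
    fmea = list(context_modes)
    for mode in generic_modes:
        if mode not in fmea and is_credible_here(mode):
            fmea.append(mode)
    return fmea


# Hypothetical data: a slurry pump installed in a heated building.
context_modes = [
    "bearing seizes due to lack of lubrication",
    "impeller eroded by abrasive slurry",  # not in the generic list
]
generic_modes = [
    "bearing seizes due to lack of lubrication",
    "seal fails due to dry running",
    "casing cracked by freezing",  # judged not credible indoors
]

fmea = build_fmea(
    context_modes,
    generic_modes,
    is_credible_here=lambda mode: mode != "casing cracked by freezing",
)
```

The context-specific mode that the generic list missed survives, and the generic mode that cannot occur here is dropped. Using the generic list on its own would have got both wrong.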

Skipping elements of the process
Another common way in which the RCM process is "streamlined" is by skipping various elements of the process altogether. The step most often omitted is the definition of functions. Proponents of this methodology start immediately by listing the failure modes that might affect each asset, rather than by defining the functions of the asset under consideration. They do so either because they claim that, especially in the case of a "non-safety-critical" plant, identifying functions does not contribute enough relative to the amount of time it takes (see M Dixey and J Gallimore's 2000 article in Maintenance Vol. 15 No. 1 entitled "Fast-tracking RCM — Getting results from RCM"), or because they simply appear not to be aware that defining all the functions and the associated desired standards of performance of the assets under review is an integral part of the RCM process (See S.D. Mundy's article in Vol. 7 No. 3 edition of Reliability entitled "Completing the Reliability-Centred Maintenance loop at a new process facility").

In fact, it is generally accepted by all the proponents of true RCM that in terms of improved plant performance, by far the greatest benefits of true RCM flow from the extent to which the function definition step transforms general levels of understanding of how the equipment is supposed to work. So cutting out this step costs far more in terms of benefits foregone than it saves in reduced analysis time.

From a purely technical point of view, the identification of functions and associated desired performance also makes it far easier to identify the surprisingly common situations (failure modes) where the asset is simply incapable of doing what the user wants it to do, and therefore fails too soon or too often. For this reason, eliminating the function definition step further reduces the power of the process.

The comments in the earlier discussion on retroactive approaches also apply here.

Analyze only "critical" functions or "critical" failures
The SAE Standard stipulates inter alia that a true RCM analysis should define all functions, and that all reasonably likely failure modes should be subjected to the formal consequence evaluation and task selection steps. The shortcuts embodied in some of the streamlined RCM processes try to analyze 'critical' functions only, or to subject only 'critical' failure modes to detailed analysis. These approaches have two main flaws, as follows:

- The process of dismissing functions and/or failure modes as being 'non-critical' necessarily entails making assumptions about what a more detailed analysis might reveal. In my experience, such assumptions are frequently wrong. It is surprising how often apparently innocuous functions or failure modes are found on closer examination to embody elements that are highly critical in terms of safety and/or environmental integrity. As a result, the practice of prematurely dismissing functions or failure modes results in much riskier analyses, but because the analysis is incomplete, no one knows where or what these risks are.

- Many of the streamlined processes that adopt this approach incorporate elaborate additional steps designed to 'help' identify what functions and/or failure modes are critical or non-critical. In a great many cases, applying these additional steps takes longer and costs more than it would take to conduct a rigorous analysis of every function and every reasonably likely failure mode using true RCM, yet the output is considerably less robust.

Analyze only "critical" equipment
An approach to maintenance strategy formulation that is often presented as a 'streamlined' form of RCM suggests that the RCM process should be applied to 'critical' equipment only. This issue does not fall within the scope of the SAE standard, because the standard does not deal with the selection of equipment for analysis. It defines RCM as a process that can be applied to any asset, and it assumes that decisions about what equipment is to be analyzed and about system boundaries have already been made when the time comes to apply the RCM process defined in the SAE standard. There were two reasons why the equipment selection process was omitted from the standard:

- Different industries use widely differing criteria to judge what is 'critical'. For instance, the ability of assets to produce products within given quality limits is a major issue in manufacturing operations, and hence features prominently in assessments of criticality. However, this issue barely figures at all with respect to equipment used by military undertakings. This means that there is an equally wide range of techniques used to assess criticality — so wide that it is impossible to encompass this issue in one universal standard.

- There is a growing school of thought (with which I have some sympathy) that there is no such thing as an item of plant — at least in an industrial context — that is 'non-critical' or 'non-significant' to the extent that it does not justify analysis using RCM. Two of the main reasons for believing that systems or items of plant should not be dismissed as 'non-critical' prior to rigorous analysis are exactly the same as the reasons given above for not dismissing functions and failure modes in the same way. (In fact, many organizations that choose to start with a formal, across-the-board equipment criticality assessment seem to spend as much time deciding what assessment methodology they will use and then applying it as they would have spent using true RCM to analyze all the equipment in their facility.)

There is a great deal more that could be said both in favour of and against the idea of using equipment criticality assessments as a means of deciding whether to perform rigorous analyses using techniques such as RCM. Since criticality assessment techniques are not an integral part of the RCM process, such a discussion is beyond the scope of this article. It is incorrect to present such techniques as streamlined forms of RCM because they do not form part of the RCM process as defined by the SAE standard.

In nearly all cases, the proponents of the streamlined approaches to RCM claim that these approaches can produce much the same results as true RCM in about a half to a third of the time. However, the above discussion indicates that not only do they not produce the same results as true RCM, but that they contain logical or procedural flaws which increase risk to an extent that overwhelms any small advantage they might offer in reduced application costs. It also transpires that many of these 'streamlined' techniques actually take longer and cost more to apply than true RCM, so even this small advantage is lost. As a result, the business case for applying streamlined RCM is suspect at best.

However, a rather more serious point needs to be kept in mind when considering these techniques. The very word 'streamline' suggests that something is being omitted, and this article indicates that this is indeed so for the streamlined techniques described. In other words, there is to a greater or lesser extent a degree of sub-optimization embodied in all of these techniques.

Leaving things out inevitably increases risk. More specifically, it increases the probability that an unanticipated failure, possibly one with very serious consequences, could occur. If this does happen, managers of the organization involved are increasingly likely to find themselves called personally to account. In the worst case, they will have to explain, often in an emotionally charged courtroom, why they deliberately chose a sub-optimal decision-making process to establish their asset management strategies in the first place, rather than one which complies fully with a standard set by an internationally recognized standards-setting organization.

One rationale often advanced for using the streamlined methods is that it is better to do something than to do nothing. However, this rationale misses the point that all the analytical processes described above, streamlined or otherwise, require users to document the analyses. This generates a clear audit trail showing all the key information and decisions underlying the asset management strategy, in most cases where none has existed before. If a sub-optimal approach is used to formulate these strategies, the existence of written records makes every shortcut much clearer to any investigators.

A further rationale for streamlining says something like "we have been using this approach for a few years now and we haven't had any accidents, so it must be all right." This rationale betrays a complete misunderstanding of the basic principles of risk. Specifically, no analytical methodology can completely eliminate risk. However, the difference between using a more rigorous methodology and a less rigorous methodology may be the difference between a probability of a catastrophic event of one in a million versus one in ten thousand. In both cases, the event may happen next year or it may not happen for thousands of years, but in the second case, it is a hundred times more likely. If such an event were to happen, the user of true RCM would be able to claim that he or she exercised prudent, responsible custodianship by applying a rigorous process that complies with an internationally recognized standard, and as such would be in a highly defensible position. Under the same circumstances, the user of streamlined RCM is on much, much shakier ground.
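
The arithmetic behind this comparison can be checked with a short sketch. The two probabilities are the illustrative figures quoted above. Treating them as annual probabilities, assuming that years are independent, and using a 30-year horizon are all assumptions made here for illustration only; the function names are hypothetical.

```python
from fractions import Fraction

# Illustrative figures from the text; treating them as per-year values is an assumption.
p_rigorous = Fraction(1, 1_000_000)
p_streamlined = Fraction(1, 10_000)


def relative_likelihood(p_a, p_b):
    """Return how many times more likely an event with probability p_a is than one with p_b."""
    return p_a / p_b


def chance_within(p_per_year, years):
    """Probability of at least one event in `years`, assuming independent years."""
    return 1 - (1 - p_per_year) ** years


ratio = relative_likelihood(p_streamlined, p_rigorous)
print(f"The less rigorous case is {ratio} times more likely in any one year")
print(f"Over 30 years: {float(chance_within(p_rigorous, 30)):.6f} "
      f"versus {float(chance_within(p_streamlined, 30)):.6f}")
```

Neither figure is zero, which is the point of the paragraph: an accident-free record over a few years says very little about which of the two regimes an organization is actually in.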

Author's Note: When discussing streamlined RCM it's worth asking what exactly it is that is being streamlined. Nearly all the advocates of streamlined processes compare their offerings to something they call 'classical' RCM. However, closer study of what they mean by 'classical' RCM reveals that it is often a monstrously complicated process or collection of processes that bears little or no resemblance to RCM as defined in the SAE standard. In these cases, it is hardly surprising that streamlined RCM is cheaper and quicker than these so-called 'classical' fantasies. In reality, if true RCM is applied as explained earlier, it is nearly always quicker and cheaper than the streamlined versions, in addition to being far more defensible and producing far greater returns.


John Moubray is the president and founder of Aladon LLC, a provider of RCM training, consulting and software. He can be reached at 828-277-2780.

Published in Features
The challenge for managers in any business is to identify the critical elements that determine success or failure in their respective disciplines. Doing so is a three-step process:

1. Reinforce and exploit those elements that generate success;
2. Eliminate, so far as is possible, those that lead to failure; and finally
3. Recognize those that are beyond their power to affect.

Managing shutdowns is no different. Out of the vital decisions, activities and constraints that drive a shutdown, the manager has to distill those critical elements that determine the outcome of the event. Further, if excellent performance on a continuing basis is the objective, then the elements must be fashioned into a framework that can be used as a blueprint for success on an ongoing basis.

This article presents such a framework. It was developed in Great Britain over a period of years by a number of professional shutdown managers and engineers. Their aim was to understand in detail how shutdowns worked — and why they sometimes didn't — so that a system could be developed for repeatable success. The final product of this process is the Framework for Excellence for Shutdown Management.

The framework is deceptively simple, but managers and engineers around the world (now numbering over one hundred) have used it either to audit their company's performance or as a basis for creating a comprehensive shutdown methodology.

At its simplest, it is a checklist of basic requirements. At a higher level, it is a tactical tool that allows us to focus time, money and effort where they are needed. At its highest level, it is a mechanism to promote strategic management thinking. But before plunging into the main body of this article, it may be advantageous to examine three larger frameworks that influence the ability to use the Framework for Excellence.

The shutdown in a business context
The function of business is to generate profit. Profit is what is left when the cost of production and distribution is subtracted from the selling price. Production requires a reliable manufacturing plant to produce sufficient throughput of the right quality of product to make it worth selling. The function of maintenance is to protect the reliability of the plant. A shutdown in its simplest form is a large maintenance event during which existing plant equipment is inspected and refurbished, new equipment is installed and redundant equipment is removed — all to promote reliability in the plant that provides the production that generates the profit.

In establishing this direct link between shutdown effectiveness and business effectiveness, it may seem to the reader that I am simply stating the obvious. I am. And the reason is: in dealing with a large number of companies over the last 10 years, I have met many managers and engineers who, if their actions are any guide, have forgotten the existence of the link or never knew it existed in the first place.

The shutdown as a project
A shutdown has all of the elements of a project but it also has some unique features to it. The most crucial of these distinctions is the uncertainty generated by the project. Admittedly, there are uncertainties in any kind of project, but these are normally generated by the environment — late delivery of materials, etc. However, one thing that is known, and usually very well defined, is the scope of work.

In the case of the shutdown there is an inherent uncertainty that lies at the heart of the project — the actual work scope of the project is unknown until we open up equipment and inspect it. For the purpose of this article we shall call this "emergent" work, simply because it is work that only emerges after the project is underway.

Under normal circumstances, emergent work may increase the work scope by between five and 15 percent. Under abnormal circumstances the increase in work scope and its attendant difficulties can increase the impact of the emergent work to the point where it may dwarf the original work scope. Unfortunately, as plants age, as we lose more and more local knowledge of our plants through employee down-sizing, and as we outsource more and more work to contract companies who (rightly) require visible payment for every task they perform, there is an increasing risk that the abnormal may become the norm unless we do something to reverse the trend.
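
As a rough illustration of the "normal" range quoted above, the sketch below turns a planned work scope into the band of total scope that emergent work might produce. The function name and the 10,000-hour figure are hypothetical; only the five and 15 percent bounds come from the text.

```python
def emergent_scope_range(planned_hours, low_pct=5, high_pct=15):
    """Return (low, high) total work scope once normal emergent work is added.

    The 5 and 15 percent defaults are the 'normal circumstances' range from the
    article; abnormal shutdowns can fall far outside this band.
    """
    total_low = planned_hours * (100 + low_pct) / 100
    total_high = planned_hours * (100 + high_pct) / 100
    return total_low, total_high


# Hypothetical shutdown with 10,000 planned man-hours.
low, high = emergent_scope_range(10_000)
print(f"Expect between {low:,.0f} and {high:,.0f} man-hours in total")
```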

I am frequently asked by clients "Why are we not doing our shutdowns as well as we used to?" The answer may well be: "Because over a period of years you have changed the way you do shutdowns and you have not made adequate provisions for the change." Of course, there is always the niggling suspicion that we never actually did them as well as we thought — but that's a whole other can of worms.

Because of the uncertainty, we schedule work, enter into contracts, and procure resources against a plan that is, at best, an intelligent estimate, very often a hopeful guess, and at worst, a venture into the unknown.

As responsible engineers however, we cannot, in the face of this uncertainty, simply wring our hands and hope for the best. We must develop a routine for handling emergent work that will allow us to eliminate as much of the uncertainty as we can before the event and minimize its impact during the event.

Uncertainty, coupled with the need to exert control over it, is, in my estimation, one of the main factors that make a shutdown a unique type of project.

The shutdown as a process

If, as I and others contend, a shutdown is a rational process then we should be able to describe it in simple terms and lay bare the logical sequence of events that leads from the first to the last action. There are others who contend that it's much more complicated and, for reasons best known to themselves, hide behind the parapet of confusion. "We've always done it this way!" is their credo. The rational process is made up of five main phases.

PHASE 1 — Initiation
This is the strategic phase in the process. It is the most crucial because it forms the foundation for everything else that comes after. It is also, sadly, the least understood and the most ignored of the phases.

Initiating the shutdown is the responsibility of the senior management of the company. It requires the concerted input and commitment of the senior managers in all key elements of business including, but not limited to, business, marketing, asset management, engineering, operations/manufacturing, maintenance, quality and safety.

It is the responsibility of these managers to consider and balance the drivers, constraints and requirements of the shutdown to ensure it is performed on a sound business footing. It is also their responsibility to formulate the objectives and limits for the shutdown that will define the context within which everyone else will work.

PHASE 2 — Preparation
This is the longest phase of the shutdown process, during which information from many sources is collected, validated, collated and processed into plans, schedules and estimates. The common name for this activity is "planning". The reason we need to plan a shutdown is that it is a complex event. But the planning of a shutdown is a complex process — so does that mean we have to plan the planning?

The answer is most certainly yes. This preparation network highlights the large number of activities that have to be performed within the preparation time frame to ensure all is ready for the event start date and the tasks are not only completed, but have been approved by the senior management.

PHASE 3 — Execution
Execution of a shutdown is performed by a (relatively) large group of people in a small geographical area using many different working techniques, some of which are inherently hazardous, under time and financial pressures. This requires an effective organization working for a professional manager to carry out the plans and schedules required to meet the objectives of the event.

The event itself can be further sub-divided into three main phases.

1. Shutting the plant down — when the plant is brought offline and, if necessary, decontaminated and pacified.
2. Performing the scheduled work — when all of the planned major tasks, minor tasks and bulkwork are carried out, including any emergent work.
3. Starting the plant up — when the plant systems are recommissioned, tested and brought back online until full production is achieved.

How these three phases are integrated will have a significant effect on safety, costs and duration.

PHASE 4 — Termination
This phase has two distinct features:
1. The handover of the plant systems to the production teams; the removal of all traces of the shutdown from the plant; and the final inspection and hand back.
2. The debriefing of the personnel involved in the shutdown to learn lessons from the event that can be used for future improvement; and the shutdown manager's final report that records the conduct of the event and will provide a starting point for the management team charged with organizing the next event.

PHASE 5 — The interim
This phase, as its title implies, is the period of time between the termination of one event and the initiation of the next. The types of activities carried out during this phase are those that will facilitate the learning process from one event to the next — updating and archiving of shutdown systems, procedures and documentation, training needs analysis and the recruitment and training of key personnel, review and amendment of contracts, and the introduction of new operation and maintenance techniques aimed at reducing the need for, and the impact of, shutdowns.

It must be pointed out that this phase would only exist within companies that considered the business of shutdowns as a continuous process with the events as successive links in a logical chain stretching over the life of the plant. Companies who view shutdowns as discrete, unconnected events that happen once every few years can never reap the benefits of the interim phase.


Tom Lenahan is an acknowledged expert in the field of plant shutdowns and turnarounds. Based out of the U.K., Tom has worked and consulted internationally. His 1999 book, Turnaround Management, published by Butterworth Heinemann, shows the maintenance manager or project leader how to get the job done correctly. He can be reached at www.T-T-L.co.uk/.
THE FRAMEWORK FOR EXCELLENCE

Having set the larger frames within which the framework operates, it is now time to turn to the details of the framework.

Key to the framework

The approach to shutdowns is broken down into six critical areas for consideration:
  • Organization — how the people should be organized;
  • Planning — how the work should be done;
  • Contractors — how external organizations should be selected and integrated;
  • Costs — how expenditure should be estimated, reported and controlled;
  • Logistics — how the goods and services should be organized onsite;
  • Execution — how the event should be managed and controlled.

The element columns

Each of the areas is further broken down into nine elements that are, reading from the top to the bottom of each column, approximately chronological in order. (See Figure 1 in Connecting the shutdown to your business strategy (Part I).)

It can be seen that the top row of boxes under each of the six headings contains terms such as strategy, philosophy and approach, indicating high-level outline consideration, whereas the bottom row of boxes deals with finalizing and closing out the processes. The boxes in between describe the progress of the process from start to finish in each area.

The critical considerations

In turn, each of the nine elements is broken down into three key activities or requirements. To be effective, each of these has to be considered. If, after consideration of the benefits and consequences attached, the decision is taken not to perform any particular activity or meet any particular requirement then that is the prerogative of good management — and they will reap the benefits and bear the consequences of any such decision in the knowledge that they managed the situation. If, however, the action is not performed or the requirement not met because the managers or their staff were ignorant of the need, or simply to avoid costs while refusing to recognize or take responsibility for the likely consequences, then this is bad management.

Although this may be a relatively simple looking figure at first glance, it covers 54 separate elements of shutdown management and requires consideration of 162 critical actions and requirements. Such consideration will take the user deep within the fabric of the company's approach to shutdowns and will severely test the larger business framework within which the shutdown is carried out (in a few cases, beyond the tolerance of senior managers who preferred to maintain the status quo at the cost of shutdown effectiveness).
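
The totals quoted above follow directly from the structure of the framework, as this small check shows. The area names are those listed under "Key to the framework"; the constant names are chosen here for illustration.

```python
# The six critical areas named in the framework.
AREAS = ["Organization", "Planning", "Contractors", "Costs", "Logistics", "Execution"]
ELEMENTS_PER_AREA = 9           # each area is broken down into nine elements
CONSIDERATIONS_PER_ELEMENT = 3  # each element has three key activities or requirements

total_elements = len(AREAS) * ELEMENTS_PER_AREA
total_considerations = total_elements * CONSIDERATIONS_PER_ELEMENT

print(total_elements, total_considerations)  # 54 162
```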

The main areas of the framework

In this section, the elements within the main areas of organization, planning, contractors, costs, logistics and execution, are broken down further to reveal the fabric of excellence in shutdown management.

ORGANIZATION

This area considers the total shutdown organization from the senior managers through to the workers who will perform individual tasks. Often these functions are performed by various people in companies, but much added value is lost because it is not done in an organized way. The elements for consideration are:

- Steering group
The ideal way for the senior management to demonstrate commitment is to form a group that will meet regularly through the preparation and execution phases to balance drivers and constraints and make policy for the shutdown. The group is limited to business interests, decision-makers and stakeholders so that it stays focused on the shutdown and excludes extraneous people.

- Shutdown manager
Shutdowns are expensive and complex, so they deserve a full-time experienced manager. Once appointed, the manager joins the steering group and chairs its meetings. The manager is given total control of the shutdown (within set objectives and limits) because he alone has the overview and provides continuity from the beginning to the end of the process.

- Preparation team
Made up of experienced planners, this team turns the various strands of information into the plans and schedules for the event. For very large operations it may require the full-time support of an engineer. The team's success depends upon the cooperation of the plant-based team who will provide them with the information they need.

- Empty box organization
If the process is rational, the organization required to execute the event should be designed rather than be cobbled together from available personnel. The basis of the design is to specify the functions necessary to perform the event, to identify the roles and responsibilities of each, to define the chain of command and to integrate the parts of the organization to ensure optimum performance. This is a theoretical part of an exercise that is completed practically in "the living organization".

- Process team
These are the plant-based operations or manufacturing personnel who will handle plant-centred functions such as permits to work and shutdown and start-up networks, and who will define and control process-driven tasks. They also provide much of the basic data the preparation team needs to plan the work. A good working relationship between the groups is vital to the success of the event.

- Technical team
This team, drawn from the engineering and maintenance functions, specifies the work required and provides the preparation team with technical information and backup. Once again, a good working relationship is important.

- Resource levels
We define how many people we need to perform the actual labour functions of the shutdown. This involves determining what our own staff will do and which tasks are to be left to the contractors. Of great importance are the work patterns for the event — shift working, allowable overtime, timing of the increase and decrease in resource levels throughout the event.

- Client/contractor
We define the roles and responsibilities of the client and the contractor management and determine the functions we wish the contractors' managers and engineers to perform in the organization. This requires us to specify levels of discretion, i.e. how much power and control should be handed over to contractor managers and engineers — and to what extent the contractor can be trusted.

- The living organization
Having designed the empty box organization, we now populate it with the people available. By doing it in this sequence we are able to identify any shortfalls between what we require and what we have. Steps can then be taken to either procure temporary people for the roles or to train existing staff to fill them.

This method ensures that if we go into the event with any of the necessary function boxes unoccupied, or populated by individuals with less than the required knowledge or experience, we do so knowingly and accept the risks involved.

PLANNING

This area considers the approach to planning the event.

- Planning philosophy
We determine the approach we will use to plan the event and the methods we will employ, including software packages. We fix the level and quality of the planners we require, and we investigate existing planning archives to determine how much of our existing planning we can adapt for use, thus saving on planning costs. Finally, we fix the acceptable level for planning output — will a verbal instruction be sufficient, or do we require written plans or even quality plans for particular jobs?

- Worklist definition
Every single requirement of the shutdown — duration, costs, materials, resources etc. — is generated by the worklist. It is therefore vital that we get it right. We do that by identifying every job that needs to be done, justifying every job on the worklist and then testing the specification and requirements for each job.

- Project definition
Projects are normally initiated by an agency external to the shutdown organization. However, as they will impact the shutdown schedule, they need to be integrated and any interface issues resolved. First, we set a final approval date by which all proposed projects must be approved. This also triggers a date by which all technical information (and documents) will be available to the planners and, finally, a date by which the project will be integrated into the plan. If any of these dates cannot be met, the project does not go ahead during the shutdown.

- Worklist control
We need to understand fully what is required for each job (especially major jobs), so the first part of control is clarifying the requirements. The second part, which provides the discipline in the system, requires the worklist to be formally closed on a particular date. This then becomes the agreed work scope for the planning of the shutdown. Any work requested after that date must be applied for using a late work authorization form, which must be approved by the highest-ranking manager onsite, and the work is placed on a late work-costing list in the shutdown cost estimate.

- Planning/validation
Each job is planned and a work package is prepared for those jobs that require it. The work package contains the job method and all supporting documents required to allow the job to be done. Once the packages are prepared, the requesters validate them and, in particular, the method of doing the job is approved or amended.

- Pre-shutdown work
There is normally a great deal of work that has to be completed before the event starts, such as scaffolding, insulation removal and placement of equipment. The requirements for pre-shutdown work are extracted from the planning work packages and organized into a schedule for pre-shutdown work. This is then executed to the schedule to ensure it is completed by the required date. The planning for pre-shutdown work requires the same care and attention as the planning for the scheduled shutdown work.

- Schedule optimization
The schedule is built up over time from the networks of individual jobs. The timescale of the event is determined by the critical path activity or activities. The job networks are integrated and manipulated to give the best usage of time and resources. Modern planning employs software packages such as Primavera and Microsoft Project. While these are powerful tools, it should not be forgotten that it is human beings who plan, not computers.

- Schedule updating
To ensure that the schedule is a live tool for controlling progress during the event, an updating routine is created that specifies the critical activities to be measured for updating, identifies the people who will be accountable for the updating, and issues a program indicating when updating will be required (usually on a daily basis).

- Schedule adjustment
The schedule may be affected during the event by emergent work, changes of intent due to changing circumstances and unforeseen events. A routine is created for responding to any and all of these possibilities to ensure that the schedule remains a control document and is not just a decorative wall-covering.

CONTRACTORS
The contractor options are explored and issues such as the use of competitive tenders versus the use of a term contractor, or the use of agency labour are given consideration in order to base selection on the contract strategy most suited to the company's needs.

- Contractor packages
The total work scope is broken down into packages that can be tendered against by contractors and sub-contractors. This brings up one of the critical elements in contracting out work: the level of specification of work required to enable the contractor to perform the work adequately. This leads to the question of contractor competence — i.e. their track record on similar work.

- Subcontractors
It is very rare that a main contracting company will carry all of the special skills needed to complete a shutdown. Therefore, a sub-contractor engagement plan is formulated, either by the client or the contractor. This specifies what work is to be carried out by sub-contractors and what companies will be invited to tender for the work.

- Incentive schemes
Consider incentive schemes for the purpose of encouraging the contractor to focus on the client's objectives. The use of such schemes must be justified (i.e. they must have a clear, measurable benefit for the client). The scope of such schemes may be narrow and focused on a single key indicator such as duration, or it may be broad-based and include such things as man-hour savings, safety performance and quality compliance. The best kind of incentive scheme is one that requires a mutual level of risk from the client and the contractor.

- Evaluation/selection
If competitive tendering is to be used, criteria are set for evaluation and selection. The shutdown manager is involved in setting these to ensure focus on the shutdown. As far as is possible, the issues of how the client and contractor will fit together into a single shutdown organization are taken into account. When this is established, an evaluation and selection procedure is drawn up that reflects the particular needs of the shutdown.

- Contractor mobilization
To promote the aim of having the right people on the shutdown at the time they are needed, a contractor mobilization plan is drawn up. Contractor briefing and familiarization requirements are taken into account in the program, as is the necessity to check the qualifications of key contractor personnel and scarce resources.

- Contractor monitoring
Working on the principle that you must "inspect what you expect", a routine is drawn up for monitoring the contractors' progress towards completion of the event, safety performance, and compliance with quality requirements. This is done on an ongoing basis so that timely remedial action can be triggered if problems arise.

- Demobilization plan
A plan is drawn up to detail the timing and other requirements for contractor demobilization when any particular area of work is completed. This involves a system for recognizing when work is actually complete and should embody a mechanism for debriefing key contractor personnel before they disappear from site. The watchword is earliest demobilization commensurate with the effective termination of the event.


COST

The estimation, reporting and control of expenditure is typically one of the weakest elements of most companies' approach to shutdowns. In order to control costs, we must first be able to define what we mean by "costs". Then we identify cost attractors so that they can be closely monitored. There is much debate about what should be included in the cost of a shutdown. There are those who advocate that only the direct cost of planning and executing the event should be included. I consider this to be a short-sighted and hazardous approach and would advocate the inclusion of all consequent costs. For instance, the cost of a lost production opportunity may be much larger than the direct costs, and if it is ignored, the true business impact of doing the shutdown is ignored.

- Financing strategy
The question of financing the shutdown must be considered. Financial drivers (e.g. cost cutting), business drivers (e.g. depressed markets) and technical drivers (e.g. problems with major equipment) will influence and be influenced by the amount of funding available for the shutdown. Whatever the dynamics, a budget allocation is calculated (or estimated) that specifies the amount of money the management is prepared to spend on the event.

- Ballpark estimate
The budget is tested against reality as early as possible. This is accomplished by creating a "ballpark" estimate which, at this stage, need be no more accurate than plus or minus 20 percent. The purpose of the ballpark estimate is to give early warning to the steering group on the match between the budget allocation and the estimate. If they are not even in the same ballpark then steps are taken to bring them in line.
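
One simple way to test the budget against the ballpark estimate is to check whether the budget falls inside the estimate's accuracy band. This reading of "the same ballpark" is an assumption made for illustration, as are the function names and the monetary figures; only the plus or minus 20 percent accuracy comes from the text.

```python
def estimate_band(estimate, tolerance_pct):
    """Return the (low, high) range implied by an estimate and its accuracy."""
    low = estimate * (100 - tolerance_pct) / 100
    high = estimate * (100 + tolerance_pct) / 100
    return low, high


def budget_matches(budget, estimate, tolerance_pct=20):
    """True if the budget allocation lies within the estimate's accuracy band."""
    low, high = estimate_band(estimate, tolerance_pct)
    return low <= budget <= high


# Hypothetical figures: a 5,000,000 ballpark estimate against two candidate budgets.
print(estimate_band(5_000_000, 20))
print(budget_matches(4_500_000, 5_000_000))  # inside the band
print(budget_matches(3_000_000, 5_000_000))  # steering group needs to act
```

The same check can be rerun with a tighter tolerance as the estimate is refined during preparation.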

- Total-costing exercise
The total business impact of the shutdown is calculated. This includes direct costs and indirect costs such as lost production and downtime salaries. Added to this is an estimate of consequential costs if the shutdown overruns and/or overspends. Finally, the costs to the business are estimated. This gives a "worst case scenario" for costs so that the full costs and consequences are known beforehand and, if necessary, strategies are formulated to cope with these scenarios.

- Estimate refinement
As planning and preparation proceed and more hard information becomes available, the estimate is refined and interim reports are made to the steering group on the current estimate of final anticipated cost. This affords the steering group the opportunity to react to any upward or downward movement in the estimate.

- Contingency strategy
The question of how to fund potential emergent work and other unforeseen circumstances is addressed at this stage. Do we build contingency into the pricing structure of the execution phase? Do we create a separate contingency fund to be used if and when required but refunded if not used? Do we eschew the whole concept of contingency and take the hit if events force us to spend more money than we estimated?

- Control estimate
The time comes when we have the vast majority of the costing information we require and at this point, a control estimate can be produced. The estimate nails down all of the key cost issues such as where the sensitive cost areas are. It is normally within plus or minus five percent at this stage. The estimate is presented to the steering group and they either approve it or amend it. Once approval has been given, it becomes the control estimate against which expenditure during the event will be measured and tracked.

- Cost-reporting system
If expenditure is to be measured and tracked, it requires a system. A routine is developed for the regular updating of cost data, which in turn will generate an "anticipated final cost" at any time during the event. The routine also exposes cost generators — those areas in the schedule that are exceeding the estimated cost — and allows time for action to be taken to rectify this.
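
A cost-reporting routine of this kind can be sketched in a few lines. The article does not give a formula, so the calculation below (cost spent to date plus the forecast cost to complete) is an assumption, and the area names and figures are hypothetical.

```python
def anticipated_final_cost(areas):
    """Sum of cost spent to date plus forecast cost to complete, across all areas."""
    return sum(a["spent"] + a["to_complete"] for a in areas.values())


def cost_generators(areas):
    """Names of areas whose anticipated cost exceeds their control estimate."""
    return sorted(name for name, a in areas.items()
                  if a["spent"] + a["to_complete"] > a["estimate"])


# Hypothetical mid-event cost data, updated as part of the regular routine.
areas = {
    "scaffolding":       {"estimate": 200_000, "spent": 150_000, "to_complete": 80_000},
    "vessel_inspection": {"estimate": 350_000, "spent": 100_000, "to_complete": 240_000},
    "bulkwork":          {"estimate": 120_000, "spent": 60_000,  "to_complete": 60_000},
}

print(anticipated_final_cost(areas))
print(cost_generators(areas))
```

The value of running this regularly is timing: an overrun flagged while work is still in progress leaves room to act, whereas one discovered at close-out does not.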

- Closing out accounts
A four-week rule is set — this is embodied in all contracts. This means that contractors must submit final invoices for payment within four weeks of the last day of the shutdown unless prior arrangement has been made to relax this rule. Settling of accounts normally requires settling of claims for extra work and delays. A system to handle this is developed and agreed before the event starts.
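
The four-week rule is easy to express as a date check. The function names and the example dates below are hypothetical; the four-week window and the possibility of a prior arrangement to relax it come from the text.

```python
from datetime import date, timedelta


def final_invoice_deadline(last_day_of_shutdown, weeks=4):
    """Latest date on which a contractor may submit a final invoice."""
    return last_day_of_shutdown + timedelta(weeks=weeks)


def invoice_is_on_time(submitted_on, last_day_of_shutdown, extension_agreed=False):
    """True if the invoice meets the rule, or if the rule was relaxed beforehand."""
    return extension_agreed or submitted_on <= final_invoice_deadline(last_day_of_shutdown)


# Hypothetical shutdown ending on 15 March 2024.
last_day = date(2024, 3, 15)
print(final_invoice_deadline(last_day))
print(invoice_is_on_time(date(2024, 4, 20), last_day))
```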

LOGISTICS

Site logistics on a shutdown means the procurement, reception onsite, storage and protection, and final demobilization of all materials, equipment, services, utilities, facilities and infrastructure required to perform the shutdown. The logistic plan is every bit as critical as the schedule but it is often overlooked or at best carried out in a fragmented manner. More shutdowns fail because of logistic faults than technical faults.

- Logistics approach
The decision is taken as to how logistics will be handled. Will a fragmented approach be sufficient, with various responsibilities being spread all around the organization, or does the event require an integrated approach with a central responsibility for logistics? To focus logistics, a plot plan is created that will define and display the site requirements during the shutdown.

- Element identification
The various elements of the logistic requirements are identified from the work packages and transmitted to the people responsible for procuring them. Again it is much simpler to control if there is a centralized function.

- Site control
Arrangements are made to receive all of the logistics elements onsite, and where necessary, organize their storage and protection.

- Siting of elements
This is where the plot plan comes into play. A location is arranged to house each element of logistics, whether it be lay down areas for scaffolding or fabrications, or the siting of decontamination bays for foul items. A place is designated for every item, large or small.

- Bulkwork control
Bulkwork (the many valves, small pumps, motors, etc. that need to be removed from the plant, sent for overhaul and then returned) is identified and recorded on a movement plan that is focused on getting the items refurbished and back on the plant in time for their scheduled replacement.

- Cranes/transport
Safe routes for moving cranes and other heavy plant are selected, along with safe locations to site them. Where required, arrangements are made to have heavy equipment serviced onsite.

- Issue/mobilization
The issue of materials and equipment and the mobilization of heavy plant and infrastructure elements are organized to ensure that the right thing is in the right place at the right time.

- Offsite services
Where items have to leave site to be overhauled, they are organized into an offsite plan that defines the timing of any item to go offsite and the time required once it's back onsite to fit in with the schedule.
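
One way to picture an entry in the offsite plan is to work backwards from the date on which the item is scheduled to be refitted. The sketch below is a minimal illustration; the field names, durations and dates are hypothetical, and a real plan would also need to account for working calendars and vendor capacity.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class OffsiteItem:
    """An item that leaves site for overhaul (hypothetical fields)."""
    tag: str
    scheduled_refit: date          # date the schedule needs the item refitted
    onsite_prep_days: int          # time required once the item is back onsite
    overhaul_days: int             # time the item spends with the overhauler
    transport_days_each_way: int   # travel time to and from the workshop

    def latest_return(self) -> date:
        # The item must be back onsite early enough to be prepared for refitting.
        return self.scheduled_refit - timedelta(days=self.onsite_prep_days)

    def latest_dispatch(self) -> date:
        # Work backwards through the overhaul and both transport legs.
        away = self.overhaul_days + 2 * self.transport_days_each_way
        return self.latest_return() - timedelta(days=away)


valve = OffsiteItem("Relief valve A", date(2024, 4, 20),
                    onsite_prep_days=2, overhaul_days=7,
                    transport_days_each_way=1)
print(valve.latest_return())    # 2024-04-18
print(valve.latest_dispatch())  # 2024-04-09
```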

- Final requirements
Site clearance is planned so that it will be carried out expeditiously. This includes the demobilization of plant and equipment, the return to stores of any unused materials and consumables and the physical removal of all traces of the shutdown.

EXECUTION

The day finally arrives when the planning and preparation have been completed, as far as is possible in the time allowed. It is time to test them against reality.

A shutdown is an unforgiving project. The work that has taken many months to plan and prepare now must be completed within a few short weeks. That means that there is very little time available to react to the unexpected. It is said that, once the event has begun, there are only two types of work to be concerned about: the routine (planned and scheduled work) and the unexpected (late or emergent work). The rule is: "If you have nailed down your routine, you have time to deal with the unexpected, but if your routine becomes unexpected, the truly unexpected may become catastrophic."

- Management control
It is important that, from day one of the event, everyone is clear on the chain of command. The question of who controls what is of the utmost importance, especially when there are a number of different departments and external agencies involved. This is the reason for appointing the shutdown manager to be in total control of the event with the full backing of the steering group.

The steering group still has a role to play during the event. They may be required to make hard decisions if a particular piece of emergent work is going to generate high costs or extend the duration of the event. Where contractors are being used, it is made clear from day one where the cutoff points are, that is, what decisions can be taken independently by the contractor as opposed to those that can only be made by the client.

- Briefing program
Everyone who will be working on the shutdown, both client and contractor personnel, is briefed before commencing work on the progress, performance, safety and quality requirements of the event. In some cases, experienced contractor companies will brief their people before bringing them to site. The purpose of the briefing is to raise everyone's awareness of the situation in which they will be working in the coming weeks.

- Manager's routine
To retain control of the event, the manager has a number of daily routines to help give form and rhythm to the event.
  • First routine: the manager visits or contacts key people daily to establish a one-on-one relationship and discuss progress, performance, safety and quality. Examples of key people and agencies are: planning office, cost controller, safety team, quality team, workshops, stores, permit office, etc.
  • Second routine: the daily work program, a document defining daily requirements on the shutdown such as meeting times and hand over times. Again, the objective is to establish a working rhythm on the event.
  • Third routine: a daily control meeting. Those directly responsible to the manager attend this meeting and report on progress and problems. It is a short, sharp meeting involving the minimum number of people and no extraneous discussion.

- Plant shutdown
The first phase of the event. The plant is taken offline and pacified. Key dates and times for completion of steps in the shutdown are agreed and resources are provided to carry out the work required to shut the plant down. A hand over procedure controlling the hand over of plant from operations to the shutdown team ensures the transition is performed safely. This is one of the critical points of the shutdown. If time is lost during this period, it increases the time pressure on the remainder of the event.

- Scheduled work
Once the plant is handed over, the maintenance and project work begins. Major jobs, some of which will last for the duration of the event, are started. There is a heavy inspection program at this time and the results of these are processed and decisions made as to what action, if any, is required.

- Bulkwork marshalling
During the same period, hundreds, and sometimes thousands, of individual items of bulkwork are marshalled. Valves, pumps, motors, orifice plates, bursting discs, etc. are removed from the plant, sent for overhaul or repair, returned to site and then refitted in the plant. This is a logistic exercise and is carefully planned and executed by dedicated marshals.

- Emergent work
Emergent work is mainly generated by the inspection program, but it can also be uncovered by workers. Whatever the case, new tasks are identified to ensure they are understood, remedies are specified to ensure the problems are addressed, and all work is justified to ensure time and money are not being needlessly expended. It is then up to the management team to approve or reject the emergent work request. If it is approved, it is planned and executed. The overriding requirement with emergent work is to minimize its impact on the scheduled work.

- Plant start-up
The time comes when most of the scheduled work, bulkwork and emergent work is complete and the emphasis now switches to starting the plant up. Systems are boxed up, tested and brought back online.

- Plant hand over
Hand over activities are synchronized to ensure that every piece of work is completed and every activity is performed at the correct time. All documentation is completed and registered so that there is written proof that the plant is safe to start production. All traces of the shutdown are removed from the plant, which is returned to a condition that is at the very least as good as it was before the shutdown commenced. The plant is run up through its production routine and once it is up to full rates, the shutdown is complete.

REVIEW

What remains after the event is to debrief all of the key people to learn from the event. The performance of the event is dissected to discover the things that went well so that they can be reinforced and the things that went wrong so that root causes can be found and the faults eliminated.

The final act in the shutdown process is performed by the shutdown manager when he writes the shutdown report. This document will provide important information for the team who are appointed to manage the next shutdown.


Tom Lenahan is an acknowledged expert in the field of plant shutdowns and turnarounds. Based out of the UK, Tom has worked and consulted internationally. His 1999 book, Turnaround Management, published by Butterworth Heinemann, shows the maintenance manager or project leader how to get the job done correctly. He can be reached at www.T-T-L.co.uk/.
Published in Features

