Machine Reliability: Concept, Reality And Remedy
The reliability of machines is extremely important in any organisation. Relatively new machines are expected to deliver reliability of about 90 per cent or more.
It is, therefore, pertinent to assess the performance and reliability of machines periodically.
Failure mode and effect analysis (FMEA) is one such reliability improvement tool. It was developed in the 1950s and gradually fine-tuned with the advent of technology and software. Being a semi-quantitative method, FMEA is widely used to predict failures of components and reduce the risk of these failures.
FMEA can be broadly classified into two categories: Design FMEA (DFMEA) and Process FMEA (PFMEA). Lubrication FMEA belongs to PFMEA, and we confine our discussion to lubrication (or lube) FMEA. We will not describe FMEA in detail here, as plenty of literature is available (1-5). Interested readers may go through these articles for a deeper understanding of FMEA.

Concept defined and described
The rolling element bearing (or roller bearing), one of the most critical components in any machine (mining, construction, cement, large gearbox systems, etc.), is taken as an example. The most common causes of roller bearing failure are well documented in the literature (6) and are summarised in Table 1. We have considered the last four causes, which contribute to 80 per cent of failures. Moreover, we have split ‘contamination’ into two parts: particle contamination and moisture/water contamination. Failures can be largely eliminated when these two contamination-related issues are properly addressed. As our present focus is on lube FMEA, the relevant failure modes are the lubrication-related ones, which are discussed elsewhere (7,8) and summarised in Table 2.

Next, we conducted an FMEA of the roller bearing system. It is essentially a process for estimating the risk with which the machines are currently operating. This is the baseline ‘as is’ process, and the results are given in Table 3. The following assumptions were made while conducting this assessment:

1. There is no consistent and documented training and awareness programme.
2. Control systems, like machine inspection, regular oil or vibration analysis, etc. either do not exist or are very poor in practice.
3. All maintenance activities, such as replacement with new parts and the storage, handling, and dispensing of fluids on the shop floor, are carried out in the open air, allowing contaminants to invade the systems.

These assumptions are realistic: facilities that follow these practices exist across the globe.

Explanation of FMEA results
While conducting the FMEA in Table 3, we assigned a score of 9 to Severity. Severity will always be high because no facility can afford unscheduled downtime, production loss, and repair cost. Unlike Severity, Occurrence can vary depending on the type of failure mode. Detection will be on the higher side (hard to detect) if there is no condition monitoring system in place. These numbers may vary, but the team must be logically consistent in arriving at them. The risk priority number (RPN) is conventionally the product of the Severity, Occurrence, and Detection scores. Looking at Table 3 more closely, we find that the RPN values for the first three failure modes are not very different. Excess lubrication (mostly in grease application) is a major problem in many facilities.

Grease guns are not calibrated, and the maintenance team cannot tell how much grease is delivered in a single use. Both over-greasing (excess lubrication) and under-greasing (insufficient lubrication) pose threats to the rotating component. That is why the RPN value for excess lubrication is higher than those of the other two failure modes. Contamination-related failure modes, on the other hand, are extremely dangerous. Contaminants silently invade the system and damage critical parts. So the RPN values for particle and fluid contamination are the highest, and these failure modes are extremely costly.
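The scoring logic discussed above can be sketched in a few lines of Python. This is only an illustration: it assumes the conventional definition RPN = Severity × Occurrence × Detection with each factor scored from 1 to 10. The severity of 9 follows the text, but the occurrence and detection scores, the `FailureMode` class, and the `rank_by_rpn` helper are hypothetical and are not the values or method behind Table 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of an FMEA worksheet (hypothetical structure)."""
    name: str
    severity: int    # 1 (negligible) to 10 (most severe)
    occurrence: int  # 1 (rare) to 10 (very frequent)
    detection: int   # 1 (easy to detect) to 10 (hard to detect)

    def __post_init__(self):
        # Reject scores outside the assumed 1-10 scale.
        for field in ("severity", "occurrence", "detection"):
            score = getattr(self, field)
            if not 1 <= score <= 10:
                raise ValueError(f"{field} must be between 1 and 10, got {score}")

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return the failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical baseline scores, chosen only to mirror the pattern in the text:
# severity fixed at 9, contamination modes scoring highest.
baseline = [
    FailureMode("Insufficient lubrication", 9, 5, 7),
    FailureMode("Excess lubrication", 9, 6, 7),
    FailureMode("Particle contamination", 9, 8, 8),
    FailureMode("Moisture/water contamination", 9, 7, 8),
]

for mode in rank_by_rpn(baseline):
    print(f"{mode.name:<30} RPN = {mode.rpn}")
```

Keeping the three scores separate, rather than storing only the product, lets the team see whether a high RPN is driven by frequent occurrence or by poor detection, which in turn points to the kind of remedy needed.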

Our next step is to understand what happens when the organisation takes the major decision to abandon its current practices and implement global best practices in its facilities. Investment in terms of money and manpower would be approved accordingly. It is instructive to watch the pains and problems that a facility has to go through to become world-class. Three stages have been identified:

  • Confusion and ego, especially among the more experienced members of the maintenance team, irrespective of the number of training sessions the organisation holds for this purpose.
  • Reluctant acceptance of change, but still without adequate focus or sufficient effort.
  • Finally, under strong leadership, all the members of the team are brought onto the same platform. The team achieves its goal through intense effort and laser-focused activity, with a clear understanding of what is required.
An FMEA is conducted for each stage to assess how the risk is mitigated as best practices are gradually implemented at the site. These are described in Tables 4 to 6. Again, the RPN figures may be changed, but only in a logically consistent manner. The improvement pattern across the stages will remain the same even if the numbers are changed. The results are compiled and summarised in Table 7, and the corresponding graphical presentation is shown in Figure 1. The timeline to achieve the goal (green curve in Figure 1) would normally be two to three years, but it depends on the preparedness of the core maintenance team, even when strong support and cooperation are ensured from the top level of the organisation.
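The stage-by-stage comparison can be illustrated with a small sketch. It assumes the conventional definition RPN = Severity × Occurrence × Detection, and every score below is a hypothetical placeholder rather than a value from Tables 3 to 7. Severity is held at 9 in all stages, in line with the earlier reasoning that severity stays high, while occurrence and detection improve as best practices take hold. The stage labels and the `risk_reduction` helper are likewise assumptions made for illustration.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk priority number under the conventional S x O x D definition."""
    return severity * occurrence * detection


def risk_reduction(stages: dict) -> dict:
    """Map each stage to (RPN, percent reduction relative to the first stage)."""
    values = {name: rpn(*scores) for name, scores in stages.items()}
    baseline = next(iter(values.values()))  # first entry is the 'as is' case
    return {
        name: (value, round(100 * (baseline - value) / baseline, 1))
        for name, value in values.items()
    }


# Hypothetical (severity, occurrence, detection) scores for one failure mode,
# e.g. particle contamination, at each stage of the transformation.
stages = {
    "Baseline (as is)": (9, 8, 8),
    "Stage 1: confusion and ego": (9, 8, 7),
    "Stage 2: reluctant acceptance": (9, 5, 5),
    "Stage 3: aligned team": (9, 2, 3),
}

for name, (value, reduction) in risk_reduction(stages).items():
    print(f"{name:<32} RPN = {value:>3}  reduction = {reduction}%")
```

Changing the individual scores shifts the absolute figures, but as long as the scoring stays logically consistent the pattern is unchanged: little gain in the first stage and the largest drop once the whole team is aligned.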

Reality and remedy
Now it’s time to leave the classroom and face the ground reality. Let’s go to any customer’s site and assess the present conditions and work practices. On more than 95 per cent of occasions, we find the following practices:
  • Dirty and contaminated workplace (poor housekeeping)
  • Critical parts and components are not cleaned following a proper procedure (visually dirty and oily surfaces) and are left open to the atmosphere, giving airborne contaminants a surface to settle on.
  • Hoses and tubes are left open and unprotected.
  • Oil barrels are not covered and are stored in the open, allowing dirt and water to accumulate on top of the barrels.
  • Handling, storage, and dispensing of fluids are carried out under open conditions. The same hand pump is used for all oils in the facility, allowing cross-contamination.
  • Cleanliness levels of fluids are never checked. Oil samples are sometimes sent to the nearest oil analysis lab, but no benefits are reaped and no case study is generated.

This is the scenario at a typical site, and it is a hard reality.
Site assessment, therefore, is necessary to understand the current conditions and to design and implement best practices that help the maintenance team mitigate the risk associated with the fluids as well as the machines. Continuous monitoring, periodic training and audits, and continuous improvement are equally crucial, and equally challenging, for sustaining this cultural transformation.
The items, accessories, and tools required for process modification can be classified into two types: consumable items with low investment and Capex items with moderate to high investment. A brief list is given below:
  • Consumable items – caps and plugs to protect hoses and tubes, shrink wrap to cover critical components, a dedicated housekeeping team to maintain a clean workplace, oil barrel covers to protect the top of the barrel from water, moisture, and airborne dust particles, filters and breathers of the proper type and size, quick connectors instead of threaded joints, and hardware for oil sampling with associated accessories.
  • Capex items – oil transfer pump, oil transfer containers to dispense oil under closed conditions, dedicated filtration equipment for all compartments (hydraulic, transmission, etc.), particle counter to check the cleanliness level of fluids, filter cutting and inspection facility, magnetic plug inspection and, finally, a small on-site oil analysis lab to monitor the most important parameters of the oil: viscosity, TAN/TBN, ferrous wear monitor, etc.
Needless to say, the above list is not exhaustive; it is intended only as a general guideline. When these practices are implemented gradually and consistently, with proper records and documentation, the risk associated with fluids and assets can be significantly mitigated. The reliability and availability of machines would be enhanced, and both the number and the impact of failures would be reduced.

Summary
Although FMEA is a semi-quantitative technique, it works well when applied seriously. FMEA is a consensus-based team activity: all the members of the team must agree on the figures used to estimate the RPN. This classroom concept turns into practical knowledge when the team applies it judiciously in the field. Process FMEA becomes necessary when the existing process needs to be modified with better detection systems. The ultimate goal is to mitigate the risk with which the machines are operating.

That’s why site assessment is of utmost importance. The risk of failures can be sufficiently reduced only when there is total control over the machines. The maintenance team must, at any point in time, be able to predict the condition of any machine with sufficient accuracy. Moreover, the team must attend to any minor problem in a machine before it turns into a major breakdown. This entire activity is essentially a transformation of culture. Achieving the goal is certainly difficult, but sustaining it is even more challenging. It is a top-down approach. Strong and sound leadership from senior management, coupled with a technically competent and dedicated maintenance team, can accomplish this seemingly impossible task.

References
1. FMEA Handbook, Ford Supplier Portal, https://fsp.portal.covisint.com › documents › FMEA
2. https://www.moresteam.com/toolbox/fmea.cfm
3. Guide to Failure Mode and Effect Analysis (FMEA), https://www.juran.com/blog/the-ultimate-guide-to-cause-and-effect-diagrams/
4. Failure Mode and Effects Analysis (FMEA), https://www.lehigh.edu (technically equivalent to SAE J-1739)
5. https://quality-one.com/pfmea/ (related to process FMEA)
6. https://www.bearing-news.com/the-most-common-causes-of-bearing-failure-and-the-importance-of-bearing-lubrication
7. FMEA process for lubrication failures, https://www.machinerylubrication.com/Read/17/fmea-process
8. Lubrication FMEA: the big picture, https://reliabilityweb.com/articles/entry/lubrication_fmea_the_big_picture/

The article is authored by Dr Debasish Mukherjee, Sr Consultant at Gainwell Commsales.

Image source: Google Images