AVweb

The Savvy Aviator #53: The Dark Side of Maintenance

How many of you have had the experience of putting your airplane in the shop -- perhaps for an annual inspection, to correct some squawk or even for a routine oil change or spark-plug rotation -- only to discover when you get the airplane back and take it aloft for the first time after maintenance that something that used to work fine no longer does? I'd be willing to bet a steak dinner at Ruth's Chris that virtually every aircraft owner has had this experience. Heaven knows I have. More times than I'd like to count.

Maintenance has a dark side: maintenance-induced failures (MIFs).

The point I'm trying to make here is that maintenance has a dark side that we don't often hear discussed, especially by mechanics: Although the purpose of doing maintenance is ostensibly to make our aircraft safer and more reliable, the fact is that all too often it accomplishes exactly the opposite. When something in an aircraft fails due to something that a mechanic did -- or failed to do -- I refer to it as a "maintenance-induced failure," or "MIF" for short. It's my distinct impression that such MIFs occur a lot more often than anyone cares to admit. (I sometimes slip and say "mechanic-induced failure," but "maintenance-induced failure" is considered more polite, particularly when conversing with your A&P. Either way, it's a MIF.)

What Makes High-Time Engines Fail?

I first started thinking seriously about MIFs about a year ago, when I was corresponding with Nathan Ulrich, Ph.D. -- a brilliant mechanical engineer, inventor, entrepreneur and Bonanza owner (but please don't hold that against him) -- about the causes of catastrophic piston-aircraft engine failures, with particular emphasis on high-time engines operated beyond TBO. Dr. Ulrich did some fascinating research on this subject by analyzing five years' worth of NTSB accident/incident data. I've reported on some of his findings in previous columns.

Dr. Ulrich's analysis of NTSB data proves conclusively what I've long believed to be true: By far the highest risk of catastrophic engine failure occurs when the engine is young -- during the first two years and 200 hours after initial manufacture, rebuild or overhaul -- due to what we refer to as "infant-mortality failures" involving defects in materials and/or workmanship in assembling the engine. (Since replacing, rebuilding or overhauling the engine is a maintenance task, such infant-mortality failures are MIFs.)

Unfortunately, the NTSB data was of little statistical value in analyzing the failure risk of high-time engines that are beyond TBO, simply because so few engines are permitted to operate beyond TBO; most are arbitrarily euthanized when they reach TBO. We don't even have good statistics about how many engines are flying beyond TBO, but we're pretty sure that it's a relatively small number. Consequently, it should come as no surprise that the NTSB data contains very few accidents attributed to failure of an over-TBO engine.

Because there are so few NTSB reports concerning accidents attributed to over-TBO engine failure, Dr. Ulrich and I decided to examine all of them during the five-year period -- 2001 through 2005 -- to see if we could detect some pattern of what made these high-time engines fail catastrophically in flight. Sure enough, we did detect a pattern.

About half of the accidents that the NTSB attributed to engine failure did not report engine time. Of the ones that did report engine time, only a relative handful reported times over TBO. And of those, about half reported that the reason for the engine failure could not be determined by investigators. But here's the fascinating part: Of the ones where the cause could be determined, about 80% were maintenance-induced failures! In other words, the engine failed not because it was beyond TBO, but because a mechanic worked on the engine and screwed something up!

How Often Do MIFs Occur?

MIFs happen with astonishing frequency. In fact, hardly a day goes by that I don't receive an email or read a forum post in which a frustrated aircraft owner is complaining about some aircraft problem that is obviously a MIF.

Recently, for example, I was contacted by the owner of a 1974 Cessna 182P. He explained that several months ago he'd put the plane in the shop for a routine oil change and installation of an STC'd exhaust fairing. A couple of months later, he decided to have a JPI EDM-700 digital engine monitor installed. The new engine monitor revealed that the right bank of cylinders (#1, #3 and #5) all had very high CHTs ... well above 400 degrees F. This had not shown up on the standard factory CHT gauge because its probe was installed on cylinder #2. (One good reason that every piston-powered aircraft should have a digital engine monitor.) At the next annual inspection (done by a different A&P), the inspector discovered some induction-airbox seals missing, which the owner is convinced were left off when the exhaust fairing was installed. The missing seals were installed during the annual, and CHTs returned to normal. Sure sounds MIFfy, doesn't it?

Unfortunately, the problem was not caught and corrected early enough to prevent serious, heat-related damage to the right-bank cylinders. All three jugs had compressions down in the 30s with leakage past the rings, and a borescope inspection revealed visible damage to the cylinder bores. Oil consumption increased from one quart in 12 hours to one quart in 2 hours, and the oil in the sump started turning jet black within 10 hours after an oil change. The owner is now faced with replacing three cylinders, and since he has no way of proving that the first A&P left out the airbox seals, he's on the hook for the cost of the three jugs -- probably around $5,000 including labor.

Immediately after I replied to this Skylane owner, I spotted a post on the COPA forums by the owner of an older, pre-glass-cockpit Cirrus SR22 who was complaining about intermittent heading errors on his Sandel SN3308 electronic HSI. The owner indicated that these problems started occurring intermittently about three years ago when he had his Service Center pull the instrument for a scheduled 200-hour projection lamp replacement. Coincidence? You be the judge.

This is a problem I know a little bit about, because I've seen it occur in my Sandel-equipped Cessna 310. (I was a very early adopter of the SN3308 EHSI, and have been flying behind one for more than 10 years now.) This problem is invariably due to inadequate engagement between the electrical connectors on the back of the SN3308 instrument and the mating connectors at the back of the instrument's mounting tray. Unless you are extremely careful to slide the instrument into the tray just as deeply as humanly possible before tightening the clamp screws, you've set the stage for flaky electrical connections that can cause a whole host of problems with the EHSI display, including heading errors. I can almost guarantee that the mechanic or technician who pulled the SN3308 out of the panel of that Cirrus to change the lamp was not familiar with this problem and failed to get the connectors fully engaged when he reinstalled the instrument. Apparently, the poor Cirrus owner has been suffering the consequences for three years. Chalk up another MIF!

Not long after reading that post, I was back on the Cessna Pilots Association Web site and saw a post by the owner of a Cessna 340 who departed into actual IMC on the first flight after maintenance (not a very bright thing to do, IMHO), and discovered that all three of his static instruments -- airspeed, altimeter, VSI -- stopped working as the aircraft climbed through 3,000 feet. Switching to the alternate static source did not cure the problem.

Fortunately for all on board, this particular Cessna 340 was equipped with duplicate co-pilot instruments (which have their own separate pitot and static sources), and those continued to work, so the pilot was able to keep the dirty side down. Turned out that a mechanic who last worked on the airplane had disconnected a static line in the cabin and forgotten to reconnect it. So the static instruments were referenced to cabin pressure. As the aircraft climbed through 3,000 feet, the pressurization system started holding the cabin altitude constant, and you know the rest. MIF!

Why Do MIFs Happen?

Aeroperú Flight 603 was brought down by taped-over static ports.

This was hardly an isolated case. I've read about at least three other similar incidents in pressurized singles and twins, all caused by failure of a mechanic to reconnect a static line. Interestingly enough, the FARs require a static system leak test any time the static system is opened up in any fashion. Clearly, many mechanics aren't taking this rule seriously. Problems like this can be absolutely deadly. Remember Aeroperú Flight 603, the Boeing 757 that crashed into the Pacific Ocean near Lima, Peru, on Oct. 2, 1996, killing all 61 passengers and nine crewmembers on board? Investigators found that the cause of the crash was static instrument failure caused by maintenance personnel who taped over the static ports in preparation for cleaning the airplane, and then neglected to remove the tape afterwards.

About 25 percent of aircraft accidents are machine-caused, and about half of those are maintenance-induced failures (MIFs).

Now, it's true that most aircraft accidents are pilot-caused rather than machine-caused. Numerous studies indicate that about 75 percent of accidents are the fault of the flight crew. The 25 percent of accidents that are machine-caused are just about evenly divided between those caused by aircraft design flaws (13 percent) and those caused by MIFs (12 percent). Still, 12 percent of accidents is a pretty significant number. More than half of all MIFs -- 56 percent, according to one survey -- are errors of omission rather than commission. The majority of these omissions involved fasteners left uninstalled or not torqued properly. The rest involved things left disconnected (e.g., static lines) or other reassembly tasks left undone. Distractions play a big part in many of these errors of omission. A common scenario is that a mechanic installs some fasteners finger tight, then gets a phone call or goes on lunch break and forgets to finish the job by torquing the fasteners. I have personally seen some of the best, most experienced mechanics I know fall victim to such seemingly rookie mistakes, and I know of several fatal accidents caused by such omissions.
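To put those percentages side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above. The function name and dictionary keys are made up for illustration, and applying the 56 percent omission share from one survey to the 12 percent MIF figure from other studies is an assumption on my part, since the sources may not be measuring the same population.

```python
# Back-of-the-envelope arithmetic using only the percentages quoted in the text.
# ASSUMPTION: the 56% omission share (from one survey) can be applied to the
# 12% MIF figure (from other studies). The sources may not be comparable.

PILOT_CAUSED_PCT = 75        # share of all accidents attributed to the flight crew
DESIGN_FLAW_PCT = 13         # share of all accidents attributed to design flaws
MIF_PCT = 12                 # share of all accidents attributed to MIFs
OMISSION_SHARE_OF_MIFS = 56  # percent of MIFs that are errors of omission


def accident_breakdown():
    """Return the accident shares as percentages of all accidents (hypothetical helper)."""
    machine_caused = DESIGN_FLAW_PCT + MIF_PCT
    omission_mifs = round(MIF_PCT * OMISSION_SHARE_OF_MIFS / 100, 2)
    commission_mifs = round(MIF_PCT - omission_mifs, 2)
    return {
        "pilot_caused_pct": PILOT_CAUSED_PCT,
        "machine_caused_pct": machine_caused,
        "omission_mif_pct": omission_mifs,
        "commission_mif_pct": commission_mifs,
    }


if __name__ == "__main__":
    for name, value in accident_breakdown().items():
        print(f"{name}: {value}")
```

Under that assumption, errors of omission alone would account for somewhere around 7 percent of all accidents, which is why forgotten fasteners and disconnected lines deserve so much attention.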

Maintenance Is Invasive!

Most owners and mechanics don't think enough about the fact that maintenance is inherently invasive. Any time a mechanic takes something apart and puts it back together, there's a risk that something won't go back together quite right, and the result will be a MIF. Some maintenance operations are more invasive than others, and the more invasive the maintenance, the greater the risk of a MIF.

Invasiveness is something we think about a lot in medicine. If you develop gallstones, for example, the traditional treatment has long been cholecystectomy (gall bladder removal), which is major abdominal surgery in which the surgeon removes the gall bladder through a 5- to 8-inch incision. Recovery typically involves a week of post-surgical hospitalization, followed by several weeks of recovery at home. This standard treatment is extremely invasive, and so not surprisingly the incidence of complications and even death is significant. (My dad very nearly died as the result of complications following an open cholecystectomy operation.)

So the medical community developed a less invasive procedure called laparoscopic cholecystectomy in which the traditional large incision is replaced by several tiny incisions, and the surgery is performed using a tiny video camera inserted through one of the incisions and various microsurgery instruments inserted through the others. This procedure is far, far less invasive than the traditional open procedure, and recovery usually involves only one night in the hospital and a few days at home. More important, the risk of complications is substantially reduced. Consequently, the laparoscopic procedure has now replaced the open procedure as the first-choice treatment for gallstones (although about 5 percent of the time, the laparoscopic procedure proves infeasible and the surgeon must switch to the more invasive, open procedure).

Even less invasive treatments for gallstones exist. In some cases, the stones may be dissolved slowly by taking a long course of oral medication (Actigall or Chenix), or quickly by direct injection of a drug (methyl tert-butyl ether) into the gall bladder. Extracorporeal shockwave lithotripsy (ESWL) has also been used to break up gallstones with shock waves, although the success rate hasn't been very high.

Likewise, some aircraft maintenance procedures are more invasive than others. The more invasive a procedure is, the greater the risk of a MIF. Therefore, when considering any maintenance task, we should always think carefully about how invasive it is, whether the benefit of performing the procedure is really worth the risk, and whether less invasive alternatives are available.

The other day, for example, I received an email from an aircraft owner who said that he'd recently received an oil analysis report showing an alarming increase in iron. The oil filter, however, showed no visible metal. The lab report suggested flying another 25 hours and then submitting another oil sample for analysis. The owner showed the oil analysis report to his A&P, who expressed real concern that the elevated iron levels might indicate that one or more cam lobes were coming apart. The mechanic suggested pulling one or two cylinders and inspecting the camshaft. The owner wisely decided to seek a second opinion before authorizing something as invasive as cylinder removal, so he emailed me to ask for my recommendation.

Unless you see something like this in your oil filter, chances are that your cam and lifters are OK.

In my response, I advised the owner that in my opinion, the elevated iron was almost certainly not due to cam lobe spalling. I explained that a disintegrating cam lobe throws off fairly large particles or whiskers of steel that are usually clearly visible during oil filter inspection. The fact that the oil filter was clean suggested that the elevated iron was coming from microscopic metal particles less than 50 microns in diameter, too small to be detectable in a filter inspection, but easily detectable via spectrographic oil analysis. Such tiny particles were probably coming either from light rust on the cylinder walls (if the aircraft had been inactive for a while), or from some very slow wear process.

I suggested to the owner that a borescope inspection of the cylinder barrels (a very non-invasive procedure) would be a good idea in order to see whether the cylinder bores showed evidence of rust. I also advised that no invasive maintenance procedure should ever be undertaken solely on the basis of a single oil analysis report. I thought the oil lab was spot-on in recommending that the aircraft be flown another 25 hours and another oil sample submitted.

I went on to explain that even if a cam inspection was warranted (and I didn't think it was), there was a far less invasive method of accomplishing it. Instead of a 10-hour cylinder removal, the mechanic could do a 1-hour removal of the intake and exhaust lifters for inspection, and then determine the condition of the cam by inserting a pick into the lifter boss, rotating the propeller, and checking whether the cam lobe had any pits sufficient to grab the tip of the pick. Not only would this procedure involve about 10 percent of the labor of cylinder removal, but the risk of a consequential MIF would be almost nil. (About the worst that could happen following lifter inspection would be a slight oil leak if the pushrod housing seals were not installed properly.)

Sometimes, Less Is More

Many owners seem to believe -- and many mechanics seem to preach -- that preventive maintenance is inherently a good thing, and the more of it you do the better. I consider this a wrongheaded view. To the contrary, I believe we often do far more preventive maintenance than necessary, and we often do it using unnecessarily invasive procedures. By doing this, we increase the likelihood that our maintenance efforts will actually cause failures rather than preventing them.

Several of my AVweb columns in 2007 talked about reliability-centered maintenance (RCM) developed at United Airlines in the late 1960s, and universally adopted by the airlines and the military during the 1970s. One of the major findings of RCM researchers was that preventive maintenance often does more harm than good, and that safety and dispatch reliability can often be improved substantially by reducing the amount of preventive maintenance we do, and using the least invasive methods possible.

Unfortunately, this sort of thinking hasn't seemed to trickle down to piston GA maintenance, and is considered absolute heresy by most mechanics because it contradicts everything they were taught in A&P school. The long-term solution is that GA mechanics need to be educated about RCM principles, but that isn't likely to happen any time soon. In the short term, aircraft owners can improve the situation by thinking carefully before authorizing an A&P to perform any invasive maintenance procedure on their aircraft -- and doing what the above-mentioned owner did: Get a second opinion.

On a personal note, I've been an aircraft owner for 40 years now but I've only been using this RCM-inspired, minimalist-maintenance philosophy for about the last decade. My aircraft dispatch reliability has improved dramatically since I started adopting the less-is-more, condition-directed approach to maintaining my airplane. In fact, I cannot recall a single time in the past 10 years that I had to cancel or delay a trip because of a mechanical problem. That certainly wasn't the case in the bad old days when my airplane was maintained in the conventional PM-intensive, time-directed fashion.

Finally, aircraft owners need to fully appreciate that the most likely time for a mechanical failure to occur is the first flight after maintenance and that the risk of such MIFs is very substantial. It's therefore imperative that owners conduct a post-maintenance test flight -- in VMC and without passengers -- before launching into the clag or putting passengers at risk. In my judgment, even the most innocuous maintenance task -- e.g., a routine oil change -- deserves such a post-maintenance test flight. I do this without fail any time I swing a wrench on my airplane, and you should, too.

See you next month.

Want to read more from Mike Busch? Check out the rest of his Savvy Aviator columns.